qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] block: zero write detection
@ 2011-10-07 15:49 Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 1/3] block: add zero write detection interface Stefan Hajnoczi
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-07 15:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Image streaming copies data from the backing file into the image file.  It is
important to represent zero regions from the backing file efficiently during
streaming, otherwise the image file grows to the full virtual disk size and
loses sparseness.

There are two ways to implement zero write detection, they are subtly different:

1. Allow image formats to provide efficient representations for zero regions.
   QED does this with "zero clusters" and it has been discussed for qcow2v3.

2. During streaming, check for zeroes and skip writing to the image file when
   zeroes are detected.

However, there are some disadvantages to #2 because it leaves unallocated holes
in the image file.  If image streaming is aborted before it completes then it
will be necessary to reread all unallocated clusters from the backing file upon
resuming image streaming.  Potentionally worse is that a backing file over a
slow remote connection will have the zero regions fetched again and again if
the guest accesses them.  #1 avoids these problems because the image file
contains information on which regions are zeroes and do not need to be
refetched.

This patch series implements #1 with the existing QED zero cluster feature.  In
the future we can add qcow2v3 zero clusters too.  We can also implement #2
directly in the image streaming code as a fallback when the BlockDriver does
not support zero detection #1 itself.  That way we get the best possible zero
write detection, depending on the image format.

Here is a qemu-iotest to verify that zero write detection is working:
http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37

Stefan Hajnoczi (3):
  block: add zero write detection interface
  qed: add zero write detection support
  qemu-io: add zero write detection option

 block.c     |   16 +++++++++++
 block.h     |    2 +
 block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 block_int.h |   13 +++++++++
 qemu-io.c   |   35 ++++++++++++++++++++-----
 5 files changed, 132 insertions(+), 15 deletions(-)

-- 
1.7.6.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 1/3] block: add zero write detection interface
  2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
@ 2011-10-07 15:49 ` Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 2/3] qed: add zero write detection support Stefan Hajnoczi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-07 15:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Some image formats can represent zero regions efficiently even when a
backing file is present.  In order to use this feature they need to
detect zero writes and handle them specially.

Since zero write detection consumes CPU cycles it is disabled by default
and must be explicitly enabled.  This patch adds an interface to do so.

Currently no block drivers actually support zero write detection yet.
This is addressed in follow-up patches.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c     |   16 ++++++++++++++++
 block.h     |    2 ++
 block_int.h |   13 +++++++++++++
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index e3fe97f..5cf53d6 100644
--- a/block.c
+++ b/block.c
@@ -481,6 +481,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->valid_key = 0;
     bs->open_flags = flags;
     bs->buffer_alignment = 512;
+    bs->use_zero_detection = false;
 
     pstrcpy(bs->filename, sizeof(bs->filename), filename);
 
@@ -3344,3 +3345,18 @@ out:
 
     return ret;
 }
+
+int bdrv_set_zero_detection(BlockDriverState *bs, bool enable)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+    if (!drv->has_zero_detection) {
+        return -ENOTSUP;
+    }
+
+    bs->use_zero_detection = enable;
+    return 0;
+}
diff --git a/block.h b/block.h
index 16bfa0a..283dc27 100644
--- a/block.h
+++ b/block.h
@@ -273,6 +273,8 @@ int bdrv_img_create(const char *filename, const char *fmt,
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
+int bdrv_set_zero_detection(BlockDriverState *bs, bool enable);
+
 #define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable);
diff --git a/block_int.h b/block_int.h
index 8c3b863..3e8d768 100644
--- a/block_int.h
+++ b/block_int.h
@@ -146,6 +146,16 @@ struct BlockDriver {
      */
     int (*bdrv_has_zero_init)(BlockDriverState *bs);
 
+    /*
+     * True if zero write detection is supported, false otherwise.
+     *
+     * Block drivers that declare support for zero detection should check
+     * BlockDriverState.use_zero_detection for each write request to decide
+     * whether or not to perform detection.  Since zero detection consumes CPU
+     * cycles it is disabled by default.
+     */
+    bool has_zero_detection;
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
@@ -195,6 +205,9 @@ struct BlockDriverState {
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
+    /* is zero write detection enabled? */
+    bool use_zero_detection;
+
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
     int cyls, heads, secs, translation;
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 2/3] qed: add zero write detection support
  2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 1/3] block: add zero write detection interface Stefan Hajnoczi
@ 2011-10-07 15:49 ` Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 3/3] qemu-io: add zero write detection option Stefan Hajnoczi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-07 15:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

The QED image format is able to efficiently represent clusters
containing zeroes with a magic offset value.  This patch implements zero
write detection for allocating writes so that image streaming can copy
over zero clusters from a backing file without expanding the image file
unnecessarily.

This is based code by Anthony Liguori <aliguori@us.ibm.com>.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index e87dc4d..ec3113b 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -947,9 +947,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
 {
-    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
     int index;
@@ -965,7 +964,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
 
     index = qed_l2_index(s, acb->cur_pos);
     qed_update_l2_table(s, acb->request.l2_table->table, index, acb->cur_nclusters,
-                         acb->cur_cluster);
+                         offset);
 
     if (need_alloc) {
         /* Write out the whole new L2 table */
@@ -982,6 +981,51 @@ err:
     qed_aio_complete(acb, ret);
 }
 
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+    BDRVQEDState *s = acb_to_s(acb);
+    int i;
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+        return false;
+    }
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+        return false;
+    }
+
+    for (i = 0; i < acb->cur_qiov.niov; i++) {
+        struct iovec *iov = &acb->cur_qiov.iov[i];
+        uint64_t *v;
+        int j;
+
+        if ((iov->iov_len & 0x07)) {
+            return false;
+        }
+
+        v = iov->iov_base;
+        for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+            if (v[j >> 3]) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 /**
  * Flush new data clusters before updating the L2 table
  *
@@ -996,7 +1040,7 @@ static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
 
-    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
         qed_aio_complete(acb, -EIO);
     }
 }
@@ -1026,7 +1070,7 @@ static void qed_aio_write_main(void *opaque, int ret)
         if (s->bs->backing_hd) {
             next_fn = qed_aio_write_flush_before_l2_update;
         } else {
-            next_fn = qed_aio_write_l2_update;
+            next_fn = qed_aio_write_l2_update_cb;
         }
     }
 
@@ -1092,6 +1136,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
     return !(s->header.features & QED_F_NEED_CHECK);
 }
 
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    qed_aio_write_l2_update(acb, 0, 1);
+}
+
 /**
  * Write new data cluster
  *
@@ -1103,6 +1159,7 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
 static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
+    BlockDriverCompletionFunc *cb;
 
     /* Cancel timer when the first allocating request comes in */
     if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
@@ -1120,14 +1177,21 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 
     acb->cur_nclusters = qed_bytes_to_clusters(s,
             qed_offset_into_cluster(s, acb->cur_pos) + len);
-    acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
+    /* Zero write detection */
+    if (s->bs->use_zero_detection && qed_is_zero_write(acb)) {
+        cb = qed_aio_write_zero_cluster;
+    } else {
+        cb = qed_aio_write_prefill;
+        acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+    }
+
     if (qed_should_set_need_check(s)) {
         s->header.features |= QED_F_NEED_CHECK;
-        qed_write_header(s, qed_aio_write_prefill, acb);
+        qed_write_header(s, cb, acb);
     } else {
-        qed_aio_write_prefill(acb, 0);
+        cb(acb, 0);
     }
 }
 
@@ -1474,6 +1538,7 @@ static BlockDriver bdrv_qed = {
     .format_name              = "qed",
     .instance_size            = sizeof(BDRVQEDState),
     .create_options           = qed_create_options,
+    .has_zero_detection       = true,
 
     .bdrv_probe               = bdrv_qed_probe,
     .bdrv_open                = bdrv_qed_open,
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH 3/3] qemu-io: add zero write detection option
  2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 1/3] block: add zero write detection interface Stefan Hajnoczi
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 2/3] qed: add zero write detection support Stefan Hajnoczi
@ 2011-10-07 15:49 ` Stefan Hajnoczi
  2011-10-09  9:52 ` [Qemu-devel] [PATCH 0/3] block: zero write detection Mars.cao
  2011-10-11 13:46 ` Kevin Wolf
  4 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-07 15:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi

Add a -z option to qemu-io and the 'open' command to enable zero write
detection.  This is used by the qemu-iotests 029 test case and allows
scripts to exercise zero write detection.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 qemu-io.c |   35 ++++++++++++++++++++++++++++-------
 1 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index e91af37..94beaf6 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1593,7 +1593,7 @@ static const cmdinfo_t close_cmd = {
     .oneline    = "close the current open file",
 };
 
-static int openfile(char *name, int flags, int growable)
+static int openfile(char *name, int flags, int growable, int detect_zeroes)
 {
     if (bs) {
         fprintf(stderr, "file open already, try 'help close'\n");
@@ -1615,6 +1615,16 @@ static int openfile(char *name, int flags, int growable)
         }
     }
 
+    if (detect_zeroes) {
+        if (bdrv_set_zero_detection(bs, true) < 0) {
+            fprintf(stderr, "%s: format does not support zero detection\n",
+                    progname);
+            bdrv_delete(bs);
+            bs = NULL;
+            return 1;
+        }
+    }
+
     return 0;
 }
 
@@ -1632,6 +1642,7 @@ static void open_help(void)
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
 " -g, -- allow file to grow (only applies to protocols)"
+" -z  -- use zero write detection (supported formats only)\n"
 "\n");
 }
 
@@ -1644,7 +1655,7 @@ static const cmdinfo_t open_cmd = {
     .argmin     = 1,
     .argmax     = -1,
     .flags      = CMD_NOFILE_OK,
-    .args       = "[-Crsn] [path]",
+    .args       = "[-Crsnz] [path]",
     .oneline    = "open the file specified by path",
     .help       = open_help,
 };
@@ -1654,9 +1665,10 @@ static int open_f(int argc, char **argv)
     int flags = 0;
     int readonly = 0;
     int growable = 0;
+    int detect_zeroes = 0;
     int c;
 
-    while ((c = getopt(argc, argv, "snrg")) != EOF) {
+    while ((c = getopt(argc, argv, "snrgz")) != EOF) {
         switch (c) {
         case 's':
             flags |= BDRV_O_SNAPSHOT;
@@ -1670,6 +1682,9 @@ static int open_f(int argc, char **argv)
         case 'g':
             growable = 1;
             break;
+        case 'z':
+            detect_zeroes = 1;
+            break;
         default:
             return command_usage(&open_cmd);
         }
@@ -1683,7 +1698,7 @@ static int open_f(int argc, char **argv)
         return command_usage(&open_cmd);
     }
 
-    return openfile(argv[optind], flags, growable);
+    return openfile(argv[optind], flags, growable, detect_zeroes);
 }
 
 static int init_args_command(int index)
@@ -1710,7 +1725,7 @@ static int init_check_command(const cmdinfo_t *ct)
 static void usage(const char *name)
 {
     printf(
-"Usage: %s [-h] [-V] [-rsnm] [-c cmd] ... [file]\n"
+"Usage: %s [-h] [-V] [-rsnmkz] [-c cmd] ... [file]\n"
 "QEMU Disk exerciser\n"
 "\n"
 "  -c, --cmd            command to execute\n"
@@ -1720,6 +1735,7 @@ static void usage(const char *name)
 "  -g, --growable       allow file to grow (only applies to protocols)\n"
 "  -m, --misalign       misalign allocations for O_DIRECT\n"
 "  -k, --native-aio     use kernel AIO implementation (on Linux only)\n"
+"  -z, --detect-zeroes  use zero write detection (supported formats only)\n"
 "  -h, --help           display this help and exit\n"
 "  -V, --version        output version information and exit\n"
 "\n",
@@ -1731,7 +1747,8 @@ int main(int argc, char **argv)
 {
     int readonly = 0;
     int growable = 0;
-    const char *sopt = "hVc:rsnmgk";
+    int detect_zeroes = 0;
+    const char *sopt = "hVc:rsnmgkz";
     const struct option lopt[] = {
         { "help", 0, NULL, 'h' },
         { "version", 0, NULL, 'V' },
@@ -1743,6 +1760,7 @@ int main(int argc, char **argv)
         { "misalign", 0, NULL, 'm' },
         { "growable", 0, NULL, 'g' },
         { "native-aio", 0, NULL, 'k' },
+        { "detect-zeroes", 0, NULL, 'z' },
         { NULL, 0, NULL, 0 }
     };
     int c;
@@ -1774,6 +1792,9 @@ int main(int argc, char **argv)
         case 'k':
             flags |= BDRV_O_NATIVE_AIO;
             break;
+        case 'z':
+            detect_zeroes = 1;
+            break;
         case 'V':
             printf("%s version %s\n", progname, VERSION);
             exit(0);
@@ -1823,7 +1844,7 @@ int main(int argc, char **argv)
     }
 
     if ((argc - optind) == 1) {
-        openfile(argv[optind], flags, growable);
+        openfile(argv[optind], flags, growable, detect_zeroes);
     }
     command_loop();
 
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
  2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2011-10-07 15:49 ` [Qemu-devel] [PATCH 3/3] qemu-io: add zero write detection option Stefan Hajnoczi
@ 2011-10-09  9:52 ` Mars.cao
  2011-10-11 13:46 ` Kevin Wolf
  4 siblings, 0 replies; 9+ messages in thread
From: Mars.cao @ 2011-10-09  9:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Marcelo Tosatti, qemu-devel

On 10/07/2011 11:49 PM, Stefan Hajnoczi wrote:
> Image streaming copies data from the backing file into the image file.  It is
> important to represent zero regions from the backing file efficiently during
> streaming, otherwise the image file grows to the full virtual disk size and
> loses sparseness.
>
> There are two ways to implement zero write detection, they are subtly different:
>
> 1. Allow image formats to provide efficient representations for zero regions.
>     QED does this with "zero clusters" and it has been discussed for qcow2v3.
>
> 2. During streaming, check for zeroes and skip writing to the image file when
>     zeroes are detected.
>
> However, there are some disadvantages to #2 because it leaves unallocated holes
> in the image file.  If image streaming is aborted before it completes then it
> will be necessary to reread all unallocated clusters from the backing file upon
> resuming image streaming.  Potentionally worse is that a backing file over a
> slow remote connection will have the zero regions fetched again and again if
> the guest accesses them.  #1 avoids these problems because the image file
> contains information on which regions are zeroes and do not need to be
> refetched.
>
> This patch series implements #1 with the existing QED zero cluster feature.  In
> the future we can add qcow2v3 zero clusters too.  We can also implement #2
> directly in the image streaming code as a fallback when the BlockDriver does
> not support zero detection #1 itself.  That way we get the best possible zero
> write detection, depending on the image format.
>
> Here is a qemu-iotest to verify that zero write detection is working:
> http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37
>
> Stefan Hajnoczi (3):
>    block: add zero write detection interface
>    qed: add zero write detection support
>    qemu-io: add zero write detection option
>
>   block.c     |   16 +++++++++++
>   block.h     |    2 +
>   block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
>   block_int.h |   13 +++++++++
>   qemu-io.c   |   35 ++++++++++++++++++++-----
>   5 files changed, 132 insertions(+), 15 deletions(-)
>
I tested the patch by qemu-iotest 029 test case and also by manually, it 
worked as expected.

Tested-by: Cao,Bing Bu <mars@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
  2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2011-10-09  9:52 ` [Qemu-devel] [PATCH 0/3] block: zero write detection Mars.cao
@ 2011-10-11 13:46 ` Kevin Wolf
  2011-10-12 10:39   ` Stefan Hajnoczi
  4 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2011-10-11 13:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel

Am 07.10.2011 17:49, schrieb Stefan Hajnoczi:
> Image streaming copies data from the backing file into the image file.  It is
> important to represent zero regions from the backing file efficiently during
> streaming, otherwise the image file grows to the full virtual disk size and
> loses sparseness.
> 
> There are two ways to implement zero write detection, they are subtly different:
> 
> 1. Allow image formats to provide efficient representations for zero regions.
>    QED does this with "zero clusters" and it has been discussed for qcow2v3.
> 
> 2. During streaming, check for zeroes and skip writing to the image file when
>    zeroes are detected.
> 
> However, there are some disadvantages to #2 because it leaves unallocated holes
> in the image file.  If image streaming is aborted before it completes then it
> will be necessary to reread all unallocated clusters from the backing file upon
> resuming image streaming.  Potentionally worse is that a backing file over a
> slow remote connection will have the zero regions fetched again and again if
> the guest accesses them.  #1 avoids these problems because the image file
> contains information on which regions are zeroes and do not need to be
> refetched.
> 
> This patch series implements #1 with the existing QED zero cluster feature.  In
> the future we can add qcow2v3 zero clusters too.  We can also implement #2
> directly in the image streaming code as a fallback when the BlockDriver does
> not support zero detection #1 itself.  That way we get the best possible zero
> write detection, depending on the image format.
> 
> Here is a qemu-iotest to verify that zero write detection is working:
> http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37
> 
> Stefan Hajnoczi (3):
>   block: add zero write detection interface
>   qed: add zero write detection support
>   qemu-io: add zero write detection option
> 
>  block.c     |   16 +++++++++++
>  block.h     |    2 +
>  block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
>  block_int.h |   13 +++++++++
>  qemu-io.c   |   35 ++++++++++++++++++++-----
>  5 files changed, 132 insertions(+), 15 deletions(-)

It's good to have an option to detect zero writes and turn them into
zero clusters, but it's something that introduces some overhead and
probably won't be suitable as a default.

I think what we really want to have for image streaming is an API that
explicitly writes zeros and doesn't have to look at the whole buffer (or
actually doesn't even get a buffer).

The complementing feature on the source image side would be a way to let
the source image format signal that it would read only zeros (e.g.
because the cluster wasn't allocated or a zero cluster).

This would avoid memsetting a full buffer of zeros in bdrv_co_readv,
just to be able to pass this buffer to bdrv_co_writev, where the driver
loops over the whole buffer again in order to verify that it really only
contains zeros.

If you handle it at this generic block layer level, you could also treat
streaming writes differently than guest writes. (You would probably want
to search buffers by default for the former, but not for the latter)

Kevin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
  2011-10-11 13:46 ` Kevin Wolf
@ 2011-10-12 10:39   ` Stefan Hajnoczi
  2011-10-12 11:03     ` Kevin Wolf
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-12 10:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel

On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote:
> Am 07.10.2011 17:49, schrieb Stefan Hajnoczi:
> > Image streaming copies data from the backing file into the image file.  It is
> > important to represent zero regions from the backing file efficiently during
> > streaming, otherwise the image file grows to the full virtual disk size and
> > loses sparseness.
> > 
> > There are two ways to implement zero write detection, they are subtly different:
> > 
> > 1. Allow image formats to provide efficient representations for zero regions.
> >    QED does this with "zero clusters" and it has been discussed for qcow2v3.
> > 
> > 2. During streaming, check for zeroes and skip writing to the image file when
> >    zeroes are detected.
> > 
> > However, there are some disadvantages to #2 because it leaves unallocated holes
> > in the image file.  If image streaming is aborted before it completes then it
> > will be necessary to reread all unallocated clusters from the backing file upon
> > resuming image streaming.  Potentionally worse is that a backing file over a
> > slow remote connection will have the zero regions fetched again and again if
> > the guest accesses them.  #1 avoids these problems because the image file
> > contains information on which regions are zeroes and do not need to be
> > refetched.
> > 
> > This patch series implements #1 with the existing QED zero cluster feature.  In
> > the future we can add qcow2v3 zero clusters too.  We can also implement #2
> > directly in the image streaming code as a fallback when the BlockDriver does
> > not support zero detection #1 itself.  That way we get the best possible zero
> > write detection, depending on the image format.
> > 
> > Here is a qemu-iotest to verify that zero write detection is working:
> > http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37
> > 
> > Stefan Hajnoczi (3):
> >   block: add zero write detection interface
> >   qed: add zero write detection support
> >   qemu-io: add zero write detection option
> > 
> >  block.c     |   16 +++++++++++
> >  block.h     |    2 +
> >  block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  block_int.h |   13 +++++++++
> >  qemu-io.c   |   35 ++++++++++++++++++++-----
> >  5 files changed, 132 insertions(+), 15 deletions(-)
> 
> It's good to have an option to detect zero writes and turn them into
> zero clusters, but it's something that introduces some overhead and
> probably won't be suitable as a default.

Yes, this series simply has a bdrv_set_zero_detection() API to toggle it
at runtime.  By default it is off to save CPU cycles.

> I think what we really want to have for image streaming is an API that
> explicitly writes zeros and doesn't have to look at the whole buffer (or
> actually doesn't even get a buffer).

I didn't take this approach to avoid having block drivers handle the
zero buffers that need to be allocated when the region does not cover
entire clusters.  It can be done for sure but I'm not sure how to do it
nicely yet.

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
  2011-10-12 10:39   ` Stefan Hajnoczi
@ 2011-10-12 11:03     ` Kevin Wolf
  2011-10-12 11:59       ` Stefan Hajnoczi
  0 siblings, 1 reply; 9+ messages in thread
From: Kevin Wolf @ 2011-10-12 11:03 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel

Am 12.10.2011 12:39, schrieb Stefan Hajnoczi:
> On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote:
>> Am 07.10.2011 17:49, schrieb Stefan Hajnoczi:
>>> Image streaming copies data from the backing file into the image file.  It is
>>> important to represent zero regions from the backing file efficiently during
>>> streaming, otherwise the image file grows to the full virtual disk size and
>>> loses sparseness.
>>>
>>> There are two ways to implement zero write detection, they are subtly different:
>>>
>>> 1. Allow image formats to provide efficient representations for zero regions.
>>>    QED does this with "zero clusters" and it has been discussed for qcow2v3.
>>>
>>> 2. During streaming, check for zeroes and skip writing to the image file when
>>>    zeroes are detected.
>>>
>>> However, there are some disadvantages to #2 because it leaves unallocated holes
>>> in the image file.  If image streaming is aborted before it completes then it
>>> will be necessary to reread all unallocated clusters from the backing file upon
>>> resuming image streaming.  Potentionally worse is that a backing file over a
>>> slow remote connection will have the zero regions fetched again and again if
>>> the guest accesses them.  #1 avoids these problems because the image file
>>> contains information on which regions are zeroes and do not need to be
>>> refetched.
>>>
>>> This patch series implements #1 with the existing QED zero cluster feature.  In
>>> the future we can add qcow2v3 zero clusters too.  We can also implement #2
>>> directly in the image streaming code as a fallback when the BlockDriver does
>>> not support zero detection #1 itself.  That way we get the best possible zero
>>> write detection, depending on the image format.
>>>
>>> Here is a qemu-iotest to verify that zero write detection is working:
>>> http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37
>>>
>>> Stefan Hajnoczi (3):
>>>   block: add zero write detection interface
>>>   qed: add zero write detection support
>>>   qemu-io: add zero write detection option
>>>
>>>  block.c     |   16 +++++++++++
>>>  block.h     |    2 +
>>>  block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>  block_int.h |   13 +++++++++
>>>  qemu-io.c   |   35 ++++++++++++++++++++-----
>>>  5 files changed, 132 insertions(+), 15 deletions(-)
>>
>> It's good to have an option to detect zero writes and turn them into
>> zero clusters, but it's something that introduces some overhead and
>> probably won't be suitable as a default.
> 
> Yes, this series simply has a bdrv_set_zero_detection() API to toggle it
> at runtime.  By default it is off to save CPU cycles.
> 
>> I think what we really want to have for image streaming is an API that
>> explicitly writes zeros and doesn't have to look at the whole buffer (or
>> actually doesn't even get a buffer).
> 
> I didn't take this approach to avoid having block drivers handle the
> zero buffers that need to be allocated when the region does not cover
> entire clusters.  It can be done for sure but I'm not sure how to do it
> nicely yet.

If I understand your QED code right, in such cases it ignores that there
are some zeros that could be turned into a zero cluster. Considering
this and that you always fill a buffer just to be able to check it
(which is known to take considerable time from qemu-img convert
experience) - how could any solution that works consistently, but
requires an allocation in the block driver be less nice?

Kevin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] block: zero write detection
  2011-10-12 11:03     ` Kevin Wolf
@ 2011-10-12 11:59       ` Stefan Hajnoczi
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Hajnoczi @ 2011-10-12 11:59 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Anthony Liguori, Marcelo Tosatti, Stefan Hajnoczi, qemu-devel

On Wed, Oct 12, 2011 at 12:03 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 12.10.2011 12:39, schrieb Stefan Hajnoczi:
>> On Tue, Oct 11, 2011 at 03:46:28PM +0200, Kevin Wolf wrote:
>>> Am 07.10.2011 17:49, schrieb Stefan Hajnoczi:
>>>> Image streaming copies data from the backing file into the image file.  It is
>>>> important to represent zero regions from the backing file efficiently during
>>>> streaming, otherwise the image file grows to the full virtual disk size and
>>>> loses sparseness.
>>>>
>>>> There are two ways to implement zero write detection, they are subtly different:
>>>>
>>>> 1. Allow image formats to provide efficient representations for zero regions.
>>>>    QED does this with "zero clusters" and it has been discussed for qcow2v3.
>>>>
>>>> 2. During streaming, check for zeroes and skip writing to the image file when
>>>>    zeroes are detected.
>>>>
>>>> However, there are some disadvantages to #2 because it leaves unallocated holes
>>>> in the image file.  If image streaming is aborted before it completes then it
>>>> will be necessary to reread all unallocated clusters from the backing file upon
>>>> resuming image streaming.  Potentionally worse is that a backing file over a
>>>> slow remote connection will have the zero regions fetched again and again if
>>>> the guest accesses them.  #1 avoids these problems because the image file
>>>> contains information on which regions are zeroes and do not need to be
>>>> refetched.
>>>>
>>>> This patch series implements #1 with the existing QED zero cluster feature.  In
>>>> the future we can add qcow2v3 zero clusters too.  We can also implement #2
>>>> directly in the image streaming code as a fallback when the BlockDriver does
>>>> not support zero detection #1 itself.  That way we get the best possible zero
>>>> write detection, depending on the image format.
>>>>
>>>> Here is a qemu-iotest to verify that zero write detection is working:
>>>> http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37
>>>>
>>>> Stefan Hajnoczi (3):
>>>>   block: add zero write detection interface
>>>>   qed: add zero write detection support
>>>>   qemu-io: add zero write detection option
>>>>
>>>>  block.c     |   16 +++++++++++
>>>>  block.h     |    2 +
>>>>  block/qed.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
>>>>  block_int.h |   13 +++++++++
>>>>  qemu-io.c   |   35 ++++++++++++++++++++-----
>>>>  5 files changed, 132 insertions(+), 15 deletions(-)
>>>
>>> It's good to have an option to detect zero writes and turn them into
>>> zero clusters, but it's something that introduces some overhead and
>>> probably won't be suitable as a default.
>>
>> Yes, this series simply has a bdrv_set_zero_detection() API to toggle it
>> at runtime.  By default it is off to save CPU cycles.
>>
>>> I think what we really want to have for image streaming is an API that
>>> explicitly writes zeros and doesn't have to look at the whole buffer (or
>>> actually doesn't even get a buffer).
>>
>> I didn't take this approach to avoid having block drivers handle the
>> zero buffers that need to be allocated when the region does not cover
>> entire clusters.  It can be done for sure but I'm not sure how to do it
>> nicely yet.
>
> If I understand your QED code right, in such cases it ignores that there
> are some zeros that could be turned into a zero cluster. Considering
> this and that you always fill a buffer just to be able to check it
> (which is known to take considerable time from qemu-img convert
> experience) - how could any solution that works consistently, but
> requires an allocation in the block driver be less nice?

The fallback is easy when you already have a buffer - just do the write :).

My point is that this patch is the simplest approach.  Other
approaches can optimize better and the question is whether they are
worth doing.

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-10-12 11:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-07 15:49 [Qemu-devel] [PATCH 0/3] block: zero write detection Stefan Hajnoczi
2011-10-07 15:49 ` [Qemu-devel] [PATCH 1/3] block: add zero write detection interface Stefan Hajnoczi
2011-10-07 15:49 ` [Qemu-devel] [PATCH 2/3] qed: add zero write detection support Stefan Hajnoczi
2011-10-07 15:49 ` [Qemu-devel] [PATCH 3/3] qemu-io: add zero write detection option Stefan Hajnoczi
2011-10-09  9:52 ` [Qemu-devel] [PATCH 0/3] block: zero write detection Mars.cao
2011-10-11 13:46 ` Kevin Wolf
2011-10-12 10:39   ` Stefan Hajnoczi
2011-10-12 11:03     ` Kevin Wolf
2011-10-12 11:59       ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).