qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/3] block: zero write detection
@ 2011-12-07 12:10 Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 1/3] block: add zero write detection interface Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2011-12-07 12:10 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi

This series adds an interface for optimized writes when data contains all
zeros.  If zero detection is enabled, a block driver can take extra steps to
represent zero regions efficiently.

The details of optimized zero representations depend on the image format, but
the main block layer change is a field indicating that zero detection should
be performed.  (Detection can be CPU-intensive because buffers are scanned for
all zeros, so the feature is disabled by default.)

This series includes a patch for the QED image format that writes special "zero
clusters" and keeps the image file compact.  In the future, qcow2v3 could also
support an efficient zero representation.
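
As a rough illustration of what a "zero cluster" means for QED: each L2 table entry maps a guest cluster to an offset in the image file. The sketch below assumes that an entry of 0 means unallocated (reads fall through to the backing file) and that the magic zero value is 1, which is the offset patch 2/3 passes from qed_aio_write_zero_cluster(). The constants and functions are hypothetical names for illustration, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; the series does not define them. */
#define SKETCH_L2_UNALLOCATED 0ULL  /* read falls through to the backing file */
#define SKETCH_L2_ZERO        1ULL  /* assumed magic value: cluster reads as zeros */

/*
 * Fill n L2 entries starting at index.  A zero write stores the same magic
 * value in every entry; a data write stores consecutive cluster offsets
 * starting at first_offset.
 */
static void sketch_fill_l2(uint64_t *l2, size_t index, size_t n,
                           bool zero_write, uint64_t first_offset,
                           uint64_t cluster_size)
{
    for (size_t i = 0; i < n; i++) {
        l2[index + i] = zero_write ? SKETCH_L2_ZERO
                                   : first_offset + i * cluster_size;
    }
}

/* Only entries holding real offsets may be read from the image file. */
static bool sketch_l2_entry_has_data(uint64_t entry)
{
    return entry != SKETCH_L2_UNALLOCATED && entry != SKETCH_L2_ZERO;
}
```

The design point is that the magic value is copied rather than advanced per cluster: a two-cluster zero write at index 1 of a four-entry table leaves {0, 1, 1, 0}, and no image file space is allocated for it.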

My motivation for this feature is efficient image streaming.  The destination
file must stay compact and not be bloated with clusters containing only zeroes
from the source file.

Kevin Wolf <kwolf@redhat.com> has suggested performing zero detection in the
source image, to avoid scanning for zeroes when the source image already has an
efficient zero representation.  This is a complementary feature that can be
introduced in the future, but it does not work for raw files over NFS, for
example.  I see the current series as the minimum support we need, with
extensions possible in the future.

Here is a qemu-iotest to verify that zero write detection is working:
http://repo.or.cz/w/qemu-iotests/stefanha.git/commitdiff/226949695eef51bdcdea3e6ce3d7e5a863427f37

Stefan Hajnoczi (3):
  block: add zero write detection interface
  qed: add zero write detection support
  qemu-io: add zero write detection option

 block.c     |   16 ++++++++++++
 block.h     |    3 ++
 block/qed.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 block_int.h |    9 ++++++
 qemu-io.c   |   29 ++++++++++++++++-----
 5 files changed, 122 insertions(+), 15 deletions(-)

-- 
1.7.7.3


* [Qemu-devel] [PATCH v2 1/3] block: add zero write detection interface
  2011-12-07 12:10 [Qemu-devel] [PATCH v2 0/3] block: zero write detection Stefan Hajnoczi
@ 2011-12-07 12:10 ` Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 3/3] qemu-io: add zero write detection option Stefan Hajnoczi
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2011-12-07 12:10 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi

Some image formats can represent zero regions efficiently even when a
backing file is present.  In order to use this feature they need to
detect zero writes and handle them specially.

Since zero write detection consumes CPU cycles it is disabled by default
and must be explicitly enabled.  This patch adds an interface to do so.

No block driver supports zero write detection yet.  This is addressed in
follow-up patches.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c     |   16 ++++++++++++++++
 block.h     |    3 +++
 block_int.h |    9 +++++++++
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 3f072f6..5dcffb1 100644
--- a/block.c
+++ b/block.c
@@ -554,6 +554,21 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
     bs->copy_on_read--;
 }
 
+/**
+ * Multiple users may use this feature without worrying about clobbering its
+ * previous state.  It stays enabled until all users have called to disable it.
+ */
+void bdrv_enable_zero_detection(BlockDriverState *bs)
+{
+    bs->zero_detection++;
+}
+
+void bdrv_disable_zero_detection(BlockDriverState *bs)
+{
+    assert(bs->zero_detection > 0);
+    bs->zero_detection--;
+}
+
 /*
  * Common part for opening disk images and files
  */
@@ -574,6 +589,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->open_flags = flags;
     bs->growable = 0;
     bs->buffer_alignment = 512;
+    bs->zero_detection = 0;
 
     assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
     if ((flags & BDRV_O_RDWR) && (flags & BDRV_O_COPY_ON_READ)) {
diff --git a/block.h b/block.h
index 1790f99..30b72e9 100644
--- a/block.h
+++ b/block.h
@@ -317,6 +317,9 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
+void bdrv_enable_zero_detection(BlockDriverState *bs);
+void bdrv_disable_zero_detection(BlockDriverState *bs);
+
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
diff --git a/block_int.h b/block_int.h
index 311bd2a..89a860c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -247,6 +247,15 @@ struct BlockDriverState {
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
+    /*
+     * If zero write detection is enabled this field is != 0.
+     *
+     * Block drivers that support zero detection should check this field for
+     * each write request to decide whether or not to perform detection.  Since
+     * zero detection consumes CPU cycles it is disabled by default.
+     */
+    int zero_detection;
+
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
     int cyls, heads, secs, translation;
-- 
1.7.7.3

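The comment on bdrv_enable_zero_detection() describes a nesting counter rather than an on/off flag. The behaviour can be checked outside QEMU with a minimal sketch; struct sketch_bds and the sketch_* functions are hypothetical stand-ins, with BlockDriverState reduced to the single field this patch adds.

```c
#include <assert.h>

/* Hypothetical stand-in for BlockDriverState, reduced to the new field. */
struct sketch_bds {
    int zero_detection;  /* > 0 while at least one user wants detection */
};

static void sketch_enable_zero_detection(struct sketch_bds *bs)
{
    bs->zero_detection++;
}

static void sketch_disable_zero_detection(struct sketch_bds *bs)
{
    /* An unbalanced disable is a caller bug, as in the patch. */
    assert(bs->zero_detection > 0);
    bs->zero_detection--;
}
```

With two independent users (say, a streaming job and a qemu-io session), detection stays on after the first disable and turns off only after the second; a plain boolean would switch it off while the other user still relies on it.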

* [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support
  2011-12-07 12:10 [Qemu-devel] [PATCH v2 0/3] block: zero write detection Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 1/3] block: add zero write detection interface Stefan Hajnoczi
@ 2011-12-07 12:10 ` Stefan Hajnoczi
  2011-12-08 14:29   ` Mark Wu
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 3/3] qemu-io: add zero write detection option Stefan Hajnoczi
  2 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2011-12-07 12:10 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi

The QED image format is able to efficiently represent clusters
containing zeroes with a magic offset value.  This patch implements zero
write detection for allocating writes so that image streaming can copy
over zero clusters from a backing file without expanding the image file
unnecessarily.

This is based on code by Anthony Liguori <aliguori@us.ibm.com>.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/qed.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 8da3ebe..db4246a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -941,9 +941,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
 {
-    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
     int index;
@@ -959,7 +958,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
 
     index = qed_l2_index(s, acb->cur_pos);
     qed_update_l2_table(s, acb->request.l2_table->table, index, acb->cur_nclusters,
-                         acb->cur_cluster);
+                         offset);
 
     if (need_alloc) {
         /* Write out the whole new L2 table */
@@ -976,6 +975,51 @@ err:
     qed_aio_complete(acb, ret);
 }
 
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+    BDRVQEDState *s = acb_to_s(acb);
+    int i;
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+        return false;
+    }
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+        return false;
+    }
+
+    for (i = 0; i < acb->cur_qiov.niov; i++) {
+        struct iovec *iov = &acb->cur_qiov.iov[i];
+        uint64_t *v;
+        int j;
+
+        if ((iov->iov_len & 0x07)) {
+            return false;
+        }
+
+        v = iov->iov_base;
+        for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+            if (v[j >> 3]) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 /**
  * Flush new data clusters before updating the L2 table
  *
@@ -990,7 +1034,7 @@ static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
 
-    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
         qed_aio_complete(acb, -EIO);
     }
 }
@@ -1019,7 +1063,7 @@ static void qed_aio_write_main(void *opaque, int ret)
         if (s->bs->backing_hd) {
             next_fn = qed_aio_write_flush_before_l2_update;
         } else {
-            next_fn = qed_aio_write_l2_update;
+            next_fn = qed_aio_write_l2_update_cb;
         }
     }
 
@@ -1081,6 +1125,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
     return !(s->header.features & QED_F_NEED_CHECK);
 }
 
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    qed_aio_write_l2_update(acb, 0, 1);
+}
+
 /**
  * Write new data cluster
  *
@@ -1092,6 +1148,7 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
 static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
+    BlockDriverCompletionFunc *cb;
 
     /* Cancel timer when the first allocating request comes in */
     if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
@@ -1109,14 +1166,21 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 
     acb->cur_nclusters = qed_bytes_to_clusters(s,
             qed_offset_into_cluster(s, acb->cur_pos) + len);
-    acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
+    /* Zero write detection */
+    if (s->bs->zero_detection && qed_is_zero_write(acb)) {
+        cb = qed_aio_write_zero_cluster;
+    } else {
+        cb = qed_aio_write_prefill;
+        acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+    }
+
     if (qed_should_set_need_check(s)) {
         s->header.features |= QED_F_NEED_CHECK;
-        qed_write_header(s, qed_aio_write_prefill, acb);
+        qed_write_header(s, cb, acb);
     } else {
-        qed_aio_write_prefill(acb, 0);
+        cb(acb, 0);
     }
 }
 
-- 
1.7.7.3

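To experiment with the check in qed_is_zero_write() outside QEMU, here is a sketch of the same rules on plain arguments: the write must start on a cluster boundary, cover a whole number of clusters, have 8-byte-multiple iovecs, and consist of zero bytes. sketch_is_zero_write() and its parameters are hypothetical; the patch works on a QEDAIOCB and calls qed_offset_is_cluster_aligned(), which this sketch assumes is equivalent to masking with cluster_size - 1 for a power-of-two cluster size. Unlike the patch, it reads 64-bit words with memcpy() so it makes no assumption about the alignment of iov_base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper mirroring the rules of qed_is_zero_write().
 * cluster_size must be a power of two. */
static bool sketch_is_zero_write(uint64_t pos, const struct iovec *iov,
                                 int niov, uint64_t cluster_size)
{
    uint64_t total = 0;

    for (int i = 0; i < niov; i++) {
        total += iov[i].iov_len;
    }
    /* Partial clusters still need a normal allocating write. */
    if ((pos & (cluster_size - 1)) || (total & (cluster_size - 1))) {
        return false;
    }

    for (int i = 0; i < niov; i++) {
        const unsigned char *p = iov[i].iov_base;
        size_t len = iov[i].iov_len;

        if (len & 7) {
            return false;  /* same 8-byte-multiple requirement as the patch */
        }
        for (size_t j = 0; j < len; j += sizeof(uint64_t)) {
            uint64_t word;
            memcpy(&word, p + j, sizeof(word));  /* no alignment assumption */
            if (word) {
                return false;
            }
        }
    }
    return true;
}
```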

* [Qemu-devel] [PATCH v2 3/3] qemu-io: add zero write detection option
  2011-12-07 12:10 [Qemu-devel] [PATCH v2 0/3] block: zero write detection Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 1/3] block: add zero write detection interface Stefan Hajnoczi
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support Stefan Hajnoczi
@ 2011-12-07 12:10 ` Stefan Hajnoczi
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2011-12-07 12:10 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi

Add a -z option to qemu-io and the 'open' command to enable zero write
detection.  This is used by the qemu-iotests 029 test case and allows
scripts to exercise zero write detection.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 qemu-io.c |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index ffa62fb..1568870 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1594,7 +1594,7 @@ static const cmdinfo_t close_cmd = {
     .oneline    = "close the current open file",
 };
 
-static int openfile(char *name, int flags, int growable)
+static int openfile(char *name, int flags, int growable, int detect_zeroes)
 {
     if (bs) {
         fprintf(stderr, "file open already, try 'help close'\n");
@@ -1617,6 +1617,10 @@ static int openfile(char *name, int flags, int growable)
         }
     }
 
+    if (detect_zeroes) {
+        bdrv_enable_zero_detection(bs);
+    }
+
     return 0;
 }
 
@@ -1634,6 +1638,7 @@ static void open_help(void)
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
 " -g, -- allow file to grow (only applies to protocols)"
+" -z  -- use zero write detection (supported formats only)\n"
 "\n");
 }
 
@@ -1646,7 +1651,7 @@ static const cmdinfo_t open_cmd = {
     .argmin     = 1,
     .argmax     = -1,
     .flags      = CMD_NOFILE_OK,
-    .args       = "[-Crsn] [path]",
+    .args       = "[-Crsnz] [path]",
     .oneline    = "open the file specified by path",
     .help       = open_help,
 };
@@ -1656,9 +1661,10 @@ static int open_f(int argc, char **argv)
     int flags = 0;
     int readonly = 0;
     int growable = 0;
+    int detect_zeroes = 0;
     int c;
 
-    while ((c = getopt(argc, argv, "snrg")) != EOF) {
+    while ((c = getopt(argc, argv, "snrgz")) != EOF) {
         switch (c) {
         case 's':
             flags |= BDRV_O_SNAPSHOT;
@@ -1672,6 +1678,9 @@ static int open_f(int argc, char **argv)
         case 'g':
             growable = 1;
             break;
+        case 'z':
+            detect_zeroes = 1;
+            break;
         default:
             return command_usage(&open_cmd);
         }
@@ -1685,7 +1694,7 @@ static int open_f(int argc, char **argv)
         return command_usage(&open_cmd);
     }
 
-    return openfile(argv[optind], flags, growable);
+    return openfile(argv[optind], flags, growable, detect_zeroes);
 }
 
 static int init_args_command(int index)
@@ -1712,7 +1721,7 @@ static int init_check_command(const cmdinfo_t *ct)
 static void usage(const char *name)
 {
     printf(
-"Usage: %s [-h] [-V] [-rsnm] [-c cmd] ... [file]\n"
+"Usage: %s [-h] [-V] [-rsnmkz] [-c cmd] ... [file]\n"
 "QEMU Disk exerciser\n"
 "\n"
 "  -c, --cmd            command to execute\n"
@@ -1722,6 +1731,7 @@ static void usage(const char *name)
 "  -g, --growable       allow file to grow (only applies to protocols)\n"
 "  -m, --misalign       misalign allocations for O_DIRECT\n"
 "  -k, --native-aio     use kernel AIO implementation (on Linux only)\n"
+"  -z, --detect-zeroes  use zero write detection (supported formats only)\n"
 "  -h, --help           display this help and exit\n"
 "  -V, --version        output version information and exit\n"
 "\n",
@@ -1733,7 +1743,8 @@ int main(int argc, char **argv)
 {
     int readonly = 0;
     int growable = 0;
-    const char *sopt = "hVc:rsnmgk";
+    int detect_zeroes = 0;
+    const char *sopt = "hVc:rsnmgkz";
     const struct option lopt[] = {
         { "help", 0, NULL, 'h' },
         { "version", 0, NULL, 'V' },
@@ -1745,6 +1756,7 @@ int main(int argc, char **argv)
         { "misalign", 0, NULL, 'm' },
         { "growable", 0, NULL, 'g' },
         { "native-aio", 0, NULL, 'k' },
+        { "detect-zeroes", 0, NULL, 'z' },
         { NULL, 0, NULL, 0 }
     };
     int c;
@@ -1776,6 +1788,9 @@ int main(int argc, char **argv)
         case 'k':
             flags |= BDRV_O_NATIVE_AIO;
             break;
+        case 'z':
+            detect_zeroes = 1;
+            break;
         case 'V':
             printf("%s version %s\n", progname, VERSION);
             exit(0);
@@ -1825,7 +1840,7 @@ int main(int argc, char **argv)
     }
 
     if ((argc - optind) == 1) {
-        openfile(argv[optind], flags, growable);
+        openfile(argv[optind], flags, growable, detect_zeroes);
     }
     command_loop();
 
-- 
1.7.7.3

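Patch 3/3 threads the new flag through getopt_long() in main(). A reduced sketch of that wiring, handling only -z/--detect-zeroes, is below. sketch_parse_detect_zeroes() is hypothetical; in the patch the resulting flag is passed to openfile(), which calls bdrv_enable_zero_detection(). Resetting optind to 0 to force a fresh scan is a getopt_long() convention of glibc and is assumed here only so the helper can be called repeatedly.

```c
#include <assert.h>
#include <getopt.h>  /* getopt_long(): GNU/BSD extension, not POSIX */
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if -z/--detect-zeroes is present, 0 if not,
 * and -1 on an unrecognized option.  qemu-io's real option table is larger.
 */
static int sketch_parse_detect_zeroes(int argc, char **argv)
{
    static const struct option lopt[] = {
        { "detect-zeroes", no_argument, NULL, 'z' },
        { NULL, 0, NULL, 0 }
    };
    int detect_zeroes = 0;
    int c;

    optind = 0;  /* glibc: 0 requests a full rescan of argv */
    opterr = 0;  /* report bad options via the return value, not stderr */
    while ((c = getopt_long(argc, argv, "z", lopt, NULL)) != -1) {
        switch (c) {
        case 'z':
            detect_zeroes = 1;
            break;
        default:
            return -1;
        }
    }
    return detect_zeroes;
}
```

For example, both `-z` and `--detect-zeroes` followed by an image name set the flag, while an image name alone leaves it at 0.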

* Re: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support
  2011-12-07 12:10 ` [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support Stefan Hajnoczi
@ 2011-12-08 14:29   ` Mark Wu
  2011-12-08 15:54     ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Wu @ 2011-12-08 14:29 UTC
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Marcelo Tosatti, qemu-devel

I tried to optimize the zero detection code with SSE instructions.  The
idea comes from Paolo's patch "migration: vectorize is_dup_page".  It was
expected to give a noticeable improvement, but I didn't see any in the
qemu-io test, even after increasing the image size to 5GB.  My test patch
follows.  Could you please review it to see whether I made a mistake, and
whether SSE can help with zero detection?

Thanks.


diff --git a/block/qed.c b/block/qed.c
index 75a44f3..61e4a27 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -998,6 +998,14 @@ static void qed_aio_write_l2_update_cb(void *opaque, int ret)
     qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
 }

+#ifdef __SSE2__
+#include <emmintrin.h>
+#define VECTYPE        __m128i
+#define SPLAT(p)       _mm_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 0xFFFF)
+#define VECTYPE_ZERO   _mm_setzero_si128()
+#endif
+
 /**
  * Determine if we have a zero write to a block of clusters
  *
@@ -1027,6 +1035,19 @@ static bool qed_is_zero_write(QEDAIOCB *acb)
         }

         v = iov->iov_base;
+
+#ifdef __SSE2__
+        if ((iov->iov_len & 0x0f)) {
+            VECTYPE zero = VECTYPE_ZERO;
+            VECTYPE *p = (VECTYPE *)v;
+            for(j = 0; j < iov->iov_len / sizeof(VECTYPE); j++) {
+                if (!ALL_EQ(p[j], zero)) {
+                    return false;
+                }
+            }
+            continue;
+        }
+#endif
         for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
             if (v[j >> 3]) {
                 return false;

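Reading the test patch, the SSE2 branch is guarded by `if ((iov->iov_len & 0x0f))`, which is true only when the length is not a multiple of 16. Cluster-sized iovecs are multiples of 16, so they would fall through to the scalar loop, which may be why no improvement showed up; for a length of 8 modulo 16 the vector loop would also stop 8 bytes short before the `continue`. The dereference `p[j]` additionally assumes iov_base is 16-byte aligned. A sketch that sidesteps all three points, assuming the same 8-byte-multiple precondition as qed_is_zero_write(): the vector loop covers whole 16-byte chunks with unaligned loads, and the scalar loop finishes any 8-byte tail (or everything, when SSE2 is unavailable). sketch_iov_is_zero() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: true if len bytes at buf are zero; len % 8 == 0. */
static bool sketch_iov_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t j = 0;

    assert((len & 0x07) == 0);
#ifdef __SSE2__
    /* Vector path over every complete 16-byte chunk. */
    const __m128i zero = _mm_setzero_si128();
    for (; j + 16 <= len; j += 16) {
        /* Unaligned load: buffers are not guaranteed 16-byte aligned. */
        __m128i v = _mm_loadu_si128((const __m128i *)(p + j));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) != 0xFFFF) {
            return false;
        }
    }
#endif
    /* Scalar path: the 8-byte tail, or the whole buffer without SSE2. */
    for (; j < len; j += 8) {
        uint64_t word;
        memcpy(&word, p + j, sizeof(word));
        if (word) {
            return false;
        }
    }
    return true;
}
```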

* Re: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support
  2011-12-08 14:29   ` Mark Wu
@ 2011-12-08 15:54     ` Stefan Hajnoczi
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2011-12-08 15:54 UTC
  To: Mark Wu; +Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi, qemu-devel

On Thu, Dec 8, 2011 at 2:29 PM, Mark Wu <wudxw@linux.vnet.ibm.com> wrote:
> I tried to optimize the zero detection code with SSE instructions.  The idea
> comes from Paolo's patch "migration: vectorize is_dup_page".  It was expected
> to give a noticeable improvement, but I didn't see any in the qemu-io test,
> even after increasing the image size to 5GB.  My test patch follows.  Could
> you please review it to see whether I made a mistake, and whether SSE can
> help with zero detection?

Please put the zero detection function in a common location before
adding serious optimization so that qemu-img.c:is_not_zero() can also
use it.

Out of interest, here is the code gcc 4.6.2 generates for the non-SSE loop:

    1d50:	89 c2                	mov    %eax,%edx
    1d52:	c1 fa 03             	sar    $0x3,%edx
    1d55:	48 63 d2             	movslq %edx,%rdx
    1d58:	48 83 3c d6 00       	cmpq   $0x0,(%rsi,%rdx,8)
    1d5d:	0f 85 03 ff ff ff    	jne    1c66 <qed_aio_write_data+0x146>
    1d63:	83 c0 08             	add    $0x8,%eax
    1d66:	48 63 d0             	movslq %eax,%rdx
    1d69:	48 39 d1             	cmp    %rdx,%rcx
    1d6c:	77 e2                	ja     1d50 <qed_aio_write_data+0x230>

Once you have the zero detection code in a utility function it's easy
to write a small test program to run a performance benchmark.

Stefan

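A shared helper along the lines Stefan suggests could look roughly like this. The names is_zero_buffer() and bench_is_zero_buffer() are hypothetical, not functions from this thread. The helper accepts any length and alignment, since a common helper would serve callers with different buffers; qemu-img.c:is_not_zero() would presumably map to its negation. The benchmark uses the standard clock() function, which measures CPU time at coarse resolution, so it needs enough iterations for a stable figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical shared helper: true if len bytes at buf are all zero. */
static bool is_zero_buffer(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    /* Bulk: 8 bytes at a time; memcpy avoids alignment and aliasing
     * problems, and optimizing compilers typically emit a single load. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, p + i, sizeof(word));
        if (word) {
            return false;
        }
    }
    /* Tail: up to 7 leftover bytes. */
    for (; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Average CPU nanoseconds per is_zero_buffer() call on an all-zero buffer
 * (the worst case, since every byte is scanned).  Negative on error.
 */
static double bench_is_zero_buffer(size_t len, int iterations)
{
    unsigned char *buf;
    unsigned char *volatile vbuf;  /* reloaded each pass: scan is not hoisted */
    volatile bool sink = false;
    clock_t t0, t1;

    if (iterations <= 0 || !(buf = calloc(1, len))) {
        return -1.0;
    }
    vbuf = buf;
    t0 = clock();
    for (int n = 0; n < iterations; n++) {
        sink = is_zero_buffer(vbuf, len);
    }
    t1 = clock();
    free(buf);
    (void)sink;
    if (t0 == (clock_t)-1 || t1 == (clock_t)-1) {
        return -1.0;
    }
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / iterations;
}
```

Swapping a vectorized body into is_zero_buffer() and comparing bench_is_zero_buffer() results for a cluster-sized buffer would answer the SSE question directly.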
