qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes
@ 2014-05-17 22:58 Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 1/3] util: add qemu_iovec_is_zero Peter Lieven
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Peter Lieven @ 2014-05-17 22:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, Peter Lieven, stefanha, pbonzini

this series tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format. More details can be found
in the commit message to patch 3.

Changes:
v4->v5: - split patch into 3 separate patches [Kevin, Eric]
        - use QEMU_ALIGN_DOWN in qemu_iovec_is_zero [Kevin]
        - remove check for bs->file from the
          "pretty long condition" [Kevin]
        - added possible values for discard-zeroes to QemuOptsList [Eric]
        - fail if detect-zeroes = unmap is choosen without discard = unmap [Kevin]
        - show detect-zeroes mode always in hmp if enabled [Kevin]

v3->v4: - use QAPI generated enum and lookup table [Kevin]
        - added more details about the options in the comments
          of the qapi-schema [Eric]
        - changed the type of detect_zeroes from str to
          BlockdevDetectZeroesOptions. I left the name
          as is because it is consistent with e.g.
          BlockdevDiscardOptions or BlockdevAioOptions [Eric]
        - changed the parse function in blockdev_init to
          be generic usable for other enum parameters
        
v2->v3: - moved parameter parsing to blockdev_init
        - added per device detect_zeroes status to 
          hmp (info block -v) and qmp (query-block) [Eric]
        - added support to enable detect-zeroes also
          for hot added devices [Eric]
        - added missing entry to qemu_common_drive_opts
        - fixed description of qemu_iovec_is_zero [Fam]

v1->v2: - added tests to commit message (Markus)
RFCv2->v1: - fixed paramter parsing strncmp -> strcmp (Eric)
           - fixed typo (choosen->chosen) (Eric)
           - added second example to commit msg

RFCv1->RFCv2: - add detect-zeroes=off|on|unmap knob to drive cmdline parameter
              - call zero detection only for format (bs->file != NULL)

Peter Lieven (3):
  util: add qemu_iovec_is_zero
  blockdev: add a function to parse enum ids from strings
  block: optimize zero writes with bdrv_write_zeroes

 block.c                   |    9 ++++++++
 block/qapi.c              |    1 +
 blockdev.c                |   41 +++++++++++++++++++++++++++++++++++
 hmp.c                     |    5 +++++
 include/block/block_int.h |    1 +
 include/qemu-common.h     |    1 +
 qapi-schema.json          |   52 ++++++++++++++++++++++++++++++++-------------
 qemu-options.hx           |    6 ++++++
 qmp-commands.hx           |    3 +++
 util/iov.c                |   21 ++++++++++++++++++
 10 files changed, 125 insertions(+), 15 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCHv5 1/3] util: add qemu_iovec_is_zero
  2014-05-17 22:58 [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
@ 2014-05-17 22:58 ` Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 2/3] blockdev: add a function to parse enum ids from strings Peter Lieven
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Lieven @ 2014-05-17 22:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, Peter Lieven, stefanha, pbonzini

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qemu-common.h |    1 +
 util/iov.c            |   21 +++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 3f3fd60..66ceceb 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -318,6 +318,7 @@ void qemu_iovec_concat(QEMUIOVector *dst,
 void qemu_iovec_concat_iov(QEMUIOVector *dst,
                            struct iovec *src_iov, unsigned int src_cnt,
                            size_t soffset, size_t sbytes);
+bool qemu_iovec_is_zero(QEMUIOVector *qiov);
 void qemu_iovec_destroy(QEMUIOVector *qiov);
 void qemu_iovec_reset(QEMUIOVector *qiov);
 size_t qemu_iovec_to_buf(QEMUIOVector *qiov, size_t offset,
diff --git a/util/iov.c b/util/iov.c
index 6569b5a..622db81 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -335,6 +335,27 @@ void qemu_iovec_concat(QEMUIOVector *dst,
     qemu_iovec_concat_iov(dst, src->iov, src->niov, soffset, sbytes);
 }
 
+/*
+ *  check if the contents of the iovecs are all zero
+ */
+bool qemu_iovec_is_zero(QEMUIOVector *qiov)
+{
+    int i;
+    for (i = 0; i < qiov->niov; i++) {
+        size_t offs = QEMU_ALIGN_DOWN(qiov->iov[i].iov_len, 4 * sizeof(long));
+        uint8_t *ptr = qiov->iov[i].iov_base;
+        if (offs && !buffer_is_zero(qiov->iov[i].iov_base, offs)) {
+            return false;
+        }
+        for (; offs < qiov->iov[i].iov_len; offs++) {
+            if (ptr[offs]) {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
 void qemu_iovec_destroy(QEMUIOVector *qiov)
 {
     assert(qiov->nalloc != -1);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCHv5 2/3] blockdev: add a function to parse enum ids from strings
  2014-05-17 22:58 [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 1/3] util: add qemu_iovec_is_zero Peter Lieven
@ 2014-05-17 22:58 ` Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 3/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
  2014-05-19 10:25 ` [Qemu-devel] [PATCHv5 0/3] " Kevin Wolf
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Lieven @ 2014-05-17 22:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, Peter Lieven, stefanha, pbonzini

this adds a generic function to recover the enum id of a parameter
given as a string.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 blockdev.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 7810e9f..8358aa2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -288,6 +288,23 @@ static int parse_block_error_action(const char *buf, bool is_read, Error **errp)
     }
 }
 
+static inline int parse_enum_option(const char *lookup[], const char *buf, int max,
+                             int def, Error **errp)
+{
+    int i;
+    if (!buf) {
+        return def;
+    }
+    for (i = 0; i < max; i++) {
+        if (!strcmp(buf, lookup[i])) {
+            return i;
+        }
+    }
+    error_setg(errp, "invalid parameter value: %s",
+               buf);
+    return def;
+}
+
 static bool check_throttle_config(ThrottleConfig *cfg, Error **errp)
 {
     if (throttle_conflicting(cfg)) {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCHv5 3/3] block: optimize zero writes with bdrv_write_zeroes
  2014-05-17 22:58 [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 1/3] util: add qemu_iovec_is_zero Peter Lieven
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 2/3] blockdev: add a function to parse enum ids from strings Peter Lieven
@ 2014-05-17 22:58 ` Peter Lieven
  2014-05-19 10:25 ` [Qemu-devel] [PATCHv5 0/3] " Kevin Wolf
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Lieven @ 2014-05-17 22:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, Peter Lieven, stefanha, pbonzini

this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.

I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.

a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

QCOW2         [off]     [on]     [unmap]
-----
runtime:       14secs    1.1secs  1.1secs
filesize:      937M      18M      18M

iSCSI         [off]     [on]     [unmap]
----
runtime:       9.3s      0.9s     0.9s

b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct

QCOW2         [off]     [on]     [unmap]
-----
runtime:       246secs   18secs   18secs
filesize:      51G       192K     192K
throughput:    203M/s    2.3G/s   2.3G/s

iSCSI*        [off]     [on]     [unmap]
----
runtime:       8mins     45secs   33secs
throughput:    106M/s    1.2G/s   1.6G/s
allocated:     100%      100%     0%

* The storage was connected via an 1Gbit interface.
  It seems to internally handle writing zeroes
  via WRITESAME16 very fast.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block.c                   |    9 ++++++++
 block/qapi.c              |    1 +
 blockdev.c                |   24 +++++++++++++++++++++
 hmp.c                     |    5 +++++
 include/block/block_int.h |    1 +
 qapi-schema.json          |   52 ++++++++++++++++++++++++++++++++-------------
 qemu-options.hx           |    6 ++++++
 qmp-commands.hx           |    3 +++
 8 files changed, 86 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index c90c71a..0aafe67 100644
--- a/block.c
+++ b/block.c
@@ -3248,6 +3248,15 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
+    if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
+        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_write_zeroes &&
+        qemu_iovec_is_zero(qiov)) {
+        flags |= BDRV_REQ_ZERO_WRITE;
+        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
+            flags |= BDRV_REQ_MAY_UNMAP;
+        }
+    }
+
     if (ret < 0) {
         /* Do nothing, write notifier decided to fail this request */
     } else if (flags & BDRV_REQ_ZERO_WRITE) {
diff --git a/block/qapi.c b/block/qapi.c
index af11445..75f44f1 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -50,6 +50,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs)
     }
 
     info->backing_file_depth = bdrv_get_backing_file_depth(bs);
+    info->detect_zeroes = bs->detect_zeroes;
 
     if (bs->io_limits_enabled) {
         ThrottleConfig cfg;
diff --git a/blockdev.c b/blockdev.c
index 8358aa2..5ec55fa 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -341,6 +341,7 @@ static DriveInfo *blockdev_init(const char *file, QDict *bs_opts,
     QemuOpts *opts;
     const char *id;
     bool has_driver_specific_opts;
+    BlockdevDetectZeroesOptions detect_zeroes;
     BlockDriver *drv = NULL;
 
     /* Check common options by copying from bs_opts to opts, all other options
@@ -469,6 +470,24 @@ static DriveInfo *blockdev_init(const char *file, QDict *bs_opts,
         }
     }
 
+    detect_zeroes =
+        parse_enum_option(BlockdevDetectZeroesOptions_lookup,
+                          qemu_opt_get(opts, "detect-zeroes"),
+                          BLOCKDEV_DETECT_ZEROES_OPTIONS_MAX,
+                          BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF,
+                          &error);
+    if (error) {
+        error_propagate(errp, error);
+        goto early_err;
+    }
+
+    if (detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
+        !(bdrv_flags & BDRV_O_UNMAP)) {
+        error_setg(errp, "setting detect-zeroes to unmap is not allowed "
+                         "without setting discard operation to unmap");
+        goto early_err;
+    }
+
     /* init */
     dinfo = g_malloc0(sizeof(*dinfo));
     dinfo->id = g_strdup(qemu_opts_id(opts));
@@ -479,6 +498,7 @@ static DriveInfo *blockdev_init(const char *file, QDict *bs_opts,
     }
     dinfo->bdrv->open_flags = snapshot ? BDRV_O_SNAPSHOT : 0;
     dinfo->bdrv->read_only = ro;
+    dinfo->bdrv->detect_zeroes = detect_zeroes;
     dinfo->refcount = 1;
     if (serial != NULL) {
         dinfo->serial = g_strdup(serial);
@@ -2472,6 +2492,10 @@ QemuOptsList qemu_common_drive_opts = {
             .name = "copy-on-read",
             .type = QEMU_OPT_BOOL,
             .help = "copy read data from backing file into image file",
+        },{
+            .name = "detect-zeroes",
+            .type = QEMU_OPT_STRING,
+            .help = "try to optimize zero writes (off, on, unmap)",
         },
         { /* end of list */ }
     },
diff --git a/hmp.c b/hmp.c
index 5c4d612..ff506f3 100644
--- a/hmp.c
+++ b/hmp.c
@@ -341,6 +341,11 @@ void hmp_info_block(Monitor *mon, const QDict *qdict)
                            info->value->inserted->backing_file_depth);
         }
 
+        if (info->value->inserted->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF) {
+            monitor_printf(mon, "    Detect zeroes:    %s\n",
+                           BlockdevDetectZeroesOptions_lookup[info->value->inserted->detect_zeroes]);
+        }
+
         if (info->value->inserted->bps
             || info->value->inserted->bps_rd
             || info->value->inserted->bps_wr
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9ffcb69..b8cc926 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -364,6 +364,7 @@ struct BlockDriverState {
     BlockJob *job;
 
     QDict *options;
+    BlockdevDetectZeroesOptions detect_zeroes;
 };
 
 int get_tmp_filename(char *filename, int size);
diff --git a/qapi-schema.json b/qapi-schema.json
index 36cb964..70ed7e3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -942,6 +942,8 @@
 # @encryption_key_missing: true if the backing device is encrypted but an
 #                          valid encryption key is missing
 #
+# @detect_zeroes: detect and optimize zero writes (Since 2.1)
+#
 # @bps: total throughput limit in bytes per second is specified
 #
 # @bps_rd: read throughput limit in bytes per second is specified
@@ -977,6 +979,7 @@
   'data': { 'file': 'str', '*node-name': 'str', 'ro': 'bool', 'drv': 'str',
             '*backing_file': 'str', 'backing_file_depth': 'int',
             'encrypted': 'bool', 'encryption_key_missing': 'bool',
+            'detect_zeroes': 'BlockdevDetectZeroesOptions',
             'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int',
             'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int',
             'image': 'ImageInfo',
@@ -4255,6 +4258,22 @@
   'data': [ 'ignore', 'unmap' ] }
 
 ##
+# @BlockdevDetectZeroesOptions
+#
+# Describes the operation mode for the automatic conversion of plain
+# zero writes by the OS to driver specific optimized zero write commands.
+#
+# @off:      Disabled (default)
+# @on:       Enabled
+# @unmap:    Enabled and even try to unmap blocks if possible. This requires
+#            also that @BlockdevDiscardOptions is set to unmap for this device.
+#
+# Since: 2.1
+##
+{ 'enum': 'BlockdevDetectZeroesOptions',
+  'data': [ 'off', 'on', 'unmap' ] }
+
+##
 # @BlockdevAioOptions
 #
 # Selects the AIO backend to handle I/O requests
@@ -4306,20 +4325,22 @@
 # Options that are available for all block devices, independent of the block
 # driver.
 #
-# @driver:      block driver name
-# @id:          #optional id by which the new block device can be referred to.
-#               This is a required option on the top level of blockdev-add, and
-#               currently not allowed on any other level.
-# @node-name:   #optional the name of a block driver state node (Since 2.0)
-# @discard:     #optional discard-related options (default: ignore)
-# @cache:       #optional cache-related options
-# @aio:         #optional AIO backend (default: threads)
-# @rerror:      #optional how to handle read errors on the device
-#               (default: report)
-# @werror:      #optional how to handle write errors on the device
-#               (default: enospc)
-# @read-only:   #optional whether the block device should be read-only
-#               (default: false)
+# @driver:        block driver name
+# @id:            #optional id by which the new block device can be referred to.
+#                 This is a required option on the top level of blockdev-add, and
+#                 currently not allowed on any other level.
+# @node-name:     #optional the name of a block driver state node (Since 2.0)
+# @discard:       #optional discard-related options (default: ignore)
+# @cache:         #optional cache-related options
+# @aio:           #optional AIO backend (default: threads)
+# @rerror:        #optional how to handle read errors on the device
+#                 (default: report)
+# @werror:        #optional how to handle write errors on the device
+#                 (default: enospc)
+# @read-only:     #optional whether the block device should be read-only
+#                 (default: false)
+# @detect-zeroes: #optional detect and optimize zero writes (Since 2.1)
+#                 (default: off)
 #
 # Since: 1.7
 ##
@@ -4332,7 +4353,8 @@
             '*aio': 'BlockdevAioOptions',
             '*rerror': 'BlockdevOnError',
             '*werror': 'BlockdevOnError',
-            '*read-only': 'bool' } }
+            '*read-only': 'bool',
+            '*detect-zeroes': 'BlockdevDetectZeroesOptions' } }
 
 ##
 # @BlockdevOptionsFile
diff --git a/qemu-options.hx b/qemu-options.hx
index 781af14..167712f 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -414,6 +414,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,serial=s][,addr=A][,rerror=ignore|stop|report]\n"
     "       [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n"
     "       [,readonly=on|off][,copy-on-read=on|off]\n"
+    "       [,detect-zeroes=on|off|unmap]\n"
     "       [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n"
     "       [[,iops=i]|[[,iops_rd=r][,iops_wr=w]]]\n"
     "       [[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n"
@@ -475,6 +476,11 @@ Open drive @option{file} as read-only. Guest write attempts will fail.
 @item copy-on-read=@var{copy-on-read}
 @var{copy-on-read} is "on" or "off" and enables whether to copy read backing
 file sectors into the image file.
+@item detect-zeroes=@var{detect-zeroes}
+@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
+conversion of plain zero writes by the OS to driver specific optimized
+zero write commands. You may even choose "unmap" if @var{discard} is set
+to "unmap" to allow a zero write to be converted to an UNMAP operation.
 @end table
 
 By default, the @option{cache=writeback} mode is used. It will report data
diff --git a/qmp-commands.hx b/qmp-commands.hx
index cae890e..2c53c1b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2032,6 +2032,8 @@ Each json-object contain the following:
          - "iops_rd_max":  read I/O operations max (json-int)
          - "iops_wr_max":  write I/O operations max (json-int)
          - "iops_size": I/O size when limiting by iops (json-int)
+         - "detect_zeroes": detect and optimize zero writing (json-string)
+             - Possible values: "off", "on", "unmap"
          - "image": the detail of the image, it is a json-object containing
             the following:
              - "filename": image file name (json-string)
@@ -2108,6 +2110,7 @@ Example:
                "iops_rd_max": 0,
                "iops_wr_max": 0,
                "iops_size": 0,
+               "detect_zeroes": "on",
                "image":{
                   "filename":"disks/test.qcow2",
                   "format":"qcow2",
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes
  2014-05-17 22:58 [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
                   ` (2 preceding siblings ...)
  2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 3/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
@ 2014-05-19 10:25 ` Kevin Wolf
  3 siblings, 0 replies; 5+ messages in thread
From: Kevin Wolf @ 2014-05-19 10:25 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, famz, qemu-devel, stefanha

Am 18.05.2014 um 00:58 hat Peter Lieven geschrieben:
> this series tries to optimize zero write requests
> by automatically using bdrv_write_zeroes if it is
> supported by the format. More details can be found
> in the commit message to patch 3.

Thanks, fixed up some formatting and applied to the block branch.

Kevin

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-05-19 10:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-17 22:58 [Qemu-devel] [PATCHv5 0/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 1/3] util: add qemu_iovec_is_zero Peter Lieven
2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 2/3] blockdev: add a function to parse enum ids from strings Peter Lieven
2014-05-17 22:58 ` [Qemu-devel] [PATCHv5 3/3] block: optimize zero writes with bdrv_write_zeroes Peter Lieven
2014-05-19 10:25 ` [Qemu-devel] [PATCHv5 0/3] " Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).