[Qemu-devel] [PATCH v2 0/3] qcow2 compress threads

* [Qemu-devel] [PATCH v2 0/3] qcow2 compress threads
@ 2018-06-20 14:48 Vladimir Sementsov-Ogievskiy
  2018-06-20 14:48 ` [Qemu-devel] [PATCH v2 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:48 UTC
  To: qemu-block, qemu-devel; +Cc: mreitz, kwolf, stefanha, pl, den, vsementsov

Hi all!

Here are compress threads for qcow2, to increase performance of
compressed writes.

v2 changes:

02: fix typo in commit msg
    keep "qemu/osdep.h" to be the first included header,
    fix comment style

===========

I've created the following test:

[]# cat ../gen.sh 
#!/bin/bash

echo 'create pattern-file t_pat'

./qemu-img create -f raw t_pat 1000m
./qemu-io -c 'write -P 0xab 0 1000m' t_pat

echo 'create random-file t_rand'

dd if=/dev/urandom of=t_rand bs=1M count=1000

[]# cat ../test.sh 
#!/bin/bash

rm -f t_out

echo 'test pattern-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_pat t_out

rm -f t_out

echo 'test random-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_rand t_out

rm -f t_out


Results before the series (and without the -W flag):

test pattern-file compression

real    0m16.658s
user    0m16.450s
sys     0m0.628s
test random-file compression

real    0m24.194s
user    0m24.361s
sys     0m0.395s

results with -W flag, after first patch:

test pattern-file compression

real    0m16.242s
user    0m16.895s
sys     0m0.080s
test random-file compression

real    0m23.450s
user    0m23.767s
sys     0m1.085s

results with -W flag, after third patch:

test pattern-file compression

real    0m5.747s
user    0m22.637s
sys     0m0.393s
test random-file compression

real    0m8.402s
user    0m33.315s
sys     0m0.926s

So, we see a significant performance gain. But, of course, it does not
work without the -W flag.

results without -W flag, after third patch:

test pattern-file compression

real    0m16.908s
user    0m16.775s
sys     0m0.589s
test random-file compression

real    0m24.913s
user    0m24.586s
sys     0m0.898s

Note: my CPU is a 4-core, 8-thread i7-4790.
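
As a rough cross-check of the timings above, the sketch below derives the
wall-clock speedup (real before / real after) and the average number of
busy CPUs (user / real) from a pair of `time` measurements. The struct and
function names are made up for illustration; only the numbers fed into
them would come from the results above.

```c
#include <assert.h>
#include <stdio.h>

/* One `time` measurement, in seconds (values come from the results above). */
struct timing {
    double real;   /* wall-clock time */
    double user;   /* CPU time spent in user space, summed over all threads */
};

/* Wall-clock speedup of `after` relative to `before`. */
double speedup(struct timing before, struct timing after)
{
    assert(after.real > 0.0);
    return before.real / after.real;
}

/* Average number of busy CPUs: user CPU seconds per wall-clock second. */
double busy_cpus(struct timing t)
{
    assert(t.real > 0.0);
    return t.user / t.real;
}

/* Print both derived figures for one test case. */
void report(const char *name, struct timing before, struct timing after)
{
    printf("%s: %.2fx faster, %.2f CPUs busy\n",
           name, speedup(before, after), busy_cpus(after));
}
```

Comparing the two -W runs (after the first patch versus after the third
patch) this way gives a speedup of roughly 2.8x for both files, with close
to 4 CPUs busy, which is in line with MAX_COMPRESS_THREADS being 4 in
patch 3.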

Vladimir Sementsov-Ogievskiy (3):
  qemu-img: allow compressed not-in-order writes
  qcow2: refactor data compression
  qcow2: add compress threads

 block/qcow2.h |   3 ++
 block/qcow2.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++------------
 qemu-img.c    |   5 ---
 3 files changed, 112 insertions(+), 32 deletions(-)

-- 
2.11.1


* [Qemu-devel] [PATCH v2 1/3] qemu-img: allow compressed not-in-order writes
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:48 UTC
  To: qemu-block, qemu-devel; +Cc: mreitz, kwolf, stefanha, pl, den, vsementsov

There is no reason to forbid them, and they are needed to improve
performance with compress threads in later patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qemu-img.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e1a506f7f6..7651d8172c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2141,11 +2141,6 @@ static int img_convert(int argc, char **argv)
         goto fail_getopt;
     }
 
-    if (!s.wr_in_order && s.compressed) {
-        error_report("Out of order write and compress are mutually exclusive");
-        goto fail_getopt;
-    }
-
     if (tgt_image_opts && !skip_create) {
         error_report("--target-image-opts requires use of -n flag");
         goto fail_getopt;
-- 
2.11.1


* [Qemu-devel] [PATCH v2 2/3] qcow2: refactor data compression
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:48 UTC
  To: qemu-block, qemu-devel; +Cc: mreitz, kwolf, stefanha, pl, den, vsementsov

Make a separate function for compression, to be parallelized later.
 - use the .avail_out field instead of .next_out to calculate the size
   of the compressed data. It looks more natural and allows dest to
   remain a void pointer
 - set avail_out to one byte less than the input size, so that the
   output must be at least one byte smaller than the input and
   inefficient compression is detected earlier

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 76 ++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 945132f692..e431c73e0d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -23,11 +23,14 @@
  */
 
 #include "qemu/osdep.h"
+
+#define ZLIB_CONST
+#include <zlib.h>
+
 #include "block/block_int.h"
 #include "block/qdict.h"
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
-#include <zlib.h>
 #include "qcow2.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -3671,6 +3674,46 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
     return 0;
 }
 
+/*
+ * qcow2_compress()
+ *
+ * @dest - destination buffer, at least of @size-1 bytes
+ * @src - source buffer, @size bytes
+ *
+ * Returns: compressed size on success
+ *          -1 if compression is inefficient
+ *          -2 on any other error
+ */
+static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
+{
+    ssize_t ret;
+    z_stream strm;
+
+    /* default compression level, small window, no zlib header */
+    memset(&strm, 0, sizeof(strm));
+    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+                       -12, 9, Z_DEFAULT_STRATEGY);
+    if (ret != 0) {
+        return -2;
+    }
+
+    strm.avail_in = size;
+    strm.next_in = src;
+    strm.avail_out = size - 1;
+    strm.next_out = dest;
+
+    ret = deflate(&strm, Z_FINISH);
+    if (ret == Z_STREAM_END) {
+        ret = size - 1 - strm.avail_out;
+    } else {
+        ret = (ret == Z_OK ? -1 : -2);
+    }
+
+    deflateEnd(&strm);
+
+    return ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3680,8 +3723,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
     BDRVQcow2State *s = bs->opaque;
     QEMUIOVector hd_qiov;
     struct iovec iov;
-    z_stream strm;
-    int ret, out_len;
+    int ret;
+    ssize_t out_len;
     uint8_t *buf, *out_buf;
     int64_t cluster_offset;
 
@@ -3714,32 +3757,11 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 
     out_buf = g_malloc(s->cluster_size);
 
-    /* best compression, small window, no zlib header */
-    memset(&strm, 0, sizeof(strm));
-    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-                       Z_DEFLATED, -12,
-                       9, Z_DEFAULT_STRATEGY);
-    if (ret != 0) {
+    out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+    if (out_len == -2) {
         ret = -EINVAL;
         goto fail;
-    }
-
-    strm.avail_in = s->cluster_size;
-    strm.next_in = (uint8_t *)buf;
-    strm.avail_out = s->cluster_size;
-    strm.next_out = out_buf;
-
-    ret = deflate(&strm, Z_FINISH);
-    if (ret != Z_STREAM_END && ret != Z_OK) {
-        deflateEnd(&strm);
-        ret = -EINVAL;
-        goto fail;
-    }
-    out_len = strm.next_out - out_buf;
-
-    deflateEnd(&strm);
-
-    if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+    } else if (out_len == -1) {
         /* could not compress: write normal cluster */
         ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
         if (ret < 0) {
-- 
2.11.1
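
The return-value contract introduced by qcow2_compress() in this patch
(compressed size on success, -1 when the output would not be smaller than
the input, -2 on any other error) and the "size - 1 - avail_out"
bookkeeping can be tried out without zlib. The sketch below is only an
illustration: it swaps deflate for a toy run-length encoder, and
struct out_stream, put_byte() and toy_compress() are hypothetical names
that do not exist in QEMU or zlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for the output half of zlib's z_stream. */
struct out_stream {
    uint8_t *next_out;   /* where the next output byte is written */
    size_t avail_out;    /* free bytes remaining in the output buffer */
};

/* Append one byte; returns 0 on success, -1 if the buffer is full. */
static int put_byte(struct out_stream *s, uint8_t b)
{
    if (s->avail_out == 0) {
        return -1;
    }
    *s->next_out++ = b;
    s->avail_out--;
    return 0;
}

/*
 * Toy run-length encoder (NOT deflate): emits (count, value) pairs.
 *
 * @dest - destination buffer, at least @size - 1 bytes
 * @src  - source buffer, @size bytes
 *
 * Returns: compressed size on success
 *          -1 if the output would not be smaller than the input
 *          -2 on any other error (here: invalid arguments)
 */
ssize_t toy_compress(void *dest, const void *src, size_t size)
{
    const uint8_t *in = src;
    struct out_stream s;
    size_t i = 0;

    if (dest == NULL || src == NULL || size == 0) {
        return -2;
    }

    s.next_out = dest;
    s.avail_out = size - 1;   /* output must be at least one byte smaller */

    while (i < size) {
        uint8_t value = in[i];
        size_t run = 1;

        while (i + run < size && in[i + run] == value && run < 255) {
            run++;
        }
        if (put_byte(&s, (uint8_t)run) < 0 || put_byte(&s, value) < 0) {
            return -1;        /* ran out of space: compression is inefficient */
        }
        i += run;
    }

    assert(s.avail_out <= size - 1);
    return (ssize_t)(size - 1 - s.avail_out);   /* bytes actually produced */
}
```

A caller would treat the three outcomes the same way the patched
qcow2_co_pwritev_compressed() does: write the compressed bytes on a
non-negative result, fall back to a normal cluster write on -1, and fail
on -2.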


* [Qemu-devel] [PATCH v2 3/3] qcow2: add compress threads
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:48 UTC
  To: qemu-block, qemu-devel; +Cc: mreitz, kwolf, stefanha, pl, den, vsementsov

Do data compression in separate threads. This significantly improves
performance for qemu-img convert with the -W (allow async writes) and -c
(compressed) options.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h |  3 +++
 block/qcow2.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415..0bd21623c2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -326,6 +326,9 @@ typedef struct BDRVQcow2State {
      * override) */
     char *image_backing_file;
     char *image_backing_format;
+
+    CoQueue compress_wait_queue;
+    int nb_compress_threads;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index e431c73e0d..362d9452f4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -44,6 +44,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "block/thread-pool.h"
 
 /*
   Differences with QCOW:
@@ -1546,6 +1547,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         qcow2_check_refcounts(bs, &result, 0);
     }
 #endif
+
+    qemu_co_queue_init(&s->compress_wait_queue);
+
     return ret;
 
  fail:
@@ -3714,6 +3718,62 @@ static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
     return ret;
 }
 
+#define MAX_COMPRESS_THREADS 4
+
+typedef struct Qcow2CompressData {
+    void *dest;
+    const void *src;
+    size_t size;
+    ssize_t ret;
+} Qcow2CompressData;
+
+static int qcow2_compress_pool_func(void *opaque)
+{
+    Qcow2CompressData *data = opaque;
+
+    data->ret = qcow2_compress(data->dest, data->src, data->size);
+
+    return 0;
+}
+
+static void qcow2_compress_complete(void *opaque, int ret)
+{
+    qemu_coroutine_enter(opaque);
+}
+
+/* See qcow2_compress definition for parameters description */
+static ssize_t qcow2_co_compress(BlockDriverState *bs,
+                                 void *dest, const void *src, size_t size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    BlockAIOCB *acb;
+    ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    Qcow2CompressData arg = {
+        .dest = dest,
+        .src = src,
+        .size = size,
+    };
+
+    while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
+        qemu_co_queue_wait(&s->compress_wait_queue, NULL);
+    }
+
+    s->nb_compress_threads++;
+    acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
+                                 qcow2_compress_complete,
+                                 qemu_coroutine_self());
+
+    if (!acb) {
+        s->nb_compress_threads--;
+        return -2;
+    }
+    qemu_coroutine_yield();
+    s->nb_compress_threads--;
+    qemu_co_queue_next(&s->compress_wait_queue);
+
+    return arg.ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3757,7 +3817,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 
     out_buf = g_malloc(s->cluster_size);
 
-    out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+    out_len = qcow2_co_compress(bs, out_buf, buf, s->cluster_size);
     if (out_len == -2) {
         ret = -EINVAL;
         goto fail;
-- 
2.11.1
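
The throttling in qcow2_co_compress() comes down to a counter of in-flight
jobs plus a FIFO of waiters. The single-threaded model below is a sketch of
that admission logic only, with no coroutines and no thread pool; struct
throttle and the throttle_*() functions are hypothetical names, and
MAX_WAITERS is an arbitrary capacity chosen for the example. One
simplification to be aware of: here the woken job is handed the freed slot
directly, whereas in the patch the woken coroutine re-checks the counter in
its while loop and increments it itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IN_FLIGHT 4    /* mirrors MAX_COMPRESS_THREADS in the patch */
#define MAX_WAITERS   64   /* capacity of the toy wait queue (assumption) */

/* Hypothetical model of the two fields added to BDRVQcow2State. */
struct throttle {
    int in_flight;              /* plays the role of nb_compress_threads */
    int waiters[MAX_WAITERS];   /* FIFO of job ids, like compress_wait_queue */
    size_t head;                /* index of the oldest waiter */
    size_t tail;                /* index one past the newest waiter */
};

void throttle_init(struct throttle *t)
{
    t->in_flight = 0;
    t->head = 0;
    t->tail = 0;
}

/* A job asks to start. Returns 1 if it may run now, 0 if it was queued. */
int throttle_enter(struct throttle *t, int job)
{
    if (t->in_flight >= MAX_IN_FLIGHT) {
        assert(t->tail < MAX_WAITERS);
        t->waiters[t->tail++] = job;
        return 0;
    }
    t->in_flight++;
    return 1;
}

/*
 * A running job has finished. Returns the id of the job that was woken
 * and now occupies the freed slot, or -1 if nobody was waiting.
 */
int throttle_leave(struct throttle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;

    if (t->head == t->tail) {
        return -1;
    }
    t->in_flight++;   /* the woken job takes over the slot */
    return t->waiters[t->head++];
}
```

With six jobs submitted back to back, the first four are admitted and the
last two are queued; each completion then wakes exactly one waiter in
submission order, so the number of in-flight jobs never exceeds
MAX_IN_FLIGHT.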


* Re: [Qemu-devel] [PATCH v2 0/3] qcow2 compress threads
From: Kevin Wolf @ 2018-06-29 18:06 UTC
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, qemu-devel, mreitz, stefanha, pl, den

On 20.06.2018 at 16:48, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here are compress threads for qcow2, to increase performance of
> compressed writes.

Thanks, applied to the block branch.

Kevin
