[Qemu-devel] [PATCH 0/3] qcow2 compress threads

* [Qemu-devel] [PATCH 0/3] qcow2 compress threads
@ 2018-06-08 19:20 Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den

Hi all!

This series adds compression threads to qcow2 to increase the
performance of compressed writes.

I've created the following test:

[]# cat ../gen.sh 
#!/bin/bash

echo 'create pattern-file t_pat'

./qemu-img create -f raw t_pat 1000m
./qemu-io -c 'write -P 0xab 0 1000m' t_pat

echo 'create random-file t_rand'

dd if=/dev/urandom of=t_rand bs=1M count=1000

[]# cat ../test.sh 
#!/bin/bash

rm -f t_out

echo 'test pattern-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_pat t_out

rm -f t_out

echo 'test random-file compression'

time ./qemu-img convert -W -f raw -O qcow2 -c t_rand t_out

rm -f t_out


Results before the series (and without the -W flag):

test pattern-file compression

real    0m16.658s
user    0m16.450s
sys     0m0.628s
test random-file compression

real    0m24.194s
user    0m24.361s
sys     0m0.395s

Results with the -W flag, after the first patch:

test pattern-file compression

real    0m16.242s
user    0m16.895s
sys     0m0.080s
test random-file compression

real    0m23.450s
user    0m23.767s
sys     0m1.085s

Results with the -W flag, after the third patch:

test pattern-file compression

real    0m5.747s
user    0m22.637s
sys     0m0.393s
test random-file compression

real    0m8.402s
user    0m33.315s
sys     0m0.926s

So we see a significant performance gain (wall-clock time drops by
almost 3x). Of course, this does not work without the -W flag.

Results without the -W flag, after the third patch:

test pattern-file compression

real    0m16.908s
user    0m16.775s
sys     0m0.589s
test random-file compression

real    0m24.913s
user    0m24.586s
sys     0m0.898s

Note: my CPU is a 4-core, 8-thread i7-4790.
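
To put the timings above in proportion, here is a small sketch that
derives the wall-clock speedup and the extra user CPU time from the
"before the series" and "-W, after third patch" runs. The helper names
(speedup, cpu_overhead, print_summary) are made up for this
illustration and are not part of the series. With the numbers quoted
above, the speedup comes out at roughly 2.9x for both files, paid for
with about 37 to 38 percent more user CPU time.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock speedup: time before divided by time after. */
static double speedup(double before_s, double after_s)
{
    assert(after_s > 0.0);
    return before_s / after_s;
}

/* Extra user CPU time, as a fraction of the baseline user time. */
static double cpu_overhead(double user_before_s, double user_after_s)
{
    assert(user_before_s > 0.0);
    return user_after_s / user_before_s - 1.0;
}

/* Print both figures for the pattern-file and random-file runs. */
static void print_summary(void)
{
    printf("pattern: %.2fx faster, %+.0f%% user time\n",
           speedup(16.658, 5.747),
           100.0 * cpu_overhead(16.450, 22.637));
    printf("random:  %.2fx faster, %+.0f%% user time\n",
           speedup(24.194, 8.402),
           100.0 * cpu_overhead(24.361, 33.315));
}
```

The user time grows because the same compression work is now spread
over several threads, while the real time shrinks because those
threads run in parallel.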

Vladimir Sementsov-Ogievskiy (3):
  qemu-img: allow compressed not-in-order writes
  qcow2: refactor data compression
  qcow2: add compress threads

 block/qcow2.h |   3 ++
 block/qcow2.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++------------
 qemu-img.c    |   5 ---
 3 files changed, 110 insertions(+), 32 deletions(-)

-- 
2.11.1

* [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes
  2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den

No reason to forbid them, and they are needed to improve performance
with compress-threads in further patches.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qemu-img.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 75f1610aa0..df2657b9cb 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2122,11 +2122,6 @@ static int img_convert(int argc, char **argv)
         goto fail_getopt;
     }
 
-    if (!s.wr_in_order && s.compressed) {
-        error_report("Out of order write and compress are mutually exclusive");
-        goto fail_getopt;
-    }
-
     if (tgt_image_opts && !skip_create) {
         error_report("--target-image-opts requires use of -n flag");
         goto fail_getopt;
-- 
2.11.1

* [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression
  2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
  2018-06-14 13:06   ` Kevin Wolf
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den

Make a separate function for compression to be parallelized later.
 - use .avail_aut field instead of .next_out to calculate size of
   compressed data. This looks more natural and allows dest to remain a
   void pointer
 - set avail_out to be at least one byte less than the input size, so
   that inefficient compression is detected earlier
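
The two points above can be illustrated without zlib. The sketch below
is only a toy model: ToyStream, toy_rle and toy_compress are
hypothetical names, and a trivial run-length encoder stands in for
deflate(). It shows the same accounting the patch uses: the output
capacity is set to size - 1, the compressed size is computed as
capacity minus the remaining avail_out, and running out of output
space is reported as "inefficient" (-1) rather than as an error (-2).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for the z_stream fields used by the patch. */
typedef struct ToyStream {
    const unsigned char *next_in;
    size_t avail_in;
    unsigned char *next_out;
    size_t avail_out;
} ToyStream;

/*
 * Toy run-length encoder emitting (count, byte) pairs.
 * Returns 1 when all input was consumed, 0 when output space ran out.
 */
static int toy_rle(ToyStream *s)
{
    while (s->avail_in > 0) {
        unsigned char byte = *s->next_in;
        size_t run = 1;

        while (run < s->avail_in && run < 255 && s->next_in[run] == byte) {
            run++;
        }
        if (s->avail_out < 2) {
            return 0;
        }
        *s->next_out++ = (unsigned char)run;
        *s->next_out++ = byte;
        s->avail_out -= 2;
        s->next_in += run;
        s->avail_in -= run;
    }
    return 1;
}

/*
 * Same return convention as qcow2_compress(): compressed size on
 * success, -1 if the data does not shrink, -2 on any other error.
 */
static ssize_t toy_compress(void *dest, const void *src, size_t size)
{
    ToyStream s;

    if (dest == NULL || src == NULL || size == 0) {
        return -2;
    }
    s.next_in = src;
    s.avail_in = size;
    s.next_out = dest;
    s.avail_out = size - 1;   /* output must be strictly smaller */

    if (!toy_rle(&s)) {
        return -1;            /* did not fit: compression is inefficient */
    }
    /* Bytes written = capacity - space left; dest can stay a void pointer. */
    return (ssize_t)(size - 1 - s.avail_out);
}
```

Because the size is derived from avail_out, nothing has to subtract
pointers from dest, which is why dest does not need a byte-pointer
type.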

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 74 +++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 27 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 549fee9b69..d4dbe329ab 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -22,11 +22,13 @@
  * THE SOFTWARE.
  */
 
+#define ZLIB_CONST
+#include <zlib.h>
+
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
-#include <zlib.h>
 #include "qcow2.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
@@ -3674,6 +3676,45 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
     return 0;
 }
 
+/* qcow2_compress()
+ *
+ * @dest - destination buffer, at least of @size-1 bytes
+ * @src - source buffer, @size bytes
+ *
+ * Returns: compressed size on success
+ *          -1 if compression is inefficient
+ *          -2 on any other error
+ */
+static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
+{
+    ssize_t ret;
+    z_stream strm;
+
+    /* best compression, small window, no zlib header */
+    memset(&strm, 0, sizeof(strm));
+    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+                       -12, 9, Z_DEFAULT_STRATEGY);
+    if (ret != 0) {
+        return -2;
+    }
+
+    strm.avail_in = size;
+    strm.next_in = src;
+    strm.avail_out = size - 1;
+    strm.next_out = dest;
+
+    ret = deflate(&strm, Z_FINISH);
+    if (ret == Z_STREAM_END) {
+        ret = size - 1 - strm.avail_out;
+    } else {
+        ret = (ret == Z_OK ? -1 : -2);
+    }
+
+    deflateEnd(&strm);
+
+    return ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3683,8 +3724,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
     BDRVQcow2State *s = bs->opaque;
     QEMUIOVector hd_qiov;
     struct iovec iov;
-    z_stream strm;
-    int ret, out_len;
+    int ret;
+    ssize_t out_len;
     uint8_t *buf, *out_buf;
     int64_t cluster_offset;
 
@@ -3717,32 +3758,11 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 
     out_buf = g_malloc(s->cluster_size);
 
-    /* best compression, small window, no zlib header */
-    memset(&strm, 0, sizeof(strm));
-    ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
-                       Z_DEFLATED, -12,
-                       9, Z_DEFAULT_STRATEGY);
-    if (ret != 0) {
-        ret = -EINVAL;
-        goto fail;
-    }
-
-    strm.avail_in = s->cluster_size;
-    strm.next_in = (uint8_t *)buf;
-    strm.avail_out = s->cluster_size;
-    strm.next_out = out_buf;
-
-    ret = deflate(&strm, Z_FINISH);
-    if (ret != Z_STREAM_END && ret != Z_OK) {
-        deflateEnd(&strm);
+    out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+    if (out_len == -2) {
         ret = -EINVAL;
         goto fail;
-    }
-    out_len = strm.next_out - out_buf;
-
-    deflateEnd(&strm);
-
-    if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+    } else if (out_len == -1) {
         /* could not compress: write normal cluster */
         ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
         if (ret < 0) {
-- 
2.11.1

* [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
  2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
  2018-06-14 13:16   ` Kevin Wolf
  2 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den

Do data compression in separate threads. This significantly improves
performance for qemu-img convert with -W (allow async writes) and -c
(compressed) options.
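
The patch also caps the number of compression jobs in flight at
MAX_COMPRESS_THREADS and makes further callers wait in a queue. The
sketch below models just that admission logic in a single thread, with
no coroutines or thread pool involved. JobLimiter, limiter_start and
limiter_finish are hypothetical names used only here, and the model is
simplified in one respect: a finished job hands its slot directly to
the oldest waiter, whereas in the patch the woken coroutine rechecks
the counter itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_JOBS 4       /* mirrors MAX_COMPRESS_THREADS in the patch */
#define MAX_WAITERS 16   /* hypothetical bound for this sketch */

typedef struct JobLimiter {
    int running;                /* jobs currently in flight */
    int waiters[MAX_WAITERS];   /* FIFO of job ids waiting for a slot */
    int head;
    int count;
} JobLimiter;

/* Start job @id. Returns true if it runs now, false if it was queued. */
static bool limiter_start(JobLimiter *l, int id)
{
    if (l->running < MAX_JOBS) {
        l->running++;
        return true;
    }
    assert(l->count < MAX_WAITERS);
    l->waiters[(l->head + l->count) % MAX_WAITERS] = id;
    l->count++;
    return false;
}

/*
 * A job finished: free its slot and give it to the oldest waiter.
 * Returns the id of the job that was woken, or -1 if nobody waited.
 */
static int limiter_finish(JobLimiter *l)
{
    int id;

    assert(l->running > 0);
    l->running--;
    if (l->count == 0) {
        return -1;
    }
    id = l->waiters[l->head];
    l->head = (l->head + 1) % MAX_WAITERS;
    l->count--;
    l->running++;
    return id;
}
```

With six jobs submitted back to back, the first four start at once and
the last two are queued; each completion then admits one queued job,
so the number of running jobs never exceeds MAX_JOBS.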

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h |  3 +++
 block/qcow2.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415..0bd21623c2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -326,6 +326,9 @@ typedef struct BDRVQcow2State {
      * override) */
     char *image_backing_file;
     char *image_backing_format;
+
+    CoQueue compress_wait_queue;
+    int nb_compress_threads;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index d4dbe329ab..91465893e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -42,6 +42,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "block/thread-pool.h"
 
 /*
   Differences with QCOW:
@@ -1544,6 +1545,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         qcow2_check_refcounts(bs, &result, 0);
     }
 #endif
+
+    qemu_co_queue_init(&s->compress_wait_queue);
+
     return ret;
 
  fail:
@@ -3715,6 +3719,62 @@ static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
     return ret;
 }
 
+#define MAX_COMPRESS_THREADS 4
+
+typedef struct Qcow2CompressData {
+    void *dest;
+    const void *src;
+    size_t size;
+    ssize_t ret;
+} Qcow2CompressData;
+
+static int qcow2_compress_pool_func(void *opaque)
+{
+    Qcow2CompressData *data = opaque;
+
+    data->ret = qcow2_compress(data->dest, data->src, data->size);
+
+    return 0;
+}
+
+static void qcow2_compress_complete(void *opaque, int ret)
+{
+    qemu_coroutine_enter(opaque);
+}
+
+/* See qcow2_compress definition for parameters description */
+static ssize_t qcow2_co_compress(BlockDriverState *bs,
+                                 void *dest, const void *src, size_t size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    BlockAIOCB *acb;
+    ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+    Qcow2CompressData arg = {
+        .dest = dest,
+        .src = src,
+        .size = size,
+    };
+
+    while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
+        qemu_co_queue_wait(&s->compress_wait_queue, NULL);
+    }
+
+    s->nb_compress_threads++;
+    acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
+                                 qcow2_compress_complete,
+                                 qemu_coroutine_self());
+
+    if (!acb) {
+        s->nb_compress_threads--;
+        return -2;
+    }
+    qemu_coroutine_yield();
+    s->nb_compress_threads--;
+    qemu_co_queue_next(&s->compress_wait_queue);
+
+    return arg.ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
 static coroutine_fn int
@@ -3758,7 +3818,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 
     out_buf = g_malloc(s->cluster_size);
 
-    out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+    out_len = qcow2_co_compress(bs, out_buf, buf, s->cluster_size);
     if (out_len == -2) {
         ret = -EINVAL;
         goto fail;
-- 
2.11.1

* Re: [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
@ 2018-06-14 13:06   ` Kevin Wolf
  0 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, mreitz, stefanha, pl, den

Am 08.06.2018 um 21:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Make a separate function for compression to be parallelized later.
>  - use .avail_aut field instead of .next_out to calculate size of

s/avail_aut/avail_out/

>    compressed data. This looks more natural and allows dest to remain a
>    void pointer
>  - set avail_out to be at least one byte less than the input size, so
>    that inefficient compression is detected earlier
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

>  block/qcow2.c | 74 +++++++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 47 insertions(+), 27 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 549fee9b69..d4dbe329ab 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -22,11 +22,13 @@
>   * THE SOFTWARE.
>   */
>  
> +#define ZLIB_CONST
> +#include <zlib.h>

The first #include must always be "qemu/osdep.h". If you want to
separate zlib.h from the internal headers, you can move it down instead.

>  #include "qemu/osdep.h"
>  #include "block/block_int.h"
>  #include "sysemu/block-backend.h"
>  #include "qemu/module.h"
> -#include <zlib.h>
>  #include "qcow2.h"
>  #include "qemu/error-report.h"
>  #include "qapi/error.h"
> @@ -3674,6 +3676,45 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
>      return 0;
>  }
>  
> +/* qcow2_compress()

The first line of the comment should contain only the /*

> + *
> + * @dest - destination buffer, at least of @size-1 bytes
> + * @src - source buffer, @size bytes
> + *
> + * Returns: compressed size on success
> + *          -1 if compression is inefficient
> + *          -2 on any other error
> + */

The logic looks fine.

Initially I intended to request splitting the code motion from the
changes, but I see that this would probably only make things more
complicated, so I'm okay with leaving that as it is.

Kevin

* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
  2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
@ 2018-06-14 13:16   ` Kevin Wolf
  2018-06-14 13:19     ` Denis V. Lunev
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:16 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, mreitz, stefanha, pl, den

Am 08.06.2018 um 21:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Do data compression in separate threads. This significantly improves
> performance for qemu-img convert with -W (allow async writes) and -c
> (compressed) options.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Looks correct to me, but why do we introduce a separate
MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
threads to the thread pool?

I see that you chose a much smaller number here (4 vs. 64), but is there
actually a good reason for this?

Kevin

* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
  2018-06-14 13:16   ` Kevin Wolf
@ 2018-06-14 13:19     ` Denis V. Lunev
  2018-06-14 13:29       ` Kevin Wolf
  0 siblings, 1 reply; 8+ messages in thread
From: Denis V. Lunev @ 2018-06-14 13:19 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, mreitz, stefanha, pl

On 06/14/2018 04:16 PM, Kevin Wolf wrote:
> Am 08.06.2018 um 21:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Do data compression in separate threads. This significantly improves
>> performance for qemu-img convert with -W (allow async writes) and -c
>> (compressed) options.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Looks correct to me, but why do we introduce a separate
> MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
> threads to the thread pool?
>
> I see that you chose a much smaller number here (4 vs. 64), but is there
> actually a good reason for this?
>
> Kevin
Yes. Otherwise the guest will suffer much more from the increased
activity and load on the host.

Den

* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
  2018-06-14 13:19     ` Denis V. Lunev
@ 2018-06-14 13:29       ` Kevin Wolf
  0 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:29 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, mreitz,
	stefanha, pl

Am 14.06.2018 um 15:19 hat Denis V. Lunev geschrieben:
> On 06/14/2018 04:16 PM, Kevin Wolf wrote:
> > Am 08.06.2018 um 21:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> Do data compression in separate threads. This significantly improves
> >> performance for qemu-img convert with -W (allow async writes) and -c
> >> (compressed) options.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Looks correct to me, but why do we introduce a separate
> > MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
> > threads to the thread pool?
> >
> > I see that you chose a much smaller number here (4 vs. 64), but is there
> > actually a good reason for this?
> >
> > Kevin
> Yes. Otherwise the guest will suffer much more from the increased
> activity and load on the host.

Ah, your primary motivation is use in a backup block job? I completely
forgot about that one (and qemu-img shouldn't care because there is no
guest), but that makes some sense.

Makes me wonder whether this value should be configurable. But that can
come later.

Kevin
