* [Qemu-devel] [PATCH 0/3] qcow2 compress threads
@ 2018-06-08 19:20 Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den
Hi all!
This series adds compress threads to qcow2 to increase the performance of
compressed writes.
I've created the following test:
[]# cat ../gen.sh
#!/bin/bash
echo 'create pattern-file t_pat'
./qemu-img create -f raw t_pat 1000m
./qemu-io -c 'write -P 0xab 0 1000m' t_pat
echo 'create random-file t_rand'
dd if=/dev/urandom of=t_rand bs=1M count=1000
[]# cat ../test.sh
#!/bin/bash
rm -f t_out
echo 'test pattern-file compression'
time ./qemu-img convert -W -f raw -O qcow2 -c t_pat t_out
rm -f t_out
echo 'test random-file compression'
time ./qemu-img convert -W -f raw -O qcow2 -c t_rand t_out
rm -f t_out
Results before the series (and without the -W flag):
test pattern-file compression
real 0m16.658s
user 0m16.450s
sys 0m0.628s
test random-file compression
real 0m24.194s
user 0m24.361s
sys 0m0.395s
Results with the -W flag, after the first patch:
test pattern-file compression
real 0m16.242s
user 0m16.895s
sys 0m0.080s
test random-file compression
real 0m23.450s
user 0m23.767s
sys 0m1.085s
Results with the -W flag, after the third patch:
test pattern-file compression
real 0m5.747s
user 0m22.637s
sys 0m0.393s
test random-file compression
real 0m8.402s
user 0m33.315s
sys 0m0.926s
So we see a significant performance gain: wall-clock time drops by roughly
2.9x for both files. Of course, this does not work without the -W flag.
Results without the -W flag, after the third patch:
test pattern-file compression
real 0m16.908s
user 0m16.775s
sys 0m0.589s
test random-file compression
real 0m24.913s
user 0m24.586s
sys 0m0.898s
Note: my CPU is a 4-core, 8-thread i7-4790.
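To put the timings above in perspective, the wall-clock speedup and the growth in CPU time can be computed directly from the reported figures. The following is a minimal sketch; the helper names `ratio` and `print_speedups` are hypothetical and not part of the series, and the numbers are simply copied from the results above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the series): ratio of two timings. */
double ratio(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}

/* Print ratios for the figures copied from the results above. */
void print_speedups(void)
{
    /* "real" time: before the series vs. after the third patch with -W */
    printf("pattern-file speedup: %.2fx\n", ratio(16.658, 5.747));
    printf("random-file speedup:  %.2fx\n", ratio(24.194, 8.402));

    /* "user" time after the third patch vs. before the series */
    printf("pattern-file user-time growth: %.2fx\n", ratio(22.637, 16.450));
    printf("random-file user-time growth:  %.2fx\n", ratio(33.315, 24.361));
}
```

Calling `print_speedups()` from a small `main()` prints the four ratios. Total user time goes up while wall-clock time goes down, which is what one would expect when the same work is spread over several threads.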
Vladimir Sementsov-Ogievskiy (3):
qemu-img: allow compressed not-in-order writes
qcow2: refactor data compression
qcow2: add compress threads
block/qcow2.h | 3 ++
block/qcow2.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++------------
qemu-img.c | 5 ---
3 files changed, 110 insertions(+), 32 deletions(-)
--
2.11.1
* [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes
2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
2 siblings, 0 replies; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den
No reason to forbid them, and they are needed to improve performance
with compress-threads in further patches.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
qemu-img.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 75f1610aa0..df2657b9cb 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2122,11 +2122,6 @@ static int img_convert(int argc, char **argv)
goto fail_getopt;
}
- if (!s.wr_in_order && s.compressed) {
- error_report("Out of order write and compress are mutually exclusive");
- goto fail_getopt;
- }
-
if (tgt_image_opts && !skip_create) {
error_report("--target-image-opts requires use of -n flag");
goto fail_getopt;
--
2.11.1
* [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression
2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
2018-06-14 13:06 ` Kevin Wolf
2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
2 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den
Make a separate function for compression to be parallelized later.
- use .avail_aut field instead of .next_out to calculate size of
compressed data. It looks more natural and allows dest to remain a
void pointer
- set avail_out to be at least one byte less than the input, so that
inefficient compression is detected earlier
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block/qcow2.c | 74 +++++++++++++++++++++++++++++++++++++----------------------
1 file changed, 47 insertions(+), 27 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 549fee9b69..d4dbe329ab 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -22,11 +22,13 @@
* THE SOFTWARE.
*/
+#define ZLIB_CONST
+#include <zlib.h>
+
#include "qemu/osdep.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
-#include <zlib.h>
#include "qcow2.h"
#include "qemu/error-report.h"
#include "qapi/error.h"
@@ -3674,6 +3676,45 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
return 0;
}
+/* qcow2_compress()
+ *
+ * @dest - destination buffer, at least of @size-1 bytes
+ * @src - source buffer, @size bytes
+ *
+ * Returns: compressed size on success
+ * -1 if compression is inefficient
+ * -2 on any other error
+ */
+static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
+{
+ ssize_t ret;
+ z_stream strm;
+
+ /* best compression, small window, no zlib header */
+ memset(&strm, 0, sizeof(strm));
+ ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
+ -12, 9, Z_DEFAULT_STRATEGY);
+ if (ret != 0) {
+ return -2;
+ }
+
+ strm.avail_in = size;
+ strm.next_in = src;
+ strm.avail_out = size - 1;
+ strm.next_out = dest;
+
+ ret = deflate(&strm, Z_FINISH);
+ if (ret == Z_STREAM_END) {
+ ret = size - 1 - strm.avail_out;
+ } else {
+ ret = (ret == Z_OK ? -1 : -2);
+ }
+
+ deflateEnd(&strm);
+
+ return ret;
+}
+
/* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
static coroutine_fn int
@@ -3683,8 +3724,8 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
BDRVQcow2State *s = bs->opaque;
QEMUIOVector hd_qiov;
struct iovec iov;
- z_stream strm;
- int ret, out_len;
+ int ret;
+ size_t out_len;
uint8_t *buf, *out_buf;
int64_t cluster_offset;
@@ -3717,32 +3758,11 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
out_buf = g_malloc(s->cluster_size);
- /* best compression, small window, no zlib header */
- memset(&strm, 0, sizeof(strm));
- ret = deflateInit2(&strm, Z_DEFAULT_COMPRESSION,
- Z_DEFLATED, -12,
- 9, Z_DEFAULT_STRATEGY);
- if (ret != 0) {
- ret = -EINVAL;
- goto fail;
- }
-
- strm.avail_in = s->cluster_size;
- strm.next_in = (uint8_t *)buf;
- strm.avail_out = s->cluster_size;
- strm.next_out = out_buf;
-
- ret = deflate(&strm, Z_FINISH);
- if (ret != Z_STREAM_END && ret != Z_OK) {
- deflateEnd(&strm);
+ out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+ if (out_len == -2) {
ret = -EINVAL;
goto fail;
- }
- out_len = strm.next_out - out_buf;
-
- deflateEnd(&strm);
-
- if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
+ } else if (out_len == -1) {
/* could not compress: write normal cluster */
ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
if (ret < 0) {
--
2.11.1
* [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
2018-06-08 19:20 [Qemu-devel] [PATCH 0/3] qcow2 compress threads Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 1/3] qemu-img: allow compressed not-in-order writes Vladimir Sementsov-Ogievskiy
2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
@ 2018-06-08 19:20 ` Vladimir Sementsov-Ogievskiy
2018-06-14 13:16 ` Kevin Wolf
2 siblings, 1 reply; 8+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-08 19:20 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, stefanha, pl, vsementsov, den
Do data compression in separate threads. This significantly improves
performance for qemu-img convert with -W (allow async writes) and -c
(compressed) options.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
block/qcow2.h | 3 +++
block/qcow2.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/block/qcow2.h b/block/qcow2.h
index 01b5250415..0bd21623c2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -326,6 +326,9 @@ typedef struct BDRVQcow2State {
* override) */
char *image_backing_file;
char *image_backing_format;
+
+ CoQueue compress_wait_queue;
+ int nb_compress_threads;
} BDRVQcow2State;
typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index d4dbe329ab..91465893e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -42,6 +42,7 @@
#include "qapi/qobject-input-visitor.h"
#include "qapi/qapi-visit-block-core.h"
#include "crypto.h"
+#include "block/thread-pool.h"
/*
Differences with QCOW:
@@ -1544,6 +1545,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
qcow2_check_refcounts(bs, &result, 0);
}
#endif
+
+ qemu_co_queue_init(&s->compress_wait_queue);
+
return ret;
fail:
@@ -3715,6 +3719,62 @@ static ssize_t qcow2_compress(void *dest, const void *src, size_t size)
return ret;
}
+#define MAX_COMPRESS_THREADS 4
+
+typedef struct Qcow2CompressData {
+ void *dest;
+ const void *src;
+ size_t size;
+ ssize_t ret;
+} Qcow2CompressData;
+
+static int qcow2_compress_pool_func(void *opaque)
+{
+ Qcow2CompressData *data = opaque;
+
+ data->ret = qcow2_compress(data->dest, data->src, data->size);
+
+ return 0;
+}
+
+static void qcow2_compress_complete(void *opaque, int ret)
+{
+ qemu_coroutine_enter(opaque);
+}
+
+/* See qcow2_compress definition for parameters description */
+static ssize_t qcow2_co_compress(BlockDriverState *bs,
+ void *dest, const void *src, size_t size)
+{
+ BDRVQcow2State *s = bs->opaque;
+ BlockAIOCB *acb;
+ ThreadPool *pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
+ Qcow2CompressData arg = {
+ .dest = dest,
+ .src = src,
+ .size = size,
+ };
+
+ while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
+ qemu_co_queue_wait(&s->compress_wait_queue, NULL);
+ }
+
+ s->nb_compress_threads++;
+ acb = thread_pool_submit_aio(pool, qcow2_compress_pool_func, &arg,
+ qcow2_compress_complete,
+ qemu_coroutine_self());
+
+ if (!acb) {
+ s->nb_compress_threads--;
+ return -EINVAL;
+ }
+ qemu_coroutine_yield();
+ s->nb_compress_threads--;
+ qemu_co_queue_next(&s->compress_wait_queue);
+
+ return arg.ret;
+}
+
/* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
static coroutine_fn int
@@ -3758,7 +3818,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
out_buf = g_malloc(s->cluster_size);
- out_len = qcow2_compress(out_buf, buf, s->cluster_size);
+ out_len = qcow2_co_compress(bs, out_buf, buf, s->cluster_size);
if (out_len == -2) {
ret = -EINVAL;
goto fail;
--
2.11.1
* Re: [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression
2018-06-08 19:20 ` [Qemu-devel] [PATCH 2/3] qcow2: refactor data compression Vladimir Sementsov-Ogievskiy
@ 2018-06-14 13:06 ` Kevin Wolf
0 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:06 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, qemu-block, mreitz, stefanha, pl, den
On 08.06.2018 at 21:20, Vladimir Sementsov-Ogievskiy wrote:
> Make a separate function for compression to be parallelized later.
> - use .avail_aut field instead of .next_out to calculate size of
s/avail_aut/avail_out/
> compressed data. It looks more natural and it allows to keep dest to
> be void pointer
> - set avail_out to be at least one byte less than input, to be sure
> avoid inefficient compression earlier
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> block/qcow2.c | 74 +++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 47 insertions(+), 27 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 549fee9b69..d4dbe329ab 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -22,11 +22,13 @@
> * THE SOFTWARE.
> */
>
> +#define ZLIB_CONST
> +#include <zlib.h>
The first #include must always be "qemu/osdep.h". If you want to
separate zlib.h from the internal headers, you can move it down instead.
> #include "qemu/osdep.h"
> #include "block/block_int.h"
> #include "sysemu/block-backend.h"
> #include "qemu/module.h"
> -#include <zlib.h>
> #include "qcow2.h"
> #include "qemu/error-report.h"
> #include "qapi/error.h"
> @@ -3674,6 +3676,45 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
> return 0;
> }
>
> +/* qcow2_compress()
The first line of the comment should contain only the /*
> + *
> + * @dest - destination buffer, at least of @size-1 bytes
> + * @src - source buffer, @size bytes
> + *
> + * Returns: compressed size on success
> + * -1 if compression is inefficient
> + * -2 on any other error
> + */
The logic looks fine.
Initially I intended to request splitting the code motion from the
changes, but I see that this would probably only make things more
complicated, so I'm okay with leaving that as it is.
Kevin
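The return convention documented in the quoted comment (compressed size on success, -1 if compression is inefficient, -2 on any other error) and the size calculation from avail_out can be sketched as follows. This is only an illustration under stated assumptions: `compress_outcome`, `compressed_len` and `classify_result` are hypothetical names that do not appear in the patch, and no zlib call is made here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical names, used only in this sketch. */
enum compress_outcome {
    COMPRESS_OK,          /* data shrank: write a compressed cluster */
    COMPRESS_INEFFICIENT, /* -1: fall back to a normal cluster write */
    COMPRESS_ERROR        /* -2: report an error to the caller */
};

/*
 * Size of the compressed data, given that the output buffer was limited to
 * size - 1 bytes and avail_out_left bytes of it remained unused afterwards.
 */
size_t compressed_len(size_t size, size_t avail_out_left)
{
    assert(size >= 1 && avail_out_left <= size - 1);
    return size - 1 - avail_out_left;
}

/* Map the documented return value of qcow2_compress() to an outcome. */
enum compress_outcome classify_result(ssize_t ret)
{
    if (ret >= 0) {
        return COMPRESS_OK;
    }
    return ret == -1 ? COMPRESS_INEFFICIENT : COMPRESS_ERROR;
}
```

The sketch keeps the result in a signed `ssize_t`. The patch stores it in a `size_t` and compares against -1 and -2, which relies on the usual integer conversions.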
* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
2018-06-08 19:20 ` [Qemu-devel] [PATCH 3/3] qcow2: add compress threads Vladimir Sementsov-Ogievskiy
@ 2018-06-14 13:16 ` Kevin Wolf
2018-06-14 13:19 ` Denis V. Lunev
0 siblings, 1 reply; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:16 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, qemu-block, mreitz, stefanha, pl, den
On 08.06.2018 at 21:20, Vladimir Sementsov-Ogievskiy wrote:
> Do data compression in separate threads. This significantly improve
> performance for qemu-img convert with -W (allow async writes) and -c
> (compressed) options.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Looks correct to me, but why do we introduce a separate
MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
threads to the thread pool?
I see that you chose a much smaller number here (4 vs. 64), but is there
actually a good reason for this?
Kevin
* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
2018-06-14 13:16 ` Kevin Wolf
@ 2018-06-14 13:19 ` Denis V. Lunev
2018-06-14 13:29 ` Kevin Wolf
0 siblings, 1 reply; 8+ messages in thread
From: Denis V. Lunev @ 2018-06-14 13:19 UTC (permalink / raw)
To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
Cc: qemu-devel, qemu-block, mreitz, stefanha, pl
On 06/14/2018 04:16 PM, Kevin Wolf wrote:
> On 08.06.2018 at 21:20, Vladimir Sementsov-Ogievskiy wrote:
>> Do data compression in separate threads. This significantly improve
>> performance for qemu-img convert with -W (allow async writes) and -c
>> (compressed) options.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Looks correct to me, but why do we introduce a separate
> MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
> threads to the thread poll?
>
> I see that you chose a much smaller number here (4 vs. 64), but is there
> actually a good reason for this?
>
> Kevin
Yes. Otherwise the guest would suffer much more from the increased
activity and load on the host.
Den
* Re: [Qemu-devel] [PATCH 3/3] qcow2: add compress threads
2018-06-14 13:19 ` Denis V. Lunev
@ 2018-06-14 13:29 ` Kevin Wolf
0 siblings, 0 replies; 8+ messages in thread
From: Kevin Wolf @ 2018-06-14 13:29 UTC (permalink / raw)
To: Denis V. Lunev
Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, mreitz,
stefanha, pl
On 14.06.2018 at 15:19, Denis V. Lunev wrote:
> On 06/14/2018 04:16 PM, Kevin Wolf wrote:
> > On 08.06.2018 at 21:20, Vladimir Sementsov-Ogievskiy wrote:
> >> Do data compression in separate threads. This significantly improve
> >> performance for qemu-img convert with -W (allow async writes) and -c
> >> (compressed) options.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Looks correct to me, but why do we introduce a separate
> > MAX_COMPRESS_THREADS? Can't we simply leave the maximum number of
> > threads to the thread poll?
> >
> > I see that you chose a much smaller number here (4 vs. 64), but is there
> > actually a good reason for this?
> >
> > Kevin
> yes. In the other case the guest will suffer much more from this increased
> activity and load on the host.
Ah, your primary motivation is use in a backup block job? I completely
forgot about that one (and qemu-img shouldn't care because there is no
guest), but that makes some sense.
Makes me wonder whether this value should be configurable. But that can
come later.
Kevin
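The limit discussed in this subthread boils down to a counter of in-flight compression jobs that is checked before a job is submitted and decremented when it completes. A minimal single-threaded sketch of that bookkeeping, with the bound passed in at initialisation rather than fixed at compile time, could look like this. All names here (`CompressLimiter`, `limiter_init`, `limiter_try_acquire`, `limiter_release`) are hypothetical and not part of the patch, and waiting is reduced to a boolean return value, whereas the patch makes the coroutine wait on compress_wait_queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter, loosely modelled on nb_compress_threads. */
typedef struct CompressLimiter {
    int in_flight;  /* jobs currently submitted to the pool */
    int max;        /* upper bound; a fixed 4 in the patch */
} CompressLimiter;

void limiter_init(CompressLimiter *l, int max)
{
    assert(max > 0);
    l->in_flight = 0;
    l->max = max;
}

/* Returns true and takes a slot if one is free; false means "wait". */
bool limiter_try_acquire(CompressLimiter *l)
{
    if (l->in_flight >= l->max) {
        return false;
    }
    l->in_flight++;
    return true;
}

/* Gives the slot back once the job has completed. */
void limiter_release(CompressLimiter *l)
{
    assert(l->in_flight > 0);
    l->in_flight--;
}
```

With `limiter_init(&l, 4)`, four acquisitions succeed and a fifth is refused until one slot is released. Whether and how such a bound would be exposed as a user-visible option is left open here, as it is in the thread.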