[Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations
@ 2013-11-25 13:57 Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 1/6] qemu-img: add support for skipping zeroes in input during convert Peter Lieven
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

this series adds some optimizations for qemu-img during convert which
have been developed recently:
- skipping input based on get_block_status
- variable I/O buffer size
- align write requests to cluster_size
- show progress in sectors or percent

Peter Lieven (6):
  qemu-img: add support for skipping zeroes in input during convert
  qemu-img: fix usage instruction for qemu-img convert
  qemu-img: add option to specify alternate iobuffer size
  block/iscsi: set bdi->cluster_size
  qemu-img: add option to align writes to cluster_sectors during
    convert
  qemu-img: add option to show progress in sectors

 block/iscsi.c    |    7 +++
 qemu-img-cmds.hx |    4 +-
 qemu-img.c       |  159 ++++++++++++++++++++++++++++++++++++------------------
 qemu-img.texi    |    7 ++-
 4 files changed, 122 insertions(+), 55 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 1/6] qemu-img: add support for skipping zeroes in input during convert
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert Peter Lieven
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

we currently do not check if a sector is allocated during convert.
This means if a sector is unallocated that we allocate a bounce
buffer of zeroes, find out its zero later and do not write it
in the best case. In the worst case this can lead to reading
blocks from a raw device (like iSCSI) altough we could easily
know via get_block_status that they are zero and simply skip them.

This patch also fixes the progress output not being at 100% after
a successful conversion.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img.c |   85 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index dc0c2f0..efb744c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1125,13 +1125,15 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-    int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size,
+    int c, n, n1, bs_n, bs_i, compress, cluster_size,
         cluster_sectors, skip_create;
+    int64_t ret = 0;
     int progress = 0, flags;
     const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
     BlockDriver *drv, *proto_drv;
     BlockDriverState **bs = NULL, *out_bs = NULL;
-    int64_t total_sectors, nb_sectors, sector_num, bs_offset;
+    int64_t total_sectors, nb_sectors, sector_num, bs_offset,
+            sector_num_next_status = 0;
     uint64_t bs_sectors;
     uint8_t * buf = NULL;
     const uint8_t *buf1;
@@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv)
     QEMUOptionParameter *out_baseimg_param;
     char *options = NULL;
     const char *snapshot_name = NULL;
-    float local_progress = 0;
     int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
     bool quiet = false;
     Error *local_err = NULL;
@@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv)
         sector_num = 0;
 
         nb_sectors = total_sectors;
-        if (nb_sectors != 0) {
-            local_progress = (float)100 /
-                (nb_sectors / MIN(nb_sectors, cluster_sectors));
-        }
 
         for(;;) {
             int64_t bs_num;
@@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv)
                 }
             }
             sector_num += n;
-            qemu_progress_print(local_progress, 100);
+            qemu_progress_print(100.0 * sector_num / total_sectors, 0);
         }
         /* signal EOF to align */
         bdrv_write_compressed(out_bs, 0, NULL, 0);
@@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv)
 
         sector_num = 0; // total number of sectors converted so far
         nb_sectors = total_sectors - sector_num;
-        if (nb_sectors != 0) {
-            local_progress = (float)100 /
-                (nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512));
-        }
 
         for(;;) {
             nb_sectors = total_sectors - sector_num;
             if (nb_sectors <= 0) {
+                ret = 0;
                 break;
             }
-            if (nb_sectors >= (IO_BUF_SIZE / 512)) {
-                n = (IO_BUF_SIZE / 512);
-            } else {
-                n = nb_sectors;
-            }
 
             while (sector_num - bs_offset >= bs_sectors) {
                 bs_i ++;
@@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv)
                    sector_num, bs_i, bs_offset, bs_sectors); */
             }
 
-            if (n > bs_offset + bs_sectors - sector_num) {
-                n = bs_offset + bs_sectors - sector_num;
-            }
-
-            /* If the output image is being created as a copy on write image,
-               assume that sectors which are unallocated in the input image
-               are present in both the output's and input's base images (no
-               need to copy them). */
-            if (out_baseimg) {
-                ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-                                        n, &n1);
+            if ((out_baseimg || has_zero_init) &&
+                sector_num >= sector_num_next_status) {
+                n = nb_sectors > INT_MAX ? INT_MAX : nb_sectors;
+                ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
+                                            n, &n1);
                 if (ret < 0) {
-                    error_report("error while reading metadata for sector "
-                                 "%" PRId64 ": %s",
-                                 sector_num - bs_offset, strerror(-ret));
+                    error_report("error while reading block status of sector %"
+                                 PRId64 ": %s", sector_num - bs_offset,
+                                 strerror(-ret));
                     goto out;
                 }
-                if (!ret) {
+                /* If the output image is zero initialized, we are not working
+                 * on a shared base and the input is zero we can skip the next
+                 * n1 sectors */
+                if (has_zero_init && !out_baseimg && ret & BDRV_BLOCK_ZERO) {
                     sector_num += n1;
                     continue;
                 }
-                /* The next 'n1' sectors are allocated in the input image. Copy
-                   only those as they may be followed by unallocated sectors. */
-                n = n1;
+                /* If the output image is being created as a copy on write
+                 * image, assume that sectors which are unallocated in the
+                 * input image are present in both the output's and input's
+                 * base images (no need to copy them). */
+                if (out_baseimg) {
+                    if (!(ret & BDRV_BLOCK_DATA)) {
+                        sector_num += n1;
+                        continue;
+                    }
+                    /* The next 'n1' sectors are allocated in the input image.
+                     * Copy only those as they may be followed by unallocated
+                     * sectors. */
+                    nb_sectors = n1;
+                }
+                /* avoid redundant callouts to get_block_status */
+                sector_num_next_status = sector_num + n1;
+            }
+
+            if (nb_sectors >= (IO_BUF_SIZE / 512)) {
+                n = (IO_BUF_SIZE / 512);
             } else {
-                n1 = n;
+                n = nb_sectors;
             }
 
+            if (n > bs_offset + bs_sectors - sector_num) {
+                n = bs_offset + bs_sectors - sector_num;
+            }
+
+            n1 = n;
             ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
             if (ret < 0) {
                 error_report("error while reading sector %" PRId64 ": %s",
@@ -1559,10 +1567,13 @@ static int img_convert(int argc, char **argv)
                 n -= n1;
                 buf1 += n1 * 512;
             }
-            qemu_progress_print(local_progress, 100);
+            qemu_progress_print(100.0 * sector_num / total_sectors, 0);
         }
     }
 out:
+    if (!ret) {
+        qemu_progress_print(100, 0);
+    }
     qemu_progress_end();
     free_option_parameters(create_options);
     free_option_parameters(param);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 1/6] qemu-img: add support for skipping zeroes in input during convert Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  2013-11-25 17:21   ` Eric Blake
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size Peter Lieven
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index efb744c..e2d1a0a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -105,7 +105,6 @@ static void help(void)
            "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
            "       unallocated or zero sectors, and the destination image will always be\n"
            "       fully allocated\n"
-           "       images will always be fully allocated\n"
            "  '--output' takes the format in which the output must be done (human or json)\n"
            "  '-n' skips the target volume creation (useful if the volume is created\n"
            "       prior to running qemu-img)\n"
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 1/6] qemu-img: add support for skipping zeroes in input during convert Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  2013-11-25 14:54   ` Paolo Bonzini
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size Peter Lieven
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

since the convert process is basically a sync operation it might
be benificial in some case to change the hardcoded I/O buffer
size to an alternate (greater) value.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img-cmds.hx |    4 ++--
 qemu-img.c       |   25 ++++++++++++++++++++-----
 qemu-img.texi    |    4 +++-
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index da1d965..e0b8ab4 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index e2d1a0a..0ce5d14 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -105,6 +105,7 @@ static void help(void)
            "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
            "       unallocated or zero sectors, and the destination image will always be\n"
            "       fully allocated\n"
+           "  '-m' specify I/O buffer size in bytes (default 2M)\n"
            "  '--output' takes the format in which the output must be done (human or json)\n"
            "  '-n' skips the target volume creation (useful if the volume is created\n"
            "       prior to running qemu-img)\n"
@@ -1135,6 +1136,7 @@ static int img_convert(int argc, char **argv)
             sector_num_next_status = 0;
     uint64_t bs_sectors;
     uint8_t * buf = NULL;
+    size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
     const uint8_t *buf1;
     BlockDriverInfo bdi;
     QEMUOptionParameter *param = NULL, *create_options = NULL;
@@ -1152,7 +1154,7 @@ static int img_convert(int argc, char **argv)
     compress = 0;
     skip_create = 0;
     for(;;) {
-        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:qn");
+        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:m:t:qn");
         if (c == -1) {
             break;
         }
@@ -1200,6 +1202,19 @@ static int img_convert(int argc, char **argv)
             min_sparse = sval / BDRV_SECTOR_SIZE;
             break;
         }
+        case 'm':
+        {
+            int64_t sval;
+            char *end;
+            sval = strtosz_suffix(optarg, &end, STRTOSZ_DEFSUFFIX_B);
+            if (sval < BDRV_SECTOR_SIZE || *end) {
+                error_report("Invalid I/O buffer size specified");
+                return 1;
+            }
+
+            bufsectors = sval / BDRV_SECTOR_SIZE;
+            break;
+        }
         case 'p':
             progress = 1;
             break;
@@ -1371,7 +1386,7 @@ static int img_convert(int argc, char **argv)
     bs_i = 0;
     bs_offset = 0;
     bdrv_get_geometry(bs[0], &bs_sectors);
-    buf = qemu_blockalign(out_bs, IO_BUF_SIZE);
+    buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
 
     if (skip_create) {
         int64_t output_length = bdrv_getlength(out_bs);
@@ -1394,7 +1409,7 @@ static int img_convert(int argc, char **argv)
             goto out;
         }
         cluster_size = bdi.cluster_size;
-        if (cluster_size <= 0 || cluster_size > IO_BUF_SIZE) {
+        if (cluster_size <= 0 || cluster_size > bufsectors * BDRV_SECTOR_SIZE) {
             error_report("invalid cluster size");
             ret = -1;
             goto out;
@@ -1531,8 +1546,8 @@ static int img_convert(int argc, char **argv)
                 sector_num_next_status = sector_num + n1;
             }
 
-            if (nb_sectors >= (IO_BUF_SIZE / 512)) {
-                n = (IO_BUF_SIZE / 512);
+            if (nb_sectors >= bufsectors) {
+                n = bufsectors;
             } else {
                 n = nb_sectors;
             }
diff --git a/qemu-img.texi b/qemu-img.texi
index da36975..87f9d0f 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -179,7 +179,7 @@ Error on reading data
 
 @end table
 
-@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
 using format @var{output_fmt}. It can be optionally compressed (@code{-c}
@@ -199,6 +199,8 @@ conversion. If @var{sparse_size} is 0, the source will not be scanned for
 unallocated or zero sectors, and the destination image will always be
 fully allocated.
 
+@var{iobuf_size} indicates the size of the I/O buffer (defaults to 2M).
+
 You can use the @var{backing_file} option to force the output image to be
 created as a copy on write image of the specified base image; the
 @var{backing_file} should have the same content as the input's base image,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
                   ` (2 preceding siblings ...)
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  2013-11-25 14:51   ` Paolo Bonzini
  2013-11-25 15:02   ` Paolo Bonzini
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert Peter Lieven
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 6/6] qemu-img: add option to show progress in sectors Peter Lieven
  5 siblings, 2 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

this patch aims to set bdi->cluster_size to the internal page size
of the iscsi target so that enabled callers can align requests
properly.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 93fee6d..75d6b87 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1580,6 +1580,13 @@ static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     IscsiLun *iscsilun = bs->opaque;
     bdi->unallocated_blocks_are_zero = !!iscsilun->lbprz;
     bdi->can_write_zeroes_with_unmap = iscsilun->lbprz && iscsilun->lbp.lbpws;
+    /* Guess the internal cluster (page) size of the iscsi target by the means
+     * of opt_unmap_gran. Transfer the unmap granularity only if it has a
+     * reasonable size for bdi->cluster_size */
+    if (iscsilun->bl.opt_unmap_gran * iscsilun->block_size >= 64 * 1024 &&
+        iscsilun->bl.opt_unmap_gran * iscsilun->block_size <= 16 * 1024 * 1024) {
+        bdi->cluster_size = iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
+    }
     return 0;
 }
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
                   ` (3 preceding siblings ...)
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  2013-11-25 15:11   ` Paolo Bonzini
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 6/6] qemu-img: add option to show progress in sectors Peter Lieven
  5 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img-cmds.hx |    4 ++--
 qemu-img.c       |   37 +++++++++++++++++++++++++------------
 qemu-img.texi    |    5 ++++-
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index e0b8ab4..266cdf3 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p] [-q] [-n] [-a] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-q] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 0ce5d14..9fa8fd4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -109,6 +109,7 @@ static void help(void)
            "  '--output' takes the format in which the output must be done (human or json)\n"
            "  '-n' skips the target volume creation (useful if the volume is created\n"
            "       prior to running qemu-img)\n"
+           "  '-a' align write requests to cluster size if possible\n"
            "\n"
            "Parameters to check subcommand:\n"
            "  '-r' tries to repair any inconsistencies that are found during the check.\n"
@@ -1125,8 +1126,7 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-    int c, n, n1, bs_n, bs_i, compress, cluster_size,
-        cluster_sectors, skip_create;
+    int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
     int64_t ret = 0;
     int progress = 0, flags;
     const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
@@ -1144,7 +1144,7 @@ static int img_convert(int argc, char **argv)
     char *options = NULL;
     const char *snapshot_name = NULL;
     int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
-    bool quiet = false;
+    bool quiet = false, align = false;
     Error *local_err = NULL;
 
     fmt = NULL;
@@ -1154,7 +1154,7 @@ static int img_convert(int argc, char **argv)
     compress = 0;
     skip_create = 0;
     for(;;) {
-        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:m:t:qn");
+        c = getopt(argc, argv, "f:O:B:s:hcae6o:pS:m:t:qn");
         if (c == -1) {
             break;
         }
@@ -1175,6 +1175,9 @@ static int img_convert(int argc, char **argv)
         case 'c':
             compress = 1;
             break;
+        case 'a':
+            align = true;
+            break;
         case 'e':
             error_report("option -e is deprecated, please use \'-o "
                   "encryption\' instead!");
@@ -1402,19 +1405,21 @@ static int img_convert(int argc, char **argv)
         }
     }
 
+    cluster_sectors = 0;
+    ret = bdrv_get_info(out_bs, &bdi);
+    if (ret < 0 && compress) {
+        error_report("could not get block driver info");
+        goto out;
+    } else {
+        cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
+    }
+
     if (compress) {
-        ret = bdrv_get_info(out_bs, &bdi);
-        if (ret < 0) {
-            error_report("could not get block driver info");
-            goto out;
-        }
-        cluster_size = bdi.cluster_size;
-        if (cluster_size <= 0 || cluster_size > bufsectors * BDRV_SECTOR_SIZE) {
+        if (cluster_sectors <= 0 || cluster_sectors > bufsectors) {
             error_report("invalid cluster size");
             ret = -1;
             goto out;
         }
-        cluster_sectors = cluster_size >> 9;
         sector_num = 0;
 
         nb_sectors = total_sectors;
@@ -1552,6 +1557,14 @@ static int img_convert(int argc, char **argv)
                 n = nb_sectors;
             }
 
+            if (align && cluster_sectors > 0) {
+                int64_t next_aligned_sector = (sector_num + cluster_sectors);
+                next_aligned_sector -= next_aligned_sector % cluster_sectors;
+                if (sector_num + n > next_aligned_sector) {
+                    n = next_aligned_sector - sector_num;
+                }
+            }
+
             if (n > bs_offset + bs_sectors - sector_num) {
                 n = bs_offset + bs_sectors - sector_num;
             }
diff --git a/qemu-img.texi b/qemu-img.texi
index 87f9d0f..9b1720f 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -179,11 +179,14 @@ Error on reading data
 
 @end table
 
-@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
 using format @var{output_fmt}. It can be optionally compressed (@code{-c}
 option) or use any format specific options like encryption (@code{-o} option).
+If the @code{-a} option is specified write requests will be aligned
+to the cluster size of the output image if possible. This is the default
+for compressed images.
 
 Only the formats @code{qcow} and @code{qcow2} support compression. The
 compression is read-only. It means that if a compressed sector is
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH 1.8 6/6] qemu-img: add option to show progress in sectors
  2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
                   ` (4 preceding siblings ...)
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert Peter Lieven
@ 2013-11-25 13:57 ` Peter Lieven
  5 siblings, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 13:57 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qemu-img-cmds.hx |    4 ++--
 qemu-img.c       |   23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 266cdf3..4190ec1 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF("convert", img_convert,
-    "convert [-c] [-p] [-q] [-n] [-a] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
+    "convert [-c] [-p|pp] [-q] [-n] [-a] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p|pp] [-q] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 9fa8fd4..f58247f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -99,6 +99,7 @@ static void help(void)
            "       rebasing in this case (useful for renaming the backing file)\n"
            "  '-h' with or without a command shows this help and lists the supported formats\n"
            "  '-p' show progress of command (only certain commands)\n"
+           "  '-pp' show progress of command in sectors (only convert command)\n"
            "  '-q' use Quiet mode - do not print any output (except errors)\n"
            "  '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n"
            "       contain only zeros for qemu-img to create a sparse image during\n"
@@ -1124,6 +1125,14 @@ out3:
     return ret;
 }
 
+static void print_sector_progress(int progress, int64_t sector_num,
+                                  int64_t total_sectors)
+{
+    if (progress == 2) {
+        printf("%ld of %ld sectors converted.\r", sector_num, total_sectors);
+    }
+}
+
 static int img_convert(int argc, char **argv)
 {
     int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
@@ -1132,7 +1141,7 @@ static int img_convert(int argc, char **argv)
     const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
     BlockDriver *drv, *proto_drv;
     BlockDriverState **bs = NULL, *out_bs = NULL;
-    int64_t total_sectors, nb_sectors, sector_num, bs_offset,
+    int64_t total_sectors = 0, nb_sectors, sector_num, bs_offset,
             sector_num_next_status = 0;
     uint64_t bs_sectors;
     uint8_t * buf = NULL;
@@ -1219,7 +1228,7 @@ static int img_convert(int argc, char **argv)
             break;
         }
         case 'p':
-            progress = 1;
+            progress++;
             break;
         case 't':
             cache = optarg;
@@ -1245,7 +1254,7 @@ static int img_convert(int argc, char **argv)
     out_filename = argv[argc - 1];
 
     /* Initialize before goto out */
-    qemu_progress_init(progress, 2.0);
+    qemu_progress_init(progress == 1, 2.0);
 
     if (options && is_help_option(options)) {
         ret = print_block_option_help(out_filename, out_fmt);
@@ -1276,6 +1285,8 @@ static int img_convert(int argc, char **argv)
         total_sectors += bs_sectors;
     }
 
+    print_sector_progress(progress, 0, total_sectors);
+
     if (snapshot_name != NULL) {
         if (bs_n > 1) {
             error_report("No support for concatenating multiple snapshot");
@@ -1481,6 +1492,7 @@ static int img_convert(int argc, char **argv)
             }
             sector_num += n;
             qemu_progress_print(100.0 * sector_num / total_sectors, 0);
+            print_sector_progress(progress, sector_num, total_sectors);
         }
         /* signal EOF to align */
         bdrv_write_compressed(out_bs, 0, NULL, 0);
@@ -1595,11 +1607,16 @@ static int img_convert(int argc, char **argv)
                 buf1 += n1 * 512;
             }
             qemu_progress_print(100.0 * sector_num / total_sectors, 0);
+            print_sector_progress(progress, sector_num, total_sectors);
         }
     }
 out:
     if (!ret) {
         qemu_progress_print(100, 0);
+        print_sector_progress(progress, total_sectors, total_sectors);
+    }
+    if (progress == 2) {
+        printf("\n");
     }
     qemu_progress_end();
     free_option_parameters(create_options);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size Peter Lieven
@ 2013-11-25 14:51   ` Paolo Bonzini
  2013-11-25 15:02   ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 14:51 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 14:57, Peter Lieven ha scritto:
> +    /* Guess the internal cluster (page) size of the iscsi target by the means
> +     * of opt_unmap_gran. Transfer the unmap granularity only if it has a
> +     * reasonable size for bdi->cluster_size */
> +    if (iscsilun->bl.opt_unmap_gran * iscsilun->block_size >= 64 * 1024 &&
> +        iscsilun->bl.opt_unmap_gran * iscsilun->block_size <= 16 * 1024 * 1024) {
> +        bdi->cluster_size = iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
> +    }

I think you are mixing many different concepts:

* The optimal unmap granularity is good as a suggestion for the cluster
size of higher-level formats.

* The optimal transfer granularity (block limits page, bytes 6-7,
min_io_size in Linux) could be used to adjust the length of transfers in
"qemu-img convert".  I have not really thought much about *how* to do it.

* The optimal transfer (block limits page, bytes 12-15, opt_io_size in
Linux) should not be used in "qemu-img convert", I think, unless you can
actually report performance improvements.  This is because in "qemu-img
convert" we need to write the data anyway to the target.  We cannot
schedule other commands between two transfers.  So I don't think any
delays incurred by a very large write should matter.

The maximum transfer length (block limits page, bytes 8-11) should be
handled instead in the iSCSI driver, but I do not see the need to do
this unless we have reports of something not working.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size Peter Lieven
@ 2013-11-25 14:54   ` Paolo Bonzini
  2013-11-25 15:01     ` Peter Lieven
  2013-11-25 15:07     ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 14:54 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 14:57, Peter Lieven ha scritto:
> since the convert process is basically a sync operation it might
> be benificial in some case to change the hardcoded I/O buffer
> size to an alternate (greater) value.

Do you really need the extra knob?  You can just add to BlockLimits the
optimal transfer length, and use it unconditionally.

(It would be interesting to add a double-buffer to qemu-img convert,
using asynchronous I/O...).

Paolo

> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  qemu-img-cmds.hx |    4 ++--
>  qemu-img.c       |   25 ++++++++++++++++++++-----
>  qemu-img.texi    |    4 +++-
>  3 files changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index da1d965..e0b8ab4 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -34,9 +34,9 @@ STEXI
>  ETEXI
>  
>  DEF("convert", img_convert,
> -    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
> +    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
>  STEXI
> -@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
> +@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>  ETEXI
>  
>  DEF("info", img_info,
> diff --git a/qemu-img.c b/qemu-img.c
> index e2d1a0a..0ce5d14 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -105,6 +105,7 @@ static void help(void)
>             "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
>             "       unallocated or zero sectors, and the destination image will always be\n"
>             "       fully allocated\n"
> +           "  '-m' specify I/O buffer size in bytes (default 2M)\n"
>             "  '--output' takes the format in which the output must be done (human or json)\n"
>             "  '-n' skips the target volume creation (useful if the volume is created\n"
>             "       prior to running qemu-img)\n"
> @@ -1135,6 +1136,7 @@ static int img_convert(int argc, char **argv)
>              sector_num_next_status = 0;
>      uint64_t bs_sectors;
>      uint8_t * buf = NULL;
> +    size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
>      const uint8_t *buf1;
>      BlockDriverInfo bdi;
>      QEMUOptionParameter *param = NULL, *create_options = NULL;
> @@ -1152,7 +1154,7 @@ static int img_convert(int argc, char **argv)
>      compress = 0;
>      skip_create = 0;
>      for(;;) {
> -        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:qn");
> +        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:m:t:qn");
>          if (c == -1) {
>              break;
>          }
> @@ -1200,6 +1202,19 @@ static int img_convert(int argc, char **argv)
>              min_sparse = sval / BDRV_SECTOR_SIZE;
>              break;
>          }
> +        case 'm':
> +        {
> +            int64_t sval;
> +            char *end;
> +            sval = strtosz_suffix(optarg, &end, STRTOSZ_DEFSUFFIX_B);
> +            if (sval < BDRV_SECTOR_SIZE || *end) {
> +                error_report("Invalid I/O buffer size specified");
> +                return 1;
> +            }
> +
> +            bufsectors = sval / BDRV_SECTOR_SIZE;
> +            break;
> +        }
>          case 'p':
>              progress = 1;
>              break;
> @@ -1371,7 +1386,7 @@ static int img_convert(int argc, char **argv)
>      bs_i = 0;
>      bs_offset = 0;
>      bdrv_get_geometry(bs[0], &bs_sectors);
> -    buf = qemu_blockalign(out_bs, IO_BUF_SIZE);
> +    buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
>  
>      if (skip_create) {
>          int64_t output_length = bdrv_getlength(out_bs);
> @@ -1394,7 +1409,7 @@ static int img_convert(int argc, char **argv)
>              goto out;
>          }
>          cluster_size = bdi.cluster_size;
> -        if (cluster_size <= 0 || cluster_size > IO_BUF_SIZE) {
> +        if (cluster_size <= 0 || cluster_size > bufsectors * BDRV_SECTOR_SIZE) {
>              error_report("invalid cluster size");
>              ret = -1;
>              goto out;
> @@ -1531,8 +1546,8 @@ static int img_convert(int argc, char **argv)
>                  sector_num_next_status = sector_num + n1;
>              }
>  
> -            if (nb_sectors >= (IO_BUF_SIZE / 512)) {
> -                n = (IO_BUF_SIZE / 512);
> +            if (nb_sectors >= bufsectors) {
> +                n = bufsectors;
>              } else {
>                  n = nb_sectors;
>              }
> diff --git a/qemu-img.texi b/qemu-img.texi
> index da36975..87f9d0f 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -179,7 +179,7 @@ Error on reading data
>  
>  @end table
>  
> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
> +@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>  
>  Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>  using format @var{output_fmt}. It can be optionally compressed (@code{-c}
> @@ -199,6 +199,8 @@ conversion. If @var{sparse_size} is 0, the source will not be scanned for
>  unallocated or zero sectors, and the destination image will always be
>  fully allocated.
>  
> +@var{iobuf_size} indicates the size of the I/O buffer (defaults to 2M).
> +
>  You can use the @var{backing_file} option to force the output image to be
>  created as a copy on write image of the specified base image; the
>  @var{backing_file} should have the same content as the input's base image,
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 14:54   ` Paolo Bonzini
@ 2013-11-25 15:01     ` Peter Lieven
  2013-11-25 15:07     ` Peter Lieven
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 15:54, Paolo Bonzini wrote:
> Il 25/11/2013 14:57, Peter Lieven ha scritto:
>> since the convert process is basically a sync operation it might
>> be benificial in some case to change the hardcoded I/O buffer
>> size to an alternate (greater) value.
> Do you really need the extra knob?  You can just add to BlockLimits the
> optimal transfer length, and use it unconditionally.
That would be possible.
>
> (It would be interesting to add a double-buffer to qemu-img convert,
> using asynchronous I/O...).
That would be the best benefit of all. But I thought this would be too
complicated.

Peter
>
> Paolo
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   qemu-img-cmds.hx |    4 ++--
>>   qemu-img.c       |   25 ++++++++++++++++++++-----
>>   qemu-img.texi    |    4 +++-
>>   3 files changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
>> index da1d965..e0b8ab4 100644
>> --- a/qemu-img-cmds.hx
>> +++ b/qemu-img-cmds.hx
>> @@ -34,9 +34,9 @@ STEXI
>>   ETEXI
>>   
>>   DEF("convert", img_convert,
>> -    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
>> +    "convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] [-m iobuf_size] filename [filename2 [...]] output_filename")
>>   STEXI
>> -@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>> +@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>>   ETEXI
>>   
>>   DEF("info", img_info,
>> diff --git a/qemu-img.c b/qemu-img.c
>> index e2d1a0a..0ce5d14 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -105,6 +105,7 @@ static void help(void)
>>              "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
>>              "       unallocated or zero sectors, and the destination image will always be\n"
>>              "       fully allocated\n"
>> +           "  '-m' specify I/O buffer size in bytes (default 2M)\n"
>>              "  '--output' takes the format in which the output must be done (human or json)\n"
>>              "  '-n' skips the target volume creation (useful if the volume is created\n"
>>              "       prior to running qemu-img)\n"
>> @@ -1135,6 +1136,7 @@ static int img_convert(int argc, char **argv)
>>               sector_num_next_status = 0;
>>       uint64_t bs_sectors;
>>       uint8_t * buf = NULL;
>> +    size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
>>       const uint8_t *buf1;
>>       BlockDriverInfo bdi;
>>       QEMUOptionParameter *param = NULL, *create_options = NULL;
>> @@ -1152,7 +1154,7 @@ static int img_convert(int argc, char **argv)
>>       compress = 0;
>>       skip_create = 0;
>>       for(;;) {
>> -        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:t:qn");
>> +        c = getopt(argc, argv, "f:O:B:s:hce6o:pS:m:t:qn");
>>           if (c == -1) {
>>               break;
>>           }
>> @@ -1200,6 +1202,19 @@ static int img_convert(int argc, char **argv)
>>               min_sparse = sval / BDRV_SECTOR_SIZE;
>>               break;
>>           }
>> +        case 'm':
>> +        {
>> +            int64_t sval;
>> +            char *end;
>> +            sval = strtosz_suffix(optarg, &end, STRTOSZ_DEFSUFFIX_B);
>> +            if (sval < BDRV_SECTOR_SIZE || *end) {
>> +                error_report("Invalid I/O buffer size specified");
>> +                return 1;
>> +            }
>> +
>> +            bufsectors = sval / BDRV_SECTOR_SIZE;
>> +            break;
>> +        }
>>           case 'p':
>>               progress = 1;
>>               break;
>> @@ -1371,7 +1386,7 @@ static int img_convert(int argc, char **argv)
>>       bs_i = 0;
>>       bs_offset = 0;
>>       bdrv_get_geometry(bs[0], &bs_sectors);
>> -    buf = qemu_blockalign(out_bs, IO_BUF_SIZE);
>> +    buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
>>   
>>       if (skip_create) {
>>           int64_t output_length = bdrv_getlength(out_bs);
>> @@ -1394,7 +1409,7 @@ static int img_convert(int argc, char **argv)
>>               goto out;
>>           }
>>           cluster_size = bdi.cluster_size;
>> -        if (cluster_size <= 0 || cluster_size > IO_BUF_SIZE) {
>> +        if (cluster_size <= 0 || cluster_size > bufsectors * BDRV_SECTOR_SIZE) {
>>               error_report("invalid cluster size");
>>               ret = -1;
>>               goto out;
>> @@ -1531,8 +1546,8 @@ static int img_convert(int argc, char **argv)
>>                   sector_num_next_status = sector_num + n1;
>>               }
>>   
>> -            if (nb_sectors >= (IO_BUF_SIZE / 512)) {
>> -                n = (IO_BUF_SIZE / 512);
>> +            if (nb_sectors >= bufsectors) {
>> +                n = bufsectors;
>>               } else {
>>                   n = nb_sectors;
>>               }
>> diff --git a/qemu-img.texi b/qemu-img.texi
>> index da36975..87f9d0f 100644
>> --- a/qemu-img.texi
>> +++ b/qemu-img.texi
>> @@ -179,7 +179,7 @@ Error on reading data
>>   
>>   @end table
>>   
>> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>> +@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>>   
>>   Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>>   using format @var{output_fmt}. It can be optionally compressed (@code{-c}
>> @@ -199,6 +199,8 @@ conversion. If @var{sparse_size} is 0, the source will not be scanned for
>>   unallocated or zero sectors, and the destination image will always be
>>   fully allocated.
>>   
>> +@var{iobuf_size} indicates the size of the I/O buffer (defaults to 2M).
>> +
>>   You can use the @var{backing_file} option to force the output image to be
>>   created as a copy on write image of the specified base image; the
>>   @var{backing_file} should have the same content as the input's base image,
>>


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size Peter Lieven
  2013-11-25 14:51   ` Paolo Bonzini
@ 2013-11-25 15:02   ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 15:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 14:57, Peter Lieven ha scritto:
> this patch aims to set bdi->cluster_size to the internal page size
> of the iscsi target so that enabled callers can align requests
> properly.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |    7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 93fee6d..75d6b87 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1580,6 +1580,13 @@ static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>      IscsiLun *iscsilun = bs->opaque;
>      bdi->unallocated_blocks_are_zero = !!iscsilun->lbprz;
>      bdi->can_write_zeroes_with_unmap = iscsilun->lbprz && iscsilun->lbp.lbpws;
> +    /* Guess the internal cluster (page) size of the iscsi target by the means
> +     * of opt_unmap_gran. Transfer the unmap granularity only if it has a
> +     * reasonable size for bdi->cluster_size */
> +    if (iscsilun->bl.opt_unmap_gran * iscsilun->block_size >= 64 * 1024 &&
> +        iscsilun->bl.opt_unmap_gran * iscsilun->block_size <= 16 * 1024 * 1024) {
> +        bdi->cluster_size = iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
> +    }
>      return 0;
>  }
>  
> 

After looking at patch 5, I think this one is correct.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

but I think you should make higher layers (qcow2) use the lower BDS's
granularity, at least as a default.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 14:54   ` Paolo Bonzini
  2013-11-25 15:01     ` Peter Lieven
@ 2013-11-25 15:07     ` Peter Lieven
  2013-11-25 15:14       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 15:54, Paolo Bonzini wrote:
> Il 25/11/2013 14:57, Peter Lieven ha scritto:
>> since the convert process is basically a sync operation it might
>> be benificial in some case to change the hardcoded I/O buffer
>> size to an alternate (greater) value.
> Do you really need the extra knob?  You can just add to BlockLimits the
> optimal transfer length, and use it unconditionally.
If you say patch 5 and 3 are ok. What could be done is to remove
this knob and increase the iobuf_size to cluster_size if cluster_size
is greater. I do not want to increase the default iobuf size to anything
greater than 2MB. I do not know why this was choosen, but maybe
there was a reason for it.

The storages we use here have a very strange page size of 15MB.
If I sent aligned 15MB chunks the performace is about +50% compared
to the original qemu-img.

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert Peter Lieven
@ 2013-11-25 15:11   ` Paolo Bonzini
  2013-11-25 15:32     ` Peter Lieven
  2013-11-25 16:11     ` Peter Lieven
  0 siblings, 2 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 15:11 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 14:57, Peter Lieven ha scritto:
> Signed-off-by: Peter Lieven <pl@kamp.de>

Ok, given this patch I think the cluster_size is the right one to use
here---and also the way you used the optimal unmap granularity makes
sense; you could also use MAX(optimal unmap granularity, optimal
transfer length granularity).

However, there is no need to write one cluster at a time.  What matters,
I think, is to align the *end* of the transfer, so that the next
transfer can start aligned.

> +            if (align && cluster_sectors > 0) {
> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);

So this should be "+ n", not "+ cluster_sectors".

Perhaps it could be conditional on "n > cluster_sectors" (small requests
happen when you have sparse region, and breaking them doesn't help).

Finally, I believe there is no need for a separate "-a" knob.

The patch looks fine to me with these small changes, though.

Also, a couple of ideas for separate patches.  Perhaps the default value
of "-S" could be cluster_size if specified?  This would avoid making raw
images too fragmented, and compounding filesystem-level fragmentation
with qcow2-level fragmentation.  And 4K is too small a default in my
opinion; it could be easily changed to 64K, though 4K was of course an
improvement compared to 512 before commit a22f123 (qemu-img: Require
larger zero areas for sparse handling, 2011-08-26).

Paolo

> +                next_aligned_sector -= next_aligned_sector % cluster_sectors;
> +                if (sector_num + n > next_aligned_sector) {
> +                    n = next_aligned_sector - sector_num;
> +                }
> +            }
> +
>              if (n > bs_offset + bs_sectors - sector_num) {
>                  n = bs_offset + bs_sectors - sector_num;
>              }
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 87f9d0f..9b1720f 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -179,11 +179,14 @@ Error on reading data
>  
>  @end table
>  
> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
> +@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>  
>  Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>  using format @var{output_fmt}. It can be optionally compressed (@code{-c}
>  option) or use any format specific options like encryption (@code{-o} option).
> +If the @code{-a} option is specified write requests will be aligned
> +to the cluster size of the output image if possible. This is the default
> +for compressed images.
>  
>  Only the formats @code{qcow} and @code{qcow2} support compression. The
>  compression is read-only. It means that if a compressed sector is
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 15:07     ` Peter Lieven
@ 2013-11-25 15:14       ` Paolo Bonzini
  2013-11-25 15:24         ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 15:14 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 16:07, Peter Lieven ha scritto:
>>> since the convert process is basically a sync operation it might
>>> be benificial in some case to change the hardcoded I/O buffer
>>> size to an alternate (greater) value.
>> Do you really need the extra knob?  You can just add to BlockLimits the
>> optimal transfer length, and use it unconditionally.
> If you say patch 5 and 3 are ok. What could be done is to remove
> this knob and increase the iobuf_size to cluster_size if cluster_size
> is greater.

Yes, that makes sense because it avoids unnecessary COW.

> I do not want to increase the default iobuf size to anything
> greater than 2MB. I do not know why this was choosen, but maybe
> there was a reason for it.

I think it is fine to increase it to the cluster_size or even to the
optimal transfer length (new BlockLimits field).

> The storages we use here have a very strange page size of 15MB.
> If I sent aligned 15MB chunks the performace is about +50% compared
> to the original qemu-img.

I think composing the optimal transfer length and optimal unmap
granularity (cluster_size) will work well:

* clamp maximum size to optimal transfer length

* then, round final sector down to unmap granularity

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 15:14       ` Paolo Bonzini
@ 2013-11-25 15:24         ` Peter Lieven
  2013-11-25 15:48           ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 16:14, Paolo Bonzini wrote:
> Il 25/11/2013 16:07, Peter Lieven ha scritto:
>>>> since the convert process is basically a sync operation it might
>>>> be benificial in some case to change the hardcoded I/O buffer
>>>> size to an alternate (greater) value.
>>> Do you really need the extra knob?  You can just add to BlockLimits the
>>> optimal transfer length, and use it unconditionally.
>> If you say patch 5 and 3 are ok. What could be done is to remove
>> this knob and increase the iobuf_size to cluster_size if cluster_size
>> is greater.
> Yes, that makes sense because it avoids unnecessary COW.
>
>> I do not want to increase the default iobuf size to anything
>> greater than 2MB. I do not know why this was choosen, but maybe
>> there was a reason for it.
> I think it is fine to increase it to the cluster_size or even to the
> optimal transfer length (new BlockLimits field).
okay scsi speaking:
  bs->bl.optimal_transfer_length = max(iscsilun->bl.opt_xfer_len, iscsilun->bl.opt_unmap_gran) ?

bdi->cluster_size as in Patch 3?
>
>> The storages we use here have a very strange page size of 15MB.
>> If I sent aligned 15MB chunks the performace is about +50% compared
>> to the original qemu-img.
> I think composing the optimal transfer length and optimal unmap
> granularity (cluster_size) will work well:
>
> * clamp maximum size to optimal transfer length.
or increase it if its larger. for the dell equallogic storages we use the opt_unmap_gran
is 15MB!!!
>
> * then, round final sector down to unmap granularity
okay, i will try your modification to only round down the last sector.
would you use bdi->cluster_size or the unmap alignment field from the bs->bl?

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 15:11   ` Paolo Bonzini
@ 2013-11-25 15:32     ` Peter Lieven
  2013-11-25 15:50       ` Paolo Bonzini
  2013-11-25 16:11     ` Peter Lieven
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:32 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 16:11, Paolo Bonzini wrote:
> Il 25/11/2013 14:57, Peter Lieven ha scritto:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Ok, given this patch I think the cluster_size is the right one to use
> here---and also the way you used the optimal unmap granularity makes
> sense; you could also use MAX(optimal unmap granularity, optimal
> transfer length granularity).
>
> However, there is no need to write one cluster at a time.  What matters,
> I think, is to align the *end* of the transfer, so that the next
> transfer can start aligned.
>
>> +            if (align && cluster_sectors > 0) {
>> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);
> So this should be "+ n", not "+ cluster_sectors".
>
> Perhaps it could be conditional on "n > cluster_sectors" (small requests
> happen when you have sparse region, and breaking them doesn't help).
>
> Finally, I believe there is no need for a separate "-a" knob.
>
> The patch looks fine to me with these small changes, though.
>
> Also, a couple of ideas for separate patches.  Perhaps the default value
> of "-S" could be cluster_size if specified?  This would avoid making raw
> images too fragmented, and compounding filesystem-level fragmentation
> with qcow2-level fragmentation.  And 4K is too small a default in my
> opinion; it could be easily changed to 64K, though 4K was of course an
> improvement compared to 512 before commit a22f123 (qemu-img: Require
> larger zero areas for sparse handling, 2011-08-26).
I would vote for 64K or 256K, we already use the first for some time. However, it turned out
that (much) bigger values decrease performance. Setting it
to cluster_size can be dangerous. As described in my case its 15MB and
I think for vhd its 1MB. This can be a lot of zeros that have to be written.

Peter
>
> Paolo
>
>> +                next_aligned_sector -= next_aligned_sector % cluster_sectors;
>> +                if (sector_num + n > next_aligned_sector) {
>> +                    n = next_aligned_sector - sector_num;
>> +                }
>> +            }
>> +
>>               if (n > bs_offset + bs_sectors - sector_num) {
>>                   n = bs_offset + bs_sectors - sector_num;
>>               }
>> diff --git a/qemu-img.texi b/qemu-img.texi
>> index 87f9d0f..9b1720f 100644
>> --- a/qemu-img.texi
>> +++ b/qemu-img.texi
>> @@ -179,11 +179,14 @@ Error on reading data
>>   
>>   @end table
>>   
>> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>> +@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>>   
>>   Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>>   using format @var{output_fmt}. It can be optionally compressed (@code{-c}
>>   option) or use any format specific options like encryption (@code{-o} option).
>> +If the @code{-a} option is specified write requests will be aligned
>> +to the cluster size of the output image if possible. This is the default
>> +for compressed images.
>>   
>>   Only the formats @code{qcow} and @code{qcow2} support compression. The
>>   compression is read-only. It means that if a compressed sector is
>>


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 15:24         ` Peter Lieven
@ 2013-11-25 15:48           ` Paolo Bonzini
  2013-11-25 15:56             ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 15:48 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 16:24, Peter Lieven ha scritto:
> On 25.11.2013 16:14, Paolo Bonzini wrote:
>> Il 25/11/2013 16:07, Peter Lieven ha scritto:
>>>>> since the convert process is basically a sync operation it might
>>>>> be benificial in some case to change the hardcoded I/O buffer
>>>>> size to an alternate (greater) value.
>>>> Do you really need the extra knob?  You can just add to BlockLimits the
>>>> optimal transfer length, and use it unconditionally.
>>> If you say patch 5 and 3 are ok. What could be done is to remove
>>> this knob and increase the iobuf_size to cluster_size if cluster_size
>>> is greater.
>> Yes, that makes sense because it avoids unnecessary COW.
>>
>>> I do not want to increase the default iobuf size to anything
>>> greater than 2MB. I do not know why this was choosen, but maybe
>>> there was a reason for it.
>> I think it is fine to increase it to the cluster_size or even to the
>> optimal transfer length (new BlockLimits field).
> okay scsi speaking:
>  bs->bl.optimal_transfer_length = max(iscsilun->bl.opt_xfer_len,
> iscsilun->bl.opt_unmap_gran) ?

I was thinking more of

  bs->bl.optimal_transfer_length = lun2qemu(iscsilun->bl.opt_xfer_len);

 ...
 iobuf_size =
      max(bs->bl.optimal_transfer_length, cluster_sectors)*512;
 iobuf_size = min(16MB, max(2MB, iobuf_size));

> bdi->cluster_size as in Patch 3?

Yes, that one's fine.

>> * clamp maximum size to optimal transfer length.
> or increase it if its larger. for the dell equallogic storages we use
> the opt_unmap_gran
> is 15MB!!!

You're right, maximum size is really bounded anyway by iobuf_size.

>> * then, round final sector down to unmap granularity
> okay, i will try your modification to only round down the last sector.
> would you use bdi->cluster_size or the unmap alignment field from the
> bs->bl?

bdi->cluster_size.  The qemu-img reason for the rounding is to avoid
unnecessary COW, which is what bdi->cluster_size is about.  It just so
happens that it helps your benchmark too. :)

Paolo

> Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 15:32     ` Peter Lieven
@ 2013-11-25 15:50       ` Paolo Bonzini
  2013-11-25 15:55         ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 15:50 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 16:32, Peter Lieven ha scritto:
>>
>> Also, a couple of ideas for separate patches.  Perhaps the default value
>> of "-S" could be cluster_size if specified?  This would avoid making raw
>> images too fragmented, and compounding filesystem-level fragmentation
>> with qcow2-level fragmentation.  And 4K is too small a default in my
>> opinion; it could be easily changed to 64K, though 4K was of course an
>> improvement compared to 512 before commit a22f123 (qemu-img: Require
>> larger zero areas for sparse handling, 2011-08-26).
> I would vote for 64K or 256K, we already use the first for some time.
> However, it turned out
> that (much) bigger values decrease performance. Setting it
> to cluster_size can be dangerous. As described in my case its 15MB and
> I think for vhd its 1MB. This can be a lot of zeros that have to be
> written.

What about max(4096, min(bdi->cluster_size, 1048576))?

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 15:50       ` Paolo Bonzini
@ 2013-11-25 15:55         ` Peter Lieven
  2013-11-25 16:02           ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:55 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 16:50, Paolo Bonzini wrote:
> Il 25/11/2013 16:32, Peter Lieven ha scritto:
>>> Also, a couple of ideas for separate patches.  Perhaps the default value
>>> of "-S" could be cluster_size if specified?  This would avoid making raw
>>> images too fragmented, and compounding filesystem-level fragmentation
>>> with qcow2-level fragmentation.  And 4K is too small a default in my
>>> opinion; it could be easily changed to 64K, though 4K was of course an
>>> improvement compared to 512 before commit a22f123 (qemu-img: Require
>>> larger zero areas for sparse handling, 2011-08-26).
>> I would vote for 64K or 256K, we already use the first for some time.
>> However, it turned out
>> that (much) bigger values decrease performance. Setting it
>> to cluster_size can be dangerous. As described in my case its 15MB and
>> I think for vhd its 1MB. This can be a lot of zeros that have to be
>> written.
> What about max(4096, min(bdi->cluster_size, 1048576))?
chaning sparse_size from 65536 to 1048576 about 5% performance decrease...

lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 1048576 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m29.263s
user    0m7.544s
sys    0m1.636s
lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 4096 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m28.169s
user    0m7.792s
sys    0m1.516s
lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S 65536 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
40980480 of 40980480 sectors converted.

real    0m27.643s
user    0m7.644s
sys    0m1.520s

i wouldn't go over 64k until we fully understand which impact it has.

Peter

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size
  2013-11-25 15:48           ` Paolo Bonzini
@ 2013-11-25 15:56             ` Peter Lieven
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 15:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 16:48, Paolo Bonzini wrote:
> Il 25/11/2013 16:24, Peter Lieven ha scritto:
>> On 25.11.2013 16:14, Paolo Bonzini wrote:
>>> Il 25/11/2013 16:07, Peter Lieven ha scritto:
>>>>>> since the convert process is basically a sync operation it might
>>>>>> be benificial in some case to change the hardcoded I/O buffer
>>>>>> size to an alternate (greater) value.
>>>>> Do you really need the extra knob?  You can just add to BlockLimits the
>>>>> optimal transfer length, and use it unconditionally.
>>>> If you say patch 5 and 3 are ok. What could be done is to remove
>>>> this knob and increase the iobuf_size to cluster_size if cluster_size
>>>> is greater.
>>> Yes, that makes sense because it avoids unnecessary COW.
>>>
>>>> I do not want to increase the default iobuf size to anything
>>>> greater than 2MB. I do not know why this was choosen, but maybe
>>>> there was a reason for it.
>>> I think it is fine to increase it to the cluster_size or even to the
>>> optimal transfer length (new BlockLimits field).
>> okay scsi speaking:
>>   bs->bl.optimal_transfer_length = max(iscsilun->bl.opt_xfer_len,
>> iscsilun->bl.opt_unmap_gran) ?
> I was thinking more of
>
>    bs->bl.optimal_transfer_length = lun2qemu(iscsilun->bl.opt_xfer_len);
>
>   ...
>   iobuf_size =
>        max(bs->bl.optimal_transfer_length, cluster_sectors)*512;
>   iobuf_size = min(16MB, max(2MB, iobuf_size));
okay. same result ;-)

Peter ........................................................... KAMP Netzwerkdienste GmbH Vestische Str. 89-91 | 46117 Oberhausen Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40 pl@kamp.de | http://www.kamp.de Geschäftsführer: Heiner Lante | 
Michael Lante Amtsgericht Duisburg | HRB Nr. 12154 USt-Id-Nr.: DE 120607556 ...........................................................

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 15:55         ` Peter Lieven
@ 2013-11-25 16:02           ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 16:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 16:55, Peter Lieven ha scritto:
>>
> chaning sparse_size from 65536 to 1048576 about 5% performance decrease...
> 
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 1048576 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
> 
> 40980480 of 40980480 sectors converted.
> 
> real    0m29.263s
> user    0m7.544s
> sys    0m1.636s
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 4096 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
> 
> 40980480 of 40980480 sectors converted.
> 
> real    0m28.169s
> user    0m7.792s
> sys    0m1.516s
> lieven@lieven-pc:~/git/qemu$ time ./qemu-img convert -pp -m 15728640 -S
> 65536 /tmp/VC-Ubuntu-LTS-12.04.2-64bit.qcow2
> iscsi://172.21.200.45/iqn.2001-05.com.equallogic:0-8a0906-9d95c510a-344001d54795289f-2012-r2-1-7-0/0
> 
> 40980480 of 40980480 sectors converted.
> 
> real    0m27.643s
> user    0m7.644s
> sys    0m1.520s
> 
> i wouldn't go over 64k until we fully understand which impact it has.

I agree.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 15:11   ` Paolo Bonzini
  2013-11-25 15:32     ` Peter Lieven
@ 2013-11-25 16:11     ` Peter Lieven
  2013-11-25 16:34       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Lieven @ 2013-11-25 16:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, qemu-devel, stefanha

On 25.11.2013 16:11, Paolo Bonzini wrote:
> Il 25/11/2013 14:57, Peter Lieven ha scritto:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Ok, given this patch I think the cluster_size is the right one to use
> here---and also the way you used the optimal unmap granularity makes
> sense; you could also use MAX(optimal unmap granularity, optimal
> transfer length granularity).
>
> However, there is no need to write one cluster at a time.  What matters,
> I think, is to align the *end* of the transfer, so that the next
> transfer can start aligned.
>
>> +            if (align && cluster_sectors > 0) {
>> +                int64_t next_aligned_sector = (sector_num + cluster_sectors);
> So this should be "+ n", not "+ cluster_sectors".
>
> Perhaps it could be conditional on "n > cluster_sectors" (small requests
> happen when you have sparse region, and breaking them doesn't help).
would you also agree to n >= cluster_sectors. In my case
and if especially if n is bound by iobuf_size the case n > cluster_sectors
will be hard to meet.

Peter

>
> Finally, I believe there is no need for a separate "-a" knob.
>
> The patch looks fine to me with these small changes, though.
>
> Also, a couple of ideas for separate patches.  Perhaps the default value
> of "-S" could be cluster_size if specified?  This would avoid making raw
> images too fragmented, and compounding filesystem-level fragmentation
> with qcow2-level fragmentation.  And 4K is too small a default in my
> opinion; it could be easily changed to 64K, though 4K was of course an
> improvement compared to 512 before commit a22f123 (qemu-img: Require
> larger zero areas for sparse handling, 2011-08-26).
>
> Paolo
>
>> +                next_aligned_sector -= next_aligned_sector % cluster_sectors;
>> +                if (sector_num + n > next_aligned_sector) {
>> +                    n = next_aligned_sector - sector_num;
>> +                }
>> +            }
>> +
>>               if (n > bs_offset + bs_sectors - sector_num) {
>>                   n = bs_offset + bs_sectors - sector_num;
>>               }
>> diff --git a/qemu-img.texi b/qemu-img.texi
>> index 87f9d0f..9b1720f 100644
>> --- a/qemu-img.texi
>> +++ b/qemu-img.texi
>> @@ -179,11 +179,14 @@ Error on reading data
>>   
>>   @end table
>>   
>> -@item convert [-c] [-p] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>> +@item convert [-c] [-p] [-n] [-a] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] [-m @var{iobuf_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
>>   
>>   Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to disk image @var{output_filename}
>>   using format @var{output_fmt}. It can be optionally compressed (@code{-c}
>>   option) or use any format specific options like encryption (@code{-o} option).
>> +If the @code{-a} option is specified write requests will be aligned
>> +to the cluster size of the output image if possible. This is the default
>> +for compressed images.
>>   
>>   Only the formats @code{qcow} and @code{qcow2} support compression. The
>>   compression is read-only. It means that if a compressed sector is
>>


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert
  2013-11-25 16:11     ` Peter Lieven
@ 2013-11-25 16:34       ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2013-11-25 16:34 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, qemu-devel, stefanha

Il 25/11/2013 17:11, Peter Lieven ha scritto:
> On 25.11.2013 16:11, Paolo Bonzini wrote:
>> Il 25/11/2013 14:57, Peter Lieven ha scritto:
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> Ok, given this patch I think the cluster_size is the right one to use
>> here---and also the way you used the optimal unmap granularity makes
>> sense; you could also use MAX(optimal unmap granularity, optimal
>> transfer length granularity).
>>
>> However, there is no need to write one cluster at a time.  What matters,
>> I think, is to align the *end* of the transfer, so that the next
>> transfer can start aligned.
>>
>>> +            if (align && cluster_sectors > 0) {
>>> +                int64_t next_aligned_sector = (sector_num +
>>> cluster_sectors);
>> So this should be "+ n", not "+ cluster_sectors".
>>
>> Perhaps it could be conditional on "n > cluster_sectors" (small requests
>> happen when you have sparse region, and breaking them doesn't help).
> 
> would you also agree to n >= cluster_sectors. In my case
> and if especially if n is bound by iobuf_size the case n > cluster_sectors
> will be hard to meet.

Of course.  In fact > alone is wrong ("n > cluster_sectors || n ==
iobuf_size" could be right, but perhaps it's a useless complication).

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert
  2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert Peter Lieven
@ 2013-11-25 17:21   ` Eric Blake
  2013-11-26  7:01     ` Peter Lieven
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Blake @ 2013-11-25 17:21 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel; +Cc: kwolf, pbonzini, stefanha

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

On 11/25/2013 06:57 AM, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  qemu-img.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index efb744c..e2d1a0a 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -105,7 +105,6 @@ static void help(void)
>             "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
>             "       unallocated or zero sectors, and the destination image will always be\n"
>             "       fully allocated\n"
> -           "       images will always be fully allocated\n"

This feels like it is worth including in 1.7 as a documentation bug fix.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert
  2013-11-25 17:21   ` Eric Blake
@ 2013-11-26  7:01     ` Peter Lieven
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Lieven @ 2013-11-26  7:01 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, pbonzini, stefanha

On 25.11.2013 18:21, Eric Blake wrote:
> On 11/25/2013 06:57 AM, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   qemu-img.c |    1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index efb744c..e2d1a0a 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -105,7 +105,6 @@ static void help(void)
>>              "       conversion. If the number of bytes is 0, the source will not be scanned for\n"
>>              "       unallocated or zero sectors, and the destination image will always be\n"
>>              "       fully allocated\n"
>> -           "       images will always be fully allocated\n"
> This feels like it is worth including in 1.7 as a documentation bug fix.
Not needed, the offending patch is in block-next and not merged in 1.7.

Peter
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-11-26  7:00 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-25 13:57 [Qemu-devel] [PATCH 1.8 0/6] qemu-img convert optimizations Peter Lieven
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 1/6] qemu-img: add support for skipping zeroes in input during convert Peter Lieven
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 2/6] qemu-img: fix usage instruction for qemu-img convert Peter Lieven
2013-11-25 17:21   ` Eric Blake
2013-11-26  7:01     ` Peter Lieven
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 3/6] qemu-img: add option to specify alternate iobuffer size Peter Lieven
2013-11-25 14:54   ` Paolo Bonzini
2013-11-25 15:01     ` Peter Lieven
2013-11-25 15:07     ` Peter Lieven
2013-11-25 15:14       ` Paolo Bonzini
2013-11-25 15:24         ` Peter Lieven
2013-11-25 15:48           ` Paolo Bonzini
2013-11-25 15:56             ` Peter Lieven
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 4/6] block/iscsi: set bdi->cluster_size Peter Lieven
2013-11-25 14:51   ` Paolo Bonzini
2013-11-25 15:02   ` Paolo Bonzini
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 5/6] qemu-img: add option to align writes to cluster_sectors during convert Peter Lieven
2013-11-25 15:11   ` Paolo Bonzini
2013-11-25 15:32     ` Peter Lieven
2013-11-25 15:50       ` Paolo Bonzini
2013-11-25 15:55         ` Peter Lieven
2013-11-25 16:02           ` Paolo Bonzini
2013-11-25 16:11     ` Peter Lieven
2013-11-25 16:34       ` Paolo Bonzini
2013-11-25 13:57 ` [Qemu-devel] [PATCH 1.8 6/6] qemu-img: add option to show progress in sectors Peter Lieven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).