[Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4

* [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4
@ 2014-11-10 13:45 Max Reitz
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
                   ` (20 more replies)
  0 siblings, 21 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

This should not need much of a cover letter, but here goes anyway:

As of version 3, the qcow2 file format supports different widths for
refcount entries, ranging from 1 to 64 bits (powers of two only).
Currently, qemu only supports 16 bits, which is the only width supported
by version 2 (compat=0.10) images.

This series adds support to qemu for all other valid refcount orders.
This is mainly done by adding two function pointers into the
BDRVQcowState structure for reading and writing refcount values
independently of the current refcount entry width; all in-memory
refcount arrays (mostly cached refcount blocks) are now void pointers
and are accessed through these functions alone.
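
For illustration only, the accessor idea can be sketched as below. This
is not code from the series: the type and function names (RefcountOps,
refcount_ops_for_order, and so on) are hypothetical, only two widths are
shown, and the sub-byte layout (lowest index in the least significant
bits) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor types; the names used in the series may differ. */
typedef uint64_t (*RefcountGetFn)(const void *array, uint64_t index);
typedef void (*RefcountSetFn)(void *array, uint64_t index, uint64_t value);

typedef struct RefcountOps {
    RefcountGetFn get;
    RefcountSetFn set;
} RefcountOps;

/* 16-bit entries, stored big-endian. */
static uint64_t get_refcount_16(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + index * 2;
    return ((uint64_t)p[0] << 8) | p[1];
}

static void set_refcount_16(void *array, uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)array + index * 2;
    assert(value <= UINT16_MAX);
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* 4-bit entries, two per byte (assumed: even index in the low nibble). */
static uint64_t get_refcount_4(const void *array, uint64_t index)
{
    const uint8_t *bytes = array;
    return (bytes[index / 2] >> (4 * (index % 2))) & 0xf;
}

static void set_refcount_4(void *array, uint64_t index, uint64_t value)
{
    uint8_t *bytes = array;
    unsigned shift = 4 * (unsigned)(index % 2);
    assert(value <= 0xf);
    bytes[index / 2] = (uint8_t)((bytes[index / 2] & ~(0xfu << shift)) |
                                 (value << shift));
}

/* Pick the accessors once, based on the refcount order (width = 1 << order).
 * Returns 0 on success, -1 for orders this sketch does not cover. */
static int refcount_ops_for_order(int order, RefcountOps *ops)
{
    switch (order) {
    case 2:
        ops->get = get_refcount_4;
        ops->set = set_refcount_4;
        return 0;
    case 4:
        ops->get = get_refcount_16;
        ops->set = set_refcount_16;
        return 0;
    default:
        return -1;
    }
}
```

Callers then only ever see a void pointer plus ops.get()/ops.set(), so
the entry width is decided in exactly one place.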

Thanks to previous work on making the qemu code agnostic to e.g. the
number of refcount entries per refcount block, the rest is fairly
trivial. The most complex patch in this series is patch 18, which
implements changing the refcount order through qemu-img amend.

To test different refcount widths, simply invoke the qemu-iotests check
program with -o refcount_width=${your_desired_width}. The final test in
this series adds some tests for operations which do not work with
certain refcount orders and for refcount order amendment.


Max Reitz (21):
  qcow2: Add two new fields to BDRVQcowState
  qcow2: Add refcount_width to format-specific info
  qcow2: Use 64 bits for refcount values
  qcow2: Respect error in qcow2_alloc_bytes()
  qcow2: Refcount overflow and qcow2_alloc_bytes()
  qcow2: Helper function for refcount modification
  qcow2: Helper for refcount array size calculation
  qcow2: More helpers for refcount modification
  qcow2: Open images with refcount order != 4
  qcow2: refcount_order parameter for qcow2_create2
  iotests: Prepare for refcount_width option
  qcow2: Allow creation with refcount order != 4
  block: Add opaque value to the amend CB
  qcow2: Use error_report() in qcow2_amend_options()
  qcow2: Use abort() instead of assert(false)
  qcow2: Split upgrade/downgrade paths for amend
  qcow2: Use intermediate helper CB for amend
  qcow2: Add function for refcount order amendment
  qcow2: Invoke refcount order amendment function
  qcow2: Point to amend function in check
  iotests: Add test for different refcount widths

 block.c                          |   4 +-
 block/qcow2-cluster.c            |  23 +-
 block/qcow2-refcount.c           | 846 ++++++++++++++++++++++++++++++++++-----
 block/qcow2.c                    | 252 +++++++++---
 block/qcow2.h                    |  24 +-
 include/block/block.h            |   4 +-
 include/block/block_int.h        |   4 +-
 qapi/block-core.json             |   5 +-
 qemu-img.c                       |   5 +-
 tests/qemu-iotests/007           |   4 +
 tests/qemu-iotests/015           |   1 +
 tests/qemu-iotests/026           |  11 +
 tests/qemu-iotests/029           |   1 +
 tests/qemu-iotests/049.out       | 112 +++---
 tests/qemu-iotests/051           |   1 +
 tests/qemu-iotests/058           |   1 +
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  14 +-
 tests/qemu-iotests/065           |  23 +-
 tests/qemu-iotests/067           |   7 +
 tests/qemu-iotests/067.out       |  10 +-
 tests/qemu-iotests/079           |   1 +
 tests/qemu-iotests/079.out       |  18 +-
 tests/qemu-iotests/080           |   1 +
 tests/qemu-iotests/082.out       |  48 ++-
 tests/qemu-iotests/085.out       |  38 +-
 tests/qemu-iotests/089           |   7 +
 tests/qemu-iotests/089.out       |   2 +
 tests/qemu-iotests/090           |   1 +
 tests/qemu-iotests/108           |   6 +
 tests/qemu-iotests/112           | 225 +++++++++++
 tests/qemu-iotests/112.out       | 123 ++++++
 tests/qemu-iotests/common.filter |   3 +-
 tests/qemu-iotests/group         |   1 +
 34 files changed, 1546 insertions(+), 281 deletions(-)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

-- 
1.9.3

* [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 19:00   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add two new fields regarding refcount information (the bit width of
every entry and the maximum refcount value) to the BDRVQcowState.
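
As a standalone sketch of how the two values follow from the refcount
order (mirroring the qcow2_open() hunk of this patch, but with
hypothetical helper names that do not exist in qemu):

```c
#include <assert.h>
#include <stdint.h>

/* Width of one refcount entry in bits: 1 << order, for orders 0..6. */
static int refcount_bits_from_order(int order)
{
    assert(order >= 0 && order <= 6);
    return 1 << order;
}

/* Largest refcount value that may be stored for the given order. */
static uint64_t refcount_max_from_order(int order)
{
    int bits = refcount_bits_from_order(order);

    if (bits < 64) {
        return (UINT64_C(1) << bits) - 1;
    }
    /* Shifting by 64 would be undefined; and since refcounts travel in
     * int64_t (negative values mean -errno), cap at INT64_MAX. */
    return INT64_MAX;
}
```

For the default order 4 this gives 16 bits and a maximum of 0xffff,
which is the constant the update_refcount() hunk replaces.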

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 2 +-
 block/qcow2.c          | 9 +++++++++
 block/qcow2.h          | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9afdb40..6016211 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -584,7 +584,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 
         refcount = be16_to_cpu(refcount_block[block_index]);
         refcount += addend;
-        if (refcount < 0 || refcount > 0xffff) {
+        if (refcount < 0 || refcount > s->refcount_max) {
             ret = -EINVAL;
             goto fail;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index d120494..f57aff9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -684,6 +684,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
     s->refcount_order = header.refcount_order;
+    s->refcount_bits = 1 << s->refcount_order;
+    if (s->refcount_order < 6) {
+        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+    } else {
+        /* The above shift would overflow with s->refcount_bits == 64;
+         * furthermore, we do not want to use UINT64_MAX because refcounts will
+         * be passed around in int64_ts (negative values for -errno) */
+        s->refcount_max = INT64_MAX;
+    }
 
     if (header.crypt_method > QCOW_CRYPT_AES) {
         error_setg(errp, "Unsupported encryption method: %" PRIu32,
diff --git a/block/qcow2.h b/block/qcow2.h
index 6e39a1b..4d8c902 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -258,6 +258,8 @@ typedef struct BDRVQcowState {
     int qcow_version;
     bool use_lazy_refcounts;
     int refcount_order;
+    int refcount_bits;
+    uint64_t refcount_max;
 
     bool discard_passthrough[QCOW2_DISCARD_MAX];
 
-- 
1.9.3

* [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 19:06   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add the bit width of every refcount entry to the format-specific
information.

This breaks some test outputs; fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c              |  4 +++-
 qapi/block-core.json       |  5 ++++-
 tests/qemu-iotests/060.out |  1 +
 tests/qemu-iotests/065     | 23 +++++++++++++++--------
 tests/qemu-iotests/067.out | 10 +++++-----
 tests/qemu-iotests/082.out |  7 +++++++
 tests/qemu-iotests/089.out |  2 ++
 7 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f57aff9..d70e927 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2475,7 +2475,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
     };
     if (s->qcow_version == 2) {
         *spec_info->qcow2 = (ImageInfoSpecificQCow2){
-            .compat = g_strdup("0.10"),
+            .compat             = g_strdup("0.10"),
+            .refcount_width     = s->refcount_bits,
         };
     } else if (s->qcow_version == 3) {
         *spec_info->qcow2 = (ImageInfoSpecificQCow2){
@@ -2486,6 +2487,7 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .refcount_width     = s->refcount_bits,
         };
     }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6b4040f..8394c9b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,13 +41,16 @@
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #           compat >= 1.1 (since 2.2)
 #
+# @refcount-width: width of a refcount entry in bits (since 2.3)
+#
 # Since: 1.7
 ##
 { 'type': 'ImageInfoSpecificQCow2',
   'data': {
       'compat': 'str',
       '*lazy-refcounts': 'bool',
-      '*corrupt': 'bool'
+      '*corrupt': 'bool',
+      'refcount-width': 'int'
   } }
 
 ##
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 9419da1..17b3eaf 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -19,6 +19,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: true
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 8d3a9c9..8539aeb 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
 class TestQCow2(TestQemuImgInfo):
     '''Testing a qcow2 version 2 image'''
     img_options = 'compat=0.10'
-    json_compare = { 'compat': '0.10' }
-    human_compare = [ 'compat: 0.10' ]
+    json_compare = { 'compat': '0.10', 'refcount-width': 16 }
+    human_compare = [ 'compat: 0.10', 'refcount width: 16' ]
 
 class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
-    json_compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
-    human_compare = [ 'compat: 1.1', 'lazy refcounts: false', 'corrupt: false' ]
+    json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
+                     'refcount-width': 16, 'corrupt': False }
+    human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
+                      'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
-    json_compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False }
-    human_compare = [ 'compat: 1.1', 'lazy refcounts: true', 'corrupt: false' ]
+    json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
+                     'refcount-width': 16, 'corrupt': False }
+    human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
+                      'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
        with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
-    compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
+    compare = { 'compat': '1.1', 'lazy-refcounts': False,
+                'refcount-width': 16, 'corrupt': False }
+
 
 class TestQCow3LazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled, opening
        with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
-    compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False }
+    compare = { 'compat': '1.1', 'lazy-refcounts': True,
+                'refcount-width': 16, 'corrupt': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 0f72dcf..c04c57e 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -6,7 +6,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virtio-blk-pci,drive=disk,id=virtio0
 QMP_VERSION
 {"return": {}}
-{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
+{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio0/virtio-backend"}}
@@ -24,7 +24,7 @@ QMP_VERSION
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk
 QMP_VERSION
 {"return": {}}
-{"return": [{"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
+{"return": [{"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
 {"return": {}}
 {"return": {}}
 {"return": {}}
@@ -44,7 +44,7 @@ Testing:
 QMP_VERSION
 {"return": {}}
 {"return": "OK\r\n"}
-{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
+{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
 {"return": {}}
 {"return": {}}
 {"return": {}}
@@ -64,14 +64,14 @@ Testing:
 QMP_VERSION
 {"return": {}}
 {"return": {}}
-{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
+{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
 {"return": {}}
 {"return": {}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio0/virtio-backend"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "DEVICE_DELETED", "data": {"device": "virtio0", "path": "/machine/peripheral/virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "RESET"}
-{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"io-status": "ok", "device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
+{"return": [{"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"io-status": "ok", "device": "disk", "locked": false, "removable": true, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "tray_open": false, "type": "unknown"}]}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN"}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "DEVICE_TRAY_MOVED", "data": {"device": "ide1-cd0", "tray-open": true}}
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 0a3ab5a..4b14b4f 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -21,6 +21,7 @@ cluster_size: 4096
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
@@ -35,6 +36,7 @@ cluster_size: 8192
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
@@ -199,6 +201,7 @@ cluster_size: 4096
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -212,6 +215,7 @@ cluster_size: 8192
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -361,6 +365,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
@@ -374,6 +379,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
@@ -387,6 +393,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
index b2b0390..d788b46 100644
--- a/tests/qemu-iotests/089.out
+++ b/tests/qemu-iotests/089.out
@@ -41,6 +41,7 @@ vm state offset: 512 MiB
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 format name: IMGFMT
 cluster size: 64 KiB
@@ -48,5 +49,6 @@ vm state offset: 512 MiB
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 *** done
-- 
1.9.3

* [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 20:59   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Refcounts may have a width of up to 64 bits, so qemu should use the same
width to represent refcount values internally.
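
A minimal sketch of the resulting calling convention, assuming a
hypothetical in-memory table (lookup_refcount and cluster_is_shared are
illustrative names, not qemu functions): the 64-bit return value carries
either the refcount or a negative errno, and only the error code is ever
narrowed to int.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the refcount (>= 0) or a negative errno value. */
static int64_t lookup_refcount(const uint64_t *table, size_t size,
                               size_t index)
{
    if (index >= size) {
        return -EINVAL;
    }
    if (table[index] > (uint64_t)INT64_MAX) {
        /* Not representable alongside negative error codes. */
        return -ERANGE;
    }
    return (int64_t)table[index];
}

/* Returns 0 and sets *shared on success, or a negative errno value. */
static int cluster_is_shared(const uint64_t *table, size_t size,
                             size_t index, bool *shared)
{
    int64_t refcount;

    assert(shared != NULL);
    refcount = lookup_refcount(table, size, index);
    if (refcount < 0) {
        return (int)refcount; /* -errno always fits in an int */
    }
    /* Compare in 64 bits; storing the refcount in an int first could
     * truncate a value such as 0x100000000. */
    *shared = refcount > 1;
    return 0;
}
```

This is the same shape as the ret64 handling in the
expand_zero_clusters_in_l1() hunk below: test the 64-bit value first,
then assign to the int return code only on the error path.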

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c  |  9 ++++++---
 block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
 block/qcow2.h          |  7 ++++---
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index df0b2c9..ab43902 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     for (i = 0; i < l1_size; i++) {
         uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
         bool l2_dirty = false;
-        int l2_refcount;
+        int64_t l2_refcount;
 
         if (!l2_offset) {
             /* unallocated */
@@ -1696,14 +1696,17 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount > 1) {
+                    int64_t ret64;
+
                     /* For shared L2 tables, set the refcount accordingly (it is
                      * already 1 and needs to be l2_refcount) */
-                    ret = qcow2_update_cluster_refcount(bs,
+                    ret64 = qcow2_update_cluster_refcount(bs,
                             offset >> s->cluster_bits, l2_refcount - 1,
                             QCOW2_DISCARD_OTHER);
-                    if (ret < 0) {
+                    if (ret64 < 0) {
                         qcow2_free_clusters(bs, offset, s->cluster_size,
                                             QCOW2_DISCARD_OTHER);
+                        ret = ret64;
                         goto fail;
                     }
                 }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6016211..6e06531 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -91,14 +91,14 @@ static int load_refcount_block(BlockDriverState *bs,
  * return value is the refcount of the cluster, negative values are -errno
  * and indicate an error.
  */
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t refcount_table_index, block_index;
     int64_t refcount_block_offset;
     int ret;
     uint16_t *refcount_block;
-    uint16_t refcount;
+    int64_t refcount;
 
     refcount_table_index = cluster_index >> s->refcount_block_bits;
     if (refcount_table_index >= s->refcount_table_size)
@@ -556,9 +556,10 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     for(cluster_offset = start; cluster_offset <= last;
         cluster_offset += s->cluster_size)
     {
-        int block_index, refcount;
+        int block_index;
         int64_t cluster_index = cluster_offset >> s->cluster_bits;
         int64_t table_index = cluster_index >> s->refcount_block_bits;
+        int64_t refcount;
 
         /* Load the refcount block and allocate it if needed */
         if (table_index != old_table_index) {
@@ -634,10 +635,10 @@ fail:
  * If the return value is non-negative, it is the new refcount of the cluster.
  * If it is negative, it is -errno and indicates an error.
  */
-int qcow2_update_cluster_refcount(BlockDriverState *bs,
-                                  int64_t cluster_index,
-                                  int addend,
-                                  enum qcow2_discard_type type)
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                      int64_t cluster_index,
+                                      int addend,
+                                      enum qcow2_discard_type type)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
@@ -663,7 +664,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t i, nb_clusters;
-    int refcount;
+    int64_t refcount;
 
     nb_clusters = size_to_clusters(s, size);
 retry:
@@ -722,7 +723,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
     BDRVQcowState *s = bs->opaque;
     uint64_t cluster_index;
     uint64_t i;
-    int refcount, ret;
+    int64_t refcount;
+    int ret;
 
     assert(nb_clusters >= 0);
     if (nb_clusters == 0) {
@@ -878,8 +880,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2;
     bool l1_allocated = false;
-    int64_t old_offset, old_l2_offset;
-    int i, j, l1_modified = 0, nb_csectors, refcount;
+    int64_t old_offset, old_l2_offset, refcount;
+    int i, j, l1_modified = 0, nb_csectors;
     int ret;
 
     l2_table = NULL;
@@ -1341,7 +1343,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     uint64_t *l2_table = qemu_blockalign(bs, s->cluster_size);
     int ret;
-    int refcount;
+    int64_t refcount;
     int i, j;
 
     for (i = 0; i < s->l1_size; i++) {
@@ -1360,7 +1362,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
         if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
             fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_index=%d "
-                    "l1_entry=%" PRIx64 " refcount=%d\n",
+                    "l1_entry=%" PRIx64 " refcount=%" PRId64 "\n",
                     fix & BDRV_FIX_ERRORS ? "Repairing" :
                                             "ERROR",
                     i, l1_entry, refcount);
@@ -1403,7 +1405,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                 }
                 if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
                     fprintf(stderr, "%s OFLAG_COPIED data cluster: "
-                            "l2_entry=%" PRIx64 " refcount=%d\n",
+                            "l2_entry=%" PRIx64 " refcount=%" PRId64 "\n",
                             fix & BDRV_FIX_ERRORS ? "Repairing" :
                                                     "ERROR",
                             l2_entry, refcount);
@@ -1628,8 +1630,8 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                               uint16_t *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t i;
-    int refcount1, refcount2, ret;
+    int64_t i, refcount1, refcount2;
+    int ret;
 
     for (i = 0, *highest_cluster = 0; i < nb_clusters; i++) {
         refcount1 = qcow2_get_refcount(bs, i);
@@ -1657,7 +1659,8 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                 num_fixed = &res->corruptions_fixed;
             }
 
-            fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
+            fprintf(stderr, "%s cluster %" PRId64 " refcount=%" PRId64
+                    " reference=%" PRId64 "\n",
                    num_fixed != NULL     ? "Repairing" :
                    refcount1 < refcount2 ? "ERROR" :
                                            "Leaked",
diff --git a/block/qcow2.h b/block/qcow2.h
index 4d8c902..0f8eb15 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -489,10 +489,11 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
 int qcow2_refcount_init(BlockDriverState *bs);
 void qcow2_refcount_close(BlockDriverState *bs);
 
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index);
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index);
 
-int qcow2_update_cluster_refcount(BlockDriverState *bs, int64_t cluster_index,
-                                  int addend, enum qcow2_discard_type type);
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                      int64_t cluster_index, int addend,
+                                      enum qcow2_discard_type type);
 
 int64_t qcow2_alloc_clusters(BlockDriverState *bs, uint64_t size);
 int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
-- 
1.9.3


* [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes()
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (2 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 21:05   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

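qcow2_update_cluster_refcount() may fail, and qcow2_alloc_bytes() should
handle that case: return the error to the caller and, if a new cluster
has just been allocated, free it again first.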
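
For illustration only (this is not code from the patch): a minimal,
self-contained sketch of the error handling described above, assuming a toy
in-memory refcount array. ToyState, toy_update_refcount(),
toy_alloc_cluster() and toy_alloc_and_share() are hypothetical names, not
qemu functions, and the tiny refcount maximum is chosen only to make the
failure easy to trigger.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS  8
#define TOY_REFCOUNT_MAX 3 /* deliberately tiny, hypothetical limit */

/* Toy stand-in for the refcount state of an image (hypothetical) */
typedef struct ToyState {
    uint16_t refcounts[TOY_NB_CLUSTERS];
} ToyState;

/* Adds addend to a refcount; returns the new refcount or -errno */
static int64_t toy_update_refcount(ToyState *s, int64_t index, int addend)
{
    int64_t refcount;

    if (index < 0 || index >= TOY_NB_CLUSTERS) {
        return -EINVAL;
    }
    refcount = (int64_t)s->refcounts[index] + addend;
    if (refcount < 0 || refcount > TOY_REFCOUNT_MAX) {
        return -EINVAL;
    }
    s->refcounts[index] = (uint16_t)refcount;
    return refcount;
}

/* Allocates the first free cluster; returns its index or -ENOSPC */
static int64_t toy_alloc_cluster(ToyState *s)
{
    int64_t i;

    for (i = 0; i < TOY_NB_CLUSTERS; i++) {
        if (s->refcounts[i] == 0) {
            s->refcounts[i] = 1;
            return i;
        }
    }
    return -ENOSPC;
}

/* Allocates a new cluster and takes one more reference on `shared`.
 * If taking the reference fails, the fresh allocation is undone before the
 * error is returned, so nothing is leaked. */
int64_t toy_alloc_and_share(ToyState *s, int64_t shared)
{
    int64_t new_cluster, ret;

    new_cluster = toy_alloc_cluster(s);
    if (new_cluster < 0) {
        return new_cluster;
    }

    ret = toy_update_refcount(s, shared, 1);
    if (ret < 0) {
        s->refcounts[new_cluster] = 0; /* roll back the allocation */
        return ret;
    }

    return new_cluster;
}
```

The point of the sketch is the ordering: the return value of the refcount
update is checked, and the cluster obtained just before is released on the
error path rather than being left allocated.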

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6e06531..be4e5fe 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,7 +761,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t offset, cluster_offset;
+    int64_t offset, cluster_offset, new_cluster;
+    int64_t ret;
     int free_in_cluster;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
@@ -783,23 +784,32 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         free_in_cluster -= size;
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
-        if (offset_into_cluster(s, offset) != 0)
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
+        if (offset_into_cluster(s, offset) != 0) {
+            ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+                                                1, QCOW2_DISCARD_NEVER);
+            if (ret < 0) {
+                return ret;
+            }
+        }
     } else {
-        offset = qcow2_alloc_clusters(bs, s->cluster_size);
-        if (offset < 0) {
-            return offset;
+        new_cluster = qcow2_alloc_clusters(bs, s->cluster_size);
+        if (new_cluster < 0) {
+            return new_cluster;
         }
         cluster_offset = start_of_cluster(s, s->free_byte_offset);
-        if ((cluster_offset + s->cluster_size) == offset) {
+        if ((cluster_offset + s->cluster_size) == new_cluster) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
+            ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+                                                1, QCOW2_DISCARD_NEVER);
+            if (ret < 0) {
+                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+                return ret;
+            }
             s->free_byte_offset += size;
         } else {
-            s->free_byte_offset = offset;
+            s->free_byte_offset = new_cluster;
             goto redo;
         }
     }
-- 
1.9.3


* [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (3 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 21:12   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
the refcount is increased accordingly. However, if increasing it would
overflow the refcount (it already equals s->refcount_max), the function
should not reuse that cluster and should allocate a new one instead.
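
For illustration only (this is not code from the patch): a simplified sketch
of the idea, assuming a toy allocator that packs small allocations into
clusters and counts one reference per allocation. ToyState,
toy_alloc_cluster() and toy_alloc_bytes() are hypothetical names, the maximum
refcount of 2 is deliberately tiny, and the "contiguous new cluster" case of
the real function is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_CLUSTER_SIZE 512
#define TOY_NB_CLUSTERS  8
#define TOY_REFCOUNT_MAX 2 /* deliberately tiny, hypothetical limit */

typedef struct ToyState {
    uint64_t refcounts[TOY_NB_CLUSTERS];
    int64_t free_byte_offset; /* 0 means: no partially used cluster */
    int64_t next_cluster;     /* next never-used cluster; start at 1 */
} ToyState;

/* Hands out a fresh cluster with refcount 1; returns its offset or -ENOSPC */
static int64_t toy_alloc_cluster(ToyState *s)
{
    if (s->next_cluster >= TOY_NB_CLUSTERS) {
        return -ENOSPC;
    }
    s->refcounts[s->next_cluster] = 1;
    return s->next_cluster++ * TOY_CLUSTER_SIZE;
}

/* Returns the offset of `size` bytes or -errno */
int64_t toy_alloc_bytes(ToyState *s, int size)
{
    int64_t offset;

    assert(size > 0 && size <= TOY_CLUSTER_SIZE);

    if (s->free_byte_offset != 0) {
        int64_t index = s->free_byte_offset / TOY_CLUSTER_SIZE;
        int64_t free_in_cluster =
            TOY_CLUSTER_SIZE - s->free_byte_offset % TOY_CLUSTER_SIZE;

        /* Reuse the current cluster only if the data fits AND its refcount
         * can still be increased without overflowing */
        if (size <= free_in_cluster &&
            s->refcounts[index] < TOY_REFCOUNT_MAX)
        {
            s->refcounts[index]++;
            offset = s->free_byte_offset;
            s->free_byte_offset += size;
            if (size == free_in_cluster) {
                s->free_byte_offset = 0;
            }
            return offset;
        }
    }

    /* Either there is no current cluster, or it cannot be reused */
    offset = toy_alloc_cluster(s);
    if (offset < 0) {
        return offset;
    }
    s->free_byte_offset = size < TOY_CLUSTER_SIZE ? offset + size : 0;
    return offset;
}
```

With a maximum of 2, the third small allocation in this sketch lands in a
fresh cluster even though the previous cluster still has room.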

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index be4e5fe..66c78c0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,12 +761,13 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t offset, cluster_offset, new_cluster;
+    int64_t offset, cluster_offset, new_cluster, refcount;
     int64_t ret;
     int free_in_cluster;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
     assert(size > 0 && size <= s->cluster_size);
+ redo:
     if (s->free_byte_offset == 0) {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -774,12 +775,25 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         }
         s->free_byte_offset = offset;
     }
- redo:
+
     free_in_cluster = s->cluster_size -
         offset_into_cluster(s, s->free_byte_offset);
     if (size <= free_in_cluster) {
         /* enough space in current cluster */
         offset = s->free_byte_offset;
+
+        if (offset_into_cluster(s, offset) != 0) {
+            /* We will have to increase the refcount of this cluster; if the
+             * maximum has been reached already, this cluster cannot be used */
+            refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+            if (refcount < 0) {
+                return refcount;
+            } else if (refcount == s->refcount_max) {
+                s->free_byte_offset = 0;
+                goto redo;
+            }
+        }
+
         s->free_byte_offset += size;
         free_in_cluster -= size;
         if (free_in_cluster == 0)
@@ -800,6 +814,20 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == new_cluster) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
+
+            /* Same as above: In order to reuse the cluster, the refcount has to
+             * be increased; if that will not work, we are not so lucky after
+             * all */
+            refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+            if (refcount < 0) {
+                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+                return refcount;
+            } else if (refcount == s->refcount_max) {
+                s->free_byte_offset = new_cluster;
+                goto redo;
+            }
+
             ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
                                                 1, QCOW2_DISCARD_NEVER);
             if (ret < 0) {
-- 
1.9.3


* [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (4 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 22:30   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Since refcounts do not always have to be a uint16_t, refcount blocks and
arrays in memory should not have a specific type; they become pointers
to void and are accessed through two helper functions, a getter and a
setter. Those helpers are called indirectly through function pointers in
BDRVQcowState so that they can later be exchanged for other refcount
orders.
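
For illustration only (this is not code from the patch): a standalone sketch
of width-agnostic access through function pointers. All toy_* names and
ToyState are hypothetical. The 8-bit pair is an assumption added here only to
show the accessors being exchanged; this patch itself provides just the
16-bit pair. Entries are stored big-endian byte by byte so the sketch does
not depend on host endianness or on qemu's byte-swap helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ToyGetRefcountFunc(const void *refcount_array,
                                    uint64_t index);
typedef void ToySetRefcountFunc(void *refcount_array,
                                uint64_t index, uint64_t value);

typedef struct ToyState {
    int refcount_bits;
    ToyGetRefcountFunc *get_refcount;
    ToySetRefcountFunc *set_refcount;
} ToyState;

/* 16-bit entries, stored big-endian */
static uint64_t toy_get_refcount_16(const void *refcount_array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)refcount_array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

static void toy_set_refcount_16(void *refcount_array, uint64_t index,
                                uint64_t value)
{
    uint8_t *p = (uint8_t *)refcount_array + 2 * index;
    assert(!(value >> 16));
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* 8-bit entries (hypothetical second width) */
static uint64_t toy_get_refcount_8(const void *refcount_array, uint64_t index)
{
    return ((const uint8_t *)refcount_array)[index];
}

static void toy_set_refcount_8(void *refcount_array, uint64_t index,
                               uint64_t value)
{
    assert(!(value >> 8));
    ((uint8_t *)refcount_array)[index] = (uint8_t)value;
}

/* Picks the accessor pair matching the entry width */
void toy_select_accessors(ToyState *s, int refcount_bits)
{
    assert(refcount_bits == 8 || refcount_bits == 16);
    s->refcount_bits = refcount_bits;
    s->get_refcount = refcount_bits == 16 ? &toy_get_refcount_16
                                          : &toy_get_refcount_8;
    s->set_refcount = refcount_bits == 16 ? &toy_set_refcount_16
                                          : &toy_set_refcount_8;
}

/* A caller that never needs to know the entry width */
uint64_t toy_inc_refcount(const ToyState *s, void *refcount_array,
                          uint64_t index)
{
    uint64_t refcount = s->get_refcount(refcount_array, index) + 1;
    s->set_refcount(refcount_array, index, refcount);
    return refcount;
}
```

Callers such as toy_inc_refcount() only see a void pointer and an index, so
supporting another width means adding one more accessor pair and selecting it
at initialization.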

At the same time, replace all sizeof(**refcount_table) etc. in the qcow2
check code by s->refcount_bits / 8. Note that this might lead to wrong
values due to truncating division, but currently s->refcount_bits is
always 16, and before the upcoming patch which removes this limitation
another patch will make the division round up correctly.
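
For illustration only (this is not code from the series): a small sketch of
why the truncating division matters once entries are narrower than a byte,
and what a rounding-up size calculation could look like. Both function names
are hypothetical; the helper actually introduced later in the series may look
different.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes as computed by "entries * refcount_bits / 8":
 * a trailing partial byte is silently dropped */
uint64_t toy_refcount_array_size_truncating(uint64_t entries,
                                            int refcount_bits)
{
    return entries * refcount_bits / 8;
}

/* Size in bytes with the division rounded up, so that a trailing partial
 * byte is still covered by the allocation */
uint64_t toy_refcount_array_size(uint64_t entries, int refcount_bits)
{
    assert(refcount_bits >= 1 && refcount_bits <= 64);
    return (entries * refcount_bits + 7) / 8;
}
```

For 16-bit entries both functions agree (10 entries need 20 bytes). For three
4-bit entries, however, the truncating form yields 1 byte although 12 bits
need 2 bytes.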

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++--------------------
 block/qcow2.h          |   8 +++
 2 files changed, 98 insertions(+), 62 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 66c78c0..16652da 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,6 +32,11 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length,
                             int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+                             uint64_t value);
+
 
 /*********************************************************/
 /* refcount handling */
@@ -42,6 +47,9 @@ int qcow2_refcount_init(BlockDriverState *bs)
     unsigned int refcount_table_size2, i;
     int ret;
 
+    s->get_refcount = &get_refcount_ro4;
+    s->set_refcount = &set_refcount_ro4;
+
     assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
     refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
     s->refcount_table = g_try_malloc(refcount_table_size2);
@@ -72,6 +80,19 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index)
+{
+    return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 16));
+    ((uint16_t *)refcount_array)[index] = cpu_to_be16(value);
+}
+
+
 static int load_refcount_block(BlockDriverState *bs,
                                int64_t refcount_block_offset,
                                void **refcount_block)
@@ -97,7 +118,7 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
     uint64_t refcount_table_index, block_index;
     int64_t refcount_block_offset;
     int ret;
-    uint16_t *refcount_block;
+    void *refcount_block;
     int64_t refcount;
 
     refcount_table_index = cluster_index >> s->refcount_block_bits;
@@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
     }
 
     ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        (void**) &refcount_block);
+                          &refcount_block);
     if (ret < 0) {
         return ret;
     }
 
     block_index = cluster_index & (s->refcount_block_size - 1);
-    refcount = be16_to_cpu(refcount_block[block_index]);
+    refcount = s->get_refcount(refcount_block, block_index);
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache,
-        (void**) &refcount_block);
+    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
     if (ret < 0) {
         return ret;
     }
 
+    if (refcount < 0) {
+        /* overflow */
+        return -ERANGE;
+    }
+
     return refcount;
 }
 
@@ -169,7 +194,7 @@ static int in_same_refcount_block(BDRVQcowState *s, uint64_t offset_a,
  * Returns 0 on success or -errno in error case
  */
 static int alloc_refcount_block(BlockDriverState *bs,
-    int64_t cluster_index, uint16_t **refcount_block)
+                                int64_t cluster_index, void **refcount_block)
 {
     BDRVQcowState *s = bs->opaque;
     unsigned int refcount_table_index;
@@ -196,7 +221,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
             }
 
              return load_refcount_block(bs, refcount_block_offset,
-                 (void**) refcount_block);
+                                        refcount_block);
         }
     }
 
@@ -256,7 +281,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         /* The block describes itself, need to update the cache */
         int block_index = (new_block >> s->cluster_bits) &
             (s->refcount_block_size - 1);
-        (*refcount_block)[block_index] = cpu_to_be16(1);
+        s->set_refcount(*refcount_block, block_index, 1);
     } else {
         /* Described somewhere else. This can recurse at most twice before we
          * arrive at a block that describes itself. */
@@ -274,7 +299,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         /* Initialize the new refcount block only after updating its refcount,
          * update_refcount uses the refcount cache itself */
         ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+                                    refcount_block);
         if (ret < 0) {
             goto fail_block;
         }
@@ -308,7 +333,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         return -EAGAIN;
     }
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+    ret = qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
     if (ret < 0) {
         goto fail_block;
     }
@@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         s->cluster_size;
     uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
     uint64_t *new_table = g_try_new0(uint64_t, table_size);
-    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
+    void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
 
     assert(table_size > 0 && blocks_clusters > 0);
     if (new_table == NULL || new_blocks == NULL) {
@@ -384,7 +409,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     uint64_t table_clusters = size_to_clusters(s, table_size * sizeof(uint64_t));
     int block = 0;
     for (i = 0; i < table_clusters + blocks_clusters; i++) {
-        new_blocks[block++] = cpu_to_be16(1);
+        s->set_refcount(new_blocks, block++, 1);
     }
 
     /* Write refcount blocks to disk */
@@ -437,7 +462,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t),
                         QCOW2_DISCARD_OTHER);
 
-    ret = load_refcount_block(bs, new_block, (void**) refcount_block);
+    ret = load_refcount_block(bs, new_block, refcount_block);
     if (ret < 0) {
         return ret;
     }
@@ -452,7 +477,7 @@ fail_table:
     g_free(new_table);
 fail_block:
     if (*refcount_block != NULL) {
-        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+        qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
     }
     return ret;
 }
@@ -532,7 +557,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     int64_t start, last, cluster_offset;
-    uint16_t *refcount_block = NULL;
+    void *refcount_block = NULL;
     int64_t old_table_index = -1;
     int ret;
 
@@ -583,7 +608,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         /* we can update the count and save it */
         block_index = cluster_index & (s->refcount_block_size - 1);
 
-        refcount = be16_to_cpu(refcount_block[block_index]);
+        refcount = s->get_refcount(refcount_block, block_index);
         refcount += addend;
         if (refcount < 0 || refcount > s->refcount_max) {
             ret = -EINVAL;
@@ -592,7 +617,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
-        refcount_block[block_index] = cpu_to_be16(refcount);
+        s->set_refcount(refcount_block, block_index, refcount);
 
         if (refcount == 0 && s->discard_passthrough[type]) {
             update_refcount_discard(bs, cluster_offset, s->cluster_size);
@@ -608,8 +633,7 @@ fail:
     /* Write last changed block to disk */
     if (refcount_block) {
         int wret;
-        wret = qcow2_cache_put(bs, s->refcount_block_cache,
-            (void**) &refcount_block);
+        wret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
         if (wret < 0) {
             return ret < 0 ? ret : wret;
         }
@@ -1118,12 +1142,13 @@ fail:
  */
 static int inc_refcounts(BlockDriverState *bs,
                          BdrvCheckResult *res,
-                         uint16_t **refcount_table,
+                         void **refcount_table,
                          int64_t *refcount_table_size,
                          int64_t offset, int64_t size)
 {
     BDRVQcowState *s = bs->opaque;
-    uint64_t start, last, cluster_offset, k;
+    uint64_t start, last, cluster_offset, k, refcount;
+    int64_t i;
 
     if (size <= 0) {
         return 0;
@@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
         k = cluster_offset >> s->cluster_bits;
         if (k >= *refcount_table_size) {
             int64_t old_refcount_table_size = *refcount_table_size;
-            uint16_t *new_refcount_table;
+            void *new_refcount_table;
 
             *refcount_table_size = k + 1;
             new_refcount_table = g_try_realloc(*refcount_table,
                                                *refcount_table_size *
-                                               sizeof(**refcount_table));
+                                               s->refcount_bits / 8);
             if (!new_refcount_table) {
                 *refcount_table_size = old_refcount_table_size;
                 res->check_errors++;
@@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
             }
             *refcount_table = new_refcount_table;
 
-            memset(*refcount_table + old_refcount_table_size, 0,
-                   (*refcount_table_size - old_refcount_table_size) *
-                   sizeof(**refcount_table));
+            for (i = old_refcount_table_size; i < *refcount_table_size; i++) {
+                s->set_refcount(*refcount_table, i, 0);
+            }
         }
 
-        if (++(*refcount_table)[k] == 0) {
+        refcount = s->get_refcount(*refcount_table, k);
+        if (refcount == s->refcount_max) {
             fprintf(stderr, "ERROR: overflow cluster offset=0x%" PRIx64
                     "\n", cluster_offset);
             res->corruptions++;
+            continue;
         }
+        s->set_refcount(*refcount_table, k, refcount + 1);
     }
 
     return 0;
@@ -1178,7 +1206,7 @@ enum {
  * error occurred.
  */
 static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
-    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
+    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
     int flags)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1296,7 +1324,7 @@ fail:
  */
 static int check_refcounts_l1(BlockDriverState *bs,
                               BdrvCheckResult *res,
-                              uint16_t **refcount_table,
+                              void **refcount_table,
                               int64_t *refcount_table_size,
                               int64_t l1_table_offset, int l1_size,
                               int flags)
@@ -1493,10 +1521,10 @@ fail:
  */
 static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                            BdrvCheckMode fix, bool *rebuild,
-                           uint16_t **refcount_table, int64_t *nb_clusters)
+                           void **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t i, size;
+    int64_t i, j, size;
     int ret;
 
     for(i = 0; i < s->refcount_table_size; i++) {
@@ -1519,7 +1547,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 
             if (fix & BDRV_FIX_ERRORS) {
                 int64_t old_nb_clusters = *nb_clusters;
-                uint16_t *new_refcount_table;
+                void *new_refcount_table;
 
                 if (offset > INT64_MAX - s->cluster_size) {
                     ret = -EINVAL;
@@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 
                 new_refcount_table = g_try_realloc(*refcount_table,
                                                    *nb_clusters *
-                                                   sizeof(**refcount_table));
+                                                   s->refcount_bits / 8);
                 if (!new_refcount_table) {
                     *nb_clusters = old_nb_clusters;
                     res->check_errors++;
@@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                 }
                 *refcount_table = new_refcount_table;
 
-                memset(*refcount_table + old_nb_clusters, 0,
-                       (*nb_clusters - old_nb_clusters) *
-                       sizeof(**refcount_table));
+                for (j = old_nb_clusters; j < *nb_clusters; j++) {
+                    s->set_refcount(*refcount_table, j, 0);
+                }
 
                 if (cluster >= *nb_clusters) {
                     ret = -EINVAL;
@@ -1586,9 +1614,10 @@ resize_fail:
             if (ret < 0) {
                 return ret;
             }
-            if ((*refcount_table)[cluster] != 1) {
+            if (s->get_refcount(*refcount_table, cluster) != 1) {
                 fprintf(stderr, "ERROR refcount block %" PRId64
-                        " refcount=%d\n", i, (*refcount_table)[cluster]);
+                        " refcount=%" PRIu64 "\n", i,
+                        s->get_refcount(*refcount_table, cluster));
                 res->corruptions++;
                 *rebuild = true;
             }
@@ -1603,7 +1632,7 @@ resize_fail:
  */
 static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                                BdrvCheckMode fix, bool *rebuild,
-                               uint16_t **refcount_table, int64_t *nb_clusters)
+                               void **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i;
@@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     int ret;
 
     if (!*refcount_table) {
-        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);
         if (*nb_clusters && *refcount_table == NULL) {
             res->check_errors++;
             return -ENOMEM;
@@ -1665,7 +1694,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                               BdrvCheckMode fix, bool *rebuild,
                               int64_t *highest_cluster,
-                              uint16_t *refcount_table, int64_t nb_clusters)
+                              void *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i, refcount1, refcount2;
@@ -1680,7 +1709,7 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             continue;
         }
 
-        refcount2 = refcount_table[i];
+        refcount2 = s->get_refcount(refcount_table, i);
 
         if (refcount1 > 0 || refcount2 > 0) {
             *highest_cluster = i;
@@ -1738,7 +1767,7 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
  */
 static int64_t alloc_clusters_imrt(BlockDriverState *bs,
                                    int cluster_count,
-                                   uint16_t **refcount_table,
+                                   void **refcount_table,
                                    int64_t *imrt_nb_clusters,
                                    int64_t *first_free_cluster)
 {
@@ -1754,7 +1783,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
          contiguous_free_clusters < cluster_count;
          cluster++)
     {
-        if (!(*refcount_table)[cluster]) {
+        if (!s->get_refcount(*refcount_table, cluster)) {
             contiguous_free_clusters++;
             if (first_gap) {
                 /* If this is the first free cluster found, update
@@ -1776,7 +1805,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
      * accordingly to append free clusters at the end of the image */
     if (contiguous_free_clusters < cluster_count) {
         int64_t old_imrt_nb_clusters = *imrt_nb_clusters;
-        uint16_t *new_refcount_table;
+        void *new_refcount_table;
 
         /* contiguous_free_clusters clusters are already empty at the image end;
          * we need cluster_count clusters; therefore, we have to allocate
@@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
         *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
         new_refcount_table = g_try_realloc(*refcount_table,
                                            *imrt_nb_clusters *
-                                           sizeof(**refcount_table));
+                                           s->refcount_bits / 8);
         if (!new_refcount_table) {
             *imrt_nb_clusters = old_imrt_nb_clusters;
             return -ENOMEM;
         }
         *refcount_table = new_refcount_table;
 
-        memset(*refcount_table + old_imrt_nb_clusters, 0,
-               (*imrt_nb_clusters - old_imrt_nb_clusters) *
-               sizeof(**refcount_table));
+        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
+            s->set_refcount(*refcount_table, i, 0);
+        }
     }
 
     /* Go back to the first free cluster */
     cluster -= contiguous_free_clusters;
     for (i = 0; i < cluster_count; i++) {
-        (*refcount_table)[cluster + i] = 1;
+        s->set_refcount(*refcount_table, cluster + i, 1);
     }
 
     return cluster << s->cluster_bits;
@@ -1818,7 +1847,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
  */
 static int rebuild_refcount_structure(BlockDriverState *bs,
                                       BdrvCheckResult *res,
-                                      uint16_t **refcount_table,
+                                      void **refcount_table,
                                       int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1826,8 +1855,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
     int64_t refblock_offset, refblock_start, refblock_index;
     uint32_t reftable_size = 0;
     uint64_t *on_disk_reftable = NULL;
-    uint16_t *on_disk_refblock;
-    int i, ret = 0;
+    void *on_disk_refblock;
+    int ret = 0;
     struct {
         uint64_t reftable_offset;
         uint32_t reftable_clusters;
@@ -1837,7 +1866,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
 
 write_refblocks:
     for (; cluster < *nb_clusters; cluster++) {
-        if (!(*refcount_table)[cluster]) {
+        if (!s->get_refcount(*refcount_table, cluster)) {
             continue;
         }
 
@@ -1911,12 +1940,11 @@ write_refblocks:
         }
 
         on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
-        for (i = 0; i < s->refcount_block_size &&
-                    refblock_start + i < *nb_clusters; i++)
-        {
-            on_disk_refblock[i] =
-                cpu_to_be16((*refcount_table)[refblock_start + i]);
-        }
+
+        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
+                                 refblock_index * s->cluster_size),
+               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
+               * s->refcount_bits / 8);
 
         ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
                          (void *)on_disk_refblock, s->cluster_sectors);
@@ -2015,7 +2043,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     BdrvCheckResult pre_compare_res;
     int64_t size, highest_cluster, nb_clusters;
-    uint16_t *refcount_table = NULL;
+    void *refcount_table = NULL;
     bool rebuild = false;
     int ret;
 
@@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         /* Because the old reftable has been exchanged for a new one the
          * references have to be recalculated */
         rebuild = false;
-        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
+        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
         ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
                                   &nb_clusters);
         if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 0f8eb15..1c63221 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
     QTAILQ_ENTRY(Qcow2DiscardRegion) next;
 } Qcow2DiscardRegion;
 
+typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
+                                      uint64_t index);
+typedef void Qcow2SetRefcountFunc(void *refcount_array,
+                                  uint64_t index, uint64_t value);
+
 typedef struct BDRVQcowState {
     int cluster_bits;
     int cluster_size;
@@ -261,6 +266,9 @@ typedef struct BDRVQcowState {
     int refcount_bits;
     uint64_t refcount_max;
 
+    Qcow2GetRefcountFunc *get_refcount;
+    Qcow2SetRefcountFunc *set_refcount;
+
     bool discard_passthrough[QCOW2_DISCARD_MAX];
 
     int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
-- 
1.9.3

* [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (5 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 22:49   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a helper function which correctly calculates the byte size of a
refcount array for any refcount order, and use that function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
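For illustration only (not part of the patch): a minimal standalone
sketch of the rounding the helper performs, next to the expression it
replaces. It takes the refcount order directly instead of a
BDRVQcowState; the sketch_/naive_ function names are made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold `entries` refcounts of (1 << refcount_order) bits,
 * rounding up to a whole byte for sub-byte widths. Hypothetical helper
 * mirroring the logic of refcount_array_byte_size(). */
static size_t sketch_refcount_array_bytes(int refcount_order, uint64_t entries)
{
    assert(refcount_order >= 0 && refcount_order <= 6);

    if (refcount_order < 3) {
        /* 2^shift entries share one byte; round up so that a partially
         * filled last byte is still counted */
        int shift = 3 - refcount_order;
        return (entries + (UINT64_C(1) << shift) - 1) >> shift;
    }
    /* one or more whole bytes per entry */
    return entries << (refcount_order - 3);
}

/* The expression being replaced: truncates for sub-byte widths */
static size_t naive_refcount_array_bytes(int refcount_bits, uint64_t entries)
{
    return entries * refcount_bits / 8;
}
```

For nine 1-bit entries the old expression yields 1 byte (9 * 1 / 8),
one byte short of the 2 bytes the rounded-up calculation returns.
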
 block/qcow2-refcount.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 16652da..cfb4807 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1132,6 +1132,20 @@ fail:
 /* refcount checking functions */
 
 
+static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
+{
+    if (s->refcount_order < 3) {
+        /* sub-byte width */
+        int shift = 3 - s->refcount_order;
+        return (entries + (1 << shift) - 1) >> shift;
+    } else if (s->refcount_order == 3) {
+        /* byte width */
+        return entries;
+    } else {
+        /* multiple bytes wide */
+        return entries << (s->refcount_order - 3);
+    }
+}
 
 /*
  * Increases the refcount for a range of clusters in a given refcount table.
@@ -1161,12 +1175,13 @@ static int inc_refcounts(BlockDriverState *bs,
         k = cluster_offset >> s->cluster_bits;
         if (k >= *refcount_table_size) {
             int64_t old_refcount_table_size = *refcount_table_size;
+            size_t new_byte_size;
             void *new_refcount_table;
 
             *refcount_table_size = k + 1;
-            new_refcount_table = g_try_realloc(*refcount_table,
-                                               *refcount_table_size *
-                                               s->refcount_bits / 8);
+            new_byte_size = refcount_array_byte_size(s, *refcount_table_size);
+
+            new_refcount_table = g_try_realloc(*refcount_table, new_byte_size);
             if (!new_refcount_table) {
                 *refcount_table_size = old_refcount_table_size;
                 res->check_errors++;
@@ -1547,6 +1562,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 
             if (fix & BDRV_FIX_ERRORS) {
                 int64_t old_nb_clusters = *nb_clusters;
+                size_t new_byte_size;
                 void *new_refcount_table;
 
                 if (offset > INT64_MAX - s->cluster_size) {
@@ -1567,9 +1583,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                 *nb_clusters = size_to_clusters(s, size);
                 assert(*nb_clusters >= old_nb_clusters);
 
+                new_byte_size = refcount_array_byte_size(s, *nb_clusters);
                 new_refcount_table = g_try_realloc(*refcount_table,
-                                                   *nb_clusters *
-                                                   s->refcount_bits / 8);
+                                                   new_byte_size);
                 if (!new_refcount_table) {
                     *nb_clusters = old_nb_clusters;
                     res->check_errors++;
@@ -1640,7 +1656,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     int ret;
 
     if (!*refcount_table) {
-        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);
+        size_t byte_size = refcount_array_byte_size(s, *nb_clusters);
+        *refcount_table = g_try_malloc0(byte_size);
         if (*nb_clusters && *refcount_table == NULL) {
             res->check_errors++;
             return -ENOMEM;
@@ -1805,6 +1822,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
      * accordingly to append free clusters at the end of the image */
     if (contiguous_free_clusters < cluster_count) {
         int64_t old_imrt_nb_clusters = *imrt_nb_clusters;
+        size_t new_byte_size;
         void *new_refcount_table;
 
         /* contiguous_free_clusters clusters are already empty at the image end;
@@ -1814,9 +1832,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
          * may exceed old_imrt_nb_clusters if *first_free_cluster pointed beyond
          * the image end) */
         *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
-        new_refcount_table = g_try_realloc(*refcount_table,
-                                           *imrt_nb_clusters *
-                                           s->refcount_bits / 8);
+        new_byte_size = refcount_array_byte_size(s, *imrt_nb_clusters);
+        new_refcount_table = g_try_realloc(*refcount_table, new_byte_size);
         if (!new_refcount_table) {
             *imrt_nb_clusters = old_imrt_nb_clusters;
             return -ENOMEM;
@@ -1943,8 +1960,8 @@ write_refblocks:
 
         memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
                                  (refblock_index << s->refcount_block_bits)),
-               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
-               * s->refcount_bits / 8);
+               refcount_array_byte_size(s, MIN(s->refcount_block_size,
+                                               *nb_clusters - refblock_start)));
 
         ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
                          (void *)on_disk_refblock, s->cluster_sectors);
-- 
1.9.3

* [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (6 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11  0:29   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add helper functions for getting and setting refcounts in a refcount
array for any possible refcount order, and choose the correct one during
refcount initialization.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
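For illustration only (not part of the patch): the three sub-byte
accessor pairs share one packing scheme, with entries stored from the
least significant bits of each byte upwards. The sketch below folds them
into a single pair parameterized by the order; the sketch_ names are
hypothetical, and the patch itself keeps one function per order.

```c
#include <assert.h>
#include <stdint.h>

/* Read entry `index` from an array of (1 << order)-bit refcounts,
 * for order 0, 1 or 2 (1, 2 or 4 bits per entry). */
static uint64_t sketch_get_subbyte(const void *array, int order,
                                   uint64_t index)
{
    assert(order >= 0 && order < 3);

    unsigned bits = 1u << order;                 /* entry width in bits */
    unsigned per_byte = 8 / bits;                /* entries per byte */
    unsigned shift = bits * (unsigned)(index % per_byte);
    uint8_t mask = (uint8_t)((1u << bits) - 1);

    return (((const uint8_t *)array)[index / per_byte] >> shift) & mask;
}

/* Store `value` as entry `index`, leaving neighbouring entries in the
 * same byte untouched. */
static void sketch_set_subbyte(void *array, int order, uint64_t index,
                               uint64_t value)
{
    assert(order >= 0 && order < 3);

    unsigned bits = 1u << order;
    unsigned per_byte = 8 / bits;
    unsigned shift = bits * (unsigned)(index % per_byte);
    uint8_t mask = (uint8_t)((1u << bits) - 1);
    uint8_t *byte = &((uint8_t *)array)[index / per_byte];

    assert(value <= mask);                       /* must fit the width */
    *byte = (uint8_t)((*byte & ~(mask << shift)) | (value << shift));
}
```

With 2-bit entries, storing 3 at index 0 and 2 at index 1 leaves the
first byte as 0x0b: both entries share that byte and occupy bits 0-1 and
2-3 respectively.
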
 block/qcow2-refcount.c | 143 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 141 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index cfb4807..08b2ddb 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,10 +32,73 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length,
                             int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index);
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index);
 
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+                             uint64_t value);
 static void set_refcount_ro4(void *refcount_array, uint64_t index,
                              uint64_t value);
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+                             uint64_t value);
+
+static void get_refcount_functions(int refcount_order,
+                                   Qcow2GetRefcountFunc **get,
+                                   Qcow2SetRefcountFunc **set)
+{
+    switch (refcount_order) {
+        case 0:
+            *get = &get_refcount_ro0;
+            *set = &set_refcount_ro0;
+            break;
+
+        case 1:
+            *get = &get_refcount_ro1;
+            *set = &set_refcount_ro1;
+            break;
+
+        case 2:
+            *get = &get_refcount_ro2;
+            *set = &set_refcount_ro2;
+            break;
+
+        case 3:
+            *get = &get_refcount_ro3;
+            *set = &set_refcount_ro3;
+            break;
+
+        case 4:
+            *get = &get_refcount_ro4;
+            *set = &set_refcount_ro4;
+            break;
+
+        case 5:
+            *get = &get_refcount_ro5;
+            *set = &set_refcount_ro5;
+            break;
+
+        case 6:
+            *get = &get_refcount_ro6;
+            *set = &set_refcount_ro6;
+            break;
+
+        default:
+            abort();
+    }
+}
 
 
 /*********************************************************/
@@ -47,8 +110,8 @@ int qcow2_refcount_init(BlockDriverState *bs)
     unsigned int refcount_table_size2, i;
     int ret;
 
-    s->get_refcount = &get_refcount_ro4;
-    s->set_refcount = &set_refcount_ro4;
+    get_refcount_functions(s->refcount_order,
+                           &s->get_refcount, &s->set_refcount);
 
     assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
     refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
@@ -80,6 +143,59 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 8] >> (index % 8)) & 0x1;
+}
+
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 1));
+    ((uint8_t *)refcount_array)[index / 8] &= ~(0x1 << (index % 8));
+    ((uint8_t *)refcount_array)[index / 8] |= value << (index % 8);
+}
+
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 4] >> (2 * (index % 4)))
+           & 0x3;
+}
+
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 2));
+    ((uint8_t *)refcount_array)[index / 4] &= ~(0x3 << (2 * (index % 4)));
+    ((uint8_t *)refcount_array)[index / 4] |= value << (2 * (index % 4));
+}
+
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 2] >> (4 * (index % 2)))
+           & 0xf;
+}
+
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 4));
+    ((uint8_t *)refcount_array)[index / 2] &= ~(0xf << (4 * (index % 2)));
+    ((uint8_t *)refcount_array)[index / 2] |= value << (4 * (index % 2));
+}
+
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index)
+{
+    return ((const uint8_t *)refcount_array)[index];
+}
+
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 8));
+    ((uint8_t *)refcount_array)[index] = value;
+}
+
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index)
 {
     return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
@@ -92,6 +208,29 @@ static void set_refcount_ro4(void *refcount_array, uint64_t index,
     ((uint16_t *)refcount_array)[index] = cpu_to_be16(value);
 }
 
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index)
+{
+    return be32_to_cpu(((const uint32_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 32));
+    ((uint32_t *)refcount_array)[index] = cpu_to_be32(value);
+}
+
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index)
+{
+    return be64_to_cpu(((const uint64_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    ((uint64_t *)refcount_array)[index] = cpu_to_be64(value);
+}
+
 
 static int load_refcount_block(BlockDriverState *bs,
                                int64_t refcount_block_offset,
-- 
1.9.3

* [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (7 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-10 17:03   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

No longer refuse to open images with a refcount entry width other than
16 bits; only reject images with a refcount width larger than 64 bits
(which is prohibited by the specification).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
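For illustration only (not part of the patch): a sketch of why order 6
is the upper bound. The width is 1 << refcount_order bits, so order 6
already fills a 64-bit value. The function name and the way the maximum
is built are assumptions made for this example, not code from the
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Derive the entry width in bits and the largest storable refcount from
 * a header's refcount_order. Returns -EINVAL for orders whose width
 * would exceed 64 bits. Hypothetical helper. */
static int sketch_refcount_limits(uint32_t refcount_order,
                                  unsigned *width_bits, uint64_t *max_value)
{
    if (refcount_order > 6) {
        return -EINVAL;
    }

    *width_bits = 1u << refcount_order;
    /* Build 2^width - 1 in two steps; shifting a 64-bit value by 64
     * would be undefined behaviour for order 6 */
    *max_value = UINT64_C(1) << (*width_bits - 1);
    *max_value += *max_value - 1;
    return 0;
}
```

Order 4 gives the familiar 16-bit entries with a maximum of 65535;
order 0 gives 1-bit entries that can only record "used" or "free".
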
 block/qcow2.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d70e927..b718e75 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Check support for various header values */
-    if (header.refcount_order != 4) {
-        report_unsupported(bs, errp, "%d bit reference counts",
-                           1 << header.refcount_order);
-        ret = -ENOTSUP;
+    if (header.refcount_order > 6) {
+        error_setg(errp, "Reference count entry width too large (%i bit); may "
+                   "not exceed 64 bit", 1 << header.refcount_order);
+        ret = -EINVAL;
         goto fail;
     }
     s->refcount_order = header.refcount_order;
-- 
1.9.3

* [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (8 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11  5:40   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a refcount_order parameter to qcow2_create2(), and use that value
for the image header and for calculating the size required for
preallocation.

For now, always pass 4.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
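For illustration only (not part of the patch): a sketch of the
preallocation estimate from the code comment, generalized over the
refcount order. It follows the commented formula
y1 = (a + m) / (c - rces - rces * sizeof(u64) / c) and takes a as the
aligned image size and m as the metadata size without refcount
structures, which is my reading of the surrounding comment. The sketch_
names are hypothetical, and the extra cluster_size term the real code
adds to the numerator is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the number of refcount entries (y1) for
 *   a: aligned guest data size in bytes
 *   m: metadata size in bytes, excluding refcount structures
 * with a cluster size of (1 << cluster_bits) bytes. */
static uint64_t sketch_refcount_entries(uint64_t a, uint64_t m,
                                        int cluster_bits, int refcount_order)
{
    double c = (double)(UINT64_C(1) << cluster_bits);
    /* refcount entry size in bytes; fractional for sub-byte widths */
    double rces = (double)(1 << refcount_order) / 8.0;

    /* y1 = (a + m) / (c - rces - rces * sizeof(u64) / c) */
    return (uint64_t)((double)(a + m) / (c - rces - rces * 8.0 / c));
}

/* Entries per refcount block: one cluster filled with refcount entries */
static uint64_t sketch_refblock_entries(int cluster_bits, int refcount_order)
{
    /* cluster_bits - (refcount_order - 3); larger for sub-byte widths */
    return UINT64_C(1) << (cluster_bits - (refcount_order - 3));
}
```

With 64 kB clusters, a refcount block holds 32768 entries at 16 bits,
8192 at 64 bits and 524288 at 1 bit, so wider entries need more refcount
blocks for the same image size.
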
 block/qcow2.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b718e75..9accd88 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1775,7 +1775,7 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
                          const char *backing_file, const char *backing_format,
                          int flags, size_t cluster_size, PreallocMode prealloc,
-                         QemuOpts *opts, int version,
+                         QemuOpts *opts, int version, int refcount_order,
                          Error **errp)
 {
     /* Calculate cluster_bits */
@@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         int64_t meta_size = 0;
         uint64_t nreftablee, nrefblocke, nl1e, nl2e;
         int64_t aligned_total_size = align_offset(total_size, cluster_size);
+        int refblock_bits, refblock_size;
+        /* refcount entry size in bytes */
+        double rces = (1 << refcount_order) / 8.;
+
+        /* see qcow2_open() */
+        refblock_bits = cluster_bits - (refcount_order - 3);
+        refblock_size = 1 << refblock_bits;
 
         /* header: 1 cluster */
         meta_size += cluster_size;
@@ -1835,20 +1842,20 @@ static int qcow2_create2(const char *filename, int64_t total_size,
          *   c = cluster size
          *   y1 = number of refcount blocks entries
          *   y2 = meta size including everything
+         *   rces = refcount entry size in bytes
          * then,
          *   y1 = (y2 + a)/c
-         *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
+         *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
          * we can get y1:
-         *   y1 = (a + m) / (c - sizeof(u16) - sizeof(u16) * sizeof(u64) / c)
+         *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
          */
-        nrefblocke = (aligned_total_size + meta_size + cluster_size) /
-            (cluster_size - sizeof(uint16_t) -
-             1.0 * sizeof(uint16_t) * sizeof(uint64_t) / cluster_size);
-        nrefblocke = align_offset(nrefblocke, cluster_size / sizeof(uint16_t));
-        meta_size += nrefblocke * sizeof(uint16_t);
+        nrefblocke = (aligned_total_size + meta_size + cluster_size)
+                   / (cluster_size - rces - rces * sizeof(uint64_t)
+                                                 / cluster_size);
+        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
         /* total size of refcount tables */
-        nreftablee = nrefblocke * sizeof(uint16_t) / cluster_size;
+        nreftablee = nrefblocke / refblock_size;
         nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
         meta_size += nreftablee * sizeof(uint64_t);
 
@@ -1883,7 +1890,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         .l1_size                    = cpu_to_be32(0),
         .refcount_table_offset      = cpu_to_be64(cluster_size),
         .refcount_table_clusters    = cpu_to_be32(1),
-        .refcount_order             = cpu_to_be32(4),
+        .refcount_order             = cpu_to_be32(refcount_order),
         .header_length              = cpu_to_be32(sizeof(*header)),
     };
 
@@ -2003,6 +2010,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
     size_t cluster_size = DEFAULT_CLUSTER_SIZE;
     PreallocMode prealloc;
     int version = 3;
+    int refcount_width = 16, refcount_order;
     Error *local_err = NULL;
     int ret;
 
@@ -2057,8 +2065,19 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
         goto finish;
     }
 
+    if (version < 3 && refcount_width != 16) {
+        error_setg(errp, "Different refcount widths than 16 bits require "
+                   "compatibility level 1.1 or above (use compat=1.1 or "
+                   "greater)");
+        ret = -EINVAL;
+        goto finish;
+    }
+
+    refcount_order = ffs(refcount_width) - 1;
+
     ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
-                        cluster_size, prealloc, opts, version, &local_err);
+                        cluster_size, prealloc, opts, version, refcount_order,
+                        &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
     }
-- 
1.9.3

* [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (9 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 17:57   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Some tests do not work well with certain refcount widths (e.g. you
cannot create internal snapshots with refcount_width=1), so make those
widths unsupported.

Furthermore, add another filter to _filter_img_create in common.filter
which filters out the refcount_width value.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
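For illustration only (not part of the patch): the trailing [^0-9] in
the patterns keeps refcount_width=1 from also matching
refcount_width=16. The sketch below checks this with POSIX basic regular
expressions, assuming the test harness searches the image options string
the way grep would; sketch_imgopts_match is a made-up name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX basic regular expression `pattern` matches
 * anywhere in `imgopts`. Hypothetical stand-in for a grep-style check. */
static bool sketch_imgopts_match(const char *pattern, const char *imgopts)
{
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_NOSUB) != 0) {
        return false;   /* treat an invalid pattern as "no match" */
    }
    matched = regexec(&re, imgopts, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Note that the pattern needs one non-digit character after the value, so
a string ending in "refcount_width=1" does not match on its own. Whether
the harness appends a separator before matching is not visible in this
patch.
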
 tests/qemu-iotests/007           |  4 ++++
 tests/qemu-iotests/015           |  1 +
 tests/qemu-iotests/026           | 11 +++++++++++
 tests/qemu-iotests/029           |  1 +
 tests/qemu-iotests/051           |  1 +
 tests/qemu-iotests/058           |  1 +
 tests/qemu-iotests/067           |  7 +++++++
 tests/qemu-iotests/079           |  1 +
 tests/qemu-iotests/080           |  1 +
 tests/qemu-iotests/089           |  7 +++++++
 tests/qemu-iotests/090           |  1 +
 tests/qemu-iotests/108           |  6 ++++++
 tests/qemu-iotests/common.filter |  3 ++-
 13 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
index fe1a743..de39d1b 100755
--- a/tests/qemu-iotests/007
+++ b/tests/qemu-iotests/007
@@ -43,6 +43,10 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# refcount_width must be at least 4 bits so we can create ten internal snapshots
+# (1 bit supports none, 2 bits support three, 4 bits support 15)
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+                     'refcount_width=2[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/015 b/tests/qemu-iotests/015
index 099d757..65a1a11 100755
--- a/tests/qemu-iotests/015
+++ b/tests/qemu-iotests/015
@@ -43,6 +43,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index df2884b..8526f1a 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -46,6 +46,17 @@ _supported_proto file
 _supported_os Linux
 _default_cache_mode "writethrough"
 _supported_cache_modes "writethrough" "none"
+# The refcount table tests expect a certain minimum width for refcount entries
+# (so that the refcount table actually needs to grow); that minimum is 16 bits,
+# being the default refcount entry width.
+# 32 and 64 bits do not work either, however, due to different leaked cluster
+# count on error.
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+                     'refcount_width=2[^0-9]' \
+                     'refcount_width=4[^0-9]' \
+                     'refcount_width=8[^0-9]' \
+                     'refcount_width=32[^0-9]' \
+                     'refcount_width=64[^0-9]'
 
 echo "Errors while writing 128 kB"
 echo
diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029
index fa46ace..aa416a6 100755
--- a/tests/qemu-iotests/029
+++ b/tests/qemu-iotests/029
@@ -44,6 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 offset_size=24
 offset_l1_size=36
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 11c858f..1ddafb0 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+_unsupported_imgopts refcount_width
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 14584cd..4fc0793 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -88,6 +88,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _require_command QEMU_NBD
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "== preparing image =="
diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index d025192..98d30b5 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -35,6 +35,13 @@ status=1	# failure is the default!
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Because this would change the output of query-block
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+                     'refcount_width=2[^0-9]' \
+                     'refcount_width=4[^0-9]' \
+                     'refcount_width=8[^0-9]' \
+                     'refcount_width=32[^0-9]' \
+                     'refcount_width=64[^0-9]'
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/079 b/tests/qemu-iotests/079
index 6613cfb..23bf4d8 100755
--- a/tests/qemu-iotests/079
+++ b/tests/qemu-iotests/079
@@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file nfs
 _supported_os Linux
+_unsupported_imgopts refcount_width
 
 function test_qemu_img()
 {
diff --git a/tests/qemu-iotests/080 b/tests/qemu-iotests/080
index 9de337c..28b8715 100755
--- a/tests/qemu-iotests/080
+++ b/tests/qemu-iotests/080
@@ -42,6 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 header_size=104
 
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index dffc977..dba64f4 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -41,6 +41,13 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Because this would change the output of qemu_io -c info
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+                     'refcount_width=2[^0-9]' \
+                     'refcount_width=4[^0-9]' \
+                     'refcount_width=8[^0-9]' \
+                     'refcount_width=32[^0-9]' \
+                     'refcount_width=64[^0-9]'
 
 # Using an image filename containing quotation marks will render the JSON data
 # below invalid. In that case, we have little choice but simply not to run this
diff --git a/tests/qemu-iotests/090 b/tests/qemu-iotests/090
index 70b5a6f..359b631 100755
--- a/tests/qemu-iotests/090
+++ b/tests/qemu-iotests/090
@@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file nfs
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 IMG_SIZE=128K
 
diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 12fc92a..2269930 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
@@ -43,6 +43,12 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+                     'refcount_width=2[^0-9]' \
+                     'refcount_width=4[^0-9]' \
+                     'refcount_width=8[^0-9]' \
+                     'refcount_width=32[^0-9]' \
+                     'refcount_width=64[^0-9]'
 
 echo
 echo '=== Repairing an image without any refcount table ==='
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3acdb30..1641f85 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -189,7 +189,8 @@ _filter_img_create()
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-        -e "s/archipelago:a/TEST_DIR\//g"
+        -e "s/archipelago:a/TEST_DIR\//g" \
+        -e "s# refcount_width=[0-9]\\+##g"
 }
 
 _filter_img_info()
-- 
1.9.3

* [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (10 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 18:05   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs; fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
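Illustration only, not part of the patch: a minimal standalone sketch of
the width check that the qcow2_create() hunk below adds, plus the
width-to-order mapping (width = 1 << order, so 16 bits is order 4). The
names width_is_power_of_2() and refcount_width_to_order() are
hypothetical and do not exist in QEMU; the real code uses
is_power_of_2() and reports failures through error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's is_power_of_2(). */
static bool width_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Map a refcount width in bits to a refcount order (log2 of the width).
 * Returns -1 where the patch would fail with -EINVAL:
 *   - width is not a power of two in the range 1..64, or
 *   - the image version is below 3 (compat=0.10) and width is not 16.
 */
int refcount_width_to_order(int64_t width, int version)
{
    int order = 0;

    if (width <= 0 || width > 64 || !width_is_power_of_2((uint64_t)width)) {
        return -1;
    }
    if (version < 3 && width != 16) {
        return -1;
    }

    /* Smallest order such that (1 << order) == width. */
    while ((INT64_C(1) << order) < width) {
        order++;
    }
    assert(order <= 6); /* 64 bits is the widest entry allowed */
    return order;
}
```

Under these assumptions, refcount_width_to_order(16, 2) gives 4 (the only
combination valid for compat=0.10), refcount_width_to_order(64, 3) gives 6,
and widths such as 0, 3, or 128 are rejected.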
 block/qcow2.c              |  20 ++++++++
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/049.out | 112 ++++++++++++++++++++++-----------------------
 tests/qemu-iotests/079.out |  18 ++++----
 tests/qemu-iotests/082.out |  41 ++++++++++++++---
 tests/qemu-iotests/085.out |  38 +++++++--------
 6 files changed, 139 insertions(+), 91 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9accd88..5ec9e34 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2065,6 +2065,17 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
         goto finish;
     }
 
+    refcount_width = qemu_opt_get_number_del(opts, BLOCK_OPT_REFCOUNT_WIDTH,
+                                             refcount_width);
+    if (refcount_width <= 0 || refcount_width > 64 ||
+        !is_power_of_2(refcount_width))
+    {
+        error_setg(errp, "Refcount width must be a power of two and may not "
+                   "exceed 64 bits");
+        ret = -EINVAL;
+        goto finish;
+    }
+
     if (version < 3 && refcount_width != 16) {
         error_setg(errp, "Different refcount widths than 16 bits require "
                    "compatibility level 1.1 or above (use compat=1.1 or "
@@ -2704,6 +2715,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         } else if (!strcmp(desc->name, "lazy_refcounts")) {
             lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
                                                lazy_refcounts);
+        } else if (!strcmp(desc->name, "refcount_width")) {
+            error_report("Cannot change refcount entry width");
+            return -ENOTSUP;
         } else {
             /* if this assertion fails, this probably means a new option was
              * added without having it covered here */
@@ -2873,6 +2887,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_REFCOUNT_WIDTH,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Width of a reference count entry in bits",
+            .def_value_str = "16"
+        },
         { /* end of list */ }
     }
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..c34d610 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -56,6 +56,7 @@
 #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
 #define BLOCK_OPT_REDUNDANCY        "redundancy"
 #define BLOCK_OPT_NOCOW             "nocow"
+#define BLOCK_OPT_REFCOUNT_WIDTH    "refcount_width"
 
 typedef struct BdrvTrackedRequest {
     BlockDriverState *bs;
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 09ca0ae..9369c12 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == 3. Invalid sizes ==
 
@@ -97,7 +97,7 @@ qemu-img: Image size must be less than 8 EiB!
 qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2
 qemu-img: qcow2 doesn't support shrinking images yet
 qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k
 qemu-img: Image size must be less than 8 EiB!
@@ -105,17 +105,17 @@ qemu-img: Image size must be less than 8 EiB!
 qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2
 qemu-img: qcow2 doesn't support shrinking images yet
 qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
-qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for 
+qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
 qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
 
 qemu-img create -f qcow2 -o size=1kilobyte TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- foobar
-qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for 
+qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
 qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
 
 qemu-img create -f qcow2 -o size=foobar TEST_DIR/t.qcow2
@@ -125,84 +125,84 @@ qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1048576 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1048576 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=524288 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=524288 lazy_refcounts=off refcount_width=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid compatibility level: '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.42' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.42' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid compatibility level: 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='foobar' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='foobar' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='off' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='off' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: invalid parameter value: 1234
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='1234' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='1234' lazy_refcounts=off refcount_width=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o encryption=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=on refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=on refcount_width=16
 
 *** done
diff --git a/tests/qemu-iotests/079.out b/tests/qemu-iotests/079.out
index ef4b8c9..8443762 100644
--- a/tests/qemu-iotests/079.out
+++ b/tests/qemu-iotests/079.out
@@ -2,31 +2,31 @@ QA output created by 079
 === Check option preallocation and cluster_size ===
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=16384 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=16384 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=16384 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=32768 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=32768 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=32768 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=65536 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=131072 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=131072 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=131072 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=262144 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=262144 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=262144 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=524288 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=524288 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=524288 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=1048576 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=1048576 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=1048576 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=2097152 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=2097152 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=2097152 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=4194304 TEST_DIR/t.qcow2 4G
 qemu-img: TEST_DIR/t.qcow2: Cluster size must be a power of two between 512 and 2048k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=4194304 preallocation='metadata' lazy_refcounts=off
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=4194304 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 *** done
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 4b14b4f..dc8bdd3 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128M (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=4096 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=4096 lazy_refcounts=on refcount_width=16
 
 Testing: info TEST_DIR/t.qcow2
 image: TEST_DIR/t.qcow2
@@ -25,7 +25,7 @@ Format specific information:
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=on refcount_width=16
 
 Testing: info TEST_DIR/t.qcow2
 image: TEST_DIR/t.qcow2
@@ -40,7 +40,7 @@ Format specific information:
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=off refcount_width=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128M (134217728 bytes)
@@ -58,6 +58,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o ? TEST_DIR/t.qcow2 128M
@@ -70,6 +71,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2 128M
@@ -82,6 +84,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2 128M
@@ -94,6 +97,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2 128M
@@ -106,6 +110,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2 128M
@@ -118,6 +123,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2 128M
@@ -130,6 +136,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2 128M
@@ -142,13 +149,14 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,help' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,help' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,?' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,?' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -169,6 +177,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: create -o help
 Supported options:
@@ -177,7 +186,7 @@ size             Virtual disk size
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -236,6 +245,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o ? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -248,6 +258,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -260,6 +271,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -272,6 +284,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -284,6 +297,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -296,6 +310,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -308,6 +323,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -320,6 +336,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -347,6 +364,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: convert -o help
 Supported options:
@@ -414,6 +432,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o ? TEST_DIR/t.qcow2
@@ -426,6 +445,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2
@@ -438,6 +458,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2
@@ -450,6 +471,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2
@@ -462,6 +484,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2
@@ -474,6 +497,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2
@@ -486,6 +510,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2
@@ -498,6 +523,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2
@@ -527,6 +553,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: convert -o help
 Supported options:
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 0f2b17f..2e86fb7 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -11,7 +11,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 
 === Create a single snapshot on virtio0 ===
 
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2.orig' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2.orig' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -25,31 +25,31 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 
 === Create several transactional group snapshots ===
 
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/1-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/1-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
 *** done
-- 
1.9.3

* [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (11 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 18:08   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
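For readers unfamiliar with the pattern, the opaque pointer can be sketched
as follows. This is only an illustration: DemoState, DemoProgress,
demo_status_cb() and demo_amend() are hypothetical stand-ins and not part
of qemu; only the callback's parameter list mirrors the typedef changed in
this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockDriverState (the real type is far larger). */
typedef struct DemoState {
    int unused;
} DemoState;

/* Same parameter list as the amended BlockDriverAmendStatusCB typedef. */
typedef void DemoAmendStatusCB(DemoState *bs, int64_t offset,
                               int64_t total_work_size, void *opaque);

/* Caller-owned context that the callback reaches through the opaque pointer. */
typedef struct DemoProgress {
    int calls;
    int64_t last_offset;
    int64_t last_total;
} DemoProgress;

static void demo_status_cb(DemoState *bs, int64_t offset,
                           int64_t total_work_size, void *opaque)
{
    DemoProgress *p = opaque;

    (void)bs;
    p->calls++;
    p->last_offset = offset;
    p->last_total = total_work_size;
}

/* Hypothetical operation that reports progress once per step and passes
 * the caller's opaque value through unchanged. */
static int demo_amend(DemoState *bs, int64_t steps,
                      DemoAmendStatusCB *status_cb, void *cb_opaque)
{
    for (int64_t i = 1; i <= steps; i++) {
        if (status_cb) {
            status_cb(bs, i, steps, cb_opaque);
        }
    }
    return 0;
}
```

A caller would pass the address of its own DemoProgress as cb_opaque; a
caller without any context passes NULL, as img_amend() does in this patch.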
 block.c                   |  4 ++--
 block/qcow2-cluster.c     | 14 ++++++++------
 block/qcow2.c             |  9 +++++----
 block/qcow2.h             |  3 ++-
 include/block/block.h     |  4 ++--
 include/block/block_int.h |  3 ++-
 qemu-img.c                |  5 +++--
 7 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index c979d51..c34b188 100644
--- a/block.c
+++ b/block.c
@@ -5790,12 +5790,12 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
 }
 
 int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
-                       BlockDriverAmendStatusCB *status_cb)
+                       BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
     if (!bs->drv->bdrv_amend_options) {
         return -ENOTSUP;
     }
-    return bs->drv->bdrv_amend_options(bs, opts, status_cb);
+    return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque);
 }
 
 /* This function will be called by the bdrv_recurse_is_first_non_filter method
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ab43902..2daf334 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1620,7 +1620,8 @@ fail:
 static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                                       int l1_size, int64_t *visited_l1_entries,
                                       int64_t l1_entries,
-                                      BlockDriverAmendStatusCB *status_cb)
+                                      BlockDriverAmendStatusCB *status_cb,
+                                      void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     bool is_active_l1 = (l1_table == s->l1_table);
@@ -1646,7 +1647,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             /* unallocated */
             (*visited_l1_entries)++;
             if (status_cb) {
-                status_cb(bs, *visited_l1_entries, l1_entries);
+                status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
             }
             continue;
         }
@@ -1768,7 +1769,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
 
         (*visited_l1_entries)++;
         if (status_cb) {
-            status_cb(bs, *visited_l1_entries, l1_entries);
+            status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
         }
     }
 
@@ -1797,7 +1798,8 @@ fail:
  * qcow2 version which doesn't yet support metadata zero clusters.
  */
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-                               BlockDriverAmendStatusCB *status_cb)
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table = NULL;
@@ -1814,7 +1816,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
     ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size,
                                      &visited_l1_entries, l1_entries,
-                                     status_cb);
+                                     status_cb, cb_opaque);
     if (ret < 0) {
         goto fail;
     }
@@ -1849,7 +1851,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
         ret = expand_zero_clusters_in_l1(bs, l1_table, s->snapshots[i].l1_size,
                                          &visited_l1_entries, l1_entries,
-                                         status_cb);
+                                         status_cb, cb_opaque);
         if (ret < 0) {
             goto fail;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index 5ec9e34..21a1883 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2592,7 +2592,7 @@ static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
  * have to be removed.
  */
 static int qcow2_downgrade(BlockDriverState *bs, int target_version,
-                           BlockDriverAmendStatusCB *status_cb)
+                           BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     int current_version = s->qcow_version;
@@ -2641,7 +2641,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     /* clearing autoclear features is trivial */
     s->autoclear_features = 0;
 
-    ret = qcow2_expand_zero_clusters(bs, status_cb);
+    ret = qcow2_expand_zero_clusters(bs, status_cb, cb_opaque);
     if (ret < 0) {
         return ret;
     }
@@ -2656,7 +2656,8 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 }
 
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
-                               BlockDriverAmendStatusCB *status_cb)
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     int old_version = s->qcow_version, new_version = old_version;
@@ -2737,7 +2738,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
                 return ret;
             }
         } else {
-            ret = qcow2_downgrade(bs, new_version, status_cb);
+            ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
             if (ret < 0) {
                 return ret;
             }
diff --git a/block/qcow2.h b/block/qcow2.h
index 1c63221..fe12c54 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -551,7 +551,8 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-                               BlockDriverAmendStatusCB *status_cb);
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque);
 
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
diff --git a/include/block/block.h b/include/block/block.h
index 287dcab..5fd4c81 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -272,9 +272,9 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
  * block driver; total_work_size may change during the course of the amendment
  * operation */
 typedef void BlockDriverAmendStatusCB(BlockDriverState *bs, int64_t offset,
-                                      int64_t total_work_size);
+                                      int64_t total_work_size, void *opaque);
 int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
-                       BlockDriverAmendStatusCB *status_cb);
+                       BlockDriverAmendStatusCB *status_cb, void *cb_opaque);
 
 /* external snapshots */
 bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c34d610..e2167ab 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -234,7 +234,8 @@ struct BlockDriver {
         BdrvCheckMode fix);
 
     int (*bdrv_amend_options)(BlockDriverState *bs, QemuOpts *opts,
-                              BlockDriverAmendStatusCB *status_cb);
+                              BlockDriverAmendStatusCB *status_cb,
+                              void *cb_opaque);
 
     void (*bdrv_debug_event)(BlockDriverState *bs, BlkDebugEvent event);
 
diff --git a/qemu-img.c b/qemu-img.c
index a42335c..e0595fe 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2869,7 +2869,8 @@ out:
 }
 
 static void amend_status_cb(BlockDriverState *bs,
-                            int64_t offset, int64_t total_work_size)
+                            int64_t offset, int64_t total_work_size,
+                            void *opaque)
 {
     qemu_progress_print(100.f * offset / total_work_size, 0);
 }
@@ -2982,7 +2983,7 @@ static int img_amend(int argc, char **argv)
 
     /* In case the driver does not call amend_status_cb() */
     qemu_progress_print(0.f, 0);
-    ret = bdrv_amend_options(bs, opts, &amend_status_cb);
+    ret = bdrv_amend_options(bs, opts, &amend_status_cb, NULL);
     qemu_progress_print(100.f, 0);
     if (ret < 0) {
         error_report("Error while amending options: %s", strerror(-ret));
-- 
1.9.3

* [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options()
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (12 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 18:11   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c              | 14 ++++++--------
 tests/qemu-iotests/061.out | 14 +++++++-------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 21a1883..beb7187 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             } else if (!strcmp(compat, "1.1")) {
                 new_version = 3;
             } else {
-                fprintf(stderr, "Unknown compatibility level %s.\n", compat);
+                error_report("Unknown compatibility level %s.", compat);
                 return -EINVAL;
             }
         } else if (!strcmp(desc->name, "preallocation")) {
-            fprintf(stderr, "Cannot change preallocation mode.\n");
+            error_report("Cannot change preallocation mode.");
             return -ENOTSUP;
         } else if (!strcmp(desc->name, "size")) {
             new_size = qemu_opt_get_size(opts, "size", 0);
@@ -2701,16 +2701,14 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         } else if (!strcmp(desc->name, "encryption")) {
             encrypt = qemu_opt_get_bool(opts, "encryption", s->crypt_method);
             if (encrypt != !!s->crypt_method) {
-                fprintf(stderr, "Changing the encryption flag is not "
-                        "supported.\n");
+                error_report("Changing the encryption flag is not supported.");
                 return -ENOTSUP;
             }
         } else if (!strcmp(desc->name, "cluster_size")) {
             cluster_size = qemu_opt_get_size(opts, "cluster_size",
                                              cluster_size);
             if (cluster_size != s->cluster_size) {
-                fprintf(stderr, "Changing the cluster size is not "
-                        "supported.\n");
+                error_report("Changing the cluster size is not supported.");
                 return -ENOTSUP;
             }
         } else if (!strcmp(desc->name, "lazy_refcounts")) {
@@ -2756,8 +2754,8 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     if (s->use_lazy_refcounts != lazy_refcounts) {
         if (lazy_refcounts) {
             if (s->qcow_version < 3) {
-                fprintf(stderr, "Lazy refcounts only supported with compatibility "
-                        "level 1.1 and above (use compat=1.1 or greater)\n");
+                error_report("Lazy refcounts only supported with compatibility "
+                             "level 1.1 and above (use compat=1.1 or greater)");
                 return -EINVAL;
             }
             s->compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 9045544..128838c 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -281,19 +281,19 @@ No errors were found on the image.
 === Testing invalid configurations ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
-Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42.
 qemu-img: Error while amending options: Invalid argument
 qemu-img: Invalid parameter 'foo'
 qemu-img: Invalid options for file format 'qcow2'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported.
 qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported.
 qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode.
 qemu-img: Error while amending options: Operation not supported
 
 === Testing correct handling of unset value ===
@@ -301,7 +301,7 @@ qemu-img: Error while amending options: Operation not supported
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
 Should work:
 Should not work:
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported.
 qemu-img: Error while amending options: Operation not supported
 
 === Testing zero expansion on inactive clusters ===
-- 
1.9.3

* [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false)
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (13 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 18:12   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
                   ` (5 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
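The patch itself does not state a motivation. One general difference between
the two forms in standard C is that assert() expands to nothing when NDEBUG
is defined, so code after assert(false) becomes reachable, whereas abort()
always terminates. The sketch below only demonstrates that C behaviour;
reached_after_assert() is a hypothetical name, and whether qemu is ever
built with NDEBUG is not something this example claims.

```c
/* Compile this part as a release build would: assertions disabled. */
#ifndef NDEBUG
#define NDEBUG
#endif
#include <assert.h>

/* With NDEBUG defined, assert(0) expands to ((void)0), so control falls
 * through to the return statement instead of terminating the program. */
static int reached_after_assert(void)
{
    assert(0);
    return 1;
}

/* <assert.h> may be included again; assert() is redefined according to the
 * current state of NDEBUG, so assertions are active again from here on. */
#undef NDEBUG
#include <assert.h>
```

Replacing the assert(0) above with abort() from <stdlib.h> would make the
function terminate the process regardless of NDEBUG.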
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index beb7187..ebf843f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             error_report("Cannot change refcount entry width");
             return -ENOTSUP;
         } else {
-            /* if this assertion fails, this probably means a new option was
+            /* if this point is reached, this probably means a new option was
              * added without having it covered here */
-            assert(false);
+            abort();
         }
 
         desc++;
-- 
1.9.3

* [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (14 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 18:14   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If the image version should be upgraded, that is the first thing we should
do; if it should be downgraded, that is the last thing we should do. So
split the version change block into an upgrade part at the start and a
downgrade part at the end.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ebf843f..eaef251 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2726,20 +2726,13 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         desc++;
     }
 
-    if (new_version != old_version) {
-        if (new_version > old_version) {
-            /* Upgrade */
-            s->qcow_version = new_version;
-            ret = qcow2_update_header(bs);
-            if (ret < 0) {
-                s->qcow_version = old_version;
-                return ret;
-            }
-        } else {
-            ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
-            if (ret < 0) {
-                return ret;
-            }
+    /* Upgrade first (some features may require compat=1.1) */
+    if (new_version > old_version) {
+        s->qcow_version = new_version;
+        ret = qcow2_update_header(bs);
+        if (ret < 0) {
+            s->qcow_version = old_version;
+            return ret;
         }
     }
 
@@ -2753,7 +2746,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 
     if (s->use_lazy_refcounts != lazy_refcounts) {
         if (lazy_refcounts) {
-            if (s->qcow_version < 3) {
+            if (new_version < 3) {
                 error_report("Lazy refcounts only supported with compatibility "
                              "level 1.1 and above (use compat=1.1 or greater)");
                 return -EINVAL;
@@ -2789,6 +2782,14 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         }
     }
 
+    /* Downgrade last (so unsupported features can be removed before) */
+    if (new_version < old_version) {
+        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     return 0;
 }
 
-- 
1.9.3
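
The ordering argument of patch 16 can be sketched as a small planning function. This is a simplified model, not the code from block/qcow2.c: the amend_step enum, plan_amend() and its parameters are hypothetical names introduced only for illustration. It records the order of the steps and shows why the lazy refcounts check has to look at the target version rather than the current one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical step labels; not part of QEMU. */
enum amend_step { STEP_UPGRADE, STEP_SET_LAZY_REFCOUNTS, STEP_DOWNGRADE };

/*
 * Fill steps[] (capacity of at least 3) with the order in which an amend
 * would run and return the number of steps, or -EINVAL if a version 3
 * feature is requested for an image that will end up as version 2.
 */
static int plan_amend(int old_version, int new_version,
                      bool enable_lazy_refcounts, enum amend_step *steps)
{
    int n = 0;

    /* Upgrade first: later steps may need version 3 features. */
    if (new_version > old_version) {
        steps[n++] = STEP_UPGRADE;
    }

    if (enable_lazy_refcounts) {
        /* Check the target version: a requested downgrade has not happened
         * yet at this point, so the current version would be misleading. */
        if (new_version < 3) {
            return -EINVAL;
        }
        steps[n++] = STEP_SET_LAZY_REFCOUNTS;
    }

    /* Downgrade last, after all feature changes have been applied. */
    if (new_version < old_version) {
        steps[n++] = STEP_DOWNGRADE;
    }

    return n;
}
```

Under these assumptions, plan_amend(2, 3, true, steps) yields the upgrade followed by the lazy refcounts change, while plan_amend(3, 2, true, steps) is rejected before anything is touched.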

* [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB for amend
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (15 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 21:05   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
                   ` (3 subsequent siblings)
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 75 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index eaef251..e6b93d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2655,6 +2655,71 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     return 0;
 }
 
+typedef enum Qcow2AmendOperation {
+    /* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+     * statically initialized to so that the helper CB can discern the first
+     * invocation from an operation change */
+    QCOW2_NO_OPERATION = 0,
+
+    QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;
+
+typedef struct Qcow2AmendHelperCBInfo {
+    /* The code coordinating the amend operations should only modify
+     * these four fields; the rest will be managed by the CB */
+    BlockDriverAmendStatusCB *original_status_cb;
+    void *original_cb_opaque;
+
+    Qcow2AmendOperation current_operation;
+
+    /* Total number of operations to perform (only set once) */
+    int total_operations;
+
+    /* The following fields are managed by the CB */
+
+    /* Number of operations completed */
+    int operations_completed;
+
+    /* Cumulative offset of all completed operations */
+    int64_t offset_completed;
+
+    Qcow2AmendOperation last_operation;
+    int64_t last_work_size;
+} Qcow2AmendHelperCBInfo;
+
+static void qcow2_amend_helper_cb(BlockDriverState *bs, int64_t offset,
+                                 int64_t total_work_size, void *opaque)
+{
+    Qcow2AmendHelperCBInfo *info = opaque;
+    int64_t current_work_size;
+    int64_t projected_work_size;
+
+    if (info->current_operation != info->last_operation) {
+        if (info->last_operation != QCOW2_NO_OPERATION) {
+            info->offset_completed += info->last_work_size;
+            info->operations_completed++;
+        }
+
+        info->last_operation = info->current_operation;
+    }
+
+    info->last_work_size = total_work_size;
+
+    current_work_size = info->offset_completed + total_work_size;
+
+    /* current_work_size is the total work size for (operations_completed + 1)
+     * operations (which includes this one), so multiply it by the number of
+     * operations not covered and divide it by the number of operations
+     * covered to get a projection for the operations not covered */
+    projected_work_size = current_work_size * (info->total_operations -
+                                               info->operations_completed - 1)
+                                            / (info->operations_completed + 1);
+
+    info->original_status_cb(bs, info->offset_completed + offset,
+                             current_work_size + projected_work_size,
+                             info->original_cb_opaque);
+}
+
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
                                BlockDriverAmendStatusCB *status_cb,
                                void *cb_opaque)
@@ -2669,6 +2734,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     bool encrypt;
     int ret;
     QemuOptDesc *desc = opts->list->desc;
+    Qcow2AmendHelperCBInfo helper_cb_info;
 
     while (desc && desc->name) {
         if (!qemu_opt_find(opts, desc->name)) {
@@ -2726,6 +2792,12 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         desc++;
     }
 
+    helper_cb_info = (Qcow2AmendHelperCBInfo){
+        .original_status_cb = status_cb,
+        .original_cb_opaque = cb_opaque,
+        .total_operations = (new_version < old_version)
+    };
+
     /* Upgrade first (some features may require compat=1.1) */
     if (new_version > old_version) {
         s->qcow_version = new_version;
@@ -2784,7 +2856,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 
     /* Downgrade last (so unsupported features can be removed before) */
     if (new_version < old_version) {
-        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+        helper_cb_info.current_operation = QCOW2_DOWNGRADING;
+        ret = qcow2_downgrade(bs, new_version, &qcow2_amend_helper_cb,
+                              &helper_cb_info);
         if (ret < 0) {
             return ret;
         }
-- 
1.9.3
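
The projection done by the helper CB in patch 17 can be tried out in isolation. The following is a reduced sketch under stated assumptions: struct progress_info, struct progress_report and report_progress() are hypothetical stand-ins for Qcow2AmendHelperCBInfo and qcow2_amend_helper_cb(), and the result is returned instead of being passed to the original status callback. The arithmetic follows the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of the helper state. */
struct progress_info {
    int total_operations;      /* set once by the caller */
    int current_operation;     /* set by the caller before each operation */
    int last_operation;        /* 0 means "no operation yet" */
    int operations_completed;
    int64_t offset_completed;  /* summed work size of finished operations */
    int64_t last_work_size;
};

struct progress_report {
    int64_t offset;
    int64_t total;
};

static struct progress_report report_progress(struct progress_info *info,
                                              int64_t offset,
                                              int64_t total_work_size)
{
    struct progress_report r;
    int64_t current_work_size, projected_work_size;

    if (info->current_operation != info->last_operation) {
        if (info->last_operation != 0) {
            /* The previous operation has finished: bank its work size. */
            info->offset_completed += info->last_work_size;
            info->operations_completed++;
        }
        info->last_operation = info->current_operation;
    }
    info->last_work_size = total_work_size;

    /* Work size known so far, covering operations_completed + 1 operations. */
    current_work_size = info->offset_completed + total_work_size;

    /* Assume the remaining operations cost the same on average. */
    projected_work_size = current_work_size *
        (info->total_operations - info->operations_completed - 1) /
        (info->operations_completed + 1);

    r.offset = info->offset_completed + offset;
    r.total = current_work_size + projected_work_size;
    return r;
}
```

With two operations, a first report of offset 50 out of 100 is passed on as 50 out of 200. Once the second operation starts and announces a work size of 300, the reported total becomes 400, so the total seen by the original callback can change between operations.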

* [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (16 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-12  4:15   ` Eric Blake
  2014-11-12 14:19   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
                   ` (2 subsequent siblings)
  20 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h          |   4 +
 2 files changed, 428 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 08b2ddb..59b6437 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2473,3 +2473,427 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
 
     return 0;
 }
+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+
+    if (!refblock_empty && reftable_index >= *reftable_size) {
+        uint64_t *new_reftable;
+        uint64_t new_reftable_size;
+
+        new_reftable_size = ROUND_UP(reftable_index + 1,
+                                     s->cluster_size / sizeof(uint64_t));
+        if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+            error_setg(errp,
+                       "This operation would make the refcount table grow "
+                       "beyond the maximum size supported by QEMU, aborting");
+            return -ENOTSUP;
+        }
+
+        new_reftable = g_try_realloc(*reftable, new_reftable_size *
+                                                sizeof(uint64_t));
+        if (!new_reftable) {
+            error_setg(errp, "Failed to increase reftable buffer size");
+            return -ENOMEM;
+        }
+
+        memset(new_reftable + *reftable_size, 0,
+               (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+        *reftable      = new_reftable;
+        *reftable_size = new_reftable_size;
+    }
+
+    if (refblock_empty) {
+        if (reftable_index < *reftable_size) {
+            (*reftable)[reftable_index] = 0;
+        }
+    } else {
+        offset = qcow2_alloc_clusters(bs, s->cluster_size);
+        if (offset < 0) {
+            error_setg_errno(errp, -offset, "Failed to allocate refblock");
+            return offset;
+        }
+        (*reftable)[reftable_index] = offset;
+    }
+
+    return 0;
+}
+
+/**
+ * This "operation" for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    int ret;
+
+    if (refblock_empty) {
+        if (reftable_index < *reftable_size) {
+            assert((*reftable)[reftable_index] == 0);
+        }
+    } else {
+        /* The first pass with alloc_refblock() made the reftable large enough
+         */
+        assert(reftable_index < *reftable_size);
+        offset = (*reftable)[reftable_index];
+        assert(offset != 0);
+
+        ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Overlap check failed");
+            return ret;
+        }
+
+        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to write refblock");
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @operation is expected to combine the @new_refblock and its entry in the new
+ * reftable (which is described by the parameters starting with "reftable").
+ * @refblock_empty is set if all entries in the refblock are zero.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status callback.
+ * @index is the index of the walk_over_reftable() calls and @total is the total
+ * number of walk_over_reftable() calls per amend operation. Both are used for
+ * calculating the parameters for the status callback.
+ */
+static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
+                              uint64_t *new_reftable_index,
+                              uint64_t *new_reftable_size,
+                              void *new_refblock, int new_refblock_size,
+                              int new_refcount_bits,
+                              int (*operation)(BlockDriverState *bs,
+                                               uint64_t **reftable,
+                                               uint64_t reftable_index,
+                                               uint64_t *reftable_size,
+                                               void *refblock,
+                                               bool refblock_empty,
+                                               Error **errp),
+                              Qcow2SetRefcountFunc *new_set_refcount,
+                              BlockDriverAmendStatusCB *status_cb,
+                              void *cb_opaque, int index, int total,
+                              Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t reftable_index;
+    bool new_refblock_empty = true;
+    int refblock_index;
+    int new_refblock_index = 0;
+    int ret;
+
+    for (reftable_index = 0; reftable_index < s->refcount_table_size;
+         reftable_index++)
+    {
+        uint64_t refblock_offset = s->refcount_table[reftable_index]
+                                 & REFT_OFFSET_MASK;
+
+        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
+                  (uint64_t)total * s->refcount_table_size, cb_opaque);
+
+        if (refblock_offset) {
+            void *refblock;
+
+            if (offset_into_cluster(s, refblock_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1, "Refblock offset %#"
+                                        PRIx64 " unaligned (reftable index: %#"
+                                        PRIx64 ")", refblock_offset,
+                                        reftable_index);
+                error_setg(errp,
+                           "Image is corrupt (unaligned refblock offset)");
+                return -EIO;
+            }
+
+            ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offset,
+                                  &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to retrieve refblock");
+                return ret;
+            }
+
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
+            {
+                uint64_t refcount;
+
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, errp);
+                    if (ret < 0) {
+                        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                refcount = s->get_refcount(refblock, refblock_index);
+                if (new_refcount_bits < 64 && refcount >> new_refcount_bits) {
+                    uint64_t offset;
+
+                    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+                    offset = ((reftable_index << s->refcount_block_bits)
+                              + refblock_index) << s->cluster_bits;
+
+                    error_setg(errp, "Cannot decrease refcount entry width to "
+                               "%i bits: Cluster at offset %#" PRIx64 " has a "
+                               "refcount of %" PRIu64, new_refcount_bits,
+                               offset, refcount);
+                    return -EINVAL;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, refcount);
+                } else {
+                    new_refblock_index++;
+                }
+                new_refblock_empty = new_refblock_empty && refcount == 0;
+            }
+
+            ret = qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to put refblock back into "
+                                 "the cache");
+                return ret;
+            }
+        } else {
+            /* No refblock means every refcount is 0 */
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
+            {
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, errp);
+                    if (ret < 0) {
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, 0);
+                } else {
+                    new_refblock_index++;
+                }
+            }
+        }
+    }
+
+    if (new_refblock_index > 0) {
+        /* Complete the potentially existing partially filled final refblock */
+        if (new_set_refcount) {
+            for (; new_refblock_index < new_refblock_size;
+                 new_refblock_index++)
+            {
+                new_set_refcount(new_refblock, new_refblock_index, 0);
+            }
+        }
+
+        ret = operation(bs, new_reftable, *new_reftable_index,
+                        new_reftable_size, new_refblock, new_refblock_empty,
+                        errp);
+        if (ret < 0) {
+            return ret;
+        }
+
+        (*new_reftable_index)++;
+    }
+
+    return 0;
+}
+
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    Qcow2GetRefcountFunc *new_get_refcount;
+    Qcow2SetRefcountFunc *new_set_refcount;
+    void *new_refblock = qemu_blockalign(bs->file, s->cluster_size);
+    uint64_t *new_reftable = NULL, new_reftable_size = 0;
+    uint64_t *old_reftable, old_reftable_size, old_reftable_offset;
+    uint64_t new_reftable_index = 0;
+    uint64_t i;
+    int64_t new_reftable_offset;
+    int new_refblock_size, new_refcount_bits = 1 << refcount_order;
+    int old_refcount_order;
+    int ret;
+
+    assert(s->qcow_version >= 3);
+    assert(refcount_order >= 0 && refcount_order <= 6);
+
+    /* see qcow2_open() */
+    new_refblock_size = 1 << (s->cluster_bits - (refcount_order - 3));
+
+    get_refcount_functions(refcount_order,
+                           &new_get_refcount, &new_set_refcount);
+
+
+    /* First, allocate the new refblocks so they are accounted for in the
+     * refcount structures */
+    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                             &new_reftable_size, NULL, new_refblock_size,
+                             new_refcount_bits, &alloc_refblock, NULL,
+                             status_cb, cb_opaque, 0, 2, errp);
+    if (ret < 0) {
+        goto done;
+    }
+
+    /* The new_reftable_size is now valid and will not be changed anymore,
+     * so we can now allocate the reftable */
+    new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
+                                                   sizeof(uint64_t));
+    if (new_reftable_offset < 0) {
+        error_setg_errno(errp, -new_reftable_offset,
+                         "Failed to allocate the new reftable");
+        ret = new_reftable_offset;
+        goto done;
+    }
+
+    new_reftable_index = 0;
+
+    /* Second, write the new refblocks */
+    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                             &new_reftable_size, new_refblock,
+                             new_refblock_size, new_refcount_bits,
+                             &flush_refblock, new_set_refcount,
+                             status_cb, cb_opaque, 1, 2, errp);
+    if (ret < 0) {
+        goto done;
+    }
+
+
+    /* Write the new reftable */
+    ret = qcow2_pre_write_overlap_check(bs, 0, new_reftable_offset,
+                                        new_reftable_size * sizeof(uint64_t));
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Overlap check failed");
+        goto done;
+    }
+
+    for (i = 0; i < new_reftable_size; i++) {
+        cpu_to_be64s(&new_reftable[i]);
+    }
+
+    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
+                      new_reftable_size * sizeof(uint64_t));
+
+    for (i = 0; i < new_reftable_size; i++) {
+        be64_to_cpus(&new_reftable[i]);
+    }
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to write the new reftable");
+        goto done;
+    }
+
+
+    /* Empty the refcount cache */
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to flush the refblock cache");
+        goto done;
+    }
+
+    /* Update the image header to point to the new reftable; this only updates
+     * the fields which are relevant to qcow2_update_header(); other fields
+     * such as s->refcount_table or s->refcount_bits stay stale for now
+     * (because we have to restore everything if qcow2_update_header() fails) */
+    old_refcount_order  = s->refcount_order;
+    old_reftable_size   = s->refcount_table_size;
+    old_reftable_offset = s->refcount_table_offset;
+
+    s->refcount_order        = refcount_order;
+    s->refcount_table_size   = new_reftable_size;
+    s->refcount_table_offset = new_reftable_offset;
+
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        s->refcount_order        = old_refcount_order;
+        s->refcount_table_size   = old_reftable_size;
+        s->refcount_table_offset = old_reftable_offset;
+        error_setg_errno(errp, -ret, "Failed to update the qcow2 header");
+        goto done;
+    }
+
+    /* Now update the rest of the in-memory information */
+    old_reftable = s->refcount_table;
+    s->refcount_table = new_reftable;
+
+    /* For cleaning up all old refblocks */
+    new_reftable      = old_reftable;
+    new_reftable_size = old_reftable_size;
+
+    s->refcount_bits = 1 << refcount_order;
+    if (refcount_order < 6) {
+        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+    } else {
+        s->refcount_max = INT64_MAX;
+    }
+
+    s->refcount_block_bits = s->cluster_bits - (refcount_order - 3);
+    s->refcount_block_size = 1 << s->refcount_block_bits;
+
+    s->get_refcount = new_get_refcount;
+    s->set_refcount = new_set_refcount;
+
+    /* And free the old reftable (the old refblocks are freed below the "done"
+     * label) */
+    qcow2_free_clusters(bs, old_reftable_offset,
+                        old_reftable_size * sizeof(uint64_t),
+                        QCOW2_DISCARD_NEVER);
+
+done:
+    if (new_reftable) {
+        /* On success, new_reftable actually points to the old reftable (and
+         * new_reftable_size is the old reftable's size); but that is just
+         * fine */
+        for (i = 0; i < new_reftable_size; i++) {
+            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
+            if (offset) {
+                qcow2_free_clusters(bs, offset, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+            }
+        }
+        g_free(new_reftable);
+    }
+
+    qemu_vfree(new_refblock);
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index fe12c54..5b96519 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -526,6 +526,10 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
 int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
                                   int64_t size);
 
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp);
+
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size);
-- 
1.9.3
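
Two pieces of arithmetic in patch 18 depend directly on the refcount order: the number of entries per refblock and the check whether an existing refcount still fits into a narrower entry. A minimal sketch of both, assuming hypothetical helper names (refblock_entries() and refcount_fits() do not exist in QEMU) and plain integer parameters instead of BDRVQcowState:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of refcount entries that fit into one refblock (one cluster). */
static uint64_t refblock_entries(int cluster_bits, int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    /* (cluster size in bytes * 8 bits) / (1 << order) bits per entry */
    return UINT64_C(1) << (cluster_bits - (refcount_order - 3));
}

/* True if @refcount can be stored in an entry of the given order. */
static bool refcount_fits(uint64_t refcount, int refcount_order)
{
    int bits = 1 << refcount_order;

    /* Shifting a 64-bit value by 64 is undefined in C, hence the guard. */
    return bits >= 64 || (refcount >> bits) == 0;
}
```

For 64 kB clusters (cluster_bits = 16), a refblock holds 32768 entries at 16 bits per entry, 524288 entries at 1 bit and 8192 entries at 64 bits. A refcount of 2 does not fit into a 1-bit entry, which corresponds to the case in which the patch refuses to decrease the width.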

* [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (17 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-12  4:36   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e6b93d1..e8a6bb1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2607,13 +2607,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     }
 
     if (s->refcount_order != 4) {
-        /* we would have to convert the image to a refcount_order == 4 image
-         * here; however, since qemu (at the time of writing this) does not
-         * support anything different than 4 anyway, there is no point in doing
-         * so right now; however, we should error out (if qemu supports this in
-         * the future and this code has not been adapted) */
-        error_report("qcow2_downgrade: Image refcount orders other than 4 are "
-                     "currently not supported.");
+        error_report("compat=0.10 requires refcount_width=16");
         return -ENOTSUP;
     }
 
@@ -2661,6 +2655,7 @@ typedef enum Qcow2AmendOperation {
      * invocation from an operation change */
     QCOW2_NO_OPERATION = 0,
 
+    QCOW2_CHANGING_REFCOUNT_ORDER,
     QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
 
@@ -2732,6 +2727,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     const char *compat = NULL;
     uint64_t cluster_size = s->cluster_size;
     bool encrypt;
+    int refcount_width = s->refcount_bits;
     int ret;
     QemuOptDesc *desc = opts->list->desc;
     Qcow2AmendHelperCBInfo helper_cb_info;
@@ -2781,8 +2777,16 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
                                                lazy_refcounts);
         } else if (!strcmp(desc->name, "refcount_width")) {
-            error_report("Cannot change refcount entry width");
-            return -ENOTSUP;
+            refcount_width = qemu_opt_get_number(opts, "refcount_width",
+                                                 refcount_width);
+
+            if (refcount_width <= 0 || refcount_width > 64 ||
+                !is_power_of_2(refcount_width))
+            {
+                error_report("Refcount width must be a power of two and may "
+                             "not exceed 64 bits");
+                return -EINVAL;
+            }
         } else {
             /* if this point is reached, this probably means a new option was
              * added without having it covered here */
@@ -2796,6 +2800,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         .original_status_cb = status_cb,
         .original_cb_opaque = cb_opaque,
         .total_operations = (new_version < old_version)
+                          + (s->refcount_bits != refcount_width)
     };
 
     /* Upgrade first (some features may require compat=1.1) */
@@ -2808,6 +2813,27 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         }
     }
 
+    if (s->refcount_bits != refcount_width) {
+        int refcount_order = ffs(refcount_width) - 1;
+        Error *local_error = NULL;
+
+        if (new_version < 3 && refcount_width != 16) {
+            error_report("Different refcount widths than 16 bits require "
+                         "compatibility level 1.1 or above (use compat=1.1 or "
+                         "greater)");
+            return -EINVAL;
+        }
+
+        helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
+        ret = qcow2_change_refcount_order(bs, refcount_order,
+                                          &qcow2_amend_helper_cb,
+                                          &helper_cb_info, &local_error);
+        if (ret < 0) {
+            qerror_report_err(local_error);
+            return ret;
+        }
+    }
+
     if (backing_file || backing_format) {
         ret = qcow2_change_backing_file(bs, backing_file ?: bs->backing_file,
                                         backing_format ?: bs->backing_format);
-- 
1.9.3
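
The validation of refcount_width and its conversion to a refcount order in patch 19 can be sketched as one function. The patch itself uses is_power_of_2() and ffs(refcount_width) - 1; the version below is an illustration only, uses a hypothetical name (refcount_width_to_order()) and replaces ffs() with a loop so that it needs nothing beyond the standard C headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the refcount order (log2 of the width in bits) for a valid width,
 * or -1 if the width is not a power of two in the range 1..64.
 */
static int refcount_width_to_order(int64_t width)
{
    int order = 0;

    if (width <= 0 || width > 64 || (width & (width - 1)) != 0) {
        return -1;
    }
    while ((INT64_C(1) << order) < width) {
        order++;
    }
    return order;
}
```

A width of 16 maps to order 4, which is the only value permitted for compat=0.10 images; widths such as 3 or 128 are rejected.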

* [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (18 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-12  4:38   ` Eric Blake
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 59b6437..a4723cd 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1337,6 +1337,9 @@ static int inc_refcounts(BlockDriverState *bs,
         if (refcount == s->refcount_max) {
             fprintf(stderr, "ERROR: overflow cluster offset=0x%" PRIx64
                     "\n", cluster_offset);
+            fprintf(stderr, "Use qemu-img amend to increase the refcount entry "
+                    "width or qemu-img convert to create a clean copy if the "
+                    "image cannot be opened for writing\n");
             res->corruptions++;
             continue;
         }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths
  2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (19 preceding siblings ...)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
@ 2014-11-10 13:45 ` Max Reitz
  2014-11-11 19:53   ` Eric Blake
  20 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-10 13:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a test for conversion between different refcount widths and errors
specific to certain widths (e.g. snapshots with refcount_width=1).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/112     | 225 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/112.out | 123 +++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 349 insertions(+)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

diff --git a/tests/qemu-iotests/112 b/tests/qemu-iotests/112
new file mode 100755
index 0000000..a55f1ed
--- /dev/null
+++ b/tests/qemu-iotests/112
@@ -0,0 +1,225 @@
+#!/bin/bash
+#
+# Test cases for different refcount_widths
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+# This test will set refcount_width on its own which would conflict with the
+# manual setting; compat will be overridden as well
+_unsupported_imgopts refcount_width 'compat=0.10'
+
+function print_refcount_width()
+{
+    $QEMU_IMG info "$TEST_IMG" | grep 'refcount width:' | sed -e 's/^ *//'
+}
+
+echo
+echo '=== refcount_width limits ==='
+echo
+
+# Must be positive (non-zero)
+IMGOPTS="$IMGOPTS,refcount_width=0" _make_test_img 64M
+# Must be positive (non-negative)
+IMGOPTS="$IMGOPTS,refcount_width=-1" _make_test_img 64M
+# May not exceed 64
+IMGOPTS="$IMGOPTS,refcount_width=128" _make_test_img 64M
+# Must be a power of two
+IMGOPTS="$IMGOPTS,refcount_width=42" _make_test_img 64M
+
+# 1 is the minimum
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# 64 is the maximum
+IMGOPTS="$IMGOPTS,refcount_width=64" _make_test_img 64M
+print_refcount_width
+
+# 16 is the default
+_make_test_img 64M
+print_refcount_width
+
+echo
+echo '=== refcount_width and compat=0.10 ==='
+echo
+
+# Should work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=16" _make_test_img 64M
+print_refcount_width
+
+# Should not work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=1" _make_test_img 64M
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=64" _make_test_img 64M
+
+
+echo
+echo '=== Snapshot limit on refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should fail
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Snapshot limit on refcount_width=2 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should succeed
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+$QEMU_IMG snapshot -c bar "$TEST_IMG"
+# Should fail (4th reference)
+$QEMU_IMG snapshot -c baz "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Compressed clusters with refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# Both should fit into a single host cluster; instead of failing to increase the
+# refcount of that cluster, qemu should just allocate a new cluster and make
+# this operation succeed
+$QEMU_IO -c 'write -P 0 -c  0  64k' \
+         -c 'write -P 1 -c 64k 64k' \
+         "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+echo
+echo '=== Amend from refcount_width=16 to refcount_width=1 ==='
+echo
+
+_make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 16M 32M' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend from refcount_width=1 to refcount_width=64 ==='
+echo
+
+$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend to compat=0.10 ==='
+echo
+
+# Should not work because refcount_width needs to be 16 for compat=0.10
+$QEMU_IMG amend -o compat=0.10 "$TEST_IMG"
+print_refcount_width
+# Should work
+$QEMU_IMG amend -o compat=0.10,refcount_width=16 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+# Get back to compat=1.1 and refcount_width=16
+$QEMU_IMG amend -o compat=1.1 "$TEST_IMG"
+print_refcount_width
+# Should not work
+$QEMU_IMG amend -o refcount_width=32,compat=0.10 "$TEST_IMG"
+print_refcount_width
+
+echo
+echo '=== Amend with snapshot ==='
+echo
+
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+# Just to have different refcounts across the image
+$QEMU_IO -c 'write 0 16M' "$TEST_IMG" | _filter_qemu_io
+
+# Should not work
+$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+# Should work
+$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Testing too many references for check ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# This cluster should be created at 0x50000
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+# Now make the second L2 entry (the L2 table should be at 0x40000) point to
+# that cluster, so we have two references
+poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
+
+# This should say "please use amend"
+_check_test_img -r all
+
+# So we do that
+$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
+print_refcount_width
+
+# And try again
+_check_test_img -r all
+
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
new file mode 100644
index 0000000..d6a6a84
--- /dev/null
+++ b/tests/qemu-iotests/112.out
@@ -0,0 +1,123 @@
+QA output created by 112
+
+=== refcount_width limits ===
+
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 refcount_width=-1
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 64
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+
+=== refcount_width and compat=0.10 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+qemu-img: TEST_DIR/t.IMGFMT: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Snapshot limit on refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not create snapshot 'foo': -22 (Invalid argument)
+Leaked cluster 6 refcount=1 reference=0
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+=== Snapshot limit on refcount_width=2 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 2
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not create snapshot 'baz': -22 (Invalid argument)
+Leaked cluster 7 refcount=1 reference=0
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+=== Compressed clusters with refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+
+=== Amend from refcount_width=16 to refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+wrote 33554432/33554432 bytes at offset 16777216
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+refcount width: 1
+
+=== Amend from refcount_width=1 to refcount_width=64 ===
+
+No errors were found on the image.
+refcount width: 64
+
+=== Amend to compat=0.10 ===
+
+qemu-img: compat=0.10 requires refcount_width=16
+qemu-img: Error while amending options: Operation not supported
+refcount width: 64
+No errors were found on the image.
+refcount width: 16
+refcount width: 16
+qemu-img: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+qemu-img: Error while amending options: Invalid argument
+refcount width: 16
+
+=== Amend with snapshot ===
+
+wrote 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Cannot decrease refcount entry width to 1 bits: Cluster at offset 0x50000 has a refcount of 2
+qemu-img: Error while amending options: Invalid argument
+No errors were found on the image.
+refcount width: 16
+No errors were found on the image.
+refcount width: 2
+
+=== Testing too many references for check ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR: overflow cluster offset=0x50000
+Use qemu-img amend to increase the refcount entry width or qemu-img convert to create a clean copy if the image cannot be opened for writing
+
+1 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
+refcount width: 2
+ERROR cluster 5 refcount=1 reference=2
+Repairing cluster 5 refcount=1 reference=2
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000050000 refcount=2
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000050000 refcount=2
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    3 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 7dfe469..593f3dd 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -112,3 +112,4 @@
 107 rw auto quick
 108 rw auto quick
 111 rw auto quick
+112 rw auto
-- 
1.9.3
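
The last test case in 112 pokes the bytes \x80\x00\x00\x00\x00\x05\x00\x00 at 0x40008 to forge a second reference to the cluster at 0x50000. A rough sketch of where those numbers come from follows, assuming 8-byte big-endian L2 entries with the COPIED flag in the top bit (consistent with the l2_entry=8000000000050000 line in the expected output). The macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed flag position: top bit of an L2 entry. */
#define OFLAG_COPIED (UINT64_C(1) << 63)

/* Hypothetical helper: serialize a 64-bit value in big-endian order. */
static void store_be64(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(value >> (56 - 8 * i));
    }
}

/* Hypothetical helper: file offset of entry `index` in an L2 table,
 * assuming each entry occupies 8 bytes. */
static uint64_t l2_entry_file_offset(uint64_t l2_table_offset, unsigned index)
{
    return l2_table_offset + (uint64_t)index * 8;
}
```

Under these assumptions, the second entry (index 1) of the table at 0x40000 lives at 0x40008, and OFLAG_COPIED | 0x50000 serializes to the byte string used by poke_file.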

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
@ 2014-11-10 17:03   ` Eric Blake
  2014-11-10 17:06     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-10 17:03 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> No longer refuse to open images with a different refcount entry width
> than 16 bits; only reject images with a refcount width larger than 64
> bits (which is prohibited by the specification).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d70e927..b718e75 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      /* Check support for various header values */
> -    if (header.refcount_order != 4) {
> -        report_unsupported(bs, errp, "%d bit reference counts",
> -                           1 << header.refcount_order);
> -        ret = -ENOTSUP;
> +    if (header.refcount_order > 6) {
> +        error_setg(errp, "Reference count entry width too large (%i bit); may "
> +                   "not exceed 64 bit", 1 << header.refcount_order);

Overflows if I fuzz an image to put 32 or larger into
header.refcount_order. It may be better to just tweak the error message
to state that the order cannot exceed 6, rather than trying to display
the actual bit width that the user is requesting, as then you avoid the
'1 << problem'.
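
One way to sidestep the shift problem is to range-check the order before it is ever used as a shift count. The sketch below assumes the header field is an unsigned 32-bit value; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define MAX_REFCOUNT_ORDER 6   /* 1 << 6 == 64 bits */

/* Hypothetical helper: returns the entry width in bits, or 0 if the order
 * is out of range. The shift is only evaluated for orders 0..6, so a fuzzed
 * value such as 32 never reaches it; the caller would report the raw order
 * in its error message instead of a computed width. */
static unsigned refcount_order_to_bits(uint32_t order)
{
    if (order > MAX_REFCOUNT_ORDER) {
        return 0;
    }
    return 1u << order;
}
```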

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4
  2014-11-10 17:03   ` Eric Blake
@ 2014-11-10 17:06     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-10 17:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 18:03, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> No longer refuse to open images with a different refcount entry width
>> than 16 bits; only reject images with a refcount width larger than 64
>> bits (which is prohibited by the specification).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index d70e927..b718e75 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>       }
>>   
>>       /* Check support for various header values */
>> -    if (header.refcount_order != 4) {
>> -        report_unsupported(bs, errp, "%d bit reference counts",
>> -                           1 << header.refcount_order);
>> -        ret = -ENOTSUP;
>> +    if (header.refcount_order > 6) {
>> +        error_setg(errp, "Reference count entry width too large (%i bit); may "
>> +                   "not exceed 64 bit", 1 << header.refcount_order);
> Overflows if I fuzz an image to put 32 or larger into
> header.refcount_order. It may be better to just tweak the error message
> to state that the order cannot exceed 6, rather than trying to display
> the actual bit width that the user is requesting, as then you avoid the
> '1 << problem'.

Yes, probably. Or we display it in binary: "0b1%0*i", 
header.refcount_order, 0 (which works since refcount_order is guaranteed 
to exceed 6).

Joking aside, yes, I'll omit the theoretical refcount width.

Max

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
@ 2014-11-10 19:00   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-10 19:00 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add two new fields regarding refcount information (the bit width of
> every entry and the maximum refcount value) to the BDRVQcowState.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 2 +-
>  block/qcow2.c          | 9 +++++++++
>  block/qcow2.h          | 2 ++
>  3 files changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/block/qcow2.c
> @@ -684,6 +684,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>          goto fail;
>      }
>      s->refcount_order = header.refcount_order;
> +    s->refcount_bits = 1 << s->refcount_order;

Not shown is the context a few lines earlier, which still enforces
refcount_order==4, so this doesn't overflow.  When later patches relax
that, I'll make sure we don't overflow here as well.

> +    if (s->refcount_order < 6) {
> +        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;

I don't see the UINT64_C macro get much use, but like it better than
casting :)

> +    } else {

I don't know if Coverity might complain about dead code during bisection
(since we can't get here until we relax refcount_order to not be forced
to 4), but that's a layer beyond making sure 'make check' works so I
don't care.
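
For reference, the refcount_max computation quoted above can be sketched as a standalone function. The order < 6 branch follows the quoted hunk. The else branch is not shown in the quote, so clamping to INT64_MAX here is purely an assumption (it would match the signed 64-bit representation used elsewhere in the series), and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper; the caller is assumed to guarantee order <= 6. */
static uint64_t refcount_max_for_order(unsigned order)
{
    unsigned bits = 1u << order;

    if (order < 6) {
        /* UINT64_C(1) keeps the shift in 64-bit arithmetic; bits <= 32 */
        return (UINT64_C(1) << bits) - 1;
    }
    /* Assumption, not taken from the quoted patch: a 64-bit shift by 64
     * would be undefined, so the widest format is clamped instead. */
    return INT64_MAX;
}
```

With 2-bit entries this gives a maximum of 3, which is why a fourth reference to a cluster cannot be represented at that width.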

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
@ 2014-11-10 19:06   ` Eric Blake
  2014-11-11  8:11     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-10 19:06 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add the bit width of every refcount entry to the format-specific
> information.
> 
> This breaks some test outputs, fix them.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c              |  4 +++-
>  qapi/block-core.json       |  5 ++++-
>  tests/qemu-iotests/060.out |  1 +
>  tests/qemu-iotests/065     | 23 +++++++++++++++--------
>  tests/qemu-iotests/067.out | 10 +++++-----
>  tests/qemu-iotests/082.out |  7 +++++++
>  tests/qemu-iotests/089.out |  2 ++
>  7 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f57aff9..d70e927 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2475,7 +2475,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
>      };
>      if (s->qcow_version == 2) {
>          *spec_info->qcow2 = (ImageInfoSpecificQCow2){
> -            .compat = g_strdup("0.10"),
> +            .compat             = g_strdup("0.10"),
> +            .refcount_width     = s->refcount_bits,

Hmm - is it really worth displaying a constant?  Since the 0.10 format
cannot change the width from 16, I'm not sure if it adds anything to the
output to display it.  After all, there's other things we omit for the
old format when they cannot be altered (such as the state of a lazy
flag).  On the other hand, if it makes your changes to later iotests
easier for tests that operate on both image formats, I'm not opposed to it.


> +++ b/tests/qemu-iotests/065
> @@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
>  class TestQCow2(TestQemuImgInfo):
>      '''Testing a qcow2 version 2 image'''
>      img_options = 'compat=0.10'
> -    json_compare = { 'compat': '0.10' }
> -    human_compare = [ 'compat: 0.10' ]
> +    json_compare = { 'compat': '0.10', 'refcount-width': 16 }
> +    human_compare = [ 'compat: 0.10', 'refcount width: 16' ]

This would be a test that does not change if you decide to not output
the constant.

> -{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
> +{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}

It would be nice to figure out how to avoid long lines in the testsuite.
But that's a project for another day.

If you can make a strong argument for always outputting the constant
width of 16 for 0.10 formats, then I can live with it, so:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
@ 2014-11-10 20:59   ` Eric Blake
  2014-11-11  8:12     ` Max Reitz
  2014-11-11  9:22     ` Kevin Wolf
  0 siblings, 2 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-10 20:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Refcounts may have a width of up to 64 bit, so qemu should use the same

s/bit/bits/

> width to represent refcount values internally.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-cluster.c  |  9 ++++++---
>  block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
>  block/qcow2.h          |  7 ++++---
>  3 files changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index df0b2c9..ab43902 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>      for (i = 0; i < l1_size; i++) {
>          uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
>          bool l2_dirty = false;
> -        int l2_refcount;
> +        int64_t l2_refcount;

You may want to mention in the commit message that you choose a signed
type to allow negative for errors, and therefore we really allow only up
to 63 useful bits.  Or even mention that this is  okay because no one
can feasibly generate an image with more than 2^63 refs to the same
cluster (there isn't that much storage or time to do such a task in our
lifetime...)
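
To illustrate the trade-off: a signed 64-bit return value can carry either a refcount or a negative errno, at the cost of the top bit. This is only a sketch of the idea with a hypothetical function name, not the code from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: convert a raw 64-bit refcount entry into the signed
 * representation. Values that need bit 63 cannot be told apart from error
 * codes, so they are rejected with -ERANGE. */
static int64_t refcount_from_raw(uint64_t raw)
{
    if (raw > (uint64_t)INT64_MAX) {
        return -ERANGE;
    }
    return (int64_t)raw;
}
```

Callers can then treat any negative result as an error, which leaves 63 bits for the actual count.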

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes()
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
@ 2014-11-10 21:05   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-10 21:05 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> qcow2_update_cluster_refcount() may fail, and qcow2_alloc_bytes() should
> mind that case.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 32 +++++++++++++++++++++-----------
>  1 file changed, 21 insertions(+), 11 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
@ 2014-11-10 21:12   ` Eric Blake
  2014-11-11  8:22     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-10 21:12 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
> the refcount is increased accordingly. However, if this would lead to an
> overflow the function should instead just not reuse this cluster and
> allocate a new one.

So if refcount_order is 1 (2 bits per refcount, max refcount of 3), and
we encounter the same cluster 6 times (say by 5 back-to-back internal
snapshots), does this code optimize to only 2 clusters (both with
refcount 3) or does it result in each of the last 3 clusters spilling to
its own 1-ref cluster for a total of 4 clusters?  Short of Benoit's work
on deduplication, is there even a way to avoid inefficient use of
spilled clusters?  But I guess answering that can be a separate patch;
inefficiency is annoying, but not technically wrong and therefore not a
reason to reject this one.
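
The arithmetic behind the two outcomes in that question can be written down as follows, taking 2-bit entries (largest representable refcount 3) and six references to the same data. Both functions are hypothetical illustrations of the two allocation strategies, not a description of what the patch does.

```c
#include <stdint.h>

/* Best case: every cluster is reused until its refcount is saturated. */
static uint64_t clusters_if_packed(uint64_t refs, uint64_t refcount_max)
{
    return (refs + refcount_max - 1) / refcount_max;
}

/* Worst case: the first cluster is filled up to refcount_max, and every
 * reference after that spills into a cluster of its own. */
static uint64_t clusters_if_spilled(uint64_t refs, uint64_t refcount_max)
{
    if (refs <= refcount_max) {
        return refs > 0 ? 1 : 0;
    }
    return 1 + (refs - refcount_max);
}
```

For refs = 6 and refcount_max = 3, the packed strategy needs 2 clusters and the spilling strategy needs 4, matching the two figures in the question.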

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
@ 2014-11-10 22:30   ` Eric Blake
  2014-11-11  8:35     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-10 22:30 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Since refcounts do not always have to be a uint16_t, all refcount blocks
> and arrays in memory should not have a specific type (thus they become
> pointers to void) and for accessing them, two helper functions are used
> (a getter and a setter). Those functions are called indirectly through
> function pointers in the BDRVQcowState so they may later be exchanged
> for different refcount orders.
> 
> At the same time, replace all sizeof(**refcount_table) etc. in the qcow2
> check code by s->refcount_bits / 8. Note that this might lead to wrong
> values due to truncating division, but currently s->refcount_bits is
> always 16, and before the upcoming patch which removes this limitation
> another patch will make the division round up correctly.

Thanks for pointing out that this transition is still in progress, and
needs more patches.  I agree that for this patch, the division is safe.
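
To make the truncation concern concrete: for sub-byte widths, bits / 8 evaluates to 0, so the size has to be computed in bits and rounded up. A small sketch with hypothetical function names (the eventual fix in the series may look different):

```c
#include <stdint.h>

/* What "entries * (refcount_bits / 8)" would compute: correct for widths
 * of 8 bits and above, but zero for widths of 1, 2 or 4 bits. */
static uint64_t refcount_array_bytes_truncating(uint64_t entries, unsigned bits)
{
    return entries * (bits / 8);
}

/* Rounding variant: total number of bits, rounded up to whole bytes. */
static uint64_t refcount_array_bytes(uint64_t entries, unsigned bits)
{
    return (entries * bits + 7) / 8;
}
```

Both agree for 16-bit entries, so the truncating form is harmless as long as the width stays fixed at 16.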

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++--------------------
>  block/qcow2.h          |   8 +++
>  2 files changed, 98 insertions(+), 62 deletions(-)
> 

> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
>      }
>  
>      ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> -        (void**) &refcount_block);
> +                          &refcount_block);
>      if (ret < 0) {
>          return ret;
>      }
>  
>      block_index = cluster_index & (s->refcount_block_size - 1);
> -    refcount = be16_to_cpu(refcount_block[block_index]);
> +    refcount = s->get_refcount(refcount_block, block_index);
>  
> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
> -        (void**) &refcount_block);
> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
>      if (ret < 0) {
>          return ret;
>      }
>  
> +    if (refcount < 0) {
> +        /* overflow */
> +        return -ERANGE;
> +    }

Should you be checking for overflow prior to calling qcow2_cache_put?

> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>          s->cluster_size;
>      uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
>      uint64_t *new_table = g_try_new0(uint64_t, table_size);
> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
> +    void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);

Can this multiplication ever overflow?  Would we be better off with a
g_try_new0 approach?

> @@ -1118,12 +1142,13 @@ fail:
>   */
>  static int inc_refcounts(BlockDriverState *bs,
>                           BdrvCheckResult *res,
> -                         uint16_t **refcount_table,
> +                         void **refcount_table,
>                           int64_t *refcount_table_size,
>                           int64_t offset, int64_t size)
>  {
>      BDRVQcowState *s = bs->opaque;
> -    uint64_t start, last, cluster_offset, k;
> +    uint64_t start, last, cluster_offset, k, refcount;

Why uint64_t, when you limit to int64_t in other patches?

> +    int64_t i;
>  
>      if (size <= 0) {
>          return 0;
> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>          k = cluster_offset >> s->cluster_bits;
>          if (k >= *refcount_table_size) {
>              int64_t old_refcount_table_size = *refcount_table_size;
> -            uint16_t *new_refcount_table;
> +            void *new_refcount_table;
>  
>              *refcount_table_size = k + 1;
>              new_refcount_table = g_try_realloc(*refcount_table,
>                                                 *refcount_table_size *
> -                                               sizeof(**refcount_table));
> +                                               s->refcount_bits / 8);

This multiplies before dividing.  Can it ever overflow, where writing
*refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
it better to use a malloc variant that checks for overflow (I think it
is g_try_renew?) instead of open-coding the multiply?
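To make the concern concrete, here is a minimal sketch (not code from the
patch) of a size computation that divides first and then checks the
multiplication. The name refcount_bytes_checked() is hypothetical, and the
sketch assumes a refcount width of at least one byte, which holds for this
patch since s->refcount_bits is always 16 here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: compute
 * entries * (refcount_bits / 8) and report overflow instead of wrapping.
 * Only valid for widths that are a whole number of bytes.
 */
static bool refcount_bytes_checked(uint64_t entries, unsigned refcount_bits,
                                   size_t *bytes)
{
    uint64_t entry_size;

    assert(refcount_bits >= 8 && refcount_bits % 8 == 0);
    entry_size = refcount_bits / 8;   /* divide first: 1, 2, 4 or 8 */

    if (entries > SIZE_MAX / entry_size) {
        return false;                 /* product would not fit in size_t */
    }
    *bytes = (size_t)(entries * entry_size);
    return true;
}
```

A caller would bail out (for example with -ENOMEM or -EFBIG) when the helper
returns false, before handing the size to g_try_realloc().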

>              if (!new_refcount_table) {
>                  *refcount_table_size = old_refcount_table_size;
>                  res->check_errors++;
> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>              }
>              *refcount_table = new_refcount_table;
>  
> -            memset(*refcount_table + old_refcount_table_size, 0,
> -                   (*refcount_table_size - old_refcount_table_size) *
> -                   sizeof(**refcount_table));
> +            for (i = old_refcount_table_size; i < *refcount_table_size; i++) {
> +                s->set_refcount(*refcount_table, i, 0);
> +            }

This feels slower than memset.  Any chance we can add an optimization
that brings back the speed of memset (may require an additional callback
in addition to the getter and setter)?

> @@ -1178,7 +1206,7 @@ enum {
>   * error occurred.
>   */
>  static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>      int flags)

I noticed you cleaned up indentation in a lot of the patch, but not
here.  Any reason?

> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>  
>                  new_refcount_table = g_try_realloc(*refcount_table,
>                                                     *nb_clusters *
> -                                                   sizeof(**refcount_table));
> +                                                   s->refcount_bits / 8);

Another possible overflow or g_try_renew site?

>                  if (!new_refcount_table) {
>                      *nb_clusters = old_nb_clusters;
>                      res->check_errors++;
> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>                  }
>                  *refcount_table = new_refcount_table;
>  
> -                memset(*refcount_table + old_nb_clusters, 0,
> -                       (*nb_clusters - old_nb_clusters) *
> -                       sizeof(**refcount_table));
> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
> +                    s->set_refcount(*refcount_table, j, 0);
> +                }

Another memset pessimization.  Maybe even add a callback to expand the
table, factoring out more of the common code for reallocating the table
and clearing all new entries.


> @@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      int ret;
>  
>      if (!*refcount_table) {
> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
> +        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);

Feels like a step backwards in overflow detection?

> @@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>          *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>          new_refcount_table = g_try_realloc(*refcount_table,
>                                             *imrt_nb_clusters *
> -                                           sizeof(**refcount_table));
> +                                           s->refcount_bits / 8);

Another possible overflow.

>          if (!new_refcount_table) {
>              *imrt_nb_clusters = old_imrt_nb_clusters;
>              return -ENOMEM;
>          }
>          *refcount_table = new_refcount_table;
>  
> -        memset(*refcount_table + old_imrt_nb_clusters, 0,
> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
> -               sizeof(**refcount_table));
> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
> +            s->set_refcount(*refcount_table, i, 0);
> +        }
>      }

And another resize where we pessimize memset.


> @@ -1911,12 +1940,11 @@ write_refblocks:
>          }
>  
>          on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
> -        for (i = 0; i < s->refcount_block_size &&
> -                    refblock_start + i < *nb_clusters; i++)
> -        {
> -            on_disk_refblock[i] =
> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
> -        }
> +
> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
> +                                 (refblock_index << s->refcount_block_bits)),
> +               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
> +               * s->refcount_bits / 8);
>  

This one's different in that you move TO a memcpy instead of open-coded
loop.  But I still worry if multiply before /8 could be a problem.

> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          /* Because the old reftable has been exchanged for a new one the
>           * references have to be recalculated */
>          rebuild = false;
> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);

Another /8 possible overflow.

>          ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
>                                    &nb_clusters);
>          if (ret < 0) {
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0f8eb15..1c63221 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>      QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>  } Qcow2DiscardRegion;
>  
> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
> +                                      uint64_t index);
> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
> +                                  uint64_t index, uint64_t value);

Do you want int64_t for any of the types here, to make it obvious that
you can't exceed 2^63?

Looks like you are on track to a sane conversion, but I'm worried enough
about the math that it probably needs a respin (either comments stating
why we know we don't overflow, or else safer constructs).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
@ 2014-11-10 22:49   ` Eric Blake
  2014-11-11  8:37     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-10 22:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2400 bytes --]

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a helper function which correctly calculates the byte size of a
> refcount array for any refcount order, and use that function.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 39 ++++++++++++++++++++++++++++-----------
>  1 file changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 16652da..cfb4807 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1132,6 +1132,20 @@ fail:
>  /* refcount checking functions */
>  
>  
> +static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
> +{
> +    if (s->refcount_order < 3) {
> +        /* sub-byte width */
> +        int shift = 3 - s->refcount_order;
> +        return (entries + (1 << shift) - 1) >> shift;
> +    } else if (s->refcount_order == 3) {
> +        /* byte width */
> +        return entries;
> +    } else {
> +        /* multiple bytes wide */
> +        return entries << (s->refcount_order - 3);
> +    }

A comment proving why this can't overflow might be nice (if I analyzed
correctly, entries will be computed as file size / cluster size, and in the
worst case, the smallest cluster and largest refcount_order result in
'(size >> 9) << (6 - 3)', which is still safe).
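The worst case can be checked with a stand-alone variant of the helper. This
is only a sketch: array_byte_size() is a hypothetical name, it takes the
refcount order directly instead of a BDRVQcowState, and it returns uint64_t
rather than size_t so the arithmetic is the same on 32-bit hosts. It assumes
an image size below 2^63 bytes and a minimum cluster size of 512 bytes, so
there are at most 2^54 entries.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone variant of refcount_array_byte_size():
 * byte size of an array of `entries` refcounts, each (1 << refcount_order)
 * bits wide. Sub-byte widths are rounded up to whole bytes.
 */
static uint64_t array_byte_size(unsigned refcount_order, uint64_t entries)
{
    if (refcount_order < 3) {
        /* sub-byte width: several entries share one byte */
        unsigned shift = 3 - refcount_order;
        return (entries + (UINT64_C(1) << shift) - 1) >> shift;
    }
    /* one or more bytes per entry */
    return entries << (refcount_order - 3);
}
```

With 2^54 entries and refcount_order 6, the result is 2^57 bytes, which still
fits comfortably in 64 bits.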

> @@ -1161,12 +1175,13 @@ static int inc_refcounts(BlockDriverState *bs,
>          k = cluster_offset >> s->cluster_bits;
>          if (k >= *refcount_table_size) {
>              int64_t old_refcount_table_size = *refcount_table_size;
> +            size_t new_byte_size;
>              void *new_refcount_table;
>  
>              *refcount_table_size = k + 1;
> -            new_refcount_table = g_try_realloc(*refcount_table,
> -                                               *refcount_table_size *
> -                                               s->refcount_bits / 8);
> +            new_byte_size = refcount_array_byte_size(s, *refcount_table_size);
> +
> +            new_refcount_table = g_try_realloc(*refcount_table, new_byte_size);

Yay - this addresses one of my possible overflow comments on 6/21.

I wonder if the series would have less churn if you rearranged this
patch to come before 6/21.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
@ 2014-11-11  0:29   ` Eric Blake
  2014-11-11  8:42     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11  0:29 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1100 bytes --]

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add helper functions for getting and setting refcounts in a refcount
> array for any possible refcount order, and choose the correct one during
> refcount initialization.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 143 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 141 insertions(+), 2 deletions(-)
> 

> +
> +static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index)
> +{
> +    return be64_to_cpu(((const uint64_t *)refcount_array)[index]);
> +}

Should this return int64_t and error out if the user ever exceeded 2**63
via image fuzzing?

> +
> +static void set_refcount_ro6(void *refcount_array, uint64_t index,
> +                             uint64_t value)
> +{
> +    ((uint64_t *)refcount_array)[index] = cpu_to_be64(value);
> +}

Should this assert that value <= INT64_MAX, since that's what the rest
of the code caps it to?
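As a rough illustration of both suggestions, the sketch below caps 64-bit
refcounts at INT64_MAX in the accessors. It is not code from the series: the
names get_refcount_checked() and set_refcount_checked() are hypothetical, the
byte-wise load/store stands in for be64_to_cpu()/cpu_to_be64() so the example
is self-contained, and returning -ERANGE from the setter is just one option
(an assert would work as well).

```c
#include <errno.h>
#include <stdint.h>

/* Read a big-endian 64-bit value byte by byte. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write a 64-bit value in big-endian byte order. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Hypothetical checked getter: returns the refcount, or -ERANGE if the
 * stored value does not fit into the non-negative range of int64_t. */
static int64_t get_refcount_checked(const void *array, uint64_t index)
{
    uint64_t raw = load_be64((const uint8_t *)array + index * 8);

    if (raw > (uint64_t)INT64_MAX) {
        return -ERANGE;
    }
    return (int64_t)raw;
}

/* Hypothetical checked setter: refuses values above INT64_MAX. */
static int set_refcount_checked(void *array, uint64_t index, uint64_t value)
{
    if (value > (uint64_t)INT64_MAX) {
        return -ERANGE;
    }
    store_be64((uint8_t *)array + index * 8, value);
    return 0;
}
```

The trade-off is that the getter can no longer share the plain uint64_t
function pointer type from patch 6 without the caller doing the range check.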

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
@ 2014-11-11  5:40   ` Eric Blake
  2014-11-11  8:48     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11  5:40 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2011 bytes --]

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a refcount_order parameter to qcow2_create2(), use that value for
> the image header and for calculating the size required for
> preallocation.
> 
> For now, always pass 4.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 41 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 

> @@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>          int64_t meta_size = 0;
>          uint64_t nreftablee, nrefblocke, nl1e, nl2e;
>          int64_t aligned_total_size = align_offset(total_size, cluster_size);
> +        int refblock_bits, refblock_size;
> +        /* refcount entry size in bytes */
> +        double rces = (1 << refcount_order) / 8.;

Would float be any simpler than double?

> +
> +        /* see qcow2_open() */
> +        refblock_bits = cluster_bits - (refcount_order - 3);
> +        refblock_size = 1 << refblock_bits;
>  
>          /* header: 1 cluster */
>          meta_size += cluster_size;
> @@ -1835,20 +1842,20 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>           *   c = cluster size
>           *   y1 = number of refcount blocks entries
>           *   y2 = meta size including everything
> +         *   rces = refcount entry size in bytes
>           * then,
>           *   y1 = (y2 + a)/c
> -         *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
> +         *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m

Hmm. This changes from integral to floating point.  Are we going to
suffer from any rounding problems?  I guess you want double to ensure
maximum precision; but whereas earlier patches were limited with
refcount_order of 6 to 2^63, this now limits us around 2^53 if we are
still trying to be accurate.
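The 2^53 limit is easy to demonstrate. The sketch below is illustrative only
and assumes IEEE 754 doubles; survives_double() is a hypothetical helper, and
rces() merely repeats the expression from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v can be converted to double and back without changing. */
static bool survives_double(uint64_t v)
{
    double d = (double)v;

    /* 18446744073709551616.0 is 2^64; guard the conversion back */
    return d < 18446744073709551616.0 && (uint64_t)d == v;
}

/* Refcount entry size in bytes, as computed in the patch. */
static double rces(int refcount_order)
{
    return (1 << refcount_order) / 8.;
}
```

The entry size itself is always exact (it is a power of two, such as 0.125 for
refcount_order 0), so any inaccuracy comes from the large operands:
survives_double() holds for 2^53 but fails for 2^53 + 1.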

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-10 19:06   ` Eric Blake
@ 2014-11-11  8:11     ` Max Reitz
  2014-11-11 15:49       ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 20:06, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add the bit width of every refcount entry to the format-specific
>> information.
>>
>> This breaks some test outputs, fix them.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c              |  4 +++-
>>   qapi/block-core.json       |  5 ++++-
>>   tests/qemu-iotests/060.out |  1 +
>>   tests/qemu-iotests/065     | 23 +++++++++++++++--------
>>   tests/qemu-iotests/067.out | 10 +++++-----
>>   tests/qemu-iotests/082.out |  7 +++++++
>>   tests/qemu-iotests/089.out |  2 ++
>>   7 files changed, 37 insertions(+), 15 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index f57aff9..d70e927 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2475,7 +2475,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
>>       };
>>       if (s->qcow_version == 2) {
>>           *spec_info->qcow2 = (ImageInfoSpecificQCow2){
>> -            .compat = g_strdup("0.10"),
>> +            .compat             = g_strdup("0.10"),
>> +            .refcount_width     = s->refcount_bits,
> Hmm - is it really worth displaying a constant?  Since the 0.10 format
> cannot change the width from 16, I'm not sure if it adds anything to the
> output to display it.  After all, there's other things we omit for the
> old format when they cannot be altered (such as the state of a lazy
> flag).  On the other hand, if it makes your changes to later iotests
> easier for tests that operate on both image formats, I'm not opposed to it.

Yes, I thought about not displaying it. But whereas "corrupt" or "lazy 
refcounts" simply do not make sense with compat=0.10 images (it's simply 
impossible), the refcount width does make sense. It's always 16 bits 
(I'm noticing myself how I keep swapping between "bit" and "bits", but I 
just can't help it) but I personally find it interesting enough to 
display. I'd be fine with dropping it from compat=0.10, though.

But in retrospect, I'd rather make the other two flags always visible 
than now drop this entry. However, not displaying a bool if it's always 
false makes more sense to me than not displaying an integer because it's 
always constant.

>> +++ b/tests/qemu-iotests/065
>> @@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
>>   class TestQCow2(TestQemuImgInfo):
>>       '''Testing a qcow2 version 2 image'''
>>       img_options = 'compat=0.10'
>> -    json_compare = { 'compat': '0.10' }
>> -    human_compare = [ 'compat: 0.10' ]
>> +    json_compare = { 'compat': '0.10', 'refcount-width': 16 }
>> +    human_compare = [ 'compat: 0.10', 'refcount width: 16' ]
> This would be a test that does not change if you decide to not output
> the constant.
>
>> -{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
>> +{"return": [{"io-status": "ok", "device": "disk", "locked": false, "removable": false, "inserted": {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 134217728, "filename": "TEST_DIR/t.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": SIZE, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-width": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "file": "TEST_DIR/t.qcow2", "encryption_key_missing": false}, "type": "unknown"}, {"io-status": "ok", "device": "ide1-cd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "floppy0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}, {"device": "sd0", "locked": false, "removable": true, "tray_open": false, "type": "unknown"}]}
> It would be nice to figure out how to avoid long lines in the testsuite.
>   But that's a project for another day.

Yes, Kevin already proposed adding a "pretty" flag to -qmp which uses 
the pretty JSON formatter (which breaks lines and indents).

> If you can make a strong argument for always outputting the constant
> width of 16 for 0.10 formats, then I can live with it, so:

You decide whether it's strong enough. :-)

My main argument is "If a bool is not displayed one can assume it to be 
false; if an integer is not displayed which naturally cannot be 0, I 
will have no idea what it would be, even if it's constant for that image 
version".

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

Max

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-10 20:59   ` Eric Blake
@ 2014-11-11  8:12     ` Max Reitz
  2014-11-11  9:22     ` Kevin Wolf
  1 sibling, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:12 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 21:59, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Refcounts may have a width of up to 64 bit, so qemu should use the same
> s/bit/bits/

See my reply to your review on patch 2. I keep swapping between the two, not
knowing which to use; now I know, thanks!

>> width to represent refcount values internally.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-cluster.c  |  9 ++++++---
>>   block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
>>   block/qcow2.h          |  7 ++++---
>>   3 files changed, 30 insertions(+), 23 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index df0b2c9..ab43902 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>>       for (i = 0; i < l1_size; i++) {
>>           uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
>>           bool l2_dirty = false;
>> -        int l2_refcount;
>> +        int64_t l2_refcount;
> You may want to mention in the commit message that you choose a signed
> type to allow negative for errors, and therefore we really allow only up
> to 63 useful bits.  Or even mention that this is okay because no one
> can feasibly generate an image with more than 2^63 refs to the same
> cluster (there isn't that much storage or time to do such a task in our
> lifetime...)

Will do.

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-10 21:12   ` Eric Blake
@ 2014-11-11  8:22     ` Max Reitz
  2014-11-11 16:13       ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:22 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 22:12, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
>> the refcount is increased accordingly. However, if this would lead to an
>> overflow the function should instead just not reuse this cluster and
>> allocate a new one.
> So if refcount_order is 1 (2 bits per refcount, max refcount of 4

*max refcount of 3 (0b11)

> ), and
> we encounter the same cluster 6 times (say by 5 back-to-back internal
> snapshots), does this code optimize to only 2 clusters (both with
> refcount 3) or does it result in each of the last 3 clusters spilling to
> its own 1-ref cluster for a total of 4 clusters?  Short of Benoit's work
> on deduplication, is there even a way to avoid inefficient use of
> spilled clusters?

I'm not sure what you're referring to; maybe I should add that 
qcow2_alloc_bytes() is used for allocating compressed clusters (which 
ideally don't take up a full host cluster), so "reuse" in this context 
just means that several compressed clusters share one host cluster.

Maybe you're referring to the following situation: We have the default 
cluster size of 64k. Now we're trying to allocate 16k for each of the 
compressed clusters A, B, C and D. D won't fit into that cluster because 
the maximum refcount is three, so it will be put into a newly allocated 
host cluster. Finally, we're trying to allocate 32k for a compressed 
cluster E, which will then be put into the same cluster as D. We 
therefore have the following allocation (each sub-box representing 16k):

+---+---+---+---+   +---+---+---+---+
| A | B | C |   |   | D |   E   |   |
+---+---+---+---+   +---+---+---+---+

whereas the ideal allocation would be:

+---+---+---+---+   +---+---+---+---+
| A | B |   E   |   | C | D |   |   |
+---+---+---+---+   +---+---+---+---+

This is a problem, but I think first it's a minor one (just use a 
sufficiently large refcount width if you're going to use compressed 
clusters) and second it's about compressed clusters, whose performance I 
could hardly care less about, frankly.
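The first layout above can be reproduced with a toy model. This is a sketch
under simplifying assumptions, not qemu's qcow2_alloc_bytes(): ToyAllocator
and toy_alloc_bytes() are hypothetical names, there is a single "current" host
cluster filled front to back, and each request is assumed to fit into one
host cluster.

```c
#include <stdint.h>

#define HOST_CLUSTER_SIZE 65536u   /* default 64k cluster size */

typedef struct ToyAllocator {
    uint64_t cluster_index;  /* host cluster currently being filled */
    uint32_t used;           /* bytes already handed out in that cluster */
    uint64_t refcount;       /* allocations sharing that cluster */
    uint64_t max_refcount;   /* (1 << refcount_bits) - 1 */
} ToyAllocator;

/* Returns the host offset for a compressed cluster of `size` bytes. */
static uint64_t toy_alloc_bytes(ToyAllocator *a, uint32_t size)
{
    uint64_t offset;

    /* Start a new host cluster if the data does not fit, or if one more
     * reference would overflow the refcount. */
    if (a->used + size > HOST_CLUSTER_SIZE ||
        a->refcount >= a->max_refcount) {
        a->cluster_index++;
        a->used = 0;
        a->refcount = 0;
    }

    offset = a->cluster_index * HOST_CLUSTER_SIZE + a->used;
    a->used += size;
    a->refcount++;
    return offset;
}
```

With max_refcount = 3 and requests of 16k, 16k, 16k, 16k and 32k, the first
three land in host cluster 0 and the last two in host cluster 1, leaving 16k
unused in each.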

Max

> But I guess answering that can be a separate patch;
> inefficiency is annoying, but not technically wrong and therefore not a
> reason to reject this one.
>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++--
>>   1 file changed, 30 insertions(+), 2 deletions(-)
>>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
  2014-11-10 22:30   ` Eric Blake
@ 2014-11-11  8:35     ` Max Reitz
  2014-11-11  9:43       ` Max Reitz
  2014-11-11 10:56       ` Max Reitz
  0 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:35 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 23:30, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Since refcounts do not always have to be a uint16_t, all refcount blocks
>> and arrays in memory should not have a specific type (thus they become
>> pointers to void) and for accessing them, two helper functions are used
>> (a getter and a setter). Those functions are called indirectly through
>> function pointers in the BDRVQcowState so they may later be exchanged
>> for different refcount orders.
>>
>> At the same time, replace all sizeof(**refcount_table) etc. in the qcow2
>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>> values due to truncating division, but currently s->refcount_bits is
>> always 16, and before the upcoming patch which removes this limitation
>> another patch will make the division round up correctly.
> Thanks for pointing out that this transition is still in progress, and
> needs more patches.  I agree that for this patch, the division is safe.
>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++--------------------
>>   block/qcow2.h          |   8 +++
>>   2 files changed, 98 insertions(+), 62 deletions(-)
>>
>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>       }
>>   
>>       ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        (void**) &refcount_block);
>> +                          &refcount_block);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>   
>>       block_index = cluster_index & (s->refcount_block_size - 1);
>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>> +    refcount = s->get_refcount(refcount_block, block_index);
>>   
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -        (void**) &refcount_block);
>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>   
>> +    if (refcount < 0) {
>> +        /* overflow */
>> +        return -ERANGE;
>> +    }
> Should you be checking for overflow prior to calling qcow2_cache_put?

I don't think so; we want to free the cache entry reference even if the 
refblock seems unusable.

>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>           s->cluster_size;
>>       uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
>>       uint64_t *new_table = g_try_new0(uint64_t, table_size);
>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
>> +    void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
> Can this multiplication ever overflow?  Would we be better off with a
> g_try_new0 approach?

Yes, you're right. Patch 7 introduces a helper function which will allow 
checking such overflows in a central place, but I haven't done so in 
this version.

>> @@ -1118,12 +1142,13 @@ fail:
>>    */
>>   static int inc_refcounts(BlockDriverState *bs,
>>                            BdrvCheckResult *res,
>> -                         uint16_t **refcount_table,
>> +                         void **refcount_table,
>>                            int64_t *refcount_table_size,
>>                            int64_t offset, int64_t size)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> -    uint64_t start, last, cluster_offset, k;
>> +    uint64_t start, last, cluster_offset, k, refcount;
> Why uint64_t, when you limit to int64_t in other patches?

Because we don't need to check for errors here. But I guess we should 
really use int64_t everywhere to be consistent (so that if something 
breaks because a cluster has more than INT64_MAX references, it breaks 
everywhere).

>> +    int64_t i;
>>   
>>       if (size <= 0) {
>>           return 0;
>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>           k = cluster_offset >> s->cluster_bits;
>>           if (k >= *refcount_table_size) {
>>               int64_t old_refcount_table_size = *refcount_table_size;
>> -            uint16_t *new_refcount_table;
>> +            void *new_refcount_table;
>>   
>>               *refcount_table_size = k + 1;
>>               new_refcount_table = g_try_realloc(*refcount_table,
>>                                                  *refcount_table_size *
>> -                                               sizeof(**refcount_table));
>> +                                               s->refcount_bits / 8);
> This multiplies before dividing.  Can it ever overflow, where writing
> *refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
> it better to use a malloc variant that checks for overflow (I think it
> is g_try_renew?) instead of open-coding the multiply?
>
>>               if (!new_refcount_table) {
>>                   *refcount_table_size = old_refcount_table_size;
>>                   res->check_errors++;
>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>               }
>>               *refcount_table = new_refcount_table;
>>   
>> -            memset(*refcount_table + old_refcount_table_size, 0,
>> -                   (*refcount_table_size - old_refcount_table_size) *
>> -                   sizeof(**refcount_table));
>> +            for (i = old_refcount_table_size; i < *refcount_table_size; i++) {
>> +                s->set_refcount(*refcount_table, i, 0);
>> +            }
> This feels slower than memset.

It is, yes. But this is the check function; I don't think performance is 
all that important here (especially not for operations in RAM).

> Any chance we can add an optimization
> that brings back the speed of memset (may require an additional callback
> in addition to the getter and setter)?

For sub-byte refcount widths, we would have to manually set all non-byte 
aligned entries to 0 and then use memset() on the rest. Not impossible, 
but I think too complicated for a place where performance is not critical.

>> @@ -1178,7 +1206,7 @@ enum {
>>    * error occurred.
>>    */
>>   static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>>       int flags)
> I noticed you cleaned up indentation in a lot of the patch, but not
> here.  Any reason?

Maybe I remembered touching that header once already and not fixing up 
the indentation. Will do in v2.

>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>   
>>                   new_refcount_table = g_try_realloc(*refcount_table,
>>                                                      *nb_clusters *
>> -                                                   sizeof(**refcount_table));
>> +                                                   s->refcount_bits / 8);
> Another possible overflow or g_try_renew site?
>
>>                   if (!new_refcount_table) {
>>                       *nb_clusters = old_nb_clusters;
>>                       res->check_errors++;
>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>                   }
>>                   *refcount_table = new_refcount_table;
>>   
>> -                memset(*refcount_table + old_nb_clusters, 0,
>> -                       (*nb_clusters - old_nb_clusters) *
>> -                       sizeof(**refcount_table));
>> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
>> +                    s->set_refcount(*refcount_table, j, 0);
>> +                }
> Another memset pessimation.  Maybe even having a callback to expand the
> table, and factor out more of the common code of reallocating the table
> and clearing all new entries.

That sounds useful, yes. I'll look into it, but I'll probably still go 
without memset().

>> @@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       int ret;
>>   
>>       if (!*refcount_table) {
>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>> +        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);
> Feels like a step backwards in overflow detection?
>
>> @@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>>           *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>>           new_refcount_table = g_try_realloc(*refcount_table,
>>                                              *imrt_nb_clusters *
>> -                                           sizeof(**refcount_table));
>> +                                           s->refcount_bits / 8);
> Another possible overflow
>
>>           if (!new_refcount_table) {
>>               *imrt_nb_clusters = old_imrt_nb_clusters;
>>               return -ENOMEM;
>>           }
>>           *refcount_table = new_refcount_table;
>>   
>> -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>> -               sizeof(**refcount_table));
>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>> +            s->set_refcount(*refcount_table, i, 0);
>> +        }
>>       }
> and another resize where we pessimize memset
>
>
>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>           }
>>   
>>           on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
>> -        for (i = 0; i < s->refcount_block_size &&
>> -                    refblock_start + i < *nb_clusters; i++)
>> -        {
>> -            on_disk_refblock[i] =
>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>> -        }
>> +
>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>> +                                 (refblock_index << s->refcount_block_bits)),
>> +               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
>> +               * s->refcount_bits / 8);
>>   
> This one's different in that you move TO a memcpy instead of open-coded
> loop.

Yes, because the set_refcount() and get_refcount() helpers store the 
refcounts already in big endian, so now we can directly use the data 
without conversion.
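
To illustrate the point, here is a small sketch of 16-bit accessors that keep every entry in big-endian order in memory, so that a range of entries can be copied to an on-disk buffer with a plain memcpy(). The function names are made up for this example, and the byte-wise stores stand in for the be16_to_cpu()/cpu_to_be16() conversions the real helpers would presumably use:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 16-bit getter: entries are stored big-endian in memory,
 * independent of host endianness. */
static uint64_t get_refcount_be16(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

/* Illustrative 16-bit setter, storing the value big-endian. */
static void set_refcount_be16(void *array, uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)array + 2 * index;
    assert(value <= UINT16_MAX);
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* Copy `count` entries starting at entry `first` into an on-disk buffer.
 * No per-entry conversion is needed because the in-memory layout already
 * matches the on-disk layout. */
static void copy_refblock_be16(void *on_disk, const void *array,
                               uint64_t first, uint64_t count)
{
    memcpy(on_disk, (const uint8_t *)array + 2 * first, 2 * count);
}
```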

> But I still worry if multiply before /8 could be a problem.
>
>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           /* Because the old reftable has been exchanged for a new one the
>>            * references have to be recalculated */
>>           rebuild = false;
>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
> Another /8 possible overflow.
>
>>           ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
>>                                     &nb_clusters);
>>           if (ret < 0) {
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 0f8eb15..1c63221 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>       QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>   } Qcow2DiscardRegion;
>>   
>> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>> +                                      uint64_t index);
>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>> +                                  uint64_t index, uint64_t value);
> Do you want int64_t for any of the types here, to make it obvious that
> you can't exceed 2^63?

Yes, I do.

> Looks like you are on track to a sane conversion, but I'm worried enough
> about the math that it probably needs a respin (either comments stating
> why we know we don't overflow, or else safer constructs).

I'll add a comment in the commit message explaining why overflows do not 
really become more likely than they were before, but the real overflow 
prevention will happen in patch 7.

I think I'll factor out the refcount array resize which automatically 
sets the newly allocated entries to zero. Now that I think about it... I 
can actually get away without sub-byte operations. Since .*realloc() 
will only allocate full bytes anyway, I only need to set that newly 
allocated area to zero (with memset()). So it'll be back to memset(). 
(I'm still wondering why there's no g_try_realloc0() or something; 
probably because GLib does not require the libc heap manager to know 
the exact size of each heap object, which would be necessary for that to 
work)
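
A rough sketch of that resize idea, using plain realloc() instead of g_try_realloc(). Both function names are hypothetical. It assumes the array was zero-initialized when first allocated and that entries beyond the old entry count were never written, so any unused bits in the last old byte are already zero:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for `entries` refcounts of (1 << refcount_order) bits each,
 * rounded up to whole bytes. */
static size_t array_byte_size(int refcount_order, uint64_t entries)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    if (refcount_order < 3) {
        int shift = 3 - refcount_order;
        return (size_t)((entries + (1u << shift) - 1) >> shift);
    }
    return (size_t)(entries << (refcount_order - 3));
}

/* Hypothetical resize helper: grow *array from old_entries to new_entries
 * and zero only the newly allocated bytes. Returns 0 on success and -1 on
 * allocation failure, in which case *array is left untouched. */
static int resize_refcount_array(void **array, int refcount_order,
                                 uint64_t old_entries, uint64_t new_entries)
{
    size_t old_size = array_byte_size(refcount_order, old_entries);
    size_t new_size = array_byte_size(refcount_order, new_entries);
    void *new_array;

    assert(new_entries >= old_entries);
    if (new_size == old_size) {
        /* The new entries fit into the already allocated (zeroed) bytes */
        return 0;
    }

    new_array = realloc(*array, new_size);
    if (!new_array) {
        return -1;
    }

    memset((char *)new_array + old_size, 0, new_size - old_size);
    *array = new_array;
    return 0;
}
```

Because only whole bytes past the old byte size are cleared, no sub-byte operations are needed even for refcount orders below 3.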

Max


* Re: [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation
  2014-11-10 22:49   ` Eric Blake
@ 2014-11-11  8:37     ` Max Reitz
  2014-11-11 10:08       ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:37 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-10 at 23:49, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a helper function which correctly calculates the byte size of a
>> refcount array for any refcount order, and use that function.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 39 ++++++++++++++++++++++++++++-----------
>>   1 file changed, 28 insertions(+), 11 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 16652da..cfb4807 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1132,6 +1132,20 @@ fail:
>>   /* refcount checking functions */
>>   
>>   
>> +static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
>> +{
>> +    if (s->refcount_order < 3) {
>> +        /* sub-byte width */
>> +        int shift = 3 - s->refcount_order;
>> +        return (entries + (1 << shift) - 1) >> shift;
>> +    } else if (s->refcount_order == 3) {
>> +        /* byte width */
>> +        return entries;
>> +    } else {
>> +        /* multiple bytes wide */
>> +        return entries << (s->refcount_order - 3);
>> +    }
> A comment proving why this can't overflow might be nice (if I analyzed
> correctly, entries will be computed by file size / clusters, and in the
> worst case, the smallest cluster and largest refcount_order results in
> '(size >> 9) << (6 - 3)' which is still safe).

Yes, will do.
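
For the record, the worst case from Eric's analysis can be written down as a small sketch. The 512-byte minimum cluster size is taken from the '>> 9' in his comment, the function name is hypothetical, and the calculation is done in uint64_t, so a host with a narrower size_t would still need a separate check:

```c
#include <assert.h>
#include <stdint.h>

#define MIN_CLUSTER_BITS 9   /* 512-byte clusters, per the analysis above */
#define MAX_REFCOUNT_ORDER 6 /* 64-bit refcount entries */

/* Upper bound on the refcount array byte size for an image of the given
 * size: one entry per (smallest possible) cluster, each entry as wide as
 * possible. */
static uint64_t worst_case_refcount_bytes(uint64_t image_size)
{
    uint64_t entries;

    assert(image_size <= (uint64_t)INT64_MAX);

    /* Round up to whole clusters without risking overflow in the addition */
    entries = (image_size >> MIN_CLUSTER_BITS) +
              ((image_size & ((UINT64_C(1) << MIN_CLUSTER_BITS) - 1)) != 0);

    /* entries <= 2^54, so shifting left by 3 cannot overflow 64 bits */
    return entries << (MAX_REFCOUNT_ORDER - 3);
}
```

Even for an image size of INT64_MAX this yields 2^57 bytes, well within 64 bits.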

>> @@ -1161,12 +1175,13 @@ static int inc_refcounts(BlockDriverState *bs,
>>           k = cluster_offset >> s->cluster_bits;
>>           if (k >= *refcount_table_size) {
>>               int64_t old_refcount_table_size = *refcount_table_size;
>> +            size_t new_byte_size;
>>               void *new_refcount_table;
>>   
>>               *refcount_table_size = k + 1;
>> -            new_refcount_table = g_try_realloc(*refcount_table,
>> -                                               *refcount_table_size *
>> -                                               s->refcount_bits / 8);
>> +            new_byte_size = refcount_array_byte_size(s, *refcount_table_size);
>> +
>> +            new_refcount_table = g_try_realloc(*refcount_table, new_byte_size);
> Yay - this addresses one of my possible overflow comments on 6/21.
>
> I wonder if the series would have less churn if you rearranged this
> patch to come before 6/21.

Why not, I'll add an __attribute__((used)) to it (which should be fine 
for the duration of a single patch).

Max


* Re: [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification
  2014-11-11  0:29   ` Eric Blake
@ 2014-11-11  8:42     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 01:29, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add helper functions for getting and setting refcounts in a refcount
>> array for any possible refcount order, and choose the correct one during
>> refcount initialization.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 143 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 141 insertions(+), 2 deletions(-)
>>
>> +
>> +static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index)
>> +{
>> +    return be64_to_cpu(((const uint64_t *)refcount_array)[index]);
>> +}
> Should this return int64_t and error out if the user ever exceeded 2**63
> via image fuzzing?

I don't know. It's nice that these helper functions cannot return an 
error and thus we don't have to check for an error. I think checking that 
the value didn't overflow in qcow2_get_refcount() should be sufficient 
and relieves the other callers (mainly the image check for its in-memory 
refcount table/array) which know that the value cannot overflow from 
error checking.

Although I did forget an overflow check after the call to get_refcount() 
in update_refcount_discard().

>> +
>> +static void set_refcount_ro6(void *refcount_array, uint64_t index,
>> +                             uint64_t value)
>> +{
>> +    ((uint64_t *)refcount_array)[index] = cpu_to_be64(value);
>> +}
> Should this assert that value <= INT64_MAX, since that's what the rest
> of the code caps it to?

Yes, that it should most certainly do.
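
Something along these lines, as a sketch only. The names are invented for the example, and the byte-wise loops stand in for be64_to_cpu()/cpu_to_be64() so that the snippet is self-contained:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 64-bit getter: reads a big-endian entry. It cannot fail, so
 * range checking of the returned value is left to the caller. */
static uint64_t get_refcount_be64(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + 8 * index;
    uint64_t value = 0;

    for (int i = 0; i < 8; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}

/* Illustrative 64-bit setter: asserts that the value stays within the
 * range the rest of the code caps refcounts to. */
static void set_refcount_be64(void *array, uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)array + 8 * index;

    assert(value <= (uint64_t)INT64_MAX);
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}
```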

Max


* Re: [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2
  2014-11-11  5:40   ` Eric Blake
@ 2014-11-11  8:48     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  8:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 06:40, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a refcount_order parameter to qcow2_create2(), use that value for
>> the image header and for calculating the size required for
>> preallocation.
>>
>> For now, always pass 4.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 41 ++++++++++++++++++++++++++++++-----------
>>   1 file changed, 30 insertions(+), 11 deletions(-)
>>
>> @@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>>           int64_t meta_size = 0;
>>           uint64_t nreftablee, nrefblocke, nl1e, nl2e;
>>           int64_t aligned_total_size = align_offset(total_size, cluster_size);
>> +        int refblock_bits, refblock_size;
>> +        /* refcount entry size in bytes */
>> +        double rces = (1 << refcount_order) / 8.;
> Would float be any simpler than double?

Maybe. I'm generally in favor of using float whenever possible, but I 
don't think it's necessary here. Performance is not really an issue 
here, and having more precision is probably better.

>> +
>> +        /* see qcow2_open() */
>> +        refblock_bits = cluster_bits - (refcount_order - 3);
>> +        refblock_size = 1 << refblock_bits;
>>   
>>           /* header: 1 cluster */
>>           meta_size += cluster_size;
>> @@ -1835,20 +1842,20 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>>            *   c = cluster size
>>            *   y1 = number of refcount blocks entries
>>            *   y2 = meta size including everything
>> +         *   rces = refcount entry size in bytes
>>            * then,
>>            *   y1 = (y2 + a)/c
>> -         *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
>> +         *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
> Hmm. This changes from integral to floating point.  Are we going to
> suffer from any rounding problems?

We are already suffering from rounding problems. See the actual code 
below this comment.

> I guess you want double to ensure
> maximum precision; but whereas earlier patches were limited with
> refcount_order of 6 to 2^63, this now limits us around 2^53 if we are
> still trying to be accurate.

We are not trying to be accurate, which is one of the reasons I rejected 
the first version of the patch which introduced this code (though that 
version had more issues which had been fixed in the final version). If 
you remember, I did write a very similar function which only used 
integral types in order not to suffer from precision problems. Some 
version of this code used that function, but then Kevin rejected my 
patch because it was too complicated, and this was taken in with the 
argument of not having to be really exact when it comes to 
preallocation: Being close enough should be fine.

So, in short, this code did not try to be exact from the beginning (it 
did use floating point arithmetic before this patch, see the "1.0 *"). 
If we want to be exact, there's still the code which I have the LaTeX 
PDF for. ;-)
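
To make the precision point concrete, here is a small sketch of the rces calculation and of the 2^53 limit Eric mentioned. It assumes double is an IEEE 754 binary64 type; the function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Refcount entry size in bytes for a given refcount order; fractional for
 * sub-byte widths (orders 0 to 2). */
static double refcount_entry_size(int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return (1 << refcount_order) / 8.0;
}

/* Returns nonzero if n survives a round trip through double unchanged.
 * The volatile store forces rounding to double precision even on hosts
 * that evaluate floating point expressions with excess precision. */
static int exact_in_double(uint64_t n)
{
    volatile double d = (double)n;

    /* Converting a double >= 2^64 back to uint64_t would be undefined */
    return d < 18446744073709551616.0 && (uint64_t)d == n;
}
```

With a 53-bit significand, 2^53 is still represented exactly, while 2^53 + 1 is not, which is where the calculation stops being exact for very large values.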

Max


* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-10 20:59   ` Eric Blake
  2014-11-11  8:12     ` Max Reitz
@ 2014-11-11  9:22     ` Kevin Wolf
  2014-11-11  9:25       ` Max Reitz
  1 sibling, 1 reply; 75+ messages in thread
From: Kevin Wolf @ 2014-11-11  9:22 UTC (permalink / raw)
  To: Eric Blake; +Cc: Peter Lieven, qemu-devel, Stefan Hajnoczi, Max Reitz


Am 10.11.2014 um 21:59 hat Eric Blake geschrieben:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
> > Refcounts may have a width of up to 64 bit, so qemu should use the same
> 
> s/bit/bits/
> 
> > width to represent refcount values internally.
> > 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  block/qcow2-cluster.c  |  9 ++++++---
> >  block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
> >  block/qcow2.h          |  7 ++++---
> >  3 files changed, 30 insertions(+), 23 deletions(-)
> > 
> > diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> > index df0b2c9..ab43902 100644
> > --- a/block/qcow2-cluster.c
> > +++ b/block/qcow2-cluster.c
> > @@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
> >      for (i = 0; i < l1_size; i++) {
> >          uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
> >          bool l2_dirty = false;
> > -        int l2_refcount;
> > +        int64_t l2_refcount;
> 
> You may want to mention in the commit message that you choose a signed
> type to allow negative for errors, and therefore we really allow only up
> to 63 useful bits.  Or even mention that this is  okay because no one
> can feasibly generate an image with more than 2^63 refs to the same
> cluster (there isn't that much storage or time to do such a task in our
> lifetime...)

Should patch 1 then set refcount_max = 2^63 for refcount order 6?

Also note that while it might not be feasible to create a cluster with
2^63 references, this doesn't mean that it's impossible to create a
cluster with a stored refcount of (more than) 2^63. We'll have to have
checks there.

Kevin



* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-11  9:22     ` Kevin Wolf
@ 2014-11-11  9:25       ` Max Reitz
  2014-11-11  9:26         ` Max Reitz
  2014-11-11  9:36         ` Kevin Wolf
  0 siblings, 2 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  9:25 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: Peter Lieven, qemu-devel, Stefan Hajnoczi

On 2014-11-11 at 10:22, Kevin Wolf wrote:
> Am 10.11.2014 um 21:59 hat Eric Blake geschrieben:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Refcounts may have a width of up to 64 bit, so qemu should use the same
>> s/bit/bits/
>>
>>> width to represent refcount values internally.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-cluster.c  |  9 ++++++---
>>>   block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
>>>   block/qcow2.h          |  7 ++++---
>>>   3 files changed, 30 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index df0b2c9..ab43902 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>>>       for (i = 0; i < l1_size; i++) {
>>>           uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
>>>           bool l2_dirty = false;
>>> -        int l2_refcount;
>>> +        int64_t l2_refcount;
>> You may want to mention in the commit message that you choose a signed
>> type to allow negative for errors, and therefore we really allow only up
>> to 63 useful bits.  Or even mention that this is  okay because no one
>> can feasibly generate an image with more than 2^63 refs to the same
>> cluster (there isn't that much storage or time to do such a task in our
>> lifetime...)
> Should patch 1 then set refcount_max = 2^63 for refcount order 6?

It does set refcount_max to INT64_MAX (instead of UINT64_MAX, and there 
is a comment above that line explaining why it's the signed maximum).

> Also note that while it might not be feasible to create a cluster with
> 2^63 references, this doesn't mean that it's impossible to create a
> cluster with a stored refcount of (more than) 2^63. We'll have to have
> checks there.

Yes, the check is done in qcow2_get_refcount() (and needs to be done in 
update_refcount_discard() as well, which I forgot in this version) and 
consists of returning -ERANGE on error.
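
In isolation, the check amounts to something like the following sketch. The function name is hypothetical; in the patch the test is done inline in qcow2_get_refcount() on the signed value:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical wrapper: turn a raw stored refcount into the signed value
 * handed to callers, where negative values are reserved for -errno codes. */
static int64_t refcount_or_error(uint64_t stored)
{
    int64_t refcount;

    if (stored > (uint64_t)INT64_MAX) {
        /* The stored value does not fit into the 63 usable bits */
        return -ERANGE;
    }

    refcount = (int64_t)stored;
    assert(refcount >= 0);
    return refcount;
}
```

This way a fuzzed image with a stored refcount of 2^63 or more is reported as an error instead of being misread as a negative errno value.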

Max


* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-11  9:25       ` Max Reitz
@ 2014-11-11  9:26         ` Max Reitz
  2014-11-11  9:36         ` Kevin Wolf
  1 sibling, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  9:26 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake; +Cc: Peter Lieven, qemu-devel, Stefan Hajnoczi

On 2014-11-11 at 10:25, Max Reitz wrote:
> On 2014-11-11 at 10:22, Kevin Wolf wrote:
>> Am 10.11.2014 um 21:59 hat Eric Blake geschrieben:
>>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>>> Refcounts may have a width of up to 64 bit, so qemu should use the 
>>>> same
>>> s/bit/bits/
>>>
>>>> width to represent refcount values internally.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>   block/qcow2-cluster.c  |  9 ++++++---
>>>>   block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
>>>>   block/qcow2.h          |  7 ++++---
>>>>   3 files changed, 30 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>>> index df0b2c9..ab43902 100644
>>>> --- a/block/qcow2-cluster.c
>>>> +++ b/block/qcow2-cluster.c
>>>> @@ -1640,7 +1640,7 @@ static int 
>>>> expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>>>>       for (i = 0; i < l1_size; i++) {
>>>>           uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
>>>>           bool l2_dirty = false;
>>>> -        int l2_refcount;
>>>> +        int64_t l2_refcount;
>>> You may want to mention in the commit message that you choose a signed
>>> type to allow negative for errors, and therefore we really allow 
>>> only up
>>> to 63 useful bits.  Or even mention that this is  okay because no one
>>> can feasibly generate an image with more than 2^63 refs to the same
>>> cluster (there isn't that much storage or time to do such a task in our
>>> lifetime...)
>> Should patch 1 then set refcount_max = 2^63 for refcount order 6?
>
> It does set refcount_max to INT64_MAX (instead of UINT64_MAX, and 
> there is a comment above that line why it's the signed maximum).
>
>> Also note that while it might not be feasible to create a cluster with
>> 2^63 references, this doesn't mean that it's impossible to create a
>> cluster with a stored refcount of (more than) 2^63. We'll have to have
>> checks there.
>
> Yes, the check is done in qcow2_get_refcount() (and needs to be done 
> in update_refcount_discard() as well, which I forgot in this version) 
> and consists of returning -ERANGE on error.

s/error/overflow/


* Re: [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values
  2014-11-11  9:25       ` Max Reitz
  2014-11-11  9:26         ` Max Reitz
@ 2014-11-11  9:36         ` Kevin Wolf
  1 sibling, 0 replies; 75+ messages in thread
From: Kevin Wolf @ 2014-11-11  9:36 UTC (permalink / raw)
  To: Max Reitz; +Cc: Peter Lieven, qemu-devel, Stefan Hajnoczi

Am 11.11.2014 um 10:25 hat Max Reitz geschrieben:
> On 2014-11-11 at 10:22, Kevin Wolf wrote:
> >Am 10.11.2014 um 21:59 hat Eric Blake geschrieben:
> >>On 11/10/2014 06:45 AM, Max Reitz wrote:
> >>>Refcounts may have a width of up to 64 bit, so qemu should use the same
> >>s/bit/bits/
> >>
> >>>width to represent refcount values internally.
> >>>
> >>>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>---
> >>>  block/qcow2-cluster.c  |  9 ++++++---
> >>>  block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
> >>>  block/qcow2.h          |  7 ++++---
> >>>  3 files changed, 30 insertions(+), 23 deletions(-)
> >>>
> >>>diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> >>>index df0b2c9..ab43902 100644
> >>>--- a/block/qcow2-cluster.c
> >>>+++ b/block/qcow2-cluster.c
> >>>@@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
> >>>      for (i = 0; i < l1_size; i++) {
> >>>          uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
> >>>          bool l2_dirty = false;
> >>>-        int l2_refcount;
> >>>+        int64_t l2_refcount;
> >>You may want to mention in the commit message that you choose a signed
> >>type to allow negative for errors, and therefore we really allow only up
> >>to 63 useful bits.  Or even mention that this is  okay because no one
> >>can feasibly generate an image with more than 2^63 refs to the same
> >>cluster (there isn't that much storage or time to do such a task in our
> >>lifetime...)
> >Should patch 1 then set refcount_max = 2^63 for refcount order 6?
> 
> It does set refcount_max to INT64_MAX (instead of UINT64_MAX, and
> there is a comment above that line why it's the signed maximum).

Right, I should read the patches instead of just Eric's reply before
replying myself...

Kevin


* Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
  2014-11-11  8:35     ` Max Reitz
@ 2014-11-11  9:43       ` Max Reitz
  2014-11-11 10:56       ` Max Reitz
  1 sibling, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11  9:43 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 09:35, Max Reitz wrote:
> On 2014-11-10 at 23:30, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Since refcounts do not always have to be a uint16_t, all refcount 
>>> blocks
>>> and arrays in memory should not have a specific type (thus they become
>>> pointers to void) and for accessing them, two helper functions are used
>>> (a getter and a setter). Those functions are called indirectly through
>>> function pointers in the BDRVQcowState so they may later be exchanged
>>> for different refcount orders.
>>>
>>> At the same time, replace all sizeof(**refcount_table) etc. in the 
>>> qcow2
>>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>>> values due to truncating division, but currently s->refcount_bits is
>>> always 16, and before the upcoming patch which removes this limitation
>>> another patch will make the division round up correctly.
>> Thanks for pointing out that this transition is still in progress, and
>> needs more patches.  I agree that for this patch, the division is safe.
>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 152 
>>> +++++++++++++++++++++++++++++--------------------
>>>   block/qcow2.h          |   8 +++
>>>   2 files changed, 98 insertions(+), 62 deletions(-)
>>>
>>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState 
>>> *bs, int64_t cluster_index)
>>>       }
>>>         ret = qcow2_cache_get(bs, s->refcount_block_cache, 
>>> refcount_block_offset,
>>> -        (void**) &refcount_block);
>>> +                          &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>         block_index = cluster_index & (s->refcount_block_size - 1);
>>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>>> +    refcount = s->get_refcount(refcount_block, block_index);
>>>   -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> -        (void**) &refcount_block);
>>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, 
>>> &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>   +    if (refcount < 0) {
>>> +        /* overflow */
>>> +        return -ERANGE;
>>> +    }
>> Should you be checking for overflow prior to calling qcow2_cache_put?
>
> I don't think so; we want to free the cache entry reference even if 
> the refblock seems unusable.
>
>>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState 
>>> *bs,
>>>           s->cluster_size;
>>>       uint64_t table_offset = meta_offset + blocks_clusters * 
>>> s->cluster_size;
>>>       uint64_t *new_table = g_try_new0(uint64_t, table_size);
>>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>>> +    void *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>> Can this multiplication ever overflow?  Would we be better off with a
>> g_try_new0 approach?
>
> Yes, you're right. Patch 7 introduces a helper function which will 
> allow checking such overflows in a central place, but I haven't done 
> so in this version.
>
>>> @@ -1118,12 +1142,13 @@ fail:
>>>    */
>>>   static int inc_refcounts(BlockDriverState *bs,
>>>                            BdrvCheckResult *res,
>>> -                         uint16_t **refcount_table,
>>> +                         void **refcount_table,
>>>                            int64_t *refcount_table_size,
>>>                            int64_t offset, int64_t size)
>>>   {
>>>       BDRVQcowState *s = bs->opaque;
>>> -    uint64_t start, last, cluster_offset, k;
>>> +    uint64_t start, last, cluster_offset, k, refcount;
>> Why uint64_t, when you limit to int64_t in other patches?
>
> Because we don't need to check for errors here. But I guess we should 
> really use int64_t everywhere to be consistent (so that if something 
> breaks because a cluster has more than INT64_MAX references, it breaks 
> everywhere).
>
>>> +    int64_t i;
>>>         if (size <= 0) {
>>>           return 0;
>>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>>           k = cluster_offset >> s->cluster_bits;
>>>           if (k >= *refcount_table_size) {
>>>               int64_t old_refcount_table_size = *refcount_table_size;
>>> -            uint16_t *new_refcount_table;
>>> +            void *new_refcount_table;
>>>                 *refcount_table_size = k + 1;
>>>               new_refcount_table = g_try_realloc(*refcount_table,
>>> *refcount_table_size *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> This multiplies before dividing.  Can it ever overflow, where writing
>> *refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
>> it better to use a malloc variant that checks for overflow (I think it
>> is g_try_renew?) instead of open-coding the multiply?
>>
>>>               if (!new_refcount_table) {
>>>                   *refcount_table_size = old_refcount_table_size;
>>>                   res->check_errors++;
>>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>>               }
>>>               *refcount_table = new_refcount_table;
>>>   -            memset(*refcount_table + old_refcount_table_size, 0,
>>> -                   (*refcount_table_size - old_refcount_table_size) *
>>> -                   sizeof(**refcount_table));
>>> +            for (i = old_refcount_table_size; i < 
>>> *refcount_table_size; i++) {
>>> +                s->set_refcount(*refcount_table, i, 0);
>>> +            }
>> This feels slower than memset.
>
> It is, yes. But this is the check function, I don't think performance 
> is all that important here (especially not operations in RAM).
>
>> Any chance we can add an optimization
>> that brings back the speed of memset (may require an additional callback
>> in addition to the getter and setter)?
>
> For sub-byte refcount widths, we would have to manually set all 
> non-byte aligned entries to 0 and then use memset() on the rest. Not 
> impossible, but I think too complicated for a place where performance 
> is not critical.
>
>>> @@ -1178,7 +1206,7 @@ enum {
>>>    * error occurred.
>>>    */
>>>   static int check_refcounts_l2(BlockDriverState *bs, 
>>> BdrvCheckResult *res,
>>> -    uint16_t **refcount_table, int64_t *refcount_table_size, 
>>> int64_t l2_offset,
>>> +    void **refcount_table, int64_t *refcount_table_size, int64_t 
>>> l2_offset,
>>>       int flags)
>> I noticed you cleaned up indentation in a lot of the patch, but not
>> here.  Any reason?
>
> Maybe I remembered touching that header once already and not fixing up 
> the indentation. Will do in v2.
>
>>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                     new_refcount_table = g_try_realloc(*refcount_table,
>>> *nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow or g_try_renew site?
>>
>>>                   if (!new_refcount_table) {
>>>                       *nb_clusters = old_nb_clusters;
>>>                       res->check_errors++;
>>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                   }
>>>                   *refcount_table = new_refcount_table;
>>>   -                memset(*refcount_table + old_nb_clusters, 0,
>>> -                       (*nb_clusters - old_nb_clusters) *
>>> -                       sizeof(**refcount_table));
>>> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
>>> +                    s->set_refcount(*refcount_table, j, 0);
>>> +                }
>> Another memset pessimation.  Maybe even having a callback to expand the
>> table, and factor out more of the common code of reallocating the table
>> and clearing all new entries.
>
> That sounds useful, yes. I'll look into it, but I'll probably still go 
> without memset().
>
>>> @@ -1611,7 +1640,7 @@ static int 
>>> calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>>       int ret;
>>>         if (!*refcount_table) {
>>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>>> +        *refcount_table = g_try_malloc0(*nb_clusters * 
>>> s->refcount_bits / 8);
>> Feels like a step backwards in overflow detection?
>>
>>> @@ -1787,22 +1816,22 @@ static int64_t 
>>> alloc_clusters_imrt(BlockDriverState *bs,
>>>           *imrt_nb_clusters = cluster + cluster_count - 
>>> contiguous_free_clusters;
>>>           new_refcount_table = g_try_realloc(*refcount_table,
>>>                                              *imrt_nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow
>>
>>>           if (!new_refcount_table) {
>>>               *imrt_nb_clusters = old_imrt_nb_clusters;
>>>               return -ENOMEM;
>>>           }
>>>           *refcount_table = new_refcount_table;
>>>   -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>>> -               sizeof(**refcount_table));
>>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>>> +            s->set_refcount(*refcount_table, i, 0);
>>> +        }
>>>       }
>> and another resize where we pessimize memset
>>
>>
>>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>>           }
>>>             on_disk_refblock = qemu_blockalign0(bs->file, 
>>> s->cluster_size);
>>> -        for (i = 0; i < s->refcount_block_size &&
>>> -                    refblock_start + i < *nb_clusters; i++)
>>> -        {
>>> -            on_disk_refblock[i] =
>>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>>> -        }
>>> +
>>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>>> +                                 (refblock_index << 
>>> s->refcount_block_bits)),
>>> +               MIN(s->refcount_block_size, *nb_clusters - 
>>> refblock_start)
>>> +               * s->refcount_bits / 8);
>> This one's different in that you move TO a memcpy instead of open-coded
>> loop.
>
> Yes, because the set_refcount() and get_refcount() helpers store the 
> refcounts already in big endian, so now we can directly use the data 
> without conversion.
>
>> But I still worry if multiply before /8 could be a problem.
>>
>>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>           /* Because the old reftable has been exchanged for a new 
>>> one the
>>>            * references have to be recalculated */
>>>           rebuild = false;
>>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
>> Another /8 possible overflow.
>>
>>>           ret = calculate_refcounts(bs, res, 0, &rebuild, 
>>> &refcount_table,
>>>                                     &nb_clusters);
>>>           if (ret < 0) {
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 0f8eb15..1c63221 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>>       QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>>   } Qcow2DiscardRegion;
>>>   +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>> +                                      uint64_t index);
>>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>> +                                  uint64_t index, uint64_t value);
>> Do you want int64_t for any of the types here, to make it obvious that
>> you can't exceed 2^63?
>
> Yes, I do.

To be honest, I just read your reply and not the original code (which 
I'm doing now while working on v2). I think I don't want int64_t after 
all. I generally do want to use int64_t for all refcount values, but in 
this case, as these are just helpers for directly accessing refcount 
arrays (which I think should not fail, see my reply for patch 8), I'd 
rather keep them using uint64_t.

Max

>> Looks like you are on track to a sane conversion, but I'm worried enough
>> about the math that it probably needs a respin (either comments stating
>> why we know we don't overflow, or else safer constructs).
>
> I'll add a comment in the commit message why overflows do not really 
> get more probable than they were before, but the real overflow 
> prevention will happen in patch 7.
>
> I think I'll factor out the refcount array resize which automatically 
> sets the newly allocated entries to zero. Now that I think about it... 
> I can actually get away without sub-byte operations. Since .*realloc() 
> will only allocate full bytes anyway, I only need to set that newly 
> allocated area to zero (with memset()). So it'll be back to memset(). 
> (I'm still wondering why there's no g_try_realloc0() or something; 
> probably because the glib does not require the libc heap manager to 
> know the exact size of each heap object which would be necessary for 
> that to work)
>
> Max
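
To illustrate the resize idea from the last paragraph above: since realloc() hands back whole bytes, zeroing the newly added bytes with memset() is enough even for sub-byte refcount widths. What follows is only a sketch under assumptions. It uses plain realloc() instead of g_try_realloc(), it only handles growth, and the names byte_size_sketch() and resize_refcount_array_sketch() are hypothetical, not taken from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for `entries` refcounts of (1 << order) bits each (order 0..6). */
static size_t byte_size_sketch(int order, uint64_t entries)
{
    if (order < 3) {
        /* Sub-byte width: 2^shift entries share one byte; round up. */
        int shift = 3 - order;
        return (size_t)((entries + (UINT64_C(1) << shift) - 1) >> shift);
    }
    return (size_t)(entries << (order - 3));
}

/*
 * Grow *array from *size to new_size entries and zero the new tail.
 * Returns 0 on success and -1 on allocation failure (array and size are
 * then left untouched). Assumes the unused bits in the last old byte are
 * already zero, which holds if the array only ever grew through this helper.
 */
static int resize_refcount_array_sketch(void **array, uint64_t *size,
                                        uint64_t new_size, int order)
{
    size_t old_bytes = byte_size_sketch(order, *size);
    size_t new_bytes = byte_size_sketch(order, new_size);

    assert(new_size >= *size);   /* this sketch only grows */

    if (new_bytes > old_bytes) {
        void *new_array = realloc(*array, new_bytes);
        if (!new_array) {
            return -1;
        }
        /* Only whole bytes are new, so no sub-byte masking is needed. */
        memset((uint8_t *)new_array + old_bytes, 0, new_bytes - old_bytes);
        *array = new_array;
    }
    *size = new_size;
    return 0;
}
```

A caller would start from a NULL array with size 0 and call the helper whenever a cluster index beyond the current size shows up.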


* Re: [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation
  2014-11-11  8:37     ` Max Reitz
@ 2014-11-11 10:08       ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11 10:08 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 09:37, Max Reitz wrote:
> On 2014-11-10 at 23:49, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Add a helper function which correctly calculates the byte size of a
>>> refcount array for any refcount order, and use that function.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 39 ++++++++++++++++++++++++++++-----------
>>>   1 file changed, 28 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>>> index 16652da..cfb4807 100644
>>> --- a/block/qcow2-refcount.c
>>> +++ b/block/qcow2-refcount.c
>>> @@ -1132,6 +1132,20 @@ fail:
>>>   /* refcount checking functions */
>>>     +static size_t refcount_array_byte_size(BDRVQcowState *s, 
>>> uint64_t entries)
>>> +{
>>> +    if (s->refcount_order < 3) {
>>> +        /* sub-byte width */
>>> +        int shift = 3 - s->refcount_order;
>>> +        return (entries + (1 << shift) - 1) >> shift;
>>> +    } else if (s->refcount_order == 3) {
>>> +        /* byte width */
>>> +        return entries;
>>> +    } else {
>>> +        /* multiple bytes wide */
>>> +        return entries << (s->refcount_order - 3);
>>> +    }
>> A comment proving why this can't overflow might be nice (if I analyzed
>> correctly, entries will be computed by file size / clusters, and in the
>> worst case, the smallest cluster and largest refcount_order results in
>> '(size >> 9) << (6 - 3)' which is still safe).
>
> Yes, will do.
>
>>> @@ -1161,12 +1175,13 @@ static int inc_refcounts(BlockDriverState *bs,
>>>           k = cluster_offset >> s->cluster_bits;
>>>           if (k >= *refcount_table_size) {
>>>               int64_t old_refcount_table_size = *refcount_table_size;
>>> +            size_t new_byte_size;
>>>               void *new_refcount_table;
>>>                 *refcount_table_size = k + 1;
>>> -            new_refcount_table = g_try_realloc(*refcount_table,
>>> - *refcount_table_size *
>>> - s->refcount_bits / 8);
>>> +            new_byte_size = refcount_array_byte_size(s, 
>>> *refcount_table_size);
>>> +
>>> +            new_refcount_table = g_try_realloc(*refcount_table, 
>>> new_byte_size);
>> Yay - this addresses one of my possible overflow comments on 6/21.
>>
>> I wonder if the series would have less churn if you rearranged this
>> patch to come before 6/21.
>
> Why not, I'll add an __attribute__((used)) to it (which should be fine 
> for the duration of a single patch).

I'm not sure why I thought that might be necessary. Of course it isn't.

Max
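
For reference, the size calculation quoted above can be tried out in isolation. This is a sketch only: refcount_array_byte_size_sketch() is a hypothetical standalone variant that takes the refcount order as a parameter instead of reading it from BDRVQcowState, and it folds the byte-wide case into the multi-byte branch (a shift by zero). The overflow argument is the one given in the review: entries is bounded by the file size divided by the cluster size, and the largest shift is 6 - 3 = 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone variant of the helper from the patch. */
static size_t refcount_array_byte_size_sketch(int refcount_order,
                                              uint64_t entries)
{
    /* qcow2 allows refcount widths of 1..64 bits, i.e. orders 0..6. */
    assert(refcount_order >= 0 && refcount_order <= 6);

    if (refcount_order < 3) {
        /* Sub-byte width: 2^shift entries share one byte; round up. */
        int shift = 3 - refcount_order;
        return (size_t)((entries + (UINT64_C(1) << shift) - 1) >> shift);
    }
    /* One or more bytes per entry; order 3 shifts by zero. */
    return (size_t)(entries << (refcount_order - 3));
}
```

Compared with `entries * refcount_bits / 8`, this rounds up for sub-byte widths instead of truncating.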


* Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
  2014-11-11  8:35     ` Max Reitz
  2014-11-11  9:43       ` Max Reitz
@ 2014-11-11 10:56       ` Max Reitz
  1 sibling, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-11 10:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 09:35, Max Reitz wrote:
> On 2014-11-10 at 23:30, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Since refcounts do not always have to be a uint16_t, all refcount 
>>> blocks
>>> and arrays in memory should not have a specific type (thus they become
>>> pointers to void) and for accessing them, two helper functions are used
>>> (a getter and a setter). Those functions are called indirectly through
>>> function pointers in the BDRVQcowState so they may later be exchanged
>>> for different refcount orders.
>>>
>>> At the same time, replace all sizeof(**refcount_table) etc. in the 
>>> qcow2
>>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>>> values due to truncating division, but currently s->refcount_bits is
>>> always 16, and before the upcoming patch which removes this limitation
>>> another patch will make the division round up correctly.
>> Thanks for pointing out that this transition is still in progress, and
>> needs more patches.  I agree that for this patch, the division is safe.
>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 152 
>>> +++++++++++++++++++++++++++++--------------------
>>>   block/qcow2.h          |   8 +++
>>>   2 files changed, 98 insertions(+), 62 deletions(-)
>>>
>>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState 
>>> *bs, int64_t cluster_index)
>>>       }
>>>         ret = qcow2_cache_get(bs, s->refcount_block_cache, 
>>> refcount_block_offset,
>>> -        (void**) &refcount_block);
>>> +                          &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>         block_index = cluster_index & (s->refcount_block_size - 1);
>>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>>> +    refcount = s->get_refcount(refcount_block, block_index);
>>>   -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> -        (void**) &refcount_block);
>>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, 
>>> &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>   +    if (refcount < 0) {
>>> +        /* overflow */
>>> +        return -ERANGE;
>>> +    }
>> Should you be checking for overflow prior to calling qcow2_cache_put?
>
> I don't think so; we want to free the cache entry reference even if 
> the refblock seems unusable.
>
>>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState 
>>> *bs,
>>>           s->cluster_size;
>>>       uint64_t table_offset = meta_offset + blocks_clusters * 
>>> s->cluster_size;
>>>       uint64_t *new_table = g_try_new0(uint64_t, table_size);
>>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>>> +    void *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>> Can this multiplication ever overflow?  Would we be better off with a
>> g_try_new0 approach?
>
> Yes, you're right. Patch 7 introduces a helper function which will 
> allow checking such overflows in a central place, but I haven't done 
> so in this version.

I should really start reading my own code before responding; at least be 
assured that I do read my code when working on a respin. Of course this 
has nothing to do with patch 7, and I'll fix it. I don't think there'll 
be a problem here (because first, it would be pre-existing, and second, 
this would require offsets beyond 64 bits).

Max

>>> @@ -1118,12 +1142,13 @@ fail:
>>>    */
>>>   static int inc_refcounts(BlockDriverState *bs,
>>>                            BdrvCheckResult *res,
>>> -                         uint16_t **refcount_table,
>>> +                         void **refcount_table,
>>>                            int64_t *refcount_table_size,
>>>                            int64_t offset, int64_t size)
>>>   {
>>>       BDRVQcowState *s = bs->opaque;
>>> -    uint64_t start, last, cluster_offset, k;
>>> +    uint64_t start, last, cluster_offset, k, refcount;
>> Why uint64_t, when you limit to int64_t in other patches?
>
> Because we don't need to check for errors here. But I guess we should 
> really use int64_t everywhere to be consistent (so that if something 
> breaks because a cluster has more than INT64_MAX references, it breaks 
> everywhere).
>
>>> +    int64_t i;
>>>         if (size <= 0) {
>>>           return 0;
>>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>>           k = cluster_offset >> s->cluster_bits;
>>>           if (k >= *refcount_table_size) {
>>>               int64_t old_refcount_table_size = *refcount_table_size;
>>> -            uint16_t *new_refcount_table;
>>> +            void *new_refcount_table;
>>>                 *refcount_table_size = k + 1;
>>>               new_refcount_table = g_try_realloc(*refcount_table,
>>> *refcount_table_size *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> This multiplies before dividing.  Can it ever overflow, where writing
>> *refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
>> it better to use a malloc variant that checks for overflow (I think it
>> is g_try_renew?) instead of open-coding the multiply?
>>
>>>               if (!new_refcount_table) {
>>>                   *refcount_table_size = old_refcount_table_size;
>>>                   res->check_errors++;
>>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>>               }
>>>               *refcount_table = new_refcount_table;
>>>   -            memset(*refcount_table + old_refcount_table_size, 0,
>>> -                   (*refcount_table_size - old_refcount_table_size) *
>>> -                   sizeof(**refcount_table));
>>> +            for (i = old_refcount_table_size; i < 
>>> *refcount_table_size; i++) {
>>> +                s->set_refcount(*refcount_table, i, 0);
>>> +            }
>> This feels slower than memset.
>
> It is, yes. But this is the check function, I don't think performance 
> is all that important here (especially not operations in RAM).
>
>> Any chance we can add an optimization
>> that brings back the speed of memset (may require an additional callback
>> in addition to the getter and setter)?
>
> For sub-byte refcount widths, we would have to manually set all 
> non-byte aligned entries to 0 and then use memset() on the rest. Not 
> impossible, but I think too complicated for a place where performance 
> is not critical.
>
>>> @@ -1178,7 +1206,7 @@ enum {
>>>    * error occurred.
>>>    */
>>>   static int check_refcounts_l2(BlockDriverState *bs, 
>>> BdrvCheckResult *res,
>>> -    uint16_t **refcount_table, int64_t *refcount_table_size, 
>>> int64_t l2_offset,
>>> +    void **refcount_table, int64_t *refcount_table_size, int64_t 
>>> l2_offset,
>>>       int flags)
>> I noticed you cleaned up indentation in a lot of the patch, but not
>> here.  Any reason?
>
> Maybe I remembered touching that header once already and not fixing up 
> the indentation. Will do in v2.
>
>>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                     new_refcount_table = g_try_realloc(*refcount_table,
>>> *nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow or g_try_renew site?
>>
>>>                   if (!new_refcount_table) {
>>>                       *nb_clusters = old_nb_clusters;
>>>                       res->check_errors++;
>>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                   }
>>>                   *refcount_table = new_refcount_table;
>>>   -                memset(*refcount_table + old_nb_clusters, 0,
>>> -                       (*nb_clusters - old_nb_clusters) *
>>> -                       sizeof(**refcount_table));
>>> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
>>> +                    s->set_refcount(*refcount_table, j, 0);
>>> +                }
>> Another memset pessimation.  Maybe even having a callback to expand the
>> table, and factor out more of the common code of reallocating the table
>> and clearing all new entries.
>
> That sounds useful, yes. I'll look into it, but I'll probably still go 
> without memset().
>
>>> @@ -1611,7 +1640,7 @@ static int 
>>> calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>>       int ret;
>>>         if (!*refcount_table) {
>>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>>> +        *refcount_table = g_try_malloc0(*nb_clusters * 
>>> s->refcount_bits / 8);
>> Feels like a step backwards in overflow detection?
>>
>>> @@ -1787,22 +1816,22 @@ static int64_t 
>>> alloc_clusters_imrt(BlockDriverState *bs,
>>>           *imrt_nb_clusters = cluster + cluster_count - 
>>> contiguous_free_clusters;
>>>           new_refcount_table = g_try_realloc(*refcount_table,
>>>                                              *imrt_nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow
>>
>>>           if (!new_refcount_table) {
>>>               *imrt_nb_clusters = old_imrt_nb_clusters;
>>>               return -ENOMEM;
>>>           }
>>>           *refcount_table = new_refcount_table;
>>>   -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>>> -               sizeof(**refcount_table));
>>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>>> +            s->set_refcount(*refcount_table, i, 0);
>>> +        }
>>>       }
>> and another resize where we pessimize memset
>>
>>
>>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>>           }
>>>             on_disk_refblock = qemu_blockalign0(bs->file, 
>>> s->cluster_size);
>>> -        for (i = 0; i < s->refcount_block_size &&
>>> -                    refblock_start + i < *nb_clusters; i++)
>>> -        {
>>> -            on_disk_refblock[i] =
>>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>>> -        }
>>> +
>>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>>> +                                 (refblock_index << 
>>> s->refcount_block_bits)),
>>> +               MIN(s->refcount_block_size, *nb_clusters - 
>>> refblock_start)
>>> +               * s->refcount_bits / 8);
>> This one's different in that you move TO a memcpy instead of open-coded
>> loop.
>
> Yes, because the set_refcount() and get_refcount() helpers store the 
> refcounts already in big endian, so now we can directly use the data 
> without conversion.
>
>> But I still worry if multiply before /8 could be a problem.
>>
>>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>           /* Because the old reftable has been exchanged for a new 
>>> one the
>>>            * references have to be recalculated */
>>>           rebuild = false;
>>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
>> Another /8 possible overflow.
>>
>>>           ret = calculate_refcounts(bs, res, 0, &rebuild, 
>>> &refcount_table,
>>>                                     &nb_clusters);
>>>           if (ret < 0) {
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 0f8eb15..1c63221 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>>       QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>>   } Qcow2DiscardRegion;
>>>   +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>> +                                      uint64_t index);
>>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>> +                                  uint64_t index, uint64_t value);
>> Do you want int64_t for any of the types here, to make it obvious that
>> you can't exceed 2^63?
>
> Yes, I do.
>
>> Looks like you are on track to a sane conversion, but I'm worried enough
>> about the math that it probably needs a respin (either comments stating
>> why we know we don't overflow, or else safer constructs).
>
> I'll add a comment in the commit message why overflows do not really 
> get more probable than they were before, but the real overflow 
> prevention will happen in patch 7.
>
> I think I'll factor out the refcount array resize which automatically 
> sets the newly allocated entries to zero. Now that I think about it... 
> I can actually get away without sub-byte operations. Since .*realloc() 
> will only allocate full bytes anyway, I only need to set that newly 
> allocated area to zero (with memset()). So it'll be back to memset(). 
> (I'm still wondering why there's no g_try_realloc0() or something; 
> probably because the glib does not require the libc heap manager to 
> know the exact size of each heap object which would be necessary for 
> that to work)
>
> Max
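
The memcpy() change discussed above relies on the accessors keeping the in-memory array in on-disk (big-endian) byte order. A rough sketch of what a 16-bit getter/setter pair with the signatures from the patch could look like follows. The names get_refcount_be16_sketch() and set_refcount_be16_sketch() and the byte-wise implementation are assumptions for illustration; the actual helpers in the series may well use QEMU's endianness macros instead.

```c
#include <stdint.h>

/* Same shapes as Qcow2GetRefcountFunc / Qcow2SetRefcountFunc in the patch. */
typedef uint64_t GetRefcountFuncSketch(const void *refcount_array,
                                       uint64_t index);
typedef void SetRefcountFuncSketch(void *refcount_array,
                                   uint64_t index, uint64_t value);

/* Read entry `index` from an array of big-endian 16-bit refcounts. */
static uint64_t get_refcount_be16_sketch(const void *refcount_array,
                                         uint64_t index)
{
    const uint8_t *p = (const uint8_t *)refcount_array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

/* Store `value` (must fit in 16 bits) as big-endian entry `index`. */
static void set_refcount_be16_sketch(void *refcount_array,
                                     uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)refcount_array + 2 * index;
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}
```

Because the bytes already sit in on-disk order, a block of such an array can be copied into a refblock buffer without a per-entry conversion loop.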


* Re: [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-11  8:11     ` Max Reitz
@ 2014-11-11 15:49       ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-11 15:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/11/2014 01:11 AM, Max Reitz wrote:
>>> +            .compat             = g_strdup("0.10"),
>>> +            .refcount_width     = s->refcount_bits,
>> Hmm - is it really worth displaying a constant?  Since the 0.10 format
>> cannot change the width from 16, I'm not sure if it adds anything to the
>> output to display it.  After all, there's other things we omit for the
>> old format when they cannot be altered (such as the state of a lazy
>> flag).  On the other hand, if it makes your changes to later iotests
>> easier for tests that operate on both image formats, I'm not opposed
>> to it.
> 
> Yes, I thought about not displaying it. But whereas "corrupt" or "lazy
> refcounts" simply do not make sense with compat=0.10 images (it's simply
> impossible), the refcount width does make sense. It's always 16 bits
> (I'm noticing myself how I keep swapping between "bit" and "bits", but I
> just can't help it) but I personally find it interesting enough to
> display. I'd be fine with dropping it from compat=0.10, though.
> 
> But in retrospect, I'd rather make the other two flags always visible
> than now drop this entry. However, not displaying a bool if it's always
> false makes more sense to me than not displaying an integer because it's
> always constant.
> 

>> If you can make a strong argument for always outputting the constant
>> width of 16 for 0.10 formats, then I can live with it, so:
> 
> You decide whether it's strong enough. :-)
> 
> My main argument is "If a bool is not displayed one can assume it to be
> false; if an integer is not displayed which naturally cannot be 0, I
> will have no idea what it would be, even if it's constant for that image
> version".

Sounds fairly convincing :)  Add a paragraph like that to the commit
message, and I'm sold!

> 
>> Reviewed-by: Eric Blake <eblake@redhat.com>

So looks like you get to keep this.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-11  8:22     ` Max Reitz
@ 2014-11-11 16:13       ` Eric Blake
  2014-11-11 16:18         ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 16:13 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/11/2014 01:22 AM, Max Reitz wrote:
> On 2014-11-10 at 22:12, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
>>> the refcount is increased accordingly. However, if this would lead to an
>>> overflow the function should instead just not reuse this cluster and
>>> allocate a new one.
>> So if refcount_order is 1 (2 bits per refcount, max refcount of 4
> 
> *max refcount of 3 (0b11)

Oh right, because 0 is special.  Although I think I figured that out...

> 
>> ), and
>> we encounter the same cluster 6 times (say by 5 back-to-back internal
>> snapshots), does this code optimize to only 2 clusters (both with
>> refcount 3) or does it result in each of the last 3 clusters spilling to

...when talking about 3 shares of a cluster.

>> its own 1-ref cluster for a total of 4 clusters?  Short of Benoit's work
>> on deduplication, is there even a way to avoid inefficient use of
>> spilled clusters?
> 
> I'm not sure what you're referring to; maybe I should add that
> qcow2_alloc_bytes() is used for allocating compressed clusters (which
> ideally don't take up a full host cluster), so "reuse" in this context
> just means that several compressed clusters share one host cluster.

No, I was thinking about internal snapshots rather than compressed
clusters (although there's probably some overlap on what happens).

> 
> Maybe you're referring to the following situation: We have the default
> cluster size of 64k. Now we're trying to allocate 16k for each of the
> compressed clusters A, B, C and D. D won't fit into that cluster because
> the maximum refcount is three, so it will be put into a newly allocated
> host cluster. Finally, we're trying to allocate 32k for a compressed
> cluster E, which will then be put into the same cluster as D. We
> therefore have the following allocation (each sub-box representing 16k):
> 
> +---+---+---+---+   +---+---+---+---+
> | A | B | C |   |   | D |   E   |   |
> +---+---+---+---+   +---+---+---+---+
> 
> whereas the ideal allocation would be:
> 
> +---+---+---+---+   +---+---+---+---+
> | A | B |   E   |   | C | D |   |   |
> +---+---+---+---+   +---+---+---+---+
> 
> This is a problem, but I think first it's a minor one (just use a
> sufficiently large refcount width if you're going to use compressed
> clusters) and second it's about compressed clusters, whose performance I
> could hardly care less about, frankly.

No, I was envisioning that we have a brand new image with one cluster
allocated (cluster 1 has refcount 1), then 5 times in a row we do
'savevm' to take an internal snapshot.  If I understand your code
correctly, the first two snapshots increase the refcount, so cluster 1
has a refcount of 3. Then the next snapshot can't increase the refcount,
so it instead copies the contents to cluster 2.  The fourth and fifth
snapshots also see that cluster 1 is full, and allocate cluster 3 and 4;
whereas a more efficient usage would increase the refcount of cluster 2
instead of allocating.
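
As a small aside on the arithmetic corrected near the top of this mail (2 bits per refcount gives a maximum of 3, not 4), the largest value a refcount field can hold for a given order can be sketched as below. max_refcount_for_order_sketch() is a hypothetical helper, not part of the series, and it reports the raw field maximum only; it ignores any additional int64_t limit qemu may impose.

```c
#include <stdint.h>

/* Largest value that fits in a refcount field of (1 << order) bits. */
static uint64_t max_refcount_for_order_sketch(int refcount_order)
{
    int bits = 1 << refcount_order;   /* 1, 2, 4, ..., 64 for orders 0..6 */

    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```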

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-11 16:13       ` Eric Blake
@ 2014-11-11 16:18         ` Max Reitz
  2014-11-11 19:49           ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-11 16:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 17:13, Eric Blake wrote:
> On 11/11/2014 01:22 AM, Max Reitz wrote:
>> On 2014-11-10 at 22:12, Eric Blake wrote:
>>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>>> qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
>>>> the refcount is increased accordingly. However, if this would lead to an
>>>> overflow the function should instead just not reuse this cluster and
>>>> allocate a new one.
>>> So if refcount_order is 1 (2 bits per refcount, max refcount of 4
>> *max refcount of 3 (0b11)
> Oh right, because 0 is special.  Although I think I figured that out...
>
>>> ), and
>>> we encounter the same cluster 6 times (say by 5 back-to-back internal
>>> snapshots), does this code optimize to only 2 clusters (both with
>>> refcount 3) or does it result in each of the last 3 clusters spilling to
> ...when talking about 3 shares of a cluster.
>
>>> its own 1-ref cluster for a total of 4 clusters?  Short of Benoit's work
>>> on deduplication, is there even a way to avoid inefficient use of
>>> spilled clusters?
>> I'm not sure what you're referring to; maybe I should add that
>> qcow2_alloc_bytes() is used for allocating compressed clusters (which
>> ideally don't take up a full host cluster), so "reuse" in this context
>> just means that several compressed clusters share one host cluster.
> No, I was thinking about internal snapshots rather than compressed
> clusters (although there's probably some overlap on what happens).
>
>> Maybe you're referring to the following situation: We have the default
>> cluster size of 64k. Now we're trying to allocate 16k for each of the
>> compressed clusters A, B, C and D. D won't fit into that cluster because
>> the maximum refcount is three, so it will be put into a newly allocated
>> host cluster. Finally, we're trying to allocate 32k for a compressed
>> cluster E, which will then be put into the same cluster as D. We
>> therefore have the following allocation (each sub-box representing 16k):
>>
>> +---+---+---+---+   +---+---+---+---+
>> | A | B | C |   |   | D |   E   |   |
>> +---+---+---+---+   +---+---+---+---+
>>
>> whereas the ideal allocation would be:
>>
>> +---+---+---+---+   +---+---+---+---+
>> | A | B |   E   |   | C | D |   |   |
>> +---+---+---+---+   +---+---+---+---+
>>
>> This is a problem, but I think first it's a minor one (just use a
>> sufficiently large refcount width if you're going to use compressed
>> clusters) and second it's about compressed clusters, whose performance I
>> could hardly care less about, frankly.
> No, I was envisioning that we have a brand new image with one cluster
> allocated (cluster 1 has refcount 1), then 5 times in a row we do
> 'savevm' to take an internal snapshot.  If I understand your code
> correctly, the first two snapshots increase the refcount, so cluster 1
> has a refcount of 3. Then the next snapshot can't increase the refcount,
> so it instead copies the contents to cluster 2.

No, it just errors out.

qcow2_alloc_bytes() is only used for allocating space for a compressed 
cluster. When taking a snapshot, update_refcount() will be called to 
increase the clusters' refcounts, and that function will simply throw an 
error.
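
To make the compressed-cluster case concrete, here is a toy model of the
reuse decision. It is only a sketch, not the actual qcow2_alloc_bytes()
code: ToyAllocator and toy_alloc_bytes() are made-up names, and unlike the
real function it never lets an allocation straddle two host clusters. With
a maximum refcount of 3 it reproduces the A/B/C and D/E layout from the
first diagram quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CLUSTER_SIZE 65536u

/* Hypothetical state for a sub-cluster allocator (illustration only). */
typedef struct {
    int64_t  cur_cluster;   /* partially filled host cluster, -1 if none */
    uint32_t used;          /* bytes already handed out in cur_cluster */
    unsigned refcount;      /* refcount of cur_cluster */
    int64_t  next_cluster;  /* index of the next unallocated host cluster */
    unsigned max_refcount;  /* 3 for a 2-bit refcount entry */
} ToyAllocator;

/* Returns the host byte offset chosen for a compressed cluster. */
uint64_t toy_alloc_bytes(ToyAllocator *a, uint32_t size)
{
    assert(size > 0 && size <= TOY_CLUSTER_SIZE);

    int fits = a->cur_cluster >= 0 && a->used + size <= TOY_CLUSTER_SIZE;
    int can_share = fits && a->refcount < a->max_refcount;

    if (!can_share) {
        /* Sharing would overflow the refcount (or there is no room):
         * start a fresh host cluster instead of failing. */
        a->cur_cluster = a->next_cluster++;
        a->used = 0;
        a->refcount = 0;
    }

    uint64_t offset = (uint64_t)a->cur_cluster * TOY_CLUSTER_SIZE + a->used;
    a->used += size;
    a->refcount++;
    return offset;
}
```

Snapshots never go through this path; there the refcount increment itself
is what fails.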

Max

> The fourth and fifth
> snapshots also see that cluster 1 is full, and allocate cluster 3 and 4;
> whereas a more efficient usage would increase the refcount of cluster 2
> instead of allocating.
>

* Re: [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
@ 2014-11-11 17:57   ` Eric Blake
  2014-11-12  8:41     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 17:57 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Some tests do not work well with certain refcount widths (i.e. you
> cannot create internal snapshots with refcount_width=1), so make those
> widths unsupported.
> 
> Furthermore, add another filter to _filter_img_create in common.filter
> which filters out the refcount_width value.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/007           |  4 ++++
>  tests/qemu-iotests/015           |  1 +
>  tests/qemu-iotests/026           | 11 +++++++++++
>  tests/qemu-iotests/029           |  1 +
>  tests/qemu-iotests/051           |  1 +
>  tests/qemu-iotests/058           |  1 +
>  tests/qemu-iotests/067           |  7 +++++++
>  tests/qemu-iotests/079           |  1 +
>  tests/qemu-iotests/080           |  1 +
>  tests/qemu-iotests/089           |  7 +++++++
>  tests/qemu-iotests/090           |  1 +
>  tests/qemu-iotests/108           |  6 ++++++
>  tests/qemu-iotests/common.filter |  3 ++-
>  13 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
> index fe1a743..de39d1b 100755
> --- a/tests/qemu-iotests/007
> +++ b/tests/qemu-iotests/007
> @@ -43,6 +43,10 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto generic
>  _supported_os Linux
> +# refcount_width must be at least 4 bits so we can create ten internal snapshots
> +# (1 bit supports none, 2 bits support three, 4 bits support 15)

Feels like an off-by-one comment.  A width of 1 bit supports a max
refcount of 1 (therefore no snapshots), a width of 2 bits supports a max
refcount of 3 (therefore 2 snapshots in addition to the original), a
width of 4 bits supports a max refcount of 15 (therefore only 14 snapshots).
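
Spelled out as arithmetic (a sketch; the toy_* helpers are hypothetical
names, not anything in qemu): an n-bit entry holds at most 2^n - 1, and the
active image already consumes one reference, so the number of internal
snapshots that can share a cluster is one less than that.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an entry of the given bit width can hold. */
uint64_t toy_refcount_max(unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so special-case it. */
    return width_bits == 64 ? UINT64_MAX
                            : (UINT64_C(1) << width_bits) - 1;
}

/* Snapshots that can reference a cluster in addition to the active image. */
uint64_t toy_max_snapshots(unsigned width_bits)
{
    return toy_refcount_max(width_bits) - 1;
}
```

For widths 1, 2 and 4 this gives 0, 2 and 14 snapshots, so a test that
takes ten snapshots does need at least 4 bits.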

> +++ b/tests/qemu-iotests/067
> @@ -35,6 +35,13 @@ status=1	# failure is the default!
>  _supported_fmt qcow2
>  _supported_proto file
>  _supported_os Linux
> +# Because this would change the output of query-block
> +_unsupported_imgopts 'refcount_width=1[^0-9]' \
> +                     'refcount_width=2[^0-9]' \
> +                     'refcount_width=4[^0-9]' \
> +                     'refcount_width=8[^0-9]' \
> +                     'refcount_width=32[^0-9]' \
> +                     'refcount_width=64[^0-9]'

It might be more compact to exploit globbing and just say:

_unsupported_imgopts 'refcount_width=?[^6]'

which leaves refcount_width=16 as the only pattern that doesn't match
the glob.  But that feels more fragile, so I can live with your longer list.

> +++ b/tests/qemu-iotests/089
> @@ -41,6 +41,13 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto file
>  _supported_os Linux
> +# Because this would change the output of qemu_io -c info
> +_unsupported_imgopts 'refcount_width=1[^0-9]' \

I like how you give reasons for some tests...

> +++ b/tests/qemu-iotests/090
> @@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto file nfs
>  _supported_os Linux
> +_unsupported_imgopts 'refcount_width=1[^0-9]'

...so why not do it for all tests?

At any rate, the patch makes sense, so whether or not you tweak comments,

Reviewed-by: Eric Blake <eblake@redhat.com>

I'm assuming that later in the series you add a test that explicitly
covers the error message given when someone attempts to take a snapshot
of a refcount_order=0 (width=1) image, since that will fail (internal
snapshots are simply not possible with a refcount that can't exceed 1).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
@ 2014-11-11 18:05   ` Eric Blake
  2014-11-12  8:47     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 18:05 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a creation option to qcow2 for setting the refcount order of images
> to be created, and respect that option's value.
> 
> This breaks some test outputs, fix them.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c              |  20 ++++++++
>  include/block/block_int.h  |   1 +
>  tests/qemu-iotests/049.out | 112 ++++++++++++++++++++++-----------------------
>  tests/qemu-iotests/079.out |  18 ++++----
>  tests/qemu-iotests/082.out |  41 ++++++++++++++---
>  tests/qemu-iotests/085.out |  38 +++++++--------
>  6 files changed, 139 insertions(+), 91 deletions(-)

Is there any .json file that needs to be modified to allow this option
to be set via QMP?  I guess that can be a followup.

>  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
> -qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for 
> +qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
>  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.

Nice that you are also getting rid of trailing whitespace.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
@ 2014-11-11 18:08   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-11 18:08 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add an opaque value which is to be passed to the bdrv_amend_options()
> status callback.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block.c                   |  4 ++--
>  block/qcow2-cluster.c     | 14 ++++++++------
>  block/qcow2.c             |  9 +++++----
>  block/qcow2.h             |  3 ++-
>  include/block/block.h     |  4 ++--
>  include/block/block_int.h |  3 ++-
>  qemu-img.c                |  5 +++--
>  7 files changed, 24 insertions(+), 18 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options()
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
@ 2014-11-11 18:11   ` Eric Blake
  2014-11-12  8:47     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 18:11 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c              | 14 ++++++--------
>  tests/qemu-iotests/061.out | 14 +++++++-------
>  2 files changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 21a1883..beb7187 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>              } else if (!strcmp(compat, "1.1")) {
>                  new_version = 3;
>              } else {
> -                fprintf(stderr, "Unknown compatibility level %s.\n", compat);
> +                error_report("Unknown compatibility level %s.", compat);

Not many error_report() locations include a trailing '.'

> +++ b/tests/qemu-iotests/061.out
> @@ -281,19 +281,19 @@ No errors were found on the image.
>  === Testing invalid configurations ===
>  
>  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
> -Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
> +qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>  qemu-img: Error while amending options: Invalid argument
> -Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
> +qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>  qemu-img: Error while amending options: Invalid argument
> -Unknown compatibility level 0.42.
> +qemu-img: Unknown compatibility level 0.42.
>  qemu-img: Error while amending options: Invalid argument
>  qemu-img: Invalid parameter 'foo'
>  qemu-img: Invalid options for file format 'qcow2'
> -Changing the cluster size is not supported.
> +qemu-img: Changing the cluster size is not supported.
>  qemu-img: Error while amending options: Operation not supported
> -Changing the encryption flag is not supported.
> +qemu-img: Changing the encryption flag is not supported.
>  qemu-img: Error while amending options: Operation not supported
> -Cannot change preallocation mode.
> +qemu-img: Cannot change preallocation mode.
>  qemu-img: Error while amending options: Operation not supported

See - most of the messages do not end in '.'.  Probably worth cleaning
up if you respin.  But it's not a show-stopper if you leave it, so:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false)
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
@ 2014-11-11 18:12   ` Eric Blake
  2014-11-12  8:48     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 18:12 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

Is it worth hoisting this one into 2.2 via the -trivial tree?

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index beb7187..ebf843f 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>              error_report("Cannot change refcount entry width");
>              return -ENOTSUP;
>          } else {
> -            /* if this assertion fails, this probably means a new option was
> +            /* if this point is reached, this probably means a new option was
>               * added without having it covered here */
> -            assert(false);
> +            abort();
>          }
>  
>          desc++;
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
@ 2014-11-11 18:14   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-11 18:14 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> If the image version should be upgraded, that is the first thing we should
> do; if it should be downgraded, that is the last thing we should do. So split the
> version change block into an upgrade part at the start and a downgrade
> part at the end.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-11 16:18         ` Max Reitz
@ 2014-11-11 19:49           ` Eric Blake
  2014-11-12  8:52             ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 19:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/11/2014 09:18 AM, Max Reitz wrote:

>> No, I was envisioning that we have a brand new image with one cluster
>> allocated (cluster 1 has refcount 1), then 5 times in a row we do
>> 'savevm' to take an internal snapshot.  If I understand your code
>> correctly, the first two snapshots increase the refcount, so cluster 1
>> has a refcount of 3. Then the next snapshot can't increase the refcount,
>> so it instead copies the contents to cluster 2.
> 
> No, it just errors out.
> 
> qcow2_alloc_bytes() is only used for allocating space for a compressed
> cluster. When taking a snapshot, update_refcount() will be called to
> increase the clusters' refcounts, and that function will simply throw an
> error.

That's okay for now (always better for an initial feature to be
conservative, then expand it later if there is demand).  But I wonder if
we could be made smarter in the future and auto-COW any cluster that
would otherwise exceed max refcount.  Thus, for a refcount_order=0
(width=1) image, a snapshot now doubles the size of the image (as every
single cluster would COW into a new cluster) rather than erroring out.
Food for thought; maybe worth injecting comments into this series
(whether in code or in commit messages, as appropriate) pointing out
that we thought about the future possibility even though we chose not to
allow it for now.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
@ 2014-11-11 19:53   ` Eric Blake
  2014-11-12  8:58     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 19:53 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a test for conversion between different refcount widths and errors
> specific to certain widths (i.e. snapshots with refcount_width=1).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/112     | 225 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/112.out | 123 +++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 349 insertions(+)
>  create mode 100755 tests/qemu-iotests/112
>  create mode 100644 tests/qemu-iotests/112.out
> 

> +
> +# This tests qcow2-specific low-level functionality
> +_supported_fmt qcow2
> +_supported_proto file
> +_supported_os Linux

Might work on more than just Linux, but then again, it's probably worth
scrubbing the whole testsuite for situations like that, so don't worry
about it here.

> +# This test will set refcount_width on its own which would conflict with the
> +# manual setting; compat will be overridden as well
> +_unsupported_imgopts refcount_width 'compat=0.10'
> +
> +function print_refcount_width()
> +{
> +    $QEMU_IMG info "$TEST_IMG" | grep 'refcount width:' | sed -e 's/^ *//'

grep|sed is almost always a waste.  This is equivalent:

   $QEMU_IMG info "$TEST_IMG" | sed -n '/refcount width:/ s/^ *//p'

> +echo
> +echo '=== Snapshot limit on refcount_width=1 ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
> +print_refcount_width
> +
> +$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
> +
> +# Should fail
> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
> +
> +# The new L1 table could/shoud be leaked

s/shoud/should/

> +_check_test_img
> +
> +echo
> +echo '=== Snapshot limit on refcount_width=2 ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
> +print_refcount_width
> +
> +$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
> +
> +# Should succeed
> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
> +$QEMU_IMG snapshot -c bar "$TEST_IMG"
> +# Should fail (4th reference)
> +$QEMU_IMG snapshot -c baz "$TEST_IMG"
> +
> +# The new L1 table could/shoud be leaked

again

> +echo
> +echo '=== Amend with snapshot ==='
> +echo
> +
> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
> +# Just to have different refcounts across the image
> +$QEMU_IO -c 'write 0 16M' "$TEST_IMG" | _filter_qemu_io
> +
> +# Should not work
> +$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
> +_check_test_img
> +print_refcount_width

This matches your initial implementation. Someday, though, we may decide
to auto-COW any overflowed cluster, and thus allow the conversion to
succeed.  Worth a comment?

> +echo '=== Testing too many references for check ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
> +print_refcount_width
> +
> +# This cluster should be created at 0x50000
> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
> +# Now make the second L2 entriy (the L2 table should be at 0x40000) point to

s/entriy/entry/


> +# success, all done
> +echo '*** done'
> +rm -f $seq.full
> +status=0

Overall a nice set of tests!


> +=== Snapshot limit on refcount_width=1 ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> +refcount width: 1
> +wrote 512/512 bytes at offset 0
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +qemu-img: Could not create snapshot 'foo': -22 (Invalid argument)
> +Leaked cluster 6 refcount=1 reference=0

Bummer that the error message did not state WHY (because a cluster would
overflow refcounts), but I'm not sure how hard it would be to make that
better, and at least we correctly errored out.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB for amend
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
@ 2014-11-11 21:05   ` Eric Blake
  2014-11-12  9:10     ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-11 21:05 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> If there is more than one time-consuming operation to be performed for
> qcow2_amend_options(), we need an intermediate CB which coordinates the
> progress of the individual operations and passes the result to the
> original status callback.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 75 insertions(+), 1 deletion(-)

Getting trickier to review.

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index eaef251..e6b93d1 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2655,6 +2655,71 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
>      return 0;
>  }
>  
> +typedef enum Qcow2AmendOperation {
> +    /* This is the value Qcow2AmendHelperCBInfo::last_operation will be
> +     * statically initialized to so that the helper CB can discern the first
> +     * invocation from an operation change */
> +    QCOW2_NO_OPERATION = 0,
> +
> +    QCOW2_DOWNGRADING,
> +} Qcow2AmendOperation;

So for this patch, you still have just one operation, but later in the
series, you add a second (and the goal of THIS patch is that it will
work even if there are 3 or more operations, even though this series
doesn't add that many).

> +static void qcow2_amend_helper_cb(BlockDriverState *bs, int64_t offset,
> +                                 int64_t total_work_size, void *opaque)

indentation looks off

> +{
> +    Qcow2AmendHelperCBInfo *info = opaque;
> +    int64_t current_work_size;
> +    int64_t projected_work_size;

Worth asserting that info->total_operations is non-zero?  Or is there
ever a valid case for calling the callback even when there are no
sub-operations, and therefore we are automatically complete (offset ==
total_work_size)?

> +
> +    if (info->current_operation != info->last_operation) {
> +        if (info->last_operation != QCOW2_NO_OPERATION) {
> +            info->offset_completed += info->last_work_size;
> +            info->operations_completed++;
> +        }

Would it be any easier to guarantee that we come to 100% completion by
requiring the coordinator to pass a final completion callback? [1]
 info->current_operation = QCOW2_NO_OPERATION;
 cb(bs, 0, 0, info);

> +
> +        info->last_operation = info->current_operation;
> +    }
> +
> +    info->last_work_size = total_work_size;

Took me a while to realize that total_work_size is the incoming
(estimated) total size for the current sub-operation, and not the total
over the combination of all sub-operations...

> +
> +    current_work_size = info->offset_completed + total_work_size;
> +
> +    /* current_work_size is the total work size for (operations_completed + 1)

but this comment helped.

> +     * operations (which includes this one), so multiply it by the number of
> +     * operations not covered and divide it by the number of operations
> +     * covered to get a projection for the operations not covered */
> +    projected_work_size = current_work_size * (info->total_operations -
> +                                               info->operations_completed - 1)
> +                                            / (info->operations_completed + 1);

So, when there is just one sub-operation (which is the case until later
patches add a second), this results in the following calculation for ALL
calls during the intermediate steps of the sub-operation:

projected_work_size = total_work_size * (1 - 0 - 1) / (0 + 1)

that is, we are projecting 0 additional work because we have zero
additional stages to complete.  Am I correct that we will never enter
the callback in a state where
info->operations_completed==info->total_operations?  (because if we do,
you'd have a computation of final_size * (1 - 1 - 1) / (1 + 1) which
looks weird).  Worth an assert()?  Then again, my proposal above [1] to
guarantee a 100% completion by use of a final cleanup callback would
indeed reach the point where operations_completed==total_operations.

> +
> +    info->original_status_cb(bs, info->offset_completed + offset,
> +                             current_work_size + projected_work_size,
> +                             info->original_cb_opaque);

So, as long as we don't add a second phase, this is strictly equivalent
to calling the original callback with the original offset (since
info->offset_completed remains 0) and original work size (since
projected_work_size remains 0).  That part works fine.

Let's see what happens if we had three phases.  To make it more
interesting, let's pick some numbers - how about the first phase
progresses from 0-10, the second from 0-100, and the third from 0-10,
and where none of the sub-operations change predicted total_work_size.
The caller would first set info->current_operation to 1, then call the
callback a few times; how about twice with 5/10 and 10/10.  For both
calls, current_work_size is 0+10, then projected_work_size is
10*(3-0-1)/(0+1) == 20, and we call the original callback with
(0+5)/(10+20) and (0+10)/(10+20).  Pretty good (5/30 and 10/30 are right
on if the first sub-command is exactly one-third of the time of the
overall command; and even if it is not, it still shows reasonable progress).

Then we move on to the second sub-command where the coordinator updates
info->current_operation to 2 before triggering several callbacks; let's
suppose it reports at 0/100, 30/100, 60/100, and 100/100.  The first
call updates info to track that we've detected a change in sub-command
(offset_completed is now 10, operations_completed is now 1).  Then for
all four calls, current_work_size is 10+100, and projected_work_size is
110*(3-1-1)/(1+1) == 55.  So we call the original callback with
(10+0)/(110+55), (10+30)/(110+55), (10+60)/(110+55), (10+100)/(110+55).
 The first report of 10/165 looks like we jumped backwards (much smaller
progress than our previous report of 10/30), but that's merely a
representation that this phase is estimating a larger total_work count,
and we have no way of correlating whether 1 unit of work count in each
phase is equivalent to an equal amount of time.  But by the end, we
report 110/165, which is spot on for being two-thirds complete.

Another assignment to info->current_operation, and a couple more
callbacks; let's again use 5/10 and 10/10.  The first callback updates
info (offset_completed is now 110, operations_completed is now 2).  For
each call, current_work_size is 110+10, and projected_work_size is
120*(3-2-1)/(2+1) == 0.  We call the original callback with
(110+5)/(120+0) and (110+10)/(120+0).  We've done a very rapid jump
from 2/3 to 115/120, but end the overall operation with the two values
equal.  So the function is not very smooth, but at least it is as good
an estimate as possible along each stage of the operation, and we never
report equal values until all sub-commands are complete.
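
The walk-through above can be replayed with a small stand-alone model of
the helper's arithmetic. This is a sketch, not the patch itself:
ToyProgress and toy_progress_cb() are hypothetical names, the reported_*
fields stand in for the call to the original status callback, and the two
asserts are the ones suggested earlier in this review rather than
something the patch contains.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Qcow2AmendHelperCBInfo (illustration only). */
typedef struct {
    int     total_operations;
    int     operations_completed;
    int     current_operation;   /* set by the coordinator before each phase */
    int     last_operation;      /* 0 means "no operation seen yet" */
    int64_t offset_completed;    /* work units of all finished phases */
    int64_t last_work_size;      /* total_work_size of the previous call */
    int64_t reported_offset;     /* what would go to the original callback */
    int64_t reported_total;
} ToyProgress;

void toy_progress_cb(ToyProgress *p, int64_t offset, int64_t total_work_size)
{
    assert(p->total_operations > 0);

    if (p->current_operation != p->last_operation) {
        if (p->last_operation != 0) {
            /* A phase just ended: bank its work and count it. */
            p->offset_completed += p->last_work_size;
            p->operations_completed++;
        }
        p->last_operation = p->current_operation;
    }
    assert(p->operations_completed < p->total_operations);

    p->last_work_size = total_work_size;

    /* Work for the phases seen so far, including the current one. */
    int64_t current_work_size = p->offset_completed + total_work_size;

    /* Assume each remaining phase costs the average of those seen so far. */
    int64_t projected_work_size = current_work_size
        * (p->total_operations - p->operations_completed - 1)
        / (p->operations_completed + 1);

    p->reported_offset = p->offset_completed + offset;
    p->reported_total = current_work_size + projected_work_size;
}
```

Feeding it the three phases described above (10, 100 and 10 work units)
gives 5/30 and 10/30 for the first phase and 10/165 through 110/165 for
the second, and the two reported values only become equal on the last
call of the last phase.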

> +}
> +
>  static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>                                 BlockDriverAmendStatusCB *status_cb,
>                                 void *cb_opaque)
> @@ -2669,6 +2734,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>      bool encrypt;
>      int ret;
>      QemuOptDesc *desc = opts->list->desc;
> +    Qcow2AmendHelperCBInfo helper_cb_info;
>  
>      while (desc && desc->name) {
>          if (!qemu_opt_find(opts, desc->name)) {
> @@ -2726,6 +2792,12 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>          desc++;
>      }
>  
> +    helper_cb_info = (Qcow2AmendHelperCBInfo){
> +        .original_status_cb = status_cb,
> +        .original_cb_opaque = cb_opaque,
> +        .total_operations = (new_version < old_version)
> +    };

Slick.

> +
>      /* Upgrade first (some features may require compat=1.1) */
>      if (new_version > old_version) {
>          s->qcow_version = new_version;
> @@ -2784,7 +2856,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>  
>      /* Downgrade last (so unsupported features can be removed before) */
>      if (new_version < old_version) {
> -        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
> +        helper_cb_info.current_operation = QCOW2_DOWNGRADING;
> +        ret = qcow2_downgrade(bs, new_version, &qcow2_amend_helper_cb,
> +                              &helper_cb_info);

Looks correct to me. Other than the indentation issue and possible
addition of some asserts, this is good to go.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
@ 2014-11-12  4:15   ` Eric Blake
  2014-11-12  9:55     ` Max Reitz
  2014-11-12 14:19   ` Eric Blake
  1 sibling, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-12  4:15 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a function qcow2_change_refcount_order() which allows changing the
> refcount order of a qcow2 image.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h          |   4 +
>  2 files changed, 428 insertions(+)

This is fairly big; you may want to get a second review from a
maintainer rather than blindly trusting me.

My review was not linear, but I left the email in linear order.  Feel
free to ask for clarification if my presentation is too hard to follow.

> +
> +/**
> + * This "operation" for walk_over_reftable() allocates the refblock on disk (if
> + * it is not empty) and inserts its offset into the new reftable. The size of
> + * this new reftable is increased as required.
> + */
> +static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
> +                          uint64_t reftable_index, uint64_t *reftable_size,
> +                          void *refblock, bool refblock_empty, Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t offset;
> +
> +    if (!refblock_empty && reftable_index >= *reftable_size) {
> +        uint64_t *new_reftable;
> +        uint64_t new_reftable_size;
> +
> +        new_reftable_size = ROUND_UP(reftable_index + 1,
> +                                     s->cluster_size / sizeof(uint64_t));
> +        if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
> +            error_setg(errp,
> +                       "This operation would make the refcount table grow "
> +                       "beyond the maximum size supported by QEMU, aborting");
> +            return -ENOTSUP;
> +        }
> +
> +        new_reftable = g_try_realloc(*reftable, new_reftable_size *
> +                                                sizeof(uint64_t));

Safe from overflow based on checks a few lines earlier.  Good.

> +        if (!new_reftable) {
> +            error_setg(errp, "Failed to increase reftable buffer size");
> +            return -ENOMEM;
> +        }
> +
> +        memset(new_reftable + *reftable_size, 0,
> +               (new_reftable_size - *reftable_size) * sizeof(uint64_t));
> +
> +        *reftable      = new_reftable;
> +        *reftable_size = new_reftable_size;

Just to check my math here:

Suppose we have an image with 512-byte clusters, and are changing from
16-bit refcount (order 4) to 64-bit refcount.  Also, suppose the
existing image has exactly filled a one-cluster refcount table (that is,
there are 64 reftable entries, each pointing at a refblock with all 256
refcount entries in use, for a total image size of exactly 8M).  The
original image occupies a header (1 cluster), L1 and L2 tables, and
data; but 65 of the 16k clusters tied up in the image are dedicated to
the refcount structures.

Meanwhile, the new refcount table will have to point to 256 refcount
blocks, each holding only 64 entries, which in turn implies that the
refcount table now has to be at least 4 clusters long.  But as this
requires at least 260 clusters to represent, then even if we were able
to reuse the 65 clusters of the original table, we'd still be allocating
at least 195 clusters; in reality, your code doesn't free any old
clusters until after allocating the new, because it is easier to keep
the old table live until the new table is populated.  The process of
allocating the new clusters means we actually end up with a new refcount
table 5 clusters long, where not all 320 refblocks will be populated.
But as long as we are keeping the old table up-to-date for the refblock
allocations, it ALSO means that we caused a rollover of the old table
from 1 cluster into 2, which itself consumes several clusters (the
larger table must be contiguous, and we must also set up a refblock to
describe the larger table, so we've added at least three clusters
associated to the original table during the course of preparing the new
table).
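
The first-order numbers in that thought experiment can be checked with a
small sketch.  compute_refcount_layout() and its struct are hypothetical
names, not part of the patch, and the helper deliberately ignores the
clusters that the refcount structures themselves occupy (which is exactly
what pushes the real table from 4 clusters to 5):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: sizes of the refcount structures. */
struct refcount_layout {
    uint64_t refblocks;         /* number of refblocks needed */
    uint64_t reftable_clusters; /* clusters occupied by the reftable */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/*
 * Sizes needed to describe image_clusters clusters.  This does not account
 * for the clusters consumed by the refcount structures themselves.
 */
static struct refcount_layout
compute_refcount_layout(unsigned cluster_bits, unsigned refcount_order,
                        uint64_t image_clusters)
{
    struct refcount_layout l;
    uint64_t cluster_size, per_refblock, per_reftable_cluster;

    assert(cluster_bits >= 9 && refcount_order <= 6);

    cluster_size = UINT64_C(1) << cluster_bits;
    /* A cluster holds cluster_size * 8 bits; each entry is 2^order bits */
    per_refblock = (cluster_size * 8) >> refcount_order;
    /* Reftable entries are 64-bit offsets */
    per_reftable_cluster = cluster_size / sizeof(uint64_t);

    l.refblocks = div_round_up(image_clusters, per_refblock);
    l.reftable_clusters = div_round_up(l.refblocks, per_reftable_cluster);
    return l;
}
```

With cluster_bits = 9 and 16384 image clusters, order 4 gives 64
refblocks and a 1-cluster reftable (65 clusters), while order 6 gives 256
refblocks and a 4-cluster reftable (260 clusters).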

Hmm - that means I found a bug in your implementation.  See [3] below.

> +    }
> +
> +    if (refblock_empty) {
> +        if (reftable_index < *reftable_size) {
> +            (*reftable)[reftable_index] = 0;

Necessary since you used g_try_realloc which leaves the new reftable
uninitialized.  Reasonable (rather than a memset) since the caller will
be visiting every single refblock in the table anyways.

> +        }
> +    } else {
> +        offset = qcow2_alloc_clusters(bs, s->cluster_size);

As mentioned above, this action will potentially change
s->refcount_table_size of the original table, which in turn makes the
caller execute its loops more often to cover the increased allocation.
Does qcow2_alloc_clusters() guarantee that the just-allocated cluster is
zero-initialized (and/or should we add a flag to the function to allow
the caller to choose whether to force zero allocation instead of leaving
uninitialized)?  See [4] below for why I ask.

> +        if (offset < 0) {
> +            error_setg_errno(errp, -offset, "Failed to allocate refblock");
> +            return offset;
> +        }
> +        (*reftable)[reftable_index++] = offset;
> +    }
> +
> +    return 0;
> +}
> +
> +/**
> + * This "operation" for walk_over_reftable() writes the refblock to disk at the
> + * offset specified by the new reftable's entry. It does not modify the new
> + * reftable or change any refcounts.
> + */
> +static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
> +                          uint64_t reftable_index, uint64_t *reftable_size,
> +                          void *refblock, bool refblock_empty, Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t offset;
> +    int ret;
> +
> +    if (refblock_empty) {
> +        if (reftable_index < *reftable_size) {
> +            assert((*reftable)[reftable_index] == 0);
> +        }
> +    } else {
> +        /* The first pass with alloc_refblock() made the reftable large enough
> +         */
> +        assert(reftable_index < *reftable_size);

Okay, I see why you couldn't hoist this assert outside of the if - the
caller may call this with refblock_empty for any refblocks at the tail
of the final partial reftable cluster.

> +        offset = (*reftable)[reftable_index];
> +        assert(offset != 0);
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Overlap check failed");
> +            return ret;
> +        }
> +
> +        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
> +        if (ret < 0) {
> +            error_setg_errno(errp, -ret, "Failed to write refblock");
> +            return ret;

If we fail here, do we leak all clusters written so far?  At least the
image is still consistent.  After reading further, I think I answered
myself at point [5].

> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/**
> + * This function walks over the existing reftable and every referenced refblock;
> + * if @new_set_refcount is non-NULL, it is called for every refcount entry to
> + * create an equal new entry in the passed @new_refblock. Once that
> + * @new_refblock is completely filled, @operation will be called.
> + *
> + * @operation is expected to combine the @new_refblock and its entry in the new
> + * reftable (which is described by the parameters starting with "reftable").
> + * @refblock_empty is set if all entries in the refblock are zero.
> + *
> + * @status_cb and @cb_opaque are used for the amend operation's status callback.
> + * @index is the index of the walk_over_reftable() calls and @total is the total
> + * number of walk_over_reftable() calls per amend operation. Both are used for
> + * calculating the parameters for the status callback.

Nice writeup; I was referring to it frequently during review.

> + */
> +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
> +                              uint64_t *new_reftable_index,
> +                              uint64_t *new_reftable_size,
> +                              void *new_refblock, int new_refblock_size,
> +                              int new_refcount_bits,
> +                              int (*operation)(BlockDriverState *bs,
> +                                               uint64_t **reftable,
> +                                               uint64_t reftable_index,
> +                                               uint64_t *reftable_size,
> +                                               void *refblock,
> +                                               bool refblock_empty,
> +                                               Error **errp),

Worth a typedef?  Maybe not; I managed.

> +                              Qcow2SetRefcountFunc *new_set_refcount,
> +                              BlockDriverAmendStatusCB *status_cb,
> +                              void *cb_opaque, int index, int total,
> +                              Error **errp)

After several reads of the patch, I see that this walk function gets
called twice - first with a NULL new_set_refcount (merely to figure out
how big the new reftable should be, as well as allocating all necessary
non-zero refcount blocks, but not committing the top-level reftable to
any particular file location); the second walk then commits the new
refcounts to disk (updating each non-zero entry in all the new refcount
blocks to match their original counterparts, but no allocation
required).  Pretty slick to ensure that we are sure that the new table
is feasible before actually swapping over to it, while still allowing a
fairly clean rollback on early failure.
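
To convince myself of the shape of that two-walk scheme, here is a
stripped-down sketch.  It is not the patch's code: walk_refcounts(),
count_nonempty() and consume_one() are hypothetical, the "old refcounts"
are a flat array rather than cached refblocks, and "allocation" is just a
counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per completed new block; empty is nonzero if all entries
 * that went into the block were 0. */
typedef int block_op(void *opaque, size_t block_index, int empty);

/* Group n old refcounts into new blocks of per_block entries each and call
 * op for every block, including a partially filled final one. */
static int walk_refcounts(const uint64_t *old, size_t n, size_t per_block,
                          block_op *op, void *opaque)
{
    size_t i, in_block = 0, block_index = 0;
    int empty = 1, ret;

    assert(per_block > 0);

    for (i = 0; i < n; i++) {
        if (in_block == per_block) {
            /* The current new block is complete */
            ret = op(opaque, block_index++, empty);
            if (ret < 0) {
                return ret;
            }
            in_block = 0;
            empty = 1;
        }
        empty = empty && old[i] == 0;
        in_block++;
    }

    if (in_block > 0) {
        ret = op(opaque, block_index, empty);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* First walk: stand-in for allocation, counts the non-empty blocks. */
static int count_nonempty(void *opaque, size_t block_index, int empty)
{
    size_t *count = opaque;

    (void)block_index;
    if (!empty) {
        (*count)++;
    }
    return 0;
}

/* Second walk: stand-in for writing, uses up one planned block per
 * non-empty block and fails if the first walk planned too few. */
static int consume_one(void *opaque, size_t block_index, int empty)
{
    size_t *remaining = opaque;

    (void)block_index;
    if (!empty) {
        if (*remaining == 0) {
            return -1;
        }
        (*remaining)--;
    }
    return 0;
}
```

Running count_nonempty and then consume_one over the same input leaves
the counter at zero, which is the invariant the second walk relies on: it
must see exactly the blocks the first walk planned for.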

> +{
> +    BDRVQcowState *s = bs->opaque;
> +    uint64_t reftable_index;
> +    bool new_refblock_empty = true;
> +    int refblock_index;
> +    int new_refblock_index = 0;
> +    int ret;
> +
> +    for (reftable_index = 0; reftable_index < s->refcount_table_size;
> +         reftable_index++)

Outer loop - for each cluster of the top-level reference table, visit
each child table and update the status callback.  On the first walk,
s->refcount_table_size might be increasing during calls to operation().

> +    {
> +        uint64_t refblock_offset = s->refcount_table[reftable_index]
> +                                 & REFT_OFFSET_MASK;
> +
> +        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
> +                  (uint64_t)total * s->refcount_table_size, cb_opaque);
> +

This never quite reaches 100%, and the caller also never reaches 100%.
I think you want one more call to status_cb() at the end of the loop (at
either site [1] or [2]) that passes an equal index and total to make it
obvious that this (portion of the) long-running conversion is complete.
Since s->refcount_table_size may grow during the loop, the callback
does not necessarily have a constant total size; good thing we already
documented that progress bars need not have a constant total.
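
A quick sketch of the arithmetic, assuming the table size stays constant
for the sake of the example (walk_progress() and struct progress are
hypothetical names that just mirror the two status_cb() arguments):

```c
#include <assert.h>
#include <stdint.h>

struct progress {
    uint64_t done;
    uint64_t total;
};

/* Mirrors the values passed to status_cb() in walk_over_reftable():
 * index is the number of this walk, total the number of walks. */
static struct progress walk_progress(int index, int total,
                                     uint64_t table_size,
                                     uint64_t reftable_index)
{
    struct progress p;

    assert(index >= 0 && index < total);
    assert(reftable_index <= table_size);

    p.done = (uint64_t)index * table_size + reftable_index;
    p.total = (uint64_t)total * table_size;
    return p;
}
```

With a 64-entry table, the last in-loop call of the second walk reports
127 of 128.  Only an extra call with reftable_index equal to the table
size reports 128 of 128, and the same extra call at the end of the first
walk reports the same value as the first call of the second walk.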

> +        if (refblock_offset) {
> +            void *refblock;
> +
> +            if (offset_into_cluster(s, refblock_offset)) {
> +                qcow2_signal_corruption(bs, true, -1, -1, "Refblock offset %#"
> +                                        PRIx64 " unaligned (reftable index: %#"
> +                                        PRIx64 ")", refblock_offset,
> +                                        reftable_index);
> +                error_setg(errp,
> +                           "Image is corrupt (unaligned refblock offset)");
> +                return -EIO;
> +            }
> +
> +            ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offset,
> +                                  &refblock);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to retrieve refblock");
> +                return ret;
> +            }
> +
> +            for (refblock_index = 0; refblock_index < s->refcount_block_size;
> +                 refblock_index++)
> +            {

If a child table (refcount block) exists, visit each refcount entry
within the table (at least one refcount in that visit should be
non-empty, otherwise we could garbage-collect the refblock and put a 0
entry in the outer loop).

> +                uint64_t refcount;
> +
> +                if (new_refblock_index >= new_refblock_size) {
> +                    /* new_refblock is now complete */
> +                    ret = operation(bs, new_reftable, *new_reftable_index,
> +                                    new_reftable_size, new_refblock,
> +                                    new_refblock_empty, errp);

The new refcount table will either be filled faster than the original
(when going from small to large refcount - calling operation() multiple
times per inner loop) or will be filled slower than the original (when
going from large to small; operation() will only be called after several
outer loops).

> +                    if (ret < 0) {
> +                        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
> +                        return ret;
> +                    }
> +
> +                    (*new_reftable_index)++;
> +                    new_refblock_index = 0;
> +                    new_refblock_empty = true;
> +                }
> +
> +                refcount = s->get_refcount(refblock, refblock_index);
> +                if (new_refcount_bits < 64 && refcount >> new_refcount_bits) {

Technically, the overflow check on this get_refcount() result is dead
code on the second walk, since the first walk already validated things,
so you could push all of this code...

> +                    uint64_t offset;
> +
> +                    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
> +
> +                    offset = ((reftable_index << s->refcount_block_bits)
> +                              + refblock_index) << s->cluster_bits;
> +
> +                    error_setg(errp, "Cannot decrease refcount entry width to "
> +                               "%i bits: Cluster at offset %#" PRIx64 " has a "
> +                               "refcount of %" PRIu64, new_refcount_bits,
> +                               offset, refcount);
> +                    return -EINVAL;
> +                }
> +
> +                if (new_set_refcount) {
> +                    new_set_refcount(new_refblock, new_refblock_index++, refcount);
> +                } else {

...here, in the branch only run on the first walk.

> +                    new_refblock_index++;
> +                }
> +                new_refblock_empty = new_refblock_empty && refcount == 0;

Worth condensing to 'new_refblock_empty &= !refcount'?  Maybe not.

> +            }
> +
> +            ret = qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Failed to put refblock back into "
> +                                 "the cache");
> +                return ret;
> +            }
> +        } else {
> +            /* No refblock means every refcount is 0 */
> +            for (refblock_index = 0; refblock_index < s->refcount_block_size;
> +                 refblock_index++)

Again, visiting each (implied) entry for the given refcount block of the
outer loop.  When enlarging the width, each of the new blocks will also
be all zero; but when shrinking the width, even though all entries on
this pass are zero, we may be combining this pass with another outer
loop with non-zero data for a non-zero block in the resulting new table.

> +            {
> +                if (new_refblock_index >= new_refblock_size) {
> +                    /* new_refblock is now complete */
> +                    ret = operation(bs, new_reftable, *new_reftable_index,
> +                                    new_reftable_size, new_refblock,
> +                                    new_refblock_empty, errp);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +
> +                    (*new_reftable_index)++;
> +                    new_refblock_index = 0;
> +                    new_refblock_empty = true;
> +                }
> +
> +                if (new_set_refcount) {
> +                    new_set_refcount(new_refblock, new_refblock_index++, 0);

Would it be worth guaranteeing that every new refblock is 0-initialized
when allocated, so that you can skip setting a refcount to 0?  This
question depends on the answer about block allocation asked at [4] above.

> +                } else {
> +                    new_refblock_index++;
> +                }
> +            }
> +        }
> +    }
> +
> +    if (new_refblock_index > 0) {
> +        /* Complete the potentially existing partially filled final refblock */
> +        if (new_set_refcount) {
> +            for (; new_refblock_index < new_refblock_size;
> +                 new_refblock_index++)
> +            {
> +                new_set_refcount(new_refblock, new_refblock_index, 0);

Again, if you 0-initialize refblocks when allocated, you could skip this
(another instance of [4] above).

> +            }
> +        }
> +
> +        ret = operation(bs, new_reftable, *new_reftable_index,
> +                        new_reftable_size, new_refblock, new_refblock_empty,
> +                        errp);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        (*new_reftable_index)++;
> +    }

site [1] mentioned above, as a good place to make a final status
callback at 100%.  But if you do it here, it means that we call the
status callback twice with the same values (the 100% value of the first
loop is the 0% value of the second loop) - not the end of the world, but
may impact any testsuite that tracks progress reports.

> +
> +    return 0;
> +}
> +
> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
> +                                BlockDriverAmendStatusCB *status_cb,
> +                                void *cb_opaque, Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    Qcow2GetRefcountFunc *new_get_refcount;
> +    Qcow2SetRefcountFunc *new_set_refcount;
> +    void *new_refblock = qemu_blockalign(bs->file, s->cluster_size);
> +    uint64_t *new_reftable = NULL, new_reftable_size = 0;
> +    uint64_t *old_reftable, old_reftable_size, old_reftable_offset;
> +    uint64_t new_reftable_index = 0;
> +    uint64_t i;
> +    int64_t new_reftable_offset;
> +    int new_refblock_size, new_refcount_bits = 1 << refcount_order;
> +    int old_refcount_order;
> +    int ret;
> +
> +    assert(s->qcow_version >= 3);
> +    assert(refcount_order >= 0 && refcount_order <= 6);
> +
> +    /* see qcow2_open() */
> +    new_refblock_size = 1 << (s->cluster_bits - (refcount_order - 3));

Safe (cluster_bits is always at least 9, and at most 21 in our current
implementation, so we are shifting anywhere from 6 to 24 positions).
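
The bounds I used, as a sketch (refblock_entries() is a hypothetical
helper; the limits of 9 and 21 for cluster_bits are the ones I quoted
above, not something this patch defines):

```c
#include <assert.h>

/* Number of refcount entries per refblock, computed the same way as
 * new_refblock_size in the patch. */
static int refblock_entries(int cluster_bits, int refcount_order)
{
    int shift;

    assert(cluster_bits >= 9 && cluster_bits <= 21);
    assert(refcount_order >= 0 && refcount_order <= 6);

    /* refcount_order - 3 is the log2 of the entry size in bytes and is
     * negative for sub-byte widths, which makes the shift larger */
    shift = cluster_bits - (refcount_order - 3);
    return 1 << shift;
}
```

The extremes are 512-byte clusters with 64-bit refcounts (shift of 6, 64
entries) and 2M clusters with 1-bit refcounts (shift of 24), both well
within an int.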

> +
> +    get_refcount_functions(refcount_order,
> +                           &new_get_refcount, &new_set_refcount);
> +
> +
> +    /* First, allocate the structures so they are present in the refcount
> +     * structures */
> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> +                             &new_reftable_size, NULL, new_refblock_size,
> +                             new_refcount_bits, &alloc_refblock, NULL,
> +                             status_cb, cb_opaque, 0, 2, errp);
> +    if (ret < 0) {
> +        goto done;
> +    }
> +
> +    /* The new_reftable_size is now valid and will not be changed anymore,
> +     * so we can now allocate the reftable */
> +    new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
> +                                                   sizeof(uint64_t));

And here is your bug, that I hinted at with the mention of [3] above.
This allocation can potentially cause an overflow of the existing
reftable to occupy one more cluster.  Remember my thought experiment
above, how an 8 megabyte image rolls from 1 to 2 clusters during the
course of allocating refblocks for the new table?  What if the original
image wasn't completely full, but things are perfectly sized with enough
free clusters, then all of the refblock allocations done during the
first walk will still fit, and it is only this final allocation of the
new reftable that will cause the rollover, at which point we've failed
to account for the new refblock size.  That is, I think I could craft an
image that would trigger either an assertion failure or an out-of-bounds
array access during the second walk.
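
To put numbers on the concern, here is a sketch of how many new-reftable
slots a single walk touches as a function of the old reftable size at the
time of that walk.  new_slots_visited() is a hypothetical helper, and the
growth from 64 to 128 old entries is an assumed worst case for the
512-byte-cluster example, not something I have reproduced:

```c
#include <assert.h>
#include <stdint.h>

/* Number of times walk_over_reftable() would call operation(): once per
 * completed new refblock, plus once for a partially filled final one. */
static uint64_t new_slots_visited(uint64_t old_reftable_entries,
                                  uint64_t old_refblock_entries,
                                  uint64_t new_refblock_entries)
{
    uint64_t refcounts;

    assert(new_refblock_entries > 0);

    refcounts = old_reftable_entries * old_refblock_entries;
    return (refcounts + new_refblock_entries - 1) / new_refblock_entries;
}
```

Going from order 4 to order 6 with 512-byte clusters, a walk over 64 old
entries visits 256 slots; if the old table has grown to 128 entries by the
second walk, that walk visits 512 slots while the new reftable was sized
during the first.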

> +    if (new_reftable_offset < 0) {
> +        error_setg_errno(errp, -new_reftable_offset,
> +                         "Failed to allocate the new reftable");
> +        ret = new_reftable_offset;
> +        goto done;

If we fail here, do we leak allocations of the refblocks?  I guess not;
based on another forward reference to point [5].

> +    }
> +
> +    new_reftable_index = 0;
> +
> +    /* Second, write the new refblocks */
> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> +                             &new_reftable_size, new_refblock,
> +                             new_refblock_size, new_refcount_bits,
> +                             &flush_refblock, new_set_refcount,
> +                             status_cb, cb_opaque, 1, 2, errp);
> +    if (ret < 0) {
> +        goto done;
> +    }

If we fail here, it looks like we DO leak the clusters allocated for the
new reftable (again, point [5]).

> +
> +
> +    /* Write the new reftable */
> +    ret = qcow2_pre_write_overlap_check(bs, 0, new_reftable_offset,
> +                                        new_reftable_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "Overlap check failed");
> +        goto done;
> +    }
> +
> +    for (i = 0; i < new_reftable_size; i++) {
> +        cpu_to_be64s(&new_reftable[i]);
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
> +                      new_reftable_size * sizeof(uint64_t));
> +
> +    for (i = 0; i < new_reftable_size; i++) {
> +        be64_to_cpus(&new_reftable[i]);
> +    }
> +
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "Failed to write the new reftable");
> +        goto done;
> +    }

Looks like you correctly maintain the in-memory copy in preferred cpu
byte order, while writing to disk in big-endian order.
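
For comparison only, an alternative to swapping in place and swapping
back is to serialize into a separate buffer.  This is a sketch of that
different technique, not what the patch does; store_be64() and
load_be64() are hypothetical names, and the cost is an extra buffer the
size of the reftable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value to dst as 8 bytes, most significant byte first. */
static void store_be64(unsigned char *dst, uint64_t value)
{
    int i;

    assert(dst != NULL);
    for (i = 0; i < 8; i++) {
        dst[i] = (unsigned char)(value >> (56 - 8 * i));
    }
}

/* Read 8 big-endian bytes from src into a host-order integer. */
static uint64_t load_be64(const unsigned char *src)
{
    uint64_t value = 0;
    int i;

    assert(src != NULL);
    for (i = 0; i < 8; i++) {
        value = (value << 8) | src[i];
    }
    return value;
}
```

Either way the in-memory table stays in host order; the in-place variant
in the patch just avoids the second allocation.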

> +
> +
> +    /* Empty the refcount cache */
> +    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "Failed to flush the refblock cache");
> +        goto done;
> +    }
> +
> +    /* Update the image header to point to the new reftable; this only updates
> +     * the fields which are relevant to qcow2_update_header(); other fields
> +     * such as s->refcount_table or s->refcount_bits stay stale for now
> +     * (because we have to restore everything if qcow2_update_header() fails) */
> +    old_refcount_order  = s->refcount_order;
> +    old_reftable_size   = s->refcount_table_size;
> +    old_reftable_offset = s->refcount_table_offset;
> +
> +    s->refcount_order        = refcount_order;
> +    s->refcount_table_size   = new_reftable_size;
> +    s->refcount_table_offset = new_reftable_offset;
> +
> +    ret = qcow2_update_header(bs);
> +    if (ret < 0) {
> +        s->refcount_order        = old_refcount_order;
> +        s->refcount_table_size   = old_reftable_size;
> +        s->refcount_table_offset = old_reftable_offset;
> +        error_setg_errno(errp, -ret, "Failed to update the qcow2 header");
> +        goto done;
> +    }

Failures up to here still have issues leaking the new reftable
allocation (point [5]).

> +
> +    /* Now update the rest of the in-memory information */
> +    old_reftable = s->refcount_table;
> +    s->refcount_table = new_reftable;
> +
> +    /* For cleaning up all old refblocks */
> +    new_reftable      = old_reftable;
> +    new_reftable_size = old_reftable_size;
> +
> +    s->refcount_bits = 1 << refcount_order;
> +    if (refcount_order < 6) {
> +        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
> +    } else {
> +        s->refcount_max = INT64_MAX;
> +    }

Is it worth factoring this computation into a common helper, since it
appeared in an earlier patch as well?
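
Something along these lines, where refcount_max_for_order() is only a
suggested name and the INT64_MAX cap for order 6 is copied from the hunk
above:

```c
#include <assert.h>
#include <stdint.h>

/* Largest refcount value representable for a given refcount order. */
static uint64_t refcount_max_for_order(int refcount_order)
{
    int bits;

    assert(refcount_order >= 0 && refcount_order <= 6);

    bits = 1 << refcount_order;
    if (bits < 64) {
        return (UINT64_C(1) << bits) - 1;
    }
    /* 64-bit entries are capped at INT64_MAX, as in the patch */
    return (uint64_t)INT64_MAX;
}
```

Both call sites could then do
s->refcount_max = refcount_max_for_order(s->refcount_order).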

> +
> +    s->refcount_block_bits = s->cluster_bits - (refcount_order - 3);
> +    s->refcount_block_size = 1 << s->refcount_block_bits;
> +
> +    s->get_refcount = new_get_refcount;
> +    s->set_refcount = new_set_refcount;
> +
> +    /* And free the old reftable (the old refblocks are freed below the "done"
> +     * label) */
> +    qcow2_free_clusters(bs, old_reftable_offset,
> +                        old_reftable_size * sizeof(uint64_t),
> +                        QCOW2_DISCARD_NEVER);

site [2] mentioned above, as a possible point where you might want to
ensure the callback is called with equal progress and total values to
ensure the caller knows the job is done.  Except this site doesn't have
quite as much information as site [1] about what total size all the
other status callbacks were using.

> +
> +done:
> +    if (new_reftable) {
> +        /* On success, new_reftable actually points to the old reftable (and
> +         * new_reftable_size is the old reftable's size); but that is just
> +         * fine */
> +        for (i = 0; i < new_reftable_size; i++) {
> +            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
> +            if (offset) {
> +                qcow2_free_clusters(bs, offset, s->cluster_size,
> +                                    QCOW2_DISCARD_NEVER);
> +            }
> +        }
> +        g_free(new_reftable);

So here is point [5] - if we failed early, this tries to clean up all
allocated refblocks associated with the new table.  It does NOT clean up
any refblocks allocated due to resizing the old table to be slightly
larger, but that should be fine (not a leak, so much as an image that is
now a couple clusters larger than the minimum required size).  However,
while you clean up the clusters associated with refblocks (layer 2), the
cleanup of old clusters associated with the reftable (layer 1) happened
before the done: label on success, but that means that on failure, you
are NOT cleaning up the clusters associated with the new reftable.
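
The cleanup shape I have in mind, as a toy sketch.  Everything here is
hypothetical: toy_alloc() and toy_free() only count clusters and stand in
for qcow2_alloc_clusters() and qcow2_free_clusters(), and toy_convert()
models nothing but the ordering of allocation and cleanup:

```c
#include <assert.h>

/* Toy allocator: tracks only how many clusters are in use. */
static int clusters_in_use;

static int toy_alloc(int n)
{
    clusters_in_use += n;
    return 0; /* never fails in this sketch */
}

static void toy_free(int n)
{
    assert(clusters_in_use >= n);
    clusters_in_use -= n;
}

/* fail_late != 0 simulates a failure after the new reftable was allocated
 * (for example in the second walk or in the header update). */
static int toy_convert(int refblocks, int reftable_clusters, int fail_late)
{
    int have_refblocks = 0, have_reftable = 0;
    int ret;

    ret = toy_alloc(refblocks);
    if (ret < 0) {
        goto fail;
    }
    have_refblocks = 1;

    ret = toy_alloc(reftable_clusters);
    if (ret < 0) {
        goto fail;
    }
    have_reftable = 1;

    if (fail_late) {
        ret = -1;
        goto fail;
    }

    /* Success: the new structures are live and stay allocated */
    return 0;

fail:
    /* Release both layers, not just the refblocks */
    if (have_reftable) {
        toy_free(reftable_clusters);
    }
    if (have_refblocks) {
        toy_free(refblocks);
    }
    return ret;
}
```

A late failure with 256 refblocks and a 4-cluster reftable leaves the
counter where it started; without the have_reftable branch, 4 clusters
would stay allocated.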

> +    }
> +
> +    qemu_vfree(new_refblock);
> +    return ret;
> +}
> diff --git a/block/qcow2.h b/block/qcow2.h
> index fe12c54..5b96519 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -526,6 +526,10 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
>  int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
>                                    int64_t size);
>  
> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
> +                                BlockDriverAmendStatusCB *status_cb,
> +                                void *cb_opaque, Error **errp);
> +
>  /* qcow2-cluster.c functions */
>  int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
>                          bool exact_size);
> 

Interesting patch.  Hope my review helps you prepare a better v2.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
@ 2014-11-12  4:36   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-12  4:36 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Make use of qcow2_change_refcount_order() to support changing the
> refcount order with qemu-img amend.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 44 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 35 insertions(+), 9 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
@ 2014-11-12  4:38   ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-12  4:38 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> If a reference count is not representable with the current refcount
> order, the image check should point to qemu-img amend for increasing the
> refcount order. However, qemu-img amend needs write access to the image
> which cannot be provided if the image is marked corrupt; and the image
> check will not mark the image consistent unless everything actually is
> consistent.
> 
> Therefore, if an image is marked corrupt and the image check encounters
> a reference count overflow, it cannot be fixed by using qemu-img amend
> to increase the refcount order. Instead, one has to use qemu-img convert
> to create a completely new copy of the image in this case.
> 
> Alternatively, we may want to give the user a way of manually removing
> the corrupt flag, maybe through qemu-img amend, but this is not part of
> this patch.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option
  2014-11-11 17:57   ` Eric Blake
@ 2014-11-12  8:41     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:41 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 18:57, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Some tests do not work well with certain refcount widths (i.e. you
>> cannot create internal snapshots with refcount_width=1), so make those
>> widths unsupported.
>>
>> Furthermore, add another filter to _filter_img_create in common.filter
>> which filters out the refcount_width value.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/007           |  4 ++++
>>   tests/qemu-iotests/015           |  1 +
>>   tests/qemu-iotests/026           | 11 +++++++++++
>>   tests/qemu-iotests/029           |  1 +
>>   tests/qemu-iotests/051           |  1 +
>>   tests/qemu-iotests/058           |  1 +
>>   tests/qemu-iotests/067           |  7 +++++++
>>   tests/qemu-iotests/079           |  1 +
>>   tests/qemu-iotests/080           |  1 +
>>   tests/qemu-iotests/089           |  7 +++++++
>>   tests/qemu-iotests/090           |  1 +
>>   tests/qemu-iotests/108           |  6 ++++++
>>   tests/qemu-iotests/common.filter |  3 ++-
>>   13 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
>> index fe1a743..de39d1b 100755
>> --- a/tests/qemu-iotests/007
>> +++ b/tests/qemu-iotests/007
>> @@ -43,6 +43,10 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>>   _supported_fmt qcow2
>>   _supported_proto generic
>>   _supported_os Linux
>> +# refcount_width must be at least 4 bits so we can create ten internal snapshots
>> +# (1 bit supports none, 2 bits support three, 4 bits support 15)
> Feels like an off-by-one comment.  A width of 1 bit supports a max
> refcount of 1 (therefore no snapshots), a width of 2 bits supports a max
> refcount of 3 (therefore 2 snapshots in addition to the original), a
> width of 4 bits supports a max refcount of 15 (therefore only 14 snapshots).

Telling you how to correctly get to the right maximum refcount and then 
getting it wrong myself is a bit embarrassing...
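
The corrected arithmetic can be sketched in C. This is only an illustration, not code from the series: max_refcount() and max_internal_snapshots() are hypothetical helper names, and the snapshot count assumes that the active image holds one reference to a cluster and that each internal snapshot adds exactly one more.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a refcount entry of the given width (in bits) can hold.
 * Hypothetical helper; valid widths are the powers of two from 1 to 64. */
static uint64_t max_refcount(unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 64);
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately */
    return width_bits == 64 ? UINT64_MAX : (UINT64_C(1) << width_bits) - 1;
}

/* One reference belongs to the active image; the rest can go to snapshots
 * (assumption: one additional reference per snapshot). */
static uint64_t max_internal_snapshots(unsigned width_bits)
{
    return max_refcount(width_bits) - 1;
}
```

Under these assumptions a width of 4 bits allows 14 snapshots, which is still enough for the ten snapshots that test 007 creates.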

>> +++ b/tests/qemu-iotests/067
>> @@ -35,6 +35,13 @@ status=1	# failure is the default!
>>   _supported_fmt qcow2
>>   _supported_proto file
>>   _supported_os Linux
>> +# Because this would change the output of query-block
>> +_unsupported_imgopts 'refcount_width=1[^0-9]' \
>> +                     'refcount_width=2[^0-9]' \
>> +                     'refcount_width=4[^0-9]' \
>> +                     'refcount_width=8[^0-9]' \
>> +                     'refcount_width=32[^0-9]' \
>> +                     'refcount_width=64[^0-9]'
> It might be more compact to exploit globbing and just say:
>
> _unsupported_imgopts 'refcount_width=?[^6]'
>
> which leaves refcount_width=16 as the only pattern that doesn't match
> the glob.  But that feels more fragile, so I can live with your longer list.

Well, maybe using ?[^6] is even better because this isn't about ruling 
out the options 1, 2, 4, 8, 32 and 64, but rather only allowing 16. 
Thus, using ?[^6] seems more explicit.

>> +++ b/tests/qemu-iotests/089
>> @@ -41,6 +41,13 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>>   _supported_fmt qcow2
>>   _supported_proto file
>>   _supported_os Linux
>> +# Because this would change the output of qemu_io -c info
>> +_unsupported_imgopts 'refcount_width=1[^0-9]' \
> I like how you give reasons for some tests...
>
>> +++ b/tests/qemu-iotests/090
>> @@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>>   _supported_fmt qcow2
>>   _supported_proto file nfs
>>   _supported_os Linux
>> +_unsupported_imgopts 'refcount_width=1[^0-9]'
> ...so why not do it for all tests?

Because the ones I didn't give reasons for are the ones I spotted first, 
so they seemed obvious to me. ;-) (I wondered the same thing when 
looking through the patches before submitting them, but decided to just 
leave it at that)

I'll add a comment for every test.

> At any rate, the patch makes sense, so whether or not you tweak comments,
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> I'm assuming that later in the series you add a test that explicitly
> covers the error message given when a refcount_order=0 (width=1) image
> is attempted to be used with snapshots, since that will fail (internal
> snapshots are simply not possible with a refcount that can't exceed 1).

Well, as you yourself explained, they are indeed possible if done right 
(immediate COW). But that'll go into another series.

Max

* Re: [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4
  2014-11-11 18:05   ` Eric Blake
@ 2014-11-12  8:47     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:47 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 19:05, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a creation option to qcow2 for setting the refcount order of images
>> to be created, and respect that option's value.
>>
>> This breaks some test outputs, fix them.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c              |  20 ++++++++
>>   include/block/block_int.h  |   1 +
>>   tests/qemu-iotests/049.out | 112 ++++++++++++++++++++++-----------------------
>>   tests/qemu-iotests/079.out |  18 ++++----
>>   tests/qemu-iotests/082.out |  41 ++++++++++++++---
>>   tests/qemu-iotests/085.out |  38 +++++++--------
>>   6 files changed, 139 insertions(+), 91 deletions(-)
> Is there any .json file that needs to be modified to allow this option
> to be set via QMP?  I guess that can be a followup.

Good point, but I can't find any JSON object which contains a
"lazy-refcounts" (or "lazy_refcounts") field and which seems to be
related to image creation (if "lazy-refcounts" appears anywhere other
than in the options for opening an existing image, "refcount-width"
should be there, too). I guess there is probably no way to create an
image directly over QMP at all yet.

>>   qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
>> -qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
>> +qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
>>   qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
> Nice that you are also getting rid of trailing whitespace.

That was actually unintentional. You seem fine with it, though, so I'll 
remove all trailing whitespace in all the test outputs I'm touching in v2.

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options()
  2014-11-11 18:11   ` Eric Blake
@ 2014-11-12  8:47     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:47 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 19:11, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c              | 14 ++++++--------
>>   tests/qemu-iotests/061.out | 14 +++++++-------
>>   2 files changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 21a1883..beb7187 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>               } else if (!strcmp(compat, "1.1")) {
>>                   new_version = 3;
>>               } else {
>> -                fprintf(stderr, "Unknown compatibility level %s.\n", compat);
>> +                error_report("Unknown compatibility level %s.", compat);
> Not many error_report() locations include a trailing '.'
>
>> +++ b/tests/qemu-iotests/061.out
>> @@ -281,19 +281,19 @@ No errors were found on the image.
>>   === Testing invalid configurations ===
>>   
>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>> -Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>> +qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>>   qemu-img: Error while amending options: Invalid argument
>> -Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>> +qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
>>   qemu-img: Error while amending options: Invalid argument
>> -Unknown compatibility level 0.42.
>> +qemu-img: Unknown compatibility level 0.42.
>>   qemu-img: Error while amending options: Invalid argument
>>   qemu-img: Invalid parameter 'foo'
>>   qemu-img: Invalid options for file format 'qcow2'
>> -Changing the cluster size is not supported.
>> +qemu-img: Changing the cluster size is not supported.
>>   qemu-img: Error while amending options: Operation not supported
>> -Changing the encryption flag is not supported.
>> +qemu-img: Changing the encryption flag is not supported.
>>   qemu-img: Error while amending options: Operation not supported
>> -Cannot change preallocation mode.
>> +qemu-img: Cannot change preallocation mode.
>>   qemu-img: Error while amending options: Operation not supported
> See - most of the messages do not end in '.'.  Probably worth cleaning
> up if you respin.  But it's not a show-stopper if you leave it, so:

I'll clean it up.

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>

* Re: [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false)
  2014-11-11 18:12   ` Eric Blake
@ 2014-11-12  8:48     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 19:12, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> Is it worth hoisting this one into 2.2 via the -trivial tree?

No; as explained, this point can only be reached if there is some
creation option for qcow2 images which is not handled by any of the 
branches in this function. Since there is no such thing currently in 
master and there most certainly won't be in 2.2 (thanks to hard freeze), 
it's fine to keep it out of 2.2.

Max

>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index beb7187..ebf843f 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>               error_report("Cannot change refcount entry width");
>>               return -ENOTSUP;
>>           } else {
>> -            /* if this assertion fails, this probably means a new option was
>> +            /* if this point is reached, this probably means a new option was
>>                * added without having it covered here */
>> -            assert(false);
>> +            abort();
>>           }
>>   
>>           desc++;
>>

* Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-11 19:49           ` Eric Blake
@ 2014-11-12  8:52             ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:52 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 20:49, Eric Blake wrote:
> On 11/11/2014 09:18 AM, Max Reitz wrote:
>
>>> No, I was envisioning that we have a brand new image with one cluster
>>> allocated (cluster 1 has refcount 1), then 5 times in a row we do
>>> 'savevm' to take an internal snapshot.  If I understand your code
>>> correctly, the first two snapshots increase the refcount, so cluster 1
>>> has a refcount of 3. Then the next snapshot can't increase the refcount,
>>> so it instead copies the contents to cluster 2.
>> No, it just errors out.
>>
>> qcow2_alloc_bytes() is only used for allocating space for a compressed
>> cluster. When taking a snapshot, update_refcount() will be called to
>> increase the clusters' refcounts, and that function will simply throw an
>> error.
> That's okay for now (always better for an initial feature to be
> conservative, then expand it later if there is demand).  But I wonder if
> we could be made smarter in the future and auto-COW any cluster that
> would otherwise exceed max refcount.  Thus, for a refcount_order=0
> (width=1) image, a snapshot now doubles the size of the image (as every
> single cluster would COW into a new cluster) rather than erroring out.
> Food for thought; maybe worth injecting comments into this series
> (whether in code or in commit messages, as appropriate) pointing out
> that we thought about the future possibility even though we chose not to
> allow it for now.

Ah, right, thank you. Yes, that sounds like a good idea, I'll see to it 
at some later point in time.

I think adding comments will be hard because the snapshot functions 
aren't really modified. They just try to increase the refcount and that 
may now fail earlier than it did for a refcount width of 16 bits, so 
there's no real change in behavior there, it's just that it's now 
reasonably possible to hit that case. I will add appropriate comments to 
the test case (which tests this snapshotting issue), though.
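
To illustrate the "it just errors out" behaviour discussed above, here is a minimal sketch of a bounded refcount increment. It is not the series' update_refcount(); refcount_add() is a hypothetical stand-in, and the -EINVAL return value is an assumption chosen to match the -22 that appears in the test output elsewhere in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the overflow check: add to a refcount that
 * must never exceed max, and fail instead of wrapping or saturating. */
static int refcount_add(uint64_t *refcount, uint64_t addend, uint64_t max)
{
    assert(*refcount <= max);
    if (addend > max - *refcount) {
        return -EINVAL;     /* would overflow: leave the entry untouched */
    }
    *refcount += addend;
    return 0;
}
```

With max = 3 (a width of 2 bits), a cluster starting at refcount 1 accepts two increments and rejects the third; with max = 1, the very first increment fails. An auto-COW scheme as suggested above would replace the error path with a copy into a newly allocated cluster.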

Max

* Re: [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths
  2014-11-11 19:53   ` Eric Blake
@ 2014-11-12  8:58     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  8:58 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 20:53, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a test for conversion between different refcount widths and errors
>> specific to certain widths (i.e. snapshots with refcount_width=1).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/112     | 225 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/112.out | 123 +++++++++++++++++++++++++
>>   tests/qemu-iotests/group   |   1 +
>>   3 files changed, 349 insertions(+)
>>   create mode 100755 tests/qemu-iotests/112
>>   create mode 100644 tests/qemu-iotests/112.out
>>
>> +
>> +# This tests qocw2-specific low-level functionality
>> +_supported_fmt qcow2
>> +_supported_proto file
>> +_supported_os Linux
> Might work on more than just Linux, but then again, it's probably worth
> scrubbing the whole testsuite for situations like that, so don't worry
> about it here.
>
>> +# This test will set refcount_width on its own which would conflict with the
>> +# manual setting; compat will be overridden as well
>> +_unsupported_imgopts refcount_width 'compat=0.10'
>> +
>> +function print_refcount_width()
>> +{
>> +    $QEMU_IMG info "$TEST_IMG" | grep 'refcount width:' | sed -e 's/^ *//'
> grep|sed is almost always a waste.  This is equivalent:
>
>     $QEMU_IMG info "$TEST_IMG" | sed -n '/refcount width:/ s/^ *//p'

As said before, I just don't know sed well enough. My knowledge of sed 
is "you can use it for regex replacement with -e", and that's about it. 
Oh, and "you can do it in-place with -i" and "someone wrote Sokoban in 
sed". And maybe even "sed better fits the term Standard EDitor than ed 
does".

But thanks a lot for telling me, I'm just always afraid to learn sed 
because it seems even more unreadable than perl to me...

>> +echo
>> +echo '=== Snapshot limit on refcount_width=1 ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>> +print_refcount_width
>> +
>> +$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
>> +
>> +# Should fail
>> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
>> +
>> +# The new L1 table could/shoud be leaked
> s/shoud/should/

Right.

>> +_check_test_img
>> +
>> +echo
>> +echo '=== Snapshot limit on refcount_width=2 ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
>> +print_refcount_width
>> +
>> +$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
>> +
>> +# Should succeed
>> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
>> +$QEMU_IMG snapshot -c bar "$TEST_IMG"
>> +# Should fail (4th reference)
>> +$QEMU_IMG snapshot -c baz "$TEST_IMG"
>> +
>> +# The new L1 table could/shoud be leaked
> again

yyp is dangerous.

>> +echo
>> +echo '=== Amend with snapshot ==='
>> +echo
>> +
>> +$QEMU_IMG snapshot -c foo "$TEST_IMG"
>> +# Just to have different refcounts across the image
>> +$QEMU_IO -c 'write 0 16M' "$TEST_IMG" | _filter_qemu_io
>> +
>> +# Should not work
>> +$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
>> +_check_test_img
>> +print_refcount_width
> This matches your initial implementation. Someday, though, we may decide
> to auto-COW any overflowed cluster, and thus allow the conversion to
> succeed.  Worth a comment?

Yes, will do.

>> +echo '=== Testing too many references for check ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>> +print_refcount_width
>> +
>> +# This cluster should be created at 0x50000
>> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
>> +# Now make the second L2 entriy (the L2 table should be at 0x40000) point to
> s/entriy/entry/

I think this happened because at one point in time it said something 
about "L2 entries".

>> +# success, all done
>> +echo '*** done'
>> +rm -f $seq.full
>> +status=0
> Overall a nice set of tests!
>
>
>> +=== Snapshot limit on refcount_width=1 ===
>> +
>> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>> +refcount width: 1
>> +wrote 512/512 bytes at offset 0
>> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>> +qemu-img: Could not create snapshot 'foo': -22 (Invalid argument)
>> +Leaked cluster 6 refcount=1 reference=0
> Bummer that the error message did not state WHY (because a cluster would
> overflow refcounts), but I'm not sure how hard it would be to make that
> better, and at least we correctly errored out.

Yes, I know, it's a really bad error. The problem is that no Error 
object is used in that path at all so it will be rather cumbersome, but 
I'll look into it one more time.

Max

* Re: [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB for amend
  2014-11-11 21:05   ` Eric Blake
@ 2014-11-12  9:10     ` Max Reitz
  0 siblings, 0 replies; 75+ messages in thread
From: Max Reitz @ 2014-11-12  9:10 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 22:05, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> If there is more than one time-consuming operation to be performed for
>> qcow2_amend_options(), we need an intermediate CB which coordinates the
>> progress of the individual operations and passes the result to the
>> original status callback.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 75 insertions(+), 1 deletion(-)
> Getting trickier to review.

Yes, and 18 is the worst.

>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index eaef251..e6b93d1 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2655,6 +2655,71 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
>>       return 0;
>>   }
>>   
>> +typedef enum Qcow2AmendOperation {
>> +    /* This is the value Qcow2AmendHelperCBInfo::last_operation will be
>> +     * statically initialized to so that the helper CB can discern the first
>> +     * invocation from an operation change */
>> +    QCOW2_NO_OPERATION = 0,
>> +
>> +    QCOW2_DOWNGRADING,
>> +} Qcow2AmendOperation;
> So for this patch, you still have just one operation, but later in the
> series, you add a second (and the goal of THIS patch is that it will
> work even if there are 3 or more operations, even though this series
> doesn't add that many).

Right.

>> +static void qcow2_amend_helper_cb(BlockDriverState *bs, int64_t offset,
>> +                                 int64_t total_work_size, void *opaque)
> indentation looks off

Right, will fix.

>> +{
>> +    Qcow2AmendHelperCBInfo *info = opaque;
>> +    int64_t current_work_size;
>> +    int64_t projected_work_size;
> Worth asserting that info->total_operations is non-zero?  Or is there
> ever a valid case for calling the callback even when there are no
> sub-operations, and therefore we are automatically complete (offset ==
> total_work_size)?

No, the driver does not have to call the status CB; qemu-img amend will
handle that case by first printing 0 %, then invoking the amend operation,
and then printing 100 %. Will add an assertion.

>> +
>> +    if (info->current_operation != info->last_operation) {
>> +        if (info->last_operation != QCOW2_NO_OPERATION) {
>> +            info->offset_completed += info->last_work_size;
>> +            info->operations_completed++;
>> +        }
> Would it be any easier to guarantee that we come to 100% completion by
> requiring the coordinator to pass a final completion callback? [1]
>   info->current_operation = QCOW2_NO_OPERATION;
>   cb(bs, 0, 0, info)

No, because the amend CB does not have to be called on completion; 
img_amend() in qemu-img.c takes care of that case.

>> +
>> +        info->last_operation = info->current_operation;
>> +    }
>> +
>> +    info->last_work_size = total_work_size;
> Took me a while to realize that total_work_size is the incoming
> (estimated) total size for the current sub-operation, and not the total
> over the combination of all sub-operations...

Ah, right, I'll change the variable name, maybe to operation_work_size 
or something like that.

>> +
>> +    current_work_size = info->offset_completed + total_work_size;
>> +
>> +    /* current_work_size is the total work size for (operations_completed + 1)
> but this comment helped.
>
>> +     * operations (which includes this one), so multiply it by the number of
>> +     * operations not covered and divide it by the number of operations
>> +     * covered to get a projection for the operations not covered */
>> +    projected_work_size = current_work_size * (info->total_operations -
>> +                                               info->operations_completed - 1)
>> +                                            / (info->operations_completed + 1);
> So, when there is just one sub-operation (which is the case until later
> patches add a second), this results in the following calculation for ALL
> calls during the intermediate steps of the sub-operation:
>
> projected_work_size = total_work_size * (1 - 0 - 1) / (0 + 1)
>
> that is, we are projecting 0 additional work because we have zero
> additional stages to complete.  Am I correct that we will never enter
> the callback in a state where
> info->operations_completed==info->total_operations?

Yes, we won't.

> (because if we do,
> you'd have a computation of final_size * (1 - 1 - 1) / (1 + 1) which
> looks weird).  Worth an assert()?

assert()s are always worth it.

> Then again, my proposal above [1] to
> guarantee a 100% completion by use of a final cleanup callback would
> indeed reach the point where operations_completed==total_operations.
>
>> +
>> +    info->original_status_cb(bs, info->offset_completed + offset,
>> +                             current_work_size + projected_work_size,
>> +                             info->original_cb_opaque);
> So, as long as we don't add a second phase, this is strictly equivalent
> to calling the original callback with the original offset (since
> info->offset_completed remains 0) and original work size (since
> projected_work_size remains 0).  That part works fine.
>
> Let's see what happens if we had three phases.  To make it more
> interesting, let's pick some numbers - how about the first phase
> progresses from 0-10, the second from 0-100, and the third from 0-10,
> and where none of the sub-operations change predicted total_work_size.
> The caller would first set info->current_operation to 1, then call the
> callback a few times; how about twice with 5/10 and 10/10.  For both
> calls, current_work_size is 0+10, then projected_work_size is
> 10*(3-0-1)/(0+1) == 20, and we call the original callback with
> (0+5)/(10+20) and (0+10)/(10+20).  Pretty good (5/30 and 10/30 are right
> on if the first sub-command is exactly one-third of the time of the
> overall command; and even if it is not, it still shows reasonable progress).
>
> Then we move on to the second sub-command where the coordinator updates
> info->current_operation to 2 before triggering several callbacks; let's
> suppose it reports at 0/100, 30/100, 60/100, and 100/100.  The first
> call updates info to track that we've detected a change in sub-command
> (offset_completed is now 10, operations_completed is now 1).  Then for
> all four calls, current_work_size is 10+100, and projected_work_size is
> 110*(3-1-1)/(1+1) == 55.  So we call the original callback with
> (10+0)/(110+55), (10+30)/(110+55), (10+60)/(110+55), (10+100)/(110+55).
>   The first report of 10/165 looks like we jumped backwards (much smaller
> progress than our previous report of 10/30), but that's merely a
> representation that this phase is estimating a larger total_work count,
> and we have no way of correlating whether 1 unit of work count in each
> phase is equivalent to an equal amount of time.  But by the end, we
> report 110/165, which is spot on for being two-thirds complete.
>
> Another assignment to info->current_operation, and a couple more
> callbacks; let's again use 5/10 and 10/10.  The first callback updates
> info (offset_completed is now 110, operations_completed is now 2).  For
> each call, current_work_size is 110+10, and projected_work_size is
> 120*(3-2-1)/(2+1) == 0.  We call the original callback with
> (110+5)/(120+0) and (110+10)/(120+0).  We've done a very rapid jump
> from 2/3 to 115/120, but end the overall operation with the two values
> equal.  So the function is not very smooth, but at least it is as good
> an estimate as possible along each stage of the operation, and we never
> violate the premise of reporting equal values until all sub-commands are
> complete.

Yes, it's not pretty but it's the best we can do without either 
hard-coding some estimations or trying some kind of dry-run to determine 
each operation's work size beforehand.
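
The projection discussed above can be tried out in isolation. The following is a simplified sketch modelled on the quoted patch, not the patch itself: ProgressInfo and project_progress() are hypothetical names, the operation is a plain int (0 meaning "none yet"), and instead of invoking the original status callback the function returns the two values it would pass on. The two assertions are the ones suggested in the review.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified state, modelled on the fields used in the quoted patch */
typedef struct ProgressInfo {
    int current_operation;      /* set by the coordinator; 0 = none yet */
    int last_operation;
    int total_operations;
    int operations_completed;
    int64_t offset_completed;
    int64_t last_work_size;
} ProgressInfo;

/* Computes what would be passed on to the original status callback */
static void project_progress(ProgressInfo *info, int64_t offset,
                             int64_t operation_work_size,
                             int64_t *out_offset, int64_t *out_total)
{
    int64_t current_work_size, projected_work_size;

    assert(info->total_operations > 0);

    /* Detect a switch to the next sub-operation */
    if (info->current_operation != info->last_operation) {
        if (info->last_operation != 0) {
            info->offset_completed += info->last_work_size;
            info->operations_completed++;
        }
        info->last_operation = info->current_operation;
    }
    assert(info->operations_completed < info->total_operations);

    info->last_work_size = operation_work_size;
    current_work_size = info->offset_completed + operation_work_size;

    /* Scale the work seen so far to the operations not yet started */
    projected_work_size = current_work_size
        * (info->total_operations - info->operations_completed - 1)
        / (info->operations_completed + 1);

    *out_offset = info->offset_completed + offset;
    *out_total = current_work_size + projected_work_size;
}
```

Feeding the three-phase example (work sizes 10, 100, 10) through this sketch yields 5/30 and 10/30 for the first phase and 10/165 up to 110/165 for the second. For the final phase it yields 115/120 and 120/120, since the reported offset is offset_completed (110) plus the phase's own offset.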

>> +}
>> +
>>   static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>                                  BlockDriverAmendStatusCB *status_cb,
>>                                  void *cb_opaque)
>> @@ -2669,6 +2734,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>       bool encrypt;
>>       int ret;
>>       QemuOptDesc *desc = opts->list->desc;
>> +    Qcow2AmendHelperCBInfo helper_cb_info;
>>   
>>       while (desc && desc->name) {
>>           if (!qemu_opt_find(opts, desc->name)) {
>> @@ -2726,6 +2792,12 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>           desc++;
>>       }
>>   
>> +    helper_cb_info = (Qcow2AmendHelperCBInfo){
>> +        .original_status_cb = status_cb,
>> +        .original_cb_opaque = cb_opaque,
>> +        .total_operations = (new_version < old_version)
>> +    };
> Slick.
>
>> +
>>       /* Upgrade first (some features may require compat=1.1) */
>>       if (new_version > old_version) {
>>           s->qcow_version = new_version;
>> @@ -2784,7 +2856,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
>>   
>>       /* Downgrade last (so unsupported features can be removed before) */
>>       if (new_version < old_version) {
>> -        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
>> +        helper_cb_info.current_operation = QCOW2_DOWNGRADING;
>> +        ret = qcow2_downgrade(bs, new_version, &qcow2_amend_helper_cb,
>> +                              &helper_cb_info);
> Looks correct to me. Other than the indentation issue and possible
> addition of some asserts, this is good to go.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

Max

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-12  4:15   ` Eric Blake
@ 2014-11-12  9:55     ` Max Reitz
  2014-11-12 13:50       ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-12  9:55 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-12 at 05:15, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a function qcow2_change_refcount_order() which allows changing the
>> refcount order of a qcow2 image.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 424 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.h          |   4 +
>>   2 files changed, 428 insertions(+)
> This is fairly big; you may want to get a second review from a
> maintainer rather than blindly trusting me.

I didn't really see the point in splitting it up. Introducing the static 
helper functions first would not be very helpful either, so I thought
about what I'd do as a reviewer, and that was "apply the patch and just read
through all the code". Splitting it into multiple patches would not have 
helped there (and I don't see how I could split this patch into logical 
changes, where at the start we have some inefficient but simple 
implementation and it gets better over time).

I'll need a review from a maintainer anyway, but I won't get one without 
being able to show another review first...

> My review was not linear, but I left the email in linear order.  Feel
> free to ask for clarification if my presentation is too hard to follow.
>
>> +
>> +/**
>> + * This "operation" for walk_over_reftable() allocates the refblock on disk (if
>> + * it is not empty) and inserts its offset into the new reftable. The size of
>> + * this new reftable is increased as required.
>> + */
>> +static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
>> +                          uint64_t reftable_index, uint64_t *reftable_size,
>> +                          void *refblock, bool refblock_empty, Error **errp)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t offset;
>> +
>> +    if (!refblock_empty && reftable_index >= *reftable_size) {
>> +        uint64_t *new_reftable;
>> +        uint64_t new_reftable_size;
>> +
>> +        new_reftable_size = ROUND_UP(reftable_index + 1,
>> +                                     s->cluster_size / sizeof(uint64_t));
>> +        if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
>> +            error_setg(errp,
>> +                       "This operation would make the refcount table grow "
>> +                       "beyond the maximum size supported by QEMU, aborting");
>> +            return -ENOTSUP;
>> +        }
>> +
>> +        new_reftable = g_try_realloc(*reftable, new_reftable_size *
>> +                                                sizeof(uint64_t));
> Safe from overflow based on checks a few lines earlier.  Good.
>
>> +        if (!new_reftable) {
>> +            error_setg(errp, "Failed to increase reftable buffer size");
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(new_reftable + *reftable_size, 0,
>> +               (new_reftable_size - *reftable_size) * sizeof(uint64_t));
>> +
>> +        *reftable      = new_reftable;
>> +        *reftable_size = new_reftable_size;
> Just to check my math here:
>
> Suppose we have an image with 512-byte clusters, and are changing from
> 16-bit refcount (order 4) to 64-bit refcount.  Also, suppose the
> existing image has exactly filled a one-cluster refcount table (that is,
> there are 64 reftable entries, each pointing at a refblock with all 256
> refcount entries full, for a total image size of exactly 8M).  The
> original image occupies a header (1 cluster), L1 and L2 tables, and
> data; but 65 of the 16k clusters tied up in the image are dedicated to
> the refcount structures.
>
> Meanwhile, the new refcount table will have to point to 256 refcount
> blocks, each holding only 64 entries, which in turn implies that the
> refcount table now has to be at least 4 clusters long.  But as this
> requires at least 260 clusters to represent, then even if we were able
> to reuse the 65 clusters of the original table, we'd still be allocating
> at least 195 clusters; in reality, your code doesn't free any old
> clusters until after allocating the new, because it is easier to keep
> the old table live until the new table is populated.  The process of
> allocating the new clusters means we actually end up with a new refcount
> table of 5 clusters long, where not all 320 refblocks will be populated.
>   But as long as we are keeping the old table up-to-date for the refblock
> allocations, it ALSO means that we caused a rollover of the old table
> from 1 cluster into 2, which itself consumes several clusters (the
> larger table must be contiguous, and we must also set up a refblock to
> describe the larger table, so we've added at least three clusters
> associated with the original table during the course of preparing the new
> table).
>
> Hmm - that means I found a bug in your implementation.  See [3] below.
>
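
For anyone who wants to check the arithmetic in the thought experiment, here is a small standalone C sketch. It is not part of the patch, and the helper names are made up for illustration; it only assumes 8-byte reftable entries and the refblock size formula used in qcow2_open().

```c
#include <assert.h>
#include <stdint.h>

/* Number of refcount entries in one refblock (hypothetical helper):
 * cluster size in bits divided by the refcount width in bits. */
static uint64_t refblock_entries(int cluster_bits, int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return UINT64_C(1) << (cluster_bits - (refcount_order - 3));
}

/* Number of refblocks needed to describe total_clusters, rounded up. */
static uint64_t refblocks_needed(uint64_t total_clusters, int cluster_bits,
                                 int refcount_order)
{
    uint64_t per_block = refblock_entries(cluster_bits, refcount_order);

    return (total_clusters + per_block - 1) / per_block;
}

/* Clusters occupied by a reftable holding one 8-byte entry per refblock. */
static uint64_t reftable_clusters(uint64_t refblocks, int cluster_bits)
{
    uint64_t bytes = refblocks * sizeof(uint64_t);
    uint64_t cluster_size = UINT64_C(1) << cluster_bits;

    return (bytes + cluster_size - 1) / cluster_size;
}
```

With cluster_bits = 9 and 16384 clusters (8M), order 4 gives 64 refblocks plus a 1-cluster reftable (65 clusters), while order 6 gives 256 refblocks plus a 4-cluster reftable (260 clusters), so the difference is the 195 clusters mentioned above.
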
>> +    }
>> +
>> +    if (refblock_empty) {
>> +        if (reftable_index < *reftable_size) {
>> +            (*reftable)[reftable_index] = 0;
> Necessary since you used g_try_realloc which leaves the new reftable
> uninitialized.  Reasonable (rather than a memset) since the caller will
> be visiting every single refblock in the table anyways.
>
>> +        }
>> +    } else {
>> +        offset = qcow2_alloc_clusters(bs, s->cluster_size);
> As mentioned above, this action will potentially change
> s->refcount_table_size of the original table, which in turn makes the
> caller execute its loops more often to cover the increased allocation.
> Does qcow2_alloc_clusters() guarantee that the just-allocated cluster is
> zero-initialized (and/or should we add a flag to the function to allow
> the caller to choose whether to force zero allocation instead of leaving
> uninitialized)?  See [4] below for why I ask.
>
>> +        if (offset < 0) {
>> +            error_setg_errno(errp, -offset, "Failed to allocate refblock");
>> +            return offset;
>> +        }
>> +        (*reftable)[reftable_index++] = offset;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * This "operation" for walk_over_reftable() writes the refblock to disk at the
>> + * offset specified by the new reftable's entry. It does not modify the new
>> + * reftable or change any refcounts.
>> + */
>> +static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
>> +                          uint64_t reftable_index, uint64_t *reftable_size,
>> +                          void *refblock, bool refblock_empty, Error **errp)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t offset;
>> +    int ret;
>> +
>> +    if (refblock_empty) {
>> +        if (reftable_index < *reftable_size) {
>> +            assert((*reftable)[reftable_index] == 0);
>> +        }
>> +    } else {
>> +        /* The first pass with alloc_refblock() made the reftable large enough
>> +         */
>> +        assert(reftable_index < *reftable_size);
> Okay, I see why you couldn't hoist this assert outside of the if - the
> caller may call this with refblock_empty for any refblocks at the tail
> of the final partial reftable cluster.
>
>> +        offset = (*reftable)[reftable_index];
>> +        assert(offset != 0);
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
>> +        if (ret < 0) {
>> +            error_setg_errno(errp, -ret, "Overlap check failed");
>> +            return ret;
>> +        }
>> +
>> +        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
>> +        if (ret < 0) {
>> +            error_setg_errno(errp, -ret, "Failed to write refblock");
>> +            return ret;
> If we fail here, do we leak all clusters written so far?  At least the
> image is still consistent.  After reading further, I think I answered
> myself at point [5].
>
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/**
>> + * This function walks over the existing reftable and every referenced refblock;
>> + * if @new_set_refcount is non-NULL, it is called for every refcount entry to
>> + * create an equal new entry in the passed @new_refblock. Once that
>> + * @new_refblock is completely filled, @operation will be called.
>> + *
>> + * @operation is expected to combine the @new_refblock and its entry in the new
>> + * reftable (which is described by the parameters starting with "reftable").
>> + * @refblock_empty is set if all entries in the refblock are zero.
>> + *
>> + * @status_cb and @cb_opaque are used for the amend operation's status callback.
>> + * @index is the index of the walk_over_reftable() calls and @total is the total
>> + * number of walk_over_reftable() calls per amend operation. Both are used for
>> + * calculating the parameters for the status callback.
> Nice writeup; I was referring to it frequently during review.
>
>> + */
>> +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
>> +                              uint64_t *new_reftable_index,
>> +                              uint64_t *new_reftable_size,
>> +                              void *new_refblock, int new_refblock_size,
>> +                              int new_refcount_bits,
>> +                              int (*operation)(BlockDriverState *bs,
>> +                                               uint64_t **reftable,
>> +                                               uint64_t reftable_index,
>> +                                               uint64_t *reftable_size,
>> +                                               void *refblock,
>> +                                               bool refblock_empty,
>> +                                               Error **errp),
> Worth a typedef?  Maybe not; I managed.
>
>> +                              Qcow2SetRefcountFunc *new_set_refcount,
>> +                              BlockDriverAmendStatusCB *status_cb,
>> +                              void *cb_opaque, int index, int total,
>> +                              Error **errp)
> After several reads of the patch, I see that this walk function gets
> called twice - first with a NULL new_set_refcount (merely to figure out
> how big the new reftable should be, as well as allocating all necessary
> non-zero refcount blocks, but not committing the top-level reftable to
> any particular file location); the second walk then commits the new
> refcounts to disk (updating each non-zero entry in all the new refcount
> blocks to match their original counterparts, but no allocation
> required).  Pretty slick to ensure that we are sure that the new table
> is feasible before actually swapping over to it, while still allowing a
> fairly clean rollback on early failure.
>
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    uint64_t reftable_index;
>> +    bool new_refblock_empty = true;
>> +    int refblock_index;
>> +    int new_refblock_index = 0;
>> +    int ret;
>> +
>> +    for (reftable_index = 0; reftable_index < s->refcount_table_size;
>> +         reftable_index++)
> Outer loop - for each cluster of the top-level reference table, visit
> each child table and update the status callback.  On the first walk,
> s->refcount_table_size might be increasing during calls to operation().
>
>> +    {
>> +        uint64_t refblock_offset = s->refcount_table[reftable_index]
>> +                                 & REFT_OFFSET_MASK;
>> +
>> +        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
>> +                  (uint64_t)total * s->refcount_table_size, cb_opaque);
>> +
> This never quite reaches 100%, and the caller also never reaches 100%.
> I think you want one more call to status_cb() at the end of the loop (at
> either site [1] or [2]) that passes an equal index and total to make it
> obvious that this (portion of the) long-running conversion is complete.
> Since s->refcount_table_size may grow during the loop, the callback
> does not necessarily have a constant total size; good thing we already
> documented that progress bars need not have a constant total.
>
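
To make the "never quite reaches 100%" point concrete, here is a simplified standalone sketch of the progress values one walk reports. The Progress struct and both functions are hypothetical stand-ins for status_cb, and the table size is assumed constant, which the real code does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the most recent (offset, total) pair reported. */
typedef struct Progress {
    uint64_t offset;
    uint64_t total;
    unsigned calls;
} Progress;

static void record_status(Progress *p, uint64_t offset, uint64_t total)
{
    p->offset = offset;
    p->total = total;
    p->calls++;
}

/* Mimics the reporting of the index-th of total walks over table_size
 * reftable entries.  If final_call is nonzero, one extra report marks this
 * walk's share of the work as complete. */
static void report_walk(Progress *p, int index, int total,
                        uint64_t table_size, int final_call)
{
    uint64_t i;

    assert(index >= 0 && index < total);
    for (i = 0; i < table_size; i++) {
        record_status(p, (uint64_t)index * table_size + i,
                      (uint64_t)total * table_size);
    }
    if (final_call) {
        record_status(p, (uint64_t)(index + 1) * table_size,
                      (uint64_t)total * table_size);
    }
}
```

For the second of two walks over 64 entries, the last report without the extra call is 127 of 128; with it, the last report is 128 of 128.
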
>> +        if (refblock_offset) {
>> +            void *refblock;
>> +
>> +            if (offset_into_cluster(s, refblock_offset)) {
>> +                qcow2_signal_corruption(bs, true, -1, -1, "Refblock offset %#"
>> +                                        PRIx64 " unaligned (reftable index: %#"
>> +                                        PRIx64 ")", refblock_offset,
>> +                                        reftable_index);
>> +                error_setg(errp,
>> +                           "Image is corrupt (unaligned refblock offset)");
>> +                return -EIO;
>> +            }
>> +
>> +            ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offset,
>> +                                  &refblock);
>> +            if (ret < 0) {
>> +                error_setg_errno(errp, -ret, "Failed to retrieve refblock");
>> +                return ret;
>> +            }
>> +
>> +            for (refblock_index = 0; refblock_index < s->refcount_block_size;
>> +                 refblock_index++)
>> +            {
> If a child table (refcount block) exists, visit each refcount entry
> within the table (at least one refcount in that visit should be
> non-empty, otherwise we could garbage-collect the refblock and put a 0
> entry in the outer loop).
>
>> +                uint64_t refcount;
>> +
>> +                if (new_refblock_index >= new_refblock_size) {
>> +                    /* new_refblock is now complete */
>> +                    ret = operation(bs, new_reftable, *new_reftable_index,
>> +                                    new_reftable_size, new_refblock,
>> +                                    new_refblock_empty, errp);
> The new refcount table will either be filled faster than the original
> (when going from small to large refcount - calling operation() multiple
> times per inner loop) or will be filled slower than the original (when
> going from large to small; operation() will only be called after several
> outer loops).
>
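
The re-packing behaviour described here can be seen in a toy model. The sketch below is only an illustration under simplifying assumptions: the refcounts are a flat in-memory array rather than cached refblocks, and RepackStats, block_complete() and repack() are invented names standing in for the operation() callback and the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tally of what the per-refblock operation was told. */
typedef struct RepackStats {
    unsigned blocks_total;
    unsigned blocks_empty;
} RepackStats;

static void block_complete(RepackStats *st, bool empty)
{
    st->blocks_total++;
    if (empty) {
        st->blocks_empty++;
    }
}

/* Re-pack count refcounts into new blocks of new_block_size entries.
 * block_complete() runs once per filled block and once more for a partially
 * filled tail, which the real walk pads with zero entries. */
static void repack(const uint64_t *refcounts, size_t count,
                   size_t new_block_size, RepackStats *st)
{
    size_t new_index = 0;
    bool empty = true;
    size_t i;

    assert(new_block_size > 0);
    for (i = 0; i < count; i++) {
        if (new_index >= new_block_size) {
            /* The new block is complete; hand it over and start another */
            block_complete(st, empty);
            new_index = 0;
            empty = true;
        }
        new_index++;
        empty = empty && refcounts[i] == 0;
    }
    if (new_index > 0) {
        block_complete(st, empty);
    }
}
```

Feeding eight refcounts {1,0,0,0,0,0,0,2} into blocks of 2 entries yields four blocks, two of them empty; with blocks of 16 entries the same input yields a single, partially filled, non-empty block.
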
>> +                    if (ret < 0) {
>> +                        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
>> +                        return ret;
>> +                    }
>> +
>> +                    (*new_reftable_index)++;
>> +                    new_refblock_index = 0;
>> +                    new_refblock_empty = true;
>> +                }
>> +
>> +                refcount = s->get_refcount(refblock, refblock_index);
>> +                if (new_refcount_bits < 64 && refcount >> new_refcount_bits) {
> Technically, this get_refcount() call is dead code on the second walk,
> since the first walk already validated things, so you could push all of
> this code...
>
>> +                    uint64_t offset;
>> +
>> +                    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
>> +
>> +                    offset = ((reftable_index << s->refcount_block_bits)
>> +                              + refblock_index) << s->cluster_bits;
>> +
>> +                    error_setg(errp, "Cannot decrease refcount entry width to "
>> +                               "%i bits: Cluster at offset %#" PRIx64 " has a "
>> +                               "refcount of %" PRIu64, new_refcount_bits,
>> +                               offset, refcount);
>> +                    return -EINVAL;
>> +                }
>> +
>> +                if (new_set_refcount) {
>> +                    new_set_refcount(new_refblock, new_refblock_index++, refcount);
>> +                } else {
> ...here, in the branch only run on the first walk.

Well, yes, but I wanted to keep this function as agnostic to what the 
caller wants to do with it as possible. I'd rather decide depending on 
whether index == 0 because that's a better way of discerning the first walk.

>> +                    new_refblock_index++;
>> +                }
>> +                new_refblock_empty = new_refblock_empty && refcount == 0;
> Worth condensing to 'new_refblock_empty &= !refcount'?  Maybe not.

I personally would find that harder to read.

>> +            }
>> +
>> +            ret = qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
>> +            if (ret < 0) {
>> +                error_setg_errno(errp, -ret, "Failed to put refblock back into "
>> +                                 "the cache");
>> +                return ret;
>> +            }
>> +        } else {
>> +            /* No refblock means every refcount is 0 */
>> +            for (refblock_index = 0; refblock_index < s->refcount_block_size;
>> +                 refblock_index++)
> Again, visiting each (implied) entry for the given refcount block of the
> outer loop.  When enlarging the width, each of the new blocks will also
> be all zero; but when shrinking the width, even though all entries on
> this pass are zero, we may be combining this pass with another outer
> loop with non-zero data for a non-zero block in the resulting new table.
>
>> +            {
>> +                if (new_refblock_index >= new_refblock_size) {
>> +                    /* new_refblock is now complete */
>> +                    ret = operation(bs, new_reftable, *new_reftable_index,
>> +                                    new_reftable_size, new_refblock,
>> +                                    new_refblock_empty, errp);
>> +                    if (ret < 0) {
>> +                        return ret;
>> +                    }
>> +
>> +                    (*new_reftable_index)++;
>> +                    new_refblock_index = 0;
>> +                    new_refblock_empty = true;
>> +                }
>> +
>> +                if (new_set_refcount) {
>> +                    new_set_refcount(new_refblock, new_refblock_index++, 0);
> Would it be worth guaranteeing that every new refblock is 0-initialized
> when allocated, so that you can skip setting a refcount to 0?  This
> question depends on the answer about block allocation asked at [4] above.

This function sets a value in the buffer new_refblock, not in the 
cluster on disk. Therefore, in order to be able to omit this call, we'd 
have to call a memset() with 0 on new_refblock after each call to 
operation(). I don't think it's worth it. This is more explicit and 
won't cost much performance.

>> +                } else {
>> +                    new_refblock_index++;
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    if (new_refblock_index > 0) {
>> +        /* Complete the potentially existing partially filled final refblock */
>> +        if (new_set_refcount) {
>> +            for (; new_refblock_index < new_refblock_size;
>> +                 new_refblock_index++)
>> +            {
>> +                new_set_refcount(new_refblock, new_refblock_index, 0);
> Again, if you 0-initialize refblocks when allocated, you could skip this
> (another instance of [4] above).
>
>> +            }
>> +        }
>> +
>> +        ret = operation(bs, new_reftable, *new_reftable_index,
>> +                        new_reftable_size, new_refblock, new_refblock_empty,
>> +                        errp);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>> +
>> +        (*new_reftable_index)++;
>> +    }
> site [1] mentioned above, as a good place to make a final status
> callback at 100%.  But if you do it here, it means that we call the
> status callback twice with the same values (the 100% value of the first
> loop is the 0% value of the second loop) - not the end of the world, but
> may impact any testsuite that tracks progress reports.

Well, why not.

>> +
>> +    return 0;
>> +}
>> +
>> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
>> +                                BlockDriverAmendStatusCB *status_cb,
>> +                                void *cb_opaque, Error **errp)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    Qcow2GetRefcountFunc *new_get_refcount;
>> +    Qcow2SetRefcountFunc *new_set_refcount;
>> +    void *new_refblock = qemu_blockalign(bs->file, s->cluster_size);
>> +    uint64_t *new_reftable = NULL, new_reftable_size = 0;
>> +    uint64_t *old_reftable, old_reftable_size, old_reftable_offset;
>> +    uint64_t new_reftable_index = 0;
>> +    uint64_t i;
>> +    int64_t new_reftable_offset;
>> +    int new_refblock_size, new_refcount_bits = 1 << refcount_order;
>> +    int old_refcount_order;
>> +    int ret;
>> +
>> +    assert(s->qcow_version >= 3);
>> +    assert(refcount_order >= 0 && refcount_order <= 6);
>> +
>> +    /* see qcow2_open() */
>> +    new_refblock_size = 1 << (s->cluster_bits - (refcount_order - 3));
> Safe (cluster_bits is always at least 9, and at most 21 in our current
> implementation, so we are shifting anywhere from 6 to 24 positions).
>
>> +
>> +    get_refcount_functions(refcount_order,
>> +                           &new_get_refcount, &new_set_refcount);
>> +
>> +
>> +    /* First, allocate the structures so they are present in the refcount
>> +     * structures */
>> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
>> +                             &new_reftable_size, NULL, new_refblock_size,
>> +                             new_refcount_bits, &alloc_refblock, NULL,
>> +                             status_cb, cb_opaque, 0, 2, errp);
>> +    if (ret < 0) {
>> +        goto done;
>> +    }
>> +
>> +    /* The new_reftable_size is now valid and will not be changed anymore,
>> +     * so we can now allocate the reftable */
>> +    new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
>> +                                                   sizeof(uint64_t));
> And here is your bug, that I hinted at with the mention of [3] above.
> This allocation can potentially cause an overflow of the existing
> reftable to occupy one more cluster.

An additional bug is that the new reftable may be referenced by an 
existing refblock which was completely empty, though (or at least 
referenced by part of an existing refblock which was to be turned into a 
new refblock which was then completely empty, and thus omitted from 
allocation).

> Remember my thought experiment
> above, how an 8 megabyte image rolls from 1 to 2 clusters during the
> course of allocating refblocks for the new table?  What if the original
> image wasn't completely full, but things are perfectly sized with enough
> free clusters, then all of the refblock allocations done during the
> first walk will still fit, and it is only this final allocation of the
> new reftable that will cause the rollover, at which point we've failed
> to account for the new refblock size.  That is, I think I could craft an
> image that would trigger either an assertion failure or an out-of-bounds
> array access during the second walk.

You're completely right; this will be a pain to fix, though... The
simplest way would probably be to check whether new_reftable_size
had to be increased due to this operation and, if so, rerun
walk_over_reftable() with the alloc_refblock() function only allocating
refblocks if the new reftable does not already point to a refblock for a
certain index. This would be repeated until new_reftable_size is
constant. And the really simplest incarnation of this would be to have a
flag tracking whether any allocations were done and repeat until
everything is fine.

Another way would be to somehow integrate the allocation of the new 
reftable into walk_over_reftable() and then to only mind the additional 
reftable entries.

But probably the first way is the correct one because due to 
reallocation of the old reftable, some intermediate refblocks which were 
empty before are now filled.

I'll have to craft some test images myself, not least to be able to 
include them in the iotest.
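
The "repeat until nothing was allocated" idea can be sketched as a fixed-point loop. This is a toy model only, not the eventual fix: ToyReftable, alloc_pass() and alloc_until_stable() are hypothetical, and the growth of the refcount structures is faked by bumping the number of needed refblocks once every growth_every allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 64

/* Hypothetical stand-in for the new reftable; nonzero means "allocated". */
typedef struct ToyReftable {
    uint64_t entry[MAX_BLOCKS];
} ToyReftable;

/* One allocation pass: allocate a refblock for every index below *needed
 * that does not have one yet.  Returns true if anything was allocated. */
static bool alloc_pass(ToyReftable *t, unsigned *needed,
                       unsigned growth_every, unsigned *alloc_count)
{
    bool allocated = false;
    unsigned limit = *needed; /* growth is only picked up by the next pass */
    unsigned i;

    for (i = 0; i < limit; i++) {
        if (t->entry[i] == 0) {
            t->entry[i] = i + 1; /* fake nonzero "offset" */
            allocated = true;
            (*alloc_count)++;
            /* Model: allocating metadata sometimes requires more metadata */
            if (*alloc_count % growth_every == 0 && *needed < MAX_BLOCKS) {
                (*needed)++;
            }
        }
    }
    return allocated;
}

/* Repeat passes until one of them allocates nothing; returns the number of
 * passes that were run. */
static unsigned alloc_until_stable(ToyReftable *t, unsigned needed,
                                   unsigned growth_every)
{
    unsigned alloc_count = 0;
    unsigned passes = 0;
    bool allocated;

    assert(needed <= MAX_BLOCKS && growth_every > 0);
    do {
        allocated = alloc_pass(t, &needed, growth_every, &alloc_count);
        passes++;
    } while (allocated);
    return passes;
}
```

Starting with 8 needed refblocks and growth every 4 allocations, the loop runs three passes: the first allocates 8 blocks and grows the requirement to 10, the second allocates the 2 new ones, and the third confirms that nothing is left to do.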

>> +    if (new_reftable_offset < 0) {
>> +        error_setg_errno(errp, -new_reftable_offset,
>> +                         "Failed to allocate the new reftable");
>> +        ret = new_reftable_offset;
>> +        goto done;
> If we fail here, do we leak allocations of the refblocks?  I guess not;
> based on another forward reference to point [5].
>
>> +    }
>> +
>> +    new_reftable_index = 0;
>> +
>> +    /* Second, write the new refblocks */
>> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
>> +                             &new_reftable_size, new_refblock,
>> +                             new_refblock_size, new_refcount_bits,
>> +                             &flush_refblock, new_set_refcount,
>> +                             status_cb, cb_opaque, 1, 2, errp);
>> +    if (ret < 0) {
>> +        goto done;
>> +    }
> If we fail here, it looks like we DO leak the clusters allocated for the
> new reftable (again, point [5]).
>
>> +
>> +
>> +    /* Write the new reftable */
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, new_reftable_offset,
>> +                                        new_reftable_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        error_setg_errno(errp, -ret, "Overlap check failed");
>> +        goto done;
>> +    }
>> +
>> +    for (i = 0; i < new_reftable_size; i++) {
>> +        cpu_to_be64s(&new_reftable[i]);
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
>> +                      new_reftable_size * sizeof(uint64_t));
>> +
>> +    for (i = 0; i < new_reftable_size; i++) {
>> +        be64_to_cpus(&new_reftable[i]);
>> +    }
>> +
>> +    if (ret < 0) {
>> +        error_setg_errno(errp, -ret, "Failed to write the new reftable");
>> +        goto done;
>> +    }
> Looks like you correctly maintain the in-memory copy in preferred cpu
> byte order, while writing to disk in big-endian order.
>
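
As an aside, the same invariant (host byte order in memory, big-endian on disk) can also be kept by serializing into a separate buffer instead of swapping in place and back. The sketch below is a standalone illustration with hypothetical helper names; it does not use the QEMU byte-order helpers and is not what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a host-order value as 8 big-endian bytes, whatever the host's
 * own endianness is. */
static void store_be64(uint8_t *dst, uint64_t value)
{
    int i;

    for (i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(value >> (56 - 8 * i));
    }
}

/* Read 8 big-endian bytes back into a host-order value. */
static uint64_t load_be64(const uint8_t *src)
{
    uint64_t value = 0;
    int i;

    for (i = 0; i < 8; i++) {
        value = (value << 8) | src[i];
    }
    return value;
}

/* Serialize n reftable entries into out (n * 8 bytes) without modifying
 * the in-memory table. */
static void serialize_reftable(const uint64_t *table, size_t n, uint8_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        store_be64(out + i * 8, table[i]);
    }
}
```

The trade-off is an extra buffer of the reftable's size in exchange for never having the in-memory table in the wrong byte order, even transiently.
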
>> +
>> +
>> +    /* Empty the refcount cache */
>> +    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    if (ret < 0) {
>> +        error_setg_errno(errp, -ret, "Failed to flush the refblock cache");
>> +        goto done;
>> +    }
>> +
>> +    /* Update the image header to point to the new reftable; this only updates
>> +     * the fields which are relevant to qcow2_update_header(); other fields
>> +     * such as s->refcount_table or s->refcount_bits stay stale for now
>> +     * (because we have to restore everything if qcow2_update_header() fails) */
>> +    old_refcount_order  = s->refcount_order;
>> +    old_reftable_size   = s->refcount_table_size;
>> +    old_reftable_offset = s->refcount_table_offset;
>> +
>> +    s->refcount_order        = refcount_order;
>> +    s->refcount_table_size   = new_reftable_size;
>> +    s->refcount_table_offset = new_reftable_offset;
>> +
>> +    ret = qcow2_update_header(bs);
>> +    if (ret < 0) {
>> +        s->refcount_order        = old_refcount_order;
>> +        s->refcount_table_size   = old_reftable_size;
>> +        s->refcount_table_offset = old_reftable_offset;
>> +        error_setg_errno(errp, -ret, "Failed to update the qcow2 header");
>> +        goto done;
>> +    }
> Failures up to here still have issues leaking the new reftable
> allocation (point [5]).
>
>> +
>> +    /* Now update the rest of the in-memory information */
>> +    old_reftable = s->refcount_table;
>> +    s->refcount_table = new_reftable;
>> +
>> +    /* For cleaning up all old refblocks */
>> +    new_reftable      = old_reftable;
>> +    new_reftable_size = old_reftable_size;
>> +
>> +    s->refcount_bits = 1 << refcount_order;
>> +    if (refcount_order < 6) {
>> +        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
>> +    } else {
>> +        s->refcount_max = INT64_MAX;
>> +    }
> Is it worth factoring this computation into a common helper, since it
> appeared in an earlier patch as well?

Well, it does appear in qcow2_open(), as do the next two lines. The only 
reason to factor them out would be so that neither place is forgotten if 
one of them is changed; but the only reason I could imagine for this 
would be to replace the INT64_MAX by UINT64_MAX at some point in the 
future, but I guess that point is still far away (because there's no 
reasonable way someone would be needing the 64th bit, as you agreed),
so it should be fine that way.
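
For reference, a factored-out helper could look roughly like the following sketch. The name refcount_max_for_order() is hypothetical and does not exist in the series; the body just mirrors the computation quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest refcount value representable for a given
 * refcount order.  Order 6 (64 bits) is capped at INT64_MAX, as in the
 * quoted code, so the 64th bit stays unused. */
static uint64_t refcount_max_for_order(int refcount_order)
{
    int refcount_bits;

    assert(refcount_order >= 0 && refcount_order <= 6);
    refcount_bits = 1 << refcount_order;

    if (refcount_order < 6) {
        return (UINT64_C(1) << refcount_bits) - 1;
    }
    return INT64_MAX;
}
```

The explicit branch for order 6 also avoids shifting a 64-bit value by 64 bits, which is undefined behaviour in C.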

>> +
>> +    s->refcount_block_bits = s->cluster_bits - (refcount_order - 3);
>> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>> +
>> +    s->get_refcount = new_get_refcount;
>> +    s->set_refcount = new_set_refcount;
>> +
>> +    /* And free the old reftable (the old refblocks are freed below the "done"
>> +     * label) */
>> +    qcow2_free_clusters(bs, old_reftable_offset,
>> +                        old_reftable_size * sizeof(uint64_t),
>> +                        QCOW2_DISCARD_NEVER);
> site [2] mentioned above, as a possible point where you might want to
> ensure the callback is called with equal progress and total values to
> ensure the caller knows the job is done.  Except this site doesn't have
> quite as much information as site [1] about what total size all the
> other status callbacks were using.
>
>> +
>> +done:
>> +    if (new_reftable) {
>> +        /* On success, new_reftable actually points to the old reftable (and
>> +         * new_reftable_size is the old reftable's size); but that is just
>> +         * fine */
>> +        for (i = 0; i < new_reftable_size; i++) {
>> +            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
>> +            if (offset) {
>> +                qcow2_free_clusters(bs, offset, s->cluster_size,
>> +                                    QCOW2_DISCARD_NEVER);
>> +            }
>> +        }
>> +        g_free(new_reftable);
> So here is point [5] - if we failed early, this tries to clean up all
> allocated refblocks associated with the new table.  It does NOT clean up
> any refblocks allocated due to resizing the old table to be slightly
> larger, but that should be fine (not a leak, so much as an image that is
> now a couple clusters larger than the minimum required size).  However,
> while you clean up the clusters associated with refblocks (layer 2), the
> cleanup of old clusters associated with the reftable (layer 1) happened
> before the done: label on success, but that means that on failure, you
> are NOT cleaning up the clusters associated with the new reftable.

Oops, right, will fix.

>> +    }
>> +
>> +    qemu_vfree(new_refblock);
>> +    return ret;
>> +}
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index fe12c54..5b96519 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -526,6 +526,10 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
>>   int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
>>                                     int64_t size);
>>   
>> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
>> +                                BlockDriverAmendStatusCB *status_cb,
>> +                                void *cb_opaque, Error **errp);
>> +
>>   /* qcow2-cluster.c functions */
>>   int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
>>                           bool exact_size);
>>
> Interesting patch.  Hope my review helps you prepare a better v2.

If everything else fails, I'll just split the amend stuff from this 
series. But I'll work it out somehow. And your review will definitely 
help, thanks a lot!

Max

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-12  9:55     ` Max Reitz
@ 2014-11-12 13:50       ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-12 13:50 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/12/2014 02:55 AM, Max Reitz wrote:
> On 2014-11-12 at 05:15, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Add a function qcow2_change_refcount_order() which allows changing the
>>> refcount order of a qcow2 image.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 424
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>   block/qcow2.h          |   4 +
>>>   2 files changed, 428 insertions(+)
>> This is fairly big; you may want to get a second review from a
>> maintainer rather than blindly trusting me.
> 
> I didn't really see the point in splitting it up. Introducing the static
> helper functions first would not be very helpful either, so I thought
> what I'd do as a reviewer, and that was "apply the patch and just read
> through all the code". Splitting it into multiple patches would not have
> helped there (and I don't see how I could split this patch into logical
> changes, where at the start we have some inefficient but simple
> implementation and it gets better over time).

Yes, I agree that your patch is not divisible.  Big doesn't always mean
bad :)


>>> +                refcount = s->get_refcount(refblock, refblock_index);
>>> +                if (new_refcount_bits < 64 && refcount >>
>>> new_refcount_bits) {
>> Technically, this get_refcount() call is dead code on the second walk,
>> since the first walk already validated things, so you could push all of
>> this code...
>>
>>> +                    uint64_t offset;
>>> +
>>> +                    qcow2_cache_put(bs, s->refcount_block_cache,
>>> &refblock);
>>> +
>>> +                    offset = ((reftable_index <<
>>> s->refcount_block_bits)
>>> +                              + refblock_index) << s->cluster_bits;
>>> +
>>> +                    error_setg(errp, "Cannot decrease refcount entry
>>> width to "
>>> +                               "%i bits: Cluster at offset %#"
>>> PRIx64 " has a "
>>> +                               "refcount of %" PRIu64,
>>> new_refcount_bits,
>>> +                               offset, refcount);
>>> +                    return -EINVAL;
>>> +                }
>>> +
>>> +                if (new_set_refcount) {
>>> +                    new_set_refcount(new_refblock,
>>> new_refblock_index++, refcount);
>>> +                } else {
>> ...here, in the branch only run on the first walk.
> 
> Well, yes, but I wanted to keep this function as agnostic to what the
> caller wants to do with it as possible. I'd rather decide depending on
> whether index == 0 because that's a better way of discerning the first
> walk.

I was thinking a bit more about how to avoid the allocation corner case
bug, and wonder if three walks instead of two is the right solution, in
which case the first two walks are both allocations, and index==0 is no
longer a reliable witness of whether these checks are needed.  More on
that below.

> 
>>> +                    new_refblock_index++;
>>> +                }
>>> +                new_refblock_empty = new_refblock_empty && refcount
>>> == 0;
>> Worth condensing to 'new_refblock_empty &= !refcount'?  Maybe not.
> 
> I personally would find that harder to read.

No need to change it then.


>>> +                if (new_set_refcount) {
>>> +                    new_set_refcount(new_refblock,
>>> new_refblock_index++, 0);
>> Would it be worth guaranteeing that every new refblock is 0-initialized
>> when allocated, so that you can skip setting a refcount to 0?  This
>> question depends on the answer about block allocation asked at [4] above.
> 
> This function sets a value in the buffer new_refblock, not in the
> cluster on disk. Therefore, in order to be able to omit this call, we'd
> have to call a memset() with 0 on new_refblock after each call to
> operation(). I don't think it's worth it. This is more explicit and
> won't cost much performance.

Yeah, that's about the same conclusion I came to after finishing the
whole review, although I didn't state it very well (this was one of my
earlier comments in my non-linear review, that I didn't touch up later).


>>> +    /* The new_reftable_size is now valid and will not be changed
>>> anymore,
>>> +     * so we can now allocate the reftable */
>>> +    new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
>>> +                                                   sizeof(uint64_t));
>> And here is your bug, that I hinted at with the mention of [3] above.
>> This allocation can potentially cause an overflow of the existing
>> reftable to occupy one more cluster.
> 
> An additional bug is that the new reftable may be referenced by an
> existing refblock which was completely empty, though (or at least
> referenced by part of an existing refblock which was to be turned into a
> new refblock which was then completely empty, and thus omitted from
> allocation).
> 
>> Remember my thought experiment
>> above, how an 8 megabyte image rolls from 1 to 2 clusters during the
>> course of allocating refblocks for the new table?  What if the original
>> image wasn't completely full, but things are perfectly sized with enough
>> free clusters, then all of the refblock allocations done during the
>> first walk will still fit, and it is only this final allocation of the
>> new reftable that will cause the rollover, at which point we've failed
>> to account for the new refblock size.  That is, I think I could craft an
>> image that would trigger either an assertion failure or an out-of-bounds
>> array access during the second walk.
> 
> You're completely right, this will be a pain to fix, though... The
> simplest way would probably be to check whether the new_reftable_size
> had to be increased due to this operation and, if so, rerun
> walk_over_reftable() with the alloc_refblock() function only allocating
> refblocks if the new reftable does not already point to a refblock for a
> certain index. This would be repeated until the new_reftable_size is
> constant. And the really simplest incarnation of this would be to have a
> flag whether any allocations were done and repeat until everything is fine.
> 
> Another way would be to somehow integrate the allocation of the new
> reftable into walk_over_reftable() and then to only mind the additional
> reftable entries.
> 
> But probably the first way is the correct one because due to
> reallocation of the old reftable, some intermediate refblocks which were
> empty before are now filled.
> 
> I'll have to craft some test images myself, not least to be able to
> include them in the iotest.

Overnight, I had been thinking about it (I don't know if it's a good or
a bad thing when a patch is so mentally engaging that it becomes the
thing on my mind).  My initial idea was to teach the walk function how
to start and stop at given limits, maybe by passing an address to
dereference as the end point.  Then do something like:

original_limit = s->refcount_table_size;
walk allocations with limits of 0, &original_limit
allocate a contiguous block of limit+1 clusters for the new reftable
walk allocations with limits of original_limit, &s->refcount_table_size
if limit+1 is too big after all (most of the time), then free the tail
walk refcount assignment with limits of 0, &s->refcount_table_size

Allocating limit+1 for the new reftable should always be safe.  In the
majority of cases, it overallocates; but since we already leave
several small holes in the image after freeing the original reftable,
it won't hurt to have one more cluster hole.  It's better to have
the algorithm be safe and waste a few clusters than to figure out how to
pack it into the most efficient space possible at the expense of more
code complexity.  And maybe someday we'll implement a cluster
defragmenter for packing an image down into no holes.  In the minority
of cases, the +1 should always be sufficient to cover any additional
allocation spillovers.
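
To make the "pass an address as the end point" idea concrete, here is a toy sketch in C.  It is not qemu code: walk_range(), ToyState and toy_op() are hypothetical names, and the only point is that the loop re-reads *end on every iteration, so a callback that grows the table is picked up by the same walk.

```c
#include <stdint.h>

/* Callback invoked for each table index; returns < 0 on error. */
typedef int (*WalkOp)(uint64_t index, void *opaque);

/* Walk indices [start, *end).  The end point is dereferenced on every
 * iteration, so growth of the pointed-to size during the walk is seen. */
int walk_range(uint64_t start, const uint64_t *end, WalkOp op, void *opaque)
{
    for (uint64_t i = start; i < *end; i++) {
        int ret = op(i, opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Toy stand-in for the driver state: a table size and a visit counter. */
typedef struct {
    uint64_t table_size;
    uint64_t visited;
} ToyState;

/* Toy operation: visiting index 2 grows a 3-entry table by one entry,
 * mimicking an allocation that spills into a new reftable entry. */
int toy_op(uint64_t index, void *opaque)
{
    ToyState *s = opaque;

    s->visited++;
    if (index == 2 && s->table_size == 3) {
        s->table_size++;
    }
    return 0;
}
```

A first walk bounded by a saved copy of the size stops at the old limit; a second walk from that limit up to &s->table_size then covers only the entries that appeared in the meantime.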

But you raise a good point that the original image may have holes on the
first walk, where allocating refblocks for the new table will turn those
holes into refcounts that must be picked up, so I think you are right
that you have to walk the ENTIRE image on the second pass, rather than
just the tail of the image where you stopped early on the first pass.
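
The "repeat until new_reftable_size is constant" approach quoted above is a fixed-point iteration.  A much simplified model in C, assuming a fresh layout where only the data clusters, the refblocks and the reftable need refcounts (the real amend code also has to account for the old refcount structures, which this ignores); ToyLayout and settle_layout() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t refblocks;          /* refblocks needed once the layout settles */
    uint64_t reftable_clusters;  /* clusters occupied by the reftable */
    unsigned passes;             /* how often the sizes were recomputed */
} ToyLayout;

uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Recompute the metadata size until it stops changing: every refblock and
 * reftable cluster is itself a cluster that needs a refcount entry. */
ToyLayout settle_layout(uint64_t data_clusters, uint64_t cluster_size,
                        unsigned refcount_bits)
{
    ToyLayout l = { 0, 0, 0 };
    uint64_t total = data_clusters;
    uint64_t entries_per_refblock;

    assert(cluster_size > 0 && refcount_bits > 0);
    entries_per_refblock = cluster_size * 8 / refcount_bits;
    assert(entries_per_refblock >= 2);

    for (;;) {
        uint64_t new_total;

        l.refblocks = div_round_up(total, entries_per_refblock);
        l.reftable_clusters = div_round_up(l.refblocks * sizeof(uint64_t),
                                           cluster_size);
        l.passes++;

        new_total = data_clusters + l.refblocks + l.reftable_clusters;
        if (new_total == total) {
            break;              /* nothing changed: the layout is stable */
        }
        total = new_total;
    }
    return l;
}
```

With 512-byte clusters and 64-bit refcounts (64 entries per refblock), 63 data clusters settle on two refblocks and one reftable cluster after three passes: the first pass finds one refblock, and counting that refblock plus the reftable pushes the total past 64 clusters, which forces a second refblock.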

Good luck on coming up with a plan to tackle it.

>> Interesting patch.  Hope my review helps you prepare a better v2.
> 
> If everything else fails, I'll just split the amend stuff from this
> series. But I'll work it out somehow. And your review will definitely
> help, thanks a lot!

Glad to hear it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
  2014-11-12  4:15   ` Eric Blake
@ 2014-11-12 14:19   ` Eric Blake
  2014-11-12 14:21     ` Max Reitz
  1 sibling, 1 reply; 75+ messages in thread
From: Eric Blake @ 2014-11-12 14:19 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a function qcow2_change_refcount_order() which allows changing the
> refcount order of a qcow2 image.

A thought: didn't you just submit a patch that marked the image as
dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
refcounts?  Would reusing THAT code be any better than writing this patch?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-12 14:19   ` Eric Blake
@ 2014-11-12 14:21     ` Max Reitz
  2014-11-12 17:45       ` Max Reitz
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-12 14:21 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-12 at 15:19, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Add a function qcow2_change_refcount_order() which allows changing the
>> refcount order of a qcow2 image.
> A thought: didn't you just submit a patch that marked the image as
> dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
> refcounts?  Would reusing THAT code be any better than writing this patch?

Yes, I thought about that, too... The problem is that that patch 
requires all refcount blocks to fit in memory at the same time (or 
generally, the qcow2 check function requires that, for now). I'd really 
like to avoid that, if possible, but maybe it isn't possible after all.

But if you say it like that ("nuke"), I guess I'll give it a try. Maybe 
it looks funny enough.

Max

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-12 14:21     ` Max Reitz
@ 2014-11-12 17:45       ` Max Reitz
  2014-11-12 20:21         ` Eric Blake
  0 siblings, 1 reply; 75+ messages in thread
From: Max Reitz @ 2014-11-12 17:45 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-12 at 15:21, Max Reitz wrote:
> On 2014-11-12 at 15:19, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Add a function qcow2_change_refcount_order() which allows changing the
>>> refcount order of a qcow2 image.
>> A thought: didn't you just submit a patch that marked the image as
>> dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
>> refcounts?  Would reusing THAT code be any better than writing this 
>> patch?
>
> Yes, I thought about that, too... The problem is that that patch 
> requires all refcount blocks to fit in memory at the same time (or 
> generally, the qcow2 check function requires that, for now). I'd 
> really like to avoid that, if possible, but maybe it isn't possible 
> after all.
>
> But if you say it like that ("nuke"), I guess I'll give it a try. 
> Maybe it looks funny enough.

Okay, I gave it a try. It does work, but one problem that would require 
larger changes (which would be relatively easy to review, though, I 
guess...) is a refcount overflow during the image check.

What I did was the following: First, check all refcounts against the 
maximum after the amendment. Second, do an image check without repairing 
anything, because we only want to do this on clean images. Third, wipe 
all references to the existing reftable. Fourth, use a function normally 
used by the image check to calculate the refcounts (which results in the 
in-memory refcount table). Fifth, create new refcount structures using 
the rebuild_refcount_structure() function for image repairing.
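
The first step can be sketched in C as follows.  The helper names are made up for illustration and are not the functions in the patch; the only facts assumed are that an entry is 2^order bits wide and that valid orders run from 0 (1 bit) to 6 (64 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest value that fits into a refcount entry of 2^order bits. */
uint64_t refcount_max_for_order(unsigned order)
{
    unsigned bits;

    assert(order <= 6);
    bits = 1u << order;
    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* True if every refcount in the array is representable at the target order. */
bool refcounts_fit_order(const uint64_t *refcounts, size_t n, unsigned order)
{
    uint64_t max = refcount_max_for_order(order);

    for (size_t i = 0; i < n; i++) {
        if (refcounts[i] > max) {
            return false;
        }
    }
    return true;
}
```

For example, order 4 (the 16-bit default) allows refcounts up to 65535, while order 0 only distinguishes "unused" from "used once".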

There are (at least) three problems with this approach. The first is a 
rather cosmetic one: You can't easily give progress reports. There are 
four time-consuming steps here (wiping references to the existing 
reftable is not time-consuming), so this approach can only say when 25 % 
are done, 50 %, 75 % and 100 %.

The second is that if an error occurs during rebuilding the refcount 
structures, it's close to impossible to restore the old ones, because 
the new structures may have been partially written thus overwriting the 
old ones. But having marked the image dirty should suffice to "solve" this.

And the third one is that the initial check (whether the image is 
consistent at all) may throw an error because of refcount overflows. 
This error will tell you to use amend to increase the refcount width. 
Well, too bad. To solve this, we'd have to be able to do the refcount 
consistency check with an arbitrary refcount order (in this case the 
target refcount order), which would require some work on the check 
functions.

I'll just go with the original idea for now.

Max

* Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment
  2014-11-12 17:45       ` Max Reitz
@ 2014-11-12 20:21         ` Eric Blake
  0 siblings, 0 replies; 75+ messages in thread
From: Eric Blake @ 2014-11-12 20:21 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/12/2014 10:45 AM, Max Reitz wrote:
> On 2014-11-12 at 15:21, Max Reitz wrote:
>> On 2014-11-12 at 15:19, Eric Blake wrote:
>>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>>> Add a function qcow2_change_refcount_order() which allows changing the
>>>> refcount order of a qcow2 image.
>>> A thought: didn't you just submit a patch that marked the image as
>>> dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
>>> refcounts?  Would reusing THAT code be any better than writing this
>>> patch?
>>
> 
> There are (at least) three problems with this approach. The first is a
> rather cosmetic one: You can't easily give progress reports. There are
> four time-consuming steps here (wiping references to the existing
> reftable is not time-consuming), so this approach can only say when 25 %
> are done, 50 %, 75 % and 100 %.

Yep, definitely annoying.

> 
> The second is that if an error occurs during rebuilding the refcount
> structures, it's close to impossible to restore the old ones, because
> the new structures may have been partially written thus overwriting the
> old ones. But having marked the image dirty should suffice to "solve" this.

Solved, but not as clean as your current patch that maintains image
consistency throughout and doesn't require a rebuild.  I guess a
tradeoff of clean images vs. code reuse can go either way.

> 
> And the third one is that the initial check (whether the image is
> consistent at all) may throw an error because of refcount overflows.
> This error will tell you to use amend to increase the refcount width.
> Well, too bad. To solve this, we'd have to be able to do the refcount
> consistency check with an arbitrary refcount order (in this case the
> target refcount order), which would require some work on the check
> functions.
> 
> I'll just go with the original idea for now.

Okay.  Just making sure we are considering alternatives when justifying
why we go with a particular solution.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 75+ messages in thread

Thread overview: 75+ messages
2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-10 19:00   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-10 19:06   ` Eric Blake
2014-11-11  8:11     ` Max Reitz
2014-11-11 15:49       ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-10 20:59   ` Eric Blake
2014-11-11  8:12     ` Max Reitz
2014-11-11  9:22     ` Kevin Wolf
2014-11-11  9:25       ` Max Reitz
2014-11-11  9:26         ` Max Reitz
2014-11-11  9:36         ` Kevin Wolf
2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-10 21:05   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-10 21:12   ` Eric Blake
2014-11-11  8:22     ` Max Reitz
2014-11-11 16:13       ` Eric Blake
2014-11-11 16:18         ` Max Reitz
2014-11-11 19:49           ` Eric Blake
2014-11-12  8:52             ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-10 22:30   ` Eric Blake
2014-11-11  8:35     ` Max Reitz
2014-11-11  9:43       ` Max Reitz
2014-11-11 10:56       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
2014-11-10 22:49   ` Eric Blake
2014-11-11  8:37     ` Max Reitz
2014-11-11 10:08       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
2014-11-11  0:29   ` Eric Blake
2014-11-11  8:42     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-10 17:03   ` Eric Blake
2014-11-10 17:06     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-11  5:40   ` Eric Blake
2014-11-11  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-11 17:57   ` Eric Blake
2014-11-12  8:41     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-11 18:05   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-11 18:08   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-11 18:11   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-11 18:12   ` Eric Blake
2014-11-12  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-11 18:14   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-11 21:05   ` Eric Blake
2014-11-12  9:10     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-12  4:15   ` Eric Blake
2014-11-12  9:55     ` Max Reitz
2014-11-12 13:50       ` Eric Blake
2014-11-12 14:19   ` Eric Blake
2014-11-12 14:21     ` Max Reitz
2014-11-12 17:45       ` Max Reitz
2014-11-12 20:21         ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-12  4:36   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-12  4:38   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-11 19:53   ` Eric Blake
2014-11-12  8:58     ` Max Reitz