* [Qemu-devel] [PATCH v2 1/2] qcow2: Repair OFLAG_COPIED when fixing leaks
2018-05-09 20:00 [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks Max Reitz
@ 2018-05-09 20:00 ` Max Reitz
2018-05-09 20:00 ` [Qemu-devel] [PATCH v2 2/2] iotests: Repairing error during snapshot deletion Max Reitz
2018-06-01 11:08 ` [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks Max Reitz
2 siblings, 0 replies; 4+ messages in thread
From: Max Reitz @ 2018-05-09 20:00 UTC (permalink / raw)
To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf
Repairing OFLAG_COPIED is usually safe because it is done after the
refcounts have been repaired. Therefore, it we did not find anyone else
referencing a data or L2 cluster, it makes no sense to not set
OFLAG_COPIED -- and the other direction (clearing OFLAG_COPIED) is
always safe, anyway, it may just induce leaks.
Furthermore, if OFLAG_COPIED is actually consistent with a wrong (leaky)
refcount, we will decrement the refcount with -r leaks, but OFLAG_COPIED
will then be wrong. qemu-img check should not produce images that are
more corrupted afterwards then they were before.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1527085
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
block/qcow2-refcount.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2dc23005b7..e1ab4d0bfd 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1799,6 +1799,19 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
int ret;
uint64_t refcount;
int i, j;
+ bool repair;
+
+ if (fix & BDRV_FIX_ERRORS) {
+ /* Always repair */
+ repair = true;
+ } else if (fix & BDRV_FIX_LEAKS) {
+ /* Repair only if that seems safe: This function is always
+ * called after the refcounts have been fixed, so the refcount
+ * is accurate if that repair was successful */
+ repair = !res->check_errors && !res->corruptions && !res->leaks;
+ } else {
+ repair = false;
+ }
for (i = 0; i < s->l1_size; i++) {
uint64_t l1_entry = s->l1_table[i];
@@ -1818,10 +1831,8 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_index=%d "
"l1_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
- fix & BDRV_FIX_ERRORS ? "Repairing" :
- "ERROR",
- i, l1_entry, refcount);
- if (fix & BDRV_FIX_ERRORS) {
+ repair ? "Repairing" : "ERROR", i, l1_entry, refcount);
+ if (repair) {
s->l1_table[i] = refcount == 1
? l1_entry | QCOW_OFLAG_COPIED
: l1_entry & ~QCOW_OFLAG_COPIED;
@@ -1862,10 +1873,8 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
fprintf(stderr, "%s OFLAG_COPIED data cluster: "
"l2_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
- fix & BDRV_FIX_ERRORS ? "Repairing" :
- "ERROR",
- l2_entry, refcount);
- if (fix & BDRV_FIX_ERRORS) {
+ repair ? "Repairing" : "ERROR", l2_entry, refcount);
+ if (repair) {
l2_table[j] = cpu_to_be64(refcount == 1
? l2_entry | QCOW_OFLAG_COPIED
: l2_entry & ~QCOW_OFLAG_COPIED);
--
2.14.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* [Qemu-devel] [PATCH v2 2/2] iotests: Repairing error during snapshot deletion
2018-05-09 20:00 [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks Max Reitz
2018-05-09 20:00 ` [Qemu-devel] [PATCH v2 1/2] " Max Reitz
@ 2018-05-09 20:00 ` Max Reitz
2018-06-01 11:08 ` [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks Max Reitz
2 siblings, 0 replies; 4+ messages in thread
From: Max Reitz @ 2018-05-09 20:00 UTC (permalink / raw)
To: qemu-block; +Cc: qemu-devel, Max Reitz, Kevin Wolf
This adds a test for an I/O error during snapshot deletion, and maybe
more importantly, for how to repair the resulting image. If the
snapshot has been deleted before the error occurs, the only negative
result will be leaked clusters -- and those should be repairable with
qemu-img check -r leaks.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
tests/qemu-iotests/217 | 90 ++++++++++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/217.out | 42 ++++++++++++++++++++++
tests/qemu-iotests/group | 1 +
3 files changed, 133 insertions(+)
create mode 100755 tests/qemu-iotests/217
create mode 100644 tests/qemu-iotests/217.out
diff --git a/tests/qemu-iotests/217 b/tests/qemu-iotests/217
new file mode 100755
index 0000000000..d3ab5d72be
--- /dev/null
+++ b/tests/qemu-iotests/217
@@ -0,0 +1,90 @@
+#!/bin/bash
+#
+# I/O errors when working with internal qcow2 snapshots, and repairing
+# the result
+#
+# Copyright (C) 2018 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+ rm -f "$TEST_DIR/blkdebug.conf"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This test is specific to qcow2
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# This test needs clusters with at least a refcount of 2 so that
+# OFLAG_COPIED is not set. refcount_bits=1 is therefore unsupported.
+_unsupported_imgopts 'refcount_bits=1[^0-9]'
+
+echo
+echo '=== Simulating an I/O error during snapshot deletion ==='
+echo
+
+_make_test_img 64M
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# Create the snapshot
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+
+# Verify the snapshot is there
+echo
+_img_info | grep 'Snapshot list'
+echo '(Snapshot filtered)'
+echo
+
+# Try to delete the snapshot (with an error happening when freeing the
+# then leaked clusters)
+cat > "$TEST_DIR/blkdebug.conf" <<EOF
+[inject-error]
+event = "cluster_free"
+errno = "5"
+EOF
+$QEMU_IMG snapshot -d foo "blkdebug:$TEST_DIR/blkdebug.conf:$TEST_IMG"
+
+# Verify the snapshot is gone
+echo
+_img_info | grep 'Snapshot list'
+
+# Should only show leaks
+echo '--- Checking test image ---'
+_check_test_img
+
+echo
+
+# As there are only leaks, this should be able to fully repair the
+# image
+echo '--- Repairing test image ---'
+_check_test_img -r leaks
+
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/217.out b/tests/qemu-iotests/217.out
new file mode 100644
index 0000000000..e3fc40a8c7
--- /dev/null
+++ b/tests/qemu-iotests/217.out
@@ -0,0 +1,42 @@
+QA output created by 217
+
+=== Simulating an I/O error during snapshot deletion ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Snapshot list:
+(Snapshot filtered)
+
+qcow2_free_clusters failed: Input/output error
+qemu-img: Could not delete snapshot 'foo': Failed to free the cluster and L1 table: Input/output error
+
+--- Checking test image ---
+Leaked cluster 4 refcount=2 reference=1
+Leaked cluster 5 refcount=2 reference=1
+Leaked cluster 6 refcount=1 reference=0
+Leaked cluster 7 refcount=1 reference=0
+
+4 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+--- Repairing test image ---
+Leaked cluster 4 refcount=2 reference=1
+Leaked cluster 5 refcount=2 reference=1
+Leaked cluster 6 refcount=1 reference=0
+Leaked cluster 7 refcount=1 reference=0
+Repairing cluster 4 refcount=2 reference=1
+Repairing cluster 5 refcount=2 reference=1
+Repairing cluster 6 refcount=1 reference=0
+Repairing cluster 7 refcount=1 reference=0
+Repairing OFLAG_COPIED L2 cluster: l1_index=0 l1_entry=40000 refcount=1
+Repairing OFLAG_COPIED data cluster: l2_entry=50000 refcount=1
+The following inconsistencies were found and repaired:
+
+ 4 leaked clusters
+ 2 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index cc8cd8cc8e..76a41cae3f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -215,4 +215,5 @@
214 rw auto
215 rw auto quick
216 rw auto quick
+217 rw auto quick
218 rw auto quick
--
2.14.3
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks
2018-05-09 20:00 [Qemu-devel] [PATCH v2 0/2] qcow2: Repair OFLAG_COPIED when fixing leaks Max Reitz
2018-05-09 20:00 ` [Qemu-devel] [PATCH v2 1/2] " Max Reitz
2018-05-09 20:00 ` [Qemu-devel] [PATCH v2 2/2] iotests: Repairing error during snapshot deletion Max Reitz
@ 2018-06-01 11:08 ` Max Reitz
2 siblings, 0 replies; 4+ messages in thread
From: Max Reitz @ 2018-06-01 11:08 UTC (permalink / raw)
To: qemu-block; +Cc: qemu-devel, Kevin Wolf
[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]
On 2018-05-09 22:00, Max Reitz wrote:
> Suppose you have an image with consistent OFLAG_COPIED and refcounts.
> Now further suppose that image has leaked clusters (single reference,
> but refcount 2). When checking such an image with qemu-img check, it
> will notify you of the leakage, and that's it.
>
> Now when trying to repair that image, you naturally use -r leaks because
> that should be sufficient and you want to give qemu-img check just as
> many "permissions" as it needs so it doesn't destroy your image by
> accident. But while the qcow2 driver will fix the refcounts, alas! it
> will now complain about OFLAG_COPIED because that should be set now.
> However, it will not set that without -r all. So, congratulations, now
> you have an image that qemu-img check tells you is really corrupted.
>
> This series makes -r leaks fix OFLAG_COPIED as well, but only if
> repairing the refcounts was completely successful.
>
> There is a Red Hat Bugzilla entry here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1527085
>
> ...but I'm afraid the bug description is set to confidential, although I
> don't know why. Sorry. :-/
> For everyone who cannot read it, let's say the test case in this series
> may very well be a good indication of what you don't see.
>
>
> v2:
> - Reworded the comment in patch 2 on why the test does not support
> refcount_bits=1 -- it's not just because of the snapshotting, but
> mostly because we test OFLAG_COPIED and refcount_bits=1 won't get us
> far there [Eric]
>
>
> git-backport-diff against v1:
>
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>
> 001/2:[----] [--] 'qcow2: Repair OFLAG_COPIED when fixing leaks'
> 002/2:[0003] [FC] 'iotests: Repairing error during snapshot deletion'
>
>
> Max Reitz (2):
> qcow2: Repair OFLAG_COPIED when fixing leaks
> iotests: Repairing error during snapshot deletion
>
> block/qcow2-refcount.c | 25 ++++++++-----
> tests/qemu-iotests/217 | 90 ++++++++++++++++++++++++++++++++++++++++++++++
> tests/qemu-iotests/217.out | 42 ++++++++++++++++++++++
> tests/qemu-iotests/group | 1 +
> 4 files changed, 150 insertions(+), 8 deletions(-)
> create mode 100755 tests/qemu-iotests/217
> create mode 100644 tests/qemu-iotests/217.out
Applied to my block branch.
Max
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 4+ messages in thread