[Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
@ 2012-05-11 16:48 Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 1/3] qemu-img check -r for repairing images Kevin Wolf
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Kevin Wolf @ 2012-05-11 16:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
table except on clean shutdown, is that refcounts can be repaired when the
image is opened the next time after a crash.

This series adds a qemu-img check option that doesn't only check, but also
tries to fix the errors that it found.

Kevin Wolf (3):
  qemu-img check -r for repairing images
  qemu-img check: Print fixed clusters and recheck
  qcow2: Support for fixing refcount inconsistencies

 block.c                |    4 ++--
 block.h                |    9 ++++++++-
 block/qcow2-refcount.c |   27 +++++++++++++++++++++++++--
 block/qcow2.c          |    5 +++--
 block/qcow2.h          |    3 ++-
 block/qed-check.c      |    2 ++
 block/qed.c            |    5 +++--
 block/vdi.c            |    7 ++++++-
 block_int.h            |    3 ++-
 qemu-img-cmds.hx       |    4 ++--
 qemu-img.c             |   35 ++++++++++++++++++++++++++++++++---
 qemu-img.texi          |    7 ++++++-
 12 files changed, 93 insertions(+), 18 deletions(-)

-- 
1.7.6.5

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 1/3] qemu-img check -r for repairing images
  2012-05-11 16:48 [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Kevin Wolf
@ 2012-05-11 16:48 ` Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 2/3] qemu-img check: Print fixed clusters and recheck Kevin Wolf
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2012-05-11 16:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

The QED block driver already provides the functionality to not only
detect inconsistencies in images, but also fix them. However, this
functionality cannot be manually invoked with qemu-img, but the
check happens only automatically during bdrv_open().

This adds a -r switch to qemu-img check that allows manual invocation
of an image repair.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c          |    4 ++--
 block.h          |    7 ++++++-
 block/qcow2.c    |    7 ++++++-
 block/qed.c      |    5 +++--
 block/vdi.c      |    7 ++++++-
 block_int.h      |    3 ++-
 qemu-img-cmds.hx |    4 ++--
 qemu-img.c       |   25 ++++++++++++++++++++++---
 qemu-img.texi    |    7 ++++++-
 9 files changed, 55 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index acebe46..8e928b9 100644
--- a/block.c
+++ b/block.c
@@ -1211,14 +1211,14 @@ bool bdrv_dev_is_medium_locked(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of the
  * check are stored in res.
  */
-int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res)
+int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix)
 {
     if (bs->drv->bdrv_check == NULL) {
         return -ENOTSUP;
     }
 
     memset(res, 0, sizeof(*res));
-    return bs->drv->bdrv_check(bs, res);
+    return bs->drv->bdrv_check(bs, res, fix);
 }
 
 #define COMMIT_BUF_SECTORS 2048
diff --git a/block.h b/block.h
index 799cf48..61b7e8e 100644
--- a/block.h
+++ b/block.h
@@ -190,7 +190,12 @@ typedef struct BdrvCheckResult {
     BlockFragInfo bfi;
 } BdrvCheckResult;
 
-int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res);
+typedef enum {
+    BDRV_FIX_LEAKS    = 1,
+    BDRV_FIX_ERRORS   = 2,
+} BdrvCheckMode;
+
+int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
 
 /* async block I/O */
 typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector,
diff --git a/block/qcow2.c b/block/qcow2.c
index 3997e19..23cc920 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1508,8 +1508,13 @@ static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 }
 
 
-static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result)
+static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
+                       BdrvCheckMode fix)
 {
+    if (fix) {
+        return -ENOTSUP;
+    }
+
     return qcow2_check_refcounts(bs, result);
 }
 
diff --git a/block/qed.c b/block/qed.c
index 30a31f9..ab59724 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -1517,11 +1517,12 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs)
     bdrv_qed_open(bs, bs->open_flags);
 }
 
-static int bdrv_qed_check(BlockDriverState *bs, BdrvCheckResult *result)
+static int bdrv_qed_check(BlockDriverState *bs, BdrvCheckResult *result,
+                          BdrvCheckMode fix)
 {
     BDRVQEDState *s = bs->opaque;
 
-    return qed_check(s, result, false);
+    return qed_check(s, result, !!fix);
 }
 
 static QEMUOptionParameter qed_create_options[] = {
diff --git a/block/vdi.c b/block/vdi.c
index 119d3c7..57325d6 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -277,7 +277,8 @@ static void vdi_header_print(VdiHeader *header)
 }
 #endif
 
-static int vdi_check(BlockDriverState *bs, BdrvCheckResult *res)
+static int vdi_check(BlockDriverState *bs, BdrvCheckResult *res,
+                     BdrvCheckMode fix)
 {
     /* TODO: additional checks possible. */
     BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
@@ -286,6 +287,10 @@ static int vdi_check(BlockDriverState *bs, BdrvCheckResult *res)
     uint32_t *bmap;
     logout("\n");
 
+    if (fix) {
+        return -ENOTSUP;
+    }
+
     bmap = g_malloc(s->header.blocks_in_image * sizeof(uint32_t));
     memset(bmap, 0xff, s->header.blocks_in_image * sizeof(uint32_t));
 
diff --git a/block_int.h b/block_int.h
index b80e66d..a0fffe1 100644
--- a/block_int.h
+++ b/block_int.h
@@ -241,7 +241,8 @@ struct BlockDriver {
      * Returns 0 for completed check, -errno for internal errors.
      * The check results are stored in result.
      */
-    int (*bdrv_check)(BlockDriverState* bs, BdrvCheckResult *result);
+    int (*bdrv_check)(BlockDriverState* bs, BdrvCheckResult *result,
+        BdrvCheckMode fix);
 
     void (*bdrv_debug_event)(BlockDriverState *bs, BlkDebugEvent event);
 
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 49dce7c..39419a0 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -10,9 +10,9 @@ STEXI
 ETEXI
 
 DEF("check", img_check,
-    "check [-f fmt] filename")
+    "check [-f fmt] [-r [leaks | all]] filename")
 STEXI
-@item check [-f @var{fmt}] @var{filename}
+@item check [-f @var{fmt}] [-r [leaks | all]] @var{filename}
 ETEXI
 
 DEF("create", img_create,
diff --git a/qemu-img.c b/qemu-img.c
index 5434ddc..b151076 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -85,6 +85,12 @@ static void help(void)
            "  '-S' indicates the consecutive number of bytes that must contain only zeros\n"
            "       for qemu-img to create a sparse image during conversion\n"
            "\n"
+           "Parameters to check subcommand:\n"
+           "  '-r' tries to repair any inconsistencies that are found during the check.\n"
+           "       '-r leaks' repairs only cluster leaks, whereas '-r all' fixes all\n"
+           "       kinds of errors, with a higher risk of choosing the wrong fix or\n"
+           "       hiding corruption that has already occured.\n"
+           "\n"
            "Parameters to snapshot subcommand:\n"
            "  'snapshot' is the name of the snapshot to create, apply or delete\n"
            "  '-a' applies a snapshot (revert disk to saved state)\n"
@@ -372,10 +378,12 @@ static int img_check(int argc, char **argv)
     const char *filename, *fmt;
     BlockDriverState *bs;
     BdrvCheckResult result;
+    int fix = 0;
+    int flags = BDRV_O_FLAGS;
 
     fmt = NULL;
     for(;;) {
-        c = getopt(argc, argv, "f:h");
+        c = getopt(argc, argv, "f:hr:");
         if (c == -1) {
             break;
         }
@@ -387,6 +395,17 @@ static int img_check(int argc, char **argv)
         case 'f':
             fmt = optarg;
             break;
+        case 'r':
+            flags |= BDRV_O_RDWR;
+
+            if (!strcmp(optarg, "leaks")) {
+                fix = BDRV_FIX_LEAKS;
+            } else if (!strcmp(optarg, "all")) {
+                fix = BDRV_FIX_LEAKS | BDRV_FIX_ERRORS;
+            } else {
+                help();
+            }
+            break;
         }
     }
     if (optind >= argc) {
@@ -394,11 +413,11 @@ static int img_check(int argc, char **argv)
     }
     filename = argv[optind++];
 
-    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS);
+    bs = bdrv_new_open(filename, fmt, flags);
     if (!bs) {
         return 1;
     }
-    ret = bdrv_check(bs, &result);
+    ret = bdrv_check(bs, &result, fix);
 
     if (ret == -ENOTSUP) {
         error_report("This image format does not support checks");
diff --git a/qemu-img.texi b/qemu-img.texi
index 5a3dc41..34610a3 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -70,10 +70,15 @@ lists all snapshots in the given image
 Command description:
 
 @table @option
-@item check [-f @var{fmt}] @var{filename}
+@item check [-f @var{fmt}] [-r [leaks | all]] @var{filename}
 
 Perform a consistency check on the disk image @var{filename}.
 
+If @code{-r} is specified, qemu-img tries to repair any inconsistencies found
+during the check. @code{-r leaks} repairs only cluster leaks, whereas
+@code{-r all} fixes all kinds of errors, with a higher risk of choosing the
+wrong fix or hiding corruption that has already occured.
+
 Only the formats @code{qcow2}, @code{qed} and @code{vdi} support
 consistency checks.
 
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 2/3] qemu-img check: Print fixed clusters and recheck
  2012-05-11 16:48 [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 1/3] qemu-img check -r for repairing images Kevin Wolf
@ 2012-05-11 16:48 ` Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies Kevin Wolf
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2012-05-11 16:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

When any inconsistencies have been fixed, print the statistics and run
another check to make sure everything is correct now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.h           |    2 ++
 block/qed-check.c |    2 ++
 qemu-img.c        |   10 ++++++++++
 3 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/block.h b/block.h
index 61b7e8e..f8200eb 100644
--- a/block.h
+++ b/block.h
@@ -187,6 +187,8 @@ typedef struct BdrvCheckResult {
     int corruptions;
     int leaks;
     int check_errors;
+    int corruptions_fixed;
+    int leaks_fixed;
     BlockFragInfo bfi;
 } BdrvCheckResult;
 
diff --git a/block/qed-check.c b/block/qed-check.c
index 94327ff..5edf607 100644
--- a/block/qed-check.c
+++ b/block/qed-check.c
@@ -87,6 +87,7 @@ static unsigned int qed_check_l2_table(QEDCheck *check, QEDTable *table)
         if (!qed_check_cluster_offset(s, offset)) {
             if (check->fix) {
                 table->offsets[i] = 0;
+                check->result->corruptions_fixed++;
             } else {
                 check->result->corruptions++;
             }
@@ -127,6 +128,7 @@ static int qed_check_l1_table(QEDCheck *check, QEDTable *table)
             /* Clear invalid offset */
             if (check->fix) {
                 table->offsets[i] = 0;
+                check->result->corruptions_fixed++;
             } else {
                 check->result->corruptions++;
             }
diff --git a/qemu-img.c b/qemu-img.c
index b151076..e74be21 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -425,6 +425,16 @@ static int img_check(int argc, char **argv)
         return 1;
     }
 
+    if (result.corruptions_fixed || result.leaks_fixed) {
+        printf("The following inconsistencies were found and repaired:\n\n"
+               "    %d leaked clusters\n"
+               "    %d corruptions\n\n"
+               "Double checking the fixed image now...\n",
+               result.leaks_fixed,
+               result.corruptions_fixed);
+        ret = bdrv_check(bs, &result, 0);
+    }
+
     if (!(result.corruptions || result.leaks || result.check_errors)) {
         printf("No errors were found on the image.\n");
     } else {
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies
  2012-05-11 16:48 [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 1/3] qemu-img check -r for repairing images Kevin Wolf
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 2/3] qemu-img check: Print fixed clusters and recheck Kevin Wolf
@ 2012-05-11 16:48 ` Kevin Wolf
  2012-05-25 15:33   ` Stefan Hajnoczi
  2012-05-25 15:34 ` [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Stefan Hajnoczi
       [not found] ` <CAEH94Lj24QE5SS5EyYUy1+BGqn=LihGVn8SRbxt8t9X+RVQ6-A@mail.gmail.com>
  4 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2012-05-11 16:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-refcount.c |   27 +++++++++++++++++++++++++--
 block/qcow2.c          |    6 +-----
 block/qcow2.h          |    3 ++-
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 443c021..6a9a672 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1122,7 +1122,8 @@ fail:
  * Returns 0 if no errors are found, the number of errors in case the image is
  * detected as corrupted, and -errno when an internal error occurred.
  */
-int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res)
+int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                          BdrvCheckMode fix)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t size;
@@ -1205,9 +1206,31 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res)
 
         refcount2 = refcount_table[i];
         if (refcount1 != refcount2) {
+
+            /* Check if we're allowed to fix the mismatch */
+            int *num_fixed = NULL;
+            if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
+                num_fixed = &res->leaks_fixed;
+            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
+                num_fixed = &res->corruptions_fixed;
+            }
+
             fprintf(stderr, "%s cluster %d refcount=%d reference=%d\n",
-                   refcount1 < refcount2 ? "ERROR" : "Leaked",
+                   num_fixed != NULL     ? "Repairing" :
+                   refcount1 < refcount2 ? "ERROR" :
+                                           "Leaked",
                    i, refcount1, refcount2);
+
+            if (num_fixed) {
+                ret = update_refcount(bs, i << s->cluster_bits, 1,
+                                      refcount2 - refcount1);
+                if (ret >= 0) {
+                    (*num_fixed)++;
+                    continue;
+                }
+            }
+
+            /* And if we couldn't, print an error */
             if (refcount1 < refcount2) {
                 res->corruptions++;
             } else {
diff --git a/block/qcow2.c b/block/qcow2.c
index 23cc920..cc751d2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1511,11 +1511,7 @@ static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
                        BdrvCheckMode fix)
 {
-    if (fix) {
-        return -ENOTSUP;
-    }
-
-    return qcow2_check_refcounts(bs, result);
+    return qcow2_check_refcounts(bs, result, fix);
 }
 
 #if 0
diff --git a/block/qcow2.h b/block/qcow2.h
index 9b7b2d6..5fb7f61 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -275,7 +275,8 @@ void qcow2_free_any_clusters(BlockDriverState *bs,
 int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     int64_t l1_table_offset, int l1_size, int addend);
 
-int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res);
+int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                          BdrvCheckMode fix);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size);
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies Kevin Wolf
@ 2012-05-25 15:33   ` Stefan Hajnoczi
  2012-05-25 16:28     ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-05-25 15:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Fri, May 11, 2012 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> @@ -1205,9 +1206,31 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res)
>
>         refcount2 = refcount_table[i];
>         if (refcount1 != refcount2) {
> +
> +            /* Check if we're allowed to fix the mismatch */
> +            int *num_fixed = NULL;
> +            if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
> +                num_fixed = &res->leaks_fixed;
> +            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
> +                num_fixed = &res->corruptions_fixed;
> +            }
> +
>             fprintf(stderr, "%s cluster %d refcount=%d reference=%d\n",
> -                   refcount1 < refcount2 ? "ERROR" : "Leaked",
> +                   num_fixed != NULL     ? "Repairing" :
> +                   refcount1 < refcount2 ? "ERROR" :
> +                                           "Leaked",
>                    i, refcount1, refcount2);
> +
> +            if (num_fixed) {
> +                ret = update_refcount(bs, i << s->cluster_bits, 1,
> +                                      refcount2 - refcount1);

It would be nicer to use int64_t for i.  I haven't checked but it
makes me nervous to shift ints here.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-05-11 16:48 [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Kevin Wolf
                   ` (2 preceding siblings ...)
  2012-05-11 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies Kevin Wolf
@ 2012-05-25 15:34 ` Stefan Hajnoczi
       [not found] ` <CAEH94Lj24QE5SS5EyYUy1+BGqn=LihGVn8SRbxt8t9X+RVQ6-A@mail.gmail.com>
  4 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-05-25 15:34 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Fri, May 11, 2012 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
> table except on clean shutdown, is that refcounts can be repaired when the
> image is opened the next time after a crash.
>
> This series adds a qemu-img check option that doesn't only check, but also
> tries to fix the errors that it found.
>
> Kevin Wolf (3):
>  qemu-img check -r for repairing images
>  qemu-img check: Print fixed clusters and recheck
>  qcow2: Support for fixing refcount inconsistencies
>
>  block.c                |    4 ++--
>  block.h                |    9 ++++++++-
>  block/qcow2-refcount.c |   27 +++++++++++++++++++++++++--
>  block/qcow2.c          |    5 +++--
>  block/qcow2.h          |    3 ++-
>  block/qed-check.c      |    2 ++
>  block/qed.c            |    5 +++--
>  block/vdi.c            |    7 ++++++-
>  block_int.h            |    3 ++-
>  qemu-img-cmds.hx       |    4 ++--
>  qemu-img.c             |   35 ++++++++++++++++++++++++++++++++---
>  qemu-img.texi          |    7 ++++++-
>  12 files changed, 93 insertions(+), 18 deletions(-)

Looks good except for the one comment I posted.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies
  2012-05-25 15:33   ` Stefan Hajnoczi
@ 2012-05-25 16:28     ` Kevin Wolf
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2012-05-25 16:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

Am 25.05.2012 17:33, schrieb Stefan Hajnoczi:
> On Fri, May 11, 2012 at 5:48 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> @@ -1205,9 +1206,31 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res)
>>
>>         refcount2 = refcount_table[i];
>>         if (refcount1 != refcount2) {
>> +
>> +            /* Check if we're allowed to fix the mismatch */
>> +            int *num_fixed = NULL;
>> +            if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>> +                num_fixed = &res->leaks_fixed;
>> +            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
>> +                num_fixed = &res->corruptions_fixed;
>> +            }
>> +
>>             fprintf(stderr, "%s cluster %d refcount=%d reference=%d\n",
>> -                   refcount1 < refcount2 ? "ERROR" : "Leaked",
>> +                   num_fixed != NULL     ? "Repairing" :
>> +                   refcount1 < refcount2 ? "ERROR" :
>> +                                           "Leaked",
>>                    i, refcount1, refcount2);
>> +
>> +            if (num_fixed) {
>> +                ret = update_refcount(bs, i << s->cluster_bits, 1,
>> +                                      refcount2 - refcount1);
> 
> It would be nicer to use int64_t for i.  I haven't checked but it
> makes me nervous to shift ints here.

Thanks, good catch. Fixed in block-next.

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
       [not found]   ` <CAJSP0QW1Er9wQ8syyKyWc7p=_QSmN4Qj-ekSZZ++Zg=--UjkmA@mail.gmail.com>
@ 2012-06-01  5:22     ` Zhi Yong Wu
  2012-06-01  8:06       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Zhi Yong Wu @ 2012-06-01  5:22 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: QEMU Developers

On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>> Recently some new concepts such as "QED mode" in qcow2 are seen
>> frequencely, can anyone explain what it means? thanks.
>
> qcow2 has more metadata than qed.  More metadata means more write
> operations when allocating new clusters.
>
> In order to overcome this performance issue qcow2 has a metadata
> cache.  But when QEMU is launched with -drive ...,cache=writethrough
> (the default) the metadata cache *must* be in writethrough mode
Why must i be? If the option with -drive ..,cache=writethrough is
specified. it means that host page cache is on while guest disk cache
is off. Since the metadata cache exists in host page cache, not guest,
i think that it is in writeback mode.

> instead of writeback mode.  In other words, every metadata update
> needs to be written to the image file before we complete the guest's
What will mean one guest's wirte request is completed?
> write request.  This means the metadata cache only hides the metadata
> performance issue when -drive ...,cache=direct|writeback are used
> because there we can keep metadata changes buffered in memory until
> the guest flushes the emulated disk write cache.
>
> "QED mode" is a solution for -drive ...,cache=writethrough|directsync.
>  It simply doesn't update refcount metadata in the qcow2 image file
> immediately in exchange for a refcount fixup step that is introduced
Can you say this with more details? Why is this step need only when
image file is opened? After image file is opened, and some guest's
write requests are completed, maybe the refcount fixup step need to be
done once.
> when opening the image file.  It's like doing an fsck operation on a
> file system when mounting it.
>
> Stefan



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-06-01  5:22     ` Zhi Yong Wu
@ 2012-06-01  8:06       ` Stefan Hajnoczi
  2012-06-01  8:26         ` Zhi Yong Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-06-01  8:06 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: QEMU Developers

On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>>> Recently some new concepts such as "QED mode" in qcow2 are seen
>>> frequencely, can anyone explain what it means? thanks.
>>
>> qcow2 has more metadata than qed.  More metadata means more write
>> operations when allocating new clusters.
>>
>> In order to overcome this performance issue qcow2 has a metadata
>> cache.  But when QEMU is launched with -drive ...,cache=writethrough
>> (the default) the metadata cache *must* be in writethrough mode
> Why must i be? If the option with -drive ..,cache=writethrough is
> specified. it means that host page cache is on while guest disk cache
> is off. Since the metadata cache exists in host page cache, not guest,
> i think that it is in writeback mode.

Since the emulated disk write cache is off, we must ensure that guest
writes are on disk before completing them.  Therefore we cannot cache
metadata updates in host RAM - it would be lost on power failure but
we promised the guest its writes reached the disk!

>> instead of writeback mode.  In other words, every metadata update
>> needs to be written to the image file before we complete the guest's
> What will mean one guest's wirte request is completed?

For example, virtio-blk fills in the success status code and raises an
interrupt.  This notifies the guest that the write is done.

>> write request.  This means the metadata cache only hides the metadata
>> performance issue when -drive ...,cache=direct|writeback are used
>> because there we can keep metadata changes buffered in memory until
>> the guest flushes the emulated disk write cache.
>>
>> "QED mode" is a solution for -drive ...,cache=writethrough|directsync.
>>  It simply doesn't update refcount metadata in the qcow2 image file
>> immediately in exchange for a refcount fixup step that is introduced
> Can you say this with more details? Why is this step need only when
> image file is opened? After image file is opened, and some guest's
> write requests are completed, maybe the refcount fixup step need to be
> done once.

If we don't update refcounts on disk then they become outdated and no
longer reflect the true allocation information.  It's not safe to rely
on outdated refcount information since we could allocate the same
cluster multiple times - this means data corruption.  By running a
consistency check when opening a dirty image file we guarantee that we
have accurate refcount information again.

As an optimization we will commit refcount information to disk when
closing the image file and mark it clean.  This means a clean QEMU
shutdown does not require a consistency check on startup - but in the
worst case (power failure or crash) we will have a dirty image file.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-06-01  8:06       ` Stefan Hajnoczi
@ 2012-06-01  8:26         ` Zhi Yong Wu
  2012-06-06 10:32           ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Zhi Yong Wu @ 2012-06-01  8:26 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: QEMU Developers

On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>>>> Recently some new concepts such as "QED mode" in qcow2 are seen
>>>> frequencely, can anyone explain what it means? thanks.
>>>
>>> qcow2 has more metadata than qed.  More metadata means more write
>>> operations when allocating new clusters.
>>>
>>> In order to overcome this performance issue qcow2 has a metadata
>>> cache.  But when QEMU is launched with -drive ...,cache=writethrough
>>> (the default) the metadata cache *must* be in writethrough mode
>> Why must i be? If the option with -drive ..,cache=writethrough is
>> specified. it means that host page cache is on while guest disk cache
>> is off. Since the metadata cache exists in host page cache, not guest,
>> i think that it is in writeback mode.
>
> Since the emulated disk write cache is off, we must ensure that guest
> writes are on disk before completing them.  Therefore we cannot cache
> metadata updates in host RAM - it would be lost on power failure but
But host page cache is *on* in this mode, which means that metadata
should be cached in host RAM. how do you explain this?

> we promised the guest its writes reached the disk!
>
>>> instead of writeback mode.  In other words, every metadata update
>>> needs to be written to the image file before we complete the guest's
>> What will mean one guest's wirte request is completed?
>
> For example, virtio-blk fills in the success status code and raises an
> interrupt.  This notifies the guest that the write is done.
Great, thanks.
>
>>> write request.  This means the metadata cache only hides the metadata
>>> performance issue when -drive ...,cache=direct|writeback are used
>>> because there we can keep metadata changes buffered in memory until
>>> the guest flushes the emulated disk write cache.
>>>
>>> "QED mode" is a solution for -drive ...,cache=writethrough|directsync.
>>>  It simply doesn't update refcount metadata in the qcow2 image file
l1/l2 info need to be updated to qcow2 image file?
>>> immediately in exchange for a refcount fixup step that is introduced
>> Can you say this with more details? Why is this step need only when
>> image file is opened? After image file is opened, and some guest's
>> write requests are completed, maybe the refcount fixup step need to be
>> done once.
>
> If we don't update refcounts on disk then they become outdated and no
> longer reflect the true allocation information.  It's not safe to rely
> on outdated refcount information since we could allocate the same
> cluster multiple times - this means data corruption.  By running a
> consistency check when opening a dirty image file we guarantee that we
> have accurate refcount information again.
ah, i got it now.
>
> As an optimization we will commit refcount information to disk when
> closing the image file and mark it clean.  This means a clean QEMU
> shutdown does not require a consistency check on startup - but in the
> worst case (power failure or crash) we will have a dirty image file.
Yeah, a consistency check on startup is good, i think. thanks.
>
> Stefan



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-06-01  8:26         ` Zhi Yong Wu
@ 2012-06-06 10:32           ` Stefan Hajnoczi
  2012-06-06 14:53             ` Zhi Yong Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-06-06 10:32 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: QEMU Developers

On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>>>>> Recently some new concepts such as "QED mode" in qcow2 are seen
>>>>> frequencely, can anyone explain what it means? thanks.
>>>>
>>>> qcow2 has more metadata than qed.  More metadata means more write
>>>> operations when allocating new clusters.
>>>>
>>>> In order to overcome this performance issue qcow2 has a metadata
>>>> cache.  But when QEMU is launched with -drive ...,cache=writethrough
>>>> (the default) the metadata cache *must* be in writethrough mode
>>> Why must i be? If the option with -drive ..,cache=writethrough is
>>> specified. it means that host page cache is on while guest disk cache
>>> is off. Since the metadata cache exists in host page cache, not guest,
>>> i think that it is in writeback mode.
>>
>> Since the emulated disk write cache is off, we must ensure that guest
>> writes are on disk before completing them.  Therefore we cannot cache
>> metadata updates in host RAM - it would be lost on power failure but
> But host page cache is *on* in this mode, which means that metadata
> should be cached in host RAM. how do you explain this?

cache=writethrough means that the file is opened with O_SYNC.  Every
single write reaches the physical disk - that's why it's called a
"writethrough" cache.  Read requests, however, can be satisfied from
the host page cache.

In other words, cache=writethrough ensures that all data reaches the
disk but may give performance benefits to read-heavy workloads
(especially when guest RAM is much smaller than host RAM, so the host
page cache would have a high hit rate).

>> we promised the guest its writes reached the disk!
>>
>>>> instead of writeback mode.  In other words, every metadata update
>>>> needs to be written to the image file before we complete the guest's
>>> What will mean one guest's wirte request is completed?
>>
>> For example, virtio-blk fills in the success status code and raises an
>> interrupt.  This notifies the guest that the write is done.
> Great, thanks.
>>
>>>> write request.  This means the metadata cache only hides the metadata
>>>> performance issue when -drive ...,cache=direct|writeback are used
>>>> because there we can keep metadata changes buffered in memory until
>>>> the guest flushes the emulated disk write cache.
>>>>
>>>> "QED mode" is a solution for -drive ...,cache=writethrough|directsync.
>>>>  It simply doesn't update refcount metadata in the qcow2 image file
> l1/l2 info need to be updated to qcow2 image file?

Yes, this is necessary to ensure written data is accessible in the
future.  Without the L1/L2 tables we cannot find the data we wrote.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-06-06 10:32           ` Stefan Hajnoczi
@ 2012-06-06 14:53             ` Zhi Yong Wu
  2012-06-07 15:07               ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Zhi Yong Wu @ 2012-06-06 14:53 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: QEMU Developers

On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>> On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>>>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>>>>>> Recently some new concepts such as "QED mode" in qcow2 are seen
>>>>>> frequencely, can anyone explain what it means? thanks.
>>>>>
>>>>> qcow2 has more metadata than qed.  More metadata means more write
>>>>> operations when allocating new clusters.
>>>>>
>>>>> In order to overcome this performance issue qcow2 has a metadata
>>>>> cache.  But when QEMU is launched with -drive ...,cache=writethrough
>>>>> (the default) the metadata cache *must* be in writethrough mode
>>>> Why must i be? If the option with -drive ..,cache=writethrough is
>>>> specified. it means that host page cache is on while guest disk cache
>>>> is off. Since the metadata cache exists in host page cache, not guest,
>>>> i think that it is in writeback mode.
>>>
>>> Since the emulated disk write cache is off, we must ensure that guest
>>> writes are on disk before completing them.  Therefore we cannot cache
>>> metadata updates in host RAM - it would be lost on power failure but
>> But host page cache is *on* in this mode, which means that metadata
>> should be cached in host RAM. how do you explain this?
>
> cache=writethrough means that the file is opened with O_SYNC.  Every
> single write reaches the physical disk - that's why it's called a
> "writethrough" cache.  Read requests, however, can be satisfied from
> the host page cache.
>
> In other words, cache=writethrough ensures that all data reaches the
> disk but may give performance benefits to read-heavy workloads
> (especially when guest RAM is much smaller than host RAM, so the host
> page cache would have a high hit rate).
Ah, i see now, cache=writethrough mean that host page cache is applied
to read request, not write. thanks.
>
>>> we promised the guest its writes reached the disk!
>>>
>>>>> instead of writeback mode.  In other words, every metadata update
>>>>> needs to be written to the image file before we complete the guest's
>>>> What will mean one guest's wirte request is completed?
>>>
>>> For example, virtio-blk fills in the success status code and raises an
>>> interrupt.  This notifies the guest that the write is done.
>> Great, thanks.
>>>
>>>>> write request.  This means the metadata cache only hides the metadata
>>>>> performance issue when -drive ...,cache=direct|writeback are used
>>>>> because there we can keep metadata changes buffered in memory until
>>>>> the guest flushes the emulated disk write cache.
>>>>>
>>>>> "QED mode" is a solution for -drive ...,cache=writethrough|directsync.
>>>>>  It simply doesn't update refcount metadata in the qcow2 image file
>> l1/l2 info need to be updated to qcow2 image file?
>
> Yes, this is necessary to ensure written data is accessible in the
> future.  Without the L1/L2 tables we cannot find the data we wrote.
>
> Stefan



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts
  2012-06-06 14:53             ` Zhi Yong Wu
@ 2012-06-07 15:07               ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2012-06-07 15:07 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: QEMU Developers

On Wed, Jun 6, 2012 at 3:53 PM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Wed, Jun 6, 2012 at 6:32 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Fri, Jun 1, 2012 at 9:26 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>> On Fri, Jun 1, 2012 at 4:06 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>> On Fri, Jun 1, 2012 at 6:22 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>>> On Thu, May 31, 2012 at 5:26 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>>>>> On Wed, May 30, 2012 at 9:31 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>>>>>>> On Sat, May 12, 2012 at 12:48 AM, Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>>>> A prerequisite for a "QED mode" in qcow2, which doesn't update the refcount
>>>>>>> Recently some new concepts such as "QED mode" in qcow2 are seen
>>>>>>> frequencely, can anyone explain what it means? thanks.
>>>>>>
>>>>>> qcow2 has more metadata than qed.  More metadata means more write
>>>>>> operations when allocating new clusters.
>>>>>>
>>>>>> In order to overcome this performance issue qcow2 has a metadata
>>>>>> cache.  But when QEMU is launched with -drive ...,cache=writethrough
>>>>>> (the default) the metadata cache *must* be in writethrough mode
>>>>> Why must i be? If the option with -drive ..,cache=writethrough is
>>>>> specified. it means that host page cache is on while guest disk cache
>>>>> is off. Since the metadata cache exists in host page cache, not guest,
>>>>> i think that it is in writeback mode.
>>>>
>>>> Since the emulated disk write cache is off, we must ensure that guest
>>>> writes are on disk before completing them.  Therefore we cannot cache
>>>> metadata updates in host RAM - it would be lost on power failure but
>>> But host page cache is *on* in this mode, which means that metadata
>>> should be cached in host RAM. how do you explain this?
>>
>> cache=writethrough means that the file is opened with O_SYNC.  Every
>> single write reaches the physical disk - that's why it's called a
>> "writethrough" cache.  Read requests, however, can be satisfied from
>> the host page cache.
>>
>> In other words, cache=writethrough ensures that all data reaches the
>> disk but may give performance benefits to read-heavy workloads
>> (especially when guest RAM is much smaller than host RAM, so the host
>> page cache would have a high hit rate).
> Ah, i see now, cache=writethrough mean that host page cache is applied
> to read request, not write. thanks.

Writes are placed in the host page cache so future reads can be served
from the cache.  But O_SYNC also forces the kernel to immediately sync
the data in the host page cache to disk.

Stefan

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-06-07 15:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-11 16:48 [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Kevin Wolf
2012-05-11 16:48 ` [Qemu-devel] [PATCH 1/3] qemu-img check -r for repairing images Kevin Wolf
2012-05-11 16:48 ` [Qemu-devel] [PATCH 2/3] qemu-img check: Print fixed clusters and recheck Kevin Wolf
2012-05-11 16:48 ` [Qemu-devel] [PATCH 3/3] qcow2: Support for fixing refcount inconsistencies Kevin Wolf
2012-05-25 15:33   ` Stefan Hajnoczi
2012-05-25 16:28     ` Kevin Wolf
2012-05-25 15:34 ` [Qemu-devel] [PATCH block-next 0/3] qemu-img check/qcow2: Allow fixing refcounts Stefan Hajnoczi
     [not found] ` <CAEH94Lj24QE5SS5EyYUy1+BGqn=LihGVn8SRbxt8t9X+RVQ6-A@mail.gmail.com>
     [not found]   ` <CAJSP0QW1Er9wQ8syyKyWc7p=_QSmN4Qj-ekSZZ++Zg=--UjkmA@mail.gmail.com>
2012-06-01  5:22     ` Zhi Yong Wu
2012-06-01  8:06       ` Stefan Hajnoczi
2012-06-01  8:26         ` Zhi Yong Wu
2012-06-06 10:32           ` Stefan Hajnoczi
2012-06-06 14:53             ` Zhi Yong Wu
2012-06-07 15:07               ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).