From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <aaccfcb60805300203l18b71b9cne1b7bf8d9fc04cbd@mail.gmail.com>
Date: Fri, 30 May 2008 02:03:40 -0700
From: "Marc Bevand" <m.bevand@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: [Qemu-devel] [PATCH] New qemu-img convert -B option to preserve the
	COW aspect of images and/or re-base them
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org

[PATCH] New qemu-img convert -B option to preserve the COW aspect of
images and/or re-base them

If a disk image hd_a is a copy-on-write image based on the backing
file hd_base, it is currently impossible to use qemu-img to convert
hd_a to hd_b (possibly using another disk image format) while keeping
hd_b a copy-on-write image of hd_base. qemu-img also provides no way
for an end user to re-base an image, for example to change hd_a's
backing file name from hd_base to hd_base2 if it had to change for
some reason.

This patch solves both problems by adding a new -B option to qemu-img
convert. It is a generic feature that should work with any disk image
format that supports backing files. Examples:

  $ qemu-img info hd_a
  image: hd_a
  file format: qcow
  virtual size: 6.0G (6442450944 bytes)
  disk size: 28K
  cluster_size: 512
  backing file: hd_base (actual path: hd_base)

Converting hd_a (qcow) to hd_b (qcow2) while preserving the
copy-on-write aspect of the image:

  $ qemu-img convert hd_a -O qcow2 -B hd_base hd_b
  $ qemu-img info hd_b
  image: hd_b
  file format: qcow2
  virtual size: 6.0G (6442450944 bytes)
  disk size: 36K
  cluster_size: 4096
  backing file: hd_base (actual path: hd_base)
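
For reviewers who want the idea without reading the whole diff, here is
a rough, self-contained sketch of the copy loop that -B enables. It is
not the qemu code: struct toy_image, toy_is_allocated() and
toy_convert_cow() are hypothetical names invented for illustration, and
the image is a small in-memory array. Only the contract of
toy_is_allocated() is meant to mirror the bdrv_is_allocated() added by
this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTOR_SIZE 512
#define TOY_SECTORS 8

/* Hypothetical in-memory image: a per-sector allocation map plus data. */
struct toy_image {
    int allocated[TOY_SECTORS];
    uint8_t data[TOY_SECTORS][TOY_SECTOR_SIZE];
};

/* Same contract as the patch's bdrv_is_allocated(): returns nonzero iff
 * 'sector_num' is allocated, and sets '*pnum' to the length of the run
 * (at most 'nb_sectors') that shares that state. */
static int toy_is_allocated(const struct toy_image *img, int64_t sector_num,
                            int nb_sectors, int *pnum)
{
    int state, n;

    if (sector_num >= TOY_SECTORS) {
        *pnum = 0;
        return 0;
    }
    state = img->allocated[sector_num];
    for (n = 1; n < nb_sectors && sector_num + n < TOY_SECTORS; n++) {
        if (img->allocated[sector_num + n] != state)
            break;
    }
    *pnum = n;
    return state;
}

/* Copy only the allocated runs of 'in' into 'out'. Unallocated runs are
 * skipped because they are assumed to be served by the shared base image.
 * Returns the number of sectors written. */
static int toy_convert_cow(const struct toy_image *in, struct toy_image *out)
{
    int64_t sector_num = 0;
    int n1, i, written = 0;

    while (sector_num < TOY_SECTORS) {
        int n = (int)(TOY_SECTORS - sector_num);

        if (!toy_is_allocated(in, sector_num, n, &n1)) {
            assert(n1 > 0); /* a zero-length run would never terminate */
            sector_num += n1;
            continue;
        }
        /* Copy the whole allocated run, even if it contains only zeros. */
        memcpy(out->data[sector_num], in->data[sector_num],
               (size_t)n1 * TOY_SECTOR_SIZE);
        for (i = 0; i < n1; i++)
            out->allocated[sector_num + i] = 1;
        written += n1;
        sector_num += n1;
    }
    return written;
}
```

With sectors 2, 3 and 6 allocated in the input, toy_convert_cow() writes
exactly those three sectors and leaves the rest unallocated in the
output, which is why hd_b above stays small.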

Renaming the backing file without losing hd_a:

  $ ln hd_base hd_base2
  $ qemu-img convert hd_a -O qcow -B hd_base2 hd_a2
  $ mv hd_a2 hd_a
  $ rm hd_base
  $ qemu-img info hd_a
  image: hd_a
  file format: qcow
  virtual size: 6.0G (6442450944 bytes)
  disk size: 28K
  cluster_size: 512
  backing file: hd_base2 (actual path: hd_base2)
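
One subtlety in both examples: with -B, allocated sectors that contain
only NUL bytes must still be written, because the base image may hold
non-zero data at the same offset. The sketch below illustrates this
with a toy model. All names (struct demo_overlay, demo_read(),
demo_convert()) are hypothetical and exist only for this example; the
'skip_zero_sectors' flag stands in for the zero-sector shortcut that
convert applies when no backing file is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SECTOR_SIZE 512
#define DEMO_SECTORS 4

/* Hypothetical overlay: reads fall through to 'base' when a sector is
 * not allocated in the overlay itself. */
struct demo_overlay {
    const uint8_t *base; /* DEMO_SECTORS * DEMO_SECTOR_SIZE bytes */
    int allocated[DEMO_SECTORS];
    uint8_t data[DEMO_SECTORS][DEMO_SECTOR_SIZE];
};

/* Read one sector as a guest would see it. */
static void demo_read(const struct demo_overlay *img, int sector, uint8_t *buf)
{
    const uint8_t *src = img->allocated[sector]
        ? img->data[sector]
        : img->base + (size_t)sector * DEMO_SECTOR_SIZE;
    memcpy(buf, src, DEMO_SECTOR_SIZE);
}

/* Returns nonzero iff the sector contains only NUL bytes. */
static int demo_is_zero(const uint8_t *buf)
{
    int i;

    for (i = 0; i < DEMO_SECTOR_SIZE; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}

/* Copy the allocated sectors of 'in' to 'out'. When 'skip_zero_sectors'
 * is set, all-zero sectors are dropped, which is only safe when 'out'
 * has no backing file. */
static void demo_convert(const struct demo_overlay *in,
                         struct demo_overlay *out, int skip_zero_sectors)
{
    int s;

    for (s = 0; s < DEMO_SECTORS; s++) {
        if (!in->allocated[s])
            continue;
        if (skip_zero_sectors && demo_is_zero(in->data[s]))
            continue;
        memcpy(out->data[s], in->data[s], DEMO_SECTOR_SIZE);
        out->allocated[s] = 1;
    }
}
```

If the base holds 0x11 everywhere and the overlay has zeroed sector 1,
converting with skip_zero_sectors set makes sector 1 read back as 0x11
from the base, while converting without it preserves the zeros.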

Patch made against SVN revision 4622.


Signed-off-by: Marc Bevand <m.bevand@gmail.com>

Index: qemu-img.texi
===================================================================
--- qemu-img.texi	(revision 4622)
+++ qemu-img.texi	(working copy)
@@ -10,7 +10,7 @@
 @table @option
 @item create [-e] [-6] [-b @var{base_image}] [-f @var{fmt}] @var{filename} [@var{size}]
 @item commit [-f @var{fmt}] @var{filename}
-@item convert [-c] [-e] [-6] [-f @var{fmt}] @var{filename} [-O @var{output_fmt}] @var{output_filename}
+@item convert [-c] [-e] [-6] [-f @var{fmt}] [-O @var{output_fmt}] [-B @var{output_base_image}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 @item info [-f @var{fmt}] @var{filename}
 @end table

@@ -21,7 +21,11 @@
 @item base_image
 is the read-only disk image which is used as base for a copy on
     write image; the copy on write image only stores the modified data
-
+@item output_base_image
+forces the output image to be created as a copy on write
+image of the specified base image; @code{output_base_image} should have the same
+content as the input's base image, however the path, image format, etc may
+differ
 @item fmt
 is the disk image format. It is guessed automatically in most cases. The following formats are supported:

Index: block.c
===================================================================
--- block.c	(revision 4622)
+++ block.c	(working copy)
@@ -884,6 +884,32 @@
         bdrv_flush(bs->backing_hd);
 }

+/*
+ * Returns true iff the specified sector is present in the disk image. Drivers
+ * not implementing the functionality are assumed to not support backing files,
+ * hence all their sectors are reported as allocated.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ * the specified sector) that are known to be in the same
+ * allocated/unallocated state.
+ *
+ * 'nb_sectors' is the max value 'pnum' should be set to.
+ */
+int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
+	int *pnum)
+{
+    if (!bs->drv->bdrv_is_allocated) {
+        if (sector_num >= bs->total_sectors) {
+            *pnum = 0;
+            return 0;
+        }
+        int64_t n = bs->total_sectors - sector_num;
+        *pnum = (n < nb_sectors) ? (n) : (nb_sectors);
+        return 1;
+    }
+    return bs->drv->bdrv_is_allocated(bs, sector_num, nb_sectors, pnum);
+}
+
 #ifndef QEMU_IMG
 void bdrv_info(void)
 {
Index: block.h
===================================================================
--- block.h	(revision 4622)
+++ block.h	(working copy)
@@ -99,6 +99,8 @@

 /* Ensure contents are flushed to disk.  */
 void bdrv_flush(BlockDriverState *bs);
+int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
+	int *pnum);

 #define BDRV_TYPE_HD     0
 #define BDRV_TYPE_CDROM  1
Index: qemu-img.c
===================================================================
--- qemu-img.c	(revision 4622)
+++ qemu-img.c	(working copy)
@@ -55,13 +55,17 @@
            "Command syntax:\n"
            "  create [-e] [-6] [-b base_image] [-f fmt] filename [size]\n"
            "  commit [-f fmt] filename\n"
-           "  convert [-c] [-e] [-6] [-f fmt] [-O output_fmt] filename [filename2 [...]] output_filename\n"
+           "  convert [-c] [-e] [-6] [-f fmt] [-O output_fmt] [-B output_base_image] filename [filename2 [...]] output_filename\n"
            "  info [-f fmt] filename\n"
            "\n"
            "Command parameters:\n"
            "  'filename' is a disk image filename\n"
            "  'base_image' is the read-only disk image which is used as base for a copy on\n"
            "    write image; the copy on write image only stores the modified data\n"
+           "  'output_base_image' forces the output image to be created as a copy on write\n"
+           "    image of the specified base image; 'output_base_image' should have the same\n"
+           "    content as the input's base image, however the path, image format, etc may\n"
+           "    differ\n"
            "  'fmt' is the disk image format. It is guessed automatically in most cases\n"
            "  'size' is the disk image size in kilobytes. Optional suffixes 'M' (megabyte)\n"
            "    and 'G' (gigabyte) are supported\n"
@@ -350,6 +354,13 @@
     return 0;
 }

+/*
+ * Returns true iff the first sector pointed to by 'buf' contains at least
+ * a non-NUL byte.
+ *
+ * 'pnum' is set to the number of sectors (including and immediately following
+ * the first one) that are known to be in the same allocated/unallocated state.
+ */
 static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
 {
     int v, i;
@@ -373,7 +384,7 @@
 static int img_convert(int argc, char **argv)
 {
     int c, ret, n, n1, bs_n, bs_i, flags, cluster_size, cluster_sectors;
-    const char *fmt, *out_fmt, *out_filename;
+    const char *fmt, *out_fmt, *out_baseimg, *out_filename;
     BlockDriver *drv;
     BlockDriverState **bs, *out_bs;
     int64_t total_sectors, nb_sectors, sector_num, bs_offset;
@@ -384,9 +395,10 @@

     fmt = NULL;
     out_fmt = "raw";
+    out_baseimg = NULL;
     flags = 0;
     for(;;) {
-        c = getopt(argc, argv, "f:O:hce6");
+        c = getopt(argc, argv, "f:O:B:hce6");
         if (c == -1)
             break;
         switch(c) {
@@ -399,6 +411,9 @@
         case 'O':
             out_fmt = optarg;
             break;
+        case 'B':
+            out_baseimg = optarg;
+            break;
         case 'c':
             flags |= BLOCK_FLAG_COMPRESS;
             break;
@@ -415,6 +430,9 @@
     if (bs_n < 1) help();

     out_filename = argv[argc - 1];
+
+    if (bs_n > 1 && out_baseimg)
+        error("-B makes no sense when concatenating multiple input images");

     bs = calloc(bs_n, sizeof(BlockDriverState *));
     if (!bs)
@@ -441,7 +459,7 @@
     if (flags & BLOCK_FLAG_ENCRYPT && flags & BLOCK_FLAG_COMPRESS)
         error("Compression and encryption not supported at the same time");

-    ret = bdrv_create(drv, out_filename, total_sectors, NULL, flags);
+    ret = bdrv_create(drv, out_filename, total_sectors, out_baseimg, flags);
     if (ret < 0) {
         if (ret == -ENOTSUP) {
             error("Formatting not supported for file format '%s'", fmt);
@@ -520,7 +538,7 @@
         /* signal EOF to align */
         bdrv_write_compressed(out_bs, 0, NULL, 0);
     } else {
-        sector_num = 0;
+        sector_num = 0; // total number of sectors converted so far
         for(;;) {
             nb_sectors = total_sectors - sector_num;
             if (nb_sectors <= 0)
@@ -543,6 +561,20 @@
             if (n > bs_offset + bs_sectors - sector_num)
                 n = bs_offset + bs_sectors - sector_num;

+            /* If the output image is being created as a copy on write image,
+               assume that sectors which are unallocated in the input image
+               are present in both the output's and input's base images (no
+               need to copy them). */
+            if (out_baseimg) {
+               if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, n, &n1)) {
+                  sector_num += n1;
+                  continue;
+               }
+               /* The next 'n1' sectors are allocated in the input image. Copy
+                  only those as they may be followed by unallocated sectors. */
+               n = n1;
+            }
+
             if (bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n) < 0)
                 error("error while reading");
             /* NOTE: at the same time we convert, we do not write zero
@@ -550,7 +582,10 @@
                should add a specific call to have the info to go faster */
             buf1 = buf;
             while (n > 0) {
-                if (is_allocated_sectors(buf1, n, &n1)) {
+                /* If the output image is being created as a copy on write image,
+                   copy all sectors even the ones containing only NUL bytes,
+                   because they may differ from the sectors in the base image. */
+                if (out_baseimg || is_allocated_sectors(buf1, n, &n1)) {
                     if (bdrv_write(out_bs, sector_num, buf1, n1) < 0)
                         error("error while writing");
                 }