qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
@ 2011-01-04 14:33 Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

Hi

This patch set creates infrastructure to invalidate buffers on
migration target machine.  The best way to see the problems is:

# create a new qcow2 image
qemu-img create -f qcow2 foo.img
# start the destination host
qemu .... path=foo.img....
# start the source host with one installation
qemu .... path=foo.img....
# migrate after lots of disk writes

Destination will have "read" the beggining of the blocks of the file
(where the headers are).  There are two bugs here:
a- we need to re-read the image after migration, to have the new values
   (reopening fixes it)
b- we need to be sure that we read the new blocks that are on the server,
   not the buffered ones locally from the start of the run.
   NFS: flush on source and close + open on target invalidates the cache
   Block devices: on linux, BLKFLSBUF invalidates all the buffers for that
   device.  This fixes iSCSI & FiberChannel.

I tested iSCSI & NFS.  NFS patch have been on RHEL5 kvm forever (I just
forget to send the patch upstream).  Our NFS gurus & cluster gurus told
that this is enough for linux to ensure consistence.

Once there, I fixed a couple of minor bugs (the first 3 patches):
- migration should exit with error 1 as everything else.
- memory leak on drive_uninit.
- fix cleanup on error on drive_init()

Later, Juan.

Juan Quintela (5):
  migration: exit with error code
  blockdev: don't leak id on removal
  blockdev: release resources in the error case
  Reopen files after migration
  drive_open: Add invalidate option for block devices

 block.h           |    2 ++
 block/raw-posix.c |   24 ++++++++++++++++++++++++
 blockdev.c        |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
 blockdev.h        |    6 ++++++
 migration.c       |    8 +++++++-
 vl.c              |    2 +-
 6 files changed, 85 insertions(+), 10 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 1/5] migration: exit with error code
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

exits due to errors should end with error code 1, not zero or negative.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration.c |    2 +-
 vl.c        |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index e5ba51c..a8b65e5 100644
--- a/migration.c
+++ b/migration.c
@@ -62,7 +62,7 @@ void process_incoming_migration(QEMUFile *f)
 {
     if (qemu_loadvm_state(f) < 0) {
         fprintf(stderr, "load of migration failed\n");
-        exit(0);
+        exit(1);
     }
     qemu_announce_self();
     DPRINTF("successfully loaded vm state\n");
diff --git a/vl.c b/vl.c
index 78fcef1..e5349af 100644
--- a/vl.c
+++ b/vl.c
@@ -3107,7 +3107,7 @@ int main(int argc, char **argv, char **envp)
         if (ret < 0) {
             fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
                     incoming, ret);
-            exit(ret);
+            exit(1);
         }
     } else if (autostart) {
         vm_start();
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 blockdev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d7add36..da619ad 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -115,6 +115,7 @@ void drive_uninit(DriveInfo *dinfo)
     qemu_opts_del(dinfo->opts);
     bdrv_delete(dinfo->bdrv);
     QTAILQ_REMOVE(&drives, dinfo, next);
+    qemu_free(dinfo->id);
     qemu_free(dinfo);
 }

-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 blockdev.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index da619ad..f9bb659 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -467,7 +467,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
     }
     if (!file || !*file) {
         *fatal_error = 0;
-        return NULL;
+        goto error;
     }
     if (snapshot) {
         /* always use cache=unsafe with snapshot */
@@ -481,7 +481,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
     } else if (ro == 1) {
         if (type != IF_SCSI && type != IF_VIRTIO && type != IF_FLOPPY && type != IF_NONE) {
             fprintf(stderr, "qemu: readonly flag not supported for drive with this interface\n");
-            return NULL;
+            goto error;
         }
     }

@@ -491,13 +491,16 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
     if (ret < 0) {
         fprintf(stderr, "qemu: could not open disk image %s: %s\n",
                         file, strerror(-ret));
-        return NULL;
+        goto error;
     }

     if (bdrv_key_required(dinfo->bdrv))
         autostart = 0;
     *fatal_error = 0;
     return dinfo;
+error:
+    drive_uninit(dinfo);
+    return NULL;
 }

 void do_commit(Monitor *mon, const QDict *qdict)
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 4/5] Reopen files after migration
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
                   ` (2 preceding siblings ...)
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
  2011-01-04 19:05   ` Blue Swirl
  2011-01-05 15:52   ` Markus Armbruster
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
  2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
  5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

We need to invalidate the Read Cache on the destination, otherwise we
have corruption.  Easy way to reproduce it is:

- create an qcow2 images
- start qemu on destination of migration (qemu .... -incoming tcp:...)
- start qemu on source of migration and do one install.
- migrate at the end of install (when lot of disk IO has happened).

Destination of migration has a local copy of the L1/L2 tables that existed
at the beginning, before the install started.  We have disk corruption at
this point.  The solution (for NFS) is to just re-open the file.  Operations
have to happen in this order:

- source of migration: flush()
- destination: close(file);
- destination: open(file)

it is not necesary that source of migration close the file.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 blockdev.c  |   42 +++++++++++++++++++++++++++++++++++++-----
 blockdev.h  |    6 ++++++
 migration.c |    6 ++++++
 3 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index f9bb659..3f3df7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
     bdrv_delete(dinfo->bdrv);
     QTAILQ_REMOVE(&drives, dinfo, next);
     qemu_free(dinfo->id);
+    qemu_free(dinfo->file);
     qemu_free(dinfo);
 }

@@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
     }
 }

+static int drive_open(DriveInfo *dinfo)
+{
+    int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+
+    if (res < 0) {
+        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
+                        dinfo->file, strerror(errno));
+    }
+    return res;
+}
+
+int drives_reinit(void)
+{
+    DriveInfo *dinfo;
+
+    QTAILQ_FOREACH(dinfo, &drives, next) {
+        if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
+            int res;
+            bdrv_close(dinfo->bdrv);
+            res = drive_open(dinfo);
+            if (res) {
+		    fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
+			    dinfo->file, res);
+		    return res;
+	    }
+        }
+    }
+    return 0;
+}
+
 DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
 {
     const char *buf;
@@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
     const char *devaddr;
     DriveInfo *dinfo;
     int snapshot = 0;
-    int ret;

     *fatal_error = 1;

@@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)

     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;

-    ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
-    if (ret < 0) {
-        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
-                        file, strerror(-ret));
+    dinfo->file = qemu_strdup(file);
+    dinfo->bdrv_flags = bdrv_flags;
+    dinfo->drv = drv;
+    dinfo->opened = 1;
+
+    if (drive_open(dinfo) < 0) {
         goto error;
     }

diff --git a/blockdev.h b/blockdev.h
index 4536b5c..e009fee 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -29,6 +29,10 @@ struct DriveInfo {
     QemuOpts *opts;
     char serial[BLOCK_SERIAL_STRLEN + 1];
     QTAILQ_ENTRY(DriveInfo) next;
+    int opened;
+    int bdrv_flags;
+    char *file;
+    BlockDriver *drv;
 };

 #define MAX_IDE_DEVS	2
@@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
 QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);

+extern int drives_reinit(void);
+
 /* device-hotplug */

 DriveInfo *add_init_drive(const char *opts);
diff --git a/migration.c b/migration.c
index a8b65e5..fdff804 100644
--- a/migration.c
+++ b/migration.c
@@ -17,6 +17,7 @@
 #include "buffered_file.h"
 #include "sysemu.h"
 #include "block.h"
+#include "blockdev.h"
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
@@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)

     incoming_expected = false;

+    if (drives_reinit() != 0) {
+        fprintf(stderr, "reopening of drives failed\n");
+        exit(1);
+    }
+
     if (autostart)
         vm_start();
 }
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
                   ` (3 preceding siblings ...)
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
  2011-01-04 19:06   ` Blue Swirl
  2011-01-07  8:38   ` [Qemu-devel] " Christoph Hellwig
  2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
  5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
  To: qemu-devel

Linux allows to invalidate block devices.  This is needed for the incoming
migration part.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 block.h           |    2 ++
 block/raw-posix.c |   24 ++++++++++++++++++++++++
 blockdev.c        |    9 +++++----
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/block.h b/block.h
index f923add..5ac96a5 100644
--- a/block.h
+++ b/block.h
@@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
 #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
 #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
 #define BDRV_O_NO_FLUSH    0x0200 /* disable flushing on this disk */
+#define BDRV_O_INVALIDATE  0x0400 /* invalidate buffer cache for this device.
+                                     re-read things from server */

 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..9439cf1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -51,6 +51,7 @@
 #include <sys/param.h>
 #include <linux/cdrom.h>
 #include <linux/fd.h>
+#include <linux/fs.h>
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <signal.h>
@@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     s->fd = fd;
     s->aligned_buf = NULL;

+#ifdef __linux__
+    if ((bdrv_flags & BDRV_O_INVALIDATE)) {
+        struct stat buf;
+        int res;
+
+        res = fstat(fd, &buf);
+
+        if (res < 0) {
+            return -errno;
+        }
+
+        if (S_ISBLK(buf.st_mode)) {
+            printf("we are in a block device: %s\n", filename);
+            res = ioctl(fd, BLKFLSBUF, 0);
+            if (res < 0) {
+                fprintf(stderr, "qemu: buffer invalidation of %s"
+                        " failed with error %d\n", filename, errno);
+                return -errno;
+            }
+        }
+    }
+#endif /* __linux__ */
+
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         /*
          * Allocate a buffer for read/modify/write cycles.  Chose the size
diff --git a/blockdev.c b/blockdev.c
index 3f3df7a..94920b8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -137,9 +137,10 @@ static int parse_block_error_action(const char *buf, int is_read)
     }
 }

-static int drive_open(DriveInfo *dinfo)
+static int drive_open(DriveInfo *dinfo, int extra_flags)
 {
-    int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+    int res = bdrv_open(dinfo->bdrv, dinfo->file,
+                        dinfo->bdrv_flags | extra_flags, dinfo->drv);

     if (res < 0) {
         fprintf(stderr, "qemu: could not open disk image %s: %s\n",
@@ -156,7 +157,7 @@ int drives_reinit(void)
         if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
             int res;
             bdrv_close(dinfo->bdrv);
-            res = drive_open(dinfo);
+            res = drive_open(dinfo, BDRV_O_INVALIDATE);
             if (res) {
 		    fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
 			    dinfo->file, res);
@@ -522,7 +523,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
     dinfo->drv = drv;
     dinfo->opened = 1;

-    if (drive_open(dinfo) < 0) {
+    if (drive_open(dinfo, 0) < 0) {
         goto error;
     }

-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 19:05   ` Blue Swirl
  2011-01-05 15:52   ` Markus Armbruster
  1 sibling, 0 replies; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:05 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> We need to invalidate the Read Cache on the destination, otherwise we
> have corruption.  Easy way to reproduce it is:
>
> - create an qcow2 images
> - start qemu on destination of migration (qemu .... -incoming tcp:...)
> - start qemu on source of migration and do one install.
> - migrate at the end of install (when lot of disk IO has happened).
>
> Destination of migration has a local copy of the L1/L2 tables that existed
> at the beginning, before the install started.  We have disk corruption at
> this point.  The solution (for NFS) is to just re-open the file.  Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> it is not necesary that source of migration close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  blockdev.c  |   42 +++++++++++++++++++++++++++++++++++++-----
>  blockdev.h  |    6 ++++++
>  migration.c |    6 ++++++
>  3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
>     bdrv_delete(dinfo->bdrv);
>     QTAILQ_REMOVE(&drives, dinfo, next);
>     qemu_free(dinfo->id);
> +    qemu_free(dinfo->file);
>     qemu_free(dinfo);
>  }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
>     }
>  }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> +    int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> +    if (res < 0) {
> +        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> +                        dinfo->file, strerror(errno));
> +    }
> +    return res;
> +}
> +
> +int drives_reinit(void)
> +{
> +    DriveInfo *dinfo;
> +
> +    QTAILQ_FOREACH(dinfo, &drives, next) {
> +        if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> +            int res;
> +            bdrv_close(dinfo->bdrv);
> +            res = drive_open(dinfo);
> +            if (res) {
> +                   fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
> +                           dinfo->file, res);
> +                   return res;
> +           }
> +        }
> +    }
> +    return 0;
> +}
> +
>  DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>  {
>     const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>     const char *devaddr;
>     DriveInfo *dinfo;
>     int snapshot = 0;
> -    int ret;
>
>     *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
>     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> -    ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> -    if (ret < 0) {
> -        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> -                        file, strerror(-ret));
> +    dinfo->file = qemu_strdup(file);
> +    dinfo->bdrv_flags = bdrv_flags;
> +    dinfo->drv = drv;
> +    dinfo->opened = 1;
> +
> +    if (drive_open(dinfo) < 0) {
>         goto error;
>     }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
>     QemuOpts *opts;
>     char serial[BLOCK_SERIAL_STRLEN + 1];
>     QTAILQ_ENTRY(DriveInfo) next;
> +    int opened;
> +    int bdrv_flags;
> +    char *file;
> +    BlockDriver *drv;
>  };
>
>  #define MAX_IDE_DEVS   2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
>  QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
>  DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);

'extern' is useless for functions.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-04 19:06   ` Blue Swirl
  2011-01-04 19:23     ` [Qemu-devel] " Juan Quintela
  2011-01-07  8:38   ` [Qemu-devel] " Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:06 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> Linux allows to invalidate block devices.  This is needed for the incoming
> migration part.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  block.h           |    2 ++
>  block/raw-posix.c |   24 ++++++++++++++++++++++++
>  blockdev.c        |    9 +++++----
>  3 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/block.h b/block.h
> index f923add..5ac96a5 100644
> --- a/block.h
> +++ b/block.h
> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
>  #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
>  #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
>  #define BDRV_O_NO_FLUSH    0x0200 /* disable flushing on this disk */
> +#define BDRV_O_INVALIDATE  0x0400 /* invalidate buffer cache for this device.
> +                                     re-read things from server */
>
>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6b72470..9439cf1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -51,6 +51,7 @@
>  #include <sys/param.h>
>  #include <linux/cdrom.h>
>  #include <linux/fd.h>
> +#include <linux/fs.h>
>  #endif
>  #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
>  #include <signal.h>
> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>     s->fd = fd;
>     s->aligned_buf = NULL;
>
> +#ifdef __linux__
> +    if ((bdrv_flags & BDRV_O_INVALIDATE)) {
> +        struct stat buf;
> +        int res;
> +
> +        res = fstat(fd, &buf);
> +
> +        if (res < 0) {
> +            return -errno;
> +        }
> +
> +        if (S_ISBLK(buf.st_mode)) {
> +            printf("we are in a block device: %s\n", filename);

Leftover debugging?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 5/5] drive_open: Add invalidate option for block devices
  2011-01-04 19:06   ` Blue Swirl
@ 2011-01-04 19:23     ` Juan Quintela
  0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 19:23 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
>> Linux allows to invalidate block devices.  This is needed for the incoming
>> migration part.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  block.h           |    2 ++
>>  block/raw-posix.c |   24 ++++++++++++++++++++++++
>>  blockdev.c        |    9 +++++----
>>  3 files changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.h b/block.h
>> index f923add..5ac96a5 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
>>  #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
>>  #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
>>  #define BDRV_O_NO_FLUSH    0x0200 /* disable flushing on this disk */
>> +#define BDRV_O_INVALIDATE  0x0400 /* invalidate buffer cache for this device.
>> +                                     re-read things from server */
>>
>>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6b72470..9439cf1 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -51,6 +51,7 @@
>>  #include <sys/param.h>
>>  #include <linux/cdrom.h>
>>  #include <linux/fd.h>
>> +#include <linux/fs.h>
>>  #endif
>>  #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
>>  #include <signal.h>
>> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>     s->fd = fd;
>>     s->aligned_buf = NULL;
>>
>> +#ifdef __linux__
>> +    if ((bdrv_flags & BDRV_O_INVALIDATE)) {
>> +        struct stat buf;
>> +        int res;
>> +
>> +        res = fstat(fd, &buf);
>> +
>> +        if (res < 0) {
>> +            return -errno;
>> +        }
>> +
>> +        if (S_ISBLK(buf.st_mode)) {
>> +            printf("we are in a block device: %s\n", filename);
>
> Leftover debugging?

ouch, yes.

thanks for the spotting.

Later, Juan.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
  2011-01-04 19:05   ` Blue Swirl
@ 2011-01-05 15:52   ` Markus Armbruster
  1 sibling, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2011-01-05 15:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

Juan Quintela <quintela@redhat.com> writes:

> We need to invalidate the Read Cache on the destination, otherwise we
> have corruption.  Easy way to reproduce it is:
>
> - create an qcow2 images
> - start qemu on destination of migration (qemu .... -incoming tcp:...)
> - start qemu on source of migration and do one install.
> - migrate at the end of install (when lot of disk IO has happened).
>
> Destination of migration has a local copy of the L1/L2 tables that existed
> at the beginning, before the install started.  We have disk corruption at
> this point.  The solution (for NFS) is to just re-open the file.  Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> it is not necesary that source of migration close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  blockdev.c  |   42 +++++++++++++++++++++++++++++++++++++-----
>  blockdev.h  |    6 ++++++
>  migration.c |    6 ++++++
>  3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
>      bdrv_delete(dinfo->bdrv);
>      QTAILQ_REMOVE(&drives, dinfo, next);
>      qemu_free(dinfo->id);
> +    qemu_free(dinfo->file);
>      qemu_free(dinfo);
>  }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
>      }
>  }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> +    int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> +    if (res < 0) {
> +        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> +                        dinfo->file, strerror(errno));
> +    }
> +    return res;
> +}
> +
> +int drives_reinit(void)
> +{
> +    DriveInfo *dinfo;
> +
> +    QTAILQ_FOREACH(dinfo, &drives, next) {
> +        if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> +            int res;
> +            bdrv_close(dinfo->bdrv);
> +            res = drive_open(dinfo);
> +            if (res) {
> +		    fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
> +			    dinfo->file, res);
> +		    return res;
> +	    }
> +        }
> +    }
> +    return 0;
> +}
> +

Shouldn't this live one layer down, in block.c?

We already have two reopens there, in bdrv_commit() and
bdrv_snapshot_goto().

I reduced DriveInfo to a mere helper object for drive_init() & friends
(see commit f8b6cc00, for instance).  Real block work should not use
it.

>  DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>  {
>      const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>      const char *devaddr;
>      DriveInfo *dinfo;
>      int snapshot = 0;
> -    int ret;
>
>      *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
>      bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> -    ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> -    if (ret < 0) {
> -        fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> -                        file, strerror(-ret));
> +    dinfo->file = qemu_strdup(file);
> +    dinfo->bdrv_flags = bdrv_flags;
> +    dinfo->drv = drv;
> +    dinfo->opened = 1;
> +
> +    if (drive_open(dinfo) < 0) {
>          goto error;
>      }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
>      QemuOpts *opts;
>      char serial[BLOCK_SERIAL_STRLEN + 1];
>      QTAILQ_ENTRY(DriveInfo) next;
> +    int opened;

Could you explain why you need this member?

> +    int bdrv_flags;

Use BlockDriverState's open_flags, like the other reopens?

> +    char *file;

BlockDriverState's filename?

> +    BlockDriver *drv;

BlockDriverState's drv?

>  };
>
>  #define MAX_IDE_DEVS	2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
>  QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
>  DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);
> +
>  /* device-hotplug */
>
>  DriveInfo *add_init_drive(const char *opts);
> diff --git a/migration.c b/migration.c
> index a8b65e5..fdff804 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -17,6 +17,7 @@
>  #include "buffered_file.h"
>  #include "sysemu.h"
>  #include "block.h"
> +#include "blockdev.h"
>  #include "qemu_socket.h"
>  #include "block-migration.h"
>  #include "qemu-objects.h"
> @@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)
>
>      incoming_expected = false;
>
> +    if (drives_reinit() != 0) {
> +        fprintf(stderr, "reopening of drives failed\n");
> +        exit(1);
> +    }
> +
>      if (autostart)
>          vm_start();
>  }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
  2011-01-04 19:06   ` Blue Swirl
@ 2011-01-07  8:38   ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-01-07  8:38 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On Tue, Jan 04, 2011 at 03:33:30PM +0100, Juan Quintela wrote:
> Linux allows to invalidate block devices.  This is needed for the incoming
> migration part.

It's not.  The only thing that makes migration on block devices safe is
using O_DIRECT, aka cache=none.  

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
  2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
                   ` (4 preceding siblings ...)
  2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-10 10:15 ` Juan Quintela
  5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-10 10:15 UTC (permalink / raw)
  To: qemu-devel

Juan Quintela <quintela@redhat.com> wrote:
> Hi

Nack myself :-(

After discussions with hch on irc, he "convinced" me that only way to
fix the problem is forcing O_DIRECT if migration is going to happen.

So, changing patchset to check that cache=none is used in all read/write
devices.

Later, Juan.

> This patch set creates infrastructure to invalidate buffers on
> migration target machine.  The best way to see the problems is:
>
> # create a new qcow2 image
> qemu-img create -f qcow2 foo.img
> # start the destination host
> qemu .... path=foo.img....
> # start the source host with one installation
> qemu .... path=foo.img....
> # migrate after lots of disk writes
>
> Destination will have "read" the beggining of the blocks of the file
> (where the headers are).  There are two bugs here:
> a- we need to re-read the image after migration, to have the new values
>    (reopening fixes it)
> b- we need to be sure that we read the new blocks that are on the server,
>    not the buffered ones locally from the start of the run.
>    NFS: flush on source and close + open on target invalidates the cache
>    Block devices: on linux, BLKFLSBUF invalidates all the buffers for that
>    device.  This fixes iSCSI & FiberChannel.
>
> I tested iSCSI & NFS.  NFS patch have been on RHEL5 kvm forever (I just
> forget to send the patch upstream).  Our NFS gurus & cluster gurus told
> that this is enough for linux to ensure consistence.
>
> Once there, I fixed a couple of minor bugs (the first 3 patches):
> - migration should exit with error 1 as everything else.
> - memory leak on drive_uninit.
> - fix cleanup on error on drive_init()
>
> Later, Juan.
>
> Juan Quintela (5):
>   migration: exit with error code
>   blockdev: don't leak id on removal
>   blockdev: release resources in the error case
>   Reopen files after migration
>   drive_open: Add invalidate option for block devices
>
>  block.h           |    2 ++
>  block/raw-posix.c |   24 ++++++++++++++++++++++++
>  blockdev.c        |   53 +++++++++++++++++++++++++++++++++++++++++++++--------
>  blockdev.h        |    6 ++++++
>  migration.c       |    8 +++++++-
>  vl.c              |    2 +-
>  6 files changed, 85 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-01-10 10:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
2011-01-04 19:05   ` Blue Swirl
2011-01-05 15:52   ` Markus Armbruster
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-04 19:06   ` Blue Swirl
2011-01-04 19:23     ` [Qemu-devel] " Juan Quintela
2011-01-07  8:38   ` [Qemu-devel] " Christoph Hellwig
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).