* [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
@ 2011-01-04 14:33 Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Hi
This patch set creates the infrastructure to invalidate buffers on the
migration target machine. The best way to see the problem is:
# create a new qcow2 image
qemu-img create -f qcow2 foo.img
# start the destination host
qemu .... path=foo.img....
# start the source host with one installation
qemu .... path=foo.img....
# migrate after lots of disk writes
The destination will already have "read" the beginning of the blocks of the
file (where the headers are). There are two bugs here:
a- we need to re-read the image after migration to pick up the new values
(reopening fixes it)
b- we need to be sure that we read the new blocks that are on the server,
not the ones buffered locally since the start of the run.
NFS: a flush on the source and a close + open on the target invalidates the cache.
Block devices: on Linux, BLKFLSBUF invalidates all the buffers for that
device. This fixes iSCSI & Fibre Channel.
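For reference, the block-device case boils down to something like the
following user-space sketch (illustrative only, not part of the series;
the helper name is made up and error handling is trimmed):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/* Drop the kernel's cached buffers for a block device so the next read
   goes back to the storage (iSCSI/FC LUN). */
static int invalidate_blockdev(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ret = ioctl(fd, BLKFLSBUF, 0);
    close(fd);
    return ret;
}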
I tested iSCSI & NFS. The NFS patch has been in RHEL5 kvm forever (I just
forgot to send it upstream). Our NFS and cluster gurus told us that this is
enough for Linux to ensure consistency.
While at it, I fixed a couple of minor bugs (the first 3 patches):
- migration should exit with error code 1, like everything else.
- fix a memory leak in drive_uninit().
- fix cleanup on error in drive_init().
Later, Juan.
Juan Quintela (5):
migration: exit with error code
blockdev: don't leak id on removal
blockdev: release resources in the error case
Reopen files after migration
drive_open: Add invalidate option for block devices
block.h | 2 ++
block/raw-posix.c | 24 ++++++++++++++++++++++++
blockdev.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--------
blockdev.h | 6 ++++++
migration.c | 8 +++++++-
vl.c | 2 +-
6 files changed, 85 insertions(+), 10 deletions(-)
--
1.7.3.4
* [Qemu-devel] [PATCH 1/5] migration: exit with error code
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Exits due to errors should use exit code 1, not zero or a negative value.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration.c | 2 +-
vl.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration.c b/migration.c
index e5ba51c..a8b65e5 100644
--- a/migration.c
+++ b/migration.c
@@ -62,7 +62,7 @@ void process_incoming_migration(QEMUFile *f)
{
if (qemu_loadvm_state(f) < 0) {
fprintf(stderr, "load of migration failed\n");
- exit(0);
+ exit(1);
}
qemu_announce_self();
DPRINTF("successfully loaded vm state\n");
diff --git a/vl.c b/vl.c
index 78fcef1..e5349af 100644
--- a/vl.c
+++ b/vl.c
@@ -3107,7 +3107,7 @@ int main(int argc, char **argv, char **envp)
if (ret < 0) {
fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
incoming, ret);
- exit(ret);
+ exit(1);
}
} else if (autostart) {
vm_start();
--
1.7.3.4
* [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index d7add36..da619ad 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -115,6 +115,7 @@ void drive_uninit(DriveInfo *dinfo)
qemu_opts_del(dinfo->opts);
bdrv_delete(dinfo->bdrv);
QTAILQ_REMOVE(&drives, dinfo, next);
+ qemu_free(dinfo->id);
qemu_free(dinfo);
}
--
1.7.3.4
* [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index da619ad..f9bb659 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -467,7 +467,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
}
if (!file || !*file) {
*fatal_error = 0;
- return NULL;
+ goto error;
}
if (snapshot) {
/* always use cache=unsafe with snapshot */
@@ -481,7 +481,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
} else if (ro == 1) {
if (type != IF_SCSI && type != IF_VIRTIO && type != IF_FLOPPY && type != IF_NONE) {
fprintf(stderr, "qemu: readonly flag not supported for drive with this interface\n");
- return NULL;
+ goto error;
}
}
@@ -491,13 +491,16 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
if (ret < 0) {
fprintf(stderr, "qemu: could not open disk image %s: %s\n",
file, strerror(-ret));
- return NULL;
+ goto error;
}
if (bdrv_key_required(dinfo->bdrv))
autostart = 0;
*fatal_error = 0;
return dinfo;
+error:
+ drive_uninit(dinfo);
+ return NULL;
}
void do_commit(Monitor *mon, const QDict *qdict)
--
1.7.3.4
* [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (2 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 19:05 ` Blue Swirl
2011-01-05 15:52 ` Markus Armbruster
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
We need to invalidate the read cache on the destination, otherwise we
get corruption. An easy way to reproduce it:
- create a qcow2 image
- start qemu on the destination of the migration (qemu .... -incoming tcp:...)
- start qemu on the source of the migration and do an install.
- migrate at the end of the install (when a lot of disk I/O has happened).
The destination of the migration has a local copy of the L1/L2 tables that
existed at the beginning, before the install started. We have disk corruption
at this point. The solution (for NFS) is to just re-open the file. Operations
have to happen in this order:
- source of migration: flush()
- destination: close(file);
- destination: open(file)
It is not necessary for the source of the migration to close the file.
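In QEMU terms the sequence is roughly the following sketch (illustrative
only; the real changes are in the diff below):

/* source, once the last of the migration data has been sent: */
bdrv_flush(bs);                             /* push dirty data to the NFS server */

/* destination, before the guest is started: */
bdrv_close(bs);
bdrv_open(bs, filename, bdrv_flags, drv);   /* re-reads qcow2 metadata from the server */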
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
blockdev.h | 6 ++++++
migration.c | 6 ++++++
3 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index f9bb659..3f3df7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
bdrv_delete(dinfo->bdrv);
QTAILQ_REMOVE(&drives, dinfo, next);
qemu_free(dinfo->id);
+ qemu_free(dinfo->file);
qemu_free(dinfo);
}
@@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
}
}
+static int drive_open(DriveInfo *dinfo)
+{
+ int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+
+ if (res < 0) {
+ fprintf(stderr, "qemu: could not open disk image %s: %s\n",
+ dinfo->file, strerror(errno));
+ }
+ return res;
+}
+
+int drives_reinit(void)
+{
+ DriveInfo *dinfo;
+
+ QTAILQ_FOREACH(dinfo, &drives, next) {
+ if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
+ int res;
+ bdrv_close(dinfo->bdrv);
+ res = drive_open(dinfo);
+ if (res) {
+ fprintf(stderr, "qemu: re-open of %s failed with error %d\n",
+ dinfo->file, res);
+ return res;
+ }
+ }
+ }
+ return 0;
+}
+
DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
{
const char *buf;
@@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
const char *devaddr;
DriveInfo *dinfo;
int snapshot = 0;
- int ret;
*fatal_error = 1;
@@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
- ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
- if (ret < 0) {
- fprintf(stderr, "qemu: could not open disk image %s: %s\n",
- file, strerror(-ret));
+ dinfo->file = qemu_strdup(file);
+ dinfo->bdrv_flags = bdrv_flags;
+ dinfo->drv = drv;
+ dinfo->opened = 1;
+
+ if (drive_open(dinfo) < 0) {
goto error;
}
diff --git a/blockdev.h b/blockdev.h
index 4536b5c..e009fee 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -29,6 +29,10 @@ struct DriveInfo {
QemuOpts *opts;
char serial[BLOCK_SERIAL_STRLEN + 1];
QTAILQ_ENTRY(DriveInfo) next;
+ int opened;
+ int bdrv_flags;
+ char *file;
+ BlockDriver *drv;
};
#define MAX_IDE_DEVS 2
@@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
+extern int drives_reinit(void);
+
/* device-hotplug */
DriveInfo *add_init_drive(const char *opts);
diff --git a/migration.c b/migration.c
index a8b65e5..fdff804 100644
--- a/migration.c
+++ b/migration.c
@@ -17,6 +17,7 @@
#include "buffered_file.h"
#include "sysemu.h"
#include "block.h"
+#include "blockdev.h"
#include "qemu_socket.h"
#include "block-migration.h"
#include "qemu-objects.h"
@@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)
incoming_expected = false;
+ if (drives_reinit() != 0) {
+ fprintf(stderr, "reopening of drives failed\n");
+ exit(1);
+ }
+
if (autostart)
vm_start();
}
--
1.7.3.4
* [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (3 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 19:06 ` Blue Swirl
2011-01-07 8:38 ` [Qemu-devel] " Christoph Hellwig
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Linux allows invalidating the buffer cache of block devices. This is needed
for the incoming migration part.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
block.h | 2 ++
block/raw-posix.c | 24 ++++++++++++++++++++++++
blockdev.c | 9 +++++----
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/block.h b/block.h
index f923add..5ac96a5 100644
--- a/block.h
+++ b/block.h
@@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
#define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
#define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
#define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
+#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
+ re-read things from server */
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..9439cf1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -51,6 +51,7 @@
#include <sys/param.h>
#include <linux/cdrom.h>
#include <linux/fd.h>
+#include <linux/fs.h>
#endif
#if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
#include <signal.h>
@@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
s->fd = fd;
s->aligned_buf = NULL;
+#ifdef __linux__
+ if ((bdrv_flags & BDRV_O_INVALIDATE)) {
+ struct stat buf;
+ int res;
+
+ res = fstat(fd, &buf);
+
+ if (res < 0) {
+ return -errno;
+ }
+
+ if (S_ISBLK(buf.st_mode)) {
+ printf("we are in a block device: %s\n", filename);
+ res = ioctl(fd, BLKFLSBUF, 0);
+ if (res < 0) {
+ fprintf(stderr, "qemu: buffer invalidation of %s"
+ " failed with error %d\n", filename, errno);
+ return -errno;
+ }
+ }
+ }
+#endif /* __linux__ */
+
if ((bdrv_flags & BDRV_O_NOCACHE)) {
/*
* Allocate a buffer for read/modify/write cycles. Chose the size
diff --git a/blockdev.c b/blockdev.c
index 3f3df7a..94920b8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -137,9 +137,10 @@ static int parse_block_error_action(const char *buf, int is_read)
}
}
-static int drive_open(DriveInfo *dinfo)
+static int drive_open(DriveInfo *dinfo, int extra_flags)
{
- int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+ int res = bdrv_open(dinfo->bdrv, dinfo->file,
+ dinfo->bdrv_flags | extra_flags, dinfo->drv);
if (res < 0) {
fprintf(stderr, "qemu: could not open disk image %s: %s\n",
@@ -156,7 +157,7 @@ int drives_reinit(void)
if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
int res;
bdrv_close(dinfo->bdrv);
- res = drive_open(dinfo);
+ res = drive_open(dinfo, BDRV_O_INVALIDATE);
if (res) {
fprintf(stderr, "qemu: re-open of %s failed with error %d\n",
dinfo->file, res);
@@ -522,7 +523,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
dinfo->drv = drv;
dinfo->opened = 1;
- if (drive_open(dinfo) < 0) {
+ if (drive_open(dinfo, 0) < 0) {
goto error;
}
--
1.7.3.4
* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 19:05 ` Blue Swirl
2011-01-05 15:52 ` Markus Armbruster
1 sibling, 0 replies; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:05 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> We need to invalidate the read cache on the destination, otherwise we
> get corruption. An easy way to reproduce it:
>
> - create a qcow2 image
> - start qemu on the destination of the migration (qemu .... -incoming tcp:...)
> - start qemu on the source of the migration and do an install.
> - migrate at the end of the install (when a lot of disk I/O has happened).
>
> The destination of the migration has a local copy of the L1/L2 tables that
> existed at the beginning, before the install started. We have disk corruption
> at this point. The solution (for NFS) is to just re-open the file. Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> It is not necessary for the source of the migration to close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
> blockdev.h | 6 ++++++
> migration.c | 6 ++++++
> 3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
> bdrv_delete(dinfo->bdrv);
> QTAILQ_REMOVE(&drives, dinfo, next);
> qemu_free(dinfo->id);
> + qemu_free(dinfo->file);
> qemu_free(dinfo);
> }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
> }
> }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> + int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> + if (res < 0) {
> + fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> + dinfo->file, strerror(errno));
> + }
> + return res;
> +}
> +
> +int drives_reinit(void)
> +{
> + DriveInfo *dinfo;
> +
> + QTAILQ_FOREACH(dinfo, &drives, next) {
> + if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> + int res;
> + bdrv_close(dinfo->bdrv);
> + res = drive_open(dinfo);
> + if (res) {
> + fprintf(stderr, "qemu: re-open of %s failed with error %d\n",
> + dinfo->file, res);
> + return res;
> + }
> + }
> + }
> + return 0;
> +}
> +
> DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> {
> const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> const char *devaddr;
> DriveInfo *dinfo;
> int snapshot = 0;
> - int ret;
>
> *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
> bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> - ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> - if (ret < 0) {
> - fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> - file, strerror(-ret));
> + dinfo->file = qemu_strdup(file);
> + dinfo->bdrv_flags = bdrv_flags;
> + dinfo->drv = drv;
> + dinfo->opened = 1;
> +
> + if (drive_open(dinfo) < 0) {
> goto error;
> }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
> QemuOpts *opts;
> char serial[BLOCK_SERIAL_STRLEN + 1];
> QTAILQ_ENTRY(DriveInfo) next;
> + int opened;
> + int bdrv_flags;
> + char *file;
> + BlockDriver *drv;
> };
>
> #define MAX_IDE_DEVS 2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
> QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
> DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);
'extern' is useless for functions.
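For illustration, these two declarations are equivalent, since 'extern' is
implied for function declarations:

int drives_reinit(void);
extern int drives_reinit(void);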
* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-04 19:06 ` Blue Swirl
2011-01-04 19:23 ` [Qemu-devel] " Juan Quintela
2011-01-07 8:38 ` [Qemu-devel] " Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:06 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> Linux allows invalidating the buffer cache of block devices. This is needed
> for the incoming migration part.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> block.h | 2 ++
> block/raw-posix.c | 24 ++++++++++++++++++++++++
> blockdev.c | 9 +++++----
> 3 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/block.h b/block.h
> index f923add..5ac96a5 100644
> --- a/block.h
> +++ b/block.h
> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
> #define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
> #define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
> #define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
> +#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
> + re-read things from server */
>
> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6b72470..9439cf1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -51,6 +51,7 @@
> #include <sys/param.h>
> #include <linux/cdrom.h>
> #include <linux/fd.h>
> +#include <linux/fs.h>
> #endif
> #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
> #include <signal.h>
> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
> s->fd = fd;
> s->aligned_buf = NULL;
>
> +#ifdef __linux__
> + if ((bdrv_flags & BDRV_O_INVALIDATE)) {
> + struct stat buf;
> + int res;
> +
> + res = fstat(fd, &buf);
> +
> + if (res < 0) {
> + return -errno;
> + }
> +
> + if (S_ISBLK(buf.st_mode)) {
> + printf("we are in a block device: %s\n", filename);
Leftover debugging?
* [Qemu-devel] Re: [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 19:06 ` Blue Swirl
@ 2011-01-04 19:23 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 19:23 UTC (permalink / raw)
To: Blue Swirl; +Cc: qemu-devel
Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
>> Linux allows invalidating the buffer cache of block devices. This is needed
>> for the incoming migration part.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>> block.h | 2 ++
>> block/raw-posix.c | 24 ++++++++++++++++++++++++
>> blockdev.c | 9 +++++----
>> 3 files changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.h b/block.h
>> index f923add..5ac96a5 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
>> #define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
>> #define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
>> #define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
>> +#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
>> + re-read things from server */
>>
>> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6b72470..9439cf1 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -51,6 +51,7 @@
>> #include <sys/param.h>
>> #include <linux/cdrom.h>
>> #include <linux/fd.h>
>> +#include <linux/fs.h>
>> #endif
>> #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
>> #include <signal.h>
>> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> s->fd = fd;
>> s->aligned_buf = NULL;
>>
>> +#ifdef __linux__
>> + if ((bdrv_flags & BDRV_O_INVALIDATE)) {
>> + struct stat buf;
>> + int res;
>> +
>> + res = fstat(fd, &buf);
>> +
>> + if (res < 0) {
>> + return -errno;
>> + }
>> +
>> + if (S_ISBLK(buf.st_mode)) {
>> + printf("we are in a block device: %s\n", filename);
>
> Leftover debugging?
Ouch, yes.
Thanks for spotting it.
Later, Juan.
* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
2011-01-04 19:05 ` Blue Swirl
@ 2011-01-05 15:52 ` Markus Armbruster
1 sibling, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2011-01-05 15:52 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
Juan Quintela <quintela@redhat.com> writes:
> We need to invalidate the read cache on the destination, otherwise we
> get corruption. An easy way to reproduce it:
>
> - create a qcow2 image
> - start qemu on the destination of the migration (qemu .... -incoming tcp:...)
> - start qemu on the source of the migration and do an install.
> - migrate at the end of the install (when a lot of disk I/O has happened).
>
> The destination of the migration has a local copy of the L1/L2 tables that
> existed at the beginning, before the install started. We have disk corruption
> at this point. The solution (for NFS) is to just re-open the file. Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> It is not necessary for the source of the migration to close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
> blockdev.h | 6 ++++++
> migration.c | 6 ++++++
> 3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
> bdrv_delete(dinfo->bdrv);
> QTAILQ_REMOVE(&drives, dinfo, next);
> qemu_free(dinfo->id);
> + qemu_free(dinfo->file);
> qemu_free(dinfo);
> }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
> }
> }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> + int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> + if (res < 0) {
> + fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> + dinfo->file, strerror(errno));
> + }
> + return res;
> +}
> +
> +int drives_reinit(void)
> +{
> + DriveInfo *dinfo;
> +
> + QTAILQ_FOREACH(dinfo, &drives, next) {
> + if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> + int res;
> + bdrv_close(dinfo->bdrv);
> + res = drive_open(dinfo);
> + if (res) {
> + fprintf(stderr, "qemu: re-open of %s failed with error %d\n",
> + dinfo->file, res);
> + return res;
> + }
> + }
> + }
> + return 0;
> +}
> +
Shouldn't this live one layer down, in block.c?
We already have two reopens there, in bdrv_commit() and
bdrv_snapshot_goto().
I reduced DriveInfo to a mere helper object for drive_init() & friends
(see commit f8b6cc00, for instance). Real block work should not use
it.
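A sketch of what that could look like one layer down (illustrative only;
the function name bdrv_reopen is hypothetical here, and it reuses
BlockDriverState's own fields instead of DriveInfo):

int bdrv_reopen(BlockDriverState *bs)
{
    BlockDriver *drv = bs->drv;
    int flags = bs->open_flags;
    char filename[1024];

    /* bdrv_open() fills in bs->filename again, so keep a copy across the close */
    pstrcpy(filename, sizeof(filename), bs->filename);
    bdrv_close(bs);
    return bdrv_open(bs, filename, flags, drv);
}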
> DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> {
> const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> const char *devaddr;
> DriveInfo *dinfo;
> int snapshot = 0;
> - int ret;
>
> *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
> bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> - ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> - if (ret < 0) {
> - fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> - file, strerror(-ret));
> + dinfo->file = qemu_strdup(file);
> + dinfo->bdrv_flags = bdrv_flags;
> + dinfo->drv = drv;
> + dinfo->opened = 1;
> +
> + if (drive_open(dinfo) < 0) {
> goto error;
> }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
> QemuOpts *opts;
> char serial[BLOCK_SERIAL_STRLEN + 1];
> QTAILQ_ENTRY(DriveInfo) next;
> + int opened;
Could you explain why you need this member?
> + int bdrv_flags;
Use BlockDriverState's open_flags, like the other reopens?
> + char *file;
BlockDriverState's filename?
> + BlockDriver *drv;
BlockDriverState's drv?
> };
>
> #define MAX_IDE_DEVS 2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
> QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
> DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);
> +
> /* device-hotplug */
>
> DriveInfo *add_init_drive(const char *opts);
> diff --git a/migration.c b/migration.c
> index a8b65e5..fdff804 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -17,6 +17,7 @@
> #include "buffered_file.h"
> #include "sysemu.h"
> #include "block.h"
> +#include "blockdev.h"
> #include "qemu_socket.h"
> #include "block-migration.h"
> #include "qemu-objects.h"
> @@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)
>
> incoming_expected = false;
>
> + if (drives_reinit() != 0) {
> + fprintf(stderr, "reopening of drives failed\n");
> + exit(1);
> + }
> +
> if (autostart)
> vm_start();
> }
* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-04 19:06 ` Blue Swirl
@ 2011-01-07 8:38 ` Christoph Hellwig
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-01-07 8:38 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 04, 2011 at 03:33:30PM +0100, Juan Quintela wrote:
> Linux allows invalidating the buffer cache of block devices. This is needed
> for the incoming migration part.
It's not. The only thing that makes migration on block devices safe is
using O_DIRECT, aka cache=none.
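(For illustration, that corresponds to a drive option like the following on
both sides of the migration; the path is made up:)

qemu .... -drive file=/mnt/nfs/images/foo.img,if=virtio,cache=none ....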
* [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (4 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-10 10:15 ` Juan Quintela
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-10 10:15 UTC (permalink / raw)
To: qemu-devel
Juan Quintela <quintela@redhat.com> wrote:
> Hi
Nack myself :-(
After discussions with hch on IRC, he "convinced" me that the only way to
fix the problem is to force O_DIRECT if migration is going to happen.
So I am changing the patchset to check that cache=none is used on all
read/write devices.
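A rough sketch of such a check (hypothetical; the real follow-up series may
end up looking different, and it reuses the bdrv_flags field from patch 4):

static int drives_check_migratable(void)
{
    DriveInfo *dinfo;

    QTAILQ_FOREACH(dinfo, &drives, next) {
        if (!bdrv_is_read_only(dinfo->bdrv) &&
            !(dinfo->bdrv_flags & BDRV_O_NOCACHE)) {
            fprintf(stderr,
                    "qemu: drive '%s' must use cache=none for migration\n",
                    dinfo->id);
            return -1;
        }
    }
    return 0;
}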
Later, Juan.
> This patch set creates the infrastructure to invalidate buffers on the
> migration target machine. The best way to see the problem is:
>
> # create a new qcow2 image
> qemu-img create -f qcow2 foo.img
> # start the destination host
> qemu .... path=foo.img....
> # start the source host with one installation
> qemu .... path=foo.img....
> # migrate after lots of disk writes
>
> The destination will already have "read" the beginning of the blocks of the
> file (where the headers are). There are two bugs here:
> a- we need to re-read the image after migration to pick up the new values
> (reopening fixes it)
> b- we need to be sure that we read the new blocks that are on the server,
> not the ones buffered locally since the start of the run.
> NFS: a flush on the source and a close + open on the target invalidates the cache.
> Block devices: on Linux, BLKFLSBUF invalidates all the buffers for that
> device. This fixes iSCSI & Fibre Channel.
>
> I tested iSCSI & NFS. The NFS patch has been in RHEL5 kvm forever (I just
> forgot to send it upstream). Our NFS and cluster gurus told us that this is
> enough for Linux to ensure consistency.
>
> While at it, I fixed a couple of minor bugs (the first 3 patches):
> - migration should exit with error code 1, like everything else.
> - fix a memory leak in drive_uninit().
> - fix cleanup on error in drive_init().
>
> Later, Juan.
>
> Juan Quintela (5):
> migration: exit with error code
> blockdev: don't leak id on removal
> blockdev: release resources in the error case
> Reopen files after migration
> drive_open: Add invalidate option for block devices
>
> block.h | 2 ++
> block/raw-posix.c | 24 ++++++++++++++++++++++++
> blockdev.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--------
> blockdev.h | 6 ++++++
> migration.c | 8 +++++++-
> vl.c | 2 +-
> 6 files changed, 85 insertions(+), 10 deletions(-)