* [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
@ 2011-01-04 14:33 Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Hi
This patch set creates infrastructure to invalidate buffers on
migration target machine. The best way to see the problems is:
# create a new qcow2 image
qemu-img create -f qcow2 foo.img
# start the destination host
qemu .... path=foo.img....
# start the source host with one installation
qemu .... path=foo.img....
# migrate after lots of disk writes
Destination will have "read" the beggining of the blocks of the file
(where the headers are). There are two bugs here:
a- we need to re-read the image after migration, to have the new values
(reopening fixes it)
b- we need to be sure that we read the new blocks that are on the server,
not the buffered ones locally from the start of the run.
NFS: flush on source and close + open on target invalidates the cache
Block devices: on linux, BLKFLSBUF invalidates all the buffers for that
device. This fixes iSCSI & FiberChannel.
I tested iSCSI & NFS. NFS patch have been on RHEL5 kvm forever (I just
forget to send the patch upstream). Our NFS gurus & cluster gurus told
that this is enough for linux to ensure consistence.
Once there, I fixed a couple of minor bugs (the first 3 patches):
- migration should exit with error 1 as everything else.
- memory leak on drive_uninit.
- fix cleanup on error on drive_init()
Later, Juan.
Juan Quintela (5):
migration: exit with error code
blockdev: don't leak id on removal
blockdev: release resources in the error case
Reopen files after migration
drive_open: Add invalidate option for block devices
block.h | 2 ++
block/raw-posix.c | 24 ++++++++++++++++++++++++
blockdev.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--------
blockdev.h | 6 ++++++
migration.c | 8 +++++++-
vl.c | 2 +-
6 files changed, 85 insertions(+), 10 deletions(-)
--
1.7.3.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 1/5] migration: exit with error code
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
exits due to errors should end with error code 1, not zero or negative.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
migration.c | 2 +-
vl.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration.c b/migration.c
index e5ba51c..a8b65e5 100644
--- a/migration.c
+++ b/migration.c
@@ -62,7 +62,7 @@ void process_incoming_migration(QEMUFile *f)
{
if (qemu_loadvm_state(f) < 0) {
fprintf(stderr, "load of migration failed\n");
- exit(0);
+ exit(1);
}
qemu_announce_self();
DPRINTF("successfully loaded vm state\n");
diff --git a/vl.c b/vl.c
index 78fcef1..e5349af 100644
--- a/vl.c
+++ b/vl.c
@@ -3107,7 +3107,7 @@ int main(int argc, char **argv, char **envp)
if (ret < 0) {
fprintf(stderr, "Migration failed. Exit code %s(%d), exiting.\n",
incoming, ret);
- exit(ret);
+ exit(1);
}
} else if (autostart) {
vm_start();
--
1.7.3.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index d7add36..da619ad 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -115,6 +115,7 @@ void drive_uninit(DriveInfo *dinfo)
qemu_opts_del(dinfo->opts);
bdrv_delete(dinfo->bdrv);
QTAILQ_REMOVE(&drives, dinfo, next);
+ qemu_free(dinfo->id);
qemu_free(dinfo);
}
--
1.7.3.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index da619ad..f9bb659 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -467,7 +467,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
}
if (!file || !*file) {
*fatal_error = 0;
- return NULL;
+ goto error;
}
if (snapshot) {
/* always use cache=unsafe with snapshot */
@@ -481,7 +481,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
} else if (ro == 1) {
if (type != IF_SCSI && type != IF_VIRTIO && type != IF_FLOPPY && type != IF_NONE) {
fprintf(stderr, "qemu: readonly flag not supported for drive with this interface\n");
- return NULL;
+ goto error;
}
}
@@ -491,13 +491,16 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
if (ret < 0) {
fprintf(stderr, "qemu: could not open disk image %s: %s\n",
file, strerror(-ret));
- return NULL;
+ goto error;
}
if (bdrv_key_required(dinfo->bdrv))
autostart = 0;
*fatal_error = 0;
return dinfo;
+error:
+ drive_uninit(dinfo);
+ return NULL;
}
void do_commit(Monitor *mon, const QDict *qdict)
--
1.7.3.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (2 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 19:05 ` Blue Swirl
2011-01-05 15:52 ` Markus Armbruster
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
We need to invalidate the Read Cache on the destination, otherwise we
have corruption. Easy way to reproduce it is:
- create an qcow2 images
- start qemu on destination of migration (qemu .... -incoming tcp:...)
- start qemu on source of migration and do one install.
- migrate at the end of install (when lot of disk IO has happened).
Destination of migration has a local copy of the L1/L2 tables that existed
at the beginning, before the install started. We have disk corruption at
this point. The solution (for NFS) is to just re-open the file. Operations
have to happen in this order:
- source of migration: flush()
- destination: close(file);
- destination: open(file)
it is not necesary that source of migration close the file.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
blockdev.h | 6 ++++++
migration.c | 6 ++++++
3 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/blockdev.c b/blockdev.c
index f9bb659..3f3df7a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
bdrv_delete(dinfo->bdrv);
QTAILQ_REMOVE(&drives, dinfo, next);
qemu_free(dinfo->id);
+ qemu_free(dinfo->file);
qemu_free(dinfo);
}
@@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
}
}
+static int drive_open(DriveInfo *dinfo)
+{
+ int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+
+ if (res < 0) {
+ fprintf(stderr, "qemu: could not open disk image %s: %s\n",
+ dinfo->file, strerror(errno));
+ }
+ return res;
+}
+
+int drives_reinit(void)
+{
+ DriveInfo *dinfo;
+
+ QTAILQ_FOREACH(dinfo, &drives, next) {
+ if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
+ int res;
+ bdrv_close(dinfo->bdrv);
+ res = drive_open(dinfo);
+ if (res) {
+ fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
+ dinfo->file, res);
+ return res;
+ }
+ }
+ }
+ return 0;
+}
+
DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
{
const char *buf;
@@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
const char *devaddr;
DriveInfo *dinfo;
int snapshot = 0;
- int ret;
*fatal_error = 1;
@@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
- ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
- if (ret < 0) {
- fprintf(stderr, "qemu: could not open disk image %s: %s\n",
- file, strerror(-ret));
+ dinfo->file = qemu_strdup(file);
+ dinfo->bdrv_flags = bdrv_flags;
+ dinfo->drv = drv;
+ dinfo->opened = 1;
+
+ if (drive_open(dinfo) < 0) {
goto error;
}
diff --git a/blockdev.h b/blockdev.h
index 4536b5c..e009fee 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -29,6 +29,10 @@ struct DriveInfo {
QemuOpts *opts;
char serial[BLOCK_SERIAL_STRLEN + 1];
QTAILQ_ENTRY(DriveInfo) next;
+ int opened;
+ int bdrv_flags;
+ char *file;
+ BlockDriver *drv;
};
#define MAX_IDE_DEVS 2
@@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
+extern int drives_reinit(void);
+
/* device-hotplug */
DriveInfo *add_init_drive(const char *opts);
diff --git a/migration.c b/migration.c
index a8b65e5..fdff804 100644
--- a/migration.c
+++ b/migration.c
@@ -17,6 +17,7 @@
#include "buffered_file.h"
#include "sysemu.h"
#include "block.h"
+#include "blockdev.h"
#include "qemu_socket.h"
#include "block-migration.h"
#include "qemu-objects.h"
@@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)
incoming_expected = false;
+ if (drives_reinit() != 0) {
+ fprintf(stderr, "reopening of drives failed\n");
+ exit(1);
+ }
+
if (autostart)
vm_start();
}
--
1.7.3.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (3 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 14:33 ` Juan Quintela
2011-01-04 19:06 ` Blue Swirl
2011-01-07 8:38 ` [Qemu-devel] " Christoph Hellwig
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
5 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 14:33 UTC (permalink / raw)
To: qemu-devel
Linux allows to invalidate block devices. This is needed for the incoming
migration part.
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
block.h | 2 ++
block/raw-posix.c | 24 ++++++++++++++++++++++++
blockdev.c | 9 +++++----
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/block.h b/block.h
index f923add..5ac96a5 100644
--- a/block.h
+++ b/block.h
@@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
#define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
#define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
#define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
+#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
+ re-read things from server */
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..9439cf1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -51,6 +51,7 @@
#include <sys/param.h>
#include <linux/cdrom.h>
#include <linux/fd.h>
+#include <linux/fs.h>
#endif
#if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
#include <signal.h>
@@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
s->fd = fd;
s->aligned_buf = NULL;
+#ifdef __linux__
+ if ((bdrv_flags & BDRV_O_INVALIDATE)) {
+ struct stat buf;
+ int res;
+
+ res = fstat(fd, &buf);
+
+ if (res < 0) {
+ return -errno;
+ }
+
+ if (S_ISBLK(buf.st_mode)) {
+ printf("we are in a block device: %s\n", filename);
+ res = ioctl(fd, BLKFLSBUF, 0);
+ if (res < 0) {
+ fprintf(stderr, "qemu: buffer invalidation of %s"
+ " failed with error %d\n", filename, errno);
+ return -errno;
+ }
+ }
+ }
+#endif /* __linux__ */
+
if ((bdrv_flags & BDRV_O_NOCACHE)) {
/*
* Allocate a buffer for read/modify/write cycles. Chose the size
diff --git a/blockdev.c b/blockdev.c
index 3f3df7a..94920b8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -137,9 +137,10 @@ static int parse_block_error_action(const char *buf, int is_read)
}
}
-static int drive_open(DriveInfo *dinfo)
+static int drive_open(DriveInfo *dinfo, int extra_flags)
{
- int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
+ int res = bdrv_open(dinfo->bdrv, dinfo->file,
+ dinfo->bdrv_flags | extra_flags, dinfo->drv);
if (res < 0) {
fprintf(stderr, "qemu: could not open disk image %s: %s\n",
@@ -156,7 +157,7 @@ int drives_reinit(void)
if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
int res;
bdrv_close(dinfo->bdrv);
- res = drive_open(dinfo);
+ res = drive_open(dinfo, BDRV_O_INVALIDATE);
if (res) {
fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
dinfo->file, res);
@@ -522,7 +523,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
dinfo->drv = drv;
dinfo->opened = 1;
- if (drive_open(dinfo) < 0) {
+ if (drive_open(dinfo, 0) < 0) {
goto error;
}
--
1.7.3.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
@ 2011-01-04 19:05 ` Blue Swirl
2011-01-05 15:52 ` Markus Armbruster
1 sibling, 0 replies; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:05 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> We need to invalidate the Read Cache on the destination, otherwise we
> have corruption. Easy way to reproduce it is:
>
> - create an qcow2 images
> - start qemu on destination of migration (qemu .... -incoming tcp:...)
> - start qemu on source of migration and do one install.
> - migrate at the end of install (when lot of disk IO has happened).
>
> Destination of migration has a local copy of the L1/L2 tables that existed
> at the beginning, before the install started. We have disk corruption at
> this point. The solution (for NFS) is to just re-open the file. Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> it is not necesary that source of migration close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
> blockdev.h | 6 ++++++
> migration.c | 6 ++++++
> 3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
> bdrv_delete(dinfo->bdrv);
> QTAILQ_REMOVE(&drives, dinfo, next);
> qemu_free(dinfo->id);
> + qemu_free(dinfo->file);
> qemu_free(dinfo);
> }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
> }
> }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> + int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> + if (res < 0) {
> + fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> + dinfo->file, strerror(errno));
> + }
> + return res;
> +}
> +
> +int drives_reinit(void)
> +{
> + DriveInfo *dinfo;
> +
> + QTAILQ_FOREACH(dinfo, &drives, next) {
> + if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> + int res;
> + bdrv_close(dinfo->bdrv);
> + res = drive_open(dinfo);
> + if (res) {
> + fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
> + dinfo->file, res);
> + return res;
> + }
> + }
> + }
> + return 0;
> +}
> +
> DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> {
> const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> const char *devaddr;
> DriveInfo *dinfo;
> int snapshot = 0;
> - int ret;
>
> *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
> bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> - ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> - if (ret < 0) {
> - fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> - file, strerror(-ret));
> + dinfo->file = qemu_strdup(file);
> + dinfo->bdrv_flags = bdrv_flags;
> + dinfo->drv = drv;
> + dinfo->opened = 1;
> +
> + if (drive_open(dinfo) < 0) {
> goto error;
> }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
> QemuOpts *opts;
> char serial[BLOCK_SERIAL_STRLEN + 1];
> QTAILQ_ENTRY(DriveInfo) next;
> + int opened;
> + int bdrv_flags;
> + char *file;
> + BlockDriver *drv;
> };
>
> #define MAX_IDE_DEVS 2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
> QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
> DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);
'extern' is useless for functions.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-04 19:06 ` Blue Swirl
2011-01-04 19:23 ` [Qemu-devel] " Juan Quintela
2011-01-07 8:38 ` [Qemu-devel] " Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Blue Swirl @ 2011-01-04 19:06 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
> Linux allows to invalidate block devices. This is needed for the incoming
> migration part.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> block.h | 2 ++
> block/raw-posix.c | 24 ++++++++++++++++++++++++
> blockdev.c | 9 +++++----
> 3 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/block.h b/block.h
> index f923add..5ac96a5 100644
> --- a/block.h
> +++ b/block.h
> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
> #define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
> #define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
> #define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
> +#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
> + re-read things from server */
>
> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6b72470..9439cf1 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -51,6 +51,7 @@
> #include <sys/param.h>
> #include <linux/cdrom.h>
> #include <linux/fd.h>
> +#include <linux/fs.h>
> #endif
> #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
> #include <signal.h>
> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
> s->fd = fd;
> s->aligned_buf = NULL;
>
> +#ifdef __linux__
> + if ((bdrv_flags & BDRV_O_INVALIDATE)) {
> + struct stat buf;
> + int res;
> +
> + res = fstat(fd, &buf);
> +
> + if (res < 0) {
> + return -errno;
> + }
> +
> + if (S_ISBLK(buf.st_mode)) {
> + printf("we are in a block device: %s\n", filename);
Leftover debugging?
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] Re: [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 19:06 ` Blue Swirl
@ 2011-01-04 19:23 ` Juan Quintela
0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-04 19:23 UTC (permalink / raw)
To: Blue Swirl; +Cc: qemu-devel
Blue Swirl <blauwirbel@gmail.com> wrote:
> On Tue, Jan 4, 2011 at 2:33 PM, Juan Quintela <quintela@redhat.com> wrote:
>> Linux allows to invalidate block devices. This is needed for the incoming
>> migration part.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>> block.h | 2 ++
>> block/raw-posix.c | 24 ++++++++++++++++++++++++
>> blockdev.c | 9 +++++----
>> 3 files changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.h b/block.h
>> index f923add..5ac96a5 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -34,6 +34,8 @@ typedef struct QEMUSnapshotInfo {
>> #define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
>> #define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
>> #define BDRV_O_NO_FLUSH 0x0200 /* disable flushing on this disk */
>> +#define BDRV_O_INVALIDATE 0x0400 /* invalidate buffer cache for this device.
>> + re-read things from server */
>>
>> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6b72470..9439cf1 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -51,6 +51,7 @@
>> #include <sys/param.h>
>> #include <linux/cdrom.h>
>> #include <linux/fd.h>
>> +#include <linux/fs.h>
>> #endif
>> #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
>> #include <signal.h>
>> @@ -168,6 +169,29 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> s->fd = fd;
>> s->aligned_buf = NULL;
>>
>> +#ifdef __linux__
>> + if ((bdrv_flags & BDRV_O_INVALIDATE)) {
>> + struct stat buf;
>> + int res;
>> +
>> + res = fstat(fd, &buf);
>> +
>> + if (res < 0) {
>> + return -errno;
>> + }
>> +
>> + if (S_ISBLK(buf.st_mode)) {
>> + printf("we are in a block device: %s\n", filename);
>
> Leftover debugging?
ouch, yes.
thanks for the spotting.
Later, Juan.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 4/5] Reopen files after migration
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
2011-01-04 19:05 ` Blue Swirl
@ 2011-01-05 15:52 ` Markus Armbruster
1 sibling, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2011-01-05 15:52 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
Juan Quintela <quintela@redhat.com> writes:
> We need to invalidate the Read Cache on the destination, otherwise we
> have corruption. Easy way to reproduce it is:
>
> - create an qcow2 images
> - start qemu on destination of migration (qemu .... -incoming tcp:...)
> - start qemu on source of migration and do one install.
> - migrate at the end of install (when lot of disk IO has happened).
>
> Destination of migration has a local copy of the L1/L2 tables that existed
> at the beginning, before the install started. We have disk corruption at
> this point. The solution (for NFS) is to just re-open the file. Operations
> have to happen in this order:
>
> - source of migration: flush()
> - destination: close(file);
> - destination: open(file)
>
> it is not necesary that source of migration close the file.
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
> blockdev.c | 42 +++++++++++++++++++++++++++++++++++++-----
> blockdev.h | 6 ++++++
> migration.c | 6 ++++++
> 3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index f9bb659..3f3df7a 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -116,6 +116,7 @@ void drive_uninit(DriveInfo *dinfo)
> bdrv_delete(dinfo->bdrv);
> QTAILQ_REMOVE(&drives, dinfo, next);
> qemu_free(dinfo->id);
> + qemu_free(dinfo->file);
> qemu_free(dinfo);
> }
>
> @@ -136,6 +137,36 @@ static int parse_block_error_action(const char *buf, int is_read)
> }
> }
>
> +static int drive_open(DriveInfo *dinfo)
> +{
> + int res = bdrv_open(dinfo->bdrv, dinfo->file, dinfo->bdrv_flags, dinfo->drv);
> +
> + if (res < 0) {
> + fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> + dinfo->file, strerror(errno));
> + }
> + return res;
> +}
> +
> +int drives_reinit(void)
> +{
> + DriveInfo *dinfo;
> +
> + QTAILQ_FOREACH(dinfo, &drives, next) {
> + if (dinfo->opened && !bdrv_is_read_only(dinfo->bdrv)) {
> + int res;
> + bdrv_close(dinfo->bdrv);
> + res = drive_open(dinfo);
> + if (res) {
> + fprintf(stderr, "qemu: re-open of %s failed wth error %d\n",
> + dinfo->file, res);
> + return res;
> + }
> + }
> + }
> + return 0;
> +}
> +
Shouldn't this live one layer down, in block.c?
We already have two reopens there, in bdrv_commit() and
bdrv_snapshot_goto().
I reduced DriveInfo to a mere helper object for drive_init() & friends
(see commit f8b6cc00, for instance). Real block work should not use
it.
> DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> {
> const char *buf;
> @@ -156,7 +187,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
> const char *devaddr;
> DriveInfo *dinfo;
> int snapshot = 0;
> - int ret;
>
> *fatal_error = 1;
>
> @@ -487,10 +517,12 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi, int *fatal_error)
>
> bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> - ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
> - if (ret < 0) {
> - fprintf(stderr, "qemu: could not open disk image %s: %s\n",
> - file, strerror(-ret));
> + dinfo->file = qemu_strdup(file);
> + dinfo->bdrv_flags = bdrv_flags;
> + dinfo->drv = drv;
> + dinfo->opened = 1;
> +
> + if (drive_open(dinfo) < 0) {
> goto error;
> }
>
> diff --git a/blockdev.h b/blockdev.h
> index 4536b5c..e009fee 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -29,6 +29,10 @@ struct DriveInfo {
> QemuOpts *opts;
> char serial[BLOCK_SERIAL_STRLEN + 1];
> QTAILQ_ENTRY(DriveInfo) next;
> + int opened;
Could you explain why you need this member?
> + int bdrv_flags;
Use BlockDriverState's open_flags, like the other reopens?
> + char *file;
BlockDriverState's filename?
> + BlockDriver *drv;
BlockDriverState's drv?
> };
>
> #define MAX_IDE_DEVS 2
> @@ -42,6 +46,8 @@ DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
> QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
> DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
>
> +extern int drives_reinit(void);
> +
> /* device-hotplug */
>
> DriveInfo *add_init_drive(const char *opts);
> diff --git a/migration.c b/migration.c
> index a8b65e5..fdff804 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -17,6 +17,7 @@
> #include "buffered_file.h"
> #include "sysemu.h"
> #include "block.h"
> +#include "blockdev.h"
> #include "qemu_socket.h"
> #include "block-migration.h"
> #include "qemu-objects.h"
> @@ -69,6 +70,11 @@ void process_incoming_migration(QEMUFile *f)
>
> incoming_expected = false;
>
> + if (drives_reinit() != 0) {
> + fprintf(stderr, "reopening of drives failed\n");
> + exit(1);
> + }
> +
> if (autostart)
> vm_start();
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-04 19:06 ` Blue Swirl
@ 2011-01-07 8:38 ` Christoph Hellwig
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2011-01-07 8:38 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel
On Tue, Jan 04, 2011 at 03:33:30PM +0100, Juan Quintela wrote:
> Linux allows to invalidate block devices. This is needed for the incoming
> migration part.
It's not. The only thing that makes migration on block devices safe is
using O_DIRECT, aka cache=none.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
` (4 preceding siblings ...)
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
@ 2011-01-10 10:15 ` Juan Quintela
5 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2011-01-10 10:15 UTC (permalink / raw)
To: qemu-devel
Juan Quintela <quintela@redhat.com> wrote:
> Hi
Nack myself :-(
After discussions with hch on irc, he "convinced" me that only way to
fix the problem is forcing O_DIRECT if migration is going to happen.
So, changing patchset to check that cache=none is used in all read/write
devices.
Later, Juan.
> This patch set creates infrastructure to invalidate buffers on
> migration target machine. The best way to see the problems is:
>
> # create a new qcow2 image
> qemu-img create -f qcow2 foo.img
> # start the destination host
> qemu .... path=foo.img....
> # start the source host with one installation
> qemu .... path=foo.img....
> # migrate after lots of disk writes
>
> Destination will have "read" the beggining of the blocks of the file
> (where the headers are). There are two bugs here:
> a- we need to re-read the image after migration, to have the new values
> (reopening fixes it)
> b- we need to be sure that we read the new blocks that are on the server,
> not the buffered ones locally from the start of the run.
> NFS: flush on source and close + open on target invalidates the cache
> Block devices: on linux, BLKFLSBUF invalidates all the buffers for that
> device. This fixes iSCSI & FiberChannel.
>
> I tested iSCSI & NFS. NFS patch have been on RHEL5 kvm forever (I just
> forget to send the patch upstream). Our NFS gurus & cluster gurus told
> that this is enough for linux to ensure consistence.
>
> Once there, I fixed a couple of minor bugs (the first 3 patches):
> - migration should exit with error 1 as everything else.
> - memory leak on drive_uninit.
> - fix cleanup on error on drive_init()
>
> Later, Juan.
>
> Juan Quintela (5):
> migration: exit with error code
> blockdev: don't leak id on removal
> blockdev: release resources in the error case
> Reopen files after migration
> drive_open: Add invalidate option for block devices
>
> block.h | 2 ++
> block/raw-posix.c | 24 ++++++++++++++++++++++++
> blockdev.c | 53 +++++++++++++++++++++++++++++++++++++++++++++--------
> blockdev.h | 6 ++++++
> migration.c | 8 +++++++-
> vl.c | 2 +-
> 6 files changed, 85 insertions(+), 10 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-01-10 10:16 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-04 14:33 [Qemu-devel] [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 1/5] migration: exit with error code Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 2/5] blockdev: don't leak id on removal Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 3/5] blockdev: release resources in the error case Juan Quintela
2011-01-04 14:33 ` [Qemu-devel] [PATCH 4/5] Reopen files after migration Juan Quintela
2011-01-04 19:05 ` Blue Swirl
2011-01-05 15:52 ` Markus Armbruster
2011-01-04 14:33 ` [Qemu-devel] [PATCH 5/5] drive_open: Add invalidate option for block devices Juan Quintela
2011-01-04 19:06 ` Blue Swirl
2011-01-04 19:23 ` [Qemu-devel] " Juan Quintela
2011-01-07 8:38 ` [Qemu-devel] " Christoph Hellwig
2011-01-10 10:15 ` [Qemu-devel] Re: [PATCH 0/5] Fix migration with NFS & iscsi/Fiber channel Juan Quintela
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.