* [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
@ 2013-11-21 9:11 Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 01/17] QAPI: introduce migration capability unix_page_flipping Lei Li
` (17 more replies)
0 siblings, 18 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
This patch series introduces a side channel for RAM: a pipe whose
file descriptor is passed via SCM_RIGHTS during migration over a
Unix domain socket.
The side channel is used for page flipping with vmsplice, which is
the internal mechanism for the localhost migration that we are
trying to add to QEMU. Background information and the previous
patch series, for reference:
Localhost migration
http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg02916.html
migration: Introduce side channel for RAM
http://lists.gnu.org/archive/html/qemu-devel/2013-09/msg04043.html
I have picked patches from the localhost migration series and rebased
them on the side channel series; the result is a complete series that
passes basic testing.
Please let me know if anything needs to be fixed or improved.
Your suggestions and comments are very welcome, and thanks to Paolo
for his continued review and useful suggestions.
Changes since V2:
Address comments from Paolo including:
- Doc improvement for QAPI.
- Use callback get_buffer as the only one receiver.
- Rename the new RunState flipping-migrate to memory-stale, and
add transition from 'prelaunch' to 'memory-stale'.
- Other minor fixes.
Changes since V1:
Address suggestions from Paolo Bonzini including:
- Use Unix socket QEMUFile as basis of code and adjust the way
of overriding RDMA hooks.
- Involve the vmsplice for page flipping.
- Add new RunState RUN_STATE_FLIPPING_MIGRATE and add it to
runstate_needs_reset() for the adjustment of the current
migration process with page flipping.
Lei Li (17):
QAPI: introduce migration capability unix_page_flipping
migration: add migrate_unix_page_flipping()
qmp-command.hx: add missing docs for migration capabilites
migration-local: add QEMUFileLocal with socket based QEMUFile
migration-local: introduce qemu_fopen_socket_local()
migration-local: add send_pipefd()
migration-local: override before_ram_iterate to send pipefd
add unix_msgfd_lookup() to callback get_buffer
save_page: replace block_offset with a MemoryRegion
migration-local: override save_page for page transmit
savevm: adjust ram_control_save_page with page flipping
migration-local: override hook_ram_load
migration-unix: replace qemu_fopen_socket with qemu_fopen_socket_local
add new RunState RUN_STATE_MEMORY_STALE
migration-unix: page flipping support on unix outgoing
migration: adjust migration_thread() process for unix_page_flipping
hmp: better format for info migrate_capabilities
Makefile.target | 1 +
arch_init.c | 4 +-
migration-local.c | 512 ++++++++++++++++++++++++++++++++++++++++++
hmp.c | 5 +-
include/migration/migration.h | 3 +
include/migration/qemu-file.h | 2 +
migration-unix.c | 27 ++-
migration-rdma.c | 4 +-
migration.c | 18 +-
qapi-schema.json | 18 +-
qmp-commands.hx | 8 +
savevm.c | 21 +-
vl.c | 12 +-
13 files changed, 617 insertions(+), 27 deletions(-)
create mode 100644 migration-local.c
* [Qemu-devel] [PATCH 01/17] QAPI: introduce migration capability unix_page_flipping
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 02/17] migration: add migrate_unix_page_flipping() Lei Li
` (16 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Introduce unix_page_flipping to MigrationCapability for
localhost migration.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
qapi-schema.json | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/qapi-schema.json b/qapi-schema.json
index 83fa485..b290a0f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -685,10 +685,18 @@
# @auto-converge: If enabled, QEMU will automatically throttle down the guest
# to speed up convergence of RAM migration. (since 1.6)
#
+# @unix-page-flipping: If enabled, QEMU can optimize migration when the
+# destination is a QEMU process that runs on the same host as
+# the source (as is the case for live upgrade). If the migration
+# transport is a Unix socket, QEMU will flip RAM pages directly to
+# the destination, so that memory is only allocated twice for the
+# source and destination processes. Disabled by default. (since 1.8)
+#
# Since: 1.2
##
{ 'enum': 'MigrationCapability',
- 'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks'] }
+ 'data': ['xbzrle', 'x-rdma-pin-all', 'auto-converge', 'zero-blocks',
+ 'unix-page-flipping'] }
##
# @MigrationCapabilityStatus
--
1.7.7.6
* [Qemu-devel] [PATCH 02/17] migration: add migrate_unix_page_flipping()
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 01/17] QAPI: introduce migration capability unix_page_flipping Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 03/17] qmp-command.hx: add missing docs for migration capabilites Lei Li
` (15 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Add migrate_unix_page_flipping() to check if
MIGRATION_CAPABILITY_UNIX_PAGE_FLIPPING is enabled.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
include/migration/migration.h | 3 +++
migration.c | 9 +++++++++
2 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 140e6b4..7e5d01a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -131,10 +131,13 @@ void migrate_add_blocker(Error *reason);
void migrate_del_blocker(Error *reason);
bool migrate_rdma_pin_all(void);
+
bool migrate_zero_blocks(void);
bool migrate_auto_converge(void);
+bool migrate_unix_page_flipping(void);
+
int xbzrle_encode_buffer(uint8_t *old_buf, uint8_t *new_buf, int slen,
uint8_t *dst, int dlen);
int xbzrle_decode_buffer(uint8_t *src, int slen, uint8_t *dst, int dlen);
diff --git a/migration.c b/migration.c
index 2b1ab20..4ac466b 100644
--- a/migration.c
+++ b/migration.c
@@ -541,6 +541,15 @@ int64_t migrate_xbzrle_cache_size(void)
return s->xbzrle_cache_size;
}
+bool migrate_unix_page_flipping(void)
+{
+ MigrationState *s;
+
+ s = migrate_get_current();
+
+ return s->enabled_capabilities[MIGRATION_CAPABILITY_UNIX_PAGE_FLIPPING];
+}
+
/* migration thread support */
static void *migration_thread(void *opaque)
--
1.7.7.6
* [Qemu-devel] [PATCH 03/17] qmp-command.hx: add missing docs for migration capabilites
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 01/17] QAPI: introduce migration capability unix_page_flipping Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 02/17] migration: add migrate_unix_page_flipping() Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 04/17] migration-local: add QEMUFileLocal with socket based QEMUFile Lei Li
` (14 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
qmp-commands.hx | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/qmp-commands.hx b/qmp-commands.hx
index fba15cd..dcec433 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2898,6 +2898,10 @@ migrate-set-capabilities
Enable/Disable migration capabilities
- "xbzrle": XBZRLE support
+- "x-rdma-pin-all": Pin all pages during RDMA support
+- "zero-blocks": Compress zero blocks during block migration
+- "auto-converge": Block VCPU to help convergence of migration
+- "unix-page-flipping": Page flipping for live QEMU upgrade
Arguments:
@@ -2922,6 +2926,10 @@ Query current migration capabilities
- "capabilities": migration capabilities state
- "xbzrle" : XBZRLE state (json-bool)
+ - "x-rdma-pin-all": RDMA state (json-bool)
+ - "zero-blocks": zero-blocks state (json-bool)
+ - "auto-converge": Auto converge state (json-bool)
+ - "unix-page-flipping": Page flipping state (json-bool)
Arguments:
--
1.7.7.6
* [Qemu-devel] [PATCH 04/17] migration-local: add QEMUFileLocal with socket based QEMUFile
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (2 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 03/17] qmp-command.hx: add missing docs for migration capabilites Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 05/17] migration-local: introduce qemu_fopen_socket_local() Lei Li
` (13 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
This patch adds QEMUFileLocal as a copy of the socket-based QEMUFile. It
will be used as the basis for Unix socket protocol migration and page
flipping migration.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
Makefile.target | 1 +
migration-local.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 122 insertions(+), 0 deletions(-)
create mode 100644 migration-local.c
diff --git a/Makefile.target b/Makefile.target
index af6ac7e..aa09960 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -117,6 +117,7 @@ obj-$(CONFIG_KVM) += kvm-all.o
obj-y += memory.o savevm.o cputlb.o
obj-y += memory_mapping.o
obj-y += dump.o
+obj-y += migration-local.o
LIBS+=$(libs_softmmu)
# xen support
diff --git a/migration-local.c b/migration-local.c
new file mode 100644
index 0000000..8b9e10e
--- /dev/null
+++ b/migration-local.c
@@ -0,0 +1,121 @@
+/*
+ * QEMU localhost migration with page flipping
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ * Lei Li <lilei@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "config-host.h"
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "exec/cpu-common.h"
+#include "config.h"
+#include "exec/cpu-all.h"
+#include "exec/memory.h"
+#include "exec/memory-internal.h"
+#include "monitor/monitor.h"
+#include "migration/qemu-file.h"
+#include "qemu/iov.h"
+#include "sysemu/arch_init.h"
+#include "sysemu/sysemu.h"
+#include "block/block.h"
+#include "qemu/sockets.h"
+#include "migration/block.h"
+#include "qemu/thread.h"
+#include "qmp-commands.h"
+#include "trace.h"
+#include "qemu/osdep.h"
+
+//#define DEBUG_MIGRATION_LOCAL
+
+#ifdef DEBUG_MIGRATION_LOCAL
+#define DPRINTF(fmt, ...) \
+ do { printf("migration-local: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+ do { } while (0)
+#endif
+
+
+typedef struct QEMUFileLocal {
+ QEMUFile *file;
+ int sockfd;
+ int pipefd[2];
+ bool unix_page_flipping;
+} QEMUFileLocal;
+
+static int qemu_local_get_sockfd(void *opaque)
+{
+ QEMUFileLocal *s = opaque;
+
+ return s->sockfd;
+}
+
+static int qemu_local_get_buffer(void *opaque, uint8_t *buf,
+ int64_t pos, int size)
+{
+ QEMUFileLocal *s = opaque;
+ ssize_t len;
+
+ for (;;) {
+ len = qemu_recv(s->sockfd, buf, size, 0);
+ if (len != -1) {
+ break;
+ }
+
+ if (socket_error() == EAGAIN) {
+ yield_until_fd_readable(s->sockfd);
+ } else if (socket_error() != EINTR) {
+ break;
+ }
+ }
+
+ if (len == -1) {
+ len = -socket_error();
+ }
+
+ return len;
+}
+
+static ssize_t qemu_local_writev_buffer(void *opaque, struct iovec *iov,
+ int iovcnt, int64_t pos)
+{
+ QEMUFileLocal *s = opaque;
+ ssize_t len;
+ ssize_t size = iov_size(iov, iovcnt);
+
+ len = iov_send(s->sockfd, iov, iovcnt, 0, size);
+ if (len < size) {
+ len = -socket_error();
+ }
+
+ return len;
+}
+
+static int qemu_local_close(void *opaque)
+{
+ QEMUFileLocal *s = opaque;
+
+ closesocket(s->sockfd);
+ g_free(s);
+
+ return 0;
+}
+
+static const QEMUFileOps pipe_read_ops = {
+ .get_fd = qemu_local_get_sockfd,
+ .get_buffer = qemu_local_get_buffer,
+ .close = qemu_local_close,
+};
+
+static const QEMUFileOps pipe_write_ops = {
+ .get_fd = qemu_local_get_sockfd,
+ .writev_buffer = qemu_local_writev_buffer,
+ .close = qemu_local_close,
+};
--
1.7.7.6
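The retry loop in qemu_local_get_buffer() above can be illustrated outside QEMU with a minimal sketch. The names sketch_recv_retry() and sketch_recv_selfcheck() are hypothetical, and poll() stands in for yield_until_fd_readable(), which in QEMU yields the current coroutine rather than blocking; this is an assumption made for the sake of a standalone example, not the patch's behaviour.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Receive up to 'size' bytes, retrying on EINTR and waiting for
 * readability on EAGAIN. Returns the byte count, or -errno on error,
 * following the convention of the get_buffer callback.
 */
static ssize_t sketch_recv_retry(int fd, void *buf, size_t size)
{
    for (;;) {
        ssize_t len = recv(fd, buf, size, 0);
        if (len >= 0) {
            return len;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Stand-in for yield_until_fd_readable(): block until readable. */
            struct pollfd p = { .fd = fd, .events = POLLIN };
            poll(&p, 1, -1);
        } else if (errno != EINTR) {
            return -errno;
        }
        /* EINTR, or the socket became readable: try again. */
    }
}

/* Hypothetical self-check: push five bytes through a socketpair. */
static int sketch_recv_selfcheck(void)
{
    int sv[2];
    char buf[8];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (send(sv[0], "hello", 5, 0) == 5 &&
        sketch_recv_retry(sv[1], buf, sizeof(buf)) == 5 &&
        memcmp(buf, "hello", 5) == 0) {
        rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Returning -errno keeps the error code in the return value, which is what the callback in the patch does with -socket_error().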
* [Qemu-devel] [PATCH 05/17] migration-local: introduce qemu_fopen_socket_local()
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (3 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 04/17] migration-local: add QEMUFileLocal with socket based QEMUFile Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd() Lei Li
` (12 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Add qemu_fopen_socket_local() to open the QEMUFileLocal introduced
earlier. In write mode it creates a pipe if unix_page_flipping is
enabled. Also adjust qemu_local_close() to close the pipe.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
include/migration/qemu-file.h | 2 +
migration-local.c | 46 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 0f757fb..f9b104a 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -99,6 +99,8 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode);
QEMUFile *qemu_fdopen(int fd, const char *mode);
QEMUFile *qemu_fopen_socket(int fd, const char *mode);
QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
+QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode);
+
int qemu_get_fd(QEMUFile *f);
int qemu_fclose(QEMUFile *f);
int64_t qemu_ftell(QEMUFile *f);
diff --git a/migration-local.c b/migration-local.c
index 8b9e10e..28da05b 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -103,6 +103,12 @@ static int qemu_local_close(void *opaque)
QEMUFileLocal *s = opaque;
closesocket(s->sockfd);
+
+ if (s->unix_page_flipping) {
+ close(s->pipefd[0]);
+ close(s->pipefd[1]);
+ }
+
g_free(s);
return 0;
@@ -119,3 +125,43 @@ static const QEMUFileOps pipe_write_ops = {
.writev_buffer = qemu_local_writev_buffer,
.close = qemu_local_close,
};
+
+QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
+{
+ QEMUFileLocal *s;
+ int pipefd[2];
+
+ if (qemu_file_mode_is_not_valid(mode)) {
+ return NULL;
+ }
+
+ s = g_malloc0(sizeof(QEMUFileLocal));
+ s->sockfd = sockfd;
+
+ if (migrate_unix_page_flipping()) {
+ s->unix_page_flipping = 1;
+ }
+
+ if (mode[0] == 'w') {
+ if (s->unix_page_flipping) {
+ if (pipe(pipefd) < 0) {
+ fprintf(stderr, "failed to create PIPE\n");
+ goto fail;
+ }
+
+ s->pipefd[0] = pipefd[0];
+ s->pipefd[1] = pipefd[1];
+ }
+
+ qemu_set_block(s->sockfd);
+ s->file = qemu_fopen_ops(s, &pipe_write_ops);
+ } else {
+ s->file = qemu_fopen_ops(s, &pipe_read_ops);
+ }
+
+ return s->file;
+
+fail:
+ g_free(s);
+ return NULL;
+}
--
1.7.7.6
* [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd()
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (4 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 05/17] migration-local: introduce qemu_fopen_socket_local() Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 11:26 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 07/17] migration-local: override before_ram_iterate to send pipefd Lei Li
` (11 subsequent siblings)
17 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
This patch adds send_pipefd() to pass the pipe file descriptor
to the destination process.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-local.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/migration-local.c b/migration-local.c
index 28da05b..f4265a1 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -165,3 +165,56 @@ fail:
g_free(s);
return NULL;
}
+
+
+/*
+ * Pass a pipe file descriptor to another process.
+ *
+ * Return negative value If pipefd < 0. Return 0 on
+ * success.
+ *
+ */
+static int send_pipefd(int sockfd, int pipefd)
+{
+ struct msghdr msg;
+ struct iovec iov[1];
+ ssize_t ret;
+
+ union {
+ struct cmsghdr cm;
+ char control[CMSG_SPACE(sizeof(int))];
+ } control_un;
+ struct cmsghdr *cmptr;
+ char req[1] = { 0x01 };
+
+ if (pipefd < 0) {
+ msg.msg_control = NULL;
+ msg.msg_controllen = 0;
+ /* Negative status means error */
+ req[0] = pipefd;
+ } else {
+ msg.msg_control = control_un.control;
+ msg.msg_controllen = sizeof(control_un.control);
+
+ cmptr = CMSG_FIRSTHDR(&msg);
+ cmptr->cmsg_len = CMSG_LEN(sizeof(int));
+ cmptr->cmsg_level = SOL_SOCKET;
+ cmptr->cmsg_type = SCM_RIGHTS;
+ *((int *) CMSG_DATA(cmptr)) = pipefd;
+
+ msg.msg_name = NULL;
+ msg.msg_namelen = 0;
+
+ iov[0].iov_base = req;
+ iov[0].iov_len = sizeof(req);
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+ }
+
+ ret = sendmsg(sockfd, &msg, 0);
+ if (ret <= 0) {
+ DPRINTF("sendmsg error: %s\n", strerror(errno));
+ }
+
+ return ret;
+}
--
1.7.7.6
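For readers unfamiliar with SCM_RIGHTS, the descriptor passing that send_pipefd() performs can be sketched as a standalone round trip. The helper names below (sketch_send_fd, sketch_recv_fd, sketch_fd_roundtrip) are hypothetical and not part of the series; unlike the patch, the sketch zeroes the msghdr up front and returns 0 on success rather than the sendmsg() byte count.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union sketch_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send 'fd' with a one-byte payload. Returns 0 on success, -1 on error. */
static int sketch_send_fd(int sockfd, int fd)
{
    char marker = 0x01;
    struct iovec iov = { .iov_base = &marker, .iov_len = 1 };
    union sketch_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sockfd, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new fd, or -1 if none arrived. */
static int sketch_recv_fd(int sockfd)
{
    char marker;
    struct iovec iov = { .iov_base = &marker, .iov_len = 1 };
    union sketch_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sockfd, &msg, 0) <= 0) {
        return -1;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            return fd;
        }
    }
    return -1;
}

/* Pass a pipe's read end over a socketpair and read through the copy. */
static int sketch_fd_roundtrip(void)
{
    int sv[2], pfd[2], got, rc = -1;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (pipe(pfd) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (sketch_send_fd(sv[0], pfd[0]) == 0) {
        got = sketch_recv_fd(sv[1]);
        if (got >= 0) {
            if (write(pfd[1], "x", 1) == 1 &&
                read(got, &c, 1) == 1 && c == 'x') {
                rc = 0;
            }
            close(got);
        }
    }
    close(pfd[0]);
    close(pfd[1]);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The one-byte payload matters: on a stream socket, ancillary data has to accompany at least one byte of real data to be delivered reliably.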
* [Qemu-devel] [PATCH 07/17] migration-local: override before_ram_iterate to send pipefd
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (5 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd() Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer Lei Li
` (10 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Override before_ram_iterate to send the pipe fd. It calls qemu_fflush()
on the stream QEMUFile and then sends the fd in the RAM_CONTROL_SETUP stage.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-local.c | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/migration-local.c b/migration-local.c
index f4265a1..e028beb 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -114,6 +114,30 @@ static int qemu_local_close(void *opaque)
return 0;
}
+static int send_pipefd(int sockfd, int pipefd);
+
+static int qemu_local_send_pipefd(QEMUFile *f, void *opaque,
+ uint64_t flags)
+{
+ QEMUFileLocal *s = opaque;
+ int ret;
+
+ if (s->unix_page_flipping) {
+ /* Avoid sending pipe fd again in ram_save_complete() stage */
+ if (flags == RAM_CONTROL_SETUP) {
+ qemu_fflush(f);
+ ret = send_pipefd(s->sockfd, s->pipefd[0]);
+ if (ret < 0) {
+ fprintf(stderr, "failed to pass PIPE\n");
+ return ret;
+ }
+ DPRINTF("PIPE fd was sent\n");
+ }
+ }
+
+ return 0;
+}
+
static const QEMUFileOps pipe_read_ops = {
.get_fd = qemu_local_get_sockfd,
.get_buffer = qemu_local_get_buffer,
@@ -124,6 +148,7 @@ static const QEMUFileOps pipe_write_ops = {
.get_fd = qemu_local_get_sockfd,
.writev_buffer = qemu_local_writev_buffer,
.close = qemu_local_close,
+ .before_ram_iterate = qemu_local_send_pipefd,
};
QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
--
1.7.7.6
* [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (6 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 07/17] migration-local: override before_ram_iterate to send pipefd Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 11:30 ` Lei Li
2013-11-26 11:31 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 09/17] save_page: replace block_offset with a MemoryRegion Lei Li
` (9 subsequent siblings)
17 siblings, 2 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
The control message that carries the pipe file descriptor must be
received by recvmsg(). With two receiving callbacks, it could instead be
consumed as stream data by qemu_recv(). So this patch adds
unix_msgfd_lookup() to the get_buffer callback, making it the only
receiver, where the pipe file descriptor is caught.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-local.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 65 insertions(+), 3 deletions(-)
diff --git a/migration-local.c b/migration-local.c
index e028beb..0f0896b 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -50,6 +50,8 @@ typedef struct QEMUFileLocal {
bool unix_page_flipping;
} QEMUFileLocal;
+static bool pipefd_passed;
+
static int qemu_local_get_sockfd(void *opaque)
{
QEMUFileLocal *s = opaque;
@@ -57,16 +59,76 @@ static int qemu_local_get_sockfd(void *opaque)
return s->sockfd;
}
+static int unix_msgfd_lookup(void *opaque, struct msghdr *msg)
+{
+ QEMUFileLocal *s = opaque;
+ struct cmsghdr *cmsg;
+ bool found = false;
+
+ for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+ if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
+ cmsg->cmsg_level != SOL_SOCKET ||
+ cmsg->cmsg_type != SCM_RIGHTS)
+ continue;
+
+ /* PIPE file descriptor to be received */
+ s->pipefd[0] = *((int *)CMSG_DATA(cmsg));
+ }
+
+ if (s->pipefd[0] <= 0) {
+ fprintf(stderr, "no pipe fd can be received\n");
+ return found;
+ }
+
+ DPRINTF("pipefd successfully received\n");
+ return s->pipefd[0];
+}
+
static int qemu_local_get_buffer(void *opaque, uint8_t *buf,
int64_t pos, int size)
{
QEMUFileLocal *s = opaque;
ssize_t len;
+ struct msghdr msg = { NULL, };
+ struct iovec iov[1];
+ union {
+ struct cmsghdr cmsg;
+ char control[CMSG_SPACE(sizeof(int))];
+ } msg_control;
+
+ iov[0].iov_base = buf;
+ iov[0].iov_len = size;
+
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = &msg_control;
+ msg.msg_controllen = sizeof(msg_control);
for (;;) {
- len = qemu_recv(s->sockfd, buf, size, 0);
- if (len != -1) {
- break;
+ if (!pipefd_passed) {
+ /*
+ * recvmsg is called here to catch the control message for
+ * the exchange of PIPE file descriptor until it is received.
+ */
+ len = recvmsg(s->sockfd, &msg, 0);
+ if (len != -1) {
+ if (unix_msgfd_lookup(s, &msg) > 0) {
+ pipefd_passed = 1;
+ /*
+ * Do not count one byte taken by the PIPE file
+ * descriptor.
+ */
+ len--;
+ } else {
+ len = -1;
+ }
+ break;
+ }
+ } else {
+ len = qemu_recv(s->sockfd, buf, size, 0);
+ if (len != -1) {
+ break;
+ }
}
if (socket_error() == EAGAIN) {
--
1.7.7.6
* [Qemu-devel] [PATCH 09/17] save_page: replace block_offset with a MemoryRegion
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (7 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit Lei Li
` (8 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
This patch exports the MemoryRegion to the save_page hook, replacing
the ram_addr_t block_offset argument with a MemoryRegion, as suggested
by Paolo Bonzini.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
arch_init.c | 4 ++--
include/migration/migration.h | 2 +-
include/migration/qemu-file.h | 8 ++++----
migration-rdma.c | 4 ++--
savevm.c | 8 ++++----
5 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch_init.c b/arch_init.c
index e0acbc5..daaa519 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -485,8 +485,8 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
/* In doubt sent page as normal */
bytes_sent = -1;
- ret = ram_control_save_page(f, block->offset,
- offset, TARGET_PAGE_SIZE, &bytes_sent);
+ ret = ram_control_save_page(f, mr, offset, TARGET_PAGE_SIZE,
+ &bytes_sent);
if (ret != RAM_SAVE_CONTROL_NOT_SUPP) {
if (ret != RAM_SAVE_CONTROL_DELAYED) {
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 7e5d01a..ca852a8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -161,7 +161,7 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags);
#define RAM_SAVE_CONTROL_NOT_SUPP -1000
#define RAM_SAVE_CONTROL_DELAYED -2000
-size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
+size_t ram_control_save_page(QEMUFile *f, MemoryRegion *mr,
ram_addr_t offset, size_t size,
int *bytes_sent);
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index f9b104a..6646e89 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -77,10 +77,10 @@ typedef int (QEMURamHookFunc)(QEMUFile *f, void *opaque, uint64_t flags);
* is saved (such as RDMA, for example.)
*/
typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
- ram_addr_t block_offset,
- ram_addr_t offset,
- size_t size,
- int *bytes_sent);
+ MemoryRegion *mr,
+ ram_addr_t offset,
+ size_t size,
+ int *bytes_sent);
typedef struct QEMUFileOps {
QEMUFilePutBufferFunc *put_buffer;
diff --git a/migration-rdma.c b/migration-rdma.c
index f94f3b4..ae04de4 100644
--- a/migration-rdma.c
+++ b/migration-rdma.c
@@ -2699,7 +2699,7 @@ static int qemu_rdma_close(void *opaque)
* the protocol because most transfers are sent asynchronously.
*/
static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
- ram_addr_t block_offset, ram_addr_t offset,
+ MemoryRegion *mr, ram_addr_t offset,
size_t size, int *bytes_sent)
{
QEMUFileRDMA *rfile = opaque;
@@ -2716,7 +2716,7 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void *opaque,
* is full, or the page doen't belong to the current chunk,
* an actual RDMA write will occur and a new chunk will be formed.
*/
- ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
+ ret = qemu_rdma_write(f, rdma, mr->ram_addr, offset, size);
if (ret < 0) {
fprintf(stderr, "rdma migration: write error! %d\n", ret);
goto err;
diff --git a/savevm.c b/savevm.c
index 2f631d4..3ee256e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -661,12 +661,12 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags)
}
}
-size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
- ram_addr_t offset, size_t size, int *bytes_sent)
+size_t ram_control_save_page(QEMUFile *f, MemoryRegion *mr, ram_addr_t offset,
+ size_t size, int *bytes_sent)
{
if (f->ops->save_page) {
- int ret = f->ops->save_page(f, f->opaque, block_offset,
- offset, size, bytes_sent);
+ int ret = f->ops->save_page(f, f->opaque, mr, offset,
+ size, bytes_sent);
if (ret != RAM_SAVE_CONTROL_DELAYED) {
if (bytes_sent && *bytes_sent > 0) {
--
1.7.7.6
* [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (8 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 09/17] save_page: replace block_offset with a MemoryRegion Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 11:22 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 11/17] savevm: adjust ram_control_save_page for page flipping Lei Li
` (7 subsequent siblings)
17 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
This patch implements the save_page callback for the outgoing side
of page flipping. It writes the address of the page to the Unix
socket and flips the page data onto the pipe with vmsplice(). Every
page address is preceded by the header flag RAM_SAVE_FLAG_HOOK,
which triggers the load hook on the incoming side to receive it.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-local.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/migration-local.c b/migration-local.c
index 0f0896b..14207e9 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void *opaque,
return 0;
}
+static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,
+ MemoryRegion *mr, ram_addr_t offset,
+ size_t size, int *bytes_sent)
+{
+ QEMUFileLocal *s = opaque;
+ ram_addr_t current_addr = mr->ram_addr + offset;
+ void *ram_addr;
+ ssize_t ret;
+
+ if (s->unix_page_flipping) {
+ qemu_fflush(s->file);
+ qemu_put_be64(s->file, RAM_SAVE_FLAG_HOOK);
+
+ /* Write page address to unix socket */
+ qemu_put_be64(s->file, current_addr);
+
+ ram_addr = memory_region_get_ram_ptr(mr) + offset;
+
+ /* vmsplice page data to pipe */
+ struct iovec iov = {
+ .iov_base = ram_addr,
+ .iov_len = size,
+ };
+
+ /*
+ * The flag SPLICE_F_MOVE, previously unused, is introduced in the
+ * kernel for the page flipping feature in QEMU; it moves pages
+ * rather than copying them.
+ *
+ * If a move is not possible, the kernel transparently falls
+ * back to copying data.
+ *
+ * For older kernels, SPLICE_F_MOVE is ignored and a copy
+ * occurs.
+ */
+ ret = vmsplice(s->pipefd[1], &iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE);
+ if (ret == -1) {
+ if (errno != EAGAIN && errno != EINTR) {
+ fprintf(stderr, "vmsplice save error: %s\n", strerror(errno));
+ return ret;
+ }
+ } else {
+ if (bytes_sent) {
+ *bytes_sent = 1;
+ }
+ DPRINTF("block_offset: %lu, offset: %lu\n", block_offset, offset);
+ return 0;
+ }
+ }
+
+ return RAM_SAVE_CONTROL_NOT_SUPP;
+}
+
static const QEMUFileOps pipe_read_ops = {
.get_fd = qemu_local_get_sockfd,
.get_buffer = qemu_local_get_buffer,
@@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = {
.writev_buffer = qemu_local_writev_buffer,
.close = qemu_local_close,
.before_ram_iterate = qemu_local_send_pipefd,
+ .save_page = qemu_local_save_ram
};
QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
--
1.7.7.6
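The vmsplice() step in qemu_local_save_ram() can be tried in isolation with a small Linux-only sketch. The function name sketch_vmsplice_page() is hypothetical. The sketch passes only SPLICE_F_GIFT, since the vmsplice(2) man page documents SPLICE_F_MOVE as unused for this call on mainline kernels, and it reads the page back through the same pipe with read(), which copies the data; the receiving side in this series is a separate patch and is not reproduced here.

```c
#define _GNU_SOURCE   /* vmsplice() and SPLICE_F_GIFT are Linux extensions */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Gift one page-aligned page to a pipe with vmsplice() and read it back.
 * Returns 0 if the bytes read match what was written, -1 otherwise.
 */
static int sketch_vmsplice_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int pipefd[2];
    unsigned char *region, *src, *dst;
    struct iovec iov;
    size_t got = 0;
    int rc = -1;

    if (page <= 0 || pipe(pipefd) < 0) {
        return -1;
    }
    /* mmap returns page-aligned memory, as SPLICE_F_GIFT requires. */
    region = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        goto out_pipe;
    }
    src = region;
    dst = region + page;
    memset(src, 0xAB, (size_t)page);

    iov.iov_base = src;
    iov.iov_len = (size_t)page;
    /* One page fits in the default pipe capacity, so this does not block. */
    if (vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT) != (ssize_t)page) {
        goto out_map;
    }
    /* After gifting, src must not be modified; it is only read below. */
    while (got < (size_t)page) {
        ssize_t n = read(pipefd[0], dst + got, (size_t)page - got);
        if (n <= 0) {
            goto out_map;
        }
        got += (size_t)n;
    }
    rc = memcmp(src, dst, (size_t)page) == 0 ? 0 : -1;

out_map:
    munmap(region, 2 * (size_t)page);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

With SPLICE_F_GIFT, both the buffer address and the length need to be page-aligned, which is why the sketch uses mmap() rather than malloc().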
* [Qemu-devel] [PATCH 11/17] savevm: adjust ram_control_save_page for page flipping
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (9 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load Lei Li
` (6 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
The save_page callback is always installed by
qemu_fopen_socket_local(), and without unix_page_flipping it
returns RAM_SAVE_CONTROL_NOT_SUPP. With the current logic this
leads to a spurious qemu_file_set_error(), so this patch adds
RAM_SAVE_CONTROL_NOT_SUPP to the check.
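The adjusted check can be sketched in isolation as follows. This is a
minimal illustration, not the savevm.c code: the numeric values of the
two constants and the helper name classify_save_page() are assumptions
made for this example; only the constant names come from the patch.

```c
/* Placeholder values, assumed for this sketch only. */
#define RAM_SAVE_CONTROL_NOT_SUPP (-1000)
#define RAM_SAVE_CONTROL_DELAYED  (-2000)

enum page_result {
    PAGE_RESULT_SKIP,     /* nothing to account for */
    PAGE_RESULT_ADVANCE,  /* bytes were sent: update the stream position */
    PAGE_RESULT_ERROR     /* real failure: record an error on the file */
};

/*
 * Hypothetical helper mirroring the branch structure of the patched
 * check: both sentinel return codes bypass the accounting and the
 * error path entirely.
 */
static enum page_result classify_save_page(int ret, unsigned long bytes_sent)
{
    if (ret == RAM_SAVE_CONTROL_DELAYED ||
        ret == RAM_SAVE_CONTROL_NOT_SUPP) {
        return PAGE_RESULT_SKIP;
    }
    if (bytes_sent > 0) {
        return PAGE_RESULT_ADVANCE;
    }
    if (ret < 0) {
        return PAGE_RESULT_ERROR;
    }
    return PAGE_RESULT_SKIP;
}
```

Before the change, a RAM_SAVE_CONTROL_NOT_SUPP return with no bytes
sent would have fallen into the "ret < 0" branch and been reported as
an error.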
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
savevm.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/savevm.c b/savevm.c
index 3ee256e..4576145 100644
--- a/savevm.c
+++ b/savevm.c
@@ -668,7 +668,8 @@ size_t ram_control_save_page(QEMUFile *f, MemoryRegion *mr, ram_addr_t offset,
int ret = f->ops->save_page(f, f->opaque, mr, offset,
size, bytes_sent);
- if (ret != RAM_SAVE_CONTROL_DELAYED) {
+ if (ret != RAM_SAVE_CONTROL_DELAYED &&
+ ret != RAM_SAVE_CONTROL_NOT_SUPP) {
if (bytes_sent && *bytes_sent > 0) {
qemu_update_position(f, *bytes_sent);
} else if (ret < 0) {
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (10 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 11/17] savevm: adjust ram_control_save_page for page flipping Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 11:25 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 13/17] migration-unix: replace qemu_fopen_socket with qemu_fopen_socket_local Lei Li
` (5 subsequent siblings)
17 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Override hook_ram_load to read the page address sent by the
source process and use it to vmsplice the page data out of the
pipe. The pipe file descriptor itself is received by the
get_buffer callback.
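The pipe transfer itself can be tried outside QEMU with a short,
Linux-only sketch. It is an illustration under stated assumptions, not
the migration-local.c code: flip_one_page() is a hypothetical name, both
pipe ends live in one process, and SPLICE_F_MOVE is left out, so on a
mainline kernel the receiving vmsplice() copies the data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send one page through a pipe with vmsplice() and receive it again.
 * Returns 0 when the received page matches, or a negative errno value.
 */
static int flip_one_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int pipefd[2];
    void *src = NULL;
    void *dst = NULL;
    struct iovec iov;
    ssize_t n;
    int rc = 0;

    if (page <= 0) {
        return -EINVAL;
    }
    /* SPLICE_F_GIFT requires page-aligned memory and length. */
    if (posix_memalign(&src, (size_t)page, (size_t)page) != 0) {
        return -ENOMEM;
    }
    if (posix_memalign(&dst, (size_t)page, (size_t)page) != 0) {
        free(src);
        return -ENOMEM;
    }
    memset(src, 0xA5, (size_t)page);
    memset(dst, 0, (size_t)page);

    if (pipe(pipefd) < 0) {
        rc = -errno;
        goto out_free;
    }

    /* Sender side: hand the page to the write end of the pipe. */
    iov.iov_base = src;
    iov.iov_len = (size_t)page;
    n = vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT);
    if (n != page) {
        rc = (n < 0) ? -errno : -EIO;
        goto out_close;
    }

    /* Receiver side: pull the page from the read end into dst. */
    iov.iov_base = dst;
    iov.iov_len = (size_t)page;
    n = vmsplice(pipefd[0], &iov, 1, 0);
    if (n != page) {
        rc = (n < 0) ? -errno : -EIO;
        goto out_close;
    }

    /* Verify that the received page carries the pattern that was sent. */
    for (long i = 0; i < page; i++) {
        if (((unsigned char *)dst)[i] != 0xA5) {
            rc = -EIO;
            break;
        }
    }

out_close:
    close(pipefd[0]);
    close(pipefd[1]);
out_free:
    free(dst);
    free(src);
    return rc;
}
```

In the series the two pipe ends belong to different QEMU processes, and
the page address travels separately over the Unix socket.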
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-local.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/migration-local.c b/migration-local.c
index 14207e9..8ac0af5 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -253,10 +253,65 @@ static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,
return RAM_SAVE_CONTROL_NOT_SUPP;
}
+static int qemu_local_ram_load(QEMUFile *f, void *opaque,
+ uint64_t flags)
+{
+ QEMUFileLocal *s = opaque;
+ ram_addr_t addr;
+ struct iovec iov;
+ ssize_t ret = -EINVAL;
+
+ /*
+ * PIPE file descriptor will be received by another callback
+ * get_buffer.
+ */
+ if (pipefd_passed) {
+ void *host;
+ /*
+ * Extract the page address from the 8-byte record and
+ * read the page data from the pipe.
+ */
+ addr = qemu_get_be64(s->file);
+ host = qemu_get_ram_ptr(addr);
+
+ iov.iov_base = host;
+ iov.iov_len = TARGET_PAGE_SIZE;
+
+ /* The flag SPLICE_F_MOVE, previously unused, is introduced in the
+ * kernel for the page flipping feature in QEMU; it moves pages
+ * rather than copying them.
+ *
+ * If a move is not possible, the kernel transparently falls back
+ * to copying the data.
+ *
+ * Older kernels ignore SPLICE_F_MOVE, so a copy occurs.
+ */
+ ret = vmsplice(s->pipefd[0], &iov, 1, SPLICE_F_MOVE);
+ if (ret == -1) {
+ if (errno != EAGAIN && errno != EINTR) {
+ fprintf(stderr, "vmsplice() load error: %s\n", strerror(errno));
+ return ret;
+ }
+ DPRINTF("vmsplice load error\n");
+ } else if (ret == 0) {
+ DPRINTF("load_page: zero read\n");
+ }
+
+ DPRINTF("vmsplice (read): %zd\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+
+
static const QEMUFileOps pipe_read_ops = {
.get_fd = qemu_local_get_sockfd,
.get_buffer = qemu_local_get_buffer,
.close = qemu_local_close,
+ .hook_ram_load = qemu_local_ram_load
};
static const QEMUFileOps pipe_write_ops = {
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 13/17] migration-unix: replace qemu_fopen_socket with qemu_fopen_socket_local
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (11 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 14/17] add new RunState RUN_STATE_MEMORY_STALE Lei Li
` (4 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Replace qemu_fopen_socket with qemu_fopen_socket_local in Unix
protocol migration.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-unix.c | 18 ++++++++++++++----
1 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/migration-unix.c b/migration-unix.c
index 651fc5b..9beeafe 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -37,12 +37,22 @@ static void unix_wait_for_connect(int fd, void *opaque)
if (fd < 0) {
DPRINTF("migrate connect error\n");
s->file = NULL;
- migrate_fd_error(s);
+ goto fail;
} else {
DPRINTF("migrate connect success\n");
- s->file = qemu_fopen_socket(fd, "wb");
+
+ s->file = qemu_fopen_socket_local(fd, "wb");
+ if (s->file == NULL) {
+ fprintf(stderr, "failed to open Unix socket\n");
+ goto fail;
+ }
+
migrate_fd_connect(s);
+ return;
}
+
+fail:
+ migrate_fd_error(s);
}
void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp)
@@ -71,9 +81,9 @@ static void unix_accept_incoming_migration(void *opaque)
goto out;
}
- f = qemu_fopen_socket(c, "rb");
+ f = qemu_fopen_socket_local(c, "rb");
if (f == NULL) {
- fprintf(stderr, "could not qemu_fopen socket\n");
+ fprintf(stderr, "failed to open Unix socket\n");
goto out;
}
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 14/17] add new RunState RUN_STATE_MEMORY_STALE
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (12 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 13/17] migration-unix: replace qemu_fopen_socket with qemu_fopen_socket_local Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 12:28 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 15/17] migration-unix: page flipping support on unix outgoing Lei Li
` (3 subsequent siblings)
17 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Introduce new RunState RUN_STATE_MEMORY_STALE and
add it to runstate_needs_reset().
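The effect of the new state can be sketched with a reduced transition
table. This is a self-contained illustration, not the vl.c code: the
RS_* names, the Transition type and both helpers are hypothetical, and
only a subset of the transitions touched by the patch is listed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-in for QEMU's RunState enum. */
typedef enum {
    RS_PAUSED,
    RS_PRELAUNCH,
    RS_RUNNING,
    RS_POSTMIGRATE,
    RS_SHUTDOWN,
    RS_INTERNAL_ERROR,
    RS_MEMORY_STALE,
    RS_MAX
} RunStateSketch;

typedef struct {
    RunStateSketch from;
    RunStateSketch to;
} Transition;

/* Subset of the transitions that involve the new state. */
static const Transition transitions[] = {
    { RS_PAUSED,       RS_MEMORY_STALE },
    { RS_PRELAUNCH,    RS_MEMORY_STALE },
    { RS_RUNNING,      RS_MEMORY_STALE },
    { RS_MEMORY_STALE, RS_RUNNING },
    { RS_MEMORY_STALE, RS_POSTMIGRATE },
};

/* A transition is valid only if it appears in the table. */
static bool transition_allowed(RunStateSketch from, RunStateSketch to)
{
    size_t count = sizeof(transitions) / sizeof(transitions[0]);

    for (size_t i = 0; i < count; i++) {
        if (transitions[i].from == from && transitions[i].to == to) {
            return true;
        }
    }
    return false;
}

/* With the patch, a guest whose memory is stale also needs a reset. */
static bool needs_reset(RunStateSketch state)
{
    return state == RS_INTERNAL_ERROR ||
           state == RS_SHUTDOWN ||
           state == RS_MEMORY_STALE;
}
```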
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
qapi-schema.json | 7 +++++--
vl.c | 12 +++++++++++-
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/qapi-schema.json b/qapi-schema.json
index b290a0f..640a380 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -176,12 +176,15 @@
# @watchdog: the watchdog action is configured to pause and has been triggered
#
# @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @memory-stale: guest is paused to transmit memory; the destination guest
+# will have the newer contents of it.
##
{ 'enum': 'RunState',
'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
- 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
- 'guest-panicked' ] }
+ 'running', 'save-vm', 'shutdown', 'suspended', 'memory-stale',
+ 'watchdog', 'guest-panicked' ] }
##
# @SnapshotInfo
diff --git a/vl.c b/vl.c
index 8d5d874..0f38405 100644
--- a/vl.c
+++ b/vl.c
@@ -601,6 +601,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_PAUSED, RUN_STATE_RUNNING },
{ RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_PAUSED, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
{ RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -608,6 +609,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
{ RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
{ RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+ { RUN_STATE_PRELAUNCH, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
@@ -624,23 +626,30 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
{ RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
{ RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+ { RUN_STATE_RUNNING, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
{ RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
{ RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_SHUTDOWN, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
{ RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
{ RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
{ RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
{ RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_WATCHDOG, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
{ RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_GUEST_PANICKED, RUN_STATE_MEMORY_STALE },
+ { RUN_STATE_MEMORY_STALE, RUN_STATE_RUNNING },
+ { RUN_STATE_MEMORY_STALE, RUN_STATE_POSTMIGRATE },
{ RUN_STATE_MAX, RUN_STATE_MAX },
};
@@ -685,7 +694,8 @@ int runstate_is_running(void)
bool runstate_needs_reset(void)
{
return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
- runstate_check(RUN_STATE_SHUTDOWN);
+ runstate_check(RUN_STATE_SHUTDOWN) ||
+ runstate_check(RUN_STATE_MEMORY_STALE);
}
StatusInfo *qmp_query_status(Error **errp)
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 15/17] migration-unix: page flipping support on unix outgoing
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (13 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 14/17] add new RunState RUN_STATE_MEMORY_STALE Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 9:11 ` [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping Lei Li
` (2 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Add page flipping support to the outgoing side of unix migration
by stopping the VM with the new RunState RUN_STATE_MEMORY_STALE
before invoking migration if unix_page_flipping is enabled.
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration-unix.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/migration-unix.c b/migration-unix.c
index 9beeafe..cbf2087 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -19,6 +19,7 @@
#include "migration/migration.h"
#include "migration/qemu-file.h"
#include "block/block.h"
+#include "sysemu/sysemu.h"
//#define DEBUG_MIGRATION_UNIX
@@ -33,6 +34,7 @@
static void unix_wait_for_connect(int fd, void *opaque)
{
MigrationState *s = opaque;
+ int ret;
if (fd < 0) {
DPRINTF("migrate connect error\n");
@@ -47,6 +49,15 @@ static void unix_wait_for_connect(int fd, void *opaque)
goto fail;
}
+ /* Stop VM before invoking migration if unix_page_flipping is enabled */
+ if (migrate_unix_page_flipping()) {
+ ret = vm_stop_force_state(RUN_STATE_MEMORY_STALE);
+ if (ret < 0) {
+ DPRINTF("failed to stop VM\n");
+ goto fail;
+ }
+ }
+
migrate_fd_connect(s);
return;
}
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (14 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 15/17] migration-unix: page flipping support on unix outgoing Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-26 11:32 ` Paolo Bonzini
2013-11-21 9:11 ` [Qemu-devel] [PATCH 17/17] hmp: better format for info migrate_capabilities Lei Li
2013-11-21 10:19 ` [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Daniel P. Berrange
17 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
migration.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/migration.c b/migration.c
index 4ac466b..0f98ac1 100644
--- a/migration.c
+++ b/migration.c
@@ -579,10 +579,11 @@ static void *migration_thread(void *opaque)
pending_size = qemu_savevm_state_pending(s->file, max_size);
DPRINTF("pending size %" PRIu64 " max %" PRIu64 "\n",
pending_size, max_size);
- if (pending_size && pending_size >= max_size) {
+ if (pending_size && pending_size >= max_size &&
+ !runstate_needs_reset()) {
qemu_savevm_state_iterate(s->file);
} else {
- int ret;
+ int ret = 0;
DPRINTF("done iterating\n");
qemu_mutex_lock_iothread();
@@ -590,7 +591,10 @@ static void *migration_thread(void *opaque)
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
old_vm_running = runstate_is_running();
- ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (!runstate_needs_reset()) {
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ }
+
if (ret >= 0) {
qemu_file_set_rate_limit(s->file, INT_MAX);
qemu_savevm_state_complete(s->file);
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [Qemu-devel] [PATCH 17/17] hmp: better format for info migrate_capabilities
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (15 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping Lei Li
@ 2013-11-21 9:11 ` Lei Li
2013-11-21 10:19 ` [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Daniel P. Berrange
17 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-21 9:11 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini, rcj
As more capabilities may be introduced, display each of them
on its own line.
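The resulting layout can be reproduced with a small standalone sketch.
It is only an illustration of the output format: struct cap_status and
format_capabilities() are hypothetical names, and it writes into a
buffer instead of calling monitor_printf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one capability entry. */
struct cap_status {
    const char *name;
    int enabled;
};

/*
 * Write a header line followed by one "name: on|off" line per
 * capability. Returns the number of characters written, or -1 if the
 * buffer is too small.
 */
static int format_capabilities(char *buf, size_t len,
                               const struct cap_status *caps, size_t n)
{
    int used = snprintf(buf, len, "Capabilities:\n");

    for (size_t i = 0; i < n; i++) {
        if (used < 0 || (size_t)used >= len) {
            return -1;
        }
        int w = snprintf(buf + used, len - (size_t)used, "%s: %s\n",
                         caps[i].name, caps[i].enabled ? "on" : "off");
        if (w < 0) {
            return -1;
        }
        used += w;
    }
    if (used < 0 || (size_t)used >= len) {
        return -1;
    }
    return used;
}
```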
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
hmp.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/hmp.c b/hmp.c
index 32ee285..dcfa2f9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -226,13 +226,12 @@ void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict)
caps = qmp_query_migrate_capabilities(NULL);
if (caps) {
- monitor_printf(mon, "capabilities: ");
+ monitor_printf(mon, "Capabilities:\n");
for (cap = caps; cap; cap = cap->next) {
- monitor_printf(mon, "%s: %s ",
+ monitor_printf(mon, "%s: %s\n",
MigrationCapability_lookup[cap->value->capability],
cap->value->state ? "on" : "off");
}
- monitor_printf(mon, "\n");
}
qapi_free_MigrationCapabilityStatusList(caps);
--
1.7.7.6
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-21 9:11 [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Lei Li
` (16 preceding siblings ...)
2013-11-21 9:11 ` [Qemu-devel] [PATCH 17/17] hmp: better format for info migrate_capabilities Lei Li
@ 2013-11-21 10:19 ` Daniel P. Berrange
2013-11-22 11:29 ` Lei Li
17 siblings, 1 reply; 45+ messages in thread
From: Daniel P. Berrange @ 2013-11-21 10:19 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, pbonzini, rcj
On Thu, Nov 21, 2013 at 05:11:23PM +0800, Lei Li wrote:
> This patch series tries to introduce a mechanism using side
> channel pipe for RAM via SCM_RIGHTS with unix domain socket
> protocol migration.
>
> This side channel is used for the page flipping by vmsplice,
> which is the internal mechanism for localhost migration that
> we are trying to add to QEMU. The backgroud info and previous
> patch series for reference,
>
> Localhost migration
> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg02916.html
>
> migration: Introduce side channel for RAM
> http://lists.gnu.org/archive/html/qemu-devel/2013-09/msg04043.html
>
> I have picked patches from the localhost migration series and rebased
> it on the series of side channel, now it is a complete series that
> passed the basic test.
>
> Please let me know if there is anything needs to be fixed or improved.
> Your suggestions and comments are very welcome, and thanks to Paolo
> for his continued review and useful suggestions.
In discussions about supporting this for libvirt, we were told that
when this localhost migration fails, you cannot re-start the guest
on the original source QEMU.
If this is true, this implementation is not satisfactory IMHO. One
of the main motivations of this feature is to allow for in-place
live upgrades of QEMU binaries, for people who can't tolerate the
downtime of restarting their guests, and who don't have a spare
host to migrate them to.
If people are using this because they can't tolerate any downtime
of the guest, then we need to be able to fully deal with failure to
complete migration by switching back to the original QEMU process,
as we can do with normal non-localhost migration.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-21 10:19 ` [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram Daniel P. Berrange
@ 2013-11-22 11:29 ` Lei Li
2013-11-22 11:36 ` Paolo Bonzini
2013-11-22 11:36 ` Daniel P. Berrange
0 siblings, 2 replies; 45+ messages in thread
From: Lei Li @ 2013-11-22 11:29 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: Andrea Arcangeli, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, pbonzini, rcj
On 11/21/2013 06:19 PM, Daniel P. Berrange wrote:
> On Thu, Nov 21, 2013 at 05:11:23PM +0800, Lei Li wrote:
>> This patch series tries to introduce a mechanism using side
>> channel pipe for RAM via SCM_RIGHTS with unix domain socket
>> protocol migration.
>>
>> This side channel is used for the page flipping by vmsplice,
>> which is the internal mechanism for localhost migration that
>> we are trying to add to QEMU. The backgroud info and previous
>> patch series for reference,
>>
>> Localhost migration
>> http://lists.nongnu.org/archive/html/qemu-devel/2013-08/msg02916.html
>>
>> migration: Introduce side channel for RAM
>> http://lists.gnu.org/archive/html/qemu-devel/2013-09/msg04043.html
>>
>> I have picked patches from the localhost migration series and rebased
>> it on the series of side channel, now it is a complete series that
>> passed the basic test.
>>
>> Please let me know if there is anything needs to be fixed or improved.
>> Your suggestions and comments are very welcome, and thanks to Paolo
>> for his continued review and useful suggestions.
> In discussions about supporting this for libvirt, we were told that
> when this localhost migration fails, you cannot re-start the guest
> on the original source QEMU.
>
> If this is true, this implementation is not satisfactory IMHO. One
> of the main motivations of this feature is to allow for in-place
> live upgrades of QEMU binaries, for people who can't tolerate the
> downtime of restarting their guests, and whom don't have a spare
> host to migrate them to.
>
> If people are using this because they can't tolerate any downtime
> of the guest, then we need to be able to fully deal with failure to
> complete migration by switching back to the original QEMU process,
> as we can do with normal non-localhost migration.
Hi Daniel,
Page flipping is introduced here not primarily for low downtime, but
to avoid requiring enough free memory to fit an additional copy of
the largest guest, which is what localhost migration requires today.
Anthony explained this in the first version of the proposal [1].
Of course, low downtime also matters for page flipping migration,
since its use case is the 'live' upgrade of a running QEMU instance,
so we expect page flipping through vmsplice to be fast enough for
that. In this initial implementation the downtime is not good yet,
but we are working on it, and there has been some work on the kernel
side [2].
During page flipping migration, the RAM pages of the source guest are
flipped to the destination, which is why the source guest cannot be
resumed. AFAICT, page flipping migration may fail at the connection
stage (including the exchange of the pipe fd) and at the migration
registration stage (say, a blocker such as a device that does not
support migration), but in those cases the source can still be
resumed because no memory has been flipped yet. Once the connection
is successfully set up, migration proceeds to the transmission of RAM
pages, which hardly ever fails. As for failure handling in Libvirt,
ZhengSheng has proposed restarting the old QEMU instead of resuming
it. I know 'hardly' is not a good answer to your concern, but IMO it
is the cost of limited memory.
So if downtime is the key concern for the user, or if there is *zero*
tolerance for restarting QEMU, page flipping migration might not be a
good choice. From the perspective of a management app like Libvirt,
the 'live upgrade' of QEMU will be done through localhost migration,
and there are other migration solutions with lower downtime, such as
real live migration and the postcopy migration that Paolo mentioned
on the previous version [3]. Why not have more than one choice?
[1]http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg02577.html
[2]http://article.gmane.org/gmane.linux.kernel/1574277
[3]http://lists.gnu.org/archive/html/qemu-devel/2013-10/msg03212.html
> Regards,
> Daniel
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-22 11:29 ` Lei Li
@ 2013-11-22 11:36 ` Paolo Bonzini
2013-11-25 7:29 ` Lei Li
2013-11-22 11:36 ` Daniel P. Berrange
1 sibling, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-22 11:36 UTC (permalink / raw)
To: Lei Li
Cc: Andrea Arcangeli, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
Il 22/11/2013 12:29, Lei Li ha scritto:
> During the page flipping migration, ram page of source guest would
> be flipped to the destination, that's why the source guest can not
> be resumed. AFAICT, the page flipping migration may fail at the
> connection stage (including the exchange of pipe fd) and migration
> register stage (say any blocker like unsupported migration device),
Unfortunately, some migration problems (e.g. misconfiguration of the
destination QEMU) cannot be detected until the device data is migrated.
This happens after RAM migration, so there is indeed a reliability problem.
Postcopy would fix this (assuming the postcopy phase is reliable) by
migrating device data before any page flipping occurs.
Paolo
> but it could be resumed for such situation since the memory has not
> been flipped to another content. Once the connection is successfully
> setup, it would proceed the transmission of ram page which hardly
> fails. And for the failure handling in Libvirt, ZhengSheng has proposed
> that restarts the old QEMU instead of resume. I know 'hardly' is not
> an good answer to your concern, but it is the cost of the limited
> memory IMO.
>
> So if downtime is the key to the user, or if it's *zero toleration of
> the restarting of QEMU, page flipping migration might not be a good
> choice. From the perspective of management app like Libvirt, as the
> 'live upgrade' of QEMU will be done through localhost migration, and
> there are other migration solutions which have lower downtime, like
> the real live migration and the postcopy migration that Paolo mentioned
> in the previous version [3]. Why not have more than one choice for it?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-22 11:29 ` Lei Li
2013-11-22 11:36 ` Paolo Bonzini
@ 2013-11-22 11:36 ` Daniel P. Berrange
1 sibling, 0 replies; 45+ messages in thread
From: Daniel P. Berrange @ 2013-11-22 11:36 UTC (permalink / raw)
To: Lei Li
Cc: Andrea Arcangeli, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, pbonzini, rcj
On Fri, Nov 22, 2013 at 07:29:05PM +0800, Lei Li wrote:
> On 11/21/2013 06:19 PM, Daniel P. Berrange wrote:
> >On Thu, Nov 21, 2013 at 05:11:23PM +0800, Lei Li wrote:
> >In discussions about supporting this for libvirt, we were told that
> >when this localhost migration fails, you cannot re-start the guest
> >on the original source QEMU.
> >
> >If this is true, this implementation is not satisfactory IMHO. One
> >of the main motivations of this feature is to allow for in-place
> >live upgrades of QEMU binaries, for people who can't tolerate the
> >downtime of restarting their guests, and whom don't have a spare
> >host to migrate them to.
> >
> >If people are using this because they can't tolerate any downtime
> >of the guest, then we need to be able to fully deal with failure to
> >complete migration by switching back to the original QEMU process,
> >as we can do with normal non-localhost migration.
>
> Hi Daniel,
>
> Page flipping is introduced here not primarily for low downtime, but
> more to avoid requiring that there is enough free memory to fit an
> additional copy of the largest guest which is the requirement today
> with current localhost migration as the additional explanation from
> Anthony in first proposal version [1].
>
> Of course low downtime is also important to the page flipping
> migration as the use case of it is to allow 'live' upgrade of a
> running QEMU instance, so we expect page flipping through vmsplice
> is fast enough to meet it. As an initial implementation of this
> feature right now, the downtime is not good, but we are working on
> it as there has been some work on kernel side [2].
>
> During the page flipping migration, ram page of source guest would
> be flipped to the destination, that's why the source guest can not
> be resumed. AFAICT, the page flipping migration may fail at the
> connection stage (including the exchange of pipe fd) and migration
> register stage (say any blocker like unsupported migration device),
> but it could be resumed for such situation since the memory has not
> been flipped to another content. Once the connection is successfully
> setup, it would proceed the transmission of ram page which hardly
> fails. And for the failure handling in Libvirt, ZhengSheng has proposed
> that restarts the old QEMU instead of resume. I know 'hardly' is not
> an good answer to your concern, but it is the cost of the limited
> memory IMO.
If you can flip the pages in one direction, then you can surely
flip them back in the other direction upon failure. Suggesting
people restart QEMU upon failure is just not an acceptable
"recovery" strategy, since it does not in fact recover anything
useful from the user's POV. You've lost all the state of whatever
was running.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-22 11:36 ` Paolo Bonzini
@ 2013-11-25 7:29 ` Lei Li
2013-11-25 9:48 ` Paolo Bonzini
0 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-25 7:29 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Andrea Arcangeli, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
On 11/22/2013 07:36 PM, Paolo Bonzini wrote:
> Il 22/11/2013 12:29, Lei Li ha scritto:
>> During the page flipping migration, ram page of source guest would
>> be flipped to the destination, that's why the source guest can not
>> be resumed. AFAICT, the page flipping migration may fail at the
>> connection stage (including the exchange of pipe fd) and migration
>> register stage (say any blocker like unsupported migration device),
> Unfortunately, some migration problems (e.g. misconfiguration of the
> destination QEMU) cannot be detected until the device data is migrated.
> This happens after RAM migration, so there is indeed a reliability problem.
Hi Paolo,
By 'some migration problems cannot be detected until the device data is
migrated', do you mean that the outgoing side is unaware of a failure on
the incoming side caused by misconfiguration of the destination QEMU?
In that case, if migration fails just because of misconfigured device state
on the destination while the outgoing side is not aware of the failure, I
think such handling (for example, synchronizing the device state list on the
incoming side?) should be added to the current migration protocol, as it is
kind of missing... It cannot just rely on resuming the source guest after
such a failure... Or maybe it should be handled in the management app, by
forcing the configuration to be right?
>
> Postcopy would fix this (assuming the postcopy phase is reliable) by
> migrating device data before any page flipping occurs.
Are you suggesting that page flipping should be coupled with postcopy
migration for live upgrade of QEMU, as in your comments on the previous version?
>
> Paolo
>
>> but it could be resumed for such situation since the memory has not
>> been flipped to another content. Once the connection is successfully
>> setup, it would proceed the transmission of ram page which hardly
>> fails. And for the failure handling in Libvirt, ZhengSheng has proposed
>> that restarts the old QEMU instead of resume. I know 'hardly' is not
>> an good answer to your concern, but it is the cost of the limited
>> memory IMO.
>>
>> So if downtime is the key to the user, or if it's *zero toleration of
>> the restarting of QEMU, page flipping migration might not be a good
>> choice. From the perspective of management app like Libvirt, as the
>> 'live upgrade' of QEMU will be done through localhost migration, and
>> there are other migration solutions which have lower downtime, like
>> the real live migration and the postcopy migration that Paolo mentioned
>> in the previous version [3]. Why not have more than one choice for it?
>
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-25 7:29 ` Lei Li
@ 2013-11-25 9:48 ` Paolo Bonzini
2013-11-26 11:07 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-25 9:48 UTC (permalink / raw)
To: Lei Li
Cc: Andrea Arcangeli, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 25/11/2013 08:29, Lei Li ha scritto:
>
>
> In this case, if the migration would fail just because the misconfiguration
> of device state on destination, in the meantime the outgoing migration has
> no aware of this failure, I think it should add such handling (like synchronize
> of the device state list in incoming side?) to the current migration protocol
> as it is kind of missing... It can not just rely on the resume of source
> guest for such failure... or maybe it should be handled in management
> app to force the configuration right?
It is already handled by libvirt, indeed.
Basically, "-incoming" without "-S" is a broken option because of the
missing handshake at the end of migration. With "-S" something else
(either a human or a program) can check that everything went well and
choose whether to restart the source or the destination.
>> Postcopy would fix this (assuming the postcopy phase is reliable) by
>> migrating device data before any page flipping occurs.
>
> Are you suggesting that page flipping should be coupled with the postcopy
> migration for live upgrade of QEMU as your comments in the previous
> version?
In order to make live upgrade reliable, it should.
Paolo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-25 9:48 ` Paolo Bonzini
@ 2013-11-26 11:07 ` Lei Li
2013-11-26 11:17 ` Paolo Bonzini
0 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-26 11:07 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Andrea Arcangeli, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
On 11/25/2013 05:48 PM, Paolo Bonzini wrote:
> Il 25/11/2013 08:29, Lei Li ha scritto:
>>
>> In this case, if the migration would fail just because the misconfiguration
>> of device state on destination, in the meantime the outgoing migration has
>> no aware of this failure, I think it should add such handling (like synchronize
>> of the device state list in incoming side?) to the current migration protocol
>> as it is kind of missing... It can not just rely on the resume of source
>> guest for such failure... or maybe it should be handled in management
>> app to force the configuration right?
> It is already handled by libvirt, indeed.
>
> Basically, "-incoming" without "-S" is a broken option because of the
> missing handshake at the end of migration. With "-S" something else
> (either a human or a program) can check that everything went well and
> choose whether to restart the source or the destination.
I see, thanks for your explanation. :-)
BTW, do you think we should add such handling to the current migration
protocol?
>
>>> Postcopy would fix this (assuming the postcopy phase is reliable) by
>>> migrating device data before any page flipping occurs.
>> Are you suggesting that page flipping should be coupled with the postcopy
>> migration for live upgrade of QEMU as your comments in the previous
>> version?
> In order to make live upgrade reliable, it should.
The whole procedure for page flipping migration is straightforward, and
the failure cases I listed are theoretical; none of them has occurred in
my many test runs (except the case you raised above). But I agree with you
on coupling it with postcopy migration to make it more reliable, especially
against undetected problems.
I am not quite sure I understand this correctly, though. It seems the latest
update of postcopy migration was sent last October. Could you please give
some insight into what else I could do to couple this with postcopy migration?
If not, page flipping is now implemented as a migration capability, and it is
already in good shape according to your comments on the previous version.
Although it still needs a little more time to get the numbers for the new
vmsplice, I'd like to ask whether you think it could be merged as an
experimental version for now?
>
> Paolo
>
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-26 11:07 ` Lei Li
@ 2013-11-26 11:17 ` Paolo Bonzini
2013-11-27 16:48 ` Andrea Arcangeli
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:17 UTC (permalink / raw)
To: Lei Li
Cc: Andrea Arcangeli, quintela, mdroth, mrhines, qemu-devel,
Orit Wasserman, aliguori, lagarcia, rcj
Il 26/11/2013 12:07, Lei Li ha scritto:
>> Basically, "-incoming" without "-S" is a broken option because of the
>> missing handshake at the end of migration. With "-S" something else
>> (either a human or a program) can check that everything went well and
>> choose whether to restart the source or the destination.
>
> I see, thanks for your explanation. :-)
>
> BTW, do you think we should add such handling to the current migration
> protocol?
I think it's not included by design.
> The whole procedure for page flipping migration is straight forward, and
> the cases of failure I listed are in theory, which never happened at least
> since many times I have tested (except the case you raised above). But I
> agree with you on coupling with postcopy migration to make it more
> reliable, specially for the undetected problems.
The only problem that worries me is failing to load device data (most
likely due to misconfiguration or a bug).
> For this, I am not quite sure I understand it correctly, seems the latest
> update of post copy migration was sent on last Oct, would you please give
> some insights on what else could I do for the coupling with postcopy
> migration?
I don't know the state exactly. Orit and Andrea should know.
> If no, now page flipping is implemented as a migration capability, and it's
> a good shape already as your comments in the previous version. Although it
> still needs a little more time to get the numbers of the new vmsplice,
> I'd to ask your opinion that do you consider it could be merged as an
> experimental version for now?
Yes, that could be useful. I will review the patch as soon as possible.
Paolo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit
2013-11-21 9:11 ` [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit Lei Li
@ 2013-11-26 11:22 ` Paolo Bonzini
2013-11-26 12:10 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:22 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
> This patch implements save_page callback for the outside
> of page flipping. It will write the address of the page
> on the Unix socket and flip the page data on pipe by
> vmsplice(). Every page address would have a header flag
> RAM_SAVE_FLAG_HOOK, which will trigger the load hook to
> receive it in incoming side as well.
>
> Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
> ---
> migration-local.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 54 insertions(+), 0 deletions(-)
>
> diff --git a/migration-local.c b/migration-local.c
> index 0f0896b..14207e9 100644
> --- a/migration-local.c
> +++ b/migration-local.c
> @@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void *opaque,
> return 0;
> }
>
> +static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,
> + MemoryRegion *mr, ram_addr_t offset,
> + size_t size, int *bytes_sent)
> +{
> + QEMUFileLocal *s = opaque;
> + ram_addr_t current_addr = mr->ram_addr + offset;
> + void *ram_addr;
> + ssize_t ret;
> +
> + if (s->unix_page_flipping) {
> + qemu_fflush(s->file);
> + qemu_put_be64(s->file, RAM_SAVE_FLAG_HOOK);
> +
> + /* Write page address to unix socket */
> + qemu_put_be64(s->file, current_addr);
> +
You can write current_addr | RAM_SAVE_FLAG_HOOK. The value will be in
the flags argument of the hook_ram_load, you can extract it with "flags
& ~RAM_SAVE_FLAG_HOOK". This cuts by half the data written to the Unix
socket.
Paolo
> + ram_addr = memory_region_get_ram_ptr(mr) + offset;
> +
> + /* vmsplice page data to pipe */
> + struct iovec iov = {
> + .iov_base = ram_addr,
> + .iov_len = size,
> + };
> +
> + /*
> + * The flag SPLICE_F_MOVE is introduced in kernel for the page
> + * flipping feature in QEMU, which will movie pages rather than
> + * copying, previously unused.
> + *
> + * If a move is not possible the kernel will transparently falls
> + * back to copying data.
> + *
> + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
> + * would occur.
> + */
> + ret = vmsplice(s->pipefd[1], &iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE);
> + if (ret == -1) {
> + if (errno != EAGAIN && errno != EINTR) {
> + fprintf(stderr, "vmsplice save error: %s\n", strerror(errno));
> + return ret;
> + }
> + } else {
> + if (bytes_sent) {
> + *bytes_sent = 1;
> + }
> + DPRINTF("block_offset: %lu, offset: %lu\n", block_offset, offset);
> + return 0;
> + }
> + }
> +
> + return RAM_SAVE_CONTROL_NOT_SUPP;
> +}
> +
> static const QEMUFileOps pipe_read_ops = {
> .get_fd = qemu_local_get_sockfd,
> .get_buffer = qemu_local_get_buffer,
> @@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = {
> .writev_buffer = qemu_local_writev_buffer,
> .close = qemu_local_close,
> .before_ram_iterate = qemu_local_send_pipefd,
> + .save_page = qemu_local_save_ram
> };
>
> QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
>
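
The suggestion above (fold the hook flag into the page address so that only one 64-bit value is written per page) can be sketched roughly as follows. This is only an illustration: the page size, the flag value and both function names are assumptions made for the sketch, not definitions taken from QEMU. It relies on the address being page aligned, so that the low bits are free to carry the flag.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones live in QEMU. */
#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_SAVE_FLAG_HOOK 0x80u

/* Combine a page-aligned address and the hook flag into one 64-bit record. */
static uint64_t pack_hook_record(uint64_t page_addr)
{
    /* The low bits must be free, otherwise the flag would corrupt the address. */
    assert((page_addr & (SKETCH_PAGE_SIZE - 1)) == 0);
    return page_addr | SKETCH_SAVE_FLAG_HOOK;
}

/* Recover the page address from the record on the receiving side. */
static uint64_t unpack_hook_record(uint64_t flags)
{
    return flags & ~(uint64_t)SKETCH_SAVE_FLAG_HOOK;
}
```

With this layout the sender emits 8 bytes per page on the socket instead of 16, which is the halving mentioned in the review.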
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load
2013-11-21 9:11 ` [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load Lei Li
@ 2013-11-26 11:25 ` Paolo Bonzini
2013-11-26 12:11 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:25 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
> +static int qemu_local_ram_load(QEMUFile *f, void *opaque,
> + uint64_t flags)
> +{
> + QEMUFileLocal *s = opaque;
> + ram_addr_t addr;
> + struct iovec iov;
> + ssize_t ret = -EINVAL;
> +
> + /*
> + * PIPE file descriptor will be received by another callback
> + * get_buffer.
> + */
> + if (pipefd_passed) {
> + void *host;
> + /*
> + * Extract the page address from the 8-byte record and
> + * read the page data from the pipe.
> + */
> + addr = qemu_get_be64(s->file);
> + host = qemu_get_ram_ptr(addr);
> +
> + iov.iov_base = host;
> + iov.iov_len = TARGET_PAGE_SIZE;
> +
> + /* The flag SPLICE_F_MOVE is introduced in kernel for the page
> + * flipping feature in QEMU, which will movie pages rather than
> + * copying, previously unused.
> + *
> + * If a move is not possible the kernel will transparently falls
> + * back to copying data.
> + *
> + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
> + * would occur.
> + */
> + ret = vmsplice(s->pipefd[0], &iov, 1, SPLICE_F_MOVE);
> + if (ret == -1) {
> + if (errno != EAGAIN && errno != EINTR) {
> + fprintf(stderr, "vmsplice() load error: %s", strerror(errno));
> + return ret;
> + }
> + DPRINTF("vmsplice load error\n");
> + } else if (ret == 0) {
> + DPRINTF(stderr, "load_page: zero read\n");
> + }
> +
> + DPRINTF("vmsplice (read): %zu\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
I think you need to return -EINVAL if there is no pipe.
Paolo
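
A minimal sketch of the error handling asked for above, assuming a hypothetical helper named load_page_from_pipe. It deliberately uses plain read() instead of vmsplice() so that it stays portable; the point shown is only that a missing pipe is reported as -EINVAL rather than as success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly len bytes of page data from the pipe into host.
 * Returns 0 on success, -EINVAL if no pipe was received,
 * -EIO on premature end of file, or -errno on a read failure.
 */
static int load_page_from_pipe(int pipefd, void *host, size_t len)
{
    char *dst = host;
    size_t done = 0;

    assert(host != NULL);
    if (pipefd < 0) {
        return -EINVAL;     /* no pipe: refuse instead of reporting success */
    }
    while (done < len) {
        ssize_t n = read(pipefd, dst + done, len - done);
        if (n == -1) {
            if (errno == EINTR) {
                continue;   /* interrupted by a signal, retry */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;    /* writer closed the pipe early */
        }
        done += (size_t)n;
    }
    return 0;
}
```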
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd()
2013-11-21 9:11 ` [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd() Lei Li
@ 2013-11-26 11:26 ` Paolo Bonzini
0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:26 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
> + struct cmsghdr *cmptr;
> + char req[1] = { 0x01 };
About this, see my reply to patch 8.
> + if (pipefd < 0) {
> + msg.msg_control = NULL;
> + msg.msg_controllen = 0;
> + /* Negative status means error */
> + req[0] = pipefd;
No need for this. qemu_fopen_socket_local has failed already, and you
will never get here.
Paolo
> + } else {
> + msg.msg_control = control_un.control;
> + msg.msg_controllen = sizeof(control_un.control);
> +
> + cmptr = CMSG_FIRSTHDR(&msg);
> + cmptr->cmsg_len = CMSG_LEN(sizeof(int));
> + cmptr->cmsg_level = SOL_SOCKET;
> + cmptr->cmsg_type = SCM_RIGHTS;
> + *((int *) CMSG_DATA(cmptr)) = pipefd;
> +
> + msg.msg_name = NULL;
> + msg.msg_namelen = 0;
> +
> + iov[0].iov_base = req;
> + iov[0].iov_len = sizeof(req);
> + msg.msg_iov = iov;
> + msg.msg_iovlen = 1;
> + }
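
For reference, passing a descriptor with SCM_RIGHTS as in the quoted hunk can be sketched as a standalone function. The name send_fd is hypothetical, and the error branch discussed above is left out because, per the review, it cannot be reached. As in the patch, one payload byte accompanies the control message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd over a Unix socket together with one payload byte. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char req = 0x01;
    struct iovec iov = { .iov_base = &req, .iov_len = sizeof(req) };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* Fill in the single control message that carries the descriptor. */
    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

    /* Exactly one payload byte should go out with the descriptor. */
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```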
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
2013-11-21 9:11 ` [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer Lei Li
@ 2013-11-26 11:30 ` Lei Li
2013-11-26 11:31 ` Paolo Bonzini
1 sibling, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-26 11:30 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mdroth, mrhines, aliguori, lagarcia,
pbonzini@redhat.com >> Paolo Bonzini, rcj
On 11/21/2013 05:11 PM, Lei Li wrote:
> The control message for exchange of pipe file descriptor should
> be received by recvmsg, and it might be eaten to stream file by
> qemu_recv() when receiving by two callbacks. So this patch adds
> unix_msgfd_lookup() to callback get_buffer as the only one receiver,
> where the pipe file descriptor would be caughted.
>
> Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
> ---
> migration-local.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 files changed, 65 insertions(+), 3 deletions(-)
>
> diff --git a/migration-local.c b/migration-local.c
> index e028beb..0f0896b 100644
> --- a/migration-local.c
> +++ b/migration-local.c
> @@ -50,6 +50,8 @@ typedef struct QEMUFileLocal {
> bool unix_page_flipping;
> } QEMUFileLocal;
>
> +static bool pipefd_passed;
> +
> static int qemu_local_get_sockfd(void *opaque)
> {
> QEMUFileLocal *s = opaque;
> @@ -57,16 +59,76 @@ static int qemu_local_get_sockfd(void *opaque)
> return s->sockfd;
> }
>
> +static int unix_msgfd_lookup(void *opaque, struct msghdr *msg)
> +{
> + QEMUFileLocal *s = opaque;
> + struct cmsghdr *cmsg;
> + bool found = false;
> +
> + for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
> + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
> + cmsg->cmsg_level != SOL_SOCKET ||
> + cmsg->cmsg_type != SCM_RIGHTS)
> + continue;
> +
> + /* PIPE file descriptor to be received */
> + s->pipefd[0] = *((int *)CMSG_DATA(cmsg));
> + }
> +
> + if (s->pipefd[0] <= 0) {
And this should be if (s->pipefd[0] < 0)..
> + fprintf(stderr, "no pipe fd can be received\n");
> + return found;
> + }
> +
> + DPRINTF("pipefd successfully received\n");
> + return s->pipefd[0];
> +}
> +
> static int qemu_local_get_buffer(void *opaque, uint8_t *buf,
> int64_t pos, int size)
> {
> QEMUFileLocal *s = opaque;
> ssize_t len;
> + struct msghdr msg = { NULL, };
> + struct iovec iov[1];
> + union {
> + struct cmsghdr cmsg;
> + char control[CMSG_SPACE(sizeof(int))];
> + } msg_control;
> +
> + iov[0].iov_base = buf;
> + iov[0].iov_len = size;
> +
> + msg.msg_iov = iov;
> + msg.msg_iovlen = 1;
> + msg.msg_control = &msg_control;
> + msg.msg_controllen = sizeof(msg_control);
>
> for (;;) {
> - len = qemu_recv(s->sockfd, buf, size, 0);
> - if (len != -1) {
> - break;
> + if (!pipefd_passed) {
> + /*
> + * recvmsg is called here to catch the control message for
> + * the exchange of PIPE file descriptor until it is received.
> + */
> + len = recvmsg(s->sockfd, &msg, 0);
> + if (len != -1) {
> + if (unix_msgfd_lookup(s, &msg) > 0) {
> + pipefd_passed = 1;
> + /*
> + * Do not count one byte taken by the PIPE file
> + * descriptor.
> + */
> + len--;
> + } else {
> + len = -1;
> + }
I just found that this 'else' should go away: it breaks normal Unix
migration, since pipefd_passed will always be 0 in that case. I have
fixed this in my code, but it seems I sent the wrong version for some
reason. Sorry about this... :-[
> + break;
> + }
> + } else {
> + len = qemu_recv(s->sockfd, buf, size, 0);
> + if (len != -1) {
> + break;
> + }
> }
>
> if (socket_error() == EAGAIN) {
--
Lei
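
The two corrections above (compare against "< 0" because 0 is a legal descriptor, and do not treat a message without a descriptor as an error) might look like this in a standalone lookup helper. The name lookup_passed_fd is hypothetical; the sketch returns -1 when no SCM_RIGHTS message is present, which the caller would treat as ordinary stream data.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Return the descriptor carried in msg, or -1 if there is none. */
static int lookup_passed_fd(struct msghdr *msg)
{
    struct cmsghdr *cmsg;
    int fd = -1;    /* start from an invalid value: 0 is a legal descriptor */

    assert(msg != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
            cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS) {
            continue;   /* not a single-descriptor SCM_RIGHTS message */
        }
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    }
    return fd;
}
```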
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
2013-11-21 9:11 ` [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer Lei Li
2013-11-26 11:30 ` Lei Li
@ 2013-11-26 11:31 ` Paolo Bonzini
2013-11-26 14:00 ` Lei Li
1 sibling, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:31 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
> + /*
> + * recvmsg is called here to catch the control message for
> + * the exchange of PIPE file descriptor until it is received.
> + */
> + len = recvmsg(s->sockfd, &msg, 0);
> + if (len != -1) {
> + if (unix_msgfd_lookup(s, &msg) > 0) {
> + pipefd_passed = 1;
> + /*
> + * Do not count one byte taken by the PIPE file
> + * descriptor.
> + */
> + len--;
I think adding a byte in the middle of the stream is not reliable.
Rather, you should transmit the socket always at the same place, for
example in the first call of qemu_local_save_ram, after it has written
the 64-bit field.
The matching code in qemu_local_ram_load will be like this:
static int qemu_local_ram_load(QEMUFile *f, void *opaque,
uint64_t flags)
{
QEMUFileLocal *s = opaque;
ram_addr_t addr;
struct iovec iov;
ssize_t ret = -EINVAL;
if (!s->pipefd_received) {
/*
* send_pipefd was called at this point, and it wrote one byte
* to the stream.
*/
qemu_get_byte(s);
s->pipefd_received = true;
}
if (pipefd_passed) {
...
}
return -EINVAL;
}
Also, please move pipefd_passed within QEMUFileLocal.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-21 9:11 ` [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping Lei Li
@ 2013-11-26 11:32 ` Paolo Bonzini
2013-11-26 12:03 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 11:32 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
> Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
> ---
> migration.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index 4ac466b..0f98ac1 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -579,10 +579,11 @@ static void *migration_thread(void *opaque)
> pending_size = qemu_savevm_state_pending(s->file, max_size);
> DPRINTF("pending size %" PRIu64 " max %" PRIu64 "\n",
> pending_size, max_size);
> - if (pending_size && pending_size >= max_size) {
> + if (pending_size && pending_size >= max_size &&
> + !runstate_needs_reset()) {
> qemu_savevm_state_iterate(s->file);
I'm not sure why you need this.
> } else {
> - int ret;
> + int ret = 0;
>
> DPRINTF("done iterating\n");
> qemu_mutex_lock_iothread();
> @@ -590,7 +591,10 @@ static void *migration_thread(void *opaque)
> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> old_vm_running = runstate_is_running();
>
> - ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> + if (!runstate_needs_reset()) {
> + ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> + }
This however is okay.
Paolo
> if (ret >= 0) {
> qemu_file_set_rate_limit(s->file, INT_MAX);
> qemu_savevm_state_complete(s->file);
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-26 11:32 ` Paolo Bonzini
@ 2013-11-26 12:03 ` Lei Li
2013-11-26 12:54 ` Paolo Bonzini
0 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-26 12:03 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 07:32 PM, Paolo Bonzini wrote:
> Il 21/11/2013 10:11, Lei Li ha scritto:
>> Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
>> ---
>> migration.c | 10 +++++++---
>> 1 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/migration.c b/migration.c
>> index 4ac466b..0f98ac1 100644
>> --- a/migration.c
>> +++ b/migration.c
>> @@ -579,10 +579,11 @@ static void *migration_thread(void *opaque)
>> pending_size = qemu_savevm_state_pending(s->file, max_size);
>> DPRINTF("pending size %" PRIu64 " max %" PRIu64 "\n",
>> pending_size, max_size);
>> - if (pending_size && pending_size >= max_size) {
>> + if (pending_size && pending_size >= max_size &&
>> + !runstate_needs_reset()) {
>> qemu_savevm_state_iterate(s->file);
> I'm not sure why you need this.
The adjustment here is to avoid the iteration stage for page flipping,
because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE is nonzero
and pending_size > max_size (which is 0) at the start.
In the previous version it was like this:
if (pending_size && pending_size >= max_size &&
!migrate_unix_page_flipping()) {
And you said 'This is a bit ugly but I understand the need. Perhaps "&&
!runstate_needs_reset()" like below?' :)
>
>> } else {
>> - int ret;
>> + int ret = 0;
>>
>> DPRINTF("done iterating\n");
>> qemu_mutex_lock_iothread();
>> @@ -590,7 +591,10 @@ static void *migration_thread(void *opaque)
>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> old_vm_running = runstate_is_running();
>>
>> - ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> + if (!runstate_needs_reset()) {
>> + ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> + }
> This however is okay.
>
> Paolo
>
>> if (ret >= 0) {
>> qemu_file_set_rate_limit(s->file, INT_MAX);
>> qemu_savevm_state_complete(s->file);
>>
>
--
Lei
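
The condition under discussion can be modelled as a small predicate. This is a simplified sketch with hypothetical names and an assumed 4096-byte page, not the code of migration_thread(): it only shows that with max_size equal to 0 at the start and a nonzero pending size, the extra needs_reset term is what keeps the loop out of the iteration stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Pending bytes for a given number of remaining dirty pages. */
static uint64_t pending_bytes(uint64_t remaining_pages)
{
    assert(remaining_pages <= UINT64_MAX / SKETCH_PAGE_SIZE);
    return remaining_pages * SKETCH_PAGE_SIZE;
}

/* True when the migration loop would enter the iteration stage. */
static bool should_iterate(uint64_t pending_size, uint64_t max_size,
                           bool needs_reset)
{
    return pending_size != 0 && pending_size >= max_size && !needs_reset;
}
```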
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit
2013-11-26 11:22 ` Paolo Bonzini
@ 2013-11-26 12:10 ` Lei Li
0 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-26 12:10 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 07:22 PM, Paolo Bonzini wrote:
> Il 21/11/2013 10:11, Lei Li ha scritto:
>> This patch implements save_page callback for the outside
>> of page flipping. It will write the address of the page
>> on the Unix socket and flip the page data on pipe by
>> vmsplice(). Every page address would have a header flag
>> RAM_SAVE_FLAG_HOOK, which will trigger the load hook to
>> receive it in incoming side as well.
>>
>> Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
>> ---
>> migration-local.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 54 insertions(+), 0 deletions(-)
>>
>> diff --git a/migration-local.c b/migration-local.c
>> index 0f0896b..14207e9 100644
>> --- a/migration-local.c
>> +++ b/migration-local.c
>> @@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void *opaque,
>> return 0;
>> }
>>
>> +static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,
>> + MemoryRegion *mr, ram_addr_t offset,
>> + size_t size, int *bytes_sent)
>> +{
>> + QEMUFileLocal *s = opaque;
>> + ram_addr_t current_addr = mr->ram_addr + offset;
>> + void *ram_addr;
>> + ssize_t ret;
>> +
>> + if (s->unix_page_flipping) {
>> + qemu_fflush(s->file);
>> + qemu_put_be64(s->file, RAM_SAVE_FLAG_HOOK);
>> +
>> + /* Write page address to unix socket */
>> + qemu_put_be64(s->file, current_addr);
>> +
> You can write current_addr | RAM_SAVE_FLAG_HOOK. The value will be in
> the flags argument of the hook_ram_load, you can extract it with "flags
> & ~RAM_SAVE_FLAG_HOOK". This cuts by half the data written to the Unix
> socket.
OK, thanks.
> Paolo
>
>> + ram_addr = memory_region_get_ram_ptr(mr) + offset;
>> +
>> + /* vmsplice page data to pipe */
>> + struct iovec iov = {
>> + .iov_base = ram_addr,
>> + .iov_len = size,
>> + };
>> +
>> + /*
>> + * The flag SPLICE_F_MOVE is introduced in kernel for the page
>> + * flipping feature in QEMU, which will movie pages rather than
>> + * copying, previously unused.
>> + *
>> + * If a move is not possible the kernel will transparently falls
>> + * back to copying data.
>> + *
>> + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
>> + * would occur.
>> + */
>> + ret = vmsplice(s->pipefd[1], &iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE);
>> + if (ret == -1) {
>> + if (errno != EAGAIN && errno != EINTR) {
>> + fprintf(stderr, "vmsplice save error: %s\n", strerror(errno));
>> + return ret;
>> + }
>> + } else {
>> + if (bytes_sent) {
>> + *bytes_sent = 1;
>> + }
>> + DPRINTF("block_offset: %lu, offset: %lu\n", block_offset, offset);
>> + return 0;
>> + }
>> + }
>> +
>> + return RAM_SAVE_CONTROL_NOT_SUPP;
>> +}
>> +
>> static const QEMUFileOps pipe_read_ops = {
>> .get_fd = qemu_local_get_sockfd,
>> .get_buffer = qemu_local_get_buffer,
>> @@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = {
>> .writev_buffer = qemu_local_writev_buffer,
>> .close = qemu_local_close,
>> .before_ram_iterate = qemu_local_send_pipefd,
>> + .save_page = qemu_local_save_ram
>> };
>>
>> QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
>>
>
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load
2013-11-26 11:25 ` Paolo Bonzini
@ 2013-11-26 12:11 ` Lei Li
0 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-26 12:11 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 07:25 PM, Paolo Bonzini wrote:
> Il 21/11/2013 10:11, Lei Li ha scritto:
>> +static int qemu_local_ram_load(QEMUFile *f, void *opaque,
>> + uint64_t flags)
>> +{
>> + QEMUFileLocal *s = opaque;
>> + ram_addr_t addr;
>> + struct iovec iov;
>> + ssize_t ret = -EINVAL;
>> +
>> + /*
>> + * PIPE file descriptor will be received by another callback
>> + * get_buffer.
>> + */
>> + if (pipefd_passed) {
>> + void *host;
>> + /*
>> + * Extract the page address from the 8-byte record and
>> + * read the page data from the pipe.
>> + */
>> + addr = qemu_get_be64(s->file);
>> + host = qemu_get_ram_ptr(addr);
>> +
>> + iov.iov_base = host;
>> + iov.iov_len = TARGET_PAGE_SIZE;
>> +
>> + /* The flag SPLICE_F_MOVE is introduced in kernel for the page
>> + * flipping feature in QEMU, which will movie pages rather than
>> + * copying, previously unused.
>> + *
>> + * If a move is not possible the kernel will transparently falls
>> + * back to copying data.
>> + *
>> + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
>> + * would occur.
>> + */
>> + ret = vmsplice(s->pipefd[0], &iov, 1, SPLICE_F_MOVE);
>> + if (ret == -1) {
>> + if (errno != EAGAIN && errno != EINTR) {
>> + fprintf(stderr, "vmsplice() load error: %s", strerror(errno));
>> + return ret;
>> + }
>> + DPRINTF("vmsplice load error\n");
>> + } else if (ret == 0) {
>> + DPRINTF(stderr, "load_page: zero read\n");
>> + }
>> +
>> + DPRINTF("vmsplice (read): %zu\n", ret);
>> + return ret;
>> + }
>> +
>> + return 0;
>> +}
> I think you need to return -EINVAL if there is no pipe.
Yes, you are right..
>
> Paolo
>
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE
2013-11-21 9:11 ` [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE Lei Li
@ 2013-11-26 12:28 ` Paolo Bonzini
2013-11-26 14:02 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 12:28 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 21/11/2013 10:11, Lei Li ha scritto:
>
> { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
DEBUG -> MEMORY_STALE is missing.
Paolo
> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
> + { RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
>
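
A reduced sketch of such a transition table with the missing entry added. Only the states visible in the hunk are modelled, and every identifier here is hypothetical (prefixed SK_ or sketch_ to make that clear); QEMU's own table is larger and is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SK_DEBUG,
    SK_RUNNING,
    SK_SUSPENDED,
    SK_FINISH_MIGRATE,
    SK_MEMORY_STALE,
    SK_MAX
} SketchRunState;

typedef struct {
    SketchRunState from;
    SketchRunState to;
} SketchTransition;

static const SketchTransition sketch_transitions[] = {
    { SK_DEBUG,     SK_SUSPENDED },
    { SK_DEBUG,     SK_MEMORY_STALE },    /* the entry the review asks for */
    { SK_RUNNING,   SK_SUSPENDED },
    { SK_SUSPENDED, SK_RUNNING },
    { SK_SUSPENDED, SK_FINISH_MIGRATE },
    { SK_SUSPENDED, SK_MEMORY_STALE },
};

/* Linear lookup: true if the table lists a from -> to transition. */
static bool sketch_transition_valid(SketchRunState from, SketchRunState to)
{
    size_t i;
    size_t n = sizeof(sketch_transitions) / sizeof(sketch_transitions[0]);

    assert(from < SK_MAX && to < SK_MAX);
    for (i = 0; i < n; i++) {
        if (sketch_transitions[i].from == from &&
            sketch_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```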
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-26 12:03 ` Lei Li
@ 2013-11-26 12:54 ` Paolo Bonzini
2013-11-26 13:53 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 12:54 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
Il 26/11/2013 13:03, Lei Li ha scritto:
>>>
>>> + if (pending_size && pending_size >= max_size &&
>>> + !runstate_needs_reset()) {
>>> qemu_savevm_state_iterate(s->file);
>> I'm not sure why you need this.
>
> The adjustment here is to avoid the iteration stage for page flipping.
> Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is
> not 0 and pending_size > max_size (0) at start.
It's still not clear to me that avoiding the iteration stage is
necessary. I think it's just an optimization to avoid scanning the
bitmap, but:
(1) Juan's bitmap optimization will make this mostly unnecessary
(2) getting good downtime from page flipping will require postcopy anyway.
> And you said 'This is a bit ugly but I understand the need. Perhaps "&&
> !runstate_needs_reset()" like below?' :)
Oops. I might have said this before thinking about postcopy and/or
before seeing the benchmark results from Juan's patches. If this part
of the patch is just an optimization, I'd rather leave it out for now.
Thanks for putting up with me. :)
Paolo
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-26 12:54 ` Paolo Bonzini
@ 2013-11-26 13:53 ` Lei Li
2013-11-26 14:11 ` Paolo Bonzini
0 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-26 13:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
On 11/26/2013 08:54 PM, Paolo Bonzini wrote:
> Il 26/11/2013 13:03, Lei Li ha scritto:
>>>> + if (pending_size && pending_size >= max_size &&
>>>> + !runstate_needs_reset()) {
>>>> qemu_savevm_state_iterate(s->file);
>>> I'm not sure why you need this.
>> The adjustment here is to avoid the iteration stage for page flipping.
>> Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is
>> not 0 and pending_size > max_size (0) at start.
> It's still not clear to me that avoiding the iteration stage is
The purpose is not just optimization, but to skip the iteration stage
for better alignment with the page flipping flow.
The current flow of page flipping basically has two stages:
1) ram_save_setup stage: it sends all the bytes of this stage
to the destination, and calls send_pipefd via ram_control_before_iterate
at the end of it.
2) ram_save_complete: it starts to transmit the RAM pages
in ram_save_block, and sends the device state after that.
So the current migration process needs to be adjusted to avoid
the iteration stage.
> necessary. I think it's just an optimization to avoid scanning the
> bitmap, but:
>
> (1) Juan's bitmap optimization will make this mostly unnecessary
>
> (2) getting good downtime from page flipping will require postcopy anyway.
>
>> And you said 'This is a bit ugly but I understand the need. Perhaps "&&
>> !runstate_needs_reset()" like below?' :)
> Oops. I might have said this before thinking about postcopy and/or
> before seeing the benchmark results from Juan's patches. If this part
> of the patch is just an optimization, I'd rather leave it out for now.
I am afraid that page flipping cannot proceed correctly without this.
>
> Thanks for putting up with me. :)
>
> Paolo
>
--
Lei
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
2013-11-26 11:31 ` Paolo Bonzini
@ 2013-11-26 14:00 ` Lei Li
2013-11-26 14:14 ` Paolo Bonzini
0 siblings, 1 reply; 45+ messages in thread
From: Lei Li @ 2013-11-26 14:00 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 07:31 PM, Paolo Bonzini wrote:
> Il 21/11/2013 10:11, Lei Li ha scritto:
>> + /*
>> + * recvmsg is called here to catch the control message for
>> + * the exchange of PIPE file descriptor until it is received.
>> + */
>> + len = recvmsg(s->sockfd, &msg, 0);
>> + if (len != -1) {
>> + if (unix_msgfd_lookup(s, &msg) > 0) {
>> + pipefd_passed = 1;
>> + /*
>> + * Do not count one byte taken by the PIPE file
>> + * descriptor.
>> + */
>> + len--;
> I think adding a byte in the middle of the stream is not reliable.
>
> Rather, you should transmit the socket always at the same place, for
> example in the first call of qemu_local_save_ram, after it has written
> the 64-bit field.
I guess by 'transmit the socket' you mean transmit the fd?
Sorry, I don't quite understand your suggestion here. Do you
mean calling send_pipefd in the first call of qemu_local_save_ram,
after it has written the 64-bit field, and in this way getting rid
of qemu_local_send_pipefd?
Currently, the fd control message is sent at the end of the stream
in the ram_save_setup stage, followed by the RAM pages. The control
message of the fd is always at the same place.
>
> The matching code in qemu_local_ram_load will be like this:
>
> static int qemu_local_ram_load(QEMUFile *f, void *opaque,
> uint64_t flags)
> {
> QEMUFileLocal *s = opaque;
> ram_addr_t addr;
> struct iovec iov;
> ssize_t ret = -EINVAL;
>
> if (!s->pipefd_received) {
> /*
> * send_pipefd was called at this point, and it wrote one byte
> * to the stream.
> */
> qemu_get_byte(f);
> s->pipefd_received = true;
> }
>
> if (pipefd_passed) {
> ...
> }
> return -EINVAL;
> }
>
> Also, please move pipefd_passed within QEMUFileLocal.
>
> Thanks,
>
> Paolo
>
--
Lei
* Re: [Qemu-devel] [PATCH 14/17] add new RunState RUN_STATE_MEMORY_STALE
2013-11-26 12:28 ` Paolo Bonzini
@ 2013-11-26 14:02 ` Lei Li
0 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-26 14:02 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 08:28 PM, Paolo Bonzini wrote:
> Il 21/11/2013 10:11, Lei Li ha scritto:
>>
>> { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
> DEBUG -> MEMORY_STALE is missing.
Good catch, I will add it, thanks. :)
>
> Paolo
>
>> { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
>> { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
>> { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
>> + { RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
>>
>
--
Lei
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-26 13:53 ` Lei Li
@ 2013-11-26 14:11 ` Paolo Bonzini
2013-11-28 8:19 ` Lei Li
0 siblings, 1 reply; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 14:11 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, mdroth, mrhines, qemu-devel, aliguori,
lagarcia, rcj
Il 26/11/2013 14:53, Lei Li ha scritto:
> 1) ram_save_setup stage, it will send all the bytes in this stages
> to destination, and send_pipefd by ram_control_before_iterate
> at the end of it.
ram_save_setup doesn't send anything from guest RAM. It sends the
lengths of the various blocks. As you said, at the end of
ram_save_setup you send the pipefd.
ram_save_iterate runs before ram_save_complete. ram_save_iterate and
ram_save_complete write data with exactly the same format. Both of them
can use ram_save_page.
It should not matter if some pages are sent as part of ram_save_iterate
and others as part of ram_save_complete.
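As a toy illustration of that point (not code from QEMU or from this
series; every name below is hypothetical), the model here sends some
dirty pages in an "iterate" stage and the rest in a "complete" stage,
using one shared record format. The loader applies records without
knowing which stage produced them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  8
#define PAGE_SZ 16   /* toy page size, not TARGET_PAGE_SIZE */

/* One record format, shared by both stages. */
struct record {
    size_t index;
    unsigned char data[PAGE_SZ];
};

static unsigned char src_ram[NPAGES][PAGE_SZ];
static unsigned char dst_ram[NPAGES][PAGE_SZ];

/* Append up to 'limit' dirty pages to the stream; return new length. */
static size_t send_dirty(struct record *stream, size_t pos,
                         int *dirty, size_t limit)
{
    size_t sent = 0;

    for (size_t i = 0; i < NPAGES && sent < limit; i++) {
        if (!dirty[i]) {
            continue;
        }
        stream[pos].index = i;
        memcpy(stream[pos].data, src_ram[i], PAGE_SZ);
        pos++;
        dirty[i] = 0;
        sent++;
    }
    return pos;
}

/* Returns 0 if the destination ends up identical to the source. */
int toy_migrate(size_t iterate_budget)
{
    struct record stream[NPAGES];
    int dirty[NPAGES];
    size_t n = 0;

    for (size_t i = 0; i < NPAGES; i++) {
        memset(src_ram[i], (int)(i + 1), PAGE_SZ);
        memset(dst_ram[i], 0, PAGE_SZ);
        dirty[i] = 1;
    }

    n = send_dirty(stream, n, dirty, iterate_budget); /* "iterate" stage */
    n = send_dirty(stream, n, dirty, NPAGES);         /* "complete" stage */
    assert(n == NPAGES);   /* every page was sent exactly once */

    /* The loader only sees records, never the stage that sent them. */
    for (size_t k = 0; k < n; k++) {
        memcpy(dst_ram[stream[k].index], stream[k].data, PAGE_SZ);
    }
    return memcmp(src_ram, dst_ram, sizeof(src_ram)) == 0 ? 0 : -1;
}
```

Calling toy_migrate() with a budget of 0, 3, or NPAGES gives the same
result on the destination; only the split between the stages changes.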
One possibility is that you are hitting a bug due to the way you ignore
the "0x01" byte that send_pipefd places on the socket.
>> Oops. I might have said this before thinking about postcopy and/or
>> before seeing the benchmark results from Juan's patches. If this part
>> of the patch is just an optimization, I'd rather leave it out for now.
>
> I am afraid that page flipping can not proceed correctly without this..
I really would like to understand why, because it really shouldn't (this
shouldn't be a place where you need a hook).
Paolo
* Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
2013-11-26 14:00 ` Lei Li
@ 2013-11-26 14:14 ` Paolo Bonzini
0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2013-11-26 14:14 UTC (permalink / raw)
To: Lei Li
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
Il 26/11/2013 15:00, Lei Li ha scritto:
>>>
>> I think adding a byte in the middle of the stream is not reliable.
>>
>> Rather, you should transmit the socket always at the same place, for
>> example in the first call of qemu_local_save_ram, after it has written
>> the 64-bit field.
>
> I guess 'transmit the socket' you mean transmit the fd?
Yes.
> Sorry that I am quite understand your suggestion here.. Do you
> mean that send_pipefd in the first call of qemu_local_save_ram
> after it has written the 64-bit field? In this way, get rid of
> qemu_local_send_pipefd?
Yes. This way you know exactly where to "eat" the byte that's written
with sendmsg.
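A minimal sketch of the idea, assuming a plain AF_UNIX stream socket
(this is not the code from the series; toy_send_fd, toy_recv_fd and
toy_fd_roundtrip are hypothetical names). The descriptor travels as
SCM_RIGHTS ancillary data next to a single 0x01 payload byte, and the
receiver asks recvmsg for exactly one byte at the agreed position, so
the marker never mixes with the surrounding stream data:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Pass 'fd' over 'sock' together with one 0x01 payload byte. */
static int toy_send_fd(int sock, int fd)
{
    char byte = 0x01;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Consume exactly the marker byte and return the passed fd, or -1. */
static int toy_recv_fd(int sock)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1 || byte != 0x01) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Stream layout: 8-byte header, fd marker byte, then 3 data bytes. */
int toy_fd_roundtrip(void)
{
    int sv[2], p[2], got;
    char hdr[8] = "HEADER!";
    char buf[8];
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(p) < 0) {
        return -1;
    }
    /* Sender: header, then the fd, then ordinary data. */
    if (write(sv[0], hdr, 8) != 8 || toy_send_fd(sv[0], p[0]) < 0 ||
        write(sv[0], "RAM", 3) != 3) {
        return -1;
    }
    /* Receiver: the marker byte is eaten at a known position. */
    if (read(sv[1], buf, 8) != 8 || memcmp(buf, hdr, 8) != 0) {
        return -1;
    }
    got = toy_recv_fd(sv[1]);
    if (got < 0 || read(sv[1], buf, 3) != 3 || memcmp(buf, "RAM", 3) != 0) {
        return -1;
    }
    /* The received descriptor refers to the same pipe as p[0]. */
    if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1 || c != 'x') {
        return -1;
    }
    close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return 0;
}
```

Error paths in this sketch leak descriptors; real code would clean up.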
Paolo
> Currently, the fd control message is sent at the end of the stream
> in ram_save_setup stage, followed by the ram page. The control
> message of fd is always at the same place.
* Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
2013-11-26 11:17 ` Paolo Bonzini
@ 2013-11-27 16:48 ` Andrea Arcangeli
0 siblings, 0 replies; 45+ messages in thread
From: Andrea Arcangeli @ 2013-11-27 16:48 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Lei Li, quintela, qemu-devel, mrhines, mdroth, Orit Wasserman,
aliguori, lagarcia, rcj
On Tue, Nov 26, 2013 at 12:17:09PM +0100, Paolo Bonzini wrote:
> Il 26/11/2013 12:07, Lei Li ha scritto:
> > For this, I am not quite sure I understand it correctly, seems the latest
> > update of post copy migration was sent on last Oct, would you please give
> > some insights on what else could I do for the coupling with postcopy
> > migration?
>
> I don't know the state exactly. Orit and Andrea should know.
Ok, about the last update sent, so I'm not optimistic the kernel
backend is good because it uses a device driver that allocates the
memory locally and effectively disables THP KSM swap compression
overcommit and automatic NUMA balancing.
I wrote a new kernel backend by introducing two new kernel features:
1) MADV_USERFAULT (to deliver the KVM/qemu page fault to qemu userland)
2) remap_anon_pages (new syscall that qemu will use inside the
migration thread that gets out of band events from the userland
page fault, and also to do the background network transfer of all
RAM while the guest already runs on the destination node)
Now you use vmsplice so you don't need remap_anon_pages in your case.
You only need MADV_USERFAULT.
I added a FOLL_USERFAULT too, as if it's KVM trapping on it, it will
have to deliver the fault to qemu through a vmexit and it's not doing
that yet. KVM page faults calling gup_fast, will have to use
FOLL_USERFAULT. This also means changing the API of all gup_fast to
get a "foll" parameter, but we need to do that anyway to remove the
FOLL_GET and fix /dev/mem mapped as guest physical memory (FOLL_GET on
/dev/mem backfires), and to speed up the page fault too by avoiding those
useless get_page/put_page calls during every fault (MMU notifiers don't
require FOLL_GET or any page reference at any time, as long as the page
goes in the spte and the proper spte locks are held to serialize
against the MMU notifier events).
For the non-local case, remap_anon_pages should be faster than
vmsplice as it doesn't need to pass through a pipe and just mangles
two pagetables and two pmds based on the virtual address given as
parameter.
If you want to review the kernel backend I implemented for postcopy,
this is updated on my latest aa.git tree:
http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=e69e1067f1d7e0f441c0c222a1017a07afe0bfc9
http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=d182b5118e2b22dd73018b75dce027c4ebabce14
I also looked into sharing code with the volatile range work for Android
temporary page mappings that can be discarded, but that has various
reasons to want to put placeholders into the pagetable. The
functionality is different too, which is why the volatile range needs
to put placeholders into the empty pagetables, after all...
I don't think we can use the volatile range because that would discard
the pages too. MADV_USERFAULT is also somewhat simpler and it provides
just the user fault functionality (it cannot discard the pages). It
sends a sigbus instead of mapping a zero page and it doesn't even
require to allocate empty pagetables for the userfault range.
Once the live migration is complete MADV_USERFAULT should be cleared
from the vma simply with an madvise call, and any sign of it will go
away (unlike the device driver that stays forever). And once postcopy
completes all RAM is already entirely anonymous, already backed by THP
(if the out of band network transfers are 2M large it'll create 2M
pages in zero copy and there will never be any sign of 4k pages for the
whole duration of migration) and the userfaulted memory can be NUMA
migrated or swapped out at any time. MADV_USERFAULT doesn't interfere
with swapouts.
remap_anon_pages also doesn't interfere with swapouts or automatic
NUMA migrations: if the received page gets swapped out before the
migration thread maps it in the guest physical address space, the
swap entry is transferred from the temporary address to the guest
physical address still with a single copy that reads and writes 8
bytes (just 1 cacheline written, modulo PT locks), and no I/O
triggers.
It would have been possible to also extend remap_file_pages to work on
anonymous memory instead of only nonlinear file mappings, however that
would alter the API as it wouldn't return -EINVAL anymore. It's easy
to change things if we want to use remap_file_pages for anonymous
memory too. Some larger discussion on the API details will be needed
but we're not at that point yet I think, and currently I'm more
interested to sort out the lowlevel details first, the kernel backend
API should be frozen at the last possible moment I think.
The qemu userland details of postcopy using the new kernel features are
still not finished, but conceptually the design is pretty clear.
This is far from definitive, if somebody has better ideas, please
comment of course.
Thanks,
Andrea
* Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
2013-11-26 14:11 ` Paolo Bonzini
@ 2013-11-28 8:19 ` Lei Li
0 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-28 8:19 UTC (permalink / raw)
To: Paolo Bonzini
Cc: aarcange, quintela, qemu-devel, mrhines, mdroth, aliguori,
lagarcia, rcj
On 11/26/2013 10:11 PM, Paolo Bonzini wrote:
> Il 26/11/2013 14:53, Lei Li ha scritto:
>> 1) ram_save_setup stage, it will send all the bytes in this stages
>> to destination, and send_pipefd by ram_control_before_iterate
>> at the end of it.
> ram_save_setup runs doesn't send anything from guest RAM. It sends the
> lengths of the various blocks. As you said, at the end of
> ram_save_setup you send the pipefd.
>
> ram_save_iterate runs before ram_save_complete. ram_save_iterate and
> ram_save_complete write data with exactly the same format. Both of them
> can use ram_save_page
>
> It should not matter if some pages are sent as part of ram_save_iterate
> and others as part of ram_save_complete.
>
> One possibility is that you are hitting a bug due to the way you ignore
> the "0x01" byte that send_pipefd places on the socket.
>
>>> Oops. I might have said this before thinking about postcopy and/or
>>> before seeing the benchmark results from Juan's patches. If this part
>>> of the patch is just an optimization, I'd rather leave it out for now.
>> I am afraid that page flipping can not proceed correctly without this..
> I really would like to understand why, because it really shouldn't (this
> shouldn't be a place where you need a hook).
Hi Paolo,
Sorry for the late reply.
Yes, you are right! I just tried with this adjustment removed, and it
works well.
I remember that, when debugging a previous version, migration could not
proceed correctly without it, although in theory it should have, as you
explain above. I guess the only answer is that there was a bug in the
handling of the one-byte fd control message, just like the possibility
you listed!
>
> Paolo
>
>
--
Lei
* [Qemu-devel] [PATCH 14/17] add new RunState RUN_STATE_MEMORY_STALE
2013-11-29 10:06 [Qemu-devel] [PATCH 0/17 v4] " Lei Li
@ 2013-11-29 10:06 ` Lei Li
0 siblings, 0 replies; 45+ messages in thread
From: Lei Li @ 2013-11-29 10:06 UTC (permalink / raw)
To: qemu-devel
Cc: aarcange, Lei Li, quintela, mrhines, aliguori, lagarcia, pbonzini,
rcj
Introduce new RunState RUN_STATE_MEMORY_STALE and
add it to runstate_needs_reset().
Signed-off-by: Lei Li <lilei@linux.vnet.ibm.com>
---
qapi-schema.json | 7 +++++--
vl.c | 13 ++++++++++++-
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/qapi-schema.json b/qapi-schema.json
index b290a0f..4d9e712 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -176,12 +176,15 @@
# @watchdog: the watchdog action is configured to pause and has been triggered
#
# @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @memory-stale: guest is paused to start the unix_page_flipping migration
+# process; the destination QEMU will have the newer contents of the memory
##
{ 'enum': 'RunState',
'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
- 'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
- 'guest-panicked' ] }
+ 'running', 'save-vm', 'shutdown', 'suspended', 'memory-stale',
+ 'watchdog', 'guest-panicked' ] }
##
# @SnapshotInfo
diff --git a/vl.c b/vl.c
index 8d5d874..3ea96b2 100644
--- a/vl.c
+++ b/vl.c
@@ -601,6 +601,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_PAUSED, RUN_STATE_RUNNING },
{ RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_PAUSED, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
{ RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -608,6 +609,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
{ RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
{ RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
+ { RUN_STATE_PRELAUNCH, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
{ RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
@@ -624,23 +626,31 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
{ RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
{ RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
+ { RUN_STATE_RUNNING, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
{ RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
{ RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_SHUTDOWN, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
+ { RUN_STATE_DEBUG, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
{ RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
{ RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
{ RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_WATCHDOG, RUN_STATE_MEMORY_STALE },
{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
{ RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
+ { RUN_STATE_GUEST_PANICKED, RUN_STATE_MEMORY_STALE },
+ { RUN_STATE_MEMORY_STALE, RUN_STATE_RUNNING },
+ { RUN_STATE_MEMORY_STALE, RUN_STATE_POSTMIGRATE },
{ RUN_STATE_MAX, RUN_STATE_MAX },
};
@@ -685,7 +695,8 @@ int runstate_is_running(void)
bool runstate_needs_reset(void)
{
return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
- runstate_check(RUN_STATE_SHUTDOWN);
+ runstate_check(RUN_STATE_SHUTDOWN) ||
+ runstate_check(RUN_STATE_MEMORY_STALE);
}
StatusInfo *qmp_query_status(Error **errp)
--
1.7.7.6
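For readers unfamiliar with how a transition table like the one in this
patch is typically consumed, here is a small self-contained sketch. It
is a simplified model, not the code in vl.c: the toy_ names are
hypothetical, and only a handful of the transitions shown in the diff
are included. The list is expanded once into a lookup matrix, and a
state change is allowed only if its (from, to) pair was listed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-in for the RunState enum. */
enum toy_state {
    ST_PAUSED,
    ST_RUNNING,
    ST_POSTMIGRATE,
    ST_MEMORY_STALE,
    ST_MAX
};

struct toy_transition {
    enum toy_state from;
    enum toy_state to;
};

/* A few of the pairs from the patch, ending with a sentinel entry. */
static const struct toy_transition toy_transitions[] = {
    { ST_PAUSED,       ST_RUNNING },
    { ST_PAUSED,       ST_MEMORY_STALE },
    { ST_POSTMIGRATE,  ST_RUNNING },
    { ST_RUNNING,      ST_MEMORY_STALE },
    { ST_MEMORY_STALE, ST_RUNNING },
    { ST_MEMORY_STALE, ST_POSTMIGRATE },
    { ST_MAX,          ST_MAX },
};

/* Zero-initialized: anything not listed above stays forbidden. */
static bool toy_valid[ST_MAX][ST_MAX];

/* Expand the list into the matrix; call once at startup. */
void toy_runstate_init(void)
{
    const struct toy_transition *p;

    for (p = toy_transitions; p->from != ST_MAX; p++) {
        toy_valid[p->from][p->to] = true;
    }
}

/* True if moving from 'from' to 'to' was listed in the table. */
bool toy_transition_ok(enum toy_state from, enum toy_state to)
{
    assert(from < ST_MAX && to < ST_MAX);
    return toy_valid[from][to];
}
```

With this model, a missing pair such as the DEBUG -> MEMORY_STALE entry
pointed out in review would simply show up as a rejected transition.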