* [PATCH 1 of 4] libxl: include domain id in userdata path
2010-09-06 10:03 [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate Ian Campbell
From: Ian Campbell @ 2010-09-06 10:03 UTC
To: xen-devel; +Cc: Brendan Cully
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID cbba0599e0a1728e524ba640bdcea7d4af05ddd5
# Parent 7b69ef39c61bc056aa0c9751e523ce6d9d3fc47f
libxl: include domain id in userdata path.
The userdata is specific to a particular incarnation of a domain and
the path is therefore required to be unique to each incarnation. If
the user has explicitly configured a UUID in their domain
configuration then the path is no longer unique, since changeset
22124:22366e13f76d ("xl: randomly generate UUIDs") (correctly)
caused the uuid domain configuration option to be obeyed.
If userdata is not unique to each incarnation of a domain then
localhost live migration is broken, because the target is created (and
writes its userdata) before the sender destroys the domain (and
deletes its userdata). Since both incarnations share the same path,
the sender's cleanup removes the userdata the target has just written.
Strictly speaking I think the UUID is now unnecessary, but it is perhaps
helpful to people looking in the userdata directory, for debugging
etc.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
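A minimal C sketch of the resulting path format, not libxl code: it assumes the format string from the hunk below, "/var/lib/xen/userdata-%s.%u.%s.%s", and uses a hypothetical helper `format_userdata_path()` built on plain `snprintf()` in place of `libxl_sprintf()`. With the domain id included, two incarnations that share a configured UUID get distinct paths.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxl): builds a path of the form
 *   /var/lib/xen/userdata-<wh>.<domid>.<uuid>.<userid>
 * mirroring the format string introduced by this patch.
 * Returns 0 on success, -1 on encoding error or truncation.
 */
static int format_userdata_path(char *buf, size_t len, const char *wh,
                                unsigned int domid, const char *uuid,
                                const char *userid)
{
    int n = snprintf(buf, len, "/var/lib/xen/userdata-%s.%u.%s.%s",
                     wh, domid, uuid, userid);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

For example, `wh` "d", domid 1, UUID 00000000-0000-0000-0000-000000000001 and userid "xl" give /var/lib/xen/userdata-d.1.00000000-0000-0000-0000-000000000001.xl, while the same inputs with domid 2 give a different path.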
diff -r 7b69ef39c61b -r cbba0599e0a1 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxl/libxl_dom.c Mon Sep 06 10:54:51 2010 +0100
@@ -466,8 +466,8 @@ static const char *userdata_path(libxl_g
uuid_string = libxl_sprintf(gc, LIBXL_UUID_FMT, LIBXL_UUID_BYTES(info.uuid));
path = libxl_sprintf(gc, "/var/lib/xen/"
- "userdata-%s.%s.%s",
- wh, uuid_string, userdata_userid);
+ "userdata-%s.%u.%s.%s",
+ wh, domid, uuid_string, userdata_userid);
if (!path)
XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "unable to allocate for"
" userdata path");
* [PATCH 2 of 4] xl: do not return to caller from monitoring daemon
From: Ian Campbell @ 2010-09-06 10:03 UTC
To: xen-devel; +Cc: Brendan Cully
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID 5f96f36feebdb87eaadbbcab0399f32eda86f735
# Parent cbba0599e0a1728e524ba640bdcea7d4af05ddd5
xl: do not return to caller from monitoring daemon
The parent process will have returned to the caller and done whatever
is necessary. The daemon should not return, otherwise it will repeat
this work. In the case of the migration receiver this causes it to try
to take part in the migration protocol long after the sender and parent
processes have completed it, leading to confusing error messages
(although, strangely, not much actual damage).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r cbba0599e0a1 -r 5f96f36feebd tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxl/xl_cmdimpl.c Mon Sep 06 10:54:51 2010 +0100
@@ -1645,6 +1645,14 @@ waitpid_out:
if (child_console_pid > 0 &&
waitpid(child_console_pid, &status, 0) < 0 && errno == EINTR)
goto waitpid_out;
+
+ /*
+ * If we have daemonized then do not return to the caller -- this has
+ * already happened in the parent.
+ */
+ if ( !need_daemon )
+ exit(ret);
+
return ret;
}
* [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end
From: Ian Campbell @ 2010-09-06 10:03 UTC
To: xen-devel; +Cc: Brendan Cully
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID bdf8ce09160d715451e1204babe5f80886ea6183
# Parent 5f96f36feebdb87eaadbbcab0399f32eda86f735
libxc: provide notification of final checkpoint to restore end
When the restore code sees this notification it will commit to
restoring the checkpoint currently in progress as soon as that
checkpoint completes.
This allows the restore end to finish up without waiting for a
spurious timeout on the receive fd and thereby avoids unnecessary
error logging in the case of a successful migration or restore.
In the normal migration or restore case the first checkpoint is always
the last. For a rolling checkpoint (such as Remus) the notification is
currently unused, but it could be used in the future, for example to
provide a controlled failover for reasons other than error.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Brendan Cully <brendan@cs.ubc.ca>
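The stream-level effect of this patch can be sketched as follows: a non-checkpointed sender writes the XC_SAVE_ID_LAST_CHECKPOINT chunk id (-9) immediately before the zero terminator, and the receiver records that it saw it. This is a simplified C model, not libxc code: `write_all()`, `read_all()`, `send_tail()` and `recv_tail()` are hypothetical stand-ins for `wrexact()`, `read_exact()` and the relevant parts of `xc_domain_save()` and `pagebuf_get_one()`, and page batches are not modelled.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Chunk id value as added to xg_save_restore.h by this patch. */
#define XC_SAVE_ID_LAST_CHECKPOINT (-9)

/* Hypothetical stand-in for wrexact(): write exactly len bytes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical stand-in for read_exact(): read exactly len bytes. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;          /* error or premature EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sender: a non-checkpointed save flags its only checkpoint as the last. */
static int send_tail(int fd, int checkpointed)
{
    int i;
    if (!checkpointed) {
        i = XC_SAVE_ID_LAST_CHECKPOINT;
        if (write_all(fd, &i, sizeof(int)))
            return -1;
    }
    i = 0;                      /* zero terminator */
    return write_all(fd, &i, sizeof(int));
}

/*
 * Receiver: consume chunk ids up to the zero terminator, recording
 * whether the last-checkpoint marker was seen. Any other id is treated
 * as an error in this simplified stream.
 */
static int recv_tail(int fd, int *last_checkpoint)
{
    int id;
    *last_checkpoint = 0;
    for (;;) {
        if (read_all(fd, &id, sizeof(int)))
            return -1;
        if (id == 0)
            return 0;
        if (id != XC_SAVE_ID_LAST_CHECKPOINT)
            return -1;
        *last_checkpoint = 1;
    }
}
```

In the patch itself, the receiver only switches the fd to O_NONBLOCK while this flag is clear, and jumps to `finish` once the current checkpoint completes when it is set.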
diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100
@@ -42,6 +42,7 @@ struct restore_ctx {
xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region. */
int completed; /* Set when a consistent image is available */
+ int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
struct domain_info_context dinfo;
};
@@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface
// DPRINTF("console pfn location: %llx\n", buf->console_pfn);
return pagebuf_get_one(xch, ctx, buf, fd, dom);
+ case XC_SAVE_ID_LAST_CHECKPOINT:
+ ctx->last_checkpoint = 1;
+ // DPRINTF("last checkpoint indication received");
+ return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
default:
if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
goto out;
}
ctx->completed = 1;
- /* shift into nonblocking mode for the remainder */
- if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
- flags = 0;
- fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
+
+ /*
+ * If more checkpoints are expected then shift into
+ * nonblocking mode for the remainder.
+ */
+ if ( !ctx->last_checkpoint )
+ {
+ if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
+ flags = 0;
+ fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
+ }
+ }
+
+ if ( ctx->last_checkpoint )
+ {
+ // DPRINTF("Last checkpoint, finishing\n");
+ goto finish;
}
// DPRINTF("Buffered checkpoint\n");
diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_save.c Mon Sep 06 10:54:51 2010 +0100
@@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
}
}
+ if ( !callbacks->checkpoint )
+ {
+ /*
+ * If this is not a checkpointed save then this must be the first and
+ * last checkpoint.
+ */
+ i = XC_SAVE_ID_LAST_CHECKPOINT;
+ if ( wrexact(io_fd, &i, sizeof(int)) )
+ {
+ PERROR("Error when writing last checkpoint chunk");
+ goto out;
+ }
+ }
+
/* Zero terminate */
i = 0;
if ( wrexact(io_fd, &i, sizeof(int)) )
diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
--- a/tools/libxc/xg_save_restore.h Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xg_save_restore.h Mon Sep 06 10:54:51 2010 +0100
@@ -131,6 +131,7 @@
#define XC_SAVE_ID_TMEM_EXTRA -6
#define XC_SAVE_ID_TSC_INFO -7
#define XC_SAVE_ID_HVM_CONSOLE_PFN -8 /* (HVM-only) */
+#define XC_SAVE_ID_LAST_CHECKPOINT -9 /* Commit to restoring after completion of current iteration. */
/*
** We process save/restore/migrate in batches of pages; the below
* [PATCH 4 of 4] libxc: restore: reset I/O fd to flags to back to state caller passed us
From: Ian Campbell @ 2010-09-06 10:03 UTC
To: xen-devel; +Cc: Brendan Cully
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID b93e43ba481f62026991c5a6afb85cd395059505
# Parent bdf8ce09160d715451e1204babe5f80886ea6183
libxc: restore: reset I/O fd flags back to the state the caller passed us
In particular this causes us to turn O_NONBLOCK back off if we set it.
The caller may continue to use the fd for its own protocol needs and
may not be prepared to have it become non-blocking.
This probably only affects Remus now, after my previous patch to signal
the last checkpoint, since a regular migration will no longer set the
fd non-blocking.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
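The save-and-restore pattern for the fd's file status flags can be sketched in C like this. `restore_like()` is a hypothetical function, not part of libxc, and its `saw_nonblock` out-parameter exists only to show that O_NONBLOCK is set while further checkpoints are expected and cleared again before returning.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical sketch: remember the caller's file status flags on
 * entry, add O_NONBLOCK only while more checkpoints are expected, and
 * put the original flags back before returning.
 */
static int restore_like(int io_fd, int more_checkpoints_expected,
                        int *saw_nonblock)
{
    int orig_flags, flags_now;

    orig_flags = fcntl(io_fd, F_GETFL, 0);
    if (orig_flags < 0)
        return -1;              /* cannot read flags: give up early */

    if (more_checkpoints_expected &&
        fcntl(io_fd, F_SETFL, orig_flags | O_NONBLOCK) < 0)
        return -1;

    /* ... the checkpoint receive loop would run here ... */
    flags_now = fcntl(io_fd, F_GETFL, 0);
    *saw_nonblock = flags_now >= 0 && (flags_now & O_NONBLOCK) != 0;

    /* Hand the fd back exactly as the caller passed it in. */
    if (fcntl(io_fd, F_SETFL, orig_flags) < 0)
        return -1;
    return 0;
}
```

Reading the flags once on entry, rather than re-reading them at the point where O_NONBLOCK is added, means the value restored at the end is guaranteed to be the caller's original.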
diff -r bdf8ce09160d -r b93e43ba481f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_restore.c Mon Sep 06 10:54:51 2010 +0100
@@ -1094,6 +1094,8 @@ int xc_domain_restore(xc_interface *xch,
void* vcpup;
uint64_t console_pfn = 0;
+ int orig_io_fd_flags;
+
static struct restore_ctx _ctx = {
.live_p2m = NULL,
.p2m = NULL,
@@ -1110,6 +1112,11 @@ int xc_domain_restore(xc_interface *xch,
if ( superpages )
return 1;
+
+ if ( (orig_io_fd_flags = fcntl(io_fd, F_GETFL, 0)) < 0 ) {
+ PERROR("unable to read IO FD flags");
+ goto out;
+ }
if ( read_exact(io_fd, &dinfo->p2m_size, sizeof(unsigned long)) )
{
@@ -1294,7 +1301,6 @@ int xc_domain_restore(xc_interface *xch,
// DPRINTF("Received all pages (%d races)\n", nraces);
if ( !ctx->completed ) {
- int flags = 0;
if ( buffer_tail(xch, ctx, &tailbuf, io_fd, max_vcpu_id, vcpumap,
ext_vcpucontext) < 0 ) {
@@ -1308,11 +1314,7 @@ int xc_domain_restore(xc_interface *xch,
* nonblocking mode for the remainder.
*/
if ( !ctx->last_checkpoint )
- {
- if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
- flags = 0;
- fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
- }
+ fcntl(io_fd, F_SETFL, orig_io_fd_flags | O_NONBLOCK);
}
if ( ctx->last_checkpoint )
@@ -1805,8 +1807,10 @@ int xc_domain_restore(xc_interface *xch,
/* discard cache for save file */
discard_file_cache(xch, io_fd, 1 /*flush*/);
+ fcntl(io_fd, F_SETFL, orig_io_fd_flags);
+
DPRINTF("Restore exit with rc=%d\n", rc);
-
+
return rc;
}
/*