[PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate

* [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate
@ 2010-09-06 10:03 Ian Campbell
  2010-09-06 10:03 ` [PATCH 1 of 4] libxl: include domain id in userdata path Ian Campbell
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-06 10:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

This series replaces "libxc: succeed silently on restore" from yesterday.

As well as adding an explicit "final checkpoint" notification chunk it
also includes some tweaks to progress logging to be more pleasing and a
fix to xl so it does not re-run the bootloader on restore.

Changes since previous posting:
- Rebased past fde833c66948 "xl: do not continue in the child and exec
  xenconsole in the parent".
- Prepended fix for issue exposed by 22366e13f76d "xl: randomly
  generate UUIDs".
- Added Brendan's Ack to patch 3/4.

* [PATCH 1 of 4] libxl: include domain id in userdata path
  2010-09-06 10:03 [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate Ian Campbell
@ 2010-09-06 10:03 ` Ian Campbell
  2010-09-06 10:03 ` [PATCH 2 of 4] xl: do not return to caller from monitoring daemon Ian Campbell
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-06 10:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID cbba0599e0a1728e524ba640bdcea7d4af05ddd5
# Parent  7b69ef39c61bc056aa0c9751e523ce6d9d3fc47f
libxl: include domain id in userdata path.

The userdata is specific to a particular incarnation of a domain and
the path is therefore required to be unique to each incarnation. If
the user has explicitly configured a UUID in their domain
configuration then the path is no longer unique since
22124:22366e13f76d "xl: randomly generate UUIDs" which (correctly)
caused the uuid domain configuration option to be obeyed.

If userdata is not unique to each incarnation of a domain then
localhost live migration is broken because the target is created (and
writes its userdata) before the sender destroys the domain (and
deletes its userdata).

Strictly speaking I think the UUID is unnecessary but it is perhaps
helpful to people looking in the userdata directory, for debugging
etc.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
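
Editorial illustration (not part of the patch): a minimal sketch of why
adding the domain id makes the path unique per incarnation. Only the two
format strings are taken from the diff below; the helper names and the
sample values for wh, the UUID and the userid are hypothetical, and plain
snprintf stands in for libxl_sprintf.

```c
#include <stdio.h>
#include <string.h>

/* Old layout from the diff: userdata-<wh>.<uuid>.<userid> */
static void old_userdata_path(char *buf, size_t len, const char *wh,
                              const char *uuid, const char *userid)
{
    snprintf(buf, len, "/var/lib/xen/userdata-%s.%s.%s", wh, uuid, userid);
}

/* New layout from the diff: userdata-<wh>.<domid>.<uuid>.<userid> */
static void new_userdata_path(char *buf, size_t len, const char *wh,
                              unsigned int domid, const char *uuid,
                              const char *userid)
{
    snprintf(buf, len, "/var/lib/xen/userdata-%s.%u.%s.%s",
             wh, domid, uuid, userid);
}

/* Returns 1 if two incarnations would share one userdata file. */
static int paths_collide(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With a fixed UUID, the old layout gives the sender and the receiver of a
localhost migration the same path, while the new layout separates them as
long as the two incarnations have different domain ids.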

diff -r 7b69ef39c61b -r cbba0599e0a1 tools/libxl/libxl_dom.c
--- a/tools/libxl/libxl_dom.c	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxl/libxl_dom.c	Mon Sep 06 10:54:51 2010 +0100
@@ -466,8 +466,8 @@ static const char *userdata_path(libxl_g
     uuid_string = libxl_sprintf(gc, LIBXL_UUID_FMT, LIBXL_UUID_BYTES(info.uuid));

     path = libxl_sprintf(gc, "/var/lib/xen/"
-                         "userdata-%s.%s.%s",
-                         wh, uuid_string, userdata_userid);
+                         "userdata-%s.%u.%s.%s",
+                         wh, domid, uuid_string, userdata_userid);
     if (!path)
         XL_LOG_ERRNO(ctx, XL_LOG_ERROR, "unable to allocate for"
                      " userdata path");

* [PATCH 2 of 4] xl: do not return to caller from monitoring daemon
  2010-09-06 10:03 [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate Ian Campbell
  2010-09-06 10:03 ` [PATCH 1 of 4] libxl: include domain id in userdata path Ian Campbell
@ 2010-09-06 10:03 ` Ian Campbell
  2010-09-06 10:03 ` [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end Ian Campbell
  2010-09-06 10:03 ` [PATCH 4 of 4] libxc: restore: reset I/O fd to flags to back to state caller passed us Ian Campbell
  3 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-06 10:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID 5f96f36feebdb87eaadbbcab0399f32eda86f735
# Parent  cbba0599e0a1728e524ba640bdcea7d4af05ddd5
xl: do not return to caller from monitoring daemon

The parent process will have returned to the caller and done whatever
is necessary. The daemon should not return, otherwise it will repeat
this work. In the case of the migration receiver this causes it to try
to take part in the migration protocol long after the sender+parent
process have completed it, leading to confusing error messages
(although strangely not much actual damage).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
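
Editorial illustration (not part of the patch): a small, self-contained
sketch of the failure mode described above. It is not xl code; all names
are hypothetical. A forked "monitoring" child that returns to the caller
repeats the caller's follow-up work, while a child that exits does not.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for "create, then monitor in a daemon".
 * The parent returns to its caller at once. The forked child plays the
 * daemon: it exits if child_exits is set, otherwise it also returns.
 */
static int create_and_monitor(int child_exits, pid_t *child)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid > 0) {              /* parent */
        *child = pid;
        return 0;
    }
    /* child: monitoring would happen here */
    if (child_exits)
        _exit(0);
    *child = 0;                 /* child falls through to the caller */
    return 0;
}

/* Counts how many processes ran the caller's follow-up work. */
static int count_followups(int child_exits)
{
    int fds[2];
    pid_t child = 0;
    char c = 'x';
    int n = 0;

    if (pipe(fds) < 0)
        return -1;
    if (create_and_monitor(child_exits, &child) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Follow-up work: every process that returned writes one byte. */
    if (write(fds[1], &c, 1) != 1)
        n = -1;
    if (child == 0)
        _exit(0);               /* stray daemon: stop after its duplicate work */

    close(fds[1]);
    waitpid(child, NULL, 0);
    while (n >= 0 && read(fds[0], &c, 1) == 1)
        n++;
    close(fds[0]);
    return n;
}
```

Calling count_followups(1) should report one process doing the follow-up
work, and count_followups(0) two, which is the duplicated work the commit
message describes.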

diff -r cbba0599e0a1 -r 5f96f36feebd tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxl/xl_cmdimpl.c	Mon Sep 06 10:54:51 2010 +0100
@@ -1645,6 +1645,14 @@ waitpid_out:
     if (child_console_pid > 0 &&
             waitpid(child_console_pid, &status, 0) < 0 && errno == EINTR)
         goto waitpid_out;
+
+    /*
+     * If we have daemonized then do not return to the caller -- this has
+     * already happened in the parent.
+     */
+    if ( !need_daemon )
+        exit(ret);
+
     return ret;
 }

* [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end
  2010-09-06 10:03 [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate Ian Campbell
  2010-09-06 10:03 ` [PATCH 1 of 4] libxl: include domain id in userdata path Ian Campbell
  2010-09-06 10:03 ` [PATCH 2 of 4] xl: do not return to caller from monitoring daemon Ian Campbell
@ 2010-09-06 10:03 ` Ian Campbell
  2010-09-06 10:03 ` [PATCH 4 of 4] libxc: restore: reset I/O fd to flags to back to state caller passed us Ian Campbell
  3 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-06 10:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID bdf8ce09160d715451e1204babe5f80886ea6183
# Parent  5f96f36feebdb87eaadbbcab0399f32eda86f735
libxc: provide notification of final checkpoint to restore end

When the restore code sees this notification it will restore the
currently in-progress checkpoint when it completes.

This allows the restore end to finish up without waiting for a
spurious timeout on the receive fd and thereby avoids unnecessary
error logging in the case of a successful migration or restore.

In the normal migration or restore case the first checkpoint is always
the last. For a rolling checkpoint (such as Remus) the notification is
currently unused but could be used in the future for example to
provide a controlled failover for reasons other than error.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Brendan Cully <brendan@cs.ubc.ca>
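
Editorial illustration (not part of the patch): a much-reduced sketch of
how a receiver might act on the new chunk. Only the value -9 for
XC_SAVE_ID_LAST_CHECKPOINT comes from the diff below. The struct, the
function names and the array-of-headers stream are assumptions; real
chunks carry payloads that are omitted here.

```c
#include <stddef.h>

#define XC_SAVE_ID_LAST_CHECKPOINT (-9)   /* value taken from the diff */

/* Hypothetical, much-reduced restore state. */
struct mini_restore {
    int last_checkpoint;   /* sender promised no further checkpoints */
    int batches;           /* number of page batches seen */
};

/*
 * Walk a list of chunk headers until the zero terminator.
 * Positive values stand in for page-batch counts, negative values for
 * control chunks. Returns 0 on a clean terminator, or -1 if the list
 * ends without one.
 */
static int read_checkpoint(const int *chunks, size_t n,
                           struct mini_restore *r)
{
    for (size_t i = 0; i < n; i++) {
        int id = chunks[i];

        if (id == 0)
            return 0;
        if (id == XC_SAVE_ID_LAST_CHECKPOINT)
            r->last_checkpoint = 1;
        else if (id > 0)
            r->batches++;
        /* other negative ids are ignored in this sketch */
    }
    return -1;
}

/* After a complete checkpoint: finish now, or keep polling for more? */
static int should_finish(const struct mini_restore *r)
{
    return r->last_checkpoint;
}
```

A plain save would send the marker before the terminator, so the receiver
can finish straight away; a rolling-checkpoint sender would omit it and
the receiver would keep waiting for the next checkpoint.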

diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_restore.c	Mon Sep 06 10:54:51 2010 +0100
@@ -42,6 +42,7 @@ struct restore_ctx {
     xen_pfn_t *p2m; /* A table mapping each PFN to its new MFN. */
     xen_pfn_t *p2m_batch; /* A table of P2M mappings in the current region.  */
     int completed; /* Set when a consistent image is available */
+    int last_checkpoint; /* Set when we should commit to the current checkpoint when it completes. */
     struct domain_info_context dinfo;
 };
 
@@ -765,6 +766,11 @@ static int pagebuf_get_one(xc_interface 
         // DPRINTF("console pfn location: %llx\n", buf->console_pfn);
         return pagebuf_get_one(xch, ctx, buf, fd, dom);
 
+    case XC_SAVE_ID_LAST_CHECKPOINT:
+        ctx->last_checkpoint = 1;
+        // DPRINTF("last checkpoint indication received");
+        return pagebuf_get_one(xch, ctx, buf, fd, dom);
+
     default:
         if ( (count > MAX_BATCH_SIZE) || (count < 0) ) {
             ERROR("Max batch size exceeded (%d). Giving up.", count);
@@ -1296,10 +1302,23 @@ int xc_domain_restore(xc_interface *xch,
             goto out;
         }
         ctx->completed = 1;
-        /* shift into nonblocking mode for the remainder */
-        if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
-            flags = 0;
-        fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
+
+        /*
+         * If more checkpoints are expected then shift into
+         * nonblocking mode for the remainder.
+         */
+        if ( !ctx->last_checkpoint )
+        {
+            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
+                flags = 0;
+            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
+        }
+    }
+
+    if ( ctx->last_checkpoint )
+    {
+        // DPRINTF("Last checkpoint, finishing\n");
+        goto finish;
     }
 
     // DPRINTF("Buffered checkpoint\n");
diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xc_domain_save.c
--- a/tools/libxc/xc_domain_save.c	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_save.c	Mon Sep 06 10:54:51 2010 +0100
@@ -1616,6 +1616,20 @@ int xc_domain_save(xc_interface *xch, in
         }
     }
 
+    if ( !callbacks->checkpoint )
+    {
+        /*
+         * If this is not a checkpointed save then this must be the first and
+         * last checkpoint.
+         */
+        i = XC_SAVE_ID_LAST_CHECKPOINT;
+        if ( wrexact(io_fd, &i, sizeof(int)) )
+        {
+            PERROR("Error when writing last checkpoint chunk");
+            goto out;
+        }
+    }
+
     /* Zero terminate */
     i = 0;
     if ( wrexact(io_fd, &i, sizeof(int)) )
diff -r 5f96f36feebd -r bdf8ce09160d tools/libxc/xg_save_restore.h
--- a/tools/libxc/xg_save_restore.h	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xg_save_restore.h	Mon Sep 06 10:54:51 2010 +0100
@@ -131,6 +131,7 @@
 #define XC_SAVE_ID_TMEM_EXTRA         -6
 #define XC_SAVE_ID_TSC_INFO           -7
 #define XC_SAVE_ID_HVM_CONSOLE_PFN    -8 /* (HVM-only) */
+#define XC_SAVE_ID_LAST_CHECKPOINT    -9 /* Commit to restoring after completion of current iteration. */
 
 /*
 ** We process save/restore/migrate in batches of pages; the below

* [PATCH 4 of 4] libxc: restore: reset I/O fd to flags to back to state caller passed us
  2010-09-06 10:03 [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate Ian Campbell
                   ` (2 preceding siblings ...)
  2010-09-06 10:03 ` [PATCH 3 of 4] libxc: provide notification of final checkpoint to restore end Ian Campbell
@ 2010-09-06 10:03 ` Ian Campbell
  3 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-06 10:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1283766891 -3600
# Node ID b93e43ba481f62026991c5a6afb85cd395059505
# Parent  bdf8ce09160d715451e1204babe5f80886ea6183
libxc: restore: reset I/O fd to flags to back to state caller passed us

In particular this causes us to turn O_NONBLOCK back off if we set it.

The caller may continue to use the fd for its own protocol needs and
may not be prepared to have it become non-blocking.

This probably only affects Remus now after my previous patch to signal
the last checkpoint, since a regular migration will no longer set the
fd non-blocking.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
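
Editorial illustration (not part of the patch): a minimal sketch of the
save-and-restore pattern for descriptor flags, using only POSIX fcntl.
The helper names are hypothetical and the error handling is stricter
than in the diff below, which does not check the F_SETFL calls.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: run work(fd) with O_NONBLOCK set, then put the
 * descriptor's status flags back exactly as the caller had them.
 * Returns -1 if the flags cannot be read or written, else work's result.
 */
static int with_nonblock(int fd, int (*work)(int fd))
{
    int orig = fcntl(fd, F_GETFL, 0);
    int rc;

    if (orig < 0)
        return -1;
    if (fcntl(fd, F_SETFL, orig | O_NONBLOCK) < 0)
        return -1;

    rc = work(fd);

    /* Restore the caller's flags, clearing O_NONBLOCK if we set it. */
    if (fcntl(fd, F_SETFL, orig) < 0)
        return -1;
    return rc;
}

/* Sample work: report whether the fd is non-blocking right now. */
static int is_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);

    return fl >= 0 && (fl & O_NONBLOCK) != 0;
}
```

Reading the flags once up front and writing that saved value back on exit
means the caller sees the descriptor in the state it passed in, whatever
happened in between.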

diff -r bdf8ce09160d -r b93e43ba481f tools/libxc/xc_domain_restore.c
--- a/tools/libxc/xc_domain_restore.c	Mon Sep 06 10:54:51 2010 +0100
+++ b/tools/libxc/xc_domain_restore.c	Mon Sep 06 10:54:51 2010 +0100
@@ -1094,6 +1094,8 @@ int xc_domain_restore(xc_interface *xch,
     void* vcpup;
     uint64_t console_pfn = 0;
 
+    int orig_io_fd_flags;
+
     static struct restore_ctx _ctx = {
         .live_p2m = NULL,
         .p2m = NULL,
@@ -1110,6 +1112,11 @@ int xc_domain_restore(xc_interface *xch,
 
     if ( superpages )
         return 1;
+
+    if ( (orig_io_fd_flags = fcntl(io_fd, F_GETFL, 0)) < 0 ) {
+        PERROR("unable to read IO FD flags");
+        goto out;
+    }
 
     if ( read_exact(io_fd, &dinfo->p2m_size, sizeof(unsigned long)) )
     {
@@ -1294,7 +1301,6 @@ int xc_domain_restore(xc_interface *xch,
     // DPRINTF("Received all pages (%d races)\n", nraces);
 
     if ( !ctx->completed ) {
-        int flags = 0;
 
         if ( buffer_tail(xch, ctx, &tailbuf, io_fd, max_vcpu_id, vcpumap,
                          ext_vcpucontext) < 0 ) {
@@ -1308,11 +1314,7 @@ int xc_domain_restore(xc_interface *xch,
          * nonblocking mode for the remainder.
          */
         if ( !ctx->last_checkpoint )
-        {
-            if ( (flags = fcntl(io_fd, F_GETFL,0)) < 0 )
-                flags = 0;
-            fcntl(io_fd, F_SETFL, flags | O_NONBLOCK);
-        }
+            fcntl(io_fd, F_SETFL, orig_io_fd_flags | O_NONBLOCK);
     }
 
     if ( ctx->last_checkpoint )
@@ -1805,8 +1807,10 @@ int xc_domain_restore(xc_interface *xch,
     /* discard cache for save file  */
     discard_file_cache(xch, io_fd, 1 /*flush*/);
 
+    fcntl(io_fd, F_SETFL, orig_io_fd_flags);
+
     DPRINTF("Restore exit with rc=%d\n", rc);
-    
+
     return rc;
 }
 /*

* [PATCH 0 of 4] libxc: avoid spurious error logging on restore/migrate
@ 2010-09-03 13:06 Ian Campbell
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2010-09-03 13:06 UTC (permalink / raw)
  To: xen-devel; +Cc: Brendan Cully

This series replaces "libxc: succeed silently on restore" from yesterday.

As well as adding an explicit "final checkpoint" notification chunk it
also includes some tweaks to progress logging to be more pleasing and a
fix to xl so it does not re-run the bootloader on restore.
