All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] libxl: fork: Selective reaping
@ 2014-01-16 17:22 Ian Jackson
  2014-01-16 17:22 ` [PATCH 1/7] libxl: fork: Break out checked_waitpid Ian Jackson
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Campbell

libvirt reaps its children synchronously and has no central pid
registry and no dispatch mechanism.  libxl does have a pid registry so
can provide a selective reaping facility, but that is not currently exposed.

NB that I have compiled this series but I have NOT EXECUTED IT.
The most plausible test environment is a suitably modified libvirt.

 1/7 libxl: fork: Break out checked_waitpid
 2/7 libxl: fork: Break out childproc_reaped_ours
 3/7 libxl: fork: Clarify docs for libxl_sigchld_owner
 4/7 libxl: fork: assert that chldmode is right
 5/7 libxl: fork: Provide libxl_childproc_sigchld_occurred
 6/7 libxl: fork: Provide ..._always_selective_reap
 7/7 libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/7] libxl: fork: Break out checked_waitpid
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:09   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours Ian Jackson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

This is a simple error-handling wrapper for waitpid.  We're going to
want to call waitpid somewhere else and this avoids some of the
duplication.

No functional change in this patch.  (Technically, we used to check
chldmode_ours again in the EINTR case, and don't now, but that can't
have changed because we continuously hold the libxl ctx lock.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl_fork.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 4ae9f94..2252370 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -155,6 +155,22 @@ int libxl__carefd_fd(const libxl__carefd *cf)
  * Actual child process handling
  */
 
+/* Like waitpid(,,WNOHANG) but handles all errors except ECHILD. */
+static pid_t checked_waitpid(libxl__egc *egc, pid_t want, int *status)
+{
+    for (;;) {
+        pid_t got = waitpid(want, status, WNOHANG);
+        if (got != -1)
+            return got;
+        if (errno == ECHILD)
+            return got;
+        if (errno == EINTR)
+            continue;
+        LIBXL__EVENT_DISASTER(egc, "waitpid() failed", errno, 0);
+        return 0;
+    }
+}
+
 static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
                                      int fd, short events, short revents);
 
@@ -331,16 +347,10 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
 
     while (chldmode_ours(CTX, 0) /* in case the app changes the mode */) {
         int status;
-        pid_t pid = waitpid(-1, &status, WNOHANG);
-
-        if (pid == 0) return;
+        pid_t pid = checked_waitpid(egc, -1, &status);
 
-        if (pid == -1) {
-            if (errno == ECHILD) return;
-            if (errno == EINTR) continue;
-            LIBXL__EVENT_DISASTER(egc, "waitpid() failed", errno, 0);
+        if (pid == 0 || pid == -1 /* ECHILD */)
             return;
-        }
 
         int rc = childproc_reaped(egc, pid, status);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
  2014-01-16 17:22 ` [PATCH 1/7] libxl: fork: Break out checked_waitpid Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:10   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner Ian Jackson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

We're going to want to do this again at a new call site.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl_fork.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 2252370..7b84765 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -290,6 +290,14 @@ static int perhaps_installhandler(libxl__gc *gc, bool creating)
     return 0;
 }
 
+static void childproc_reaped_ours(libxl__egc *egc, libxl__ev_child *ch,
+                                 int status)
+{
+    LIBXL_LIST_REMOVE(ch, entry);
+    ch->pid = -1;
+    ch->callback(egc, ch, ch->pid, status);
+}
+
 static int childproc_reaped(libxl__egc *egc, pid_t pid, int status)
 {
     EGC_GC;
@@ -303,9 +311,7 @@ static int childproc_reaped(libxl__egc *egc, pid_t pid, int status)
     return ERROR_UNKNOWN_CHILD;
 
  found:
-    LIBXL_LIST_REMOVE(ch, entry);
-    ch->pid = -1;
-    ch->callback(egc, ch, pid, status);
+    childproc_reaped_ours(egc, ch, status);
 
     perhaps_removehandler(gc);
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
  2014-01-16 17:22 ` [PATCH 1/7] libxl: fork: Break out checked_waitpid Ian Jackson
  2014-01-16 17:22 ` [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:12   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 4/7] libxl: fork: assert that chldmode is right Ian Jackson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

Clarify that libxl_sigchld_owner_libxl causes libxl to reap all the
process's children, and clarify the wording of the description of
libxl_sigchld_owner_libxl_always.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl_event.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 6261f99..4221d5a 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -467,7 +467,8 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
 
 
 typedef enum {
-    /* libxl owns SIGCHLD whenever it has a child. */
+    /* libxl owns SIGCHLD whenever it has a child, and reaps
+     * all children. */
     libxl_sigchld_owner_libxl,
 
     /* Application promises to call libxl_childproc_exited but NOT
@@ -476,7 +477,7 @@ typedef enum {
     libxl_sigchld_owner_mainloop,
 
     /* libxl owns SIGCHLD all the time, and the application is
-     * relying on libxl's event loop for reaping its own children. */
+     * relying on libxl's event loop for reaping its children too. */
     libxl_sigchld_owner_libxl_always,
 } libxl_sigchld_owner;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/7] libxl: fork: assert that chldmode is right
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
                   ` (2 preceding siblings ...)
  2014-01-16 17:22 ` [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:13   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred Ian Jackson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

In libxl_childproc_reaped, check that the chldmode is as expected.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
 tools/libxl/libxl_fork.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 7b84765..85db2fb 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -322,6 +322,8 @@ int libxl_childproc_reaped(libxl_ctx *ctx, pid_t pid, int status)
 {
     EGC_INIT(ctx);
     CTX_LOCK;
+    assert(CTX->childproc_hooks->chldowner
+           == libxl_sigchld_owner_mainloop);
     int rc = childproc_reaped(egc, pid, status);
     CTX_UNLOCK;
     EGC_FREE;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
                   ` (3 preceding siblings ...)
  2014-01-16 17:22 ` [PATCH 4/7] libxl: fork: assert that chldmode is right Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:22   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap Ian Jackson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

Applications exist which don't keep track of all their child processes
in a manner suitable for coherent dispatch of their termination.  In
such a situation, nothing in the whole process may call wait, or
waitpid(-1,,).  Doing so reaps processes belonging to other parts of
the application and there is then no way to deliver the exit status to
the right place.

To facilitate this, provide a facility for such an application to ask
libxl to call waitpid on each of its children individually.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl_event.h |   29 +++++++++++++++++++++++++----
 tools/libxl/libxl_fork.c  |   45 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 4221d5a..12e3d1f 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -471,9 +471,10 @@ typedef enum {
      * all children. */
     libxl_sigchld_owner_libxl,
 
-    /* Application promises to call libxl_childproc_exited but NOT
-     * from within a signal handler.  libxl will not itself arrange to
-     * (un)block or catch SIGCHLD. */
+    /* Application promises to discover when SIGCHLD occurs and call
+     * libxl_childproc_exited or libxl_childproc_sigchld_occurred (but
+     * NOT from within a signal handler).  libxl will not itself
+     * arrange to (un)block or catch SIGCHLD. */
     libxl_sigchld_owner_mainloop,
 
     /* libxl owns SIGCHLD all the time, and the application is
@@ -531,7 +532,8 @@ void libxl_childproc_setmode(libxl_ctx *ctx, const libxl_childproc_hooks *hooks,
 
 /*
  * This function is for an application which owns SIGCHLD and which
- * therefore reaps all of the process's children.
+ * reaps all of the process's children, and dispatches the exit status
+ * to the correct place inside the application.
  *
  * May be called only by an application which has called setmode with
  * chldowner == libxl_sigchld_owner_mainloop.  If pid was a process started
@@ -547,6 +549,25 @@ void libxl_childproc_setmode(libxl_ctx *ctx, const libxl_childproc_hooks *hooks,
 int libxl_childproc_reaped(libxl_ctx *ctx, pid_t, int status)
                            LIBXL_EXTERNAL_CALLERS_ONLY;
 
+/*
+ * This function is for an application which owns SIGCHLD but which
+ * doesn't keep track of all of its own children in a manner suitable
+ * for reaping all of them and then dispatching them.
+ *
+ * Such an the application must notify libxl, by calling this
+ * function, that a SIGCHLD occurred.  libxl will then check all its
+ * children, reap any that are ready, and take any action necessary -
+ * but it will not reap anything else.
+ *
+ * May be called only by an application which has called setmode with
+ * chldowner == libxl_sigchld_owner_mainloop.
+ *
+ * May NOT be called from within a signal handler which might
+ * interrupt any libxl operation (just like libxl_childproc_reaped).
+ */
+void libxl_childproc_sigchld_occurred(libxl_ctx *ctx)
+                           LIBXL_EXTERNAL_CALLERS_ONLY;
+
 
 /*
  * An application which initialises a libxl_ctx in a parent process
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index 85db2fb..b2325e0 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -330,6 +330,51 @@ int libxl_childproc_reaped(libxl_ctx *ctx, pid_t pid, int status)
     return rc;
 }
 
+static void childproc_checkall(libxl__egc *egc)
+{
+    EGC_GC;
+    libxl__ev_child *ch;
+
+    for (;;) {
+        int status;
+        pid_t got;
+
+        LIBXL_LIST_FOREACH(ch, &CTX->children, entry) {
+            got = checked_waitpid(egc, ch->pid, &status);
+            if (got)
+                goto found;
+        }
+        /* not found */
+        return;
+
+    found:
+        if (got == -1) {
+            LIBXL__EVENT_DISASTER
+                (egc, "waitpid() gave ECHILD but we have a child",
+                 ECHILD, 0);
+            /* it must have finished but we don't know its status */
+            status = 255<<8; /* no wait.h macro for this! */
+            assert(WIFEXITED(status));
+            assert(WEXITSTATUS(status)==255);
+            assert(!WIFSIGNALED(status));
+            assert(!WIFSTOPPED(status));
+        }
+        childproc_reaped_ours(egc, ch, status);
+        /* we need to restart the loop, as children may have been edited */
+    }
+}
+
+void libxl_childproc_sigchld_occurred(libxl_ctx *ctx)
+{
+    EGC_INIT(ctx);
+    CTX_LOCK;
+    assert(CTX->childproc_hooks->chldowner
+           == libxl_sigchld_owner_mainloop);
+    childproc_checkall(egc);
+    CTX_UNLOCK;
+    EGC_FREE;
+}
+
 static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
                                      int fd, short events, short revents)
 {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
                   ` (4 preceding siblings ...)
  2014-01-16 17:22 ` [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17  6:36   ` Jim Fehlig
  2014-01-17 11:27   ` Ian Campbell
  2014-01-16 17:22 ` [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP Ian Jackson
  2014-01-16 19:25 ` [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
  7 siblings, 2 replies; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

Applications exist which want to use libxl in an event-driven mode but
which do not integrate child termination into their event system, but
instead reap all their own children synchronously.

In such an application libxl must own SIGCHLD but avoid reaping any
children that don't belong to libxl.

Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
behaviour.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl_event.h |    5 +++++
 tools/libxl/libxl_fork.c  |    7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 12e3d1f..c09e3ed 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -480,6 +480,11 @@ typedef enum {
     /* libxl owns SIGCHLD all the time, and the application is
      * relying on libxl's event loop for reaping its children too. */
     libxl_sigchld_owner_libxl_always,
+
+    /* libxl owns SIGCHLD all the time, but it must only reap its own
+     * children.  The application will reap its own children
+     * synchronously with waitpid, without the assistance of SIGCHLD. */
+    libxl_sigchld_owner_libxl_always_selective_reap,
 } libxl_sigchld_owner;
 
 typedef struct {
diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
index b2325e0..16e17f6 100644
--- a/tools/libxl/libxl_fork.c
+++ b/tools/libxl/libxl_fork.c
@@ -268,6 +268,7 @@ static bool chldmode_ours(libxl_ctx *ctx, bool creating)
     case libxl_sigchld_owner_mainloop:
         return 0;
     case libxl_sigchld_owner_libxl_always:
+    case libxl_sigchld_owner_libxl_always_selective_reap:
         return 1;
     }
     abort();
@@ -398,6 +399,12 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
     int e = libxl__self_pipe_eatall(selfpipe);
     if (e) LIBXL__EVENT_DISASTER(egc, "read sigchld pipe", e, 0);
 
+    if (CTX->childproc_hooks->chldowner
+        == libxl_sigchld_owner_libxl_always_selective_reap) {
+        childproc_checkall(egc);
+        return;
+    }
+
     while (chldmode_ours(CTX, 0) /* in case the app changes the mode */) {
         int status;
         pid_t pid = checked_waitpid(egc, -1, &status);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
                   ` (5 preceding siblings ...)
  2014-01-16 17:22 ` [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap Ian Jackson
@ 2014-01-16 17:22 ` Ian Jackson
  2014-01-17 11:27   ` Ian Campbell
  2014-01-16 19:25 ` [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 17:22 UTC (permalink / raw)
  To: xen-devel; +Cc: Jim Fehlig, Ian Jackson, Ian Campbell

This is the feature test macro for libxl_childproc_sigchld_occurred
and libxl_sigchld_owner_libxl_always_selective_reap.

It is split out into this separate patch because: a single feature
test is sensible because we do not intend anyone to release or ship
libxl versions with one of these but not the other; but, the two
features are in separate patches for clarity; and, this just makes
reading the actual code easier.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 tools/libxl/libxl.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 12d6c31..1ac34c3 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -409,6 +409,19 @@
  */
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
 
+/*
+ * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
+ *
+ * If this is defined:
+ *
+ * Firstly, the enum libxl_sigchld_owner (in libxl_event.h) has the
+ * value libxl_sigchld_owner_libxl_always_selective_reap which may be
+ * passed to libxl_childproc_setmode in hooks->chldmode.
+ *
+ * Secondly, the function libxl_childproc_sigchld_occurred exists.
+ */
+#define LIBXL_HAVE_SIGCHLD_OWNER_SELECTIVE_REAP 1
+
 /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
  * called from within libxl itself. Callers outside libxl, who
  * do not #include libxl_internal.h, are fine. */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] libxl: fork: Selective reaping
  2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
                   ` (6 preceding siblings ...)
  2014-01-16 17:22 ` [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP Ian Jackson
@ 2014-01-16 19:25 ` Ian Jackson
  2014-01-17  6:41   ` Jim Fehlig
  7 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-16 19:25 UTC (permalink / raw)
  To: xen-devel, Ian Campbell, Jim Fehlig

Ian Jackson writes ("[RFC PATCH 0/7] libxl: fork: Selective reaping"):
> libvirt reaps its children synchronously and has no central pid
> registry and no dispatch mechanism.  libxl does have a pid registry so
> can provide a selective reaping facility, but that is not currently exposed.
> 
> NB that I have compiled this series but I have NOT EXECUTED IT.
> The most plausible test environment is a suitably modified libvirt.
> 
>  1/7 libxl: fork: Break out checked_waitpid
>  2/7 libxl: fork: Break out childproc_reaped_ours
>  3/7 libxl: fork: Clarify docs for libxl_sigchld_owner
>  4/7 libxl: fork: assert that chldmode is right
>  5/7 libxl: fork: Provide libxl_childproc_sigchld_occurred
>  6/7 libxl: fork: Provide ..._always_selective_reap
>  7/7 libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP

I should say, to Jim: I think that with this series applied, simply
having libvirt pass libxl_sigchld_owner_libxl_always_selective_reap
should be sufficient for everything to work.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-16 17:22 ` [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap Ian Jackson
@ 2014-01-17  6:36   ` Jim Fehlig
  2014-01-17 11:35     ` Ian Jackson
  2014-01-17 12:38     ` Ian Jackson
  2014-01-17 11:27   ` Ian Campbell
  1 sibling, 2 replies; 27+ messages in thread
From: Jim Fehlig @ 2014-01-17  6:36 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell

Ian Jackson wrote:
> Applications exist which want to use libxl in an event-driven mode but
> which do not integrate child termination into their event system, but
> instead reap all their own children synchronously.
>
> In such an application libxl must own SIGCHLD but avoid reaping any
> children that don't belong to libxl.
>
> Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
> behaviour.
>
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>
> ---
>  tools/libxl/libxl_event.h |    5 +++++
>  tools/libxl/libxl_fork.c  |    7 +++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index 12e3d1f..c09e3ed 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -480,6 +480,11 @@ typedef enum {
>      /* libxl owns SIGCHLD all the time, and the application is
>       * relying on libxl's event loop for reaping its children too. */
>      libxl_sigchld_owner_libxl_always,
> +
> +    /* libxl owns SIGCHLD all the time, but it must only reap its own
> +     * children.  The application will reap its own children
> +     * synchronously with waitpid, without the assistance of SIGCHLD. */
> +    libxl_sigchld_owner_libxl_always_selective_reap,
>   

Should there be some documentation in the opening comments of
"Subprocess handling"?  E.g. an entry under "For programs which run
their own children alongside libxl's:"?

BTW, it is not clear to me how to use libxl_childproc_setmode() wrt
different libxl_ctx.  Currently in the libvirt libxl driver there's a
driver-wide ctx for more host-centric operations like
libxl_get_version_info(), libxl_get_free_memory(), etc., and a
per-domain ctx for domain-specific operations.  The current doc for
libxl_childproc_setmode() says:

 * May not be called when libxl might have any child processes, or
the              
 * behaviour is undefined.  So it is best to call this
at                           
 *
initialisation.                                                                  


which makes me believe setmode() should be called during initialization
of the driver, using the driver-wide ctx.  But looking at the code in
libxl__ev_child_fork(), seems a domain-specific ctx would be used, since
a domain-specific operation resulted in calling libxl__ev_child_fork().

>  } libxl_sigchld_owner;
>  
>  typedef struct {
> diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
> index b2325e0..16e17f6 100644
> --- a/tools/libxl/libxl_fork.c
> +++ b/tools/libxl/libxl_fork.c
> @@ -268,6 +268,7 @@ static bool chldmode_ours(libxl_ctx *ctx, bool creating)
>      case libxl_sigchld_owner_mainloop:
>          return 0;
>      case libxl_sigchld_owner_libxl_always:
> +    case libxl_sigchld_owner_libxl_always_selective_reap:
>   

When calling setmode() on the driver-wide or on each domain-specific
ctx, I get an assert with this hunk

libvirtd: libxl_fork.c:241: libxl__sigchld_installhandler: Assertion
`!sigchld_owner' failed.

Calling setmode() on the driver-wide ctx, I hit the assert when starting
a domain, which has its own ctx.  When calling setmode() on the
domain-specific ctx's, I don't see any problems with only one domain
(and hence one ctx), but hit the assert on a second domain, which has
its own ctx.  So e.g. libxl_domain_create_new(dom1_ctx, ...) works, but
I hit the assert when calling libxl_domain_create_new(dom2_ctx, ...).

I don't see the assert, regardless of how I call setmode(), when
changing this hunk to

@@ -264,11 +264,11 @@ static bool chldmode_ours(libxl_ctx *ctx, bool
creating)
 {
     switch (ctx->childproc_hooks->chldowner) {
     case libxl_sigchld_owner_libxl:
+    case libxl_sigchld_owner_libxl_always_selective_reap:
         return creating || !LIBXL_LIST_EMPTY(&ctx->children);
     case libxl_sigchld_owner_mainloop:
         return 0;
     case libxl_sigchld_owner_libxl_always:
-    case libxl_sigchld_owner_libxl_always_selective_reap:
         return 1;
     }
     abort();

I'm not familiar with this aspect of libxl, so this might be complete
nonsense.  But improving the setmode() docs wrt apps that have multiple
libxl_ctx's would be helpful.

Regards,
Jim

>          return 1;
>      }
>      abort();
> @@ -398,6 +399,12 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
>      int e = libxl__self_pipe_eatall(selfpipe);
>      if (e) LIBXL__EVENT_DISASTER(egc, "read sigchld pipe", e, 0);
>  
> +    if (CTX->childproc_hooks->chldowner
> +        == libxl_sigchld_owner_libxl_always_selective_reap) {
> +        childproc_checkall(egc);
> +        return;
> +    }
> +
>      while (chldmode_ours(CTX, 0) /* in case the app changes the mode */) {
>          int status;
>          pid_t pid = checked_waitpid(egc, -1, &status);
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] libxl: fork: Selective reaping
  2014-01-16 19:25 ` [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
@ 2014-01-17  6:41   ` Jim Fehlig
  0 siblings, 0 replies; 27+ messages in thread
From: Jim Fehlig @ 2014-01-17  6:41 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell

Ian Jackson wrote:
> Ian Jackson writes ("[RFC PATCH 0/7] libxl: fork: Selective reaping"):
>   
>> libvirt reaps its children synchronously and has no central pid
>> registry and no dispatch mechanism.  libxl does have a pid registry so
>> can provide a selective reaping facility, but that is not currently exposed.
>>
>> NB that I have compiled this series but I have NOT EXECUTED IT.
>> The most plausible test environment is a suitably modified libvirt.
>>
>>  1/7 libxl: fork: Break out checked_waitpid
>>  2/7 libxl: fork: Break out childproc_reaped_ours
>>  3/7 libxl: fork: Clarify docs for libxl_sigchld_owner
>>  4/7 libxl: fork: assert that chldmode is right
>>  5/7 libxl: fork: Provide libxl_childproc_sigchld_occurred
>>  6/7 libxl: fork: Provide ..._always_selective_reap
>>  7/7 libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
>>     
>
> I should say, to Jim: I think that with this series applied, simply
> having libvirt pass libxl_sigchld_owner_libxl_always_selective_reap
> should be sufficient for everything to work.
>   

Ian, thanks a lot for these patches!  They certainly make life much
easier in the libvirt libxl driver :).  See 6/7 for a few questions.

Regards,
Jim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/7] libxl: fork: Break out checked_waitpid
  2014-01-16 17:22 ` [PATCH 1/7] libxl: fork: Break out checked_waitpid Ian Jackson
@ 2014-01-17 11:09   ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:09 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> This is a simple error-handling wrapper for waitpid.  We're going to
> want to call waitpid somewhere else and this avoids some of the
> duplication.
> 
> No functional change in this patch.  (Technically, we used to check
> chldmode_ours again in the EINTR case, and don't now, but that can't
> have changed because we continuously hold the libxl ctx lock.)

I was going to ask if that outer while condition is a bit pointless
then, but I see that outside the context we drop and reacquire the lock
so it makes sense.

> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>  tools/libxl/libxl_fork.c |   26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
> index 4ae9f94..2252370 100644
> --- a/tools/libxl/libxl_fork.c
> +++ b/tools/libxl/libxl_fork.c
> @@ -155,6 +155,22 @@ int libxl__carefd_fd(const libxl__carefd *cf)
>   * Actual child process handling
>   */
>  
> +/* Like waitpid(,,WNOHANG) but handles all errors except ECHILD. */
> +static pid_t checked_waitpid(libxl__egc *egc, pid_t want, int *status)
> +{
> +    for (;;) {
> +        pid_t got = waitpid(want, status, WNOHANG);
> +        if (got != -1)
> +            return got;
> +        if (errno == ECHILD)
> +            return got;
> +        if (errno == EINTR)
> +            continue;
> +        LIBXL__EVENT_DISASTER(egc, "waitpid() failed", errno, 0);
> +        return 0;
> +    }
> +}
> +
>  static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
>                                       int fd, short events, short revents);
>  
> @@ -331,16 +347,10 @@ static void sigchld_selfpipe_handler(libxl__egc *egc, libxl__ev_fd *ev,
>  
>      while (chldmode_ours(CTX, 0) /* in case the app changes the mode */) {
>          int status;
> -        pid_t pid = waitpid(-1, &status, WNOHANG);
> -
> -        if (pid == 0) return;
> +        pid_t pid = checked_waitpid(egc, -1, &status);
>  
> -        if (pid == -1) {
> -            if (errno == ECHILD) return;
> -            if (errno == EINTR) continue;
> -            LIBXL__EVENT_DISASTER(egc, "waitpid() failed", errno, 0);
> +        if (pid == 0 || pid == -1 /* ECHILD */)
>              return;
> -        }
>  
>          int rc = childproc_reaped(egc, pid, status);
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours
  2014-01-16 17:22 ` [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours Ian Jackson
@ 2014-01-17 11:10   ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:10 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> We're going to want to do this again at a new call site.
> 
> No functional change.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner
  2014-01-16 17:22 ` [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner Ian Jackson
@ 2014-01-17 11:12   ` Ian Campbell
  2014-01-17 12:41     ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:12 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> Clarify that libxl_sigchld_owner_libxl causes libxl to reap all the
> process's children, and clarify the wording of the description of
> libxl_sigchld_owner_libxl_always.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>
> ---
>  tools/libxl/libxl_event.h |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
> index 6261f99..4221d5a 100644
> --- a/tools/libxl/libxl_event.h
> +++ b/tools/libxl/libxl_event.h
> @@ -467,7 +467,8 @@ void libxl_osevent_occurred_timeout(libxl_ctx *ctx, void *for_libxl)
>  
> 
>  typedef enum {
> -    /* libxl owns SIGCHLD whenever it has a child. */
> +    /* libxl owns SIGCHLD whenever it has a child, and reaps
> +     * all children. */

"all children, including those which were not spawned by libxl" might be
extra clear. It might also be worth saying explicitly "When libxl has no
children it does not own SIGCHLD and will not reap anything". I know
it's implied by the existing text but it is a bit of a subtle point.

With or without those mods:
Acked-by: Ian Campbell <ian.campbell@citrix.com>

>      libxl_sigchld_owner_libxl,
>  
>      /* Application promises to call libxl_childproc_exited but NOT
> @@ -476,7 +477,7 @@ typedef enum {
>      libxl_sigchld_owner_mainloop,
>  
>      /* libxl owns SIGCHLD all the time, and the application is
> -     * relying on libxl's event loop for reaping its own children. */
> +     * relying on libxl's event loop for reaping its children too. */
>      libxl_sigchld_owner_libxl_always,
>  } libxl_sigchld_owner;
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/7] libxl: fork: assert that chldmode is right
  2014-01-16 17:22 ` [PATCH 4/7] libxl: fork: assert that chldmode is right Ian Jackson
@ 2014-01-17 11:13   ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:13 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> In libxl_childproc_reaped, check that the chldmode is as expected.

The doc comment on libxl_childproc_reaped says:
 * May be called only by an application which has called setmode with
 * chldowner == libxl_sigchld_owner_mainloop.  If pid was a process started
so this is obviously correct.

> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>  tools/libxl/libxl_fork.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/libxl/libxl_fork.c b/tools/libxl/libxl_fork.c
> index 7b84765..85db2fb 100644
> --- a/tools/libxl/libxl_fork.c
> +++ b/tools/libxl/libxl_fork.c
> @@ -322,6 +322,8 @@ int libxl_childproc_reaped(libxl_ctx *ctx, pid_t pid, int status)
>  {
>      EGC_INIT(ctx);
>      CTX_LOCK;
> +    assert(CTX->childproc_hooks->chldowner
> +           == libxl_sigchld_owner_mainloop);
>      int rc = childproc_reaped(egc, pid, status);
>      CTX_UNLOCK;
>      EGC_FREE;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred
  2014-01-16 17:22 ` [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred Ian Jackson
@ 2014-01-17 11:22   ` Ian Campbell
  2014-01-17 11:27     ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:22 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> +        if (got == -1) {
> +            LIBXL__EVENT_DISASTER
> +                (egc, "waitpid() gave ECHILD but we have a child",
> +                 ECHILD, 0);
> +            /* it must have finished but we don't know its status */
> +            status = 255<<8; /* no wait.h macro for this! */
> +            assert(WIFEXITED(status));
> +            assert(WEXITSTATUS(status)==255);
> +            assert(!WIFSIGNALED(status));
> +            assert(!WIFSTOPPED(status));

This is quite exciting! How can this happen? Kernel bug or similar?

Either way, I suppose this is the best we can do under the circumstances
so:
Acked-by: Ian Campbell <ian.campbell@citrix.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-16 17:22 ` [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap Ian Jackson
  2014-01-17  6:36   ` Jim Fehlig
@ 2014-01-17 11:27   ` Ian Campbell
  1 sibling, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:27 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> Applications exist which want to use libxl in an event-driven mode but
> which do not integrate child termination into their event system, but
> instead reap all their own children synchronously.
> 
> In such an application libxl must own SIGCHLD but avoid reaping any
> children that don't belong to libxl.
> 
> Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
> behaviour.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>

So far as this patch goes:

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

But Jim's queries about multiple ctx's etc are interesting.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred
  2014-01-17 11:22   ` Ian Campbell
@ 2014-01-17 11:27     ` Ian Jackson
  2014-01-17 11:29       ` Ian Campbell
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 11:27 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel

Ian Campbell writes ("Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred"):
> On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> > +            /* it must have finished but we don't know its status */
> > +            status = 255<<8; /* no wait.h macro for this! */
> > +            assert(WIFEXITED(status));
> > +            assert(WEXITSTATUS(status)==255);
> > +            assert(!WIFSIGNALED(status));
> > +            assert(!WIFSTOPPED(status));
> 
> This is quite exciting! How can this happen? Kernel bug or similar?

In principle it would be possible (and standards-compliant) for a
system to encode its wait status in a different way to usual.  I don't
think any such system really exists - at least, not one we'll be
running libxl on.

Of course this code is only reached if DISASTER returns...

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
  2014-01-16 17:22 ` [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP Ian Jackson
@ 2014-01-17 11:27   ` Ian Campbell
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:27 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> This is the feature test macro for libxl_childproc_sigchld_occurred
> and libxl_sigchld_owner_libxl_always_selective_reap.
> 
> It is split out into this separate patch because: a single feature
> test is sensible because we do not intend anyone to release or ship
> libxl versions with one of these but not the other; but, the two
> features are in separate patches for clarity; and, this just makes
> reading the actual code easier.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> Cc: Jim Fehlig <jfehlig@suse.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

> ---
>  tools/libxl/libxl.h |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 12d6c31..1ac34c3 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -409,6 +409,19 @@
>   */
>  #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1
>  
> +/*
> + * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
> + *
> + * If this is defined:
> + *
> + * Firstly, the enum libxl_sigchld_owner (in libxl_event.h) has the
> + * value libxl_sigchld_owner_libxl_always_selective_reap which may be
> + * passed to libxl_childproc_setmode in hooks->chldmode.
> + *
> + * Secondly, the function libxl_childproc_sigchld_occurred exists.
> + */
> +#define LIBXL_HAVE_SIGCHLD_OWNER_SELECTIVE_REAP 1
> +
>  /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be
>   * called from within libxl itself. Callers outside libxl, who
>   * do not #include libxl_internal.h, are fine. */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred
  2014-01-17 11:27     ` Ian Jackson
@ 2014-01-17 11:29       ` Ian Campbell
  2014-01-17 16:09         ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2014-01-17 11:29 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Jim Fehlig, xen-devel

On Fri, 2014-01-17 at 11:27 +0000, Ian Jackson wrote:
> Ian Campbell writes ("Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred"):
> > On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> > > +            /* it must have finished but we don't know its status */
> > > +            status = 255<<8; /* no wait.h macro for this! */
> > > +            assert(WIFEXITED(status));
> > > +            assert(WEXITSTATUS(status)==255);
> > > +            assert(!WIFSIGNALED(status));
> > > +            assert(!WIFSTOPPED(status));
> > 
> > This is quite exciting! How can this happen? Kernel bug or similar?
> 
> In principle it would be possible (and standards-compliant) for a
> system to encode its wait status in a different way to usual.  I don't
> think any such system really exists - at least, not one we'll be
> running libxl on.
> 
> Of course this code is only reached if DISASTER returns...

Sorry, I meant how can we get here at all, i.e. with a waitpid returning
-1/ECHLD when we think there is actually a child.

Thinking about it this way I now realise this could be a bug in libxl or
the application which caused the process to be reaped elsewhere.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-17  6:36   ` Jim Fehlig
@ 2014-01-17 11:35     ` Ian Jackson
  2014-01-17 15:16       ` Jim Fehlig
  2014-01-17 12:38     ` Ian Jackson
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 11:35 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, Ian Campbell

Jim Fehlig writes ("Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap"):
> Ian Jackson wrote:
> > +    /* libxl owns SIGCHLD all the time, but it must only reap its own
> > +     * children.  The application will reap its own children
> > +     * synchronously with waitpid, without the assistance of SIGCHLD. */
> > +    libxl_sigchld_owner_libxl_always_selective_reap,
> 
> Should there be some documentation in the opening comments of
> "Subprocess handling"?  E.g. an entry under "For programs which run
> their own children alongside libxl's:"?

Yes.

> BTW, it is not clear to me how to use libxl_childproc_setmode() wrt
> different libxl_ctx.  Currently in the libvirt libxl driver there's a
> driver-wide ctx for more host-centric operations like
> libxl_get_version_info(), libxl_get_free_memory(), etc., and a
> per-domain ctx for domain-specific operations.  The current doc for
> libxl_childproc_setmode() says:

Oh dear.  Can you change it to use the same ctx everywhere ?

If not, I'm afraid, someone needs to
 - keep a record of all the libxl ctxs
 - when the self-pipe triggers, iterate over all the ctxs calling
   libxl_childproc_sigchld_occurred

If that "someone" is libvirt, you'll have to do the self-pipe trick.

It would probably be easier to have libxl maintain a global list of
libxl ctx's for this purpose.

> When calling setmode() on the driver-wide or on each domain-specific
> ctx, I get an assert with this hunk
> 
> libvirtd: libxl_fork.c:241: libxl__sigchld_installhandler: Assertion
> `!sigchld_owner' failed.

The problem is that the SIGCHLD ownership is attached to the
individual ctx.  This could in principle be changed, I think.

I will try to see if I can do that.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-17  6:36   ` Jim Fehlig
  2014-01-17 11:35     ` Ian Jackson
@ 2014-01-17 12:38     ` Ian Jackson
  2014-01-17 15:22       ` Jim Fehlig
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 12:38 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, Ian Campbell

Jim Fehlig writes ("Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap"):
> I don't see the assert, regardless of how I call setmode(), when
> changing this hunk to
> 
> @@ -264,11 +264,11 @@ static bool chldmode_ours(libxl_ctx *ctx, bool
> creating)
>  {
>      switch (ctx->childproc_hooks->chldowner) {
>      case libxl_sigchld_owner_libxl:
> +    case libxl_sigchld_owner_libxl_always_selective_reap:
>          return creating || !LIBXL_LIST_EMPTY(&ctx->children);
>      case libxl_sigchld_owner_mainloop:
>          return 0;
>      case libxl_sigchld_owner_libxl_always:
> -    case libxl_sigchld_owner_libxl_always_selective_reap:
>          return 1;
>      }
>      abort();

I should say: that that works is just luck, I think.  I have a better
fix.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner
  2014-01-17 11:12   ` Ian Campbell
@ 2014-01-17 12:41     ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 12:41 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel

Ian Campbell writes ("Re: [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner"):
> On Thu, 2014-01-16 at 17:22 +0000, Ian Jackson wrote:
> >  typedef enum {
> > -    /* libxl owns SIGCHLD whenever it has a child. */
> > +    /* libxl owns SIGCHLD whenever it has a child, and reaps
> > +     * all children. */
> 
> "all children, including those which were not spawned by libxl" might be
> extra clear. It might also be worth saying explicitly "When libxl has no
> children it does not own SIGCHLD and will not reap anything". I know
> it's implied by the existing text but it is a bit of a subtle point.

Good idea, done.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-17 11:35     ` Ian Jackson
@ 2014-01-17 15:16       ` Jim Fehlig
  2014-01-17 16:13         ` Ian Jackson
  0 siblings, 1 reply; 27+ messages in thread
From: Jim Fehlig @ 2014-01-17 15:16 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell

Ian Jackson wrote:
> Jim Fehlig writes ("Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap"):
>   
>> Ian Jackson wrote:
>>     
>>> +    /* libxl owns SIGCHLD all the time, but it must only reap its own
>>> +     * children.  The application will reap its own children
>>> +     * synchronously with waitpid, without the assistance of SIGCHLD. */
>>> +    libxl_sigchld_owner_libxl_always_selective_reap,
>>>       
>> Should there be some documentation in the opening comments of
>> "Subprocess handling"?  E.g. an entry under "For programs which run
>> their own children alongside libxl's:"?
>>     
>
> Yes.
>
>   
>> BTW, it is not clear to me how to use libxl_childproc_setmode() wrt
>> different libxl_ctx.  Currently in the libvirt libxl driver there's a
>> driver-wide ctx for more host-centric operations like
>> libxl_get_version_info(), libxl_get_free_memory(), etc., and a
>> per-domain ctx for domain-specific operations.  The current doc for
>> libxl_childproc_setmode() says:
>>     
>
> Oh dear.  Can you change it to use the same ctx everywhere ?
>   

What would be the downside of this in a multi-threaded application,
where there are many concurrent domain operations?  I was under the
impression that such an app would need a per-domain ctx.

Regards,
Jim

> If not, I'm afraid, someone needs to
>  - keep a record of all the libxl ctxs
>  - when the self-pipe triggers, iterate over all the ctxs calling
>    libxl_childproc_sigchld_occurred
>
> If that "someone" is libvirt, you'll have to do the self-pipe trick.
>
> It would probably be easier to have libxl maintain a global list of
> libxl ctx's for this purpose.
>
>   
>> When calling setmode() on the driver-wide or on each domain-specific
>> ctx, I get an assert with this hunk
>>
>> libvirtd: libxl_fork.c:241: libxl__sigchld_installhandler: Assertion
>> `!sigchld_owner' failed.
>>     
>
> The problem is that the SIGCHLD ownership is attached to the
> individual ctx.  This could in principle be changed, I think.
>
> I will try to see if I can do that.
>
> Ian.
>
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-17 12:38     ` Ian Jackson
@ 2014-01-17 15:22       ` Jim Fehlig
  0 siblings, 0 replies; 27+ messages in thread
From: Jim Fehlig @ 2014-01-17 15:22 UTC (permalink / raw)
  To: Ian Jackson; +Cc: xen-devel, Ian Campbell

Ian Jackson wrote:
> Jim Fehlig writes ("Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap"):
>   
>> I don't see the assert, regardless of how I call setmode(), when
>> changing this hunk to
>>
>> @@ -264,11 +264,11 @@ static bool chldmode_ours(libxl_ctx *ctx, bool
>> creating)
>>  {
>>      switch (ctx->childproc_hooks->chldowner) {
>>      case libxl_sigchld_owner_libxl:
>> +    case libxl_sigchld_owner_libxl_always_selective_reap:
>>          return creating || !LIBXL_LIST_EMPTY(&ctx->children);
>>      case libxl_sigchld_owner_mainloop:
>>          return 0;
>>      case libxl_sigchld_owner_libxl_always:
>> -    case libxl_sigchld_owner_libxl_always_selective_reap:
>>          return 1;
>>      }
>>      abort();
>>     
>
> I should say: that that works is just luck, I think.

That's what I suspected :).  And I also suspect my luck would run out
once I throw 10's of domains with many concurrent operations in the mix.

> I have a better fix.
>   

Ok, thanks!

Regards,
Jim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred
  2014-01-17 11:29       ` Ian Campbell
@ 2014-01-17 16:09         ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 16:09 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jim Fehlig, xen-devel

Ian Campbell writes ("Re: [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred"):
> On Fri, 2014-01-17 at 11:27 +0000, Ian Jackson wrote:
> > Of course this code is only reached if DISASTER returns...
> 
> Sorry, I meant how can we get here at all, i.e. with a waitpid returning
> -1/ECHLD when we think there is actually a child.
> 
> Thinking about it this way I now realise this could be a bug in libxl or
> the application which caused the process to be reaped elsewhere.

Exactly.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap
  2014-01-17 15:16       ` Jim Fehlig
@ 2014-01-17 16:13         ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2014-01-17 16:13 UTC (permalink / raw)
  To: Jim Fehlig; +Cc: xen-devel, Ian Campbell

Jim Fehlig writes ("Re: [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap"):
> Ian Jackson wrote:
> > Oh dear.  Can you change it to use the same ctx everywhere ?
> 
> What would be the downside of this in a multi-threaded application,
> where there are many concurrent domain operations?  I was under the
> impression that such an app would need a per-domain ctx.

In theory this should all be fine.  The design is intended to allow
you to use the same ctx for the whole program.  libxl is supposed to
release the ctx lock when doing anything long-running.

However, there are still areas where this hasn't been fully achieved
(parts of the migration code, and all the QMP communication with the
device model, come to mind).

I'm about to post v2 of my series which relaxes the libxl restriction
about SIGCHLD.

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-01-17 16:13 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-16 17:22 [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
2014-01-16 17:22 ` [PATCH 1/7] libxl: fork: Break out checked_waitpid Ian Jackson
2014-01-17 11:09   ` Ian Campbell
2014-01-16 17:22 ` [PATCH 2/7] libxl: fork: Break out childproc_reaped_ours Ian Jackson
2014-01-17 11:10   ` Ian Campbell
2014-01-16 17:22 ` [PATCH 3/7] libxl: fork: Clarify docs for libxl_sigchld_owner Ian Jackson
2014-01-17 11:12   ` Ian Campbell
2014-01-17 12:41     ` Ian Jackson
2014-01-16 17:22 ` [PATCH 4/7] libxl: fork: assert that chldmode is right Ian Jackson
2014-01-17 11:13   ` Ian Campbell
2014-01-16 17:22 ` [PATCH 5/7] libxl: fork: Provide libxl_childproc_sigchld_occurred Ian Jackson
2014-01-17 11:22   ` Ian Campbell
2014-01-17 11:27     ` Ian Jackson
2014-01-17 11:29       ` Ian Campbell
2014-01-17 16:09         ` Ian Jackson
2014-01-16 17:22 ` [PATCH 6/7] libxl: fork: Provide ..._always_selective_reap Ian Jackson
2014-01-17  6:36   ` Jim Fehlig
2014-01-17 11:35     ` Ian Jackson
2014-01-17 15:16       ` Jim Fehlig
2014-01-17 16:13         ` Ian Jackson
2014-01-17 12:38     ` Ian Jackson
2014-01-17 15:22       ` Jim Fehlig
2014-01-17 11:27   ` Ian Campbell
2014-01-16 17:22 ` [PATCH 7/7] libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP Ian Jackson
2014-01-17 11:27   ` Ian Campbell
2014-01-16 19:25 ` [RFC PATCH 0/7] libxl: fork: Selective reaping Ian Jackson
2014-01-17  6:41   ` Jim Fehlig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.