All of lore.kernel.org
 help / color / mirror / Atom feed
* user-cr thread safety
@ 2010-07-26 18:37 Nathan Lynch
  2010-07-29 14:56 ` Oren Laadan
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Nathan Lynch @ 2010-07-26 18:37 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

user-cr's restart program creates a thread to pipe the checkpoint image
into the sys_restart file descriptor.  This is a thread created with
clone(2) and it shares its address space with the coordinator.

While glibc has internal mechanisms to ensure thread safety, these work
only with threads that were created using glibc/pthread interfaces.
clone(2) bypasses the housekeeping that glibc does to track threads.  It
is not safe to call e.g. malloc or printf from the feeder thread.

The behavior I've been seeing is that restart will occasionally abort,
crash, or sleep indefinitely (with both the coordinator and feeder
threads waiting forever on the same futex) - before restart(2) or
eclone(2) are ever called.

I have tried patching user-cr to create the feeder thread with
pthread_create, but it's not trivial -- I think the program's correct
functioning depends heavily on the threads having separate file
descriptor tables.

The best I can come up with right now is to allocate ckpt_msg's buffer
on the stack - I think this avoids most if not all of the concurrent
malloc activity associated with the crashes/hangs I've been seing.

 common.h |   16 ++++++----------
 1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/common.h b/common.h
index 99b224d..927b146 100644
--- a/common.h
+++ b/common.h
@@ -1,25 +1,21 @@
 #include <stdio.h>
 #include <signal.h>
 
-#define BUFSIZE  (4 * 4096)
+#define BUFSIZE  (4096)
 
 static inline void ckpt_msg(int fd, char *format, ...)
 {
+	char buf[BUFSIZE] = { '\0' };
 	va_list ap;
-	char *bufp;
+
 	if (fd < 0)
 		return;
 
 	va_start(ap, format);
-
-	bufp = malloc(BUFSIZE);
-	if(bufp) {
-		vsnprintf(bufp, BUFSIZE, format, ap);
-		write(fd, bufp, strlen(bufp));
-	}
-	free(bufp);
-
+	vsnprintf(buf, BUFSIZE, format, ap);
 	va_end(ap);
+
+	write(fd, buf, strlen(buf));
 }
 
 #define ckpt_perror(s) 							\

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: user-cr thread safety
  2010-07-26 18:37 user-cr thread safety Nathan Lynch
@ 2010-07-29 14:56 ` Oren Laadan
       [not found]   ` <4C51968D.3000301-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2010-07-30 17:08 ` [PATCH 1/4] restart: check for overflow when counting (nested) vpids Oren Laadan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-29 14:56 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Nathan,

Thanks for pointing this out. A couple of comments:

1) The separate fd-table between the coordinator and the feeder
is just a convenience and can be relatively easily relaxed so
that pthreads may be used. However, ...

2) More importantly, malloc() and printf() also occur in the
processes and threads generated during the creation of the new
(restored) task tree. So the same problems may occur there as
well. Unfortunately, here we can't use glibc, in part because
it is not even supported by glibc.

Maybe a more robust way to address this is to: (1) use mmap()
and munmap() instead of malloc() and free(), and also (2) use
sprintf() + write() instead of printf().
That should make everything thread-safe. Did you notice other
libc calls which may be problematic ?

Oren.


Nathan Lynch wrote:
> user-cr's restart program creates a thread to pipe the checkpoint image
> into the sys_restart file descriptor.  This is a thread created with
> clone(2) and it shares its address space with the coordinator.
> 
> While glibc has internal mechanisms to ensure thread safety, these work
> only with threads that were created using glibc/pthread interfaces.
> clone(2) bypasses the housekeeping that glibc does to track threads.  It
> is not safe to call e.g. malloc or printf from the feeder thread.
> 
> The behavior I've been seeing is that restart will occasionally abort,
> crash, or sleep indefinitely (with both the coordinator and feeder
> threads waiting forever on the same futex) - before restart(2) or
> eclone(2) are ever called.
> 
> I have tried patching user-cr to create the feeder thread with
> pthread_create, but it's not trivial -- I think the program's correct
> functioning depends heavily on the threads having separate file
> descriptor tables.
> 
> The best I can come up with right now is to allocate ckpt_msg's buffer
> on the stack - I think this avoids most if not all of the concurrent
> malloc activity associated with the crashes/hangs I've been seing.
> 
>  common.h |   16 ++++++----------
>  1 files changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/common.h b/common.h
> index 99b224d..927b146 100644
> --- a/common.h
> +++ b/common.h
> @@ -1,25 +1,21 @@
>  #include <stdio.h>
>  #include <signal.h>
>  
> -#define BUFSIZE  (4 * 4096)
> +#define BUFSIZE  (4096)
>  
>  static inline void ckpt_msg(int fd, char *format, ...)
>  {
> +	char buf[BUFSIZE] = { '\0' };
>  	va_list ap;
> -	char *bufp;
> +
>  	if (fd < 0)
>  		return;
>  
>  	va_start(ap, format);
> -
> -	bufp = malloc(BUFSIZE);
> -	if(bufp) {
> -		vsnprintf(bufp, BUFSIZE, format, ap);
> -		write(fd, bufp, strlen(bufp));
> -	}
> -	free(bufp);
> -
> +	vsnprintf(buf, BUFSIZE, format, ap);
>  	va_end(ap);
> +
> +	write(fd, buf, strlen(buf));
>  }
>  
>  #define ckpt_perror(s) 							\
> 
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: user-cr thread safety
       [not found]   ` <4C51968D.3000301-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-07-29 17:37     ` Nathan Lynch
  2010-07-29 22:14       ` Oren Laadan
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2010-07-29 17:37 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, 2010-07-29 at 10:56 -0400, Oren Laadan wrote:
> 1) The separate fd-table between the coordinator and the feeder
> is just a convenience and can be relatively easily relaxed so
> that pthreads may be used. However, ...
> 
> 2) More importantly, malloc() and printf() also occur in the
> processes and threads generated during the creation of the new
> (restored) task tree. So the same problems may occur there as
> well. Unfortunately, here we can't use glibc, in part because
> it is not even supported by glibc.
> 
> Maybe a more robust way to address this is to: (1) use mmap()
> and munmap() instead of malloc() and free(), and also (2) use
> sprintf() + write() instead of printf().
> That should make everything thread-safe. Did you notice other
> libc calls which may be problematic ?

No, I think you covered it.  The only remaining concern I have is
whether accesses to global state (like the context) are adequately
serialized, and if not, what mechanism could provide mutual exclusion
(semaphores?).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: user-cr thread safety
  2010-07-29 17:37     ` Nathan Lynch
@ 2010-07-29 22:14       ` Oren Laadan
  0 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-29 22:14 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



On 07/29/2010 01:37 PM, Nathan Lynch wrote:
> On Thu, 2010-07-29 at 10:56 -0400, Oren Laadan wrote:
>> 1) The separate fd-table between the coordinator and the feeder
>> is just a convenience and can be relatively easily relaxed so
>> that pthreads may be used. However, ...
>>
>> 2) More importantly, malloc() and printf() also occur in the
>> processes and threads generated during the creation of the new
>> (restored) task tree. So the same problems may occur there as
>> well. Unfortunately, here we can't use glibc, in part because
>> it is not even supported by glibc.
>>
>> Maybe a more robust way to address this is to: (1) use mmap()
>> and munmap() instead of malloc() and free(), and also (2) use
>> sprintf() + write() instead of printf().
>> That should make everything thread-safe. Did you notice other
>> libc calls which may be problematic ?
> 
> No, I think you covered it.  The only remaining concern I have is
> whether accesses to global state (like the context) are adequately
> serialized, and if not, what mechanism could provide mutual exclusion
> (semaphores?).

That global variables (*) like the context are set once prior
to the fork/clone starts, and are immutable thereafter.

So I'll go ahead and clean it up.

(*) except for signal handling, which I'd like to remove anyway.

Oren.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] restart: check for overflow when counting (nested) vpids
  2010-07-26 18:37 user-cr thread safety Nathan Lynch
  2010-07-29 14:56 ` Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
  2010-07-30 17:08 ` [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child Oren Laadan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 restart.c |   29 +++++++++++++++++------------
 1 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/restart.c b/restart.c
index d813269..b281ca2 100644
--- a/restart.c
+++ b/restart.c
@@ -1780,14 +1780,10 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
 			flags |= CLONE_NEWPID;
 		}
 		if (flags & CLONE_NEWPID && !ctx->args->pidns) {
-			ckpt_err("Must use --pidns for nested pidns container");
+			ckpt_err("need --pidns for nested pidns container");
 			errno = -EINVAL;
 			return -1;
 		}
-#if 0
-		if (flags & CLONE_NEWPID)
-			clone_args.nr_pids--;
-#endif
 #endif /* CLONE_NEWPID */
 	}
 
@@ -2375,17 +2371,26 @@ static int ckpt_read_vpids(struct ckpt_ctx *ctx)
 {
 	int i, len, ret;
 
-	for (i = 0; i < ctx->pids_nr; i++)
-		ctx->vpids_nr += ctx->pids_arr[i].depth;
+	for (i = 0; i < ctx->pids_nr; i++) {
+		if (ctx->pids_arr[i].depth < 0) {
+			ckpt_err("Invalid depth %d for pid %d",
+				 ctx->pids_arr[i].depth,
+				 ctx->tasks_arr[i].pid);
+			errno = -EINVAL;
+			return -1;
+		}
 
-	ckpt_dbg("number of vpids: %d\n", ctx->vpids_nr);
+		ctx->vpids_nr += ctx->pids_arr[i].depth;
 
-	if (ctx->vpids_nr < 0) {
-		ckpt_err("Invalid number of vpids %d", ctx->vpids_nr);
-		errno = -EINVAL;
-		return -1;
+		if(ctx->vpids_nr < 0) {
+			ckpt_err("Number of vpids overflowed");
+			errno = -E2BIG;
+			return -1;
+		}
 	}
 
+	ckpt_dbg("number of vpids: %d\n", ctx->vpids_nr);
+
 	if (!ctx->vpids_nr)
 		return 0;
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child
  2010-07-26 18:37 user-cr thread safety Nathan Lynch
  2010-07-29 14:56 ` Oren Laadan
  2010-07-30 17:08 ` [PATCH 1/4] restart: check for overflow when counting (nested) vpids Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
  2010-07-30 17:08 ` [PATCH 3/4] restart thread safety: remove malloc from genstack Oren Laadan
  2010-07-30 17:08 ` [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg() Oren Laadan
  4 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch

We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.

This patch eliminates the use of malloc() from ckpt_fork_child(). The
malloc was needed to make a pid-array for eclone using data from the
vpids from the checkpoint image. In particular, it added the 0th level
pid. Instead, we re-package vpids array early on by expanding the array
and pre-adding 0th pids before any of the forks take place.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 restart.c |   82 ++++++++++++++++++++++++++++++++----------------------------
 1 files changed, 44 insertions(+), 38 deletions(-)

diff --git a/restart.c b/restart.c
index b281ca2..b274e37 100644
--- a/restart.c
+++ b/restart.c
@@ -1725,9 +1725,9 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
 	unsigned long flags = SIGCHLD;
 	pid_t pid = 0;
 	pid_t *pids = &pid;
-	int i, j, depth;
+	int i, depth;
 
-	ckpt_dbg("forking child vpid %d flags %#x\n", child->pid, child->flags);
+	ckpt_dbg("fork child vpid %d flags %#x\n", child->pid, child->flags);
 
 	stk = genstack_alloc(PTHREAD_STACK_MIN);
 	if (!stk) {
@@ -1747,17 +1747,7 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
 		depth = child->piddepth + 1;
 		clone_args.nr_pids = depth;
 
-		pids = malloc(sizeof(pid_t) * depth);
-		if (!pids) {
-			perror("ckpt_fork_child pids malloc");
-			return -1;
-		}
-
-		memset(pids, 0, sizeof(pid_t) * depth);
-		pids[0] = child->pid;
-
-		for (i = child->piddepth - 1, j = 0; i >= 0; i--, j++)
-			pids[j + 1] = ctx->vpids_arr[child->vidx + j];
+		pids = &ctx->vpids_arr[child->vidx];
 
 #ifndef CLONE_NEWPID
 		if (child->piddepth > child->creator->piddepth) {
@@ -1810,9 +1800,6 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
 	if (pid < 0 || !(child->flags & TASK_THREAD))
 		genstack_release(stk);
 
-	if (pids != &pid)
-		free(pids);
-
 	ckpt_dbg("forked child vpid %d (asked %d)\n", pid, child->pid);
 	return pid;
 }
@@ -2336,34 +2323,53 @@ static int ckpt_read_tree(struct ckpt_ctx *ctx)
 	return ret;
 }
 
-/* set the vpids pointers in all the tasks */
+/*
+ * transform vpids arrays to the format convenient for eclone:
+ * prefix the level 0 pid to every sequence of nested pids.
+ * also,  set the vpids pointers in all the tasks.
+ */
 static int assign_vpids(struct ckpt_ctx *ctx)
 {
-	int d, hidx, tidx;
-	struct task *t;
+	__s32 *vpids_arr;
+	int depth, hidx, vidx, tidx;
+	struct task *task;
 
-	for (hidx = 0, tidx = 0; tidx < ctx->pids_nr; tidx++) {
-		t = &ctx->tasks_arr[tidx];
-		d = t->piddepth = ctx->pids_arr[tidx].depth;
-		if (!d) {
-			ckpt_dbg("task[%d].vidx = -1\n", tidx);
-			t->vidx = -1;
-			continue;
-		}
-		t->vidx = hidx;
+	vpids_arr = malloc(sizeof(__s32) * (ctx->vpids_nr + ctx->pids_nr));
+	if (vpids_arr == NULL) {
+		perror("assign_vpids malloc");
+		return -1;
+	}
+
+	for (tidx = 0, hidx = 0, vidx = 0; tidx < ctx->pids_nr; tidx++) {
+		task = &ctx->tasks_arr[tidx];
+		depth = ctx->pids_arr[tidx].depth;
+
+		task->vidx = vidx;
+		task->piddepth = depth;
+
+		/* set task's and top level pid */
+		vpids_arr[vidx++] = task->pid;
+		/* copy task's nested pids */
+		memcpy(&vpids_arr[vidx], &ctx->vpids_arr[hidx],
+		       sizeof(__s32) * depth);
+
+		vidx += depth;
+		hidx += depth;
+
+#ifdef CHECKPOINT_DEBUG
 		ckpt_dbg("task[%d].vidx = %d (depth %d, rpid %d)\n",
-			tidx, hidx, t->piddepth, ctx->pids_arr[tidx].vpid);
-		int i;
-		for (i=0; i<t->piddepth; i++)
-			ckpt_dbg("task[%d].vpid[%d] = %d\n", tidx, i,
-				ctx->vpids_arr[hidx+i]);
-		hidx += d;
-		if (hidx > ctx->vpids_nr) {
-			ckpt_err("Error parsing vpids array");
-			return -1;
+			tidx, vidx, depth, ctx->pids_arr[tidx].vpid);
+		while (depth-- > 0)  {
+			ckpt_dbg("task[%d].vpid[%d] = %d\n", tidx,
+				 depth, vpids_arr[hidx - depth - 1]);
 		}
+#endif
 	}
 
+	/* relpace "raw" vpids_arr with this one */
+	free(ctx->vpids_arr);
+	ctx->vpids_arr = vpids_arr;
+
 	return 0;
 }
 
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/4] restart thread safety: remove malloc from genstack
  2010-07-26 18:37 user-cr thread safety Nathan Lynch
                   ` (2 preceding siblings ...)
  2010-07-30 17:08 ` [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
       [not found]   ` <1280509713-6745-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2010-07-30 17:08 ` [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg() Oren Laadan
  4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch

We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.

This patch eliminates the use of malloc() in genstack_alloc().
Use mmap() instead. While an overkill, I don't expect performance
issues of this.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 genstack.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/genstack.c b/genstack.c
index c552d04..c97e0c0 100644
--- a/genstack.c
+++ b/genstack.c
@@ -34,7 +34,11 @@ struct genstack *genstack_alloc(size_t sz)
 	if (sz == 0)
 		return NULL;
 
-	stk = malloc(sizeof(*stk));
+	/*
+	 * This is clearly an overkill; however, we must not use
+	 * malloc because it may not be thread-safe!
+	 */
+	stk = mmap(NULL, page_size(), mmap_prot, mmap_flags, -1, 0);
 	if (!stk)
 		return NULL;
 
@@ -61,7 +65,7 @@ struct genstack *genstack_alloc(size_t sz)
 void genstack_release(struct genstack *stk)
 {
 	munmap(stk->addr, stk->size);
-	free(stk);
+	munmap(stk, page_size());
 }
 
 /* Return the size of the usable stack region.  Suitable for providing
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg()
  2010-07-26 18:37 user-cr thread safety Nathan Lynch
                   ` (3 preceding siblings ...)
  2010-07-30 17:08 ` [PATCH 3/4] restart thread safety: remove malloc from genstack Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
       [not found]   ` <1280509713-6745-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch

We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.

This patch removes the use of malloc in ckpt_msg() and instead
allocate a buffer on the stack. Also convert calls to strerr() to
to calls to strerr_r() which are thread-safe.

This patch is based on Nathan Lynch's patch:
"""
  I have tried patching user-cr to create the feeder thread with
  pthread_create, but it's not trivial -- I think the program's
  correct functioning depends heavily on the threads having separate
  file descriptor tables.

  The best I can come up with right now is to allocate ckpt_msg's
  buffer on the stack - I think this avoids most if not all of the
  concurrent malloc activity associated with the crashes/hangs I've
  been seing.
"""

Todo ??

Now the only remaining non-thread-safe behavior that I'm aware of is
the use of @errno: it is possible that an error in one thread will be
incorrectly reported by another thread. I think we can tolerate this
because it does not impact correctness when restart should succeed; at
worst it may confuse us about userspace errors in the restart.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 common.h  |   25 +++++++++++--------------
 restart.c |    7 +++++--
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/common.h b/common.h
index 99b224d..faf180e 100644
--- a/common.h
+++ b/common.h
@@ -1,31 +1,28 @@
 #include <stdio.h>
 #include <signal.h>
 
-#define BUFSIZE  (4 * 4096)
+#define BUFSIZE  (4096)
 
 static inline void ckpt_msg(int fd, char *format, ...)
 {
+	char buf[BUFSIZE];
 	va_list ap;
-	char *bufp;
+
 	if (fd < 0)
 		return;
 
 	va_start(ap, format);
-
-	bufp = malloc(BUFSIZE);
-	if(bufp) {
-		vsnprintf(bufp, BUFSIZE, format, ap);
-		write(fd, bufp, strlen(bufp));
-	}
-	free(bufp);
-
+	vsnprintf(buf, BUFSIZE, format, ap);
 	va_end(ap);
+
+	write(fd, buf, strlen(buf));
 }
 
-#define ckpt_perror(s) 							\
-	do {								\
-		ckpt_msg(global_uerrfd, s);				\
-		ckpt_msg(global_uerrfd, ": %s\n", strerror(errno));	\
+#define ckpt_perror(s)						\
+	do {							\
+		char __buf[256];				\
+		strerror_r(errno, __buf, 256);			\
+		ckpt_msg(global_uerrfd, "%s: %s\n", s, __buf);	\
 	} while (0)
 
 #ifdef CHECKPOINT_DEBUG
diff --git a/restart.c b/restart.c
index b274e37..9cb7430 100644
--- a/restart.c
+++ b/restart.c
@@ -739,6 +739,7 @@ static int ckpt_pretend_reaper(struct ckpt_ctx *ctx)
 static int ckpt_probe_child(pid_t pid, char *str)
 {
 	int status, ret;
+	char buf[256];
 
 	/* use waitpid() to probe that a child is still alive */
 	ret = waitpid(pid, &status, WNOHANG);
@@ -746,11 +747,13 @@ static int ckpt_probe_child(pid_t pid, char *str)
 		report_exit_status(status, str, 0);
 		exit(1);
 	} else if (ret < 0 && errno == ECHILD) {
+		strerror_r(errno, buf, 256);
 		ckpt_err("WEIRD: %s exited without trace (%s)\n",
-			 str, strerror(errno));
+			 str, buf);
 		exit(1);
 	} else if (ret != 0) {
-		ckpt_err("waitpid for %s (%s)", str, strerror(errno));
+		strerror_r(errno, buf, 256);
+		ckpt_err("waitpid for %s (%s)", str, buf);
 		exit(1);
 	}
 	return 0;
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] restart thread safety: remove malloc from genstack
       [not found]   ` <1280509713-6745-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-07-30 18:46     ` Matt Helsley
       [not found]       ` <20100730184641.GB3426-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  2010-08-04 23:08     ` Nathan Lynch
  1 sibling, 1 reply; 13+ messages in thread
From: Matt Helsley @ 2010-07-30 18:46 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch

On Fri, Jul 30, 2010 at 01:08:32PM -0400, Oren Laadan wrote:
> We use clone and eclone directly and not through glibc, therefore
> must explicitly care about thread-safety of malloc.
> 
> This patch eliminates the use of malloc() in genstack_alloc().
> Use mmap() instead. While an overkill, I don't expect performance
> issues of this.

Should work so long as we know the glibc mmap wrappers themselves are
reentrant/thread-safe. I expect they are but this might be a good thing
to confirm.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] restart thread safety: remove malloc from genstack
       [not found]       ` <20100730184641.GB3426-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
@ 2010-07-30 18:57         ` Oren Laadan
  0 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 18:57 UTC (permalink / raw)
  To: Matt Helsley
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Nathan Lynch


Matt Helsley wrote:
> On Fri, Jul 30, 2010 at 01:08:32PM -0400, Oren Laadan wrote:
>> We use clone and eclone directly and not through glibc, therefore
>> must explicitly care about thread-safety of malloc.
>>
>> This patch eliminates the use of malloc() in genstack_alloc().
>> Use mmap() instead. While an overkill, I don't expect performance
>> issues of this.
> 
> Should work so long as we know the glibc mmap wrappers themselves are
> reentrant/thread-safe. I expect they are but this might be a good thing
> to confirm.

True. And that goes for every system call we use in there -
including, for example, setsid().

One alternative is to call all those syscall directly using
the syscall() syscall. Not sure it's worth our time now, though.

Oren.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/4] restart thread safety: remove malloc from genstack
       [not found]   ` <1280509713-6745-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2010-07-30 18:46     ` Matt Helsley
@ 2010-08-04 23:08     ` Nathan Lynch
  1 sibling, 0 replies; 13+ messages in thread
From: Nathan Lynch @ 2010-08-04 23:08 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, 2010-07-30 at 13:08 -0400, Oren Laadan wrote:
> We use clone and eclone directly and not through glibc, therefore
> must explicitly care about thread-safety of malloc.
> 
> This patch eliminates the use of malloc() in genstack_alloc().
> Use mmap() instead. While an overkill, I don't expect performance
> issues of this.

Other than a missed conversion of free -> munmap in one of the error
paths in genstack_alloc, this looks fine.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg()
       [not found]   ` <1280509713-6745-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-08-04 23:30     ` Nathan Lynch
  2010-08-04 23:56       ` Oren Laadan
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Lynch @ 2010-08-04 23:30 UTC (permalink / raw)
  To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, 2010-07-30 at 13:08 -0400, Oren Laadan wrote:
> We use clone and eclone directly and not through glibc, therefore
> must explicitly care about thread-safety of malloc.
> 
> This patch removes the use of malloc in ckpt_msg() and instead
> allocate a buffer on the stack. Also convert calls to strerr() to
> to calls to strerr_r() which are thread-safe.

Well, strerror_r is safe only for code that uses glibc/libpthread
interfaces to create threads, right?

Furthermore, strerror_r has different behaviors depending on whether
you're using the XSI- or GNU-specified version.  My local strerror(3)
man page says:

"The GNU-specific strerror_r() returns a pointer to a string  containing
the  error  message.  This may be either a pointer to a string that the
function stores in buf, or a pointer to some (immutable) static  string
(in which case buf is unused)."

And I'm seeing garbage output from ckpt_perror() with this patch
applied, implying that the GNU version is in use and that it is electing
not to modify the supplied buffer.

Surely strerror(errno) is "good enough" for error paths?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg()
  2010-08-04 23:30     ` Nathan Lynch
@ 2010-08-04 23:56       ` Oren Laadan
  0 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-08-04 23:56 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA



On 08/04/2010 07:30 PM, Nathan Lynch wrote:
> On Fri, 2010-07-30 at 13:08 -0400, Oren Laadan wrote:
>> We use clone and eclone directly and not through glibc, therefore
>> must explicitly care about thread-safety of malloc.
>>
>> This patch removes the use of malloc in ckpt_msg() and instead
>> allocate a buffer on the stack. Also convert calls to strerr() to
>> to calls to strerr_r() which are thread-safe.
>
> Well, strerror_r is safe only for code that uses glibc/libpthread
> interfaces to create threads, right?
>
> Furthermore, strerror_r has different behaviors depending on whether
> you're using the XSI- or GNU-specified version.  My local strerror(3)
> man page says:
>
> "The GNU-specific strerror_r() returns a pointer to a string  containing
> the  error  message.  This may be either a pointer to a string that the
> function stores in buf, or a pointer to some (immutable) static  string
> (in which case buf is unused)."
>
> And I'm seeing garbage output from ckpt_perror() with this patch
> applied, implying that the GNU version is in use and that it is electing
> not to modify the supplied buffer.

Doh ... I should have known better.

Ok from the manpage:
"""
    Feature Test Macro Requirements for glibc (see
    feature_test_macros(7)):

        The XSI-compliant version of strerror_r() is provided if:
        (_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600) && !_GNU_SOURCE
        Otherwise, the GNU-specific version is provided.
"""

so how about:

#if (_POSIX_C_SOURCE >= 200112L || _XOPEN_SOURCE >= 600) && !_GNU_SOURCE
   use-XSI
#else
   use-GNU
#endif

>
> Surely strerror(errno) is "good enough" for error paths?

Heh .. given that errno can already be scrambled between threads...

Oren.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-08-04 23:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-26 18:37 user-cr thread safety Nathan Lynch
2010-07-29 14:56 ` Oren Laadan
     [not found]   ` <4C51968D.3000301-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-29 17:37     ` Nathan Lynch
2010-07-29 22:14       ` Oren Laadan
2010-07-30 17:08 ` [PATCH 1/4] restart: check for overflow when counting (nested) vpids Oren Laadan
2010-07-30 17:08 ` [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child Oren Laadan
2010-07-30 17:08 ` [PATCH 3/4] restart thread safety: remove malloc from genstack Oren Laadan
     [not found]   ` <1280509713-6745-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-30 18:46     ` Matt Helsley
     [not found]       ` <20100730184641.GB3426-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-07-30 18:57         ` Oren Laadan
2010-08-04 23:08     ` Nathan Lynch
2010-07-30 17:08 ` [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg() Oren Laadan
     [not found]   ` <1280509713-6745-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-08-04 23:30     ` Nathan Lynch
2010-08-04 23:56       ` Oren Laadan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.