* Re: user-cr thread safety
2010-07-26 18:37 user-cr thread safety Nathan Lynch
@ 2010-07-29 14:56 ` Oren Laadan
[not found] ` <4C51968D.3000301-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-30 17:08 ` [PATCH 1/4] restart: check for overflow when counting (nested) vpids Oren Laadan
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-29 14:56 UTC (permalink / raw)
To: Nathan Lynch; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Nathan,
Thanks for pointing this out. A couple of comments:
1) The separate fd-table between the coordinator and the feeder
is just a convenience and can be relatively easily relaxed so
that pthreads may be used. However, ...
2) More importantly, malloc() and printf() also occur in the
processes and threads generated during the creation of the new
(restored) task tree. So the same problems may occur there as
well. Unfortunately, here we can't use glibc, in part because
it is not even supported by glibc.
Maybe a more robust way to address this is to: (1) use mmap()
and munmap() instead of malloc() and free(), and also (2) use
sprintf() + write() instead of printf().
That should make everything thread-safe. Did you notice other
libc calls which may be problematic ?
Oren.
Nathan Lynch wrote:
> user-cr's restart program creates a thread to pipe the checkpoint image
> into the sys_restart file descriptor. This is a thread created with
> clone(2) and it shares its address space with the coordinator.
>
> While glibc has internal mechanisms to ensure thread safety, these work
> only with threads that were created using glibc/pthread interfaces.
> clone(2) bypasses the housekeeping that glibc does to track threads. It
> is not safe to call e.g. malloc or printf from the feeder thread.
>
> The behavior I've been seeing is that restart will occasionally abort,
> crash, or sleep indefinitely (with both the coordinator and feeder
> threads waiting forever on the same futex) - before restart(2) or
> eclone(2) are ever called.
>
> I have tried patching user-cr to create the feeder thread with
> pthread_create, but it's not trivial -- I think the program's correct
> functioning depends heavily on the threads having separate file
> descriptor tables.
>
> The best I can come up with right now is to allocate ckpt_msg's buffer
> on the stack - I think this avoids most if not all of the concurrent
> malloc activity associated with the crashes/hangs I've been seing.
>
> common.h | 16 ++++++----------
> 1 files changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/common.h b/common.h
> index 99b224d..927b146 100644
> --- a/common.h
> +++ b/common.h
> @@ -1,25 +1,21 @@
> #include <stdio.h>
> #include <signal.h>
>
> -#define BUFSIZE (4 * 4096)
> +#define BUFSIZE (4096)
>
> static inline void ckpt_msg(int fd, char *format, ...)
> {
> + char buf[BUFSIZE] = { '\0' };
> va_list ap;
> - char *bufp;
> +
> if (fd < 0)
> return;
>
> va_start(ap, format);
> -
> - bufp = malloc(BUFSIZE);
> - if(bufp) {
> - vsnprintf(bufp, BUFSIZE, format, ap);
> - write(fd, bufp, strlen(bufp));
> - }
> - free(bufp);
> -
> + vsnprintf(buf, BUFSIZE, format, ap);
> va_end(ap);
> +
> + write(fd, buf, strlen(buf));
> }
>
> #define ckpt_perror(s) \
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH 1/4] restart: check for overflow when counting (nested) vpids
2010-07-26 18:37 user-cr thread safety Nathan Lynch
2010-07-29 14:56 ` Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
2010-07-30 17:08 ` [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child Oren Laadan
` (2 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
restart.c | 29 +++++++++++++++++------------
1 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/restart.c b/restart.c
index d813269..b281ca2 100644
--- a/restart.c
+++ b/restart.c
@@ -1780,14 +1780,10 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
flags |= CLONE_NEWPID;
}
if (flags & CLONE_NEWPID && !ctx->args->pidns) {
- ckpt_err("Must use --pidns for nested pidns container");
+ ckpt_err("need --pidns for nested pidns container");
errno = -EINVAL;
return -1;
}
-#if 0
- if (flags & CLONE_NEWPID)
- clone_args.nr_pids--;
-#endif
#endif /* CLONE_NEWPID */
}
@@ -2375,17 +2371,26 @@ static int ckpt_read_vpids(struct ckpt_ctx *ctx)
{
int i, len, ret;
- for (i = 0; i < ctx->pids_nr; i++)
- ctx->vpids_nr += ctx->pids_arr[i].depth;
+ for (i = 0; i < ctx->pids_nr; i++) {
+ if (ctx->pids_arr[i].depth < 0) {
+ ckpt_err("Invalid depth %d for pid %d",
+ ctx->pids_arr[i].depth,
+ ctx->tasks_arr[i].pid);
+ errno = -EINVAL;
+ return -1;
+ }
- ckpt_dbg("number of vpids: %d\n", ctx->vpids_nr);
+ ctx->vpids_nr += ctx->pids_arr[i].depth;
- if (ctx->vpids_nr < 0) {
- ckpt_err("Invalid number of vpids %d", ctx->vpids_nr);
- errno = -EINVAL;
- return -1;
+ if(ctx->vpids_nr < 0) {
+ ckpt_err("Number of vpids overflowed");
+ errno = -E2BIG;
+ return -1;
+ }
}
+ ckpt_dbg("number of vpids: %d\n", ctx->vpids_nr);
+
if (!ctx->vpids_nr)
return 0;
--
1.7.0.4
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child
2010-07-26 18:37 user-cr thread safety Nathan Lynch
2010-07-29 14:56 ` Oren Laadan
2010-07-30 17:08 ` [PATCH 1/4] restart: check for overflow when counting (nested) vpids Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
2010-07-30 17:08 ` [PATCH 3/4] restart thread safety: remove malloc from genstack Oren Laadan
2010-07-30 17:08 ` [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg() Oren Laadan
4 siblings, 0 replies; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch
We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.
This patch eliminates the use of malloc() from ckpt_fork_child(). The
malloc was needed to make a pid-array for eclone using data from the
vpids from the checkpoint image. In particular, it added the 0th level
pid. Instead, we re-package vpids array early on by expanding the array
and pre-adding 0th pids before any of the forks take place.
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
restart.c | 82 ++++++++++++++++++++++++++++++++----------------------------
1 files changed, 44 insertions(+), 38 deletions(-)
diff --git a/restart.c b/restart.c
index b281ca2..b274e37 100644
--- a/restart.c
+++ b/restart.c
@@ -1725,9 +1725,9 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
unsigned long flags = SIGCHLD;
pid_t pid = 0;
pid_t *pids = &pid;
- int i, j, depth;
+ int i, depth;
- ckpt_dbg("forking child vpid %d flags %#x\n", child->pid, child->flags);
+ ckpt_dbg("fork child vpid %d flags %#x\n", child->pid, child->flags);
stk = genstack_alloc(PTHREAD_STACK_MIN);
if (!stk) {
@@ -1747,17 +1747,7 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
depth = child->piddepth + 1;
clone_args.nr_pids = depth;
- pids = malloc(sizeof(pid_t) * depth);
- if (!pids) {
- perror("ckpt_fork_child pids malloc");
- return -1;
- }
-
- memset(pids, 0, sizeof(pid_t) * depth);
- pids[0] = child->pid;
-
- for (i = child->piddepth - 1, j = 0; i >= 0; i--, j++)
- pids[j + 1] = ctx->vpids_arr[child->vidx + j];
+ pids = &ctx->vpids_arr[child->vidx];
#ifndef CLONE_NEWPID
if (child->piddepth > child->creator->piddepth) {
@@ -1810,9 +1800,6 @@ static pid_t ckpt_fork_child(struct ckpt_ctx *ctx, struct task *child)
if (pid < 0 || !(child->flags & TASK_THREAD))
genstack_release(stk);
- if (pids != &pid)
- free(pids);
-
ckpt_dbg("forked child vpid %d (asked %d)\n", pid, child->pid);
return pid;
}
@@ -2336,34 +2323,53 @@ static int ckpt_read_tree(struct ckpt_ctx *ctx)
return ret;
}
-/* set the vpids pointers in all the tasks */
+/*
+ * transform vpids arrays to the format convenient for eclone:
+ * prefix the level 0 pid to every sequence of nested pids.
+ * also, set the vpids pointers in all the tasks.
+ */
static int assign_vpids(struct ckpt_ctx *ctx)
{
- int d, hidx, tidx;
- struct task *t;
+ __s32 *vpids_arr;
+ int depth, hidx, vidx, tidx;
+ struct task *task;
- for (hidx = 0, tidx = 0; tidx < ctx->pids_nr; tidx++) {
- t = &ctx->tasks_arr[tidx];
- d = t->piddepth = ctx->pids_arr[tidx].depth;
- if (!d) {
- ckpt_dbg("task[%d].vidx = -1\n", tidx);
- t->vidx = -1;
- continue;
- }
- t->vidx = hidx;
+ vpids_arr = malloc(sizeof(__s32) * (ctx->vpids_nr + ctx->pids_nr));
+ if (vpids_arr == NULL) {
+ perror("assign_vpids malloc");
+ return -1;
+ }
+
+ for (tidx = 0, hidx = 0, vidx = 0; tidx < ctx->pids_nr; tidx++) {
+ task = &ctx->tasks_arr[tidx];
+ depth = ctx->pids_arr[tidx].depth;
+
+ task->vidx = vidx;
+ task->piddepth = depth;
+
+ /* set task's and top level pid */
+ vpids_arr[vidx++] = task->pid;
+ /* copy task's nested pids */
+ memcpy(&vpids_arr[vidx], &ctx->vpids_arr[hidx],
+ sizeof(__s32) * depth);
+
+ vidx += depth;
+ hidx += depth;
+
+#ifdef CHECKPOINT_DEBUG
ckpt_dbg("task[%d].vidx = %d (depth %d, rpid %d)\n",
- tidx, hidx, t->piddepth, ctx->pids_arr[tidx].vpid);
- int i;
- for (i=0; i<t->piddepth; i++)
- ckpt_dbg("task[%d].vpid[%d] = %d\n", tidx, i,
- ctx->vpids_arr[hidx+i]);
- hidx += d;
- if (hidx > ctx->vpids_nr) {
- ckpt_err("Error parsing vpids array");
- return -1;
+ tidx, vidx, depth, ctx->pids_arr[tidx].vpid);
+ while (depth-- > 0) {
+ ckpt_dbg("task[%d].vpid[%d] = %d\n", tidx,
+ depth, vpids_arr[hidx - depth - 1]);
}
+#endif
}
+ /* relpace "raw" vpids_arr with this one */
+ free(ctx->vpids_arr);
+ ctx->vpids_arr = vpids_arr;
+
return 0;
}
--
1.7.0.4
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 3/4] restart thread safety: remove malloc from genstack
2010-07-26 18:37 user-cr thread safety Nathan Lynch
` (2 preceding siblings ...)
2010-07-30 17:08 ` [PATCH 2/4] restart thread safety: remove malloc from ckpt_fork_child Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
[not found] ` <1280509713-6745-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-30 17:08 ` [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg() Oren Laadan
4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch
We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.
This patch eliminates the use of malloc() in genstack_alloc().
Use mmap() instead. While an overkill, I don't expect performance
issues of this.
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
genstack.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/genstack.c b/genstack.c
index c552d04..c97e0c0 100644
--- a/genstack.c
+++ b/genstack.c
@@ -34,7 +34,11 @@ struct genstack *genstack_alloc(size_t sz)
if (sz == 0)
return NULL;
- stk = malloc(sizeof(*stk));
+ /*
+ * This is clearly an overkill; however, we must not use
+ * malloc because it may not be thread-safe!
+ */
+ stk = mmap(NULL, page_size(), mmap_prot, mmap_flags, -1, 0);
if (!stk)
return NULL;
@@ -61,7 +65,7 @@ struct genstack *genstack_alloc(size_t sz)
void genstack_release(struct genstack *stk)
{
munmap(stk->addr, stk->size);
- free(stk);
+ munmap(stk, page_size());
}
/* Return the size of the usable stack region. Suitable for providing
--
1.7.0.4
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH 4/4] restart thread-safety: avoid malloc in ckpt_msg()
2010-07-26 18:37 user-cr thread safety Nathan Lynch
` (3 preceding siblings ...)
2010-07-30 17:08 ` [PATCH 3/4] restart thread safety: remove malloc from genstack Oren Laadan
@ 2010-07-30 17:08 ` Oren Laadan
[not found] ` <1280509713-6745-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
4 siblings, 1 reply; 13+ messages in thread
From: Oren Laadan @ 2010-07-30 17:08 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nathan Lynch
We use clone and eclone directly and not through glibc, therefore
must explicitly care about thread-safety of malloc.
This patch removes the use of malloc in ckpt_msg() and instead
allocate a buffer on the stack. Also convert calls to strerr() to
to calls to strerr_r() which are thread-safe.
This patch is based on Nathan Lynch's patch:
"""
I have tried patching user-cr to create the feeder thread with
pthread_create, but it's not trivial -- I think the program's
correct functioning depends heavily on the threads having separate
file descriptor tables.
The best I can come up with right now is to allocate ckpt_msg's
buffer on the stack - I think this avoids most if not all of the
concurrent malloc activity associated with the crashes/hangs I've
been seing.
"""
Todo ??
Now the only remaining non-thread-safe behavior that I'm aware of is
the use of @errno: it is possible that an error in one thread will be
incorrectly reported by another thread. I think we can tolerate this
because it does not impact correctness when restart should succeed; at
worst it may confuse us about userspace errors in the restart.
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
common.h | 25 +++++++++++--------------
restart.c | 7 +++++--
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/common.h b/common.h
index 99b224d..faf180e 100644
--- a/common.h
+++ b/common.h
@@ -1,31 +1,28 @@
#include <stdio.h>
#include <signal.h>
-#define BUFSIZE (4 * 4096)
+#define BUFSIZE (4096)
static inline void ckpt_msg(int fd, char *format, ...)
{
+ char buf[BUFSIZE];
va_list ap;
- char *bufp;
+
if (fd < 0)
return;
va_start(ap, format);
-
- bufp = malloc(BUFSIZE);
- if(bufp) {
- vsnprintf(bufp, BUFSIZE, format, ap);
- write(fd, bufp, strlen(bufp));
- }
- free(bufp);
-
+ vsnprintf(buf, BUFSIZE, format, ap);
va_end(ap);
+
+ write(fd, buf, strlen(buf));
}
-#define ckpt_perror(s) \
- do { \
- ckpt_msg(global_uerrfd, s); \
- ckpt_msg(global_uerrfd, ": %s\n", strerror(errno)); \
+#define ckpt_perror(s) \
+ do { \
+ char __buf[256]; \
+ strerror_r(errno, __buf, 256); \
+ ckpt_msg(global_uerrfd, "%s: %s\n", s, __buf); \
} while (0)
#ifdef CHECKPOINT_DEBUG
diff --git a/restart.c b/restart.c
index b274e37..9cb7430 100644
--- a/restart.c
+++ b/restart.c
@@ -739,6 +739,7 @@ static int ckpt_pretend_reaper(struct ckpt_ctx *ctx)
static int ckpt_probe_child(pid_t pid, char *str)
{
int status, ret;
+ char buf[256];
/* use waitpid() to probe that a child is still alive */
ret = waitpid(pid, &status, WNOHANG);
@@ -746,11 +747,13 @@ static int ckpt_probe_child(pid_t pid, char *str)
report_exit_status(status, str, 0);
exit(1);
} else if (ret < 0 && errno == ECHILD) {
+ strerror_r(errno, buf, 256);
ckpt_err("WEIRD: %s exited without trace (%s)\n",
- str, strerror(errno));
+ str, buf);
exit(1);
} else if (ret != 0) {
- ckpt_err("waitpid for %s (%s)", str, strerror(errno));
+ strerror_r(errno, buf, 256);
+ ckpt_err("waitpid for %s (%s)", str, buf);
exit(1);
}
return 0;
--
1.7.0.4
^ permalink raw reply related [flat|nested] 13+ messages in thread