* [PATCH 0/7] Expand usage of ckpt_err
@ 2009-11-06 0:00 serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
This patchset expands the usage of ckpt_err(). It moves the
setting of ckpt_err (and CKPT_CTX_ERROR kflag) into ckpt_err()
itself. It puts cktp_err()s in useful places for restart, and
in some cases improves the information in the messages. It
also splits the old 'restore_notify_error', which used to both
set ctx->errno and wake all restarting tasks. Since we can pass
an error to ckpt_err(), and there might be several ckpt_err()s
along a failure path, we set ctx->errno only the first time, while
we explicitly call restore_wake_all_on_error() at the end (of
do_XYZ_task()) to actually wake the remaining tasks and notify
them of the error.
The intent of this patchset is to provide an end-user with
useful info when restart fails.
It is available as branch nov5.4 of
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-cr.git (gitweb at
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/sergeh/linux-cr.git;a=shortlog;h=refs/heads/nov5.4
)
thanks,
-serge
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/7] move handling of err down into _ckpt_do_msg and _append
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-2-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 2/7] restart.c: use ckpt_err serue-r/Jw6+rmf7HQT0dZR+AlfA
` (5 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Since ckpt_err() will also be used at restart, and since at
restart we don't want to just set ctx->err, move handling of
the argument 'err' further down. We only set ckpt->errno if
err is not 0 and a checkpoint is going on. Then we automatically
add '[err %d] % err' if err is non-0, so that the caller has a
choice of passing in err or passing %(E) in fmt and err as a
vararg.
Changelog:
nov 4: drop %(E) and print passed-in err instead
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/sys.c | 51 +++++++++++++++++++++++++++----------------
include/linux/checkpoint.h | 16 ++++++-------
2 files changed, 39 insertions(+), 28 deletions(-)
diff --git a/checkpoint/sys.c b/checkpoint/sys.c
index 7d14b30..c1c4e99 100644
--- a/checkpoint/sys.c
+++ b/checkpoint/sys.c
@@ -382,14 +382,13 @@ static inline int is_special_flag(char *s)
* The special flags are surrounded by %() to help them visually stand
* out. For instance, %(O) means an objref. The following special
* flags are recognized:
- * E: error
* O: objref
* P: pointer
* T: task
* S: string
* V: variable
*
- * %(E) will be expanded to "[err %d]". Likewise O, P, S, and V, will
+ * %(O) will be expanded to "[obj %d]". Likewise P, S, and V, will
* also expand to format flags requiring an argument to the subsequent
* sprintf or printk. T will be expanded to a string with no flags,
* requiring no further arguments.
@@ -401,7 +400,7 @@ static inline int is_special_flag(char *s)
* the additional variabes, in order, to match the @fmt (except for
* the T key), e.g.:
*
- * ckpt_err(ctx, "%(T)FILE flags %d %(O) %(E)\n", flags, objref, err);
+ * ckpt_err(ctx, err, "%(T)FILE flags %d %(O)\n", flags, objref);
*
* May be called under spinlock.
* Must be called with ctx->msg_mutex held. The expanded format
@@ -418,9 +417,6 @@ static void _ckpt_generate_fmt(struct ckpt_ctx *ctx, char *fmt)
continue;
}
switch (fmt[2]) {
- case 'E':
- len += snprintf(s+len, CKPT_MSG_LEN-len, "[err %%d]");
- break;
case 'O':
len += snprintf(s+len, CKPT_MSG_LEN-len, "[obj %%d]");
break;
@@ -455,24 +451,41 @@ static void _ckpt_generate_fmt(struct ckpt_ctx *ctx, char *fmt)
s[len] = '\0';
}
-static void _ckpt_msg_appendv(struct ckpt_ctx *ctx, char *fmt, va_list ap)
+static void _ckpt_msg_appendv(struct ckpt_ctx *ctx, int err, char *fmt,
+ va_list ap)
{
int len = ctx->msglen;
+ if (err) {
+ /* At restart we must use a more baroque helper to set
+ * ctx->errno, which also wakes all other waiting restarting
+ * tasks. But at checkpoint we just set ctx->errno so that
+ * _ckpt_msg_complete() will know to write the error message
+ * to the checkpoint image.
+ */
+ if (ctx->kflags & CKPT_CTX_CHECKPOINT && !ctx->errno)
+ ctx->errno = err;
+ len += snprintf(&ctx->msg[len], CKPT_MSG_LEN-len, "[err %d]",
+ err);
+ if (len > CKPT_MSG_LEN)
+ goto full;
+ }
+
len += vsnprintf(&ctx->msg[len], CKPT_MSG_LEN-len, fmt, ap);
if (len > CKPT_MSG_LEN) {
+full:
len = CKPT_MSG_LEN;
ctx->msg[CKPT_MSG_LEN-1] = '\0';
}
ctx->msglen = len;
}
-void _ckpt_msg_append(struct ckpt_ctx *ctx, char *fmt, ...)
+void _ckpt_msg_append(struct ckpt_ctx *ctx, int err, char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
- _ckpt_msg_appendv(ctx, fmt, ap);
+ _ckpt_msg_appendv(ctx, err, fmt, ap);
va_end(ap);
}
@@ -507,25 +520,25 @@ void _ckpt_msg_complete(struct ckpt_ctx *ctx)
ctx->msglen = 0;
}
-#define __do_ckpt_msg(ctx, fmt) do { \
- va_list ap; \
- _ckpt_generate_fmt(ctx, fmt); \
- va_start(ap, fmt); \
- _ckpt_msg_appendv(ctx, ctx->fmt, ap); \
- va_end(ap); \
+#define __do_ckpt_msg(ctx, err, fmt) do { \
+ va_list ap; \
+ _ckpt_generate_fmt(ctx, fmt); \
+ va_start(ap, fmt); \
+ _ckpt_msg_appendv(ctx, err, ctx->fmt, ap); \
+ va_end(ap); \
} while (0)
-void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...)
+void _do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
{
- __do_ckpt_msg(ctx, fmt);
+ __do_ckpt_msg(ctx, err, fmt);
}
-void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...)
+void do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
{
if (!ctx) return;
ckpt_msg_lock(ctx);
- __do_ckpt_msg(ctx, fmt);
+ __do_ckpt_msg(ctx, err, fmt);
_ckpt_msg_complete(ctx);
ckpt_msg_unlock(ctx);
}
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 89e6fb3..8fd6cba 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -384,7 +384,7 @@ extern void ckpt_msg_unlock(struct ckpt_ctx *ctx);
* May be called under spinlock.
* Must be called under ckpt_msg_lock().
*/
-extern void _ckpt_msg_append(struct ckpt_ctx *ctx, char *fmt, ...);
+extern void _ckpt_msg_append(struct ckpt_ctx *ctx, int err, char *fmt, ...);
/*
* Write ctx->msg to all relevant places.
@@ -399,7 +399,7 @@ extern void _ckpt_msg_complete(struct ckpt_ctx *ctx);
* the caller will have to use _ckpt_msg_complete() to finish up.
* Must be called with ckpt_msg_lock held.
*/
-extern void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
+extern void _do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...);
/*
* Append an enhanced formatted message to ctx->msg.
@@ -411,12 +411,11 @@ extern void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
*
* Must not be called under spinlock.
*/
-extern void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
+extern void do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...);
#define ckpt_err(ctx, err, fmt, args...) do { \
- ctx->errno = (err); \
- do_ckpt_msg(ctx, "[Error at %s:%d][err %d]" fmt, __func__, __LINE__, \
- (err), ##args); \
+ do_ckpt_msg(ctx, (err), "[Error at %s:%d]" fmt, __func__, __LINE__, \
+ ##args); \
} while (0)
@@ -427,9 +426,8 @@ extern void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
* must be followed by a call to _ckpt_msg_complete()
*/
#define _ckpt_err(ctx, err, fmt, args...) do { \
- ctx->errno = (err); \
- _do_ckpt_msg(ctx, "[ error %s:%d][err %d]" fmt, __func__, __LINE__, \
- (err), ##args); \
+ _do_ckpt_msg(ctx, (err), "[Error at %s:%d]" fmt, __func__, __LINE__, \
+ ##args); \
} while (0)
#endif /* CONFIG_CHECKPOINT */
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/7] restart.c: use ckpt_err
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 1/7] move handling of err down into _ckpt_do_msg and _append serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-3-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 3/7] process.c: use ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Changelog:
Nove 4: update restart.c ckpt_err usage
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/restart.c | 81 +++++++++++++++++++++++++++++++++++---------------
1 files changed, 57 insertions(+), 24 deletions(-)
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index 130b4b2..e1bd0ad 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -64,7 +64,7 @@ static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
s = kmalloc(sizeof(*s), GFP_KERNEL);
if (!s) {
- ckpt_debug("no memory to register ?!\n");
+ ckpt_err(ctx, 0, "no memory to register ?!\n");
return -ENOMEM;
}
s->pid = current->pid;
@@ -197,13 +197,13 @@ static int _ckpt_read_err(struct ckpt_ctx *ctx, struct ckpt_hdr *h)
len = h->len - sizeof(*h);
ptr = kzalloc(len + 1, GFP_KERNEL);
if (!ptr) {
- ckpt_debug("insufficient memory to report image error\n");
+ ckpt_err(ctx, 0, "insufficient memory to report image error\n");
return -ENOMEM;
}
ret = ckpt_kread(ctx, ptr, len);
if (ret >= 0) {
- ckpt_debug("%s\n", &ptr[1]);
+ ckpt_err(ctx, 0, "%(S)Error recorded in image\n", &ptr[1]);
ret = -EIO;
}
@@ -757,7 +757,8 @@ static int set_task_ctx(struct task_struct *task, struct ckpt_ctx *ctx)
task->checkpoint_ctx = ckpt_ctx_get(ctx);
ret = 0;
} else {
- ckpt_debug("task %d has checkpoint_ctx\n", task_pid_vnr(task));
+ ckpt_err(ctx, 0, "task %d already has checkpoint_ctx\n",
+ task_pid_vnr(task));
ret = 1;
}
task_unlock(task);
@@ -807,7 +808,7 @@ static int restore_activate_next(struct ckpt_ctx *ctx)
rcu_read_unlock();
if (!task) {
- ckpt_debug("could not find task %d\n", pid);
+ ckpt_err(ctx, 0, "could not find task %d\n", pid);
restore_notify_error(ctx, -ESRCH);
return -ESRCH;
}
@@ -962,29 +963,38 @@ static int do_restore_task(void)
current->flags |= PF_RESTARTING;
ret = wait_sync_threads();
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "wait_sync_threads\n");
goto out;
+ }
/* wait for our turn, do the restore, and tell next task in line */
ret = wait_task_active(ctx);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "wait_task_active\n");
goto out;
+ }
restore_debug_running(ctx);
ret = pre_restore_task();
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "pre_restore_task\n");
goto out;
+ }
zombie = restore_task(ctx);
if (zombie < 0) {
ret = zombie;
+ ckpt_err(ctx, ret, "restore_task\n");
goto out;
}
ret = restore_activate_next(ctx);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore_activate_next\n");
goto out;
+ }
/*
* zombie: we're done here; do_exit() will notice the @ctx on
@@ -1025,12 +1035,12 @@ static int __prepare_descendants(struct task_struct *task, void *data)
ckpt_debug("consider task %d\n", task_pid_vnr(task));
if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
- ckpt_debug("stranger task %d\n", task_pid_vnr(task));
+ ckpt_err(ctx, 0, "stranger task %d\n", task_pid_vnr(task));
return -EPERM;
}
if (task_ptrace(task) & PT_PTRACED) {
- ckpt_debug("ptraced task %d\n", task_pid_vnr(task));
+ ckpt_err(ctx, 0, "ptraced task %d\n", task_pid_vnr(task));
return -EBUSY;
}
@@ -1187,23 +1197,34 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
ret = restore_read_header(ctx);
ckpt_debug("restore header: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore header\n");
return ret;
+ }
ret = restore_container(ctx);
ckpt_debug("restore container: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore container\n");
return ret;
+ }
ret = restore_read_tree(ctx);
ckpt_debug("restore tree: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore tree\n");
return ret;
+ }
- if ((ctx->uflags & RESTART_TASKSELF) && ctx->nr_pids != 1)
+ if ((ctx->uflags & RESTART_TASKSELF) && ctx->nr_pids != 1) {
+ ckpt_err(ctx, 0, "self-restart but nr_pids=%d\n",
+ ctx->nr_pids);
return -EINVAL;
+ }
ret = init_restart_ctx(ctx, pid);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "init_restart_ctx\n");
return ret;
+ }
/*
* Populate own ->checkpoint_ctx: if an ancestor attempts to
@@ -1216,7 +1237,7 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
* We are a bad-behaving descendant: an ancestor must
* have prepare_descendants() us as part of a restart.
*/
- ckpt_debug("coord already has checkpoint_ctx\n");
+ ckpt_err(ctx, 0, "coord already has checkpoint_ctx\n");
return -EBUSY;
}
@@ -1229,34 +1250,46 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
if (ctx->uflags & RESTART_TASKSELF) {
ret = pre_restore_task();
ckpt_debug("pre restore task: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "pre restore task\n");
goto out;
+ }
ret = restore_task(ctx);
ckpt_debug("restore task: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore task\n");
goto out;
+ }
} else {
/* prepare descendants' t->checkpoint_ctx point to coord */
ret = prepare_descendants(ctx, ctx->root_task);
ckpt_debug("restore prepare: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "prepare_descendants\n");
goto out;
+ }
/* wait for all other tasks to complete do_restore_task() */
ret = wait_all_tasks_finish(ctx);
ckpt_debug("restore finish: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "wait_all_tasks_finish\n");
goto out;
+ }
}
ret = deferqueue_run(ctx->deferqueue); /* run deferred work */
ckpt_debug("restore deferqueue: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore deferqueue\n");
goto out;
+ }
ret = restore_read_tail(ctx);
ckpt_debug("restore tail: %d\n", ret);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "restore tail\n");
goto out;
+ }
if (ctx->uflags & RESTART_FROZEN) {
ret = cgroup_freezer_make_frozen(ctx->root_task);
@@ -1368,7 +1401,7 @@ long do_restart(struct ckpt_ctx *ctx, pid_t pid, unsigned long flags)
if (!ctx || (ctx->uflags & RESTART_TASKSELF)) {
if (ret < 0) {
/* partial restore is undefined: terminate */
- ckpt_debug("restart err %ld, exiting\n", ret);
+ ckpt_err(ctx, ret, "restart error, exiting\n");
force_sig(SIGKILL, current);
} else {
ret = restore_retval();
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/7] process.c: use ckpt_err at restart
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 1/7] move handling of err down into _ckpt_do_msg and _append serue-r/Jw6+rmf7HQT0dZR+AlfA
2009-11-06 0:00 ` [PATCH 2/7] restart.c: use ckpt_err serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-4-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 4/7] files.c: ckpt_err() during restore serue-r/Jw6+rmf7HQT0dZR+AlfA
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/process.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/checkpoint/process.c b/checkpoint/process.c
index 5bc8ccc..9a56f68 100644
--- a/checkpoint/process.c
+++ b/checkpoint/process.c
@@ -581,16 +581,15 @@ static int restore_task_creds(struct ckpt_ctx *ctx)
realcred = ckpt_obj_fetch(ctx, h->cred_ref, CKPT_OBJ_CRED);
if (IS_ERR(realcred)) {
- ckpt_debug("Error %ld fetching realcred (ref %d)\n",
- PTR_ERR(realcred), h->cred_ref);
ret = PTR_ERR(realcred);
+ ckpt_err(ctx, ret, "%(O)Error fetching realcred\n",
+ h->cred_ref);
goto out;
}
ecred = ckpt_obj_fetch(ctx, h->ecred_ref, CKPT_OBJ_CRED);
if (IS_ERR(ecred)) {
- ckpt_debug("Error %ld fetching ecred (ref %d)\n",
- PTR_ERR(ecred), h->ecred_ref);
ret = PTR_ERR(ecred);
+ ckpt_err(ctx, ret, "%(O)Error fetching ecred\n", h->ecred_ref);
goto out;
}
ctx->realcred = realcred;
@@ -614,19 +613,20 @@ static int restore_task_objs(struct ckpt_ctx *ctx)
*/
ret = restore_task_creds(ctx);
if (ret < 0) {
- ckpt_debug("restore_task_creds returned %d\n", ret);
+ ckpt_err(ctx, ret, "restore_task_creds\n");
return ret;
}
ret = restore_task_ns(ctx);
if (ret < 0) {
- ckpt_debug("restore_task_ns returned %d\n", ret);
+ ckpt_err(ctx, ret, "restore_task_ns\n");
return ret;
}
h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_TASK_OBJS);
if (IS_ERR(h)) {
- ckpt_debug("Error fetching task obj\n");
- return PTR_ERR(h);
+ ret = PTR_ERR(h);
+ ckpt_err(ctx, ret, "fetching task obj\n");
+ return ret;
}
ret = restore_obj_file_table(ctx, h->files_objref);
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/7] files.c: ckpt_err() during restore
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (2 preceding siblings ...)
2009-11-06 0:00 ` [PATCH 3/7] process.c: use ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
2009-11-06 0:00 ` [PATCH 5/7] kernel/cred.c: ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/files.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/checkpoint/files.c b/checkpoint/files.c
index 14ccfd6..9686b0b 100644
--- a/checkpoint/files.c
+++ b/checkpoint/files.c
@@ -649,6 +649,9 @@ static struct file *do_restore_file(struct ckpt_ctx *ctx)
if (ops->restore)
file = ops->restore(ctx, h);
out:
+ if (IS_ERR(file))
+ ckpt_err(ctx, PTR_ERR(file), "file restore\n");
+
ckpt_hdr_put(ctx, h);
return file;
}
@@ -738,8 +741,10 @@ static struct files_struct *do_restore_file_table(struct ckpt_ctx *ctx)
/* point of no return -- close all file descriptors */
ret = close_all_fds(current->files);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "closing fds\n");
goto out;
+ }
for (i = 0; i < h->fdt_nfds; i++) {
ret = restore_file_desc(ctx);
@@ -757,6 +762,7 @@ static struct files_struct *do_restore_file_table(struct ckpt_ctx *ctx)
files = current->files;
atomic_inc(&files->count);
} else {
+ ckpt_err(ctx, ret, "restoring file table\n");
files = ERR_PTR(ret);
}
return files;
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/7] kernel/cred.c: ckpt_err at restart
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (3 preceding siblings ...)
2009-11-06 0:00 ` [PATCH 4/7] files.c: ckpt_err() during restore serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-6-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 6/7] have ckpt_err set ctx->errno serue-r/Jw6+rmf7HQT0dZR+AlfA
2009-11-06 0:00 ` [PATCH 7/7] (debug) print vpids for all restarting tasks serue-r/Jw6+rmf7HQT0dZR+AlfA
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
kernel/cred.c | 46 ++++++++++++++++++++++++++++++++++++----------
1 files changed, 36 insertions(+), 10 deletions(-)
diff --git a/kernel/cred.c b/kernel/cred.c
index 62d28a4..c941078 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -764,32 +764,46 @@ static struct cred *do_restore_cred(struct ckpt_ctx *ctx)
int i;
h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CRED);
- if (IS_ERR(h))
+ if (IS_ERR(h)) {
+ ckpt_err(ctx, ret, "reading cred entry\n");
return ERR_PTR(PTR_ERR(h));
+ }
cred = prepare_creds();
- if (!cred)
+ if (!cred) {
+ ckpt_err(ctx, ret, "prepare_creds()\n");
goto error;
+ }
/* Do we care if the target user and target group were compatible?
* Probably. But then, we can't do any setuid without CAP_SETUID,
* so we must have been privileged to abuse it... */
groupinfo = ckpt_obj_fetch(ctx, h->groupinfo_ref, CKPT_OBJ_GROUPINFO);
- if (IS_ERR(groupinfo))
+ if (IS_ERR(groupinfo)) {
+ ret = PTR_ERR(groupinfo);
+ ckpt_err(ctx, ret, "%(O)fetching group\n", h->groupinfo_ref);
goto err_putcred;
+ }
user = ckpt_obj_fetch(ctx, h->user_ref, CKPT_OBJ_USER);
- if (IS_ERR(user))
+ if (IS_ERR(user)) {
+ ret = PTR_ERR(user);
+ ckpt_err(ctx, ret, "%(O)fetching user\n", h->user_ref);
goto err_putcred;
+ }
/*
* TODO: this check should go into the common helper in
* kernel/sys.c, and should account for user namespaces
*/
+ ret = -EPERM;
if (!capable(CAP_SETGID))
for (i = 0; i < groupinfo->ngroups; i++) {
- if (!in_egroup_p(GROUP_AT(groupinfo, i)))
+ gid_t g = GROUP_AT(groupinfo, i);
+ if (!in_egroup_p(g)) {
+ ckpt_err(ctx, ret, "group %d\n", g);
goto err_putcred;
+ }
}
ret = set_groups(cred, groupinfo);
if (ret < 0)
@@ -797,20 +811,32 @@ static struct cred *do_restore_cred(struct ckpt_ctx *ctx)
free_uid(cred->user);
cred->user = get_uid(user);
ret = cred_setresuid(cred, h->uid, h->euid, h->suid);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "setting uid %d euid %d suid %d\n",
+ h->uid, h->euid, h->suid);
goto err_putcred;
+ }
ret = cred_setfsuid(cred, h->fsuid, &olduid);
- if (olduid != h->fsuid && ret < 0)
+ if (olduid != h->fsuid && ret < 0) {
+ ckpt_err(ctx, ret, "setting fs uid %d\n", h->fsuid);
goto err_putcred;
+ }
ret = cred_setresgid(cred, h->gid, h->egid, h->sgid);
- if (ret < 0)
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "setting gid %d egid %d sgid %d\n",
+ h->gid, h->egid, h->sgid);
goto err_putcred;
+ }
ret = cred_setfsgid(cred, h->fsgid, &oldgid);
- if (oldgid != h->fsgid && ret < 0)
+ if (oldgid != h->fsgid && ret < 0) {
+ ckpt_err(ctx, ret, "setting fs gid %d\n", h->fsgid);
goto err_putcred;
+ }
ret = restore_capabilities(&h->cap_s, cred);
- if (ret)
+ if (ret) {
+ ckpt_err(ctx, ret, "restoring capabilities\n");
goto err_putcred;
+ }
ckpt_hdr_put(ctx, h);
return cred;
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 6/7] have ckpt_err set ctx->errno
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (4 preceding siblings ...)
2009-11-06 0:00 ` [PATCH 5/7] kernel/cred.c: ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-7-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 7/7] (debug) print vpids for all restarting tasks serue-r/Jw6+rmf7HQT0dZR+AlfA
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Move setting of ctx->errno into do_ckpt_msg(). We leave a small separate
function to notify all restarting tasks in case of failure. We use a
CKPT_CTX_WOKEN bit for this. If we always notify restarting tasks of
failure inside do_ckpt_msg() then we will have to be more careful about
when ckpt_err() is called at restart. Only 4 instances were doing the
wakeup right now (at the end of each possible restart sequence), and we
don't want to risk moving notification too early.
This patch also fixes what should have been a problem before: always
init_completion(&complete), else we might complete(&complete) on random junk
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/restart.c | 66 +++++++++++++++++++++----------------------
checkpoint/sys.c | 11 ++-----
include/linux/checkpoint.h | 13 +++-----
3 files changed, 40 insertions(+), 50 deletions(-)
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index e1bd0ad..9e2647e 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -58,6 +58,20 @@ struct ckpt_task_status {
struct list_head list;
};
+/*
+ * restore_wake_all_on_error() is always called after a ckpt_err()
+ * with err < 0. So CKPT_CTX_ERROR will always be set before this
+ * gets called.
+ */
+static void restore_wake_all_on_error(struct ckpt_ctx *ctx)
+{
+ if (!ckpt_test_and_set_ctx_kflag(ctx, CKPT_CTX_WOKEN)) {
+ complete(&ctx->complete);
+ wake_up_all(&ctx->waitq);
+ wake_up_all(&ctx->ghostq);
+ }
+}
+
static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
{
struct ckpt_task_status *s;
@@ -714,29 +728,6 @@ static inline int is_task_active(struct ckpt_ctx *ctx, pid_t pid)
return get_active_pid(ctx) == pid;
}
-/* should not be called under write_lock_irq(&tasklist_lock) */
-static void _restore_notify_error(struct ckpt_ctx *ctx, int errno)
-{
- /* first to fail: notify everyone (racy but harmless) */
- if (!ckpt_test_ctx_error(ctx)) {
- ckpt_debug("setting restart error %d\n", errno); \
- ckpt_set_ctx_error(ctx, errno);
- complete(&ctx->complete);
- wake_up_all(&ctx->waitq);
- wake_up_all(&ctx->ghostq);
- }
-}
-
-/*
- * Need to call ckpt_debug such that it will get the correct source
- * location. Should not be called under write_lock_irq(&tasklist_lock)
-*/
-#define restore_notify_error(ctx, errno) \
-do { \
- ckpt_debug("restart error %d, root pid %d\n", errno, ctx->root_pid); \
- _restore_notify_error(ctx, errno); \
-} while(0)
-
static inline struct ckpt_ctx *get_task_ctx(struct task_struct *task)
{
struct ckpt_ctx *ctx;
@@ -781,7 +772,8 @@ static void clear_task_ctx(struct task_struct *task)
static void restore_task_done(struct ckpt_ctx *ctx)
{
if (atomic_dec_and_test(&ctx->nr_total))
- complete(&ctx->complete);
+ if (!ckpt_test_and_set_ctx_kflag(ctx, CKPT_CTX_WOKEN))
+ complete(&ctx->complete);
BUG_ON(atomic_read(&ctx->nr_total) < 0);
}
@@ -808,8 +800,8 @@ static int restore_activate_next(struct ckpt_ctx *ctx)
rcu_read_unlock();
if (!task) {
- ckpt_err(ctx, 0, "could not find task %d\n", pid);
- restore_notify_error(ctx, -ESRCH);
+ ckpt_err(ctx, -ESRCH, "could not find task %d\n", pid);
+ restore_wake_all_on_error(ctx);
return -ESRCH;
}
} else {
@@ -893,8 +885,10 @@ static int do_ghost_task(void)
ckpt_test_ctx_error(ctx));
out:
restore_debug_error(ctx, ret);
- if (ret < 0)
- restore_notify_error(ctx, ret);
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "waiting on ghostq\n");
+ restore_wake_all_on_error(ctx);
+ }
current->exit_signal = -1;
restore_debug_exit(ctx);
@@ -1013,8 +1007,10 @@ static int do_restore_task(void)
ret = wait_task_sync(ctx);
out:
restore_debug_error(ctx, ret);
- if (ret < 0)
- restore_notify_error(ctx, ret);
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "wait_task_sync\n");
+ restore_wake_all_on_error(ctx);
+ }
post_restore_task();
current->flags &= ~PF_RESTARTING;
@@ -1097,8 +1093,6 @@ static int wait_all_tasks_finish(struct ckpt_ctx *ctx)
{
int ret;
- init_completion(&ctx->complete);
-
BUG_ON(ctx->active_pid != -1);
ret = restore_activate_next(ctx);
if (ret < 0)
@@ -1247,6 +1241,8 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
* subtree that we marked for restart - see below.
*/
+ init_completion(&ctx->complete);
+
if (ctx->uflags & RESTART_TASKSELF) {
ret = pre_restore_task();
ckpt_debug("pre restore task: %d\n", ret);
@@ -1301,8 +1297,10 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
restore_debug_error(ctx, ret);
- if (ret < 0)
- ckpt_set_ctx_error(ctx, ret);
+ if (ret < 0) {
+ ckpt_err(ctx, ret, "coordinator noting error\n");
+ restore_wake_all_on_error(ctx);
+ }
if (ckpt_test_ctx_error(ctx)) {
destroy_descendants(ctx);
diff --git a/checkpoint/sys.c b/checkpoint/sys.c
index c1c4e99..8cd9baa 100644
--- a/checkpoint/sys.c
+++ b/checkpoint/sys.c
@@ -457,14 +457,6 @@ static void _ckpt_msg_appendv(struct ckpt_ctx *ctx, int err, char *fmt,
int len = ctx->msglen;
if (err) {
- /* At restart we must use a more baroque helper to set
- * ctx->errno, which also wakes all other waiting restarting
- * tasks. But at checkpoint we just set ctx->errno so that
- * _ckpt_msg_complete() will know to write the error message
- * to the checkpoint image.
- */
- if (ctx->kflags & CKPT_CTX_CHECKPOINT && !ctx->errno)
- ctx->errno = err;
len += snprintf(&ctx->msg[len], CKPT_MSG_LEN-len, "[err %d]",
err);
if (len > CKPT_MSG_LEN)
@@ -533,10 +525,13 @@ void _do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
__do_ckpt_msg(ctx, err, fmt);
}
+
void do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
{
if (!ctx) return;
+ if (err && !ckpt_test_and_set_ctx_kflag(ctx, CKPT_CTX_ERROR))
+ ctx->errno = err;
ckpt_msg_lock(ctx);
__do_ckpt_msg(ctx, err, fmt);
_ckpt_msg_complete(ctx);
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 8fd6cba..2920630 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -40,11 +40,13 @@
#define CKPT_CTX_RESTART_BIT 1
#define CKPT_CTX_SUCCESS_BIT 2
#define CKPT_CTX_ERROR_BIT 3
+#define CKPT_CTX_WOKEN_BIT 4 /* after ERROR is set, wake all */
#define CKPT_CTX_CHECKPOINT (1 << CKPT_CTX_CHECKPOINT_BIT)
#define CKPT_CTX_RESTART (1 << CKPT_CTX_RESTART_BIT)
#define CKPT_CTX_SUCCESS (1 << CKPT_CTX_SUCCESS_BIT)
#define CKPT_CTX_ERROR (1 << CKPT_CTX_ERROR_BIT)
+#define CKPT_CTX_WOKEN (1 << CKPT_CTX_WOKEN_BIT)
/* ckpt_ctx: uflags */
#define CHECKPOINT_USER_FLAGS CHECKPOINT_SUBTREE
@@ -53,6 +55,9 @@
RESTART_FROZEN | \
RESTART_GHOST)
+#define ckpt_test_and_set_ctx_kflag(__ctx, __kflag) \
+ test_and_set_bit(__kflag##_BIT, &(__ctx)->kflags)
+
extern int walk_task_subtree(struct task_struct *task,
int (*func)(struct task_struct *, void *),
void *data);
@@ -99,17 +104,9 @@ extern int ckpt_sock_getnames(struct ckpt_ctx *ctx,
/* ckpt kflags */
#define ckpt_set_ctx_kflag(__ctx, __kflag) \
set_bit(__kflag##_BIT, &(__ctx)->kflags)
-#define ckpt_test_and_set_ctx_kflag(__ctx, __kflag) \
- test_and_set_bit(__kflag##_BIT, &(__ctx)->kflags)
#define ckpt_set_ctx_success(ctx) ckpt_set_ctx_kflag(ctx, CKPT_CTX_SUCCESS)
-static inline void ckpt_set_ctx_error(struct ckpt_ctx *ctx, int errno)
-{
- if (!ckpt_test_and_set_ctx_kflag(ctx, CKPT_CTX_ERROR))
- ctx->errno = errno;
-}
-
#define ckpt_test_ctx_error(ctx) \
((ctx)->kflags & CKPT_CTX_ERROR)
#define ckpt_test_ctx_complete(ctx) \
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 7/7] (debug) print vpids for all restarting tasks
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
` (5 preceding siblings ...)
2009-11-06 0:00 ` [PATCH 6/7] have ckpt_err set ctx->errno serue-r/Jw6+rmf7HQT0dZR+AlfA
@ 2009-11-06 0:00 ` serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-8-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
6 siblings, 1 reply; 18+ messages in thread
From: serue-r/Jw6+rmf7HQT0dZR+AlfA @ 2009-11-06 0:00 UTC (permalink / raw)
To: orenl-RdfvBDnrOixBDgjK7y7TUQ
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
checkpoint/restart.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index 9e2647e..724b7b2 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -158,7 +158,7 @@ void restore_debug_free(struct ckpt_ctx *ctx)
ctx->errno);
ckpt_debug("kflags %lu uflags %lu oflags %lu", ctx->kflags,
ctx->uflags, ctx->oflags);
- for (i = 0; i < ctx->active_pid; i++)
+ for (i = 0; i < ctx->nr_pids; i++)
ckpt_debug("task[%d] to run %d\n", i, ctx->pids_arr[i].vpid);
list_for_each_entry_safe(s, p, &ctx->task_status, list) {
--
1.6.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] restart.c: use ckpt_err
[not found] ` <1257465619-1777-3-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 16:02 ` Oren Laadan
[not found] ` <4B017780.6080609-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 16:02 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Changelog:
> Nove 4: update restart.c ckpt_err usage
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> checkpoint/restart.c | 81 +++++++++++++++++++++++++++++++++++---------------
> 1 files changed, 57 insertions(+), 24 deletions(-)
Is it your intent to entirely get rid of ckpt_debug() ?
We originally discussed two levels of details: only error status
or a detailed log (and we also thought of a detailed debug, that can
be compiled out to save space). How does that fit with the patch(es) ?
To "define" what's "error status" and what's "log" (and maybe what's
"debug"), I suggest a test like:
1) error status: what conveys the most specific reason of failure,
e.g. "failed to open file to restore fd"; The caller should be able
to assume that the total message(s) length will not exceed a pipe
buffer.
2) log status: that gives status about progress, or what lead to and
what followed an error, e.g. file open failure may have happened
when restoring a file descriptor, or when restore a vma, so a log
like "failed to restore vma" would be helpful.
3) debug status: that we want to be able to compile out without having
to reintroduce it for every bug that it may help us debug.
>
> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> index 130b4b2..e1bd0ad 100644
> --- a/checkpoint/restart.c
> +++ b/checkpoint/restart.c
> @@ -64,7 +64,7 @@ static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
>
> s = kmalloc(sizeof(*s), GFP_KERNEL);
> if (!s) {
> - ckpt_debug("no memory to register ?!\n");
> + ckpt_err(ctx, 0, "no memory to register ?!\n");
> return -ENOMEM;
What is the purpose in passing '0' instead of -ENOMEM to ckpt_err() ?
(a few more instances below).
> }
> s->pid = current->pid;
> @@ -197,13 +197,13 @@ static int _ckpt_read_err(struct ckpt_ctx *ctx, struct ckpt_hdr *h)
> len = h->len - sizeof(*h);
> ptr = kzalloc(len + 1, GFP_KERNEL);
> if (!ptr) {
> - ckpt_debug("insufficient memory to report image error\n");
> + ckpt_err(ctx, 0, "insufficient memory to report image error\n");
> return -ENOMEM;
> }
>
> ret = ckpt_kread(ctx, ptr, len);
> if (ret >= 0) {
> - ckpt_debug("%s\n", &ptr[1]);
> + ckpt_err(ctx, 0, "%(S)Error recorded in image\n", &ptr[1]);
> ret = -EIO;
> }
>
> @@ -757,7 +757,8 @@ static int set_task_ctx(struct task_struct *task, struct ckpt_ctx *ctx)
> task->checkpoint_ctx = ckpt_ctx_get(ctx);
> ret = 0;
> } else {
> - ckpt_debug("task %d has checkpoint_ctx\n", task_pid_vnr(task));
> + ckpt_err(ctx, 0, "task %d already has checkpoint_ctx\n",
> + task_pid_vnr(task));
> ret = 1;
> }
> task_unlock(task);
> @@ -807,7 +808,7 @@ static int restore_activate_next(struct ckpt_ctx *ctx)
> rcu_read_unlock();
>
> if (!task) {
> - ckpt_debug("could not find task %d\n", pid);
> + ckpt_err(ctx, 0, "could not find task %d\n", pid);
> restore_notify_error(ctx, -ESRCH);
> return -ESRCH;
> }
> @@ -962,29 +963,38 @@ static int do_restore_task(void)
> current->flags |= PF_RESTARTING;
>
> ret = wait_sync_threads();
> - if (ret < 0)
> + if (ret < 0) {
> + ckpt_err(ctx, ret, "wait_sync_threads\n");
(#3) - The only error in wait_sync_threads() can be EINTR, so this
is only necessary for our debugging.
> goto out;
> + }
>
> /* wait for our turn, do the restore, and tell next task in line */
> ret = wait_task_active(ctx);
> - if (ret < 0)
> + if (ret < 0) {
> + ckpt_err(ctx, ret, "wait_task_active\n");
Ditto (it could also be EBUSY - but there are ckpt_debug() statements
there).
> goto out;
> + }
>
> restore_debug_running(ctx);
>
> ret = pre_restore_task();
> - if (ret < 0)
> + if (ret < 0) {
> + ckpt_err(ctx, ret, "pre_restore_task\n");
> goto out;
> + }
>
> zombie = restore_task(ctx);
> if (zombie < 0) {
> ret = zombie;
> + ckpt_err(ctx, ret, "restore_task\n");
I'd assume that restore_task() will already have reported a specific
error that caused the failure. So we can save on code size but not
compiling this in when non-debug.
> goto out;
> }
>
> ret = restore_activate_next(ctx);
> - if (ret < 0)
> + if (ret < 0) {
> + ckpt_err(ctx, ret, "restore_activate_next\n");
> goto out;
> + }
>
> /*
> * zombie: we're done here; do_exit() will notice the @ctx on
> @@ -1025,12 +1035,12 @@ static int __prepare_descendants(struct task_struct *task, void *data)
> ckpt_debug("consider task %d\n", task_pid_vnr(task));
>
> if (!ptrace_may_access(task, PTRACE_MODE_ATTACH)) {
> - ckpt_debug("stranger task %d\n", task_pid_vnr(task));
> + ckpt_err(ctx, 0, "stranger task %d\n", task_pid_vnr(task));
This is called with tasklist_lock taken...
> return -EPERM;
> }
>
> if (task_ptrace(task) & PT_PTRACED) {
> - ckpt_debug("ptraced task %d\n", task_pid_vnr(task));
> + ckpt_err(ctx, 0, "ptraced task %d\n", task_pid_vnr(task));
And here too.
> return -EBUSY;
> }
>
[...]
There are similar comments also below.
Are you still concerned about the increase in code size with c/r ?
Oren.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/7] process.c: use ckpt_err at restart
[not found] ` <1257465619-1777-4-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 16:09 ` Oren Laadan
0 siblings, 0 replies; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 16:09 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> checkpoint/process.c | 16 ++++++++--------
> 1 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/checkpoint/process.c b/checkpoint/process.c
> index 5bc8ccc..9a56f68 100644
> --- a/checkpoint/process.c
> +++ b/checkpoint/process.c
> @@ -581,16 +581,15 @@ static int restore_task_creds(struct ckpt_ctx *ctx)
>
> realcred = ckpt_obj_fetch(ctx, h->cred_ref, CKPT_OBJ_CRED);
> if (IS_ERR(realcred)) {
> - ckpt_debug("Error %ld fetching realcred (ref %d)\n",
> - PTR_ERR(realcred), h->cred_ref);
> ret = PTR_ERR(realcred);
> + ckpt_err(ctx, ret, "%(O)Error fetching realcred\n",
> + h->cred_ref);
Would it make sense to report fetching (and any other object related
generic error) in the original function - in this case ckpt_obj_fetch(),
something like:
ckpt_err(ctx, ret, "%(O)Error fetching object type %s\n",...)
[...]
Oren.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 5/7] kernel/cred.c: ckpt_err at restart
[not found] ` <1257465619-1777-6-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 16:15 ` Oren Laadan
[not found] ` <4B017AA5.60503-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 16:15 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> kernel/cred.c | 46 ++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 36 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 62d28a4..c941078 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -764,32 +764,46 @@ static struct cred *do_restore_cred(struct ckpt_ctx *ctx)
> int i;
>
> h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CRED);
> - if (IS_ERR(h))
> + if (IS_ERR(h)) {
> + ckpt_err(ctx, ret, "reading cred entry\n");
This error is better reported for _all_ callers of ckpt_read_obj_type()
from that function ?
I am thinking that moving the report to the specific place where an
error occurs, plus data from the image file (and location there)
should be sufficient for debugging.
[...]
Oren.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/7] have ckpt_err set ctx->errno
[not found] ` <1257465619-1777-7-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 16:24 ` Oren Laadan
[not found] ` <4B017CB4.10707-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 16:24 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Move setting of ctx->errno into do_ckpt_msg(). We leave a small separate
> function to notify all restarting tasks in case of failure. We use a
> CKPT_CTX_WOKEN bit for this. If we always notify restarting tasks of
> failure inside do_ckpt_msg() then we will have to be more careful about
> when ckpt_err() is called at restart. Only 4 instances were doing the
I don't think it is wrong to notify of errors and wakeup tasks
"too early" or multiple times. In fact, the first one to notify
an error (test-and-set of ctx->errno) should do that. What am I
missing ?
> wakeup right now (at the end of each possible restart sequence), and we
> don't want to risk moving notification too early.
>
> This patch also fixes what should have been a problem before: always
> init_completion(&complete), else we might complete(&complete) on random junk
Nice catch.
[...]
Oren.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] restart.c: use ckpt_err
[not found] ` <4B017780.6080609-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-11-16 16:43 ` Serge E. Hallyn
[not found] ` <20091116164314.GA16493-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 18+ messages in thread
From: Serge E. Hallyn @ 2009-11-16 16:43 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> Is it your intent to entirely get rid of ckpt_debug() ?
Replace with a new ckpt_log(), yes. I think that's too much to do
all at once so figured ckpt_err() in v19, then start adding ckpt_log(),
and converting callers one file at a time.
>
> We originally discussed two levels of details: only error status
> or a detailed log (and we also thought of a detailed debug, that can
> be compiled out to save space). How does that fit with the patch(es) ?
FIts perfectly. ckpt_err() always is dumped, ckpt_log() can be deemed
'informative' and optionally dumped. When we implement it.
> To "define" what's "error status" and what's "log" (and maybe what's
> "debug"), I suggest a test like:
>
> 1) error status: what conveys the most specific reason of failure,
> e.g. "failed to open file to restore fd"; The caller should be able
> to assume that the total message(s) length will not exceed a pipe
> buffer.
>
> 2) log status: that gives status about progress, or what lead to and
> what followed an error, e.g. file open failure may have happened
> when restoring a file descriptor, or when restore a vma, so a log
> like "failed to restore vma" would be helpful.
I think 'failed to open file' should always be 'error', so that we
know which file failed to open. If all we print is a generic
'failed to restore open files' then the user isn't much better off
than getting -EBADF for sys_restart().
> 3) debug status: that we want to be able to compile out without having
> to reintroduce it for every bug that it may help us debug.
<shrug> This may be useful and good, but in any case starting
with just implementing (1) seemed like the most practical approach.
The patchset accomplishes getting rid of ckpt_write_err(), and sending
error messages to the user logfile, so I think it's plenty useful
without trying to do everything (with resulting in all the extra
patch churn).
> > diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> > index 130b4b2..e1bd0ad 100644
> > --- a/checkpoint/restart.c
> > +++ b/checkpoint/restart.c
> > @@ -64,7 +64,7 @@ static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
> >
> > s = kmalloc(sizeof(*s), GFP_KERNEL);
> > if (!s) {
> > - ckpt_debug("no memory to register ?!\n");
> > + ckpt_err(ctx, 0, "no memory to register ?!\n");
> > return -ENOMEM;
>
> What is the purpose in passing '0' instead of -ENOMEM to ckpt_err() ?
> (a few more instances below).
Hmm, I think that can pass errno now. I probably had done that bc
originally ckpt_err() was going to do the restore_notify_error
too.
> Are you still concerned about the increase in code size with c/r ?
Yes, I am. But our first priority should be to empower a user to
debug why a checkpoint or restart failed. Once we're settled with
that, we can look at how to decrease code size. Compiling out the
log and debug messages is fair game imo, but compiling out ckpt_err()
is not. If users can't tell that checkpoint failed because they had
an unlinked file which used to be called .vimrc open, then I don't
think we can reasonably hope to get this upstream (as per previous
'toy implementation' arguments).
-serge
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 5/7] kernel/cred.c: ckpt_err at restart
[not found] ` <4B017AA5.60503-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-11-16 16:51 ` Serge E. Hallyn
0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2009-11-16 16:51 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> > From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >
> > Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> > ---
> > kernel/cred.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > 1 files changed, 36 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/cred.c b/kernel/cred.c
> > index 62d28a4..c941078 100644
> > --- a/kernel/cred.c
> > +++ b/kernel/cred.c
> > @@ -764,32 +764,46 @@ static struct cred *do_restore_cred(struct ckpt_ctx *ctx)
> > int i;
> >
> > h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CRED);
> > - if (IS_ERR(h))
> > + if (IS_ERR(h)) {
> > + ckpt_err(ctx, ret, "reading cred entry\n");
>
> This error is better reported for _all_ callers of ckpt_read_obj_type()
> from that function ?
>
> I am thinking that moving the report to the specific place where an
> error occurs, plus data from the image file (and location there)
> should be sufficient for debugging.
Yeah, it would help reduce the code size too. I guess I was
thinking the caller would have extra information, but it really
doesn't, or if in the rare caes it does then it can add it. Good
idea.
thanks,
-serge
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/7] restart.c: use ckpt_err
[not found] ` <20091116164314.GA16493-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 16:57 ` Oren Laadan
0 siblings, 0 replies; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 16:57 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> Is it your intent to entirely get rid of ckpt_debug() ?
>
> Replace with a new ckpt_log(), yes. I think that's too much to do
> all at once so figured ckpt_err() in v19, then start adding ckpt_log(),
> and converting callers one file at a time.
>
>> We originally discussed two levels of details: only error status
>> or a detailed log (and we also thought of a detailed debug, that can
>> be compiled out to save space). How does that fit with the patch(es) ?
>
> FIts perfectly. ckpt_err() always is dumped, ckpt_log() can be deemed
> 'informative' and optionally dumped. When we implement it.
Yes. I guess my point was that it seemed to me that there was
'informative' messages in ckpt_err() at several places.
>
>> To "define" what's "error status" and what's "log" (and maybe what's
>> "debug"), I suggest a test like:
>>
>> 1) error status: what conveys the most specific reason of failure,
>> e.g. "failed to open file to restore fd"; The caller should be able
>> to assume that the total message(s) length will not exceed a pipe
>> buffer.
>>
>> 2) log status: that gives status about progress, or what lead to and
>> what followed an error, e.g. file open failure may have happened
>> when restoring a file descriptor, or when restore a vma, so a log
>> like "failed to restore vma" would be helpful.
>
> I think 'failed to open file' should always be 'error', so that we
> know which file failed to open. If all we print is a generic
> 'failed to restore open files' then the user isn't much better off
> than getting -EBADF for sys_restart().
Yes, that was a bad example of "specific".. you're right.
>
>> 3) debug status: that we want to be able to compile out without having
>> to reintroduce it for every bug that it may help us debug.
>
> <shrug> This may be useful and good, but in any case starting
> with just implementing (1) seemed like the most practical approach.
> The patchset accomplishes getting rid of ckpt_write_err(), and sending
> error messages to the user logfile, so I think it's plenty useful
> without trying to do everything (with resulting in all the extra
> patch churn).
>
I agree, and also suggest to avoid proliferation of ckpt_err()
where it would otherwise be 'informative' or debug - just leave
it as is in 'ckpt_debug()' state.
>>> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
>>> index 130b4b2..e1bd0ad 100644
>>> --- a/checkpoint/restart.c
>>> +++ b/checkpoint/restart.c
>>> @@ -64,7 +64,7 @@ static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
>>>
>>> s = kmalloc(sizeof(*s), GFP_KERNEL);
>>> if (!s) {
>>> - ckpt_debug("no memory to register ?!\n");
>>> + ckpt_err(ctx, 0, "no memory to register ?!\n");
>>> return -ENOMEM;
>> What is the purpose in passing '0' instead of -ENOMEM to ckpt_err() ?
>> (a few more instances below).
>
> Hmm, I think that can pass errno now. I probably had done that bc
> originally ckpt_err() was going to do the restore_notify_error
> too.
>
>> Are you still concerned about the increase in code size with c/r ?
>
> Yes, I am. But our first priority should be to empower a user to
> debug why a checkpoint or restart failed. Once we're settled with
> that, we can look at how to decrease code size. Compiling out the
> log and debug messages is fair game imo, but compiling out ckpt_err()
> is not. If users can't tell that checkpoint failed because they had
> an unlinked file which used to be called .vimrc open, then I don't
> think we can reasonably hope to get this upstream (as per previous
> 'toy implementation' arguments).
In replying to other patches I suggested two ways of reducing the
size which also make the report more concise:
1) Report where the error occurs: e.g. report in ckpt_obj_fetch()
and not in the caller of ckpt_obj_fetch().
2) If function foo() returns and error, and function foo() already
reported the error, then the caller should not use ckpt_err() too.
Instead it should use log/debug mode.
Oren.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/7] have ckpt_err set ctx->errno
[not found] ` <4B017CB4.10707-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-11-16 17:20 ` Serge E. Hallyn
0 siblings, 0 replies; 18+ messages in thread
From: Serge E. Hallyn @ 2009-11-16 17:20 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>
>
> serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> > From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >
> > Move setting of ctx->errno into do_ckpt_msg(). We leave a small separate
> > function to notify all restarting tasks in case of failure. We use a
> > CKPT_CTX_WOKEN bit for this. If we always notify restarting tasks of
> > failure inside do_ckpt_msg() then we will have to be more careful about
> > when ckpt_err() is called at restart. Only 4 instances were doing the
>
> I don't think it is wrong to notify of errors and wakeup tasks
> "too early" or multiple times. In fact, the first one to notify
> an error (test-and-set of ctx->errno) should do that. What am I
> missing ?
ckpt_err() is called before the init_completion(&complete). We could
just make sure that never happens.
-serge
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 7/7] (debug) print vpids for all restarting tasks
[not found] ` <1257465619-1777-8-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 19:08 ` Oren Laadan
0 siblings, 0 replies; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 19:08 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Pulled for v19-rc1.
Oren.
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> checkpoint/restart.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> index 9e2647e..724b7b2 100644
> --- a/checkpoint/restart.c
> +++ b/checkpoint/restart.c
> @@ -158,7 +158,7 @@ void restore_debug_free(struct ckpt_ctx *ctx)
> ctx->errno);
> ckpt_debug("kflags %lu uflags %lu oflags %lu", ctx->kflags,
> ctx->uflags, ctx->oflags);
> - for (i = 0; i < ctx->active_pid; i++)
> + for (i = 0; i < ctx->nr_pids; i++)
> ckpt_debug("task[%d] to run %d\n", i, ctx->pids_arr[i].vpid);
>
> list_for_each_entry_safe(s, p, &ctx->task_status, list) {
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/7] move handling of err down into _ckpt_do_msg and _append
[not found] ` <1257465619-1777-2-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-11-16 19:25 ` Oren Laadan
0 siblings, 0 replies; 18+ messages in thread
From: Oren Laadan @ 2009-11-16 19:25 UTC (permalink / raw)
To: serue-r/Jw6+rmf7HQT0dZR+AlfA
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Pulled for v19-rc1.
Oren.
serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> Since ckpt_err() will also be used at restart, and since at
> restart we don't want to just set ctx->err, move handling of
> the argument 'err' further down. We only set ckpt->errno if
> err is not 0 and a checkpoint is going on. Then we automatically
> add '[err %d] % err' if err is non-0, so that the caller has a
> choice of passing in err or passing %(E) in fmt and err as a
> vararg.
>
> Changelog:
> nov 4: drop %(E) and print passed-in err instead
>
> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> checkpoint/sys.c | 51 +++++++++++++++++++++++++++----------------
> include/linux/checkpoint.h | 16 ++++++-------
> 2 files changed, 39 insertions(+), 28 deletions(-)
>
> diff --git a/checkpoint/sys.c b/checkpoint/sys.c
> index 7d14b30..c1c4e99 100644
> --- a/checkpoint/sys.c
> +++ b/checkpoint/sys.c
> @@ -382,14 +382,13 @@ static inline int is_special_flag(char *s)
> * The special flags are surrounded by %() to help them visually stand
> * out. For instance, %(O) means an objref. The following special
> * flags are recognized:
> - * E: error
> * O: objref
> * P: pointer
> * T: task
> * S: string
> * V: variable
> *
> - * %(E) will be expanded to "[err %d]". Likewise O, P, S, and V, will
> + * %(O) will be expanded to "[obj %d]". Likewise P, S, and V, will
> * also expand to format flags requiring an argument to the subsequent
> * sprintf or printk. T will be expanded to a string with no flags,
> * requiring no further arguments.
> @@ -401,7 +400,7 @@ static inline int is_special_flag(char *s)
> * the additional variabes, in order, to match the @fmt (except for
> * the T key), e.g.:
> *
> - * ckpt_err(ctx, "%(T)FILE flags %d %(O) %(E)\n", flags, objref, err);
> + * ckpt_err(ctx, err, "%(T)FILE flags %d %(O)\n", flags, objref);
> *
> * May be called under spinlock.
> * Must be called with ctx->msg_mutex held. The expanded format
> @@ -418,9 +417,6 @@ static void _ckpt_generate_fmt(struct ckpt_ctx *ctx, char *fmt)
> continue;
> }
> switch (fmt[2]) {
> - case 'E':
> - len += snprintf(s+len, CKPT_MSG_LEN-len, "[err %%d]");
> - break;
> case 'O':
> len += snprintf(s+len, CKPT_MSG_LEN-len, "[obj %%d]");
> break;
> @@ -455,24 +451,41 @@ static void _ckpt_generate_fmt(struct ckpt_ctx *ctx, char *fmt)
> s[len] = '\0';
> }
>
> -static void _ckpt_msg_appendv(struct ckpt_ctx *ctx, char *fmt, va_list ap)
> +static void _ckpt_msg_appendv(struct ckpt_ctx *ctx, int err, char *fmt,
> + va_list ap)
> {
> int len = ctx->msglen;
>
> + if (err) {
> + /* At restart we must use a more baroque helper to set
> + * ctx->errno, which also wakes all other waiting restarting
> + * tasks. But at checkpoint we just set ctx->errno so that
> + * _ckpt_msg_complete() will know to write the error message
> + * to the checkpoint image.
> + */
> + if (ctx->kflags & CKPT_CTX_CHECKPOINT && !ctx->errno)
> + ctx->errno = err;
> + len += snprintf(&ctx->msg[len], CKPT_MSG_LEN-len, "[err %d]",
> + err);
> + if (len > CKPT_MSG_LEN)
> + goto full;
> + }
> +
> len += vsnprintf(&ctx->msg[len], CKPT_MSG_LEN-len, fmt, ap);
> if (len > CKPT_MSG_LEN) {
> +full:
> len = CKPT_MSG_LEN;
> ctx->msg[CKPT_MSG_LEN-1] = '\0';
> }
> ctx->msglen = len;
> }
>
> -void _ckpt_msg_append(struct ckpt_ctx *ctx, char *fmt, ...)
> +void _ckpt_msg_append(struct ckpt_ctx *ctx, int err, char *fmt, ...)
> {
> va_list ap;
>
> va_start(ap, fmt);
> - _ckpt_msg_appendv(ctx, fmt, ap);
> + _ckpt_msg_appendv(ctx, err, fmt, ap);
> va_end(ap);
> }
>
> @@ -507,25 +520,25 @@ void _ckpt_msg_complete(struct ckpt_ctx *ctx)
> ctx->msglen = 0;
> }
>
> -#define __do_ckpt_msg(ctx, fmt) do { \
> - va_list ap; \
> - _ckpt_generate_fmt(ctx, fmt); \
> - va_start(ap, fmt); \
> - _ckpt_msg_appendv(ctx, ctx->fmt, ap); \
> - va_end(ap); \
> +#define __do_ckpt_msg(ctx, err, fmt) do { \
> + va_list ap; \
> + _ckpt_generate_fmt(ctx, fmt); \
> + va_start(ap, fmt); \
> + _ckpt_msg_appendv(ctx, err, ctx->fmt, ap); \
> + va_end(ap); \
> } while (0)
>
> -void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...)
> +void _do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
> {
> - __do_ckpt_msg(ctx, fmt);
> + __do_ckpt_msg(ctx, err, fmt);
> }
>
> -void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...)
> +void do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...)
> {
> if (!ctx) return;
>
> ckpt_msg_lock(ctx);
> - __do_ckpt_msg(ctx, fmt);
> + __do_ckpt_msg(ctx, err, fmt);
> _ckpt_msg_complete(ctx);
> ckpt_msg_unlock(ctx);
> }
> diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
> index 89e6fb3..8fd6cba 100644
> --- a/include/linux/checkpoint.h
> +++ b/include/linux/checkpoint.h
> @@ -384,7 +384,7 @@ extern void ckpt_msg_unlock(struct ckpt_ctx *ctx);
> * May be called under spinlock.
> * Must be called under ckpt_msg_lock().
> */
> -extern void _ckpt_msg_append(struct ckpt_ctx *ctx, char *fmt, ...);
> +extern void _ckpt_msg_append(struct ckpt_ctx *ctx, int err, char *fmt, ...);
>
> /*
> * Write ctx->msg to all relevant places.
> @@ -399,7 +399,7 @@ extern void _ckpt_msg_complete(struct ckpt_ctx *ctx);
> * the caller will have to use _ckpt_msg_complete() to finish up.
> * Must be called with ckpt_msg_lock held.
> */
> -extern void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
> +extern void _do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...);
>
> /*
> * Append an enhanced formatted message to ctx->msg.
> @@ -411,12 +411,11 @@ extern void _do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
> *
> * Must not be called under spinlock.
> */
> -extern void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
> +extern void do_ckpt_msg(struct ckpt_ctx *ctx, int err, char *fmt, ...);
>
> #define ckpt_err(ctx, err, fmt, args...) do { \
> - ctx->errno = (err); \
> - do_ckpt_msg(ctx, "[Error at %s:%d][err %d]" fmt, __func__, __LINE__, \
> - (err), ##args); \
> + do_ckpt_msg(ctx, (err), "[Error at %s:%d]" fmt, __func__, __LINE__, \
> + ##args); \
> } while (0)
>
>
> @@ -427,9 +426,8 @@ extern void do_ckpt_msg(struct ckpt_ctx *ctx, char *fmt, ...);
> * must be followed by a call to _ckpt_msg_complete()
> */
> #define _ckpt_err(ctx, err, fmt, args...) do { \
> - ctx->errno = (err); \
> - _do_ckpt_msg(ctx, "[ error %s:%d][err %d]" fmt, __func__, __LINE__, \
> - (err), ##args); \
> + _do_ckpt_msg(ctx, (err), "[Error at %s:%d]" fmt, __func__, __LINE__, \
> + ##args); \
> } while (0)
>
> #endif /* CONFIG_CHECKPOINT */
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2009-11-16 19:25 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-06 0:00 [PATCH 0/7] Expand usage of ckpt_err serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-1-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-06 0:00 ` [PATCH 1/7] move handling of err down into _ckpt_do_msg and _append serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-2-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 19:25 ` Oren Laadan
2009-11-06 0:00 ` [PATCH 2/7] restart.c: use ckpt_err serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-3-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 16:02 ` Oren Laadan
[not found] ` <4B017780.6080609-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-11-16 16:43 ` Serge E. Hallyn
[not found] ` <20091116164314.GA16493-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 16:57 ` Oren Laadan
2009-11-06 0:00 ` [PATCH 3/7] process.c: use ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-4-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 16:09 ` Oren Laadan
2009-11-06 0:00 ` [PATCH 4/7] files.c: ckpt_err() during restore serue-r/Jw6+rmf7HQT0dZR+AlfA
2009-11-06 0:00 ` [PATCH 5/7] kernel/cred.c: ckpt_err at restart serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-6-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 16:15 ` Oren Laadan
[not found] ` <4B017AA5.60503-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-11-16 16:51 ` Serge E. Hallyn
2009-11-06 0:00 ` [PATCH 6/7] have ckpt_err set ctx->errno serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-7-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 16:24 ` Oren Laadan
[not found] ` <4B017CB4.10707-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-11-16 17:20 ` Serge E. Hallyn
2009-11-06 0:00 ` [PATCH 7/7] (debug) print vpids for all restarting tasks serue-r/Jw6+rmf7HQT0dZR+AlfA
[not found] ` <1257465619-1777-8-git-send-email-serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-16 19:08 ` Oren Laadan
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.