* c/r: Add UTS support
@ 2009-03-31 20:58 Dan Smith
       [not found] ` <1238533107-11796-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Smith @ 2009-03-31 20:58 UTC
  To: containers-qjLDD68F18O7TbgM5vRIOg

This is the latest version of my UTS patch set.  It has been reworked to
do restart in the kernel again, after much pondering of the algorithm
required to do nested restart in userspace.  Thus, it no longer requires
the modified mktree.c from earlier sets.

The namespace information has been moved to follow the per-task
information in the checkpoint stream, per Oren's recommendation.

Checkpoint and restart of nested UTS namespaces are now supported.  In
support of that, the cr_may_checkpoint_task() function is updated to be
flexible about which namespaces must match those of the root task.
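
As a rough illustration of that flexibility, here is a minimal userspace
sketch, not the kernel code itself.  It assumes a hypothetical struct
ns_set standing in for struct nsproxy and a hypothetical helper
may_checkpoint_ns(); only pointer identity matters.  Every namespace
except UTS must still be shared with the root task.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct nsproxy; only the
 * identity of each pointer is compared in this sketch. */
struct ns_set {
	const void *uts_ns;
	const void *ipc_ns;
	const void *mnt_ns;
	const void *pid_ns;
	const void *net_ns;
};

/*
 * Return 0 if the task may be checkpointed, -EPERM otherwise.
 * uts_ns is deliberately not compared: a nested UTS namespace is
 * allowed, while all other namespaces must match the root task.
 */
static int may_checkpoint_ns(const struct ns_set *task,
			     const struct ns_set *root)
{
	if (task->ipc_ns != root->ipc_ns)
		return -EPERM;
	if (task->mnt_ns != root->mnt_ns)
		return -EPERM;
	if (task->pid_ns != root->pid_ns)
		return -EPERM;
	if (task->net_ns != root->net_ns)
		return -EPERM;
	return 0;
}
```

Relaxing the check for one more namespace type would then amount to
dropping its comparison once save/restore for that type exists.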

The updated stub implementation of the IPC namespace patch is again
included at the end.


* [PATCH 1/3] Make cr_may_checkpoint_task() check each namespace individually
       [not found] ` <1238533107-11796-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-03-31 20:58   ` Dan Smith
       [not found]     ` <1238533107-11796-2-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-03-31 20:58   ` [PATCH 2/3] c/r: Add UTS support (v6) Dan Smith
  2009-03-31 20:58   ` [PATCH 3/3] Stub implementation of IPC namespace c/r (v2) Dan Smith
  2 siblings, 1 reply; 14+ messages in thread
From: Dan Smith @ 2009-03-31 20:58 UTC
  To: containers-qjLDD68F18O7TbgM5vRIOg

Cc: orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org
Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 checkpoint/checkpoint.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index ef35754..c2f0e16 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -302,8 +302,19 @@ static int cr_may_checkpoint_task(struct task_struct *t, struct cr_ctx *ctx)
 	if (t != current && !frozen(t))
 		return -EBUSY;
 
-	/* FIXME: change this for nested containers */
-	if (task_nsproxy(t) != ctx->root_nsproxy)
+	if (task_nsproxy(t)->uts_ns != ctx->root_nsproxy->uts_ns)
+		return -EPERM;
+
+	if (task_nsproxy(t)->ipc_ns != ctx->root_nsproxy->ipc_ns)
+		return -EPERM;
+
+	if (task_nsproxy(t)->mnt_ns != ctx->root_nsproxy->mnt_ns)
+		return -EPERM;
+
+	if (task_nsproxy(t)->pid_ns != ctx->root_nsproxy->pid_ns)
+		return -EPERM;
+
+	if (task_nsproxy(t)->net_ns != ctx->root_nsproxy->net_ns)
 		return -EPERM;
 
 	return 0;
-- 
1.5.6.3


* [PATCH 2/3] c/r: Add UTS support (v6)
       [not found] ` <1238533107-11796-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-03-31 20:58   ` [PATCH 1/3] Make cr_may_checkpoint_task() check each namespace individually Dan Smith
@ 2009-03-31 20:58   ` Dan Smith
       [not found]     ` <1238533107-11796-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-03-31 20:58   ` [PATCH 3/3] Stub implementation of IPC namespace c/r (v2) Dan Smith
  2 siblings, 1 reply; 14+ messages in thread
From: Dan Smith @ 2009-03-31 20:58 UTC
  To: containers-qjLDD68F18O7TbgM5vRIOg

This patch adds a "phase" of checkpoint that saves out information about any
namespaces the task(s) may have.  Do this by tracking the namespace objects
of the tasks and making sure that tasks with the same namespace that follow
get properly referenced in the checkpoint stream.
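
The tracking idea can be sketched in plain userspace C.  This is only
an illustration under assumptions: struct obj_table and obj_add_ptr()
are hypothetical names, a linear array replaces the real object hash,
and the return convention (positive for "new", zero for "already seen",
negative for failure) is modeled on how the patch uses the result of
cr_obj_add_ptr().

```c
#include <stddef.h>

#define MAX_OBJS 64	/* arbitrary bound chosen for this sketch */

/* Hypothetical, simplified stand-in for the checkpoint object hash. */
struct obj_table {
	const void *ptr[MAX_OBJS];
	int count;
};

/*
 * Look up ptr and store its reference number in *objref.
 * Returns 1 if the object was newly added (its full state should be
 * written to the stream), 0 if it was already known (only the
 * reference is written), or -1 if the table is full.
 */
static int obj_add_ptr(struct obj_table *tbl, const void *ptr, int *objref)
{
	int i;

	for (i = 0; i < tbl->count; i++) {
		if (tbl->ptr[i] == ptr) {
			*objref = i + 1;
			return 0;
		}
	}
	if (tbl->count == MAX_OBJS)
		return -1;
	tbl->ptr[tbl->count++] = ptr;
	*objref = tbl->count;
	return 1;
}
```

With this shape, the first task in a namespace triggers a full
descriptor and every later task in the same namespace emits only the
shared reference number.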

I tested this with single and multiple task restore, on top of Oren's
v13 tree.

Changes:
  - Remove the kernel restore path
  - Punt on nested namespaces
  - Use __NEW_UTS_LEN in nodename and domainname buffers
  - Add a note to Documentation/checkpoint/internals.txt to indicate where
    in the save/restore process the UTS information is kept
  - Store (and track) the objref of the namespace itself instead of the
    nsproxy (based on comments from Dave on IRC)
  - Remove explicit check for non-root nsproxy
  - Store the nodename and domainname lengths and use cr_write_string()
    to store the actual name strings
  - Catch failure of cr_obj_add_ptr() in cr_write_namespaces()
  - Remove "types" bitfield and use the "is this new" flag to determine
    whether or not we should write out a new ns descriptor
  - Replace kernel restore path
  - Move the namespace information to be directly after the task
    information record
  - Update Documentation to reflect new location of namespace info
  - Support checkpoint and restart of nested UTS namespaces
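
For illustration, the "lengths first, then the strings" layout from the
list above might look roughly like this in userspace C.  The names
struct utsns_hdr, pack_utsns() and unpack_utsns() are hypothetical, and
the flat buffer stands in for the checkpoint stream.  The bound check
against the 65-byte fields (__NEW_UTS_LEN + 1) on the read side is an
assumption of this sketch, not something the changelog describes.

```c
#include <stdint.h>
#include <string.h>

#define UTS_FIELD_LEN 65	/* __NEW_UTS_LEN + 1 in the kernel headers */

/* Hypothetical header: lengths include the terminating NUL. */
struct utsns_hdr {
	uint32_t nodename_len;
	uint32_t domainname_len;
};

/* Write header plus both strings into buf; returns bytes used or -1. */
static long pack_utsns(char *buf, size_t cap,
		       const char *node, const char *domain)
{
	struct utsns_hdr h;
	size_t total;

	h.nodename_len = (uint32_t)strlen(node) + 1;
	h.domainname_len = (uint32_t)strlen(domain) + 1;
	total = sizeof(h) + h.nodename_len + h.domainname_len;
	if (total > cap)
		return -1;
	memcpy(buf, &h, sizeof(h));
	memcpy(buf + sizeof(h), node, h.nodename_len);
	memcpy(buf + sizeof(h) + h.nodename_len, domain, h.domainname_len);
	return (long)total;
}

/* Read the record back, rejecting lengths that cannot fit the fields. */
static int unpack_utsns(const char *buf, size_t len,
			char node[UTS_FIELD_LEN], char domain[UTS_FIELD_LEN])
{
	struct utsns_hdr h;
	const char *n, *d;

	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.nodename_len == 0 || h.nodename_len > UTS_FIELD_LEN)
		return -1;
	if (h.domainname_len == 0 || h.domainname_len > UTS_FIELD_LEN)
		return -1;
	if (len - sizeof(h) < (size_t)h.nodename_len + h.domainname_len)
		return -1;
	n = buf + sizeof(h);
	d = n + h.nodename_len;
	/* Both strings must be NUL-terminated within their stated length. */
	if (n[h.nodename_len - 1] != '\0' || d[h.domainname_len - 1] != '\0')
		return -1;
	memcpy(node, n, h.nodename_len);
	memcpy(domain, d, h.domainname_len);
	return 0;
}
```

Validating the lengths before copying matters on restart, since the
image is input that may not have been produced by a well-behaved
checkpoint.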

Cc: orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org
Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 Documentation/checkpoint/internals.txt |    1 +
 checkpoint/Makefile                    |    1 +
 checkpoint/checkpoint.c                |   66 ++++++++++++++++++++-
 checkpoint/objhash.c                   |    7 ++
 checkpoint/restart.c                   |  101 ++++++++++++++++++++++++++++++++
 include/linux/checkpoint.h             |    1 +
 include/linux/checkpoint_hdr.h         |   11 ++++
 7 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/Documentation/checkpoint/internals.txt b/Documentation/checkpoint/internals.txt
index c741b6c..bdd202c 100644
--- a/Documentation/checkpoint/internals.txt
+++ b/Documentation/checkpoint/internals.txt
@@ -17,6 +17,7 @@ The order of operations, both save and restore, is as follows:
   -> thread state: elements of thread_struct and thread_info
   -> CPU state: registers etc, including FPU
   -> memory state: memory address space layout and contents
+  -> namespace information
   -> filesystem state: [TBD] filesystem namespace state, chroot, cwd, etc
   -> files state: open file descriptors and their state
   -> signals state: [TBD] pending signals and signal handling state
diff --git a/checkpoint/Makefile b/checkpoint/Makefile
index 607d864..55c5c3d 100644
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -4,3 +4,4 @@
 
 obj-$(CONFIG_CHECKPOINT) += sys.o checkpoint.o restart.o objhash.o \
 		ckpt_mem.o rstr_mem.o ckpt_file.o rstr_file.o
+EXTRA_CFLAGS += -DDEBUG
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index c2f0e16..5f83e83 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -213,6 +213,65 @@ static int cr_write_tail(struct cr_ctx *ctx)
 	return ret;
 }
 
+static int cr_write_utsns(struct cr_ctx *ctx, struct new_utsname *name)
+{
+	struct cr_hdr h;
+	struct cr_hdr_utsns *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	int ret;
+
+	h.type = CR_HDR_UTSNS;
+	h.len = sizeof(*hh);
+
+	hh->nodename_len = strlen(name->nodename) + 1;
+	hh->domainname_len = strlen(name->domainname) + 1;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_write_string(ctx, name->nodename, hh->nodename_len);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_write_string(ctx, name->domainname, hh->domainname_len);
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+
+	return ret;
+}
+
+static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
+{
+	struct cr_hdr h;
+	struct cr_hdr_namespaces *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	struct nsproxy *nsp = t->nsproxy;
+	int ret;
+	int uts;
+
+	h.type = CR_HDR_NS;
+	h.len = sizeof(*hh);
+
+	uts = cr_obj_add_ptr(ctx, nsp->uts_ns, &hh->uts_ref, CR_OBJ_UTSNS, 0);
+	if (uts < 0)
+		goto out;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	if (ret)
+		goto out;
+
+	if (uts) {
+		ret = cr_write_utsns(ctx, &nsp->uts_ns->name);
+		if (ret < 0)
+			goto out;
+	}
+
+	/* FIXME: Write other namespaces here */
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+
+	return ret;
+}
+
 /* dump the task_struct of a given task */
 static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t)
 {
@@ -267,6 +326,10 @@ static int cr_write_task(struct cr_ctx *ctx, struct task_struct *t)
 		goto out;
 	ret = cr_write_cpu(ctx, t);
 	cr_debug("cpu: ret %d\n", ret);
+	if (ret < 0)
+		goto out;
+	ret = cr_write_namespaces(ctx, t);
+	cr_debug("ns: ret %d\n", ret);
  out:
 	return ret;
 }
@@ -302,9 +365,6 @@ static int cr_may_checkpoint_task(struct task_struct *t, struct cr_ctx *ctx)
 	if (t != current && !frozen(t))
 		return -EBUSY;
 
-	if (task_nsproxy(t)->uts_ns != ctx->root_nsproxy->uts_ns)
-		return -EPERM;
-
 	if (task_nsproxy(t)->ipc_ns != ctx->root_nsproxy->ipc_ns)
 		return -EPERM;
 
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index 25916c1..c6ae7c1 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -12,6 +12,7 @@
 #include <linux/file.h>
 #include <linux/hash.h>
 #include <linux/checkpoint.h>
+#include <linux/utsname.h>
 
 struct cr_objref {
 	int objref;
@@ -38,6 +39,9 @@ static void cr_obj_ref_drop(struct cr_objref *obj)
 	case CR_OBJ_INODE:
 		iput((struct inode *) obj->ptr);
 		break;
+	case CR_OBJ_UTSNS:
+		put_uts_ns((struct uts_namespace *) obj->ptr);
+		break;
 	default:
 		BUG();
 	}
@@ -55,6 +59,9 @@ static int cr_obj_ref_grab(struct cr_objref *obj)
 		if (!igrab((struct inode *) obj->ptr))
 			ret = -EBADF;
 		break;
+	case CR_OBJ_UTSNS:
+		get_uts_ns((struct uts_namespace *) obj->ptr);
+		break;
 	default:
 		BUG();
 	}
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index d9e01ce..f42d549 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -15,6 +15,8 @@
 #include <linux/magic.h>
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
+#include <linux/utsname.h>
+#include <linux/syscalls.h>
 
 #include "checkpoint_arch.h"
 
@@ -237,6 +239,101 @@ static int cr_read_tail(struct cr_ctx *ctx)
 	return ret;
 }
 
+static int cr_read_utsns(struct cr_ctx *ctx, struct task_struct *t)
+{
+	struct cr_hdr_utsns hh;
+	struct uts_namespace *ns;
+	int ret;
+	char *nn = NULL;
+	char *dn = NULL;
+
+	ret = cr_read_obj_type(ctx, &hh, sizeof(hh), CR_HDR_UTSNS);
+	if (ret < 0)
+		return ret;
+
+	nn = kmalloc(hh.nodename_len, GFP_KERNEL);
+	if (!nn) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	dn = kmalloc(hh.domainname_len, GFP_KERNEL);
+	if (!dn) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = cr_read_string(ctx, nn, hh.nodename_len);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_read_string(ctx, dn, hh.domainname_len);
+	if (ret < 0)
+		goto out;
+
+	ret = sys_unshare(CLONE_NEWUTS);
+	if (ret)
+		goto out;
+
+	ns = t->nsproxy->uts_ns;
+	memcpy(ns->name.nodename, nn, hh.nodename_len);
+	memcpy(ns->name.domainname, dn, hh.domainname_len);
+
+ out:
+	kfree(nn);
+	kfree(dn);
+
+	return ret;
+}
+
+static int cr_restore_utsns(struct cr_ctx *ctx, int ref)
+{
+	struct uts_namespace *uts;
+	int ret;
+
+	uts = cr_obj_get_by_ref(ctx, ref, CR_OBJ_UTSNS);
+	if (uts == NULL) {
+		ret = cr_read_utsns(ctx, current);
+		if (ret < 0)
+			return ret;
+
+		return cr_obj_add_ref(ctx, current->nsproxy->uts_ns,
+				      ref, CR_OBJ_UTSNS, 0);
+	} else if (IS_ERR(uts)) {
+		cr_debug("Failed to get UTS ns from objhash");
+		return PTR_ERR(uts);
+	}
+
+	ret = copy_namespaces(CLONE_NEWUTS, current);
+	if (ret < 0)
+		return ret;
+
+	put_uts_ns(current->nsproxy->uts_ns);
+	get_uts_ns(uts);
+	current->nsproxy->uts_ns = uts;
+
+	return 0;
+}
+
+static int cr_read_namespaces(struct cr_ctx *ctx)
+{
+	struct cr_hdr_namespaces hh;
+	int ret;
+
+	ret = cr_read_obj_type(ctx, &hh, sizeof(hh), CR_HDR_NS);
+	if (ret < 0)
+		return ret;
+
+	ret = cr_restore_utsns(ctx, hh.uts_ref);
+	cr_debug("uts ns: %d\n", ret);
+	if (ret < 0)
+		return ret;
+
+	/* FIXME: Add more namespaces here */
+
+	return 0;
+}
+
 /* read the task_struct into the current task */
 static int cr_read_task_struct(struct cr_ctx *ctx)
 {
@@ -298,6 +395,10 @@ static int cr_read_task(struct cr_ctx *ctx)
 		goto out;
 	ret = cr_read_cpu(ctx);
 	cr_debug("cpu: ret %d\n", ret);
+	if (ret < 0)
+		goto out;
+	ret = cr_read_namespaces(ctx);
+	cr_debug("ns: ret %d\n", ret);
 
  out:
 	return ret;
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 2e99c74..cb62716 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -75,6 +75,7 @@ extern void cr_ctx_put(struct cr_ctx *ctx);
 enum {
 	CR_OBJ_FILE = 1,
 	CR_OBJ_INODE,
+	CR_OBJ_UTSNS,
 	CR_OBJ_MAX
 };
 
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 3addb48..6f29a72 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -48,6 +48,8 @@ enum {
 	CR_HDR_TASK,
 	CR_HDR_THREAD,
 	CR_HDR_CPU,
+	CR_HDR_NS,
+	CR_HDR_UTSNS,
 
 	CR_HDR_MM = 201,
 	CR_HDR_VMA,
@@ -177,4 +179,13 @@ struct cr_hdr_fd_pipe {
 	__s32 nr_bufs;
 } __attribute__((aligned(8)));
 
+struct cr_hdr_namespaces {
+	__u32 uts_ref;
+};
+
+struct cr_hdr_utsns {
+	__u32 nodename_len;
+	__u32 domainname_len;
+};
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
-- 
1.5.6.3


* [PATCH 3/3] Stub implementation of IPC namespace c/r (v2)
       [not found] ` <1238533107-11796-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-03-31 20:58   ` [PATCH 1/3] Make cr_may_checkpoint_task() check each namespace individually Dan Smith
  2009-03-31 20:58   ` [PATCH 2/3] c/r: Add UTS support (v6) Dan Smith
@ 2009-03-31 20:58   ` Dan Smith
       [not found]     ` <1238533107-11796-4-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2 siblings, 1 reply; 14+ messages in thread
From: Dan Smith @ 2009-03-31 20:58 UTC
  To: containers-qjLDD68F18O7TbgM5vRIOg

Changes:
 - Update to match UTS changes

Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 checkpoint/checkpoint.c        |   26 +++++++++++++++++++++
 checkpoint/objhash.c           |    7 +++++
 checkpoint/restart.c           |   49 ++++++++++++++++++++++++++++++++++++++++
 include/linux/checkpoint.h     |    1 +
 include/linux/checkpoint_hdr.h |    6 +++++
 5 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index 5f83e83..e20eff3 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -240,6 +240,21 @@ static int cr_write_utsns(struct cr_ctx *ctx, struct new_utsname *name)
 	return ret;
 }
 
+static int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipcns *hh = cr_hbuf_get(ctx, sizeof(*hh));
+	int ret;
+
+	h.type = CR_HDR_IPCNS;
+	h.len = sizeof(*hh);
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+
+	return ret;
+}
+
 static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
 {
 	struct cr_hdr h;
@@ -247,6 +262,7 @@ static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
 	struct nsproxy *nsp = t->nsproxy;
 	int ret;
 	int uts;
+	int ipc;
 
 	h.type = CR_HDR_NS;
 	h.len = sizeof(*hh);
@@ -255,6 +271,10 @@ static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
 	if (uts < 0)
 		goto out;
 
+	ipc = cr_obj_add_ptr(ctx, nsp->ipc_ns, &hh->ipc_ref, CR_OBJ_IPCNS, 0);
+	if (ipc < 0)
+		goto out;
+
 	ret = cr_write_obj(ctx, &h, hh);
 	if (ret)
 		goto out;
@@ -265,6 +285,12 @@ static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
 			goto out;
 	}
 
+	if (ipc) {
+		ret = cr_write_ipcns(ctx, nsp->ipc_ns);
+		if (ret < 0)
+			goto out;
+	}
+
 	/* FIXME: Write other namespaces here */
  out:
 	cr_hbuf_put(ctx, sizeof(*hh));
diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
index c6ae7c1..b04f8df 100644
--- a/checkpoint/objhash.c
+++ b/checkpoint/objhash.c
@@ -13,6 +13,7 @@
 #include <linux/hash.h>
 #include <linux/checkpoint.h>
 #include <linux/utsname.h>
+#include <linux/ipc_namespace.h>
 
 struct cr_objref {
 	int objref;
@@ -42,6 +43,9 @@ static void cr_obj_ref_drop(struct cr_objref *obj)
 	case CR_OBJ_UTSNS:
 		put_uts_ns((struct uts_namespace *) obj->ptr);
 		break;
+	case CR_OBJ_IPCNS:
+		put_ipc_ns((struct ipc_namespace *) obj->ptr);
+		break;
 	default:
 		BUG();
 	}
@@ -62,6 +66,9 @@ static int cr_obj_ref_grab(struct cr_objref *obj)
 	case CR_OBJ_UTSNS:
 		get_uts_ns((struct uts_namespace *) obj->ptr);
 		break;
+	case CR_OBJ_IPCNS:
+		get_ipc_ns((struct ipc_namespace *) obj->ptr);
+		break;
 	default:
 		BUG();
 	}
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index f42d549..803816a 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -16,6 +16,7 @@
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
 #include <linux/utsname.h>
+#include <linux/ipc_namespace.h>
 #include <linux/syscalls.h>
 
 #include "checkpoint_arch.h"
@@ -315,6 +316,49 @@ static int cr_restore_utsns(struct cr_ctx *ctx, int ref)
 	return 0;
 }
 
+static int cr_read_ipcns(struct cr_ctx *ctx, struct task_struct *t)
+{
+	struct cr_hdr_ipcns hh;
+	int ret;
+
+	ret = cr_read_obj_type(ctx, &hh, sizeof(hh), CR_HDR_IPCNS);
+	if (ret < 0)
+		return ret;
+
+	/* FIXME: Implement this */
+
+	return 0;
+}
+
+static int cr_restore_ipcns(struct cr_ctx *ctx, int ref)
+{
+	struct ipc_namespace *ipc;
+	int ret;
+
+	ipc = cr_obj_get_by_ref(ctx, ref, CR_OBJ_IPCNS);
+	if (ipc == NULL) {
+		ret = cr_read_ipcns(ctx, current);
+		if (ret < 0)
+			return ret;
+
+		return cr_obj_add_ref(ctx, current->nsproxy->ipc_ns,
+				      ref, CR_OBJ_IPCNS, 0);
+	} else if (IS_ERR(ipc)) {
+		cr_debug("Failed to get IPC ns from objhash\n");
+		return PTR_ERR(ipc);
+	}
+
+	ret = copy_namespaces(CLONE_NEWIPC, current);
+	if (ret < 0)
+		return ret;
+
+	put_ipc_ns(current->nsproxy->ipc_ns);
+	get_ipc_ns(ipc);
+	current->nsproxy->ipc_ns = ipc;
+
+	return 0;
+}
+
 static int cr_read_namespaces(struct cr_ctx *ctx)
 {
 	struct cr_hdr_namespaces hh;
@@ -329,6 +373,11 @@ static int cr_read_namespaces(struct cr_ctx *ctx)
 	if (ret < 0)
 		return ret;
 
+	ret = cr_restore_ipcns(ctx, hh.ipc_ref);
+	cr_debug("ipc ns: %d\n", ret);
+	if (ret < 0)
+		return ret;
+
 	/* FIXME: Add more namespaces here */
 
 	return 0;
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index cb62716..2fc39fb 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -76,6 +76,7 @@ enum {
 	CR_OBJ_FILE = 1,
 	CR_OBJ_INODE,
 	CR_OBJ_UTSNS,
+	CR_OBJ_IPCNS,
 	CR_OBJ_MAX
 };
 
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 6f29a72..9d5d935 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -50,6 +50,7 @@ enum {
 	CR_HDR_CPU,
 	CR_HDR_NS,
 	CR_HDR_UTSNS,
+	CR_HDR_IPCNS,
 
 	CR_HDR_MM = 201,
 	CR_HDR_VMA,
@@ -181,6 +182,7 @@ struct cr_hdr_fd_pipe {
 
 struct cr_hdr_namespaces {
 	__u32 uts_ref;
+	__u32 ipc_ref;
 };
 
 struct cr_hdr_utsns {
@@ -188,4 +190,8 @@ struct cr_hdr_utsns {
 	__u32 domainname_len;
 };
 
+struct cr_hdr_ipcns {
+	/* FIXME: Fill this in */
+};
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
-- 
1.5.6.3


* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]     ` <1238533107-11796-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-02 17:44       ` Serge E. Hallyn
  2009-04-02 17:54         ` Dan Smith
  2009-04-02 17:48       ` Serge E. Hallyn
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 17:44 UTC
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> This patch adds a "phase" of checkpoint that saves out information about any
> namespaces the task(s) may have.  Do this by tracking the namespace objects
> of the tasks and making sure that tasks with the same namespace that follow
> get properly referenced in the checkpoint stream.
> 
> I tested this with single and multiple task restore, on top of Oren's
> v13 tree.
> 
> Changes:
>   - Remove the kernel restore path
>   - Punt on nested namespaces
>   - Use __NEW_UTS_LEN in nodename and domainname buffers
>   - Add a note to Documentation/checkpoint/internals.txt to indicate where
>     in the save/restore process the UTS information is kept
>   - Store (and track) the objref of the namespace itself instead of the
>     nsproxy (based on comments from Dave on IRC)
>   - Remove explicit check for non-root nsproxy
>   - Store the nodename and domainname lengths and use cr_write_string()
>     to store the actual name strings
>   - Catch failure of cr_obj_add_ptr() in cr_write_namespaces()
>   - Remove "types" bitfield and use the "is this new" flag to determine
>     whether or not we should write out a new ns descriptor
>   - Replace kernel restore path
>   - Move the namespace information to be directly after the task
>     information record
>   - Update Documentation to reflect new location of namespace info
>   - Support checkpoint and restart of nested UTS namespaces
> 
> Cc: orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  Documentation/checkpoint/internals.txt |    1 +
>  checkpoint/Makefile                    |    1 +
>  checkpoint/checkpoint.c                |   66 ++++++++++++++++++++-
>  checkpoint/objhash.c                   |    7 ++
>  checkpoint/restart.c                   |  101 ++++++++++++++++++++++++++++++++
>  include/linux/checkpoint.h             |    1 +
>  include/linux/checkpoint_hdr.h         |   11 ++++
>  7 files changed, 185 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/checkpoint/internals.txt b/Documentation/checkpoint/internals.txt
> index c741b6c..bdd202c 100644
> --- a/Documentation/checkpoint/internals.txt
> +++ b/Documentation/checkpoint/internals.txt
> @@ -17,6 +17,7 @@ The order of operations, both save and restore, is as follows:
>    -> thread state: elements of thread_struct and thread_info
>    -> CPU state: registers etc, including FPU
>    -> memory state: memory address space layout and contents
> +  -> namespace information
>    -> filesystem state: [TBD] filesystem namespace state, chroot, cwd, etc
>    -> files state: open file descriptors and their state
>    -> signals state: [TBD] pending signals and signal handling state
> diff --git a/checkpoint/Makefile b/checkpoint/Makefile
> index 607d864..55c5c3d 100644
> --- a/checkpoint/Makefile
> +++ b/checkpoint/Makefile
> @@ -4,3 +4,4 @@
> 
>  obj-$(CONFIG_CHECKPOINT) += sys.o checkpoint.o restart.o objhash.o \
>  		ckpt_mem.o rstr_mem.o ckpt_file.o rstr_file.o
> +EXTRA_CFLAGS += -DDEBUG
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index c2f0e16..5f83e83 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -213,6 +213,65 @@ static int cr_write_tail(struct cr_ctx *ctx)
>  	return ret;
>  }
> 
> +static int cr_write_utsns(struct cr_ctx *ctx, struct new_utsname *name)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_utsns *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	int ret;
> +
> +	h.type = CR_HDR_UTSNS;
> +	h.len = sizeof(*hh);
> +
> +	hh->nodename_len = strlen(name->nodename) + 1;
> +	hh->domainname_len = strlen(name->domainname) + 1;
> +
> +	ret = cr_write_obj(ctx, &h, hh);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_write_string(ctx, name->nodename, hh->nodename_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_write_string(ctx, name->domainname, hh->domainname_len);
> + out:
> +	cr_hbuf_put(ctx, sizeof(*hh));
> +
> +	return ret;
> +}
> +
> +static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_namespaces *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	struct nsproxy *nsp = t->nsproxy;
> +	int ret;
> +	int uts;
> +
> +	h.type = CR_HDR_NS;
> +	h.len = sizeof(*hh);
> +
> +	uts = cr_obj_add_ptr(ctx, nsp->uts_ns, &hh->uts_ref, CR_OBJ_UTSNS, 0);
> +	if (uts < 0)
> +		goto out;
> +
> +	ret = cr_write_obj(ctx, &h, hh);
> +	if (ret)
> +		goto out;
> +
> +	if (uts) {
> +		ret = cr_write_utsns(ctx, &nsp->uts_ns->name);
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	/* FIXME: Write other namespaces here */
> + out:
> +	cr_hbuf_put(ctx, sizeof(*hh));
> +
> +	return ret;
> +}
> +
>  /* dump the task_struct of a given task */
>  static int cr_write_task_struct(struct cr_ctx *ctx, struct task_struct *t)
>  {
> @@ -267,6 +326,10 @@ static int cr_write_task(struct cr_ctx *ctx, struct task_struct *t)
>  		goto out;
>  	ret = cr_write_cpu(ctx, t);
>  	cr_debug("cpu: ret %d\n", ret);
> +	if (ret < 0)
> +		goto out;
> +	ret = cr_write_namespaces(ctx, t);
> +	cr_debug("ns: ret %d\n", ret);
>   out:
>  	return ret;
>  }
> @@ -302,9 +365,6 @@ static int cr_may_checkpoint_task(struct task_struct *t, struct cr_ctx *ctx)
>  	if (t != current && !frozen(t))
>  		return -EBUSY;
> 
> -	if (task_nsproxy(t)->uts_ns != ctx->root_nsproxy->uts_ns)
> -		return -EPERM;
> -
>  	if (task_nsproxy(t)->ipc_ns != ctx->root_nsproxy->ipc_ns)
>  		return -EPERM;
> 
> diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c
> index 25916c1..c6ae7c1 100644
> --- a/checkpoint/objhash.c
> +++ b/checkpoint/objhash.c
> @@ -12,6 +12,7 @@
>  #include <linux/file.h>
>  #include <linux/hash.h>
>  #include <linux/checkpoint.h>
> +#include <linux/utsname.h>
> 
>  struct cr_objref {
>  	int objref;
> @@ -38,6 +39,9 @@ static void cr_obj_ref_drop(struct cr_objref *obj)
>  	case CR_OBJ_INODE:
>  		iput((struct inode *) obj->ptr);
>  		break;
> +	case CR_OBJ_UTSNS:
> +		put_uts_ns((struct uts_namespace *) obj->ptr);
> +		break;
>  	default:
>  		BUG();
>  	}
> @@ -55,6 +59,9 @@ static int cr_obj_ref_grab(struct cr_objref *obj)
>  		if (!igrab((struct inode *) obj->ptr))
>  			ret = -EBADF;
>  		break;
> +	case CR_OBJ_UTSNS:
> +		get_uts_ns((struct uts_namespace *) obj->ptr);
> +		break;
>  	default:
>  		BUG();
>  	}
> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> index d9e01ce..f42d549 100644
> --- a/checkpoint/restart.c
> +++ b/checkpoint/restart.c
> @@ -15,6 +15,8 @@
>  #include <linux/magic.h>
>  #include <linux/checkpoint.h>
>  #include <linux/checkpoint_hdr.h>
> +#include <linux/utsname.h>
> +#include <linux/syscalls.h>
> 
>  #include "checkpoint_arch.h"
> 
> @@ -237,6 +239,101 @@ static int cr_read_tail(struct cr_ctx *ctx)
>  	return ret;
>  }
> 
> +static int cr_read_utsns(struct cr_ctx *ctx, struct task_struct *t)
> +{
> +	struct cr_hdr_utsns hh;
> +	struct uts_namespace *ns;
> +	int ret;
> +	char *nn = NULL;
> +	char *dn = NULL;
> +
> +	ret = cr_read_obj_type(ctx, &hh, sizeof(hh), CR_HDR_UTSNS);
> +	if (ret < 0)
> +		return ret;
> +
> +	nn = kmalloc(hh.nodename_len, GFP_KERNEL);
> +	if (!nn) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	dn = kmalloc(hh.domainname_len, GFP_KERNEL);
> +	if (!dn) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	ret = cr_read_string(ctx, nn, hh.nodename_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_read_string(ctx, dn, hh.domainname_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = sys_unshare(CLONE_NEWUTS);

One thing to note is that this will drive the ns cgroup
bananas.  It might still be worthwhile collecting the
flags for all the to-be-unshared namespaces, and then
doing all of the unsharing at once.
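
A minimal sketch of the "collect the flags first" idea, in userspace C.
It assumes a hypothetical struct ns_refs holding the per-task
references read from the image, and a hypothetical helper
collect_unshare_flags(); the CLONE_* fallbacks only cover the case
where the libc headers do not expose them.

```c
#include <sched.h>

/* Fallbacks with the values from the kernel headers, in case
 * <sched.h> hides them without _GNU_SOURCE. */
#ifndef CLONE_NEWUTS
#define CLONE_NEWUTS 0x04000000
#endif
#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000
#endif

/* Hypothetical: namespace references as recorded in the image. */
struct ns_refs {
	unsigned int uts_ref;
	unsigned int ipc_ref;
};

/*
 * Build one flag mask covering every namespace in which the task
 * differs from its parent, so a single unshare can be issued.
 */
static int collect_unshare_flags(const struct ns_refs *parent,
				 const struct ns_refs *task)
{
	int flags = 0;

	if (task->uts_ref != parent->uts_ref)
		flags |= CLONE_NEWUTS;
	if (task->ipc_ref != parent->ipc_ref)
		flags |= CLONE_NEWIPC;
	return flags;
}
```

The caller would then do one unshare with the combined mask, and only
when the mask is non-zero, instead of one unshare per namespace type.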

Furthermore, you do sys_unshare here, then further down you
do another copy_namespaces(CLONE_NEWUTS)?

Finally, it seems to me every task will unshare(CLONE_NEWUTS),
no?  Where is the check done (and stored) for whether this
task has a different utsns from its parent?

I could be misunderstanding your code...

But it seems to me a simpler algorithm would be:

Save identifiers for all of the namespaces at the top of the
checkpoint image;  have restart create a set of dummy tasks,
enough to contain all of the new namespaces;  have each unshare
their namespaces;  then, as each real new task is restarted,
manually create a new nsproxy and link it to all of the
required new namespaces.

OR you can stick to trying to use clone(), but I don't think
this patch is doing that right.

> +	if (ret)
> +		goto out;
> +
> +	ns = t->nsproxy->uts_ns;
> +	memcpy(ns->name.nodename, nn, hh.nodename_len);
> +	memcpy(ns->name.domainname, dn, hh.domainname_len);
> +
> + out:
> +	kfree(nn);
> +	kfree(dn);
> +
> +	return ret;
> +}
> +
> +static int cr_restore_utsns(struct cr_ctx *ctx, int ref)
> +{
> +	struct uts_namespace *uts;
> +	int ret;
> +
> +	uts = cr_obj_get_by_ref(ctx, ref, CR_OBJ_UTSNS);
> +	if (uts == NULL) {
> +		ret = cr_read_utsns(ctx, current);
> +		if (ret < 0)
> +			return ret;
> +
> +		return cr_obj_add_ref(ctx, current->nsproxy->uts_ns,
> +				      ref, CR_OBJ_UTSNS, 0);
> +	} else if (IS_ERR(uts)) {
> +		cr_debug("Failed to get UTS ns from objhash");
> +		return PTR_ERR(uts);
> +	}
> +
> +	ret = copy_namespaces(CLONE_NEWUTS, current);
> +	if (ret < 0)
> +		return ret;
> +
> +	put_uts_ns(current->nsproxy->uts_ns);
> +	get_uts_ns(uts);
> +	current->nsproxy->uts_ns = uts;
> +
> +	return 0;
> +}
> +
> +static int cr_read_namespaces(struct cr_ctx *ctx)
> +{
> +	struct cr_hdr_namespaces hh;
> +	int ret;
> +
> +	ret = cr_read_obj_type(ctx, &hh, sizeof(hh), CR_HDR_NS);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = cr_restore_utsns(ctx, hh.uts_ref);
> +	cr_debug("uts ns: %d\n", ret);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* FIXME: Add more namespaces here */
> +
> +	return 0;
> +}
> +
>  /* read the task_struct into the current task */
>  static int cr_read_task_struct(struct cr_ctx *ctx)
>  {
> @@ -298,6 +395,10 @@ static int cr_read_task(struct cr_ctx *ctx)
>  		goto out;
>  	ret = cr_read_cpu(ctx);
>  	cr_debug("cpu: ret %d\n", ret);
> +	if (ret < 0)
> +		goto out;
> +	ret = cr_read_namespaces(ctx);
> +	cr_debug("ns: ret %d\n", ret);
> 
>   out:
>  	return ret;
> diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
> index 2e99c74..cb62716 100644
> --- a/include/linux/checkpoint.h
> +++ b/include/linux/checkpoint.h
> @@ -75,6 +75,7 @@ extern void cr_ctx_put(struct cr_ctx *ctx);
>  enum {
>  	CR_OBJ_FILE = 1,
>  	CR_OBJ_INODE,
> +	CR_OBJ_UTSNS,
>  	CR_OBJ_MAX
>  };
> 
> diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
> index 3addb48..6f29a72 100644
> --- a/include/linux/checkpoint_hdr.h
> +++ b/include/linux/checkpoint_hdr.h
> @@ -48,6 +48,8 @@ enum {
>  	CR_HDR_TASK,
>  	CR_HDR_THREAD,
>  	CR_HDR_CPU,
> +	CR_HDR_NS,
> +	CR_HDR_UTSNS,
> 
>  	CR_HDR_MM = 201,
>  	CR_HDR_VMA,
> @@ -177,4 +179,13 @@ struct cr_hdr_fd_pipe {
>  	__s32 nr_bufs;
>  } __attribute__((aligned(8)));
> 
> +struct cr_hdr_namespaces {
> +	__u32 uts_ref;
> +};
> +
> +struct cr_hdr_utsns {
> +	__u32 nodename_len;
> +	__u32 domainname_len;
> +};
> +
>  #endif /* _CHECKPOINT_CKPT_HDR_H_ */
> -- 
> 1.5.6.3
> 
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers


* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]     ` <1238533107-11796-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-04-02 17:44       ` Serge E. Hallyn
@ 2009-04-02 17:48       ` Serge E. Hallyn
  2009-04-02 17:58       ` Serge E. Hallyn
  2009-04-02 18:09       ` Serge E. Hallyn
  3 siblings, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 17:48 UTC
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> +static int cr_restore_utsns(struct cr_ctx *ctx, int ref)
> +{
> +	struct uts_namespace *uts;
> +	int ret;
> +
> +	uts = cr_obj_get_by_ref(ctx, ref, CR_OBJ_UTSNS);
> +	if (uts == NULL) {
> +		ret = cr_read_utsns(ctx, current);
> +		if (ret < 0)
> +			return ret;
> +
> +		return cr_obj_add_ref(ctx, current->nsproxy->uts_ns,
> +				      ref, CR_OBJ_UTSNS, 0);
> +	} else if (IS_ERR(uts)) {
> +		cr_debug("Failed to get UTS ns from objhash");
> +		return PTR_ERR(uts);
> +	}
> +
> +	ret = copy_namespaces(CLONE_NEWUTS, current);
> +	if (ret < 0)
> +		return ret;
> +
> +	put_uts_ns(current->nsproxy->uts_ns);
> +	get_uts_ns(uts);
> +	current->nsproxy->uts_ns = uts;

Oh, sorry, now I see.

It does seem all right, never mind...

-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Make cr_may_checkpoint_task() check each namespace individually
       [not found]     ` <1238533107-11796-2-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-02 17:50       ` Serge E. Hallyn
  0 siblings, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 17:50 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Cc: orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  checkpoint/checkpoint.c |   15 +++++++++++++--
>  1 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index ef35754..c2f0e16 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -302,8 +302,19 @@ static int cr_may_checkpoint_task(struct task_struct *t, struct cr_ctx *ctx)
>  	if (t != current && !frozen(t))
>  		return -EBUSY;
> 
> -	/* FIXME: change this for nested containers */
> -	if (task_nsproxy(t) != ctx->root_nsproxy)
> +	if (task_nsproxy(t)->uts_ns != ctx->root_nsproxy->uts_ns)
> +		return -EPERM;
> +
> +	if (task_nsproxy(t)->ipc_ns != ctx->root_nsproxy->ipc_ns)
> +		return -EPERM;
> +
> +	if (task_nsproxy(t)->mnt_ns != ctx->root_nsproxy->mnt_ns)
> +		return -EPERM;
> +
> +	if (task_nsproxy(t)->pid_ns != ctx->root_nsproxy->pid_ns)
> +		return -EPERM;
> +
> +	if (task_nsproxy(t)->net_ns != ctx->root_nsproxy->net_ns)
>  		return -EPERM;
> 
>  	return 0;
> -- 
> 1.5.6.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
  2009-04-02 17:44       ` Serge E. Hallyn
@ 2009-04-02 17:54         ` Dan Smith
       [not found]           ` <87wsa32b7h.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Smith @ 2009-04-02 17:54 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

SH> One thing to note is that this will drive the ns cgroup bananas.
SH> It might still be worthwhile collecting the flags for all the
SH> to-be-unshared namespaces, and then doing all of the unsharing at
SH> once.
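
One way to read the suggestion above is to build a single flag mask first and
hand it to one unshare() call.  The sketch below only computes the mask; the
struct ns_needs record and the ns_unshare_flags() helper are hypothetical names
used for illustration and are not part of the patch set.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback definitions; the values match the Linux UAPI clone flags. */
#ifndef CLONE_NEWUTS
#define CLONE_NEWUTS 0x04000000
#endif
#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000
#endif
#ifndef CLONE_NEWNET
#define CLONE_NEWNET 0x40000000
#endif

/* Hypothetical per-task record: which namespaces differ from the parent's. */
struct ns_needs {
	int uts;
	int ipc;
	int net;
};

/*
 * Collect the flags for every namespace that must be unshared, so the
 * caller can pass the result to a single unshare() call instead of
 * unsharing one namespace at a time.  Returns 0 if nothing is needed.
 */
static int ns_unshare_flags(const struct ns_needs *n)
{
	int flags = 0;

	assert(n != NULL);

	if (n->uts)
		flags |= CLONE_NEWUTS;
	if (n->ipc)
		flags |= CLONE_NEWIPC;
	if (n->net)
		flags |= CLONE_NEWNET;

	return flags;
}
```

A caller would skip the unshare entirely when the returned mask is 0.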

Okay, that's fair.

SH> Furthermore, you do sys_unshare here, then further down you do
SH> another copy_namespaces(CLONE_NEWUTS)?

That's in the case where our UTS namespace has already been created by
a previous task.  We need to copy_namespaces() in order to get a new
nsproxy (since our nsproxy must be copied if we no longer share all
namespaces with our parent).  I have to pass a clone flag to it to get
it to do anything.  I promptly drop my hold on that new UTS namespace
and replace it in my new nsproxy with the one from the objhash that my
predecessor created (which is kinda ugly).
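
The reference swap described above can be modelled in userspace roughly as
follows.  This is a toy sketch, not the kernel code: toy_ns, toy_nsproxy and
the helpers are made-up stand-ins for uts_namespace, nsproxy, get_uts_ns() and
put_uts_ns().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted namespace object. */
struct toy_ns {
	int refcount;
};

/* Toy stand-in for an nsproxy that points at one UTS namespace. */
struct toy_nsproxy {
	struct toy_ns *uts_ns;
};

static void toy_get(struct toy_ns *ns)
{
	ns->refcount++;
}

static void toy_put(struct toy_ns *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/*
 * Point 'proxy' at the namespace an earlier task registered, and drop
 * the reference on the throwaway namespace the proxy was created with.
 * The new reference is taken before the old one is dropped, so the
 * counts stay balanced even if both pointers name the same object.
 */
static void toy_swap_uts(struct toy_nsproxy *proxy, struct toy_ns *shared)
{
	struct toy_ns *old = proxy->uts_ns;

	toy_get(shared);
	proxy->uts_ns = shared;
	toy_put(old);
}
```

The quoted patch does the put before the get; the order here is a deliberate
variation in the sketch, not a claim about what the kernel requires.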

SH> Finally, it seems to me every task will unshare(CLONE_NEWUTS), no?
SH> Where is the check done (and stored) for whether this task has a
SH> different utsns from its parent?

No, tasks only unshare() if their UTS namespace objref is not found in
the objhash (thus indicating that they're the first of that namespace
to be restarted).
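
The "first task creates, later tasks reuse" rule can be sketched with a toy
lookup table.  Everything here (toy_objhash, toy_utsns, MAX_REFS,
toy_restore_utsns()) is hypothetical and only mirrors the decision made in
cr_restore_utsns(); it is not the objhash implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 16	/* illustrative table size, not from the patch */

/* Toy stand-in for a UTS namespace. */
struct toy_utsns {
	char nodename[65];
};

/* Toy stand-in for the objhash: objref -> namespace pointer. */
struct toy_objhash {
	struct toy_utsns *slot[MAX_REFS];
};

/*
 * Return the namespace a restarting task should use for 'ref'.
 * If the objref is not registered yet, the caller is the first task of
 * that namespace: register 'fresh' (the namespace it would get from
 * unshare) and set *created to 1.  Otherwise return the registered
 * namespace and set *created to 0.  Returns NULL for a bad objref.
 */
static struct toy_utsns *toy_restore_utsns(struct toy_objhash *h, int ref,
					   struct toy_utsns *fresh,
					   int *created)
{
	assert(h != NULL && fresh != NULL && created != NULL);

	if (ref < 0 || ref >= MAX_REFS)
		return NULL;

	if (h->slot[ref] == NULL) {
		h->slot[ref] = fresh;
		*created = 1;
	} else {
		*created = 0;
	}

	return h->slot[ref];
}
```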

Perhaps you're referring to the fact that all tasks call
copy_namespaces() (if they're not the first).  You're correct there,
but I'm not sure that a check to see if we need to
(i.e. task->nsproxy->uts == uts) would work, because at the time that
the tasks were created, none of them had done their unshare() yet.

SH> Save identifiers for all of the namespaces at the top of the
SH> checkpoint image; have restart create a set of dummy tasks, enough
SH> to contain all of the new namespaces; have each unshare their
SH> namespaces; then, as each real new task is restarted, manually
SH> create a new nsproxy and link it to all of the required new
SH> namespaces.

Well, that's an option I suppose.  Oren said he wanted to avoid an
additional loop over all tasks during checkpoint and preferred that it
all be stored with the task itself.  Oren?

-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]     ` <1238533107-11796-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-04-02 17:44       ` Serge E. Hallyn
  2009-04-02 17:48       ` Serge E. Hallyn
@ 2009-04-02 17:58       ` Serge E. Hallyn
       [not found]         ` <20090402175804.GC21178-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-04-02 18:09       ` Serge E. Hallyn
  3 siblings, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 17:58 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> This patch adds a "phase" of checkpoint that saves out information about any
> namespaces the task(s) may have.  Do this by tracking the namespace objects
> of the tasks and making sure that tasks with the same namespace that follow
> get properly referenced in the checkpoint stream.
> 
> I tested this with single and multiple task restore, on top of Oren's
> v13 tree.
> 
> Changes:
>   - Remove the kernel restore path
>   - Punt on nested namespaces
>   - Use __NEW_UTS_LEN in nodename and domainname buffers
>   - Add a note to Documentation/checkpoint/internals.txt to indicate where
>     in the save/restore process the UTS information is kept
>   - Store (and track) the objref of the namespace itself instead of the
>     nsproxy (based on comments from Dave on IRC)
>   - Remove explicit check for non-root nsproxy
>   - Store the nodename and domainname lengths and use cr_write_string()
>     to store the actual name strings
>   - Catch failure of cr_obj_add_ptr() in cr_write_namespaces()
>   - Remove "types" bitfield and use the "is this new" flag to determine
>     whether or not we should write out a new ns descriptor
>   - Replace kernel restore path
>   - Move the namespace information to be directly after the task
>     information record
>   - Update Documentation to reflect new location of namespace info
>   - Support checkpoint and restart of nested UTS namespaces
> 
> Cc: orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Yup, ignore my first reply.  This does seem like the way to go.

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Except for two comments:

> ---
>  Documentation/checkpoint/internals.txt |    1 +
>  checkpoint/Makefile                    |    1 +
>  checkpoint/checkpoint.c                |   66 ++++++++++++++++++++-
>  checkpoint/objhash.c                   |    7 ++
>  checkpoint/restart.c                   |  101 ++++++++++++++++++++++++++++++++
>  include/linux/checkpoint.h             |    1 +
>  include/linux/checkpoint_hdr.h         |   11 ++++
>  7 files changed, 185 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/checkpoint/internals.txt b/Documentation/checkpoint/internals.txt
> index c741b6c..bdd202c 100644
> --- a/Documentation/checkpoint/internals.txt
> +++ b/Documentation/checkpoint/internals.txt
> @@ -17,6 +17,7 @@ The order of operations, both save and restore, is as follows:
>    -> thread state: elements of thread_struct and thread_info
>    -> CPU state: registers etc, including FPU
>    -> memory state: memory address space layout and contents
> +  -> namespace information
>    -> filesystem state: [TBD] filesystem namespace state, chroot, cwd, etc
>    -> files state: open file descriptors and their state
>    -> signals state: [TBD] pending signals and signal handling state
> diff --git a/checkpoint/Makefile b/checkpoint/Makefile
> index 607d864..55c5c3d 100644
> --- a/checkpoint/Makefile
> +++ b/checkpoint/Makefile
> @@ -4,3 +4,4 @@
> 
>  obj-$(CONFIG_CHECKPOINT) += sys.o checkpoint.o restart.o objhash.o \
>  		ckpt_mem.o rstr_mem.o ckpt_file.o rstr_file.o
> +EXTRA_CFLAGS += -DDEBUG
> diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> index c2f0e16..5f83e83 100644
> --- a/checkpoint/checkpoint.c
> +++ b/checkpoint/checkpoint.c
> @@ -213,6 +213,65 @@ static int cr_write_tail(struct cr_ctx *ctx)
>  	return ret;
> +static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_namespaces *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	struct nsproxy *nsp = t->nsproxy;
> +	int ret;
> +	int uts;
> +
> +	h.type = CR_HDR_NS;
> +	h.len = sizeof(*hh);
> +
> +	uts = cr_obj_add_ptr(ctx, nsp->uts_ns, &hh->uts_ref, CR_OBJ_UTSNS, 0);

I would prefer this be called 'uts_was_new' or something, though.

> +	if (uts < 0)
> +		goto out;
> +
> +	ret = cr_write_obj(ctx, &h, hh);
> +	if (ret)
> +		goto out;
> +
> +	if (uts) {
> +		ret = cr_write_utsns(ctx, &nsp->uts_ns->name);
> +		if (ret < 0)
> +			goto out;
> +	}
> +
> +	/* FIXME: Write other namespaces here */
> + out:
> +	cr_hbuf_put(ctx, sizeof(*hh));
> +
> +	return ret;
> +}
> +

...

> +	ns = t->nsproxy->uts_ns;

Should probably memset them to 0 first.  I realize it doesn't
really seem like security-relevant information leakage, but
sys_hostname() does it, so it seems like we ought to as well.
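
A minimal userspace sketch of clearing the field before the copy, assuming the
length has already been validated by the caller.  TOY_UTS_LEN, struct
toy_utsname and toy_set_name() are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_UTS_LEN 64	/* assumed to mirror __NEW_UTS_LEN */

/* Toy stand-in for the name fields of a UTS namespace. */
struct toy_utsname {
	char nodename[TOY_UTS_LEN + 1];
	char domainname[TOY_UTS_LEN + 1];
};

/*
 * Overwrite one name field.  The whole field is zeroed first so that
 * no bytes of a previous, longer name survive past the new one.
 * 'len' must be smaller than 'field_size'.
 */
static void toy_set_name(char *field, size_t field_size,
			 const char *src, size_t len)
{
	assert(len < field_size);

	memset(field, 0, field_size);
	memcpy(field, src, len);
}
```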

> +	memcpy(ns->name.nodename, nn, hh.nodename_len);
> +	memcpy(ns->name.domainname, dn, hh.domainname_len);
> +
> + out:
> +	kfree(nn);
> +	kfree(dn);
> +
> +	return ret;
> +}

-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]         ` <20090402175804.GC21178-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-02 18:06           ` Dan Smith
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Smith @ 2009-04-02 18:06 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

>> +	uts = cr_obj_add_ptr(ctx, nsp->uts_ns, &hh->uts_ref, CR_OBJ_UTSNS, 0);

SH> I would prefer this be called 'uts_was_new' or something, though.

Fair enough.  I was trying to avoid wrapping that line :)

SH> Should probably memset them to 0 first.  I realize it doesn't
SH> really seem like security-relevant information leakage, but
SH> sys_hostname() does it, so it seems like we ought to as well.

Okay.

-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Stub implementation of IPC namespace c/r (v2)
       [not found]     ` <1238533107-11796-4-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-02 18:06       ` Serge E. Hallyn
  0 siblings, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 18:06 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Changes:
>  - Update to match UTS changes
> 
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Oren, I will wait until you rebase your IPC patches on top of these
patches before reviewing them more, if that's ok.

> +static int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_ipcns *hh = cr_hbuf_get(ctx, sizeof(*hh));
> +	int ret;

Note that Oren has taken to checking the return values of all
his cr_hbuf_get() calls...

-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]     ` <1238533107-11796-3-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
                         ` (2 preceding siblings ...)
  2009-04-02 17:58       ` Serge E. Hallyn
@ 2009-04-02 18:09       ` Serge E. Hallyn
       [not found]         ` <20090402180936.GE21178-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  3 siblings, 1 reply; 14+ messages in thread
From: Serge E. Hallyn @ 2009-04-02 18:09 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

Quoting Dan Smith (danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> +	ret = cr_read_string(ctx, nn, hh.nodename_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = cr_read_string(ctx, dn, hh.domainname_len);
> +	if (ret < 0)
> +		goto out;
> +
> +	ret = sys_unshare(CLONE_NEWUTS);
> +	if (ret)
> +		goto out;
> +
> +	ns = t->nsproxy->uts_ns;
> +	memcpy(ns->name.nodename, nn, hh.nodename_len);
> +	memcpy(ns->name.domainname, dn, hh.domainname_len);

Actually, I think you must make sure the user didn't slip
in a nodename_len which was > sizeof(ns->name.nodename).
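
Such a check might look roughly like the sketch below, run before the
memcpy() calls.  struct toy_hdr_utsns copies the two fields of
cr_hdr_utsns; toy_check_utsns_hdr() is a hypothetical helper, and whether the
stored lengths count the terminating NUL is an assumption left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same two fields as cr_hdr_utsns in the quoted patch. */
struct toy_hdr_utsns {
	uint32_t nodename_len;
	uint32_t domainname_len;
};

/*
 * Reject a header whose lengths would overflow the destination
 * buffers.  The lengths come from the checkpoint image, which the
 * user controls, so they cannot be trusted.
 * Returns 0 if both lengths fit, -EINVAL otherwise.
 */
static int toy_check_utsns_hdr(const struct toy_hdr_utsns *hh,
			       size_t nodename_size,
			       size_t domainname_size)
{
	assert(hh != NULL);

	if (hh->nodename_len > nodename_size)
		return -EINVAL;
	if (hh->domainname_len > domainname_size)
		return -EINVAL;

	return 0;
}
```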

-serge

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]         ` <20090402180936.GE21178-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-02 18:10           ` Dan Smith
  0 siblings, 0 replies; 14+ messages in thread
From: Dan Smith @ 2009-04-02 18:10 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-qjLDD68F18O7TbgM5vRIOg

SH> Actually, I think you must make sure the user didn't slip in a
SH> nodename_len which was > sizeof(ns->name.nodename).

Indeed, thanks.

-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] c/r: Add UTS support (v6)
       [not found]           ` <87wsa32b7h.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
@ 2009-04-06  2:33             ` Oren Laadan
  0 siblings, 0 replies; 14+ messages in thread
From: Oren Laadan @ 2009-04-06  2:33 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers-qjLDD68F18O7TbgM5vRIOg



Dan Smith wrote:
> SH> One thing to note is that this will drive the ns cgroup bananas.
> SH> It might still be worthwhile collecting the flags for all the
> SH> to-be-unshared namespaces, and then doing all of the unsharing at
> SH> once.
> 
> Okay, that's fair.
> 
> SH> Furthermore, you do sys_unshare here, then further down you do
> SH> another copy_namespaces(CLONE_NEWUTS)?
> 
> That's in the case where our UTS namespace has already been created by
> a previous task.  We need to copy_namespaces() in order to get a new
> nsproxy (since our nsproxy must be copied if we no longer share all
> namespaces with our parent).  I have to pass a clone flag to it to get
> it to do anything.  I promptly drop my hold on that new UTS namespace
> and replace it in my new nsproxy with the one from the objhash that my
> predecessor created (which is kinda ugly).
> 
> SH> Finally, it seems to me every task will unshare(CLONE_NEWUTS), no?
> SH> Where is the check done (and stored) for whether this task has a
> SH> different utsns from its parent?
> 
> No, tasks only unshare() if their UTS namespace objref is not found in
> the objhash (thus indicating that they're the first of that namespace
> to be restarted).
> 
> Perhaps you're referring to the fact that all tasks call
> copy_namespaces() (if they're not the first).  You're correct there,
> but I'm not sure that a check to see if we need to
> (i.e. task->nsproxy->uts == uts) would work, because at the time that
> the tasks were created, none of them had done their unshare() yet.
> 
> SH> Save identifiers for all of the namespaces at the top of the
> SH> checkpoint image; have restart create a set of dummy tasks, enough
> SH> to contain all of the new namespaces; have each unshare their
> SH> namespaces; then, as each real new task is restarted, manually
> SH> create a new nsproxy and link it to all of the required new
> SH> namespaces.
> 
> Well, that's an option I suppose.  Oren said he wanted to avoid an
> additional loop over all tasks during checkpoint and preferred that it
> all be stored with the task itself.  Oren?

First off, that's totally possible without a second loop: while filling
the pids_arr[] we can already collect the namespace information and
store it in additional fields in pids_arr[]. That also makes it easily
available to userspace, and in the kernel just as well.

I suppose you're aiming at doing the unshare() in userspace because you
anticipate headaches with net_ns, right?

In that case, you don't even need to fork that many dummy tasks. You
could use a single task that would repeatedly unshare() and then call
[light bulb appears...] some form of cr_advise() to tell the kernel
that your current nsproxy (or uts_ns) should be used with objref X in
an upcoming restart.
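
A very rough sketch of that loop, with both steps abstracted behind function
pointers.  cr_advise() does not exist; the 'advise' callback only stands in
for whatever form it might take, and all toy_* names (including the recording
fakes) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The two steps the single helper task performs, as callbacks. */
struct toy_ops {
	int (*unshare_uts)(void *cookie);
	int (*advise)(void *cookie, int objref);  /* proposed cr_advise() */
	void *cookie;
};

/*
 * For each objref in the image: create a fresh namespace, then tell
 * the kernel to use the current one for that objref during restart.
 * Stops at the first failure and returns its error code.
 */
static int toy_prepare_namespaces(const struct toy_ops *ops,
				  const int *objrefs, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = ops->unshare_uts(ops->cookie);
		if (ret < 0)
			return ret;
		ret = ops->advise(ops->cookie, objrefs[i]);
		if (ret < 0)
			return ret;
	}

	return 0;
}

/* Recording fakes, so the loop can be exercised without a kernel. */
struct toy_record {
	int unshares;
	int advised[8];
	int nadvised;
};

static int toy_fake_unshare(void *cookie)
{
	struct toy_record *rec = cookie;

	rec->unshares++;
	return 0;
}

static int toy_fake_advise(void *cookie, int objref)
{
	struct toy_record *rec = cookie;

	assert(rec->nadvised < 8);
	rec->advised[rec->nadvised++] = objref;
	return 0;
}
```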

Oren.

^ permalink raw reply	[flat|nested] 14+ messages in thread

