[RFC v2][PATCH 00/10] sysv SHM checkpoint/restart

* [RFC v2][PATCH 00/10] sysv SHM checkpoint/restart
@ 2009-04-07 12:31 Oren Laadan
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

This patchset adds support for IPC shared-memory and message queues.
It applies on top of c/r v14. Tested on x86_32 and verified with the 
tests provided in the userspace tools. 

Changelog:

  [2009-Apr-07] [v2]
  - Reorder patches
  - Rename 'cr_workqueue' -> 'cr_deferqueue'
  - Add c/r of sysvipc message queues
  - Integrate with recent namespaces c/r (rebase on ckpt-v14)

Summary:

Oren Laadan (10):
      Infrastructure for work postponed to the end of checkpoint/restart
      ipc: allow allocation of an ipc object with desired identifier
      ipc: helpers to save and restore kern_ipc_perm structures
      sysvipc-shm: checkpoint
      sysvipc-shm: restart
      sysvipc-shm: export interface from ipc/shm.c to delete ipc shm
      sysvipc-shm: correctly handle deleted (active) ipc shared memory
      sysvipc-msg: make 'struct msg_msgseg' visible in ipc/util.h
      sysvipc-msq: checkpoint
      sysvipc-msq: restart

 checkpoint/Makefile            |    5 +-
 checkpoint/checkpoint.c        |    4 +
 checkpoint/ckpt_mem.c          |    9 +
 checkpoint/ckpt_task.c         |    2 +-
 checkpoint/deferqueue.c        |   62 ++++++
 checkpoint/restart.c           |    5 +
 checkpoint/rstr_file.c         |    1 -
 checkpoint/rstr_mem.c          |   23 +++
 checkpoint/rstr_task.c         |    2 +-
 checkpoint/sys.c               |    7 +
 checkpoint/util_ipc.c          |   95 +++++++++
 include/linux/checkpoint.h     |   28 +++
 include/linux/checkpoint_hdr.h |   65 +++++++
 include/linux/shm.h            |    4 +
 ipc/Makefile                   |    1 +
 ipc/ckpt_msg.c                 |  414 ++++++++++++++++++++++++++++++++++++++++
 ipc/ckpt_shm.c                 |  339 ++++++++++++++++++++++++++++++++
 ipc/msg.c                      |   20 ++-
 ipc/msgutil.c                  |    8 -
 ipc/sem.c                      |   17 ++-
 ipc/shm.c                      |   34 +++-
 ipc/util.c                     |   42 +++--
 ipc/util.h                     |   20 ++-
 23 files changed, 1158 insertions(+), 49 deletions(-)
 create mode 100644 checkpoint/deferqueue.c
 create mode 100644 checkpoint/util_ipc.c
 create mode 100644 ipc/ckpt_msg.c
 create mode 100644 ipc/ckpt_shm.c

* [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-07 12:31   ` Oren Laadan
       [not found]     ` <1239107503-21941-2-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-07 12:31   ` [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier Oren Laadan
                     ` (8 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Add an interface to postpone an action until the end of the entire
checkpoint or restart operation. This is useful when an operation
cannot be performed in place during the scan of tasks, because it
avoids the need for a second scan.

One use case is restoring an ipc shared memory region that has been
deleted (but is still attached): during restart it needs to be
created, attached and then deleted. However, creation and attachment
are performed in distinct locations, so deletion cannot be performed
on the spot. Instead, this work (delete) is deferred until later.
(This example is in one of the following patches).

The interface is as follows:

cr_deferqueue_run(ctx):
  Execute all the pending works in the queue. Returns the number of
  works executed, or an error.

cr_deferqueue_add(ctx, function, flags, data, size):
  Enqueue a postponed work. @function is the function to do the work,
  which will be called with @data as an argument. @size tells the
  size of data. @flags is unused at the moment.
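
As an illustration only, the two calls above can be modelled in plain
userspace C. This is a sketch, not the kernel code in this patch: it
assumes malloc() in place of kmalloc(), a hand-rolled singly linked
list in place of struct list_head, and the names defer_ctx,
defer_entry and delete_shm_work are hypothetical. It shows the two
properties the interface relies on: the payload is copied at add time,
and the works run in FIFO order when the queue is drained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*deferqueue_func_t)(void *data);

/* One deferred work item; the payload is copied into the trailing array. */
struct defer_entry {
	struct defer_entry *next;
	deferqueue_func_t function;
	unsigned int flags;		/* unused, mirrors @flags above */
	char data[];			/* private copy of the caller's data */
};

/* Hypothetical stand-in for the per-operation context (struct cr_ctx). */
struct defer_ctx {
	struct defer_entry *head;
	struct defer_entry **tail;	/* where the next entry gets linked */
};

static void defer_ctx_init(struct defer_ctx *ctx)
{
	ctx->head = NULL;
	ctx->tail = &ctx->head;
}

/* Queue @function to run later on a copy of @data; returns 0 or -1. */
static int deferqueue_add(struct defer_ctx *ctx, deferqueue_func_t function,
			  unsigned int flags, const void *data, size_t size)
{
	/* Size of the whole entry (not of a pointer) plus the payload. */
	struct defer_entry *e = malloc(sizeof(*e) + size);

	if (!e)
		return -1;
	e->next = NULL;
	e->function = function;
	e->flags = flags;
	memcpy(e->data, data, size);

	*ctx->tail = e;			/* append, so works run in FIFO order */
	ctx->tail = &e->next;
	return 0;
}

/* Run and free every queued work; returns how many were executed. */
static int deferqueue_run(struct defer_ctx *ctx)
{
	struct defer_entry *e = ctx->head;
	int nr = 0;

	while (e) {
		struct defer_entry *next = e->next;

		(void)e->function(e->data);	/* failures are not propagated */
		free(e);
		e = next;
		nr++;
	}
	defer_ctx_init(ctx);
	return nr;
}

/* Example work: record the "deleted" shm id carried in the payload. */
static int deleted_ids[8];
static int nr_deleted;

static int delete_shm_work(void *data)
{
	int id;

	memcpy(&id, data, sizeof(id));	/* payload alignment is not guaranteed */
	deleted_ids[nr_deleted++] = id;
	return 0;
}
```

A caller would initialise the context once, call deferqueue_add() from
wherever the work is discovered, and call deferqueue_run() once at the
end of the operation. Because the payload is copied, the caller may
reuse or free its own buffer right after the add.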

Why aren't we using the existing kernel workqueue mechanism?  We need
to defer the work until the end of the operation: not earlier, since we
need other things to be in place; not later, to not block waiting for
it. However, the workqueue schedules the work for 'some time later'.
Also, the kernel workqueue may run in any task context, but we often
require that an operation be run in the context of some specific
restarting task (e.g., restoring IPC state of a certain ipc_ns).

Instead, this mechanism is a simple way for the c/r operation as a
whole, and later a task in particular, to defer some action until
later (but not arbitrarily later) _in the restart_ operation.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/Makefile        |    4 +-
 checkpoint/checkpoint.c    |    4 +++
 checkpoint/deferqueue.c    |   62 ++++++++++++++++++++++++++++++++++++++++++++
 checkpoint/restart.c       |    4 +++
 checkpoint/sys.c           |    7 +++++
 include/linux/checkpoint.h |    9 ++++++
 6 files changed, 88 insertions(+), 2 deletions(-)
 create mode 100644 checkpoint/deferqueue.c

diff --git a/checkpoint/Makefile b/checkpoint/Makefile
index 420c2e6..fc0f766 100644
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -2,8 +2,8 @@
 # Makefile for linux checkpoint/restart.
 #
 
-obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
+obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
 		checkpoint.o restart.o \
 		ckpt_task.o rstr_task.o \
 		ckpt_mem.o rstr_mem.o \
-		ckpt_file.o rstr_file.o
+		ckpt_file.o rstr_file.o \
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index 7382cc3..47d5bd1 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -550,6 +550,10 @@ int do_checkpoint(struct cr_ctx *ctx, pid_t pid)
 	if (ret < 0)
 		goto out;
 
+	ret = cr_deferqueue_run(ctx);
+	if (ret < 0)
+		goto out;
+
 	ctx->crid = atomic_inc_return(&cr_ctx_count);
 
 	/* on success, return (unique) checkpoint identifier */
diff --git a/checkpoint/deferqueue.c b/checkpoint/deferqueue.c
new file mode 100644
index 0000000..a02d577
--- /dev/null
+++ b/checkpoint/deferqueue.c
@@ -0,0 +1,62 @@
+/*
+ *  Checkpoint-restart - infrastructure to manage deferred work
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#include <linux/list.h>
+#include <linux/checkpoint.h>
+
+struct cr_deferqueue {
+	cr_deferqueue_func_t function;
+	unsigned int flags;
+	struct list_head list;
+	char data[0];
+};
+
+int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
+		     unsigned int flags, void *data, int size)
+{
+	struct cr_deferqueue *wq;
+
+	wq = kmalloc(sizeof(*wq) + size, GFP_KERNEL);
+	if (!wq)
+		return -ENOMEM;
+
+	wq->function = function;
+	wq->flags = flags;
+	memcpy(wq->data, data, size);
+
+	cr_debug("adding work %p function %p\n", wq, wq->function);
+	list_add_tail(&wq->list, &ctx->deferqueue);
+	return 0;
+}
+
+/*
+ * cr_deferqueue_run - perform all work in the work queue
+ * @ctx: checkpoint context
+ *
+ * returns: number of works performed, or < 0 on error
+ */
+int cr_deferqueue_run(struct cr_ctx *ctx)
+{
+	struct cr_deferqueue *wq, *n;
+	int nr = 0;
+	int ret;
+
+	list_for_each_entry_safe(wq, n, &ctx->deferqueue, list) {
+		cr_debug("doing work %p function %p\n", wq, wq->function);
+		ret = wq->function(wq->data);
+		if (ret < 0)
+			cr_debug("wq function failed %d\n", ret);
+		list_del(&wq->list);
+		kfree(wq);
+		nr++;
+	}
+
+	return nr;
+}
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index f9b6ca1..d5c5ce2 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -483,6 +483,10 @@ static int do_restart_root(struct cr_ctx *ctx, pid_t pid)
 	if (ret < 0)
 		return ret;
 
+	ret = cr_deferqueue_run(ctx);
+	if (ret < 0)
+		return ret;
+
 	return cr_read_tail(ctx);
 }
 
diff --git a/checkpoint/sys.c b/checkpoint/sys.c
index 63ee55e..afcbf75 100644
--- a/checkpoint/sys.c
+++ b/checkpoint/sys.c
@@ -171,8 +171,14 @@ static void cr_task_arr_free(struct cr_ctx *ctx)
 
 static void cr_ctx_free(struct cr_ctx *ctx)
 {
+	int ret;
+
 	BUG_ON(atomic_read(&ctx->refcount));
 
+	ret = cr_deferqueue_run(ctx);
+	if (ret != 0)
+		cr_debug("deferred deferqueue had %d entries\n", ret);
+
 	if (ctx->file)
 		fput(ctx->file);
 
@@ -211,6 +217,7 @@ static struct cr_ctx *cr_ctx_alloc(int fd, unsigned long flags)
 	atomic_set(&ctx->refcount, 0);
 	INIT_LIST_HEAD(&ctx->pgarr_list);
 	INIT_LIST_HEAD(&ctx->pgarr_pool);
+	INIT_LIST_HEAD(&ctx->deferqueue);
 	init_waitqueue_head(&ctx->waitq);
 
 	err = -EBADF;
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 1999639..9ca6960 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -40,6 +40,7 @@ struct cr_ctx {
 	atomic_t refcount;
 
 	struct cr_objhash *objhash;	/* hash for shared objects */
+	struct list_head deferqueue;	/* list of deferred works */
 
 	struct list_head pgarr_list;	/* page array to dump VMA contents */
 	struct list_head pgarr_pool;	/* pool of empty page arrays chain */
@@ -72,6 +73,14 @@ extern void cr_hbuf_put(struct cr_ctx *ctx, int n);
 extern void cr_ctx_get(struct cr_ctx *ctx);
 extern void cr_ctx_put(struct cr_ctx *ctx);
 
+/* deferred tasks */
+
+typedef int (*cr_deferqueue_func_t)(void *);
+
+extern int cr_deferqueue_run(struct cr_ctx *ctx);
+extern int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t func,
+			     unsigned int flags, void *data, int size);
+
 /* shared objects handling */
 
 enum {
-- 
1.5.4.3

* [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-07 12:31   ` [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
       [not found]     ` <1239107503-21941-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-07 12:31   ` [RFC v2][PATCH 03/10] ipc: helpers to save and restore kern_ipc_perm structures Oren Laadan
                     ` (7 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

During restart, we need to allocate ipc objects with the same
identifiers as recorded during checkpoint. Modify the allocation
code to allow an in-kernel caller to request a specific ipc identifier.
The system call interface remains unchanged.
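
To make the req_id convention concrete, here is a small userspace
sketch of the allocation rule, not the kernel code. It assumes that an
ipc identifier is composed as seq * SEQ_MULTIPLIER + index and that
the multiplier is 32768; the table size, struct sketch_ids and
sketch_addid() are hypothetical. Unlike the patch, which asks the idr
for a slot at or above the wanted index and backs out if it gets a
different one, the sketch simply checks whether the slot is occupied.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_SEQ_MULTIPLIER 32768	/* assumed value of SEQ_MULTIPLIER */
#define SKETCH_NSLOTS 16		/* tiny table, for illustration only */

struct sketch_ids {
	int used[SKETCH_NSLOTS];	/* 1 if the slot index is taken */
	int seq;			/* next sequence number for "don't care" */
};

/* Return the full id on success, or a negative errno. */
static int sketch_addid(struct sketch_ids *ids, int req_id)
{
	int idx, seq;

	if (req_id >= 0) {
		/* Caller wants an exact id: split it into slot and sequence. */
		idx = req_id % SKETCH_SEQ_MULTIPLIER;
		seq = req_id / SKETCH_SEQ_MULTIPLIER;
		if (idx >= SKETCH_NSLOTS)
			return -ENOSPC;
		if (ids->used[idx])
			return -EBUSY;	/* slot taken: cannot honour the request */
	} else {
		/* Don't care: first free slot, next sequence number. */
		for (idx = 0; idx < SKETCH_NSLOTS; idx++)
			if (!ids->used[idx])
				break;
		if (idx == SKETCH_NSLOTS)
			return -ENOSPC;
		seq = ids->seq++;
	}

	ids->used[idx] = 1;
	return seq * SKETCH_SEQ_MULTIPLIER + idx;
}
```

Passing -1 keeps the existing behaviour, which is what the system call
wrappers do; only the restart path would pass a recorded identifier.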

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 ipc/msg.c  |   17 ++++++++++++-----
 ipc/sem.c  |   17 ++++++++++++-----
 ipc/shm.c  |   19 +++++++++++++------
 ipc/util.c |   42 +++++++++++++++++++++++++++++-------------
 ipc/util.h |   11 ++++++++---
 5 files changed, 74 insertions(+), 32 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 2ceab7f..1db7c45 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -73,7 +73,7 @@ struct msg_sender {
 #define msg_unlock(msq)		ipc_unlock(&(msq)->q_perm)
 
 static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
-static int newque(struct ipc_namespace *, struct ipc_params *);
+static int newque(struct ipc_namespace *, struct ipc_params *, int);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
 #endif
@@ -174,10 +174,12 @@ static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s)
  * newque - Create a new msg queue
  * @ns: namespace
  * @params: ptr to the structure that contains the key and msgflg
+ * @req_id: request desired id if available (-1 if don't care)
  *
  * Called with msg_ids.rw_mutex held (writer)
  */
-static int newque(struct ipc_namespace *ns, struct ipc_params *params)
+static int
+newque(struct ipc_namespace *ns, struct ipc_params *params, int req_id)
 {
 	struct msg_queue *msq;
 	int id, retval;
@@ -201,7 +203,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 	/*
 	 * ipc_addid() locks msq
 	 */
-	id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
+	id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni, req_id);
 	if (id < 0) {
 		security_msg_queue_free(msq);
 		ipc_rcu_putref(msq);
@@ -309,7 +311,7 @@ static inline int msg_security(struct kern_ipc_perm *ipcp, int msgflg)
 	return security_msg_queue_associate(msq, msgflg);
 }
 
-SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
+int do_msgget(key_t key, int msgflg, int req_id)
 {
 	struct ipc_namespace *ns;
 	struct ipc_ops msg_ops;
@@ -324,7 +326,12 @@ SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
 	msg_params.key = key;
 	msg_params.flg = msgflg;
 
-	return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params);
+	return ipcget(ns, &msg_ids(ns), &msg_ops, &msg_params, req_id);
+}
+
+SYSCALL_DEFINE2(msgget, key_t, key, int, msgflg)
+{
+	return do_msgget(key, msgflg, -1);
 }
 
 static inline unsigned long
diff --git a/ipc/sem.c b/ipc/sem.c
index 16a2189..207dbbb 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -92,7 +92,7 @@
 #define sem_unlock(sma)		ipc_unlock(&(sma)->sem_perm)
 #define sem_checkid(sma, semid)	ipc_checkid(&sma->sem_perm, semid)
 
-static int newary(struct ipc_namespace *, struct ipc_params *);
+static int newary(struct ipc_namespace *, struct ipc_params *, int);
 static void freeary(struct ipc_namespace *, struct kern_ipc_perm *);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
@@ -227,11 +227,13 @@ static inline void sem_rmid(struct ipc_namespace *ns, struct sem_array *s)
  * newary - Create a new semaphore set
  * @ns: namespace
  * @params: ptr to the structure that contains key, semflg and nsems
+ * @req_id: request desired id if available (-1 if don't care)
  *
  * Called with sem_ids.rw_mutex held (as a writer)
  */
 
-static int newary(struct ipc_namespace *ns, struct ipc_params *params)
+static int
+newary(struct ipc_namespace *ns, struct ipc_params *params, int req_id)
 {
 	int id;
 	int retval;
@@ -263,7 +265,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 		return retval;
 	}
 
-	id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
+	id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni, req_id);
 	if (id < 0) {
 		security_sem_free(sma);
 		ipc_rcu_putref(sma);
@@ -308,7 +310,7 @@ static inline int sem_more_checks(struct kern_ipc_perm *ipcp,
 	return 0;
 }
 
-SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg)
+int do_semget(key_t key, int nsems, int semflg, int req_id)
 {
 	struct ipc_namespace *ns;
 	struct ipc_ops sem_ops;
@@ -327,7 +329,12 @@ SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg)
 	sem_params.flg = semflg;
 	sem_params.u.nsems = nsems;
 
-	return ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params);
+	return ipcget(ns, &sem_ids(ns), &sem_ops, &sem_params, req_id);
+}
+
+SYSCALL_DEFINE3(semget, key_t, key, int, nsems, int, semflg)
+{
+	return do_semget(key, nsems, semflg, -1);
 }
 
 /*
diff --git a/ipc/shm.c b/ipc/shm.c
index 05d51d2..4135f28 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -61,7 +61,7 @@ static struct vm_operations_struct shm_vm_ops;
 #define shm_unlock(shp)			\
 	ipc_unlock(&(shp)->shm_perm)
 
-static int newseg(struct ipc_namespace *, struct ipc_params *);
+static int newseg(struct ipc_namespace *, struct ipc_params *, int);
 static void shm_open(struct vm_area_struct *vma);
 static void shm_close(struct vm_area_struct *vma);
 static void shm_destroy (struct ipc_namespace *ns, struct shmid_kernel *shp);
@@ -82,7 +82,7 @@ void shm_init_ns(struct ipc_namespace *ns)
  * Called with shm_ids.rw_mutex (writer) and the shp structure locked.
  * Only shm_ids.rw_mutex remains locked on exit.
  */
-static void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
+void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 {
 	struct shmid_kernel *shp;
 	shp = container_of(ipcp, struct shmid_kernel, shm_perm);
@@ -325,11 +325,13 @@ static struct vm_operations_struct shm_vm_ops = {
  * newseg - Create a new shared memory segment
  * @ns: namespace
  * @params: ptr to the structure that contains key, size and shmflg
+ * @req_id: request desired id if available (-1 if don't care)
  *
  * Called with shm_ids.rw_mutex held as a writer.
  */
 
-static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
+static int
+newseg(struct ipc_namespace *ns, struct ipc_params *params, int req_id)
 {
 	key_t key = params->key;
 	int shmflg = params->flg;
@@ -384,7 +386,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	if (IS_ERR(file))
 		goto no_file;
 
-	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
+	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni, req_id);
 	if (id < 0) {
 		error = id;
 		goto no_id;
@@ -442,7 +444,7 @@ static inline int shm_more_checks(struct kern_ipc_perm *ipcp,
 	return 0;
 }
 
-SYSCALL_DEFINE3(shmget, key_t, key, size_t, size, int, shmflg)
+int do_shmget(key_t key, size_t size, int shmflg, int req_id)
 {
 	struct ipc_namespace *ns;
 	struct ipc_ops shm_ops;
@@ -458,7 +460,12 @@ SYSCALL_DEFINE3(shmget, key_t, key, size_t, size, int, shmflg)
 	shm_params.flg = shmflg;
 	shm_params.u.size = size;
 
-	return ipcget(ns, &shm_ids(ns), &shm_ops, &shm_params);
+	return ipcget(ns, &shm_ids(ns), &shm_ops, &shm_params, req_id);
+}
+
+SYSCALL_DEFINE3(shmget, key_t, key, size_t, size, int, shmflg)
+{
+	return do_shmget(key, size, shmflg, -1);
 }
 
 static inline unsigned long copy_shmid_to_user(void __user *buf, struct shmid64_ds *in, int version)
diff --git a/ipc/util.c b/ipc/util.c
index 7585a72..58eaa0b 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -256,10 +256,12 @@ int ipc_get_maxid(struct ipc_ids *ids)
  *	Called with ipc_ids.rw_mutex held as a writer.
  */
  
-int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
+int
+ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size, int req_id)
 {
 	uid_t euid;
 	gid_t egid;
+	int lid = 0;
 	int id, err;
 
 	if (size > IPCMNI)
@@ -268,28 +270,41 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	if (ids->in_use >= size)
 		return -ENOSPC;
 
+	if (req_id >= 0)
+		lid = ipcid_to_idx(req_id);
+
 	spin_lock_init(&new->lock);
 	new->deleted = 0;
 	rcu_read_lock();
 	spin_lock(&new->lock);
 
-	err = idr_get_new(&ids->ipcs_idr, new, &id);
+	err = idr_get_new_above(&ids->ipcs_idr, new, lid, &id);
 	if (err) {
 		spin_unlock(&new->lock);
 		rcu_read_unlock();
 		return err;
 	}
 
+	if (req_id >= 0) {
+		if (id != lid) {
+			idr_remove(&ids->ipcs_idr, id);
+			spin_unlock(&new->lock);
+			rcu_read_unlock();
+			return -EBUSY;
+		}
+		new->seq = req_id / SEQ_MULTIPLIER;
+	} else {
+		new->seq = ids->seq++;
+		if (ids->seq > ids->seq_max)
+			ids->seq = 0;
+	}
+
 	ids->in_use++;
 
 	current_euid_egid(&euid, &egid);
 	new->cuid = new->uid = euid;
 	new->gid = new->cgid = egid;
 
-	new->seq = ids->seq++;
-	if(ids->seq > ids->seq_max)
-		ids->seq = 0;
-
 	new->id = ipc_buildid(id, new->seq);
 	return id;
 }
@@ -305,7 +320,7 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
  *	when the key is IPC_PRIVATE.
  */
 static int ipcget_new(struct ipc_namespace *ns, struct ipc_ids *ids,
-		struct ipc_ops *ops, struct ipc_params *params)
+		struct ipc_ops *ops, struct ipc_params *params, int req_id)
 {
 	int err;
 retry:
@@ -315,7 +330,7 @@ retry:
 		return -ENOMEM;
 
 	down_write(&ids->rw_mutex);
-	err = ops->getnew(ns, params);
+	err = ops->getnew(ns, params, req_id);
 	up_write(&ids->rw_mutex);
 
 	if (err == -EAGAIN)
@@ -360,6 +375,7 @@ static int ipc_check_perms(struct kern_ipc_perm *ipcp, struct ipc_ops *ops,
  *	@ids: IPC identifer set
  *	@ops: the actual creation routine to call
  *	@params: its parameters
+ *	@req_id: request desired id if available (-1 if don't care)
  *
  *	This routine is called by sys_msgget, sys_semget() and sys_shmget()
  *	when the key is not IPC_PRIVATE.
@@ -369,7 +385,7 @@ static int ipc_check_perms(struct kern_ipc_perm *ipcp, struct ipc_ops *ops,
  *	On success, the ipc id is returned.
  */
 static int ipcget_public(struct ipc_namespace *ns, struct ipc_ids *ids,
-		struct ipc_ops *ops, struct ipc_params *params)
+		struct ipc_ops *ops, struct ipc_params *params, int req_id)
 {
 	struct kern_ipc_perm *ipcp;
 	int flg = params->flg;
@@ -390,7 +406,7 @@ retry:
 		else if (!err)
 			err = -ENOMEM;
 		else
-			err = ops->getnew(ns, params);
+			err = ops->getnew(ns, params, req_id);
 	} else {
 		/* ipc object has been locked by ipc_findkey() */
 
@@ -751,12 +767,12 @@ struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
  * Common routine called by sys_msgget(), sys_semget() and sys_shmget().
  */
 int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
-			struct ipc_ops *ops, struct ipc_params *params)
+		struct ipc_ops *ops, struct ipc_params *params, int req_id)
 {
 	if (params->key == IPC_PRIVATE)
-		return ipcget_new(ns, ids, ops, params);
+		return ipcget_new(ns, ids, ops, params, req_id);
 	else
-		return ipcget_public(ns, ids, ops, params);
+		return ipcget_public(ns, ids, ops, params, req_id);
 }
 
 /**
diff --git a/ipc/util.h b/ipc/util.h
index 3646b45..3bef7ce 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -52,7 +52,7 @@ struct ipc_params {
  *      . routine to call for an extra check if needed
  */
 struct ipc_ops {
-	int (*getnew) (struct ipc_namespace *, struct ipc_params *);
+	int (*getnew) (struct ipc_namespace *, struct ipc_params *, int);
 	int (*associate) (struct kern_ipc_perm *, int);
 	int (*more_checks) (struct kern_ipc_perm *, struct ipc_params *);
 };
@@ -75,7 +75,7 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
 
 /* must be called with ids->rw_mutex acquired for writing */
-int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
+int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int, int);
 
 /* must be called with ids->rw_mutex acquired for reading */
 int ipc_get_maxid(struct ipc_ids *);
@@ -152,6 +152,11 @@ static inline void ipc_unlock(struct kern_ipc_perm *perm)
 
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
 int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
-			struct ipc_ops *ops, struct ipc_params *params);
+		struct ipc_ops *ops, struct ipc_params *params, int req_id);
+
+/* for checkpoint/restart */
+extern int do_shmget(key_t key, size_t size, int shmflg, int req_id);
+extern void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp);
+
 
 #endif
-- 
1.5.4.3

* [RFC v2][PATCH 03/10] ipc: helpers to save and restore kern_ipc_perm structures
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-07 12:31   ` [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
       [not found]     ` <1239107503-21941-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-07 12:31   ` [RFC v2][PATCH 04/10] sysvipc-shm: checkpoint Oren Laadan
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Add helpers to save and restore the contents of 'struct
kern_ipc_perm'. Add header structures for ipc state, and
placeholders for saving and restoring ipc state.

TODO:
This patch does _not_ address the issues of users/groups and the
related security issues. For now, it saves the old user/group of
ipc objects, but does not restore them during restart.
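
The restore helper has to treat the checkpoint image as untrusted
input: fields are stored wide in the header and must be range-checked
before they are copied into narrower in-kernel fields. The following
userspace sketch illustrates that pattern only. The structures and
sketch_load_perms() are hypothetical, and the 16-bit widths of the
in-memory mode and seq fields are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MODE_MASK 0777u	/* rwx for user, group and other */

/* Hypothetical on-disk record: fixed-width fields, as in an image. */
struct sketch_hdr_perms {
	int32_t id;
	uint32_t key;
	uint32_t mode;
	uint64_t seq;
};

/* Hypothetical in-memory object with narrower fields. */
struct sketch_perm {
	int id;
	uint32_t key;
	uint16_t mode;
	uint16_t seq;
};

/* Validate @hh, then copy it into @perm; returns 0 or -EINVAL. */
static int sketch_load_perms(const struct sketch_hdr_perms *hh,
			     struct sketch_perm *perm)
{
	if (hh->id < 0)
		return -EINVAL;
	if (hh->mode & ~SKETCH_MODE_MASK)
		return -EINVAL;		/* only permission bits are allowed */
	if (hh->seq > UINT16_MAX)
		return -EINVAL;		/* would be truncated by the copy */

	perm->id = hh->id;
	perm->key = hh->key;
	perm->mode = (uint16_t)hh->mode;
	perm->seq = (uint16_t)hh->seq;
	return 0;
}
```

All checks run before the first assignment, so a rejected record
leaves the destination untouched.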

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/Makefile            |    1 +
 checkpoint/checkpoint.c        |    3 +
 checkpoint/restart.c           |    5 ++
 checkpoint/util_ipc.c          |   82 ++++++++++++++++++++++++++++++++++++++++
 include/linux/checkpoint.h     |   12 ++++++
 include/linux/checkpoint_hdr.h |   32 +++++++++++++++
 6 files changed, 135 insertions(+), 0 deletions(-)
 create mode 100644 checkpoint/util_ipc.c

diff --git a/checkpoint/Makefile b/checkpoint/Makefile
index fc0f766..e64784e 100644
--- a/checkpoint/Makefile
+++ b/checkpoint/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
 		ckpt_task.o rstr_task.o \
 		ckpt_mem.o rstr_mem.o \
 		ckpt_file.o rstr_file.o \
+		util_ipc.o
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index 47d5bd1..1c6c946 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -541,6 +541,9 @@ int do_checkpoint(struct cr_ctx *ctx, pid_t pid)
 	ret = cr_write_tree(ctx);
 	if (ret < 0)
 		goto out;
+	ret = cr_write_ipc(ctx, ctx->root_nsproxy);
+	if (ret < 0)
+		goto out;
 
 	ret = cr_write_all_tasks(ctx);
 	if (ret < 0)
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index d5c5ce2..c6ac1e4 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -458,6 +458,7 @@ static int do_restart_root(struct cr_ctx *ctx, pid_t pid)
 {
 	int ret;
 
+
 	ret = cr_read_head(ctx);
 	if (ret < 0)
 		return ret;
@@ -465,6 +466,10 @@ static int do_restart_root(struct cr_ctx *ctx, pid_t pid)
 	if (ret < 0)
 		return ret;
 
+	ret = cr_read_ipc(ctx);
+	if (ret < 0)
+		return ret;
+
 	ret = cr_ctx_restart(ctx, pid);
 	if (ret < 0)
 		return ret;
diff --git a/checkpoint/util_ipc.c b/checkpoint/util_ipc.c
new file mode 100644
index 0000000..70c4b18
--- /dev/null
+++ b/checkpoint/util_ipc.c
@@ -0,0 +1,82 @@
+/*
+ *  Checkpoint logic and helpers
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#ifdef CONFIG_SYSVIPC
+
+#include <linux/version.h>
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+int cr_write_ipc(struct cr_ctx *ctx, struct nsproxy *nsproxy)
+{
+	return 0;
+}
+
+int cr_read_ipc(struct cr_ctx *ctx)
+{
+	return 0;
+}
+
+void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh, struct kern_ipc_perm *perm)
+{
+	hh->id = perm->id;
+	hh->key = perm->key;
+	hh->uid = perm->uid;
+	hh->gid = perm->gid;
+	hh->cuid = perm->cuid;
+	hh->cgid = perm->cgid;
+	hh->mode = perm->mode & S_IRWXUGO;
+	hh->seq = perm->seq;
+}
+
+int cr_load_ipc_perms(struct cr_hdr_ipc_perms *hh, struct kern_ipc_perm *perm)
+{
+	if (hh->id < 0)
+		return -EINVAL;
+	if (CR_TST_OVERFLOW_16(hh->uid, perm->uid) ||
+	    CR_TST_OVERFLOW_16(hh->gid, perm->gid) ||
+	    CR_TST_OVERFLOW_16(hh->cuid, perm->cuid) ||
+	    CR_TST_OVERFLOW_16(hh->cgid, perm->cgid) ||
+	    CR_TST_OVERFLOW_16(hh->mode, perm->mode))
+		return -EINVAL;
+	if (hh->seq >= USHORT_MAX)
+		return -EINVAL;
+	if (hh->mode & ~S_IRWXUGO)
+		return -EINVAL;
+
+	/* FIXME: verify the ->mode field makes sense */
+
+	perm->id = hh->id;
+	perm->key = hh->key;
+#if 0 /* FIXME: requires security checks */
+	perm->uid = hh->uid;
+	perm->gid = hh->gid;
+	perm->cuid = hh->cuid;
+	perm->cgid = hh->cgid;
+#endif
+	perm->mode = hh->mode;
+	perm->seq = hh->seq;
+
+	return 0;
+}
+
+#else
+
+int cr_write_ipc(struct cr_ctx *ctx, struct nsproxy *nsproxy)
+{
+	return 0;
+}
+
+int cr_read_ipc(struct cr_ctx *ctx)
+{
+	return 0;
+}
+
+#endif /* CONFIG_SYSVIPC */
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 9ca6960..9d6710b 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -14,6 +14,7 @@
 #include <linux/fs.h>
 #include <linux/path.h>
 #include <linux/sched.h>
+#include <linux/nsproxy.h>
 #include <asm/atomic.h>
 
 #define CR_VERSION  3
@@ -125,12 +126,14 @@ extern struct file *cr_read_open_fname(struct cr_ctx *ctx,
 extern int cr_write_shmem_contents(struct cr_ctx *ctx, struct inode *inode);
 extern int cr_read_shmem_contents(struct cr_ctx *ctx, struct inode *inode);
 
+extern int cr_write_ipc(struct cr_ctx *ctx, struct nsproxy *nsproxy);
 extern int cr_write_task(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_restart_block(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_fd_table(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_file(struct cr_ctx *ctx, struct file *file);
 
+extern int cr_read_ipc(struct cr_ctx *ctx);
 extern int cr_read_task(struct cr_ctx *ctx);
 extern int cr_read_restart_block(struct cr_ctx *ctx);
 extern int cr_read_mm(struct cr_ctx *ctx);
@@ -141,6 +144,15 @@ extern int do_checkpoint(struct cr_ctx *ctx, pid_t pid);
 extern int do_restart(struct cr_ctx *ctx, pid_t pid);
 
 
+#ifdef CONFIG_SYSVIPC
+struct cr_hdr_ipc_perms;
+extern void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh,
+			      struct kern_ipc_perm *perm);
+extern int cr_load_ipc_perms(struct cr_hdr_ipc_perms *hh,
+			     struct kern_ipc_perm *perm);
+#endif
+
+
 /* useful macros to copy fields and buffers to/from cr_hdr_xxx structures */
 #define CR_CPT 1
 #define CR_RST 2
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 5e923c3..3a2c4af 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -63,9 +63,23 @@ enum {
 	CR_HDR_FILE,
 	CR_HDR_FD_PIPE,
 
+	CR_HDR_IPC = 401,
+	CR_HDR_IPC_SHM,
+	CR_HDR_IPC_MSG,
+	CR_HDR_IPC_SEM,
+
 	CR_HDR_TAIL = 5001
 };
 
+#define CR_TST_OVERFLOW_16(a, b) \
+	((sizeof(a) > sizeof(b)) && ((a) > SHORT_MAX))
+
+#define CR_TST_OVERFLOW_32(a, b) \
+	((sizeof(a) > sizeof(b)) && ((a) > INT_MAX))
+
+#define CR_TST_OVERFLOW_64(a, b) \
+	((sizeof(a) > sizeof(b)) && ((a) > LONG_MAX))
+
 struct cr_hdr_head {
 	__u64 magic;
 
@@ -222,4 +236,22 @@ struct cr_hdr_fd_pipe {
 	__s32 nr_bufs;
 } __attribute__((aligned(8)));
 
+/* ipc commons */
+struct cr_hdr_ipc {
+	__u32 ipc_type;
+	__u32 ipc_count;
+} __attribute__((aligned(8)));
+
+struct cr_hdr_ipc_perms {
+	__s32 id;
+	__u32 key;
+	__u32 uid;
+	__u32 gid;
+	__u32 cuid;
+	__u32 cgid;
+	__u32 mode;
+	__u32 _padding;
+	__u64 seq;
+} __attribute__((aligned(8)));
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
-- 
1.5.4.3

* [RFC v2][PATCH 04/10] sysvipc-shm: checkpoint
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 03/10] ipc: helpers to save and restore kern_ipc_perm structures Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 05/10] sysvipc-shm: restart Oren Laadan
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Checkpoint of sysvipc shared memory is performed in two steps: first,
the entire ipc namespace is dumped as a whole by iterating through all
shm objects and dumping the contents of each one. The shmem inode is
registered in the objhash. Second, for each vma that refers to ipc
shared memory we find the inode in the objhash, and save the objref.

(If we find a new inode, that indicates that the ipc namespace is not
entirely frozen and someone must have manipulated it since step 1).
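
The two steps above can be sketched in plain userspace C. This is only a
toy model, not the objhash from this series: the toy_* names, the fixed-size
table, and the error codes are assumptions made for illustration. The one
detail taken from the patch is that adding a pointer returns 1 the first
time it is seen.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_OBJHASH_MAX 64

/* Toy stand-in for the objhash: maps an object pointer to an objref. */
struct toy_objhash {
	const void *ptr[TOY_OBJHASH_MAX];
	size_t used;
};

/*
 * Look up ptr, adding it if it is missing, and store its (1-based)
 * objref in *objref. Returns 1 if the pointer was newly added, 0 if
 * it was already present, and -ENOMEM if the table is full.
 */
static int toy_obj_add_ptr(struct toy_objhash *h, const void *ptr, int *objref)
{
	size_t i;

	for (i = 0; i < h->used; i++) {
		if (h->ptr[i] == ptr) {
			*objref = (int)i + 1;
			return 0;
		}
	}
	if (h->used == TOY_OBJHASH_MAX)
		return -ENOMEM;
	h->ptr[h->used++] = ptr;
	*objref = (int)h->used;
	return 1;
}

/* Step 1: dump an shm object of the namespace; its inode must be new. */
static int toy_dump_shm(struct toy_objhash *h, const void *inode, int *objref)
{
	int ret = toy_obj_add_ptr(h, inode, objref);

	if (ret < 0)
		return ret;
	return ret == 1 ? 0 : -EINVAL;
}

/*
 * Step 2: a vma refers to ipc shm; its inode must already be known.
 * A new inode here means the namespace changed after step 1.
 */
static int toy_dump_vma(struct toy_objhash *h, const void *inode, int *objref)
{
	int ret = toy_obj_add_ptr(h, inode, objref);

	if (ret < 0)
		return ret;
	return ret == 0 ? 0 : -EBUSY;
}
```

In this model a vma whose inode was not seen in step 1 yields -EBUSY,
mirroring the CR_VMA_SHM_IPC case in ckpt_mem.c below.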

Handling of shm objects that have been deleted (via IPC_RMID) is left
to a later patch in this series.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/checkpoint.c        |    3 -
 checkpoint/ckpt_mem.c          |    9 +++
 checkpoint/ckpt_task.c         |    2 +-
 checkpoint/restart.c           |    4 -
 checkpoint/util_ipc.c          |    7 +-
 include/linux/checkpoint.h     |    6 +-
 include/linux/checkpoint_hdr.h |   15 ++++
 ipc/Makefile                   |    1 +
 ipc/ckpt_shm.c                 |  142 ++++++++++++++++++++++++++++++++++++++++
 ipc/shm.c                      |   11 +++
 10 files changed, 186 insertions(+), 14 deletions(-)
 create mode 100644 ipc/ckpt_shm.c

diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index 1c6c946..47d5bd1 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -541,9 +541,6 @@ int do_checkpoint(struct cr_ctx *ctx, pid_t pid)
 	ret = cr_write_tree(ctx);
 	if (ret < 0)
 		goto out;
-	ret = cr_write_ipc(ctx, ctx->root_nsproxy);
-	if (ret < 0)
-		goto out;
 
 	ret = cr_write_all_tasks(ctx);
 	if (ret < 0)
diff --git a/checkpoint/ckpt_mem.c b/checkpoint/ckpt_mem.c
index 0df3cda..54b2674 100644
--- a/checkpoint/ckpt_mem.c
+++ b/checkpoint/ckpt_mem.c
@@ -566,7 +566,16 @@ static int cr_write_shared_vma_contents(struct cr_ctx *ctx,
 		inode = vma->vm_file->f_dentry->d_inode;
 		ret = cr_write_shmem_contents(ctx, inode);
 		break;
+	case CR_VMA_SHM_IPC:
+		/*
+		 * This should not happen, because all IPC regions should
+		 * have been dumped already via the ipc namespace. If it
+		 * does, the ipc_ns was modified during checkpoint.
+		 */
+		ret = -EBUSY;
+		break;
 	case CR_VMA_SHM_ANON_SKIP:
+	case CR_VMA_SHM_IPC_SKIP:
 	case CR_VMA_SHM_FILE_SKIP:
 		/* already saved before .. skip now */
 		break;
diff --git a/checkpoint/ckpt_task.c b/checkpoint/ckpt_task.c
index b5e330b..4d19e31 100644
--- a/checkpoint/ckpt_task.c
+++ b/checkpoint/ckpt_task.c
@@ -250,7 +250,7 @@ static int cr_write_namespaces(struct cr_ctx *ctx, struct task_struct *t)
 			goto out;
 	}
 	if (new_ipc) {
-		/* ret = cr_write_ipcns(ctx, nsproxy->ipc_ns); */ ret = 0;
+		ret = cr_write_ipcns(ctx, nsproxy->ipc_ns);
 		if (ret < 0)
 			goto out;
 	}
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index c6ac1e4..dad257e 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -466,10 +466,6 @@ static int do_restart_root(struct cr_ctx *ctx, pid_t pid)
 	if (ret < 0)
 		return ret;
 
-	ret = cr_read_ipc(ctx);
-	if (ret < 0)
-		return ret;
-
 	ret = cr_ctx_restart(ctx, pid);
 	if (ret < 0)
 		return ret;
diff --git a/checkpoint/util_ipc.c b/checkpoint/util_ipc.c
index 70c4b18..c2d2944 100644
--- a/checkpoint/util_ipc.c
+++ b/checkpoint/util_ipc.c
@@ -10,16 +10,15 @@
 
 #ifdef CONFIG_SYSVIPC
 
-#include <linux/version.h>
 #include <linux/checkpoint.h>
 #include <linux/checkpoint_hdr.h>
 
-int cr_write_ipc(struct cr_ctx *ctx, struct nsproxy *nsproxy)
+int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc_ns)
 {
-	return 0;
+	return cr_write_ipc_shm(ctx, ipc_ns);
 }
 
-int cr_read_ipc(struct cr_ctx *ctx)
+int cr_read_ipcns(struct cr_ctx *ctx)
 {
 	return 0;
 }
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 9d6710b..97565f8 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -15,6 +15,7 @@
 #include <linux/path.h>
 #include <linux/sched.h>
 #include <linux/nsproxy.h>
+#include <linux/ipc_namespace.h>
 #include <asm/atomic.h>
 
 #define CR_VERSION  3
@@ -126,14 +127,14 @@ extern struct file *cr_read_open_fname(struct cr_ctx *ctx,
 extern int cr_write_shmem_contents(struct cr_ctx *ctx, struct inode *inode);
 extern int cr_read_shmem_contents(struct cr_ctx *ctx, struct inode *inode);
 
-extern int cr_write_ipc(struct cr_ctx *ctx, struct nsproxy *nsproxy);
+extern int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc_ns);
 extern int cr_write_task(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_restart_block(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_fd_table(struct cr_ctx *ctx, struct task_struct *t);
 extern int cr_write_file(struct cr_ctx *ctx, struct file *file);
 
-extern int cr_read_ipc(struct cr_ctx *ctx);
+extern int cr_read_ipcns(struct cr_ctx *ctx);
 extern int cr_read_task(struct cr_ctx *ctx);
 extern int cr_read_restart_block(struct cr_ctx *ctx);
 extern int cr_read_mm(struct cr_ctx *ctx);
@@ -150,6 +151,7 @@ extern void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh,
 			      struct kern_ipc_perm *perm);
 extern int cr_load_ipc_perms(struct cr_hdr_ipc_perms *hh,
 			     struct kern_ipc_perm *perm);
+extern int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns);
 #endif
 
 
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 3a2c4af..b93b2fc 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -175,6 +175,8 @@ enum cr_vma_type {
 	CR_VMA_FILE,		/* private mapped file */
 	CR_VMA_SHM_ANON,	/* shared anonymous */
 	CR_VMA_SHM_ANON_SKIP,	/* shared anonymous, skip contents */
+	CR_VMA_SHM_IPC,		/* shared sysvipc */
+	CR_VMA_SHM_IPC_SKIP,	/* shared sysvipc, skip contents */
 	CR_VMA_SHM_FILE,	/* shared mapped file, only msync */
 	CR_VMA_SHM_FILE_SKIP,	/* shared mapped file, skip msync */
 	CR_VMA_UNKNOWN,		/* unkown (unsupported) vma type */
@@ -254,4 +256,17 @@ struct cr_hdr_ipc_perms {
 	__u64 seq;
 } __attribute__((aligned(8)));
 
+struct cr_hdr_ipc_shm {
+	struct cr_hdr_ipc_perms perms;
+	__u64 shm_segsz;
+	__u64 shm_atim;
+	__u64 shm_dtim;
+	__u64 shm_ctim;
+	__s32 shm_cprid;
+	__s32 shm_lprid;
+	__u32 mlock_uid;
+	__u32 flags;
+	__u32 objref;
+} __attribute__((aligned(8)));
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
diff --git a/ipc/Makefile b/ipc/Makefile
index 65c3843..0789ec8 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
 obj_mq-$(CONFIG_COMPAT) += compat_mq.o
 obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
 obj-$(CONFIG_IPC_NS) += namespace.o
+obj-$(CONFIG_CHECKPOINT) += ckpt_shm.o
 
diff --git a/ipc/ckpt_shm.c b/ipc/ckpt_shm.c
new file mode 100644
index 0000000..a473cc3
--- /dev/null
+++ b/ipc/ckpt_shm.c
@@ -0,0 +1,142 @@
+/*
+ *  Checkpoint/restart - dump state of sysvipc shm
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#include <linux/mm.h>
+#include <linux/shm.h>
+#include <linux/shmem_fs.h>
+#include <linux/hugetlb.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <linux/nsproxy.h>
+#include <linux/ipc_namespace.h>
+
+#include <linux/msg.h>	/* needed for util.h that uses 'struct msg_msg' */
+#include "util.h"
+
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+/************************************************************************
+ * ipc checkpoint
+ */
+
+static int cr_fill_ipc_shm_hdr(struct cr_ctx *ctx,
+			       struct cr_hdr_ipc_shm *hh,
+			       struct shmid_kernel *shp)
+{
+	int ret = 0;
+
+	ipc_lock_by_ptr(&shp->shm_perm);
+
+	cr_fill_ipc_perms(&hh->perms, &shp->shm_perm);
+
+	hh->shm_segsz = shp->shm_segsz;
+	hh->shm_atim = shp->shm_atim;
+	hh->shm_dtim = shp->shm_dtim;
+	hh->shm_ctim = shp->shm_ctim;
+	hh->shm_cprid = shp->shm_cprid;
+	hh->shm_lprid = shp->shm_lprid;
+
+	if (shp->mlock_user)
+		hh->mlock_uid = shp->mlock_user->uid;
+	else
+		hh->mlock_uid = (unsigned int) -1;
+
+	hh->flags = 0;
+	/* check if shm was setup with SHM_NORESERVE */
+	if (SHMEM_I(shp->shm_file->f_dentry->d_inode)->flags & VM_NORESERVE)
+		hh->flags |= SHM_NORESERVE;
+	/* check if shm was setup with SHM_HUGETLB (not supported yet) */
+	if (is_file_hugepages(shp->shm_file)) {
+		pr_warning("c/r: unsupported SHM_HUGETLB\n");
+		ret = -ENOSYS;
+	}
+
+	ipc_unlock(&shp->shm_perm);
+	cr_debug("shm: cprid %d lprid %d segsz %lld mlock %d\n",
+		 hh->shm_cprid, hh->shm_lprid, hh->shm_segsz, hh->mlock_uid);
+
+	return ret;
+}
+
+static int cr_do_write_ipc_shm(int id, void *p, void *data)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipc_shm *hh;
+	struct cr_ctx *ctx = (struct cr_ctx *) data;
+	struct kern_ipc_perm *perm = (struct kern_ipc_perm *) p;
+	struct shmid_kernel *shp;
+	struct inode *inode;
+	int ret;
+
+	shp = container_of(perm, struct shmid_kernel, shm_perm);
+	inode = shp->shm_file->f_dentry->d_inode;
+
+	h.type = CR_HDR_IPC_SHM;
+	h.len = sizeof(*hh);
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+
+	ret = cr_fill_ipc_shm_hdr(ctx, hh, shp);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_obj_add_ptr(ctx, inode, &hh->objref, CR_OBJ_INODE, 0);
+	if (ret < 0)
+		goto out;
+	BUG_ON(ret != 1);	/* must be first time always */
+
+	cr_debug("shm: objref %d\n", hh->objref);
+	ret = cr_write_obj(ctx, &h, hh);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_write_shmem_contents(ctx, inode);
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
+
+int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipc *hh;
+	struct ipc_ids *shm_ids = &ipcns->ids[IPC_SHM_IDS];
+	int ret = -ENOMEM;
+
+	down_read(&shm_ids->rw_mutex);
+
+	h.type = CR_HDR_IPC;
+	h.len = sizeof(*hh);
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		goto out;
+
+	hh->ipc_type = CR_HDR_IPC_SHM;
+	hh->ipc_count = shm_ids->in_use;
+	cr_debug("shm: count %d\n", hh->ipc_count);
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		goto out;
+
+	ret = idr_for_each(&shm_ids->ipcs_idr, cr_do_write_ipc_shm, ctx);
+	cr_debug("shm: ret %d\n", ret);
+
+ out:
+	up_read(&shm_ids->rw_mutex);
+	return ret;
+}
diff --git a/ipc/shm.c b/ipc/shm.c
index 4135f28..5ac6aec 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -39,6 +39,7 @@
 #include <linux/nsproxy.h>
 #include <linux/mount.h>
 #include <linux/ipc_namespace.h>
+#include <linux/checkpoint_hdr.h>
 
 #include <asm/uaccess.h>
 
@@ -244,6 +245,13 @@ static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_CHECKPOINT
+static int shm_cr_vma_type(struct vm_area_struct *vma)
+{
+	return CR_VMA_SHM_IPC;
+}
+#endif
+
 static int shm_mmap(struct file * file, struct vm_area_struct * vma)
 {
 	struct shm_file_data *sfd = shm_file_data(file);
@@ -319,6 +327,9 @@ static struct vm_operations_struct shm_vm_ops = {
 	.set_policy = shm_set_policy,
 	.get_policy = shm_get_policy,
 #endif
+#if defined(CONFIG_CHECKPOINT)
+	.cr_vma_type = shm_cr_vma_type,
+#endif
 };
 
 /**
-- 
1.5.4.3


* [RFC v2][PATCH 05/10] sysvipc-shm: restart
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 04/10] sysvipc-shm: checkpoint Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 06/10] sysvipc-shm: export interface from ipc/shm.c to delete ipc shm Oren Laadan
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Like checkpoint, restart of sysvipc shared memory is also performed in
two steps: first, the entire ipc namespace is restored as a whole, by
restoring each shm object read from the checkpoint image. The shmem's
file pointer is registered in the objhash. Second, for each vma that
refers to ipc shared memory, we use the objref to find the file in the
objhash, and use that file in calling do_mmap_pgoff().
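
On the restart side the lookup runs the other way, from the objref stored
in the image to the restored object. A minimal userspace sketch of that
direction follows; the toy_* names, the fixed-size table, and the error
codes are assumptions for illustration and are not the interfaces added by
this series.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_REFS 64

/* Toy restart-side table; index 0 is unused because objref 0 means "none". */
struct toy_reftable {
	void *obj[TOY_MAX_REFS + 1];
};

/* Step 1: register a restored object under the objref read from the image. */
static int toy_obj_add_ref(struct toy_reftable *t, void *obj, int objref)
{
	if (!obj || objref <= 0 || objref > TOY_MAX_REFS)
		return -EINVAL;
	if (t->obj[objref])
		return -EEXIST;
	t->obj[objref] = obj;
	return 0;
}

/* Step 2: a vma record carries an objref; find the object to map. */
static void *toy_obj_get_by_ref(const struct toy_reftable *t, int objref)
{
	if (objref <= 0 || objref > TOY_MAX_REFS)
		return NULL;
	return t->obj[objref];
}
```

A vma record whose objref was never registered gets NULL back, which the
caller can turn into an error such as -EINVAL.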

Handling of shm objects that have been deleted (via IPC_RMID) is left
to a later patch in this series.

Locked ipc shm segments (via SHM_LOCK) are also not restored at the
moment.

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/rstr_mem.c      |   23 ++++++
 checkpoint/rstr_task.c     |    2 +-
 checkpoint/util_ipc.c      |    2 +-
 include/linux/checkpoint.h |    3 +
 ipc/ckpt_shm.c             |  161 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 189 insertions(+), 2 deletions(-)

diff --git a/checkpoint/rstr_mem.c b/checkpoint/rstr_mem.c
index 7e73129..9de770d 100644
--- a/checkpoint/rstr_mem.c
+++ b/checkpoint/rstr_mem.c
@@ -342,6 +342,24 @@ static struct file *cr_vma_prep_file(struct cr_ctx *ctx, struct cr_hdr_vma *hh)
 		if (!IS_ERR(file))
 			get_file(file);
 		break;
+#ifdef CONFIG_SYSVIPC
+	case CR_VMA_SHM_IPC_SKIP:	/* shared sysvipc mapping skipped */
+		if (!hh->shm_objref || hh->vma_objref)
+			break;
+		file = cr_obj_get_by_ref(ctx, hh->shm_objref, CR_OBJ_FILE);
+		if (!file)
+			file = ERR_PTR(-EINVAL);
+		if (!IS_ERR(file)) {
+			ret = cr_ipc_shm_attach(file,
+						hh->vm_start,
+						hh->vm_flags);
+			if (ret < 0)
+				file = ERR_PTR(ret);
+		}
+		if (!IS_ERR(file))
+			get_file(file);
+		break;
+#endif
 	case CR_VMA_SHM_FILE:		/* shared mapping of a file */
 		if (!hh->shm_objref || !hh->vma_objref)
 			break;
@@ -438,6 +456,10 @@ static int cr_read_vma(struct cr_ctx *ctx, struct mm_struct *mm)
 		goto out;
 	}
 
+	/* yuck: sysvipc shm are already mapped, so skip this */
+	if (vma_type == CR_VMA_SHM_IPC_SKIP)
+		goto contents;
+
 	/* create a new vma */
 	down_write(&mm->mmap_sem);
 	addr = do_mmap_pgoff(file, vm_start, vm_size,
@@ -451,6 +473,7 @@ static int cr_read_vma(struct cr_ctx *ctx, struct mm_struct *mm)
 		goto out;
 	}
 
+ contents:
 	/* read in the contents of this vma */
 	if (shm)
 		ret = cr_read_shared_vma_contents(ctx, file, vma_type);
diff --git a/checkpoint/rstr_task.c b/checkpoint/rstr_task.c
index 520c15a..fe5c059 100644
--- a/checkpoint/rstr_task.c
+++ b/checkpoint/rstr_task.c
@@ -249,7 +249,7 @@ static int cr_restore_ipcns(struct cr_ctx *ctx, int ref, int flags)
 		return -EINVAL;
 
 	if (!ipc_ns) {
-		/* ret = cr_read_ipcns(ctx, current); */ ret = 0;
+		ret = cr_read_ipcns(ctx);
 		if (ret < 0)
 			return ret;
 
diff --git a/checkpoint/util_ipc.c b/checkpoint/util_ipc.c
index c2d2944..1b791f9 100644
--- a/checkpoint/util_ipc.c
+++ b/checkpoint/util_ipc.c
@@ -20,7 +20,7 @@ int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc_ns)
 
 int cr_read_ipcns(struct cr_ctx *ctx)
 {
-	return 0;
+	return cr_read_ipc_shm(ctx);
 }
 
 void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh, struct kern_ipc_perm *perm)
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 97565f8..0f49b68 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -152,6 +152,9 @@ extern void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh,
 extern int cr_load_ipc_perms(struct cr_hdr_ipc_perms *hh,
 			     struct kern_ipc_perm *perm);
 extern int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns);
+extern int cr_read_ipc_shm(struct cr_ctx *ctx);
+extern int cr_ipc_shm_attach(struct file *file,
+			     unsigned long addr, unsigned long flags);
 #endif
 
 
diff --git a/ipc/ckpt_shm.c b/ipc/ckpt_shm.c
index a473cc3..ee9b77a 100644
--- a/ipc/ckpt_shm.c
+++ b/ipc/ckpt_shm.c
@@ -140,3 +140,164 @@ int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns)
 	up_read(&shm_ids->rw_mutex);
 	return ret;
 }
+
+/************************************************************************
+ * ipc restart
+ */
+
+int cr_ipc_shm_attach(struct file *file,
+		      unsigned long vm_addr,
+		      unsigned long vm_flags)
+{
+	mm_segment_t old_fs;
+	unsigned long addr;
+	int shmid, shmflg = 0;
+	int ret;
+
+	shmid = file->f_dentry->d_inode->i_ino;
+
+	if (!(vm_flags & VM_WRITE))
+		shmflg |= SHM_RDONLY;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	ret = do_shmat(shmid, (char __user *) vm_addr, shmflg, &addr);
+	set_fs(old_fs);
+
+	BUG_ON(ret >= 0 && addr != vm_addr);
+	return ret;
+}
+
+static int cr_load_ipc_shm_hdr(struct cr_ctx *ctx,
+			       struct cr_hdr_ipc_shm *hh,
+			       struct shmid_kernel *shp)
+{
+	int ret;
+
+	ret = cr_load_ipc_perms(&hh->perms, &shp->shm_perm);
+	if (ret < 0)
+		return ret;
+
+	cr_debug("shm: cprid %d lprid %d segsz %lld mlock %d\n",
+		 hh->shm_cprid, hh->shm_lprid, hh->shm_segsz, hh->mlock_uid);
+
+	if (hh->shm_cprid < 0 || hh->shm_lprid < 0)
+		return -EINVAL;
+
+	shp->shm_segsz = hh->shm_segsz;
+	shp->shm_atim = hh->shm_atim;
+	shp->shm_dtim = hh->shm_dtim;
+	shp->shm_ctim = hh->shm_ctim;
+	shp->shm_cprid = hh->shm_cprid;
+	shp->shm_lprid = hh->shm_lprid;
+
+	return 0;
+}
+
+static int cr_do_read_ipc_shm(struct cr_ctx *ctx)
+{
+	struct cr_hdr_ipc_shm *hh;
+	struct kern_ipc_perm *perms;
+	struct shmid_kernel *shp;
+	struct ipc_ids *shm_ids = &current->nsproxy->ipc_ns->ids[IPC_SHM_IDS];
+	struct file *file;
+	int shmflag;
+	int ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_IPC_SHM);
+	if (ret < 0)
+		goto out;
+	ret = -EINVAL;
+	if (hh->perms.id < 0)
+		goto out;
+
+#define CR_SHMFL_MASK  (SHM_NORESERVE | SHM_HUGETLB)
+	if (hh->flags & ~CR_SHMFL_MASK)
+		goto out;
+
+	ret = -ENOSYS;
+	if (hh->mlock_uid != (unsigned int) -1)	/* FIXME: support SHM_LOCK */
+		goto out;
+	if (hh->flags & SHM_HUGETLB)	/* FIXME: support SHM_HUGETLB */
+		goto out;
+
+	/* FIXME: this will fail for deleted ipc shm segments */
+
+	shmflag = hh->flags | hh->perms.mode | IPC_CREAT | IPC_EXCL;
+	cr_debug("shm: do_shmget size %lld flag %#x id %d\n",
+		 hh->shm_segsz, shmflag, hh->perms.id);
+	ret = do_shmget(hh->perms.key, hh->shm_segsz, shmflag, hh->perms.id);
+	cr_debug("shm: do_shmget ret %d\n", ret);
+	if (ret < 0)
+		goto out;
+
+	down_write(&shm_ids->rw_mutex);
+
+	ret = -EIDRM;
+	perms = ipc_lock(shm_ids, hh->perms.id);
+	if (IS_ERR(perms)) {	/* this should not happen .. but be safe */
+		up_write(&shm_ids->rw_mutex);
+		ret = PTR_ERR(perms);
+		goto out;
+	}
+
+	shp = container_of(perms, struct shmid_kernel, shm_perm);
+	ret = cr_load_ipc_shm_hdr(ctx, hh, shp);
+	if (ret < 0) {
+		cr_debug("shm: need to remove (%d)\n", ret);
+		do_shm_rmid(current->nsproxy->ipc_ns, perms);
+		up_write(&shm_ids->rw_mutex);
+		goto out;
+	}
+
+	file = shp->shm_file;
+	get_file(file);
+	ipc_unlock(perms);
+	up_write(&shm_ids->rw_mutex);
+
+	/* deposit in objhash and read contents in */
+	ret = cr_obj_add_ref(ctx, file, hh->objref, CR_OBJ_FILE, 0);
+	if (ret < 0)
+		goto file;
+	ret = cr_read_shmem_contents(ctx, file->f_dentry->d_inode);
+ file:
+	fput(file);
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
+
+int cr_read_ipc_shm(struct cr_ctx *ctx)
+{
+	struct cr_hdr_ipc *hh;
+	int n, ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_IPC);
+	if (ret < 0)
+		goto out;
+
+	cr_debug("shm: count %d\n", hh->ipc_count);
+
+	ret = -EINVAL;
+	if (hh->ipc_type != CR_HDR_IPC_SHM)
+		goto out;
+
+	ret = 0;
+	for (n = 0; n < hh->ipc_count; n++) {
+		ret = cr_do_read_ipc_shm(ctx);
+		if (ret < 0)
+			goto out;
+	}
+
+ out:
+	cr_debug("shm: ret %d\n", ret);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
-- 
1.5.4.3


* [RFC v2][PATCH 06/10] sysvipc-shm: export interface from ipc/shm.c to delete ipc shm
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 05/10] sysvipc-shm: restart Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 07/10] sysvipc-shm: correctly handle deleted (active) ipc shared memory Oren Laadan
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Export shmctl_down() which will be used in the next patch during
restart to delete an ipc shm (the shm is mapped already, so it
won't be lost).

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 include/linux/shm.h |    4 ++++
 ipc/shm.c           |    4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/shm.h b/include/linux/shm.h
index eca6235..ec36e99 100644
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -118,6 +118,10 @@ static inline int is_file_shm_hugepages(struct file *file)
 }
 #endif
 
+struct ipc_namespace;
+extern int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
+		       struct shmid_ds __user *buf, int version);
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SHM_H_ */
diff --git a/ipc/shm.c b/ipc/shm.c
index 5ac6aec..28a8b57 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -605,8 +605,8 @@ static void shm_get_stat(struct ipc_namespace *ns, unsigned long *rss,
  * to be held in write mode.
  * NOTE: no locks must be held, the rw_mutex is taken inside this function.
  */
-static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
-		       struct shmid_ds __user *buf, int version)
+int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
+		struct shmid_ds __user *buf, int version)
 {
 	struct kern_ipc_perm *ipcp;
 	struct shmid64_ds shmid64;
-- 
1.5.4.3


* [RFC v2][PATCH 07/10] sysvipc-shm: correctly handle deleted (active) ipc shared memory
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 06/10] sysvipc-shm: export interface from ipc/shm.c to delete ipc shm Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 08/10] sysvipc-msg: make 'struct msg_msgseg' visible in ipc/util.h Oren Laadan
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

During restart, an ipc shared region may have SHM_DEST set, indicating
that it had been deleted (while still active) at checkpoint time. In
this case the region is restored first and its deletion is postponed
until the end of the restart; it would be quite silly to delete it
right after creating it, because it would be ... gone :o
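
The postponed deletion can be modelled in userspace as a queue of
callbacks, each with a private copy of its payload, run once at the end.
This is only a sketch: the toy_* names are hypothetical, and the FIFO
order and stop-at-first-error behaviour are assumptions of the sketch,
not statements about the deferqueue in this series.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef int (*toy_defer_fn)(void *data);

struct toy_defer_entry {
	struct toy_defer_entry *next;
	toy_defer_fn fn;
	/* private copy of the payload; aligned well enough for ints/pointers */
	unsigned char data[];
};

struct toy_deferqueue {
	struct toy_defer_entry *head;
	struct toy_defer_entry **tail;
};

static void toy_deferqueue_init(struct toy_deferqueue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Queue fn to run later on a copy of data, so data may live on the stack. */
static int toy_deferqueue_add(struct toy_deferqueue *q, toy_defer_fn fn,
			      const void *data, size_t size)
{
	struct toy_defer_entry *e = malloc(sizeof(*e) + size);

	if (!e)
		return -ENOMEM;
	e->next = NULL;
	e->fn = fn;
	if (size)
		memcpy(e->data, data, size);
	*q->tail = e;
	q->tail = &e->next;
	return 0;
}

/*
 * Run the queued work in FIFO order. After the first failure the
 * remaining entries are freed without being called; the first error
 * (or 0) is returned.
 */
static int toy_deferqueue_run(struct toy_deferqueue *q)
{
	int ret = 0;

	while (q->head) {
		struct toy_defer_entry *e = q->head;

		q->head = e->next;
		if (ret == 0)
			ret = e->fn(e->data);
		free(e);
	}
	q->tail = &q->head;
	return ret;
}

/* Example payload and callback: record which ids were "deleted". */
struct toy_shm_del {
	int id;
};

static int toy_deleted_ids[8];
static int toy_nr_deleted;

static int toy_shm_delete(void *data)
{
	struct toy_shm_del *dq = data;

	if (toy_nr_deleted == 8)
		return -ENOSPC;
	toy_deleted_ids[toy_nr_deleted++] = dq->id;
	return 0;
}
```

Because the payload is copied at add time, the caller can reuse or drop
its own structure right away; nothing is deleted until the run step.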

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 ipc/ckpt_shm.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/ipc/ckpt_shm.c b/ipc/ckpt_shm.c
index ee9b77a..c5b7f60 100644
--- a/ipc/ckpt_shm.c
+++ b/ipc/ckpt_shm.c
@@ -145,6 +145,25 @@ int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns)
  * ipc restart
  */
 
+struct cr_dq_ipcshm_del {
+	struct ipc_namespace *ipcns;
+	int id;
+};
+
+static int cr_ipc_shm_delete(void *data)
+{
+	struct cr_dq_ipcshm_del *dq = (struct cr_dq_ipcshm_del *) data;
+	mm_segment_t old_fs;
+	int ret;
+
+	old_fs = get_fs();
+	set_fs(get_ds());
+	ret = shmctl_down(dq->ipcns, dq->id, IPC_RMID, NULL, 0);
+	set_fs(old_fs);
+
+	return ret;
+}
+
 int cr_ipc_shm_attach(struct file *file,
 		      unsigned long vm_addr,
 		      unsigned long vm_flags)
@@ -224,7 +243,25 @@ static int cr_do_read_ipc_shm(struct cr_ctx *ctx)
 	if (hh->flags & SHM_HUGETLB)	/* FIXME: support SHM_HUGETLB */
 		goto out;
 
-	/* FIXME: this will fail for deleted ipc shm segments */
+	/*
+	 * SHM_DEST means that the shm is to be deleted after creation.
+	 * However, deleting before it's actually attached is quite silly.
+	 * Instead, we defer this task until restart has succeeded.
+	 */
+	if (hh->perms.mode & SHM_DEST) {
+		struct cr_dq_ipcshm_del dq;
+
+		/* to not confuse the rest of the code */
+		hh->perms.mode &= ~SHM_DEST;
+
+		dq.ipcns = current->nsproxy->ipc_ns;
+		dq.id = hh->perms.id;
+
+		ret = cr_deferqueue_add(ctx, cr_ipc_shm_delete,
+				       0, &dq, sizeof(dq));
+		if (ret < 0)
+			goto out;
+	}
 
 	shmflag = hh->flags | hh->perms.mode | IPC_CREAT | IPC_EXCL;
 	cr_debug("shm: do_shmget size %lld flag %#x id %d\n",
@@ -235,7 +272,6 @@ static int cr_do_read_ipc_shm(struct cr_ctx *ctx)
 		goto out;
 
 	down_write(&shm_ids->rw_mutex);
-
 	ret = -EIDRM;
 	perms = ipc_lock(shm_ids, hh->perms.id);
 	if (IS_ERR(perms)) {	/* this should not happen .. but be safe */
@@ -261,9 +297,9 @@ static int cr_do_read_ipc_shm(struct cr_ctx *ctx)
 	/* deposit in objhash and read contents in */
 	ret = cr_obj_add_ref(ctx, file, hh->objref, CR_OBJ_FILE, 0);
 	if (ret < 0)
-		goto file;
+		goto fput;
 	ret = cr_read_shmem_contents(ctx, file->f_dentry->d_inode);
- file:
+ fput:
 	fput(file);
  out:
 	cr_hbuf_put(ctx, sizeof(*hh));
-- 
1.5.4.3


* [RFC v2][PATCH 08/10] sysvipc-msg: make 'struct msg_msgseg' visible in ipc/util.h
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 07/10] sysvipc-shm: correctly handle deleted (active) ipc shared memory Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 09/10] sysvipc-msq: checkpoint Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 10/10] sysvipc-msq: restart Oren Laadan
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Move the definition of 'struct msg_msgseg' and constants DATALEN_*
to ipc/util.h, where they are visible to ipc/ckpt_msg.c

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 ipc/msg.c     |    3 +--
 ipc/msgutil.c |    8 --------
 ipc/util.h    |   11 ++++++++++-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1db7c45..1d5d087 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -72,7 +72,6 @@ struct msg_sender {
 
 #define msg_unlock(msq)		ipc_unlock(&(msq)->q_perm)
 
-static void freeque(struct ipc_namespace *, struct kern_ipc_perm *);
 static int newque(struct ipc_namespace *, struct ipc_params *, int);
 #ifdef CONFIG_PROC_FS
 static int sysvipc_msg_proc_show(struct seq_file *s, void *it);
@@ -278,7 +277,7 @@ static void expunge_all(struct msg_queue *msq, int res)
  * msg_ids.rw_mutex (writer) and the spinlock for this message queue are held
  * before freeque() is called. msg_ids.rw_mutex remains locked on exit.
  */
-static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
+void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 {
 	struct list_head *tmp;
 	struct msg_queue *msq = container_of(ipcp, struct msg_queue, q_perm);
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index c82c215..a546e3e 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -17,14 +17,6 @@
 
 #include "util.h"
 
-struct msg_msgseg {
-	struct msg_msgseg* next;
-	/* the next part of the message follows immediately */
-};
-
-#define DATALEN_MSG	(PAGE_SIZE-sizeof(struct msg_msg))
-#define DATALEN_SEG	(PAGE_SIZE-sizeof(struct msg_msgseg))
-
 struct msg_msg *load_msg(const void __user *src, int len)
 {
 	struct msg_msg *msg;
diff --git a/ipc/util.h b/ipc/util.h
index 3bef7ce..b6ef57f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -121,6 +121,14 @@ extern void free_msg(struct msg_msg *msg);
 extern struct msg_msg *load_msg(const void __user *src, int len);
 extern int store_msg(void __user *dest, struct msg_msg *msg, int len);
 
+struct msg_msgseg {
+	struct msg_msgseg *next;
+	/* the next part of the message follows immediately */
+};
+
+#define DATALEN_MSG	(PAGE_SIZE-sizeof(struct msg_msg))
+#define DATALEN_SEG	(PAGE_SIZE-sizeof(struct msg_msgseg))
+
 extern void recompute_msgmni(struct ipc_namespace *);
 
 static inline int ipc_buildid(int id, int seq)
@@ -157,6 +165,7 @@ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 /* for checkpoint/restart */
 extern int do_shmget(key_t key, size_t size, int shmflg, int req_id);
 extern void do_shm_rmid(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp);
-
+extern int do_msgget(key_t key, int msgflg, int req_id);
+extern void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp);
 
 #endif
-- 
1.5.4.3


* [RFC v2][PATCH 09/10] sysvipc-msq: checkpoint
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 08/10] sysvipc-msg: make 'struct msg_msgseg' visible in ipc/util.h Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  2009-04-07 12:31   ` [RFC v2][PATCH 10/10] sysvipc-msq: restart Oren Laadan
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

Checkpoint of sysvipc message queues is performed by iterating through
all 'msq' objects and dumping the contents of each one. The messages
queued on each 'msq' are dumped with that object.

The messages of a specific queue are written one by one. The queue lock
cannot be held while dumping them, but the loop must be protected from
someone (who?) writing or reading. To do that we grab the lock, then
hijack the entire chain of messages from the queue, drop the lock,
and then safely dump them in a loop. Finally, with the lock held, we
re-attach the chain while verifying that there is no other (new) data
on that queue.

Writing the message contents themselves is straightforward. The code
is similar to that in ipc/msgutil.c, the main difference being that
we deal with kernel memory and not user memory.
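
The detach, dump, and re-attach sequence can be sketched in userspace with
a pthread mutex standing in for the queue lock. The toy_* names are
hypothetical, and the policy on finding new messages (keep them after the
old chain and return -EBUSY) is an assumption of this sketch; the text
above only says that the absence of new data is verified.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct toy_msg {
	struct toy_msg *next;
	long type;
};

struct toy_queue {
	pthread_mutex_t lock;
	struct toy_msg *head;
};

typedef int (*toy_dump_fn)(const struct toy_msg *m, void *arg);

static int toy_dump_queue(struct toy_queue *q, toy_dump_fn dump, void *arg)
{
	struct toy_msg *chain, *m, *tail = NULL;
	int ret = 0;

	/* grab the lock and hijack the entire chain */
	pthread_mutex_lock(&q->lock);
	chain = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	/* the chain is private now, so dump it without the lock */
	for (m = chain; m; m = m->next) {
		if (ret == 0)
			ret = dump(m, arg);
		tail = m;
	}

	/* re-attach with the lock held, checking for new data */
	pthread_mutex_lock(&q->lock);
	if (q->head) {
		/* someone queued messages meanwhile: keep them after ours */
		if (tail)
			tail->next = q->head;
		else
			chain = q->head;
		if (ret == 0)
			ret = -EBUSY;
	}
	q->head = chain;
	pthread_mutex_unlock(&q->lock);

	return ret;
}

/* Example dump callback: accumulate the message types into *arg. */
static int toy_sum_types(const struct toy_msg *m, void *arg)
{
	*(long *)arg += m->type;
	return 0;
}
```

The lock is held only for the two pointer swaps, so the potentially slow
dump never runs under it, and a failed dump still puts the chain back.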

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/util_ipc.c          |    9 ++-
 include/linux/checkpoint.h     |    1 +
 include/linux/checkpoint_hdr.h |   18 ++++
 ipc/Makefile                   |    2 +-
 ipc/ckpt_msg.c                 |  204 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 232 insertions(+), 2 deletions(-)
 create mode 100644 ipc/ckpt_msg.c

diff --git a/checkpoint/util_ipc.c b/checkpoint/util_ipc.c
index 1b791f9..163a106 100644
--- a/checkpoint/util_ipc.c
+++ b/checkpoint/util_ipc.c
@@ -15,7 +15,14 @@
 
 int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc_ns)
 {
-	return cr_write_ipc_shm(ctx, ipc_ns);
+	int ret;
+
+	ret = cr_write_ipc_shm(ctx, ipc_ns);
+	if (ret < 0)
+		return ret;
+	ret = cr_write_ipc_msg(ctx, ipc_ns);
+
+	return ret;
 }
 
 int cr_read_ipcns(struct cr_ctx *ctx)
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 0f49b68..16dd96d 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -155,6 +155,7 @@ extern int cr_write_ipc_shm(struct cr_ctx *ctx, struct ipc_namespace *ipcns);
 extern int cr_read_ipc_shm(struct cr_ctx *ctx);
 extern int cr_ipc_shm_attach(struct file *file,
 			     unsigned long addr, unsigned long flags);
+extern int cr_write_ipc_msg(struct cr_ctx *ctx, struct ipc_namespace *ipcns);
 #endif
 
 
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index b93b2fc..92b0336 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -66,6 +66,7 @@ enum {
 	CR_HDR_IPC = 401,
 	CR_HDR_IPC_SHM,
 	CR_HDR_IPC_MSG,
+	CR_HDR_IPC_MSG_MSG,
 	CR_HDR_IPC_SEM,
 
 	CR_HDR_TAIL = 5001
@@ -269,4 +270,21 @@ struct cr_hdr_ipc_shm {
 	__u32 objref;
 } __attribute__((aligned(8)));
 
+struct cr_hdr_ipc_msg {
+	struct cr_hdr_ipc_perms perms;
+	__u64 q_stime;
+	__u64 q_rtime;
+	__u64 q_ctime;
+	__u64 q_cbytes;
+	__u64 q_qnum;
+	__u64 q_qbytes;
+	__s32 q_lspid;
+	__s32 q_lrpid;
+} __attribute__((aligned(8)));
+
+struct cr_hdr_ipc_msg_msg {
+	__s32 m_type;
+	__u32 m_ts;
+} __attribute__((aligned(8)));
+
 #endif /* _CHECKPOINT_CKPT_HDR_H_ */
diff --git a/ipc/Makefile b/ipc/Makefile
index 0789ec8..aa20c76 100644
--- a/ipc/Makefile
+++ b/ipc/Makefile
@@ -8,5 +8,5 @@ obj-$(CONFIG_SYSVIPC_SYSCTL) += ipc_sysctl.o
 obj_mq-$(CONFIG_COMPAT) += compat_mq.o
 obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
 obj-$(CONFIG_IPC_NS) += namespace.o
-obj-$(CONFIG_CHECKPOINT) += ckpt_shm.o
+obj-$(CONFIG_CHECKPOINT) += ckpt_shm.o ckpt_msg.o
 
diff --git a/ipc/ckpt_msg.c b/ipc/ckpt_msg.c
new file mode 100644
index 0000000..5e11253
--- /dev/null
+++ b/ipc/ckpt_msg.c
@@ -0,0 +1,204 @@
+/*
+ *  Checkpoint/restart - dump state of sysvipc msg
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#include <linux/mm.h>
+#include <linux/msg.h>
+#include <linux/shmem_fs.h>
+#include <linux/hugetlb.h>
+#include <linux/rwsem.h>
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+#include <linux/nsproxy.h>
+#include <linux/ipc_namespace.h>
+
+#include <linux/msg.h>	/* needed for util.h that uses 'struct msg_msg' */
+#include "util.h"
+
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+/************************************************************************
+ * ipc checkpoint
+ */
+
+static int cr_fill_ipc_msg_hdr(struct cr_ctx *ctx,
+			       struct cr_hdr_ipc_msg *hh,
+			       struct msg_queue *msq)
+{
+	int ret = 0;
+
+	ipc_lock_by_ptr(&msq->q_perm);
+
+	cr_fill_ipc_perms(&hh->perms, &msq->q_perm);
+
+	hh->q_stime = msq->q_stime;
+	hh->q_rtime = msq->q_rtime;
+	hh->q_ctime = msq->q_ctime;
+	hh->q_cbytes = msq->q_cbytes;
+	hh->q_qnum = msq->q_qnum;
+	hh->q_qbytes = msq->q_qbytes;
+	hh->q_lspid = msq->q_lspid;
+	hh->q_lrpid = msq->q_lrpid;
+
+	ipc_unlock(&msq->q_perm);
+
+	cr_debug("msg: lspid %d lrpid %d qnum %lld qbytes %lld\n",
+		 hh->q_lspid, hh->q_lrpid, hh->q_qnum, hh->q_qbytes);
+
+	return ret;
+}
+
+static int cr_write_msg_contents(struct cr_ctx *ctx, struct msg_msg *msg)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipc_msg_msg *hh;
+	struct msg_msgseg *seg;
+	int total, len;
+	int ret;
+
+	h.type = CR_HDR_IPC_MSG_MSG;
+	h.len = sizeof(*hh);
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+
+	hh->m_type = msg->m_type;
+	hh->m_ts = msg->m_ts;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		return ret;
+
+	total = msg->m_ts;
+	len = min(total, (int) DATALEN_MSG);
+	ret = cr_write_buffer(ctx, (msg + 1), len);
+	if (ret < 0)
+		return ret;
+
+	seg = msg->next;
+	total -= len;
+
+	while (total) {
+		len = min(total, (int) DATALEN_SEG);
+		ret = cr_write_buffer(ctx, (seg + 1), len);
+		if (ret < 0)
+			break;
+		seg = seg->next;
+		total -= len;
+	}
+
+	return ret;
+}
+
+static int cr_write_msg_queue(struct cr_ctx *ctx, struct msg_queue *msq)
+{
+	struct list_head messages;
+	struct msg_msg *msg;
+	int ret = -EBUSY;
+
+	/*
+	 * Scanning the msq requires the lock, but then we can't write
+	 * data out from inside. Instead, we grab the lock, remove all
+	 * messages to our own list, drop the lock, write the messages,
+	 * and finally re-attach them to the msq with the lock taken.
+	 */
+	INIT_LIST_HEAD(&messages);	/* must be valid even if we bail out */
+	ipc_lock_by_ptr(&msq->q_perm);
+	if (!list_empty(&msq->q_receivers))
+		goto unlock;
+	if (!list_empty(&msq->q_senders))
+		goto unlock;
+	if (list_empty(&msq->q_messages))
+		goto unlock;
+	/* temporarily take out all messages */
+	list_splice_init(&msq->q_messages, &messages);
+ unlock:
+	ipc_unlock(&msq->q_perm);
+
+	list_for_each_entry(msg, &messages, m_list) {
+		ret = cr_write_msg_contents(ctx, msg);
+		if (ret < 0)
+			break;
+	}
+
+	/* put all the messages back in */
+	ipc_lock_by_ptr(&msq->q_perm);
+	list_splice(&messages, &msq->q_messages);
+	ipc_unlock(&msq->q_perm);
+
+	return ret;
+}
+
+static int cr_do_write_ipc_msg(int id, void *p, void *data)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipc_msg *hh;
+	struct cr_ctx *ctx = (struct cr_ctx *) data;
+	struct kern_ipc_perm *perm = (struct kern_ipc_perm *) p;
+	struct msg_queue *msq;
+	int ret;
+
+	msq = container_of(perm, struct msg_queue, q_perm);
+
+	h.type = CR_HDR_IPC_MSG;
+	h.len = sizeof(*hh);
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+
+	ret = cr_fill_ipc_msg_hdr(ctx, hh, msq);
+	if (ret < 0)
+		goto out;
+
+	ret = cr_write_obj(ctx, &h, hh);
+	if (ret < 0)
+		goto out;
+
+	if (hh->q_qnum)
+		ret = cr_write_msg_queue(ctx, msq);
+
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
+
+int cr_write_ipc_msg(struct cr_ctx *ctx, struct ipc_namespace *ipcns)
+{
+	struct cr_hdr h;
+	struct cr_hdr_ipc *hh;
+	struct ipc_ids *msg_ids = &ipcns->ids[IPC_MSG_IDS];
+	int ret = -ENOMEM;
+
+	down_read(&msg_ids->rw_mutex);
+
+	h.type = CR_HDR_IPC;
+	h.len = sizeof(*hh);
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		goto out;
+
+	hh->ipc_type = CR_HDR_IPC_MSG;
+	hh->ipc_count = msg_ids->in_use;
+	cr_debug("msg: count %d\n", hh->ipc_count);
+
+	ret = cr_write_obj(ctx, &h, hh);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	if (ret < 0)
+		goto out;
+
+	ret = idr_for_each(&msg_ids->ipcs_idr, cr_do_write_ipc_msg, ctx);
+	cr_debug("msg: ret %d\n", ret);
+
+ out:
+	up_read(&msg_ids->rw_mutex);
+	return ret;
+}
-- 
1.5.4.3


* [RFC v2][PATCH 10/10] sysvipc-msq: restart
       [not found] ` <1239107503-21941-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
                     ` (8 preceding siblings ...)
  2009-04-07 12:31   ` [RFC v2][PATCH 09/10] sysvipc-msq: checkpoint Oren Laadan
@ 2009-04-07 12:31   ` Oren Laadan
  9 siblings, 0 replies; 19+ messages in thread
From: Oren Laadan @ 2009-04-07 12:31 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Dave Hansen

The namespace is restored by creating each 'msq' object read from
the checkpoint image.

Messages of a specific queue are first read and chained together on
a temporary list, and once done are attached atomically as a whole
to the newly created message queue ('msq').
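
The "read everything onto a temporary list, then attach in one step"
approach can be sketched in userspace C as follows. This is a minimal
illustration, not the patch code: 'struct item', read_chain() and
attach_chain() are hypothetical names, an int array stands in for the
checkpoint image, and a negative record simulates a read error.

```c
#include <stdlib.h>

struct item {
	struct item *next;
	int value;
};

static void free_chain(struct item *head)
{
	while (head) {
		struct item *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Build a chain from 'n' records, preserving their order. On success
 * returns 0 and stores the chain in *out. On any failure everything
 * read so far is freed, *out is NULL and -1 is returned, so the
 * caller never sees a partially built chain.
 */
static int read_chain(const int *records, size_t n, struct item **out)
{
	struct item *head = NULL, **tail = &head;
	size_t i;

	*out = NULL;
	for (i = 0; i < n; i++) {
		struct item *it;

		if (records[i] < 0)	/* simulated read error */
			goto fail;
		it = malloc(sizeof(*it));
		if (!it)
			goto fail;
		it->value = records[i];
		it->next = NULL;
		*tail = it;
		tail = &it->next;
	}
	*out = head;
	return 0;

fail:
	free_chain(head);
	return -1;
}

/* Attach the whole chain at once, but only to an empty queue. */
static int attach_chain(struct item **queue, struct item *chain)
{
	if (*queue)
		return -1;	/* someone already queued data */
	*queue = chain;
	return 0;
}
```

When attach_chain() fails, the caller still owns the chain and is
responsible for freeing it.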

Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
 checkpoint/rstr_file.c     |    1 -
 checkpoint/util_ipc.c      |    9 ++-
 include/linux/checkpoint.h |    1 +
 ipc/ckpt_msg.c             |  210 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/checkpoint/rstr_file.c b/checkpoint/rstr_file.c
index cf3bece..845a5cd 100644
--- a/checkpoint/rstr_file.c
+++ b/checkpoint/rstr_file.c
@@ -399,7 +399,6 @@ int cr_read_fd_table(struct cr_ctx *ctx)
 			break;
 	}
 
-	ret = 0;
  out:
 	cr_hbuf_put(ctx, sizeof(*hh));
 	return ret;
diff --git a/checkpoint/util_ipc.c b/checkpoint/util_ipc.c
index 163a106..9a4e37d 100644
--- a/checkpoint/util_ipc.c
+++ b/checkpoint/util_ipc.c
@@ -27,7 +27,14 @@ int cr_write_ipcns(struct cr_ctx *ctx, struct ipc_namespace *ipc_ns)
 
 int cr_read_ipcns(struct cr_ctx *ctx)
 {
-	return cr_read_ipc_shm(ctx);
+	int ret;
+
+	ret = cr_read_ipc_shm(ctx);
+	if (ret < 0)
+		return ret;
+	ret = cr_read_ipc_msg(ctx);
+
+	return ret;
 }
 
 void cr_fill_ipc_perms(struct cr_hdr_ipc_perms *hh, struct kern_ipc_perm *perm)
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 16dd96d..898176c 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -156,6 +156,7 @@ extern int cr_read_ipc_shm(struct cr_ctx *ctx);
 extern int cr_ipc_shm_attach(struct file *file,
 			     unsigned long addr, unsigned long flags);
 extern int cr_write_ipc_msg(struct cr_ctx *ctx, struct ipc_namespace *ipcns);
+extern int cr_read_ipc_msg(struct cr_ctx *ctx);
 #endif
 
 
diff --git a/ipc/ckpt_msg.c b/ipc/ckpt_msg.c
index 5e11253..eebbf06 100644
--- a/ipc/ckpt_msg.c
+++ b/ipc/ckpt_msg.c
@@ -202,3 +202,213 @@ int cr_write_ipc_msg(struct cr_ctx *ctx, struct ipc_namespace *ipcns)
 	up_read(&msg_ids->rw_mutex);
 	return ret;
 }
+
+/************************************************************************
+ * ipc restart
+ */
+
+static int cr_load_ipc_msg_hdr(struct cr_ctx *ctx,
+			       struct cr_hdr_ipc_msg *hh,
+			       struct msg_queue *msq)
+{
+	int ret = 0;
+
+	ret = cr_load_ipc_perms(&hh->perms, &msq->q_perm);
+	if (ret < 0)
+		return ret;
+
+	cr_debug("msq: lspid %d lrpid %d qnum %lld qbytes %lld\n",
+		 hh->q_lspid, hh->q_lrpid, hh->q_qnum, hh->q_qbytes);
+
+	if (hh->q_lspid < 0 || hh->q_lrpid < 0)
+		return -EINVAL;
+
+	msq->q_stime = hh->q_stime;
+	msq->q_rtime = hh->q_rtime;
+	msq->q_ctime = hh->q_ctime;
+	msq->q_lspid = hh->q_lspid;
+	msq->q_lrpid = hh->q_lrpid;
+
+	return 0;
+}
+
+static struct msg_msg *cr_read_msg_contents_one(struct cr_ctx *ctx)
+{
+	struct cr_hdr_ipc_msg_msg *hh;
+	struct msg_msg *msg = NULL;
+	struct msg_msgseg *seg, **pseg;
+	int total, len;
+	int ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return ERR_PTR(-ENOMEM);
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_IPC_MSG_MSG);
+	if (ret < 0)
+		goto out;
+
+	ret = -EINVAL;
+	if (hh->m_type < 1)
+		goto out;
+	if (hh->m_ts > current->nsproxy->ipc_ns->msg_ctlmax)
+		goto out;
+
+	total = hh->m_ts;
+	len = min(total, (int) DATALEN_MSG);
+	ret = -ENOMEM;
+	msg = kmalloc(sizeof(*msg) + len, GFP_KERNEL);
+	if (!msg)
+		goto out;
+	msg->next = NULL;
+	pseg = &msg->next;
+
+	ret = cr_read_buffer_len(ctx, (msg + 1), len);
+	if (ret < 0)
+		goto out;
+
+	/* the loop step consumes each segment's length so the loop terminates */
+	for (total -= len; total; total -= len) {
+		len = min(total, (int) DATALEN_SEG);
+		ret = -ENOMEM;
+		seg = kmalloc(sizeof(*seg) + len, GFP_KERNEL);
+		if (!seg)
+			goto out;
+		seg->next = NULL;
+		*pseg = seg;
+		pseg = &seg->next;
+
+		ret = cr_read_buffer_len(ctx, (seg + 1), len);
+		if (ret < 0)
+			goto out;
+	}
+
+	msg->m_type = hh->m_type;
+	msg->m_ts = hh->m_ts;
+	return msg;
+
+ out:
+	if (msg)
+		free_msg(msg);
+	return ERR_PTR(ret);
+}
+
+static int cr_read_msg_contents(struct cr_ctx *ctx,
+				struct list_head *queue, unsigned long qnum)
+{
+	struct msg_msg *msg, *tmp;
+	int ret = 0;
+
+	INIT_LIST_HEAD(queue);
+
+	while (qnum--) {
+		msg = cr_read_msg_contents_one(ctx);
+		if (IS_ERR(msg))
+			goto fail;
+		list_add_tail(&msg->m_list, queue);
+	}
+
+	return 0;
+
+ fail:
+	ret = PTR_ERR(msg);
+	list_for_each_entry_safe(msg, tmp, queue, m_list)
+		free_msg(msg);
+	return ret;
+}
+
+static int cr_do_read_ipc_msg(struct cr_ctx *ctx)
+{
+	struct cr_hdr_ipc_msg *hh;
+	struct kern_ipc_perm *perms;
+	struct msg_queue *msq;
+	struct ipc_ids *msg_ids = &current->nsproxy->ipc_ns->ids[IPC_MSG_IDS];
+	struct list_head messages;
+	int msgflag;
+	int ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_IPC_MSG);
+	if (ret < 0)
+		goto out;
+	ret = -EINVAL;
+	if (hh->perms.id < 0)
+		goto out;
+
+	/* read queued messages into temporary queue */
+	ret = cr_read_msg_contents(ctx, &messages, hh->q_qnum);
+	if (ret < 0)
+		goto out;
+
+	/* restore the message queue now */
+	msgflag = hh->perms.mode | IPC_CREAT | IPC_EXCL;
+	cr_debug("msg: do_msgget key %d flag %#x id %d\n",
+		 hh->perms.key, msgflag, hh->perms.id);
+	ret = do_msgget(hh->perms.key, msgflag, hh->perms.id);
+	cr_debug("msg: do_msgget ret %d\n", ret);
+	if (ret < 0)
+		goto out;
+
+	down_write(&msg_ids->rw_mutex);
+	ret = -EIDRM;
+	perms = ipc_lock(msg_ids, hh->perms.id);
+	if (IS_ERR(perms)) {	/* this should not happen .. but be safe */
+		up_write(&msg_ids->rw_mutex);
+		ret = PTR_ERR(perms);
+		goto out;
+	}
+
+	msq = container_of(perms, struct msg_queue, q_perm);
+	ret = cr_load_ipc_msg_hdr(ctx, hh, msq);
+	if (ret < 0) {
+		cr_debug("msq: need to remove (%d)\n", ret);
+		freeque(current->nsproxy->ipc_ns, perms);
+		up_write(&msg_ids->rw_mutex);
+		goto out;
+	}
+
+	/* attach queued messages we read before */
+	if (list_empty(&msq->q_messages))
+		list_splice_init(&messages, &msq->q_messages);
+	else
+		ret = -EBUSY;
+
+	ipc_unlock(perms);
+	up_write(&msg_ids->rw_mutex);
+ out:
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
+
+int cr_read_ipc_msg(struct cr_ctx *ctx)
+{
+	struct cr_hdr_ipc *hh;
+	int n, ret;
+
+	hh = cr_hbuf_get(ctx, sizeof(*hh));
+	if (!hh)
+		return -ENOMEM;
+
+	ret = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_IPC);
+	if (ret < 0)
+		goto out;
+
+	cr_debug("msg: count %d\n", hh->ipc_count);
+
+	ret = -EINVAL;
+	if (hh->ipc_type != CR_HDR_IPC_MSG)
+		goto out;
+
+	ret = 0;
+	for (n = 0; n < hh->ipc_count; n++) {
+		ret = cr_do_read_ipc_msg(ctx);
+		if (ret < 0)
+			goto out;
+	}
+
+ out:
+	cr_debug("msg: ret %d\n", ret);
+	cr_hbuf_put(ctx, sizeof(*hh));
+	return ret;
+}
-- 
1.5.4.3


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]     ` <1239107503-21941-2-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-13 15:35       ` Serge E. Hallyn
       [not found]         ` <20090413153504.GA15846-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-13 15:35 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> --- a/checkpoint/Makefile
> +++ b/checkpoint/Makefile
> @@ -2,8 +2,8 @@
>  # Makefile for linux checkpoint/restart.
>  #
> 
> -obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
> +obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
>  		checkpoint.o restart.o \
>  		ckpt_task.o rstr_task.o \
>  		ckpt_mem.o rstr_mem.o \
> -		ckpt_file.o rstr_file.o
> +		ckpt_file.o rstr_file.o \

?

> +int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
> +		     unsigned int flags, void *data, int size)
> +{
> +	struct cr_deferqueue *wq;
> +
> +	wq = kmalloc(sizeof(wq) + size, GFP_KERNEL);
> +	if (!wq)
> +		return -ENOMEM;
> +
> +	wq->function = function;
> +	wq->flags = flags;
> +	memcpy(wq->data, data, size);
> +
> +	cr_debug("adding work %p function %p\n", wq, wq->function);
> +	list_add_tail(&ctx->deferqueue, &wq->list);
> +	return 0;
> +}

Shouldn't the deferqueue be protected by a spinlock here?

-serge


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]         ` <20090413153504.GA15846-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-13 15:49           ` Oren Laadan
       [not found]             ` <49E35F25.5070501-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-13 15:49 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>> --- a/checkpoint/Makefile
>> +++ b/checkpoint/Makefile
>> @@ -2,8 +2,8 @@
>>  # Makefile for linux checkpoint/restart.
>>  #
>>
>> -obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
>> +obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
>>  		checkpoint.o restart.o \
>>  		ckpt_task.o rstr_task.o \
>>  		ckpt_mem.o rstr_mem.o \
>> -		ckpt_file.o rstr_file.o
>> +		ckpt_file.o rstr_file.o \
> 
> ?
> 
>> +int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
>> +		     unsigned int flags, void *data, int size)
>> +{
>> +	struct cr_deferqueue *wq;
>> +
>> +	wq = kmalloc(sizeof(wq) + size, GFP_KERNEL);
>> +	if (!wq)
>> +		return -ENOMEM;
>> +
>> +	wq->function = function;
>> +	wq->flags = flags;
>> +	memcpy(wq->data, data, size);
>> +
>> +	cr_debug("adding work %p function %p\n", wq, wq->function);
>> +	list_add_tail(&ctx->deferqueue, &wq->list);
>> +	return 0;
>> +}
> 
> Shouldn't the deferqueue be protected by a spinlock here?

Not until we implement concurrent checkpoint/restart. At the moment
only one task at a time can access it.

Oren.


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]             ` <49E35F25.5070501-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-13 18:04               ` Serge E. Hallyn
       [not found]                 ` <20090413180459.GA18467-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-13 18:04 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >> --- a/checkpoint/Makefile
> >> +++ b/checkpoint/Makefile
> >> @@ -2,8 +2,8 @@
> >>  # Makefile for linux checkpoint/restart.
> >>  #
> >>
> >> -obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
> >> +obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
> >>  		checkpoint.o restart.o \
> >>  		ckpt_task.o rstr_task.o \
> >>  		ckpt_mem.o rstr_mem.o \
> >> -		ckpt_file.o rstr_file.o
> >> +		ckpt_file.o rstr_file.o \
> > 
> > ?
> > 
> >> +int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
> >> +		     unsigned int flags, void *data, int size)
> >> +{
> >> +	struct cr_deferqueue *wq;
> >> +
> >> +	wq = kmalloc(sizeof(wq) + size, GFP_KERNEL);
> >> +	if (!wq)
> >> +		return -ENOMEM;
> >> +
> >> +	wq->function = function;
> >> +	wq->flags = flags;
> >> +	memcpy(wq->data, data, size);
> >> +
> >> +	cr_debug("adding work %p function %p\n", wq, wq->function);
> >> +	list_add_tail(&ctx->deferqueue, &wq->list);
> >> +	return 0;
> >> +}
> > 
> > Shouldn't the deferqueue be protected by a spinlock here?
> 
> > Not until we implement concurrent checkpoint/restart. At the moment
> > only one task at a time can access it.

That's too bad.  I think this would be better done as a single
simple patch adding a new generic deferqueue mechanism for all
to use, with a per-queue spinlock protecting both _add and
_run.
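
To make the suggestion concrete, here is a rough userspace sketch of
such a queue. It is only an illustration under stated assumptions: the
dq_*() names are hypothetical, a pthread mutex stands in for the
spinlock, and entries are run after being detached under the lock so
that callbacks never execute with the lock held. The callback return
value is ignored here for brevity.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef int (*dq_func_t)(void *data);

struct dq_entry {
	struct dq_entry *next;
	dq_func_t function;
	char data[];		/* private copy of the caller's argument */
};

struct dq_head {
	pthread_mutex_t lock;
	struct dq_entry *first;
	struct dq_entry **last;	/* where the next entry gets linked */
};

static void dq_init(struct dq_head *h)
{
	pthread_mutex_init(&h->lock, NULL);
	h->first = NULL;
	h->last = &h->first;
}

static int dq_add(struct dq_head *h, dq_func_t function,
		  const void *data, size_t size)
{
	/* sizeof(*e), not sizeof(e): room for the struct plus the data */
	struct dq_entry *e = malloc(sizeof(*e) + size);

	if (!e)
		return -1;
	e->next = NULL;
	e->function = function;
	memcpy(e->data, data, size);

	pthread_mutex_lock(&h->lock);
	*h->last = e;
	h->last = &e->next;
	pthread_mutex_unlock(&h->lock);
	return 0;
}

/* Run all queued work in FIFO order; returns the number of entries run. */
static int dq_run(struct dq_head *h)
{
	struct dq_entry *e;
	int nr = 0;

	/* detach the whole list under the lock, then work on it unlocked */
	pthread_mutex_lock(&h->lock);
	e = h->first;
	h->first = NULL;
	h->last = &h->first;
	pthread_mutex_unlock(&h->lock);

	while (e) {
		struct dq_entry *next = e->next;

		e->function(e->data);
		free(e);
		e = next;
		nr++;
	}
	return nr;
}

/* Example callback: adds the stored int to a running total. */
static int dq_total;

static int dq_add_to_total(void *data)
{
	int v;

	memcpy(&v, data, sizeof(v));
	dq_total += v;
	return 0;
}
```

Because dq_add() copies the argument, the caller's buffer can go away
before dq_run() is called.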

-serge


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]                 ` <20090413180459.GA18467-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-04-14  6:16                   ` Oren Laadan
       [not found]                     ` <49E42A23.4030404-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Oren Laadan @ 2009-04-14  6:16 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> --- a/checkpoint/Makefile
>>>> +++ b/checkpoint/Makefile
>>>> @@ -2,8 +2,8 @@
>>>>  # Makefile for linux checkpoint/restart.
>>>>  #
>>>>
>>>> -obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
>>>> +obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
>>>>  		checkpoint.o restart.o \
>>>>  		ckpt_task.o rstr_task.o \
>>>>  		ckpt_mem.o rstr_mem.o \
>>>> -		ckpt_file.o rstr_file.o
>>>> +		ckpt_file.o rstr_file.o \
>>> ?
>>>
>>>> +int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
>>>> +		     unsigned int flags, void *data, int size)
>>>> +{
>>>> +	struct cr_deferqueue *wq;
>>>> +
>>>> +	wq = kmalloc(sizeof(wq) + size, GFP_KERNEL);
>>>> +	if (!wq)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	wq->function = function;
>>>> +	wq->flags = flags;
>>>> +	memcpy(wq->data, data, size);
>>>> +
>>>> +	cr_debug("adding work %p function %p\n", wq, wq->function);
>>>> +	list_add_tail(&ctx->deferqueue, &wq->list);
>>>> +	return 0;
>>>> +}
>>> Shouldn't the deferqueue be protected by a spinlock here?
>> Not until we implement concurrent checkpoint/restart. At the moment
>> only one task at a time can access it.
> 
> That's too bad.  I think this would be better done as a single
> simple patch adding a new generic deferqueue mechanism for all
> to use, with a per-queue spinlock protecting both _add and
> _run.


Fair enough. Would you like to take a stab at it ?

Oren.


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]                     ` <49E42A23.4030404-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-14 17:14                       ` Serge E. Hallyn
  2009-04-14 22:48                       ` Serge E. Hallyn
  1 sibling, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-14 17:14 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> > That's too bad.  I think this would be better done as a single
> > simple patch adding a new generic deferqueue mechanism for all
> > to use, with a per-queue spinlock protecting both _add and
> > _run.
> 
> 
> Fair enough. Would you like to take a stab at it ?
> 
> Oren.

Sure - if you people would shut up for a bit so I could stop being
behind on emails :)

-serge


* Re: [RFC v2][PATCH 02/10] ipc: allow allocation of an ipc object with desired identifier
       [not found]     ` <1239107503-21941-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-14 18:08       ` Serge E. Hallyn
  0 siblings, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-14 18:08 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> During restart, we need to allocate ipc objects with the same
> identifiers as recorded during checkpoint. Modify the allocation
> code to allow an in-kernel caller to request a specific ipc identifier.
> The system call interface remains unchanged.
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>


* Re: [RFC v2][PATCH 03/10] ipc: helpers to save and restore kern_ipc_perm structures
       [not found]     ` <1239107503-21941-4-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-04-14 21:55       ` Serge E. Hallyn
  0 siblings, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-14 21:55 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> Add the helpers to save and restore the contents of 'struct
> kern_ipc_perm'. Add header structures for ipc state. Put
> place-holders to save and restore ipc state.
> 
> TODO:
> This patch does _not_ address the issues of users/groups and the
> related security issues. For now, it saves the old user/group of
> ipc objects, but does not restore them during restart.
> 
> Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>


* Re: [RFC v2][PATCH 01/10] Infrastructure for work postponed to the end of checkpoint/restart
       [not found]                     ` <49E42A23.4030404-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
  2009-04-14 17:14                       ` Serge E. Hallyn
@ 2009-04-14 22:48                       ` Serge E. Hallyn
  1 sibling, 0 replies; 19+ messages in thread
From: Serge E. Hallyn @ 2009-04-14 22:48 UTC (permalink / raw)
  To: Oren Laadan
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Dave Hansen

Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>
> >> Serge E. Hallyn wrote:
> >>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
> >>>> --- a/checkpoint/Makefile
> >>>> +++ b/checkpoint/Makefile
> >>>> @@ -2,8 +2,8 @@
> >>>>  # Makefile for linux checkpoint/restart.
> >>>>  #
> >>>>
> >>>> -obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o \
> >>>> +obj-$(CONFIG_CHECKPOINT) += sys.o objhash.o deferqueue.o \
> >>>>  		checkpoint.o restart.o \
> >>>>  		ckpt_task.o rstr_task.o \
> >>>>  		ckpt_mem.o rstr_mem.o \
> >>>> -		ckpt_file.o rstr_file.o
> >>>> +		ckpt_file.o rstr_file.o \
> >>> ?
> >>>
> >>>> +int cr_deferqueue_add(struct cr_ctx *ctx, cr_deferqueue_func_t function,
> >>>> +		     unsigned int flags, void *data, int size)
> >>>> +{
> >>>> +	struct cr_deferqueue *wq;
> >>>> +
> >>>> +	wq = kmalloc(sizeof(wq) + size, GFP_KERNEL);
> >>>> +	if (!wq)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	wq->function = function;
> >>>> +	wq->flags = flags;
> >>>> +	memcpy(wq->data, data, size);
> >>>> +
> >>>> +	cr_debug("adding work %p function %p\n", wq, wq->function);
> >>>> +	list_add_tail(&ctx->deferqueue, &wq->list);
> >>>> +	return 0;
> >>>> +}
> >>> Shouldn't the deferqueue be protected by a spinlock here?
> >> Not until we implement concurrent checkpoint/restart. At the moment
> >> only one task at a time can access it.
> > 
> > That's too bad.  I think this would be better done as a single
> > simple patch adding a new generic deferqueue mechanism for all
> > to use, with a per-queue spinlock protecting both _add and
> > _run.
> 
> 
> Fair enough. Would you like to take a stab at it ?

Only compile-tested so far, but here's what I have ended up with.
I'll try to hook it into the rest of your patchset later tonight
or tomorrow...

-serge

From 45cdd4a387cb4d34f02fe1a3c9043169d1df2681 Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Date: Tue, 14 Apr 2009 15:45:38 -0700
Subject: [PATCH 1/1] deferqueue: generic queue to defer work

For us lazy types...

Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 checkpoint/Kconfig         |    1 +
 include/linux/deferqueue.h |   31 ++++++++++++++++
 kernel/Makefile            |    1 +
 kernel/deferqueue.c        |   87 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 120 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/deferqueue.h
 create mode 100644 kernel/deferqueue.c

diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index 1761b0a..4e20f18 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -5,6 +5,7 @@
 config CHECKPOINT
 	bool "Enable checkpoint/restart (EXPERIMENTAL)"
 	depends on CHECKPOINT_SUPPORT && EXPERIMENTAL
+	select DEFERQUEUE
 	help
 	  Application checkpoint/restart is the ability to save the
 	  state of a running application so that it can later resume
diff --git a/include/linux/deferqueue.h b/include/linux/deferqueue.h
new file mode 100644
index 0000000..5de9797
--- /dev/null
+++ b/include/linux/deferqueue.h
@@ -0,0 +1,31 @@
+/*
+ * deferqueue.h --- deferred work queue handling for Linux.
+ */
+
+#ifndef _LINUX_DEFERQUEUE_H
+#define _LINUX_DEFERQUEUE_H
+
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+typedef int (*deferqueue_func_t)(void *);
+
+struct deferqueue_entry {
+	deferqueue_func_t function;
+	struct list_head list;
+	char data[0];
+};
+
+struct deferqueue_head {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+struct deferqueue_head *deferqueue_create(void);
+void deferqueue_destroy(struct deferqueue_head *h);
+int deferqueue_add(struct deferqueue_head *head, deferqueue_func_t function,
+		void *data, int size);
+int cr_deferqueue_run(struct deferqueue_head *head);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index e4791b3..0848374 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,6 +22,7 @@ CFLAGS_REMOVE_cgroup-debug.o = -pg
 CFLAGS_REMOVE_sched_clock.o = -pg
 endif
 
+obj-$(CONFIG_DEFERQUEUE) += deferqueue.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
diff --git a/kernel/deferqueue.c b/kernel/deferqueue.c
new file mode 100644
index 0000000..9d6f44b
--- /dev/null
+++ b/kernel/deferqueue.c
@@ -0,0 +1,87 @@
+/*
+ *  Checkpoint-restart - infrastructure to manage deferred work
+ *
+ *  This differs from a workqueue in that the work must be deferred
+ *  until specifically run by the caller.
+ *
+ *  As the only user currently is checkpoint/restart, which has
+ *  very simple usage, the locking is kept simple.  Adding entries
+ *  is protected by the head->lock.  But deferqueue_run() is only
+ *  called once, after all entries have been added.  So it is not
+ *  protected.  Similarly, _destroy is only called once when the
+ *  cr_ctx is released, so it is not locked or refcounted.  These
+ *  can of course be added if needed by other users.
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/deferqueue.h>
+
+struct deferqueue_head *deferqueue_create(void)
+{
+	struct deferqueue_head *h = kmalloc(sizeof(*h), GFP_KERNEL);
+	if (h) {
+		spin_lock_init(&h->lock);
+		INIT_LIST_HEAD(&h->list);
+	}
+	return h;
+}
+
+void deferqueue_destroy(struct deferqueue_head *h)
+{
+	if (!list_empty(&h->list))
+		pr_debug("%s: freeing non-empty queue\n", __func__);
+	kfree(h);
+}
+
+int deferqueue_add(struct deferqueue_head *head, deferqueue_func_t function,
+		void *data, int size)
+{
+	struct deferqueue_entry *wq;
+
+	wq = kmalloc(sizeof(*wq) + size, GFP_KERNEL);
+	if (!wq)
+		return -ENOMEM;
+
+	wq->function = function;
+	memcpy(wq->data, data, size);
+
+	pr_debug("%s: adding work %p function %p\n", __func__, wq,
+			wq->function);
+	spin_lock(&head->lock);
+	list_add_tail(&wq->list, &head->list);
+	spin_unlock(&head->lock);
+	return 0;
+}
+
+/*
+ * deferqueue_run - perform all work in the work queue
+ * @head: deferqueue_head from which to run
+ *
+ * returns: number of works performed, or < 0 on error
+ */
+int cr_deferqueue_run(struct deferqueue_head *head)
+{
+	struct deferqueue_entry *wq, *n;
+	int nr = 0;
+	int ret;
+
+	list_for_each_entry_safe(wq, n, &head->list, list) {
+		pr_debug("doing work %p function %p\n", wq, wq->function);
+		ret = wq->function(wq->data);
+		if (ret < 0)
+			pr_debug("wq function failed %d\n", ret);
+		list_del(&wq->list);
+		kfree(wq);
+		nr++;
+	}
+
+	return nr;
+}
-- 
1.5.4.3



This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.