All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list
@ 2008-06-25 13:49 Nadia.Derbey-6ktuUTfB/bM
  2008-06-25 13:49 ` [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list Nadia.Derbey-6ktuUTfB/bM
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


This patchset is a part of an effort to make sysv ipc objects
read/writable from userspace for checkpoint / restart.

System V ipc's are objects that are global to a system and can thus be
checkpointed and restarted on a container basis. But some parts of the ipc
structures are process related and, as such should be checkpointed / restarted
on a process basis.

Message queues needn't and thus cannot be reached from a task structure
(only the other direction is possible: a task can be reached by a msg receiver
or sender structure if it is sleeping).

Shared memories are accessible from a task structure through that task's
memory mapping (/proc/<pid>/maps shows a process' memory maps).

Semaphores are kind of accessible from a task structure too: the task
structure's sysvsem field makes it possible to walk through all the semaphores
operations to undo when a process is exiting.
This list, that need to be saved and restored during a process' c/r,
cannot yet be accessed from user space.

This is a feature that will be needed if ever we take the direction of driving
checkpoint / restart from user space, though the read part of it could be used
even from now on.

Since this undo_list is, again, on a thread basis, we propose to externalize
it via a new proc file: /proc/<pid>/semundo.

Actually, Pierre Pieffer has already done the proposal in threads
https://lists.linux-foundation.org/pipermail/containers/2008-January/thread.html#9756
up to #9759

I've ported them to 2.6.26-rc5-mm3, and I'm now coming back with a
simpler implementation: the write operation is now only allowed into
/proc/self/semundo, which simplifies the locking strategy.


These patches should be applied to 2.6.26-rc5-mm3, in the following order:
[ PATCH 1/6] : ipc_rcu_protect_access_to_undo_list.patch
               Makes the process' undo_list rcu protected in order to enable
               safely reading it.
[ PATCH 2/6] : ipc_procfs_semundo_file.patch
               Introduces the semundo proc file (the seq operations are still
               empty).
[ PATCH 3/6] : ipc_procfs_semundo_start_stop_seqops.patch
               Introduces the .start and .stop seq operations.
[ PATCH 4/6] : ipc_procfs_semundo_next_seqop.patch
               Introduces the .next seq operation.
[ PATCH 5/6] : ipc_procfs_semundo_show_seqop.patch
               Introduces the .show seq operation.
[ PATCH 6/6] : ipc_procfs_semundo_write.patch
               The semundo proc file becomes writable.

Comments are welcome!

Regards,
Nadia

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135538.385496000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 13:49 ` [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs Nadia.Derbey-6ktuUTfB/bM
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Pierre Peiffer, Nadia Derbey

[-- Attachment #1: ipc_rcu_protect_access_to_undo_list.patch --]
[-- Type: text/plain, Size: 4042 bytes --]

PATCH [01/06]

Today, 'current' has an exclusive access to its sem_undo_list (anchored at
current->sysvsem.undo_list):
  . it is created during a semop() if the SEMUNDO flag is specified for one
    of the semaphores.
  . it can also be created during a copy_process() operation if the
    CLONE_SYSVSEM flag is specified (in that case the undo_list is created/
    copied from 'current' into the target task but that target task is not
    running yet).
  . it is freed during an unshare() or an exit() operation, if the caller
    (current) is the last task using it.


In order to allow a third party process to read a process' undo list, without
a too big performance impact on the existing operations, this patch proposes
to make the sem_undo_list structures rcu protected.

They can then be safely accessed by any task inside read critical section,
this way:

        struct sem_undo_list *undo_list;
        int ret;
        ...
        rcu_read_lock();
        undo_list = rcu_dereference(task->sysvsem.undo_list);
        if (undo_list)   
                ret = atomic_inc_not_zero(&undo_list->refcnt);
        rcu_read_unlock();
        ...
        if (undo_list && ret) {
                /* section where undo_list can be used quietly */
                ...
        }
        ...

Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 include/linux/sem.h |    4 +++-
 ipc/sem.c           |   22 ++++++++++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

Index: linux-2.6.26-rc5-mm3/include/linux/sem.h
===================================================================
--- linux-2.6.26-rc5-mm3.orig/include/linux/sem.h	2008-06-24 09:04:21.000000000 +0200
+++ linux-2.6.26-rc5-mm3/include/linux/sem.h	2008-06-24 10:01:50.000000000 +0200
@@ -112,7 +112,8 @@ struct sem_queue {
 };
 
 /* Each task has a list of undo requests. They are executed automatically
- * when the process exits.
+ * when the last refcnt of sem_undo_list is released (ie when the process
+ * exits in the general case).
  */
 struct sem_undo {
 	struct list_head	list_proc;	/* per-process list: all undos from one process. */
@@ -131,6 +132,7 @@ struct sem_undo_list {
 	atomic_t		refcnt;
 	spinlock_t		lock;
 	struct list_head	list_proc;
+	struct rcu_head		rcu;
 };
 
 struct sysv_sem {
Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 09:05:03.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 09:37:33.000000000 +0200
@@ -939,6 +939,10 @@ static inline int get_undo_list(struct s
 {
 	struct sem_undo_list *undo_list;
 
+	/*
+	 * No need to have a rcu read critical section here: no one but
+	 * current is accessing the undo_list.
+	 */
 	undo_list = current->sysvsem.undo_list;
 	if (!undo_list) {
 		undo_list = kzalloc(sizeof(*undo_list), GFP_KERNEL);
@@ -948,7 +952,7 @@ static inline int get_undo_list(struct s
 		atomic_set(&undo_list->refcnt, 1);
 		INIT_LIST_HEAD(&undo_list->list_proc);
 
-		current->sysvsem.undo_list = undo_list;
+		rcu_assign_pointer(current->sysvsem.undo_list, undo_list);
 	}
 	*undo_listp = undo_list;
 	return 0;
@@ -1268,10 +1272,15 @@ void exit_sem(struct task_struct *tsk)
 {
 	struct sem_undo_list *ulp;
 
-	ulp = tsk->sysvsem.undo_list;
-	if (!ulp)
+	rcu_read_lock();
+	ulp = rcu_dereference(tsk->sysvsem.undo_list);
+	if (!ulp) {
+		rcu_read_unlock();
 		return;
-	tsk->sysvsem.undo_list = NULL;
+	}
+	rcu_read_unlock();
+	rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
+	synchronize_rcu();
 
 	if (!atomic_dec_and_test(&ulp->refcnt))
 		return;
@@ -1349,6 +1358,11 @@ void exit_sem(struct task_struct *tsk)
 
 		call_rcu(&un->rcu, free_un);
 	}
+	/*
+	 * No need to call synchronize_rcu() here: we come here if the refcnt
+	 * is 0 and this has been done into exit_sem after synchronizing. So
+	 * nobody else can be referencing to the undo_list.
+	 */
 	kfree(ulp);
 }
 

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
  2008-06-25 13:49 ` [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135538.762662000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 13:49 ` [RFC PATCH 3/6] IPC/sem: start/stop operations for /proc/pid/semundo Nadia.Derbey-6ktuUTfB/bM
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Pierre Peiffer, Nadia Derbey

[-- Attachment #1: ipc_procfs_semundo_file.patch --]
[-- Type: text/plain, Size: 6222 bytes --]

PATCH [02/06]

This patch adds a new procfs interface to display the per-process semundo
data. 

A new per-PID file is added, called "semundo".
It contains one line per semaphore IPC where there is something to undo for
this process.
Then, each line contains the semid followed by each undo value
corresponding to each semaphores of the semaphores array.

This interface will be particularly useful to allow a user access
these data, for example for checkpointing a process

With this patch, the semundo file can only be accessed in read mode.
When this file is opened, if an undo_list exists for the target process, it
is accessed in an rcu read section, and its refcount is incremented, avoiding
that it be freed.
The reverse is done during the release operation, and the undo_list is
freed if the process reading the file was the last process accessing that
list.

Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 fs/proc/base.c     |    3 +
 fs/proc/internal.h |    1 
 ipc/sem.c          |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 128 insertions(+), 3 deletions(-)

Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 09:05:09.000000000 +0200
+++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
@@ -2525,6 +2525,9 @@ static const struct pid_entry tgid_base_
 #ifdef CONFIG_TASK_IO_ACCOUNTING
 	INF("io",	S_IRUGO, tgid_io_accounting),
 #endif
+#ifdef CONFIG_SYSVIPC
+	REG("semundo",  S_IRUGO, semundo),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file * filp,
Index: linux-2.6.26-rc5-mm3/fs/proc/internal.h
===================================================================
--- linux-2.6.26-rc5-mm3.orig/fs/proc/internal.h	2008-06-24 09:05:09.000000000 +0200
+++ linux-2.6.26-rc5-mm3/fs/proc/internal.h	2008-06-24 10:04:19.000000000 +0200
@@ -65,6 +65,7 @@ extern const struct file_operations proc
 extern const struct file_operations proc_net_operations;
 extern const struct file_operations proc_kmsg_operations;
 extern const struct inode_operations proc_net_inode_operations;
+extern const struct file_operations proc_semundo_operations;
 
 void free_proc_entry(struct proc_dir_entry *de);
 
Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 09:37:33.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 10:59:46.000000000 +0200
@@ -97,6 +97,7 @@ static void freeary(struct ipc_namespace
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #endif
+static void free_semundo_list(struct sem_undo_list *, struct ipc_namespace *);
 
 #define SEMMSL_FAST	256 /* 512 bytes on stack */
 #define SEMOPM_FAST	64  /* ~ 372 bytes on stack */
@@ -1282,8 +1283,14 @@ void exit_sem(struct task_struct *tsk)
 	rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
 	synchronize_rcu();
 
-	if (!atomic_dec_and_test(&ulp->refcnt))
-		return;
+	if (atomic_dec_and_test(&ulp->refcnt))
+		free_semundo_list(ulp, tsk->nsproxy->ipc_ns);
+}
+
+static void free_semundo_list(struct sem_undo_list *ulp,
+				struct ipc_namespace *ipc_ns)
+{
+	BUG_ON(atomic_read(&ulp->refcnt));
 
 	for (;;) {
 		struct sem_array *sma;
@@ -1303,7 +1310,7 @@ void exit_sem(struct task_struct *tsk)
 		if (semid == -1)
 			break;
 
-		sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
+		sma = sem_lock_check(ipc_ns, un->semid);
 
 		/* exit_sem raced with IPC_RMID, nothing to do */
 		if (IS_ERR(sma))
@@ -1384,4 +1391,118 @@ static int sysvipc_sem_proc_show(struct 
 			  sma->sem_otime,
 			  sma->sem_ctime);
 }
+
+struct undo_list_data {
+	struct sem_undo_list *undo_list;
+	struct ipc_namespace *ipc_ns;
+};
+
+/* iterator */
+static void *semundo_start(struct seq_file *m, loff_t *ppos)
+{
+	return NULL;
+}
+
+static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
+{
+	return NULL;
+}
+
+static void semundo_stop(struct seq_file *m, void *v)
+{
+	return;
+}
+
+static int semundo_show(struct seq_file *m, void *v)
+{
+	return 0;
+}
+
+static struct seq_operations semundo_op = {
+	.start	= semundo_start,
+	.next	= semundo_next,
+	.stop	= semundo_stop,
+	.show	= semundo_show
+};
+
+static struct sem_undo_list *get_proc_ulp(struct task_struct *tsk)
+{
+	struct sem_undo_list *ulp;
+
+	rcu_read_lock();
+	ulp = rcu_dereference(tsk->sysvsem.undo_list);
+	if (ulp)
+		if (!atomic_inc_not_zero(&ulp->refcnt))
+			ulp = NULL;
+	rcu_read_unlock();
+	return ulp;
+}
+
+static void put_proc_ulp(struct sem_undo_list *ulp,
+					struct ipc_namespace *ns)
+{
+	if (ulp && atomic_dec_and_test(&ulp->refcnt))
+		free_semundo_list(ulp, ns);
+}
+
+/*
+ * semundo_open: open operation for /proc/<PID>/semundo file
+ */
+static int semundo_open(struct inode *inode, struct file *file)
+{
+	struct task_struct *task;
+	struct sem_undo_list *ulp;
+	struct undo_list_data *data;
+	struct ipc_namespace *ns;
+	int ret = 0;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	task = get_pid_task(PROC_I(inode)->pid, PIDTYPE_PID);
+	if (!task) {
+		ret = -EINVAL;
+		goto out_err;
+	}
+
+	ulp = get_proc_ulp(task);
+	ns = get_ipc_ns(task->nsproxy->ipc_ns);
+	put_task_struct(task);
+
+	ret = seq_open(file, &semundo_op);
+	if (!ret) {
+		struct seq_file *m = file->private_data;
+		data->undo_list = ulp;
+		data->ipc_ns = ns;
+		m->private = data;
+		return 0;
+	}
+
+	put_proc_ulp(ulp, ns);
+	put_ipc_ns(ns);
+out_err:
+	kfree(data);
+	return ret;
+}
+
+static int semundo_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = file->private_data;
+	struct undo_list_data *data = m->private;
+	struct sem_undo_list *ulp = data->undo_list;
+	struct ipc_namespace *ns = data->ipc_ns;
+
+	put_proc_ulp(ulp, ns);
+	put_ipc_ns(ns);
+	kfree(data);
+	return seq_release(inode, file);
+}
+
+const struct file_operations proc_semundo_operations = {
+	.open		= semundo_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= semundo_release,
+};
 #endif

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 3/6] IPC/sem: start/stop operations for /proc/pid/semundo
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
  2008-06-25 13:49 ` [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list Nadia.Derbey-6ktuUTfB/bM
  2008-06-25 13:49 ` [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135539.139605000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 13:49 ` [RFC PATCH 4/6] IPC/sem: next " Nadia.Derbey-6ktuUTfB/bM
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Pierre Peiffer, Nadia Derbey

[-- Attachment #1: ipc_procfs_semundo_start_stop_seqops.patch --]
[-- Type: text/plain, Size: 2230 bytes --]

PATCH [03/06]

This patch introduces the .start and .stop seq operations for
/proc/pid/semundo.

Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 ipc/sem.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 10:59:46.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:32:36.000000000 +0200
@@ -1400,7 +1400,42 @@ struct undo_list_data {
 /* iterator */
 static void *semundo_start(struct seq_file *m, loff_t *ppos)
 {
-	return NULL;
+	struct undo_list_data *data = m->private;
+	struct sem_undo_list *ulp = data->undo_list;
+	struct sem_undo	*undo;
+	loff_t pos = *ppos;
+
+	if (!ulp)
+		return NULL;
+
+	if (pos < 0)
+		return NULL;
+
+	/* If ulp is not NULL, it means that we've successfully grabbed
+	 * a refcnt in semundo_open. That prevents the undo_list from being
+	 * freed.
+	 *
+	 * Note:
+	 * 1) the sem_undo structure is RCU-protected. So take the rcu read
+	 *    lock and only release it during the .stop operation.
+	 * 2) we accept to release the undo_list lock (i.e. we allow the list
+	 *    to change) between each iteration, instead of taking that lock
+	 *    during the .start and releasing it during the .stop operation.
+	 *    This is to reduce the performance impact on the access to the
+	 *    undo_list.
+	 */
+	rcu_read_lock();
+	spin_lock(&ulp->lock);
+	list_for_each_entry_rcu(undo, &ulp->list_proc, list_proc) {
+		if ((undo->semid != -1) && !(pos--))
+			break;
+	}
+	spin_unlock(&ulp->lock);
+
+	if (&undo->list_proc == &ulp->list_proc)
+		return NULL;
+
+	return undo;
 }
 
 static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
@@ -1410,7 +1445,11 @@ static void *semundo_next(struct seq_fil
 
 static void semundo_stop(struct seq_file *m, void *v)
 {
-	return;
+	struct undo_list_data *data = m->private;
+	struct sem_undo_list *ulp = data->undo_list;
+
+	if (ulp)
+		rcu_read_unlock();
 }
 
 static int semundo_show(struct seq_file *m, void *v)

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 4/6] IPC/sem: next operations for /proc/pid/semundo
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
                   ` (2 preceding siblings ...)
  2008-06-25 13:49 ` [RFC PATCH 3/6] IPC/sem: start/stop operations for /proc/pid/semundo Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135539.519489000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 13:49 ` [RFC PATCH 5/6] IPC/sem: .show operation " Nadia.Derbey-6ktuUTfB/bM
  2008-06-25 13:49 ` [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo Nadia.Derbey-6ktuUTfB/bM
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Pierre Peiffer, Nadia Derbey

[-- Attachment #1: ipc_procfs_semundo_next_seqop.patch --]
[-- Type: text/plain, Size: 1647 bytes --]

PATCH [04/06]

This patch introduces the .next seq operation for /proc/pid/semundo.

What should be mentioned here is that the undo_list lock is released between
between each iteration.
Doing this, we only guarantee to access some valid data during the .show,
not to have a full coherent view of the whole list. But, oth, this reduces the
the performance impact on the access to the undo_list.

Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 ipc/sem.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:32:36.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:54:40.000000000 +0200
@@ -1440,7 +1440,28 @@ static void *semundo_start(struct seq_fi
 
 static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
 {
-	return NULL;
+	struct sem_undo	*undo = v;
+	struct undo_list_data *data = m->private;
+	struct sem_undo_list *ulp = data->undo_list;
+
+	/*
+	 * No need to protect against ulp being NULL, if we are here,
+	 * it can't be NULL.
+	 */
+	spin_lock(&ulp->lock);
+
+	do {
+		undo = list_entry(rcu_dereference(undo->list_proc.next),
+				struct sem_undo, list_proc);
+
+	} while (&undo->list_proc != &ulp->list_proc && undo->semid == -1);
+
+	++*ppos;
+	spin_unlock(&ulp->lock);
+
+	if (&undo->list_proc == &ulp->list_proc)
+		return NULL;
+	return undo;
 }
 
 static void semundo_stop(struct seq_file *m, void *v)

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 5/6] IPC/sem: .show operation for /proc/pid/semundo
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
                   ` (3 preceding siblings ...)
  2008-06-25 13:49 ` [RFC PATCH 4/6] IPC/sem: next " Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135539.893049000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 13:49 ` [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo Nadia.Derbey-6ktuUTfB/bM
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Pierre Peiffer, Nadia Derbey

[-- Attachment #1: ipc_procfs_semundo_show_seqop.patch --]
[-- Type: text/plain, Size: 1343 bytes --]

PATCH [05/06]

This patch introduces the .show seq operation for /proc/pid/semundo.


Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 ipc/sem.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:54:40.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
@@ -1475,6 +1475,34 @@ static void semundo_stop(struct seq_file
 
 static int semundo_show(struct seq_file *m, void *v)
 {
+	struct undo_list_data *data = m->private;
+	struct ipc_namespace *ns = data->ipc_ns;
+	struct sem_undo	*u = v;
+	int nsems, i;
+	struct sem_array *sma;
+
+	/*
+	 * This semid has been deleted, ignore it.
+	 * Even if we skipped all sem_undo belonging to deleted semid
+	 * in semundo_next(), some more deletions may have happened.
+	 */
+	if (u->semid == -1)
+		return 0;
+
+	seq_printf(m, "%10d", u->semid);
+
+	sma = sem_lock(ns, u->semid);
+	if (IS_ERR(sma))
+		goto out;
+
+	nsems = sma->sem_nsems;
+	sem_unlock(sma);
+
+	for (i = 0; i < nsems; i++)
+		seq_printf(m, " %6d", u->semadj[i]);
+
+out:
+	seq_putc(m, '\n');
 	return 0;
 }
 

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
  2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
                   ` (4 preceding siblings ...)
  2008-06-25 13:49 ` [RFC PATCH 5/6] IPC/sem: .show operation " Nadia.Derbey-6ktuUTfB/bM
@ 2008-06-25 13:49 ` Nadia.Derbey-6ktuUTfB/bM
       [not found]   ` <20080625135540.271934000-6ktuUTfB/bM@public.gmane.org>
  5 siblings, 1 reply; 20+ messages in thread
From: Nadia.Derbey-6ktuUTfB/bM @ 2008-06-25 13:49 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA; +Cc: Nadia Derbey

[-- Attachment #1: ipc_procfs_semundo_write.patch --]
[-- Type: text/plain, Size: 7665 bytes --]

PATCH [06/06]

This patch introduces the .write seq operation for /proc/pid/semundo.

In order to simplify the locking strategy, the write operation is only allowed
to 'current'.


Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

---
 fs/proc/base.c |    2 
 ipc/sem.c      |  250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 250 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
+++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 13:21:44.000000000 +0200
@@ -2526,7 +2526,7 @@ static const struct pid_entry tgid_base_
 	INF("io",	S_IRUGO, tgid_io_accounting),
 #endif
 #ifdef CONFIG_SYSVIPC
-	REG("semundo",  S_IRUGO, semundo),
+	REG("semundo",  S_IWUSR|S_IRUGO, semundo),
 #endif
 };
 
Index: linux-2.6.26-rc5-mm3/ipc/sem.c
===================================================================
--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-25 08:56:07.000000000 +0200
@@ -1554,7 +1554,27 @@ static int semundo_open(struct inode *in
 		goto out_err;
 	}
 
-	ulp = get_proc_ulp(task);
+	if (file->f_flags & O_WRONLY || file->f_flags & O_RDWR) {
+		/*
+		 * Writing is only allowed to current
+		 */
+		if (task != current) {
+			put_task_struct(task);
+			ret = -EPERM;
+			goto out_err;
+		}
+		/*
+		 * No need to increment the undo_list's refcnt: only current
+		 * can be freeing it through exit, and we are current.
+		 * Only signal through the ulp NULL pointer that it won't be
+		 * necessary to decrement the refcnt during release.
+		 * Actually, in this path the undo_list will be gotten during
+		 * the write operation.
+		 */
+		ulp = NULL;
+	} else
+		ulp = get_proc_ulp(task);
+
 	ns = get_ipc_ns(task->nsproxy->ipc_ns);
 	put_task_struct(task);
 
@@ -1574,6 +1594,233 @@ out_err:
 	return ret;
 }
 
+/* Skip all spaces at the beginning of the buffer */
+static inline int skip_space(const char __user **buf, size_t *len)
+{
+	char c = 0;
+	while (*len) {
+		if (get_user(c, *buf))
+			return -EFAULT;
+		if (c != '\t' && c != ' ')
+			break;
+		--*len;
+		++*buf;
+	}
+	return c;
+}
+
+/* Retrieve the first numerical value contained in the string.
+ * Note: The value is supposed to be a 32-bit integer.
+ */
+static inline int get_next_value(const char __user **buf, size_t *len, int *val)
+{
+#define BUFLEN 11
+	int err, neg = 0, left;
+	char s[BUFLEN], *p;
+
+	err = skip_space(buf, len);
+	if (err < 0)
+		return err;
+	if (!*len)
+		return INT_MAX;
+	if (err == '\n') {
+		++*buf;
+		--*len;
+		return INT_MAX;
+	}
+	if (err == '-') {
+		++*buf;
+		--*len;
+		neg = 1;
+	}
+
+	left = *len;
+	if (left > sizeof(s) - 1)
+		left = sizeof(s) - 1;
+	if (copy_from_user(s, *buf, left))
+		return -EFAULT;
+
+	s[left] = 0;
+	p = s;
+	if (*p < '0' || *p > '9')
+		return -EINVAL;
+
+	*val = simple_strtoul(p, &p, 0);
+	if (neg)
+		*val = -(*val);
+
+	left = p-s;
+	(*len) -= left;
+	(*buf) += left;
+
+	return 0;
+#undef BUFLEN
+}
+
+/*
+ * Reads a line from /proc/self/semundo.
+ * Returns the number of undo values read (or errcode upon failure).
+ * @id: pointer to the semid (filled in with 1st field in the line)
+ * @array: semundo values (filled in iwth remaining fields in the line).
+ * @array_len: max # of expected semundo values
+ */
+static inline int semundo_readline(const char __user **buf, size_t *left,
+				   int *id, short *array, int array_len)
+{
+	int i, val, err;
+
+	/* Read semid */
+	err = get_next_value(buf, left, id);
+	if (err)
+		return err;
+
+	/* Read all (semundo-) values of a full line */
+	for (i = 0; ; i++) {
+		err = get_next_value(buf, left, &val);
+		if (err < 0)
+			return err;
+		/* reached end of line or end of buffer */
+		if (err == INT_MAX)
+			break;
+		/* Return an error if we get more values than expected */
+		if (i < array_len)
+			array[i] = val;
+		else
+			return -EINVAL;
+	}
+	return i;
+}
+
+/*
+ * sets or updates the undo values for the undo_list of a given semaphore id.
+ * This is exactly the same code sequence as sys_semtimedop if we have undos.
+ */
+static inline int semundo_update(struct ipc_namespace *ns, int semid,
+				short *array, int size)
+{
+	struct sem_undo *un;
+	struct sem_array *sma;
+	int ret = 0;
+
+	un = find_alloc_undo(ns, semid);
+	if (IS_ERR(un)) {
+		ret = PTR_ERR(un);
+		goto out;
+	}
+
+	/* lookup the sem_array */
+	sma = sem_lock(ns, semid);
+	if (IS_ERR(sma)) {
+		ret = PTR_ERR(sma);
+		rcu_read_unlock();
+		goto out;
+	}
+
+	/*
+	 * find_alloc_undo opened an rcu read section to protect un.
+	 * Releasing it here is safe:
+	 *    . sem_lock is held, so we are protected against IPC_RMID
+	 *    . the refcnt won't fall to 0 since exit_sem only operates on
+	 *      current and we are the current.
+	 */
+	rcu_read_unlock();
+
+	/*
+	 * semid identifiers are not unique - find_alloc_undo() may have
+	 * allocated an undo structure, it was invalidated by an RMID and now
+	 * a new array received the same id.
+	 * Check and fail.
+	 * This case can be detected checking un->semid. The existance of
+	 * "un" itself is guaranteed by rcu.
+	 */
+	if (un->semid == -1) {
+		ret = -EIDRM;
+		goto out_unlock;
+	}
+
+	/*
+	 * If the number of values given does not match the number of
+	 * semaphores in the array, consider this as an error.
+	 */
+	if (size != sma->sem_nsems) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* update the undo values */
+	while (--size >= 0)
+		un->semadj[size] = array[size];
+
+	/* maybe some queued-up processes were waiting for this */
+	update_queue(sma);
+
+out_unlock:
+	sem_unlock(sma);
+out:
+	return ret;
+}
+
+/*
+ * write operation for /proc/self/semundo file
+ *
+ * Only current is allowed to write to its semundo file.
+ *
+ * The expected string format is:
+ * "<semID> <val1> <val2> ... <valN>"
+ *
+ * It sets (or updates) the sem_undo list for 'current' and the target
+ * <semID>, to the given 'undo' values.
+ *
+ * <semID> must match an existing semaphore array.
+ * The number of values following <semID> must match the number of semaphores
+ * in the corresponding array.
+ *
+ * Multiple semID's can be passed simultaneously: newline ('\n') is considered
+ * as a separator in that case.
+ *
+ * Note: it is not allowed to set the sem_undo list for a given semID using
+ *       mutliple write calls.
+ */
+static ssize_t semundo_write(struct file *file, const char __user *buf,
+			     size_t count, loff_t *ppos)
+{
+	struct seq_file *m = file->private_data;
+	short *array;
+	int err, max_sem, semid = 0;
+	size_t left = count;
+	struct undo_list_data *data = m->private;
+	struct ipc_namespace *ns = data->ipc_ns;
+
+	if (data->undo_list)
+		/* Should never happen */
+		return -EINVAL;
+
+	max_sem = ns->sc_semmsl;
+
+	array = kmalloc(sizeof(short)*max_sem, GFP_KERNEL);
+	if (array == NULL)
+		return -ENOMEM;
+
+	while (left) {
+		int nval;
+
+		nval = semundo_readline(&buf, &left, &semid, array, max_sem);
+		if (nval < 0) {
+			err = nval;
+			goto out;
+		}
+
+		err = semundo_update(ns, semid, array, nval);
+		if (err)
+			goto out;
+	}
+	err = count - left;
+
+out:
+	kfree(array);
+	return err;
+}
+
 static int semundo_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *m = file->private_data;
@@ -1590,6 +1837,7 @@ static int semundo_release(struct inode 
 const struct file_operations proc_semundo_operations = {
 	.open		= semundo_open,
 	.read		= seq_read,
+	.write		= semundo_write,
 	.llseek		= seq_lseek,
 	.release	= semundo_release,
 };

--

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list
       [not found]   ` <20080625135538.385496000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 20:33     ` Serge E. Hallyn
  0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 20:33 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [01/06]
> 
> Today, 'current' has an exclusive access to its sem_undo_list (anchored at
> current->sysvsem.undo_list):
>   . it is created during a semop() if the SEMUNDO flag is specified for one
>     of the semaphores.
>   . it can also be created during a copy_process() operation if the
>     CLONE_SYSVSEM flag is specified (in that case the undo_list is created/
>     copied from 'current' into the target task but that target task is not
>     running yet).
>   . it is freed during an unshare() or an exit() operation, if the caller
>     (current) is the last task using it.
> 
> 
> In order to allow a third party process to read a process' undo list, without
> a too big performance impact on the existing operations, this patch proposes
> to make the sem_undo_list structures rcu protected.
> 
> They can then be safely accessed by any task inside read critical section,
> this way:
> 
>         struct sem_undo_list *undo_list;
>         int ret;
>         ...
>         rcu_read_lock();
>         undo_list = rcu_dereference(task->sysvsem.undo_list);
>         if (undo_list)   
>                 ret = atomic_inc_not_zero(&undo_list->refcnt);
>         rcu_read_unlock();
>         ...
>         if (undo_list && ret) {
>                 /* section where undo_list can be used quietly */
>                 ...
>         }
>         ...
> 
> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

thanks,
-serge

> 
> ---
>  include/linux/sem.h |    4 +++-
>  ipc/sem.c           |   22 ++++++++++++++++++----
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/include/linux/sem.h
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/include/linux/sem.h	2008-06-24 09:04:21.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/include/linux/sem.h	2008-06-24 10:01:50.000000000 +0200
> @@ -112,7 +112,8 @@ struct sem_queue {
>  };
> 
>  /* Each task has a list of undo requests. They are executed automatically
> - * when the process exits.
> + * when the last refcnt of sem_undo_list is released (ie when the process
> + * exits in the general case).
>   */
>  struct sem_undo {
>  	struct list_head	list_proc;	/* per-process list: all undos from one process. */
> @@ -131,6 +132,7 @@ struct sem_undo_list {
>  	atomic_t		refcnt;
>  	spinlock_t		lock;
>  	struct list_head	list_proc;
> +	struct rcu_head		rcu;
>  };
> 
>  struct sysv_sem {
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 09:05:03.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 09:37:33.000000000 +0200
> @@ -939,6 +939,10 @@ static inline int get_undo_list(struct s
>  {
>  	struct sem_undo_list *undo_list;
> 
> +	/*
> +	 * No need to have a rcu read critical section here: no one but
> +	 * current is accessing the undo_list.
> +	 */
>  	undo_list = current->sysvsem.undo_list;
>  	if (!undo_list) {
>  		undo_list = kzalloc(sizeof(*undo_list), GFP_KERNEL);
> @@ -948,7 +952,7 @@ static inline int get_undo_list(struct s
>  		atomic_set(&undo_list->refcnt, 1);
>  		INIT_LIST_HEAD(&undo_list->list_proc);
> 
> -		current->sysvsem.undo_list = undo_list;
> +		rcu_assign_pointer(current->sysvsem.undo_list, undo_list);
>  	}
>  	*undo_listp = undo_list;
>  	return 0;
> @@ -1268,10 +1272,15 @@ void exit_sem(struct task_struct *tsk)
>  {
>  	struct sem_undo_list *ulp;
> 
> -	ulp = tsk->sysvsem.undo_list;
> -	if (!ulp)
> +	rcu_read_lock();
> +	ulp = rcu_dereference(tsk->sysvsem.undo_list);
> +	if (!ulp) {
> +		rcu_read_unlock();
>  		return;
> -	tsk->sysvsem.undo_list = NULL;
> +	}
> +	rcu_read_unlock();
> +	rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
> +	synchronize_rcu();
> 
>  	if (!atomic_dec_and_test(&ulp->refcnt))
>  		return;
> @@ -1349,6 +1358,11 @@ void exit_sem(struct task_struct *tsk)
> 
>  		call_rcu(&un->rcu, free_un);
>  	}
> +	/*
> +	 * No need to call synchronize_rcu() here: we come here if the refcnt
> +	 * is 0 and this has been done into exit_sem after synchronizing. So
> +	 * nobody else can be referencing to the undo_list.
> +	 */
>  	kfree(ulp);
>  }
> 
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs
       [not found]   ` <20080625135538.762662000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 20:33     ` Serge E. Hallyn
  2008-06-26  5:08     ` Michael Kerrisk
  1 sibling, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 20:33 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [02/06]
> 
> This patch adds a new procfs interface to display the per-process semundo
> data. 
> 
> A new per-PID file is added, called "semundo".
> It contains one line per semaphore IPC where there is something to undo for
> this process.
> Then, each line contains the semid followed by each undo value
> corresponding to each semaphores of the semaphores array.
> 
> This interface will be particularly useful to allow a user access
> these data, for example for checkpointing a process
> 
> With this patch, the semundo file can only be accessed in read mode.
> When this file is opened, if an undo_list exists for the target process, it
> is accessed in an rcu read section, and its refcount is incremented, avoiding
> that it be freed.
> The reverse is done during the release operation, and the undo_list is
> freed if the process reading the file was the last process accessing that
> list.
> 
> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

-serge

> 
> ---
>  fs/proc/base.c     |    3 +
>  fs/proc/internal.h |    1 
>  ipc/sem.c          |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 128 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 09:05:09.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
> @@ -2525,6 +2525,9 @@ static const struct pid_entry tgid_base_
>  #ifdef CONFIG_TASK_IO_ACCOUNTING
>  	INF("io",	S_IRUGO, tgid_io_accounting),
>  #endif
> +#ifdef CONFIG_SYSVIPC
> +	REG("semundo",  S_IRUGO, semundo),
> +#endif
>  };
> 
>  static int proc_tgid_base_readdir(struct file * filp,
> Index: linux-2.6.26-rc5-mm3/fs/proc/internal.h
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/fs/proc/internal.h	2008-06-24 09:05:09.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/fs/proc/internal.h	2008-06-24 10:04:19.000000000 +0200
> @@ -65,6 +65,7 @@ extern const struct file_operations proc
>  extern const struct file_operations proc_net_operations;
>  extern const struct file_operations proc_kmsg_operations;
>  extern const struct inode_operations proc_net_inode_operations;
> +extern const struct file_operations proc_semundo_operations;
> 
>  void free_proc_entry(struct proc_dir_entry *de);
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 09:37:33.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 10:59:46.000000000 +0200
> @@ -97,6 +97,7 @@ static void freeary(struct ipc_namespace
>  #ifdef CONFIG_PROC_FS
>  static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
>  #endif
> +static void free_semundo_list(struct sem_undo_list *, struct ipc_namespace *);
> 
>  #define SEMMSL_FAST	256 /* 512 bytes on stack */
>  #define SEMOPM_FAST	64  /* ~ 372 bytes on stack */
> @@ -1282,8 +1283,14 @@ void exit_sem(struct task_struct *tsk)
>  	rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
>  	synchronize_rcu();
> 
> -	if (!atomic_dec_and_test(&ulp->refcnt))
> -		return;
> +	if (atomic_dec_and_test(&ulp->refcnt))
> +		free_semundo_list(ulp, tsk->nsproxy->ipc_ns);
> +}
> +
> +static void free_semundo_list(struct sem_undo_list *ulp,
> +				struct ipc_namespace *ipc_ns)
> +{
> +	BUG_ON(atomic_read(&ulp->refcnt));
> 
>  	for (;;) {
>  		struct sem_array *sma;
> @@ -1303,7 +1310,7 @@ void exit_sem(struct task_struct *tsk)
>  		if (semid == -1)
>  			break;
> 
> -		sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
> +		sma = sem_lock_check(ipc_ns, un->semid);
> 
>  		/* exit_sem raced with IPC_RMID, nothing to do */
>  		if (IS_ERR(sma))
> @@ -1384,4 +1391,118 @@ static int sysvipc_sem_proc_show(struct 
>  			  sma->sem_otime,
>  			  sma->sem_ctime);
>  }
> +
> +struct undo_list_data {
> +	struct sem_undo_list *undo_list;
> +	struct ipc_namespace *ipc_ns;
> +};
> +
> +/* iterator */
> +static void *semundo_start(struct seq_file *m, loff_t *ppos)
> +{
> +	return NULL;
> +}
> +
> +static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
> +{
> +	return NULL;
> +}
> +
> +static void semundo_stop(struct seq_file *m, void *v)
> +{
> +	return;
> +}
> +
> +static int semundo_show(struct seq_file *m, void *v)
> +{
> +	return 0;
> +}
> +
> +static struct seq_operations semundo_op = {
> +	.start	= semundo_start,
> +	.next	= semundo_next,
> +	.stop	= semundo_stop,
> +	.show	= semundo_show
> +};
> +
> +static struct sem_undo_list *get_proc_ulp(struct task_struct *tsk)
> +{
> +	struct sem_undo_list *ulp;
> +
> +	rcu_read_lock();
> +	ulp = rcu_dereference(tsk->sysvsem.undo_list);
> +	if (ulp)
> +		if (!atomic_inc_not_zero(&ulp->refcnt))
> +			ulp = NULL;
> +	rcu_read_unlock();
> +	return ulp;
> +}
> +
> +static void put_proc_ulp(struct sem_undo_list *ulp,
> +					struct ipc_namespace *ns)
> +{
> +	if (ulp && atomic_dec_and_test(&ulp->refcnt))
> +		free_semundo_list(ulp, ns);
> +}
> +
> +/*
> + * semundo_open: open operation for /proc/<PID>/semundo file
> + */
> +static int semundo_open(struct inode *inode, struct file *file)
> +{
> +	struct task_struct *task;
> +	struct sem_undo_list *ulp;
> +	struct undo_list_data *data;
> +	struct ipc_namespace *ns;
> +	int ret = 0;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	task = get_pid_task(PROC_I(inode)->pid, PIDTYPE_PID);
> +	if (!task) {
> +		ret = -EINVAL;
> +		goto out_err;
> +	}
> +
> +	ulp = get_proc_ulp(task);
> +	ns = get_ipc_ns(task->nsproxy->ipc_ns);
> +	put_task_struct(task);
> +
> +	ret = seq_open(file, &semundo_op);
> +	if (!ret) {
> +		struct seq_file *m = file->private_data;
> +		data->undo_list = ulp;
> +		data->ipc_ns = ns;
> +		m->private = data;
> +		return 0;
> +	}
> +
> +	put_proc_ulp(ulp, ns);
> +	put_ipc_ns(ns);
> +out_err:
> +	kfree(data);
> +	return ret;
> +}
> +
> +static int semundo_release(struct inode *inode, struct file *file)
> +{
> +	struct seq_file *m = file->private_data;
> +	struct undo_list_data *data = m->private;
> +	struct sem_undo_list *ulp = data->undo_list;
> +	struct ipc_namespace *ns = data->ipc_ns;
> +
> +	put_proc_ulp(ulp, ns);
> +	put_ipc_ns(ns);
> +	kfree(data);
> +	return seq_release(inode, file);
> +}
> +
> +const struct file_operations proc_semundo_operations = {
> +	.open		= semundo_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= semundo_release,
> +};
>  #endif
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 3/6] IPC/sem: start/stop operations for /proc/pid/semundo
       [not found]   ` <20080625135539.139605000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 20:39     ` Serge E. Hallyn
  0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 20:39 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [03/06]
> 
> This patch introduces the .start and .stop seq operations for
> /proc/pid/semundo.
> 
> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  ipc/sem.c |   43 +++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 41 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 10:59:46.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:32:36.000000000 +0200
> @@ -1400,7 +1400,42 @@ struct undo_list_data {
>  /* iterator */
>  static void *semundo_start(struct seq_file *m, loff_t *ppos)
>  {
> -	return NULL;
> +	struct undo_list_data *data = m->private;
> +	struct sem_undo_list *ulp = data->undo_list;
> +	struct sem_undo	*undo;
> +	loff_t pos = *ppos;
> +
> +	if (!ulp)
> +		return NULL;
> +
> +	if (pos < 0)
> +		return NULL;
> +
> +	/* If ulp is not NULL, it means that we've successfully grabbed
> +	 * a refcnt in semundo_open. That prevents the undo_list from being
> +	 * freed.
> +	 *
> +	 * Note:
> +	 * 1) the sem_undo structure is RCU-protected. So take the rcu read
> +	 *    lock and only release it during the .stop operation.
> +	 * 2) we accept to release the undo_list lock (i.e. we allow the list
> +	 *    to change) between each iteration, instead of taking that lock
> +	 *    during the .start and releasing it during the .stop operation.
> +	 *    This is to reduce the performance impact on the access to the
> +	 *    undo_list.
> +	 */
> +	rcu_read_lock();
> +	spin_lock(&ulp->lock);
> +	list_for_each_entry_rcu(undo, &ulp->list_proc, list_proc) {
> +		if ((undo->semid != -1) && !(pos--))
> +			break;
> +	}
> +	spin_unlock(&ulp->lock);
> +
> +	if (&undo->list_proc == &ulp->list_proc)
> +		return NULL;
> +
> +	return undo;
>  }
> 
>  static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
> @@ -1410,7 +1445,11 @@ static void *semundo_next(struct seq_fil
> 
>  static void semundo_stop(struct seq_file *m, void *v)
>  {
> -	return;
> +	struct undo_list_data *data = m->private;
> +	struct sem_undo_list *ulp = data->undo_list;
> +
> +	if (ulp)
> +		rcu_read_unlock();
>  }
> 
>  static int semundo_show(struct seq_file *m, void *v)
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/6] IPC/sem: next operations for /proc/pid/semundo
       [not found]   ` <20080625135539.519489000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 20:57     ` Serge E. Hallyn
       [not found]       ` <20080625205719.GD16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 20:57 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [04/06]
> 
> This patch introduces the .next seq operation for /proc/pid/semundo.
> 
> What should be mentioned here is that the undo_list lock is released between
> between each iteration.
> Doing this, we only guarantee to access some valid data during the .show,

Ok so you count on an item sticking around for the duration of the
rcu_read_cycle().  exit_sem() is therefore not an issue.  The other
possible racer is freeary() as called from IPC_RMID, but while that
could remove this entry from the undo_list->list_proc, it will wait
an rcu cycle before it actually frees it.

Am I reading that all right?  If so, then:

> not to have a full coherent view of the whole list. But, oth, this reduces the
> the performance impact on the access to the undo_list.
> 
> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> 
> ---
>  ipc/sem.c |   23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:32:36.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:54:40.000000000 +0200
> @@ -1440,7 +1440,28 @@ static void *semundo_start(struct seq_fi
> 
>  static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
>  {
> -	return NULL;
> +	struct sem_undo	*undo = v;
> +	struct undo_list_data *data = m->private;
> +	struct sem_undo_list *ulp = data->undo_list;
> +
> +	/*
> +	 * No need to protect against ulp being NULL, if we are here,
> +	 * it can't be NULL.
> +	 */
> +	spin_lock(&ulp->lock);
> +
> +	do {
> +		undo = list_entry(rcu_dereference(undo->list_proc.next),
> +				struct sem_undo, list_proc);
> +
> +	} while (&undo->list_proc != &ulp->list_proc && undo->semid == -1);
> +
> +	++*ppos;
> +	spin_unlock(&ulp->lock);
> +
> +	if (&undo->list_proc == &ulp->list_proc)
> +		return NULL;
> +	return undo;
>  }
> 
>  static void semundo_stop(struct seq_file *m, void *v)
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 5/6] IPC/sem: .show operation for /proc/pid/semundo
       [not found]   ` <20080625135539.893049000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 20:58     ` Serge E. Hallyn
  0 siblings, 0 replies; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 20:58 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [05/06]
> 
> This patch introduces the .show seq operation for /proc/pid/semundo.
> 
> 
> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

> ---
>  ipc/sem.c |   28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:54:40.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
> @@ -1475,6 +1475,34 @@ static void semundo_stop(struct seq_file
> 
>  static int semundo_show(struct seq_file *m, void *v)
>  {
> +	struct undo_list_data *data = m->private;
> +	struct ipc_namespace *ns = data->ipc_ns;
> +	struct sem_undo	*u = v;
> +	int nsems, i;
> +	struct sem_array *sma;
> +
> +	/*
> +	 * This semid has been deleted, ignore it.
> +	 * Even if we skipped all sem_undo belonging to deleted semid
> +	 * in semundo_next(), some more deletions may have happened.
> +	 */
> +	if (u->semid == -1)
> +		return 0;
> +
> +	seq_printf(m, "%10d", u->semid);
> +
> +	sma = sem_lock(ns, u->semid);
> +	if (IS_ERR(sma))
> +		goto out;
> +
> +	nsems = sma->sem_nsems;
> +	sem_unlock(sma);
> +
> +	for (i = 0; i < nsems; i++)
> +		seq_printf(m, " %6d", u->semadj[i]);
> +
> +out:
> +	seq_putc(m, '\n');
>  	return 0;
>  }
> 
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
       [not found]   ` <20080625135540.271934000-6ktuUTfB/bM@public.gmane.org>
@ 2008-06-25 21:09     ` Serge E. Hallyn
       [not found]       ` <20080625210937.GF16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2008-06-27 14:06     ` Serge E. Hallyn
  1 sibling, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-25 21:09 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [06/06]
> 
> This patch introduces the .write seq operation for /proc/pid/semundo.
> 
> In order to simplify the locking strategy, the write operation is only allowed
> to 'current'.

There should also be a patch against Documentation/filesystems/proc.txt

> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
> 
> ---
>  fs/proc/base.c |    2 
>  ipc/sem.c      |  250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 250 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 13:21:44.000000000 +0200
> @@ -2526,7 +2526,7 @@ static const struct pid_entry tgid_base_
>  	INF("io",	S_IRUGO, tgid_io_accounting),
>  #endif
>  #ifdef CONFIG_SYSVIPC
> -	REG("semundo",  S_IRUGO, semundo),
> +	REG("semundo",  S_IWUSR|S_IRUGO, semundo),
>  #endif
>  };
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-25 08:56:07.000000000 +0200
> @@ -1554,7 +1554,27 @@ static int semundo_open(struct inode *in
>  		goto out_err;
>  	}
> 
> -	ulp = get_proc_ulp(task);
> +	if (file->f_flags & O_WRONLY || file->f_flags & O_RDWR) {
> +		/*
> +		 * Writing is only allowed to current
> +		 */
> +		if (task != current) {
> +			put_task_struct(task);
> +			ret = -EPERM;
> +			goto out_err;
> +		}
> +		/*
> +		 * No need to increment the undo_list's refcnt: only current
> +		 * can be freeing it through exit, and we are current.
> +		 * Only signal through the ulp NULL pointer that it won't be
> +		 * necessary to decrement the refcnt during release.
> +		 * Actually, in this path the undo_list will be gotten during
> +		 * the write operation.
> +		 */

(I'll probably regret asking this when I turn out to be obviously wrong,
but:)

What if I'm opening the file O_RDWR and first read the file?  You'll end
up showing no entries, right?

Which may be ok, so long as it's properly documented in proc.txt...

thanks,
-serge

> +		ulp = NULL;
> +	} else
> +		ulp = get_proc_ulp(task);
> +
>  	ns = get_ipc_ns(task->nsproxy->ipc_ns);
>  	put_task_struct(task);
> 
> @@ -1574,6 +1594,233 @@ out_err:
>  	return ret;
>  }
> 
> +/* Skip all spaces at the beginning of the buffer */
> +static inline int skip_space(const char __user **buf, size_t *len)
> +{
> +	char c = 0;
> +	while (*len) {
> +		if (get_user(c, *buf))
> +			return -EFAULT;
> +		if (c != '\t' && c != ' ')
> +			break;
> +		--*len;
> +		++*buf;
> +	}
> +	return c;
> +}
> +
> +/* Retrieve the first numerical value contained in the string.
> + * Note: The value is supposed to be a 32-bit integer.
> + */
> +static inline int get_next_value(const char __user **buf, size_t *len, int *val)
> +{
> +#define BUFLEN 11
> +	int err, neg = 0, left;
> +	char s[BUFLEN], *p;
> +
> +	err = skip_space(buf, len);
> +	if (err < 0)
> +		return err;
> +	if (!*len)
> +		return INT_MAX;
> +	if (err == '\n') {
> +		++*buf;
> +		--*len;
> +		return INT_MAX;
> +	}
> +	if (err == '-') {
> +		++*buf;
> +		--*len;
> +		neg = 1;
> +	}
> +
> +	left = *len;
> +	if (left > sizeof(s) - 1)
> +		left = sizeof(s) - 1;
> +	if (copy_from_user(s, *buf, left))
> +		return -EFAULT;
> +
> +	s[left] = 0;
> +	p = s;
> +	if (*p < '0' || *p > '9')
> +		return -EINVAL;
> +
> +	*val = simple_strtoul(p, &p, 0);
> +	if (neg)
> +		*val = -(*val);
> +
> +	left = p-s;
> +	(*len) -= left;
> +	(*buf) += left;
> +
> +	return 0;
> +#undef BUFLEN
> +}
> +
> +/*
> + * Reads a line from /proc/self/semundo.
> + * Returns the number of undo values read (or errcode upon failure).
> + * @id: pointer to the semid (filled in with 1st field in the line)
> + * @array: semundo values (filled in iwth remaining fields in the line).
> + * @array_len: max # of expected semundo values
> + */
> +static inline int semundo_readline(const char __user **buf, size_t *left,
> +				   int *id, short *array, int array_len)
> +{
> +	int i, val, err;
> +
> +	/* Read semid */
> +	err = get_next_value(buf, left, id);
> +	if (err)
> +		return err;
> +
> +	/* Read all (semundo-) values of a full line */
> +	for (i = 0; ; i++) {
> +		err = get_next_value(buf, left, &val);
> +		if (err < 0)
> +			return err;
> +		/* reached end of line or end of buffer */
> +		if (err == INT_MAX)
> +			break;
> +		/* Return an error if we get more values than expected */
> +		if (i < array_len)
> +			array[i] = val;
> +		else
> +			return -EINVAL;
> +	}
> +	return i;
> +}
> +
> +/*
> + * sets or updates the undo values for the undo_list of a given semaphore id.
> + * This is exactly the same code sequence as sys_semtimedop if we have undos.
> + */
> +static inline int semundo_update(struct ipc_namespace *ns, int semid,
> +				short *array, int size)
> +{
> +	struct sem_undo *un;
> +	struct sem_array *sma;
> +	int ret = 0;
> +
> +	un = find_alloc_undo(ns, semid);
> +	if (IS_ERR(un)) {
> +		ret = PTR_ERR(un);
> +		goto out;
> +	}
> +
> +	/* lookup the sem_array */
> +	sma = sem_lock(ns, semid);
> +	if (IS_ERR(sma)) {
> +		ret = PTR_ERR(sma);
> +		rcu_read_unlock();
> +		goto out;
> +	}
> +
> +	/*
> +	 * find_alloc_undo opened an rcu read section to protect un.
> +	 * Releasing it here is safe:
> +	 *    . sem_lock is held, so we are protected against IPC_RMID
> +	 *    . the refcnt won't fall to 0 since exit_sem only operates on
> +	 *      current and we are the current.
> +	 */
> +	rcu_read_unlock();
> +
> +	/*
> +	 * semid identifiers are not unique - find_alloc_undo() may have
> +	 * allocated an undo structure, it was invalidated by an RMID and now
> +	 * a new array received the same id.
> +	 * Check and fail.
> +	 * This case can be detected checking un->semid. The existance of
> +	 * "un" itself is guaranteed by rcu.
> +	 */
> +	if (un->semid == -1) {
> +		ret = -EIDRM;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * If the number of values given does not match the number of
> +	 * semaphores in the array, consider this as an error.
> +	 */
> +	if (size != sma->sem_nsems) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* update the undo values */
> +	while (--size >= 0)
> +		un->semadj[size] = array[size];
> +
> +	/* maybe some queued-up processes were waiting for this */
> +	update_queue(sma);
> +
> +out_unlock:
> +	sem_unlock(sma);
> +out:
> +	return ret;
> +}
> +
> +/*
> + * write operation for /proc/self/semundo file
> + *
> + * Only current is allowed to write to its semundo file.
> + *
> + * The expected string format is:
> + * "<semID> <val1> <val2> ... <valN>"
> + *
> + * It sets (or updates) the sem_undo list for 'current' and the target
> + * <semID>, to the given 'undo' values.
> + *
> + * <semID> must match an existing semaphore array.
> + * The number of values following <semID> must match the number of semaphores
> + * in the corresponding array.
> + *
> + * Multiple semID's can be passed simultaneously: newline ('\n') is considered
> + * as a separator in that case.
> + *
> + * Note: it is not allowed to set the sem_undo list for a given semID using
> + *       mutliple write calls.
> + */
> +static ssize_t semundo_write(struct file *file, const char __user *buf,
> +			     size_t count, loff_t *ppos)
> +{
> +	struct seq_file *m = file->private_data;
> +	short *array;
> +	int err, max_sem, semid = 0;
> +	size_t left = count;
> +	struct undo_list_data *data = m->private;
> +	struct ipc_namespace *ns = data->ipc_ns;
> +
> +	if (data->undo_list)
> +		/* Should never happen */
> +		return -EINVAL;
> +
> +	max_sem = ns->sc_semmsl;
> +
> +	array = kmalloc(sizeof(short)*max_sem, GFP_KERNEL);
> +	if (array == NULL)
> +		return -ENOMEM;
> +
> +	while (left) {
> +		int nval;
> +
> +		nval = semundo_readline(&buf, &left, &semid, array, max_sem);
> +		if (nval < 0) {
> +			err = nval;
> +			goto out;
> +		}
> +
> +		err = semundo_update(ns, semid, array, nval);
> +		if (err)
> +			goto out;
> +	}
> +	err = count - left;
> +
> +out:
> +	kfree(array);
> +	return err;
> +}
> +
>  static int semundo_release(struct inode *inode, struct file *file)
>  {
>  	struct seq_file *m = file->private_data;
> @@ -1590,6 +1837,7 @@ static int semundo_release(struct inode 
>  const struct file_operations proc_semundo_operations = {
>  	.open		= semundo_open,
>  	.read		= seq_read,
> +	.write		= semundo_write,
>  	.llseek		= seq_lseek,
>  	.release	= semundo_release,
>  };
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs
       [not found]   ` <20080625135538.762662000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 20:33     ` Serge E. Hallyn
@ 2008-06-26  5:08     ` Michael Kerrisk
       [not found]       ` <517f3f820806252208m1f5b91dfm38b8babff73adf72-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Michael Kerrisk @ 2008-06-26  5:08 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM@public.gmane.org
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Pierre Peiffer, Michael Kerrisk

On 6/25/08, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org> wrote:
> PATCH [02/06]
>
>  This patch adds a new procfs interface to display the per-process semundo
>  data.
>
>  A new per-PID file is added, called "semundo".
>  It contains one line per semaphore IPC where there is something to undo for
>  this process.
>  Then, each line contains the semid followed by each undo value
>  corresponding to each semaphores of the semaphores array.
>
>  This interface will be particularly useful to allow a user access
>  these data, for example for checkpointing a process
>
>  With this patch, the semundo file can only be accessed in read mode.
>  When this file is opened, if an undo_list exists for the target process, it
>  is accessed in an rcu read section, and its refcount is incremented, avoiding
>  that it be freed.
>  The reverse is done during the release operation, and the undo_list is
>  freed if the process reading the file was the last process accessing that
>  list.
>
>  Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
>  Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Nadia,

This is a kernel-userland interface change, please do CC me, so that I
can change the man pages if needed.

Cheers,

Michael



>  ---
>   fs/proc/base.c     |    3 +
>   fs/proc/internal.h |    1
>   ipc/sem.c          |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>   3 files changed, 128 insertions(+), 3 deletions(-)
>
>  Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
>  ===================================================================
>  --- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c    2008-06-24 09:05:09.000000000 +0200
>  +++ linux-2.6.26-rc5-mm3/fs/proc/base.c 2008-06-24 10:03:33.000000000 +0200
>  @@ -2525,6 +2525,9 @@ static const struct pid_entry tgid_base_
>   #ifdef CONFIG_TASK_IO_ACCOUNTING
>         INF("io",       S_IRUGO, tgid_io_accounting),
>   #endif
>  +#ifdef CONFIG_SYSVIPC
>  +       REG("semundo",  S_IRUGO, semundo),
>  +#endif
>   };
>
>   static int proc_tgid_base_readdir(struct file * filp,
>  Index: linux-2.6.26-rc5-mm3/fs/proc/internal.h
>  ===================================================================
>  --- linux-2.6.26-rc5-mm3.orig/fs/proc/internal.h        2008-06-24 09:05:09.000000000 +0200
>  +++ linux-2.6.26-rc5-mm3/fs/proc/internal.h     2008-06-24 10:04:19.000000000 +0200
>  @@ -65,6 +65,7 @@ extern const struct file_operations proc
>   extern const struct file_operations proc_net_operations;
>   extern const struct file_operations proc_kmsg_operations;
>   extern const struct inode_operations proc_net_inode_operations;
>  +extern const struct file_operations proc_semundo_operations;
>
>   void free_proc_entry(struct proc_dir_entry *de);
>
>  Index: linux-2.6.26-rc5-mm3/ipc/sem.c
>  ===================================================================
>  --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c 2008-06-24 09:37:33.000000000 +0200
>  +++ linux-2.6.26-rc5-mm3/ipc/sem.c      2008-06-24 10:59:46.000000000 +0200
>  @@ -97,6 +97,7 @@ static void freeary(struct ipc_namespace
>   #ifdef CONFIG_PROC_FS
>   static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
>   #endif
>  +static void free_semundo_list(struct sem_undo_list *, struct ipc_namespace *);
>
>   #define SEMMSL_FAST    256 /* 512 bytes on stack */
>   #define SEMOPM_FAST    64  /* ~ 372 bytes on stack */
>  @@ -1282,8 +1283,14 @@ void exit_sem(struct task_struct *tsk)
>         rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
>         synchronize_rcu();
>
>  -       if (!atomic_dec_and_test(&ulp->refcnt))
>  -               return;
>  +       if (atomic_dec_and_test(&ulp->refcnt))
>  +               free_semundo_list(ulp, tsk->nsproxy->ipc_ns);
>  +}
>  +
>  +static void free_semundo_list(struct sem_undo_list *ulp,
>  +                               struct ipc_namespace *ipc_ns)
>  +{
>  +       BUG_ON(atomic_read(&ulp->refcnt));
>
>         for (;;) {
>                 struct sem_array *sma;
>  @@ -1303,7 +1310,7 @@ void exit_sem(struct task_struct *tsk)
>                 if (semid == -1)
>                         break;
>
>  -               sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
>  +               sma = sem_lock_check(ipc_ns, un->semid);
>
>                 /* exit_sem raced with IPC_RMID, nothing to do */
>                 if (IS_ERR(sma))
>  @@ -1384,4 +1391,118 @@ static int sysvipc_sem_proc_show(struct
>                           sma->sem_otime,
>                           sma->sem_ctime);
>   }
>  +
>  +struct undo_list_data {
>  +       struct sem_undo_list *undo_list;
>  +       struct ipc_namespace *ipc_ns;
>  +};
>  +
>  +/* iterator */
>  +static void *semundo_start(struct seq_file *m, loff_t *ppos)
>  +{
>  +       return NULL;
>  +}
>  +
>  +static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
>  +{
>  +       return NULL;
>  +}
>  +
>  +static void semundo_stop(struct seq_file *m, void *v)
>  +{
>  +       return;
>  +}
>  +
>  +static int semundo_show(struct seq_file *m, void *v)
>  +{
>  +       return 0;
>  +}
>  +
>  +static struct seq_operations semundo_op = {
>  +       .start  = semundo_start,
>  +       .next   = semundo_next,
>  +       .stop   = semundo_stop,
>  +       .show   = semundo_show
>  +};
>  +
>  +static struct sem_undo_list *get_proc_ulp(struct task_struct *tsk)
>  +{
>  +       struct sem_undo_list *ulp;
>  +
>  +       rcu_read_lock();
>  +       ulp = rcu_dereference(tsk->sysvsem.undo_list);
>  +       if (ulp)
>  +               if (!atomic_inc_not_zero(&ulp->refcnt))
>  +                       ulp = NULL;
>  +       rcu_read_unlock();
>  +       return ulp;
>  +}
>  +
>  +static void put_proc_ulp(struct sem_undo_list *ulp,
>  +                                       struct ipc_namespace *ns)
>  +{
>  +       if (ulp && atomic_dec_and_test(&ulp->refcnt))
>  +               free_semundo_list(ulp, ns);
>  +}
>  +
>  +/*
>  + * semundo_open: open operation for /proc/<PID>/semundo file
>  + */
>  +static int semundo_open(struct inode *inode, struct file *file)
>  +{
>  +       struct task_struct *task;
>  +       struct sem_undo_list *ulp;
>  +       struct undo_list_data *data;
>  +       struct ipc_namespace *ns;
>  +       int ret = 0;
>  +
>  +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>  +       if (!data)
>  +               return -ENOMEM;
>  +
>  +       task = get_pid_task(PROC_I(inode)->pid, PIDTYPE_PID);
>  +       if (!task) {
>  +               ret = -EINVAL;
>  +               goto out_err;
>  +       }
>  +
>  +       ulp = get_proc_ulp(task);
>  +       ns = get_ipc_ns(task->nsproxy->ipc_ns);
>  +       put_task_struct(task);
>  +
>  +       ret = seq_open(file, &semundo_op);
>  +       if (!ret) {
>  +               struct seq_file *m = file->private_data;
>  +               data->undo_list = ulp;
>  +               data->ipc_ns = ns;
>  +               m->private = data;
>  +               return 0;
>  +       }
>  +
>  +       put_proc_ulp(ulp, ns);
>  +       put_ipc_ns(ns);
>  +out_err:
>  +       kfree(data);
>  +       return ret;
>  +}
>  +
>  +static int semundo_release(struct inode *inode, struct file *file)
>  +{
>  +       struct seq_file *m = file->private_data;
>  +       struct undo_list_data *data = m->private;
>  +       struct sem_undo_list *ulp = data->undo_list;
>  +       struct ipc_namespace *ns = data->ipc_ns;
>  +
>  +       put_proc_ulp(ulp, ns);
>  +       put_ipc_ns(ns);
>  +       kfree(data);
>  +       return seq_release(inode, file);
>  +}
>  +
>  +const struct file_operations proc_semundo_operations = {
>  +       .open           = semundo_open,
>  +       .read           = seq_read,
>  +       .llseek         = seq_lseek,
>  +       .release        = semundo_release,
>  +};
>   #endif
>
>
>  --
>  _______________________________________________
>  Containers mailing list
>  Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>  https://lists.linux-foundation.org/mailman/listinfo/containers
>


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/6] IPC/sem: next operations for /proc/pid/semundo
       [not found]       ` <20080625205719.GD16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-06-26  5:35         ` Nadia Derbey
  0 siblings, 0 replies; 20+ messages in thread
From: Nadia Derbey @ 2008-06-26  5:35 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge E. Hallyn wrote:
> Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> 
>>PATCH [04/06]
>>
>>This patch introduces the .next seq operation for /proc/pid/semundo.
>>
>>What should be mentioned here is that the undo_list lock is released between
>>between each iteration.
>>Doing this, we only guarantee to access some valid data during the .show,
> 
> 
> Ok so you count on an item sticking around for the duration of the
> rcu_read_cycle().  exit_sem() is therefore not an issue.  The other
> possible racer is freeary() as called from IPC_RMID, but while that
> could remove this entry from the undo_list->list_proc, it will wait
> an rcu cycle before it actually frees it.

Yes, the sem_undo structure is freed in an rcu callback in freeary() too.

> 
> Am I reading that all right?  If so, then:
> 
> 
>>not to have a full coherent view of the whole list. But, oth, this reduces the
>>the performance impact on the access to the undo_list.
>>
>>Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
>>Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
> 
> 
> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
>>---
>> ipc/sem.c |   23 ++++++++++++++++++++++-
>> 1 file changed, 22 insertions(+), 1 deletion(-)
>>
>>Index: linux-2.6.26-rc5-mm3/ipc/sem.c
>>===================================================================
>>--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:32:36.000000000 +0200
>>+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-24 12:54:40.000000000 +0200
>>@@ -1440,7 +1440,28 @@ static void *semundo_start(struct seq_fi
>>
>> static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
>> {
>>-	return NULL;
>>+	struct sem_undo	*undo = v;
>>+	struct undo_list_data *data = m->private;
>>+	struct sem_undo_list *ulp = data->undo_list;
>>+
>>+	/*
>>+	 * No need to protect against ulp being NULL, if we are here,
>>+	 * it can't be NULL.
>>+	 */
>>+	spin_lock(&ulp->lock);
>>+
>>+	do {
>>+		undo = list_entry(rcu_dereference(undo->list_proc.next),
>>+				struct sem_undo, list_proc);
>>+
>>+	} while (&undo->list_proc != &ulp->list_proc && undo->semid == -1);
>>+
>>+	++*ppos;
>>+	spin_unlock(&ulp->lock);
>>+
>>+	if (&undo->list_proc == &ulp->list_proc)
>>+		return NULL;
>>+	return undo;
>> }
>>
>> static void semundo_stop(struct seq_file *m, void *v)
>>
>>--
> 
> 
> 


-- 
===============================================================
Name.......... Nadia DERBEY
Organization.. BULL/DT/OSwR&D/Linux
---------------------------------------------------------------
Email......... mailto:Nadia.Derbey-6ktuUTfB/bM@public.gmane.org
Address....... BULL, B.P. 208, 38432 Echirolles Cedex, France
Tel........... (33) 76 29 77 62 [Internal Bull: (229) 77 62]
Telex,Fax..... 980648 F - (33) 76 29 76 00
Internal Bull. Mail: FREC-B1208
===============================================================

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
       [not found]       ` <20080625210937.GF16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-06-26  5:44         ` Nadia Derbey
  0 siblings, 0 replies; 20+ messages in thread
From: Nadia Derbey @ 2008-06-26  5:44 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge E. Hallyn wrote:
> Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> 
>>PATCH [06/06]
>>
>>This patch introduces the .write seq operation for /proc/pid/semundo.
>>
>>In order to simplify the locking strategy, the write operation is only allowed
>>to 'current'.
> 
> 
> There should also be a patch against Documentation/filesystems/proc.txt
> 

Ok, will do that.

> 
>>Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
>>
>>---
>> fs/proc/base.c |    2 
>> ipc/sem.c      |  250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 250 insertions(+), 2 deletions(-)
>>
>>Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
>>===================================================================
>>--- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
>>+++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 13:21:44.000000000 +0200
>>@@ -2526,7 +2526,7 @@ static const struct pid_entry tgid_base_
>> 	INF("io",	S_IRUGO, tgid_io_accounting),
>> #endif
>> #ifdef CONFIG_SYSVIPC
>>-	REG("semundo",  S_IRUGO, semundo),
>>+	REG("semundo",  S_IWUSR|S_IRUGO, semundo),
>> #endif
>> };
>>
>>Index: linux-2.6.26-rc5-mm3/ipc/sem.c
>>===================================================================
>>--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
>>+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-25 08:56:07.000000000 +0200
>>@@ -1554,7 +1554,27 @@ static int semundo_open(struct inode *in
>> 		goto out_err;
>> 	}
>>
>>-	ulp = get_proc_ulp(task);
>>+	if (file->f_flags & O_WRONLY || file->f_flags & O_RDWR) {
>>+		/*
>>+		 * Writing is only allowed to current
>>+		 */
>>+		if (task != current) {
>>+			put_task_struct(task);
>>+			ret = -EPERM;
>>+			goto out_err;
>>+		}
>>+		/*
>>+		 * No need to increment the undo_list's refcnt: only current
>>+		 * can be freeing it through exit, and we are current.
>>+		 * Only signal through the ulp NULL pointer that it won't be
>>+		 * necessary to decrement the refcnt during release.
>>+		 * Actually, in this path the undo_list will be gotten during
>>+		 * the write operation.
>>+		 */
> 
> 
> (I'll probably regret asking this when I turn out to be obviously wrong,
> but:)
> 
> What if I'm opening the file O_RDWR and first read the file?  You'll end
> up showing no entries, right?

Yes, exactly. I know this is weird, and I 1st thought of delaying the 
check on task = current until the actual .write operation. But it would 
have made things a bit more complicated:
. add the task pointer to the private data
. delay the put_task_struct

> 
> Which may be ok, so long as it's properly documented in proc.txt...

Ok, will do that

> 
> thanks,
> -serge
> 
> 
>>+		ulp = NULL;
>>+	} else
>>+		ulp = get_proc_ulp(task);
>>+
>> 	ns = get_ipc_ns(task->nsproxy->ipc_ns);
>> 	put_task_struct(task);
>>
>>@@ -1574,6 +1594,233 @@ out_err:
>> 	return ret;
>> }
>>
>>+/* Skip all spaces at the beginning of the buffer */
>>+static inline int skip_space(const char __user **buf, size_t *len)
>>+{
>>+	char c = 0;
>>+	while (*len) {
>>+		if (get_user(c, *buf))
>>+			return -EFAULT;
>>+		if (c != '\t' && c != ' ')
>>+			break;
>>+		--*len;
>>+		++*buf;
>>+	}
>>+	return c;
>>+}
>>+
>>+/* Retrieve the first numerical value contained in the string.
>>+ * Note: The value is supposed to be a 32-bit integer.
>>+ */
>>+static inline int get_next_value(const char __user **buf, size_t *len, int *val)
>>+{
>>+#define BUFLEN 11
>>+	int err, neg = 0, left;
>>+	char s[BUFLEN], *p;
>>+
>>+	err = skip_space(buf, len);
>>+	if (err < 0)
>>+		return err;
>>+	if (!*len)
>>+		return INT_MAX;
>>+	if (err == '\n') {
>>+		++*buf;
>>+		--*len;
>>+		return INT_MAX;
>>+	}
>>+	if (err == '-') {
>>+		++*buf;
>>+		--*len;
>>+		neg = 1;
>>+	}
>>+
>>+	left = *len;
>>+	if (left > sizeof(s) - 1)
>>+		left = sizeof(s) - 1;
>>+	if (copy_from_user(s, *buf, left))
>>+		return -EFAULT;
>>+
>>+	s[left] = 0;
>>+	p = s;
>>+	if (*p < '0' || *p > '9')
>>+		return -EINVAL;
>>+
>>+	*val = simple_strtoul(p, &p, 0);
>>+	if (neg)
>>+		*val = -(*val);
>>+
>>+	left = p-s;
>>+	(*len) -= left;
>>+	(*buf) += left;
>>+
>>+	return 0;
>>+#undef BUFLEN
>>+}
>>+
>>+/*
>>+ * Reads a line from /proc/self/semundo.
>>+ * Returns the number of undo values read (or errcode upon failure).
>>+ * @id: pointer to the semid (filled in with 1st field in the line)
>>+ * @array: semundo values (filled in iwth remaining fields in the line).
>>+ * @array_len: max # of expected semundo values
>>+ */
>>+static inline int semundo_readline(const char __user **buf, size_t *left,
>>+				   int *id, short *array, int array_len)
>>+{
>>+	int i, val, err;
>>+
>>+	/* Read semid */
>>+	err = get_next_value(buf, left, id);
>>+	if (err)
>>+		return err;
>>+
>>+	/* Read all (semundo-) values of a full line */
>>+	for (i = 0; ; i++) {
>>+		err = get_next_value(buf, left, &val);
>>+		if (err < 0)
>>+			return err;
>>+		/* reached end of line or end of buffer */
>>+		if (err == INT_MAX)
>>+			break;
>>+		/* Return an error if we get more values than expected */
>>+		if (i < array_len)
>>+			array[i] = val;
>>+		else
>>+			return -EINVAL;
>>+	}
>>+	return i;
>>+}
>>+
>>+/*
>>+ * sets or updates the undo values for the undo_list of a given semaphore id.
>>+ * This is exactly the same code sequence as sys_semtimedop if we have undos.
>>+ */
>>+static inline int semundo_update(struct ipc_namespace *ns, int semid,
>>+				short *array, int size)
>>+{
>>+	struct sem_undo *un;
>>+	struct sem_array *sma;
>>+	int ret = 0;
>>+
>>+	un = find_alloc_undo(ns, semid);
>>+	if (IS_ERR(un)) {
>>+		ret = PTR_ERR(un);
>>+		goto out;
>>+	}
>>+
>>+	/* lookup the sem_array */
>>+	sma = sem_lock(ns, semid);
>>+	if (IS_ERR(sma)) {
>>+		ret = PTR_ERR(sma);
>>+		rcu_read_unlock();
>>+		goto out;
>>+	}
>>+
>>+	/*
>>+	 * find_alloc_undo opened an rcu read section to protect un.
>>+	 * Releasing it here is safe:
>>+	 *    . sem_lock is held, so we are protected against IPC_RMID
>>+	 *    . the refcnt won't fall to 0 since exit_sem only operates on
>>+	 *      current and we are the current.
>>+	 */
>>+	rcu_read_unlock();
>>+
>>+	/*
>>+	 * semid identifiers are not unique - find_alloc_undo() may have
>>+	 * allocated an undo structure, it was invalidated by an RMID and now
>>+	 * a new array received the same id.
>>+	 * Check and fail.
>>+	 * This case can be detected checking un->semid. The existance of
>>+	 * "un" itself is guaranteed by rcu.
>>+	 */
>>+	if (un->semid == -1) {
>>+		ret = -EIDRM;
>>+		goto out_unlock;
>>+	}
>>+
>>+	/*
>>+	 * If the number of values given does not match the number of
>>+	 * semaphores in the array, consider this as an error.
>>+	 */
>>+	if (size != sma->sem_nsems) {
>>+		ret = -EINVAL;
>>+		goto out_unlock;
>>+	}
>>+
>>+	/* update the undo values */
>>+	while (--size >= 0)
>>+		un->semadj[size] = array[size];
>>+
>>+	/* maybe some queued-up processes were waiting for this */
>>+	update_queue(sma);
>>+
>>+out_unlock:
>>+	sem_unlock(sma);
>>+out:
>>+	return ret;
>>+}
>>+
>>+/*
>>+ * write operation for /proc/self/semundo file
>>+ *
>>+ * Only current is allowed to write to its semundo file.
>>+ *
>>+ * The expected string format is:
>>+ * "<semID> <val1> <val2> ... <valN>"
>>+ *
>>+ * It sets (or updates) the sem_undo list for 'current' and the target
>>+ * <semID>, to the given 'undo' values.
>>+ *
>>+ * <semID> must match an existing semaphore array.
>>+ * The number of values following <semID> must match the number of semaphores
>>+ * in the corresponding array.
>>+ *
>>+ * Multiple semID's can be passed simultaneously: newline ('\n') is considered
>>+ * as a separator in that case.
>>+ *
>>+ * Note: it is not allowed to set the sem_undo list for a given semID using
>>+ *       mutliple write calls.
>>+ */
>>+static ssize_t semundo_write(struct file *file, const char __user *buf,
>>+			     size_t count, loff_t *ppos)
>>+{
>>+	struct seq_file *m = file->private_data;
>>+	short *array;
>>+	int err, max_sem, semid = 0;
>>+	size_t left = count;
>>+	struct undo_list_data *data = m->private;
>>+	struct ipc_namespace *ns = data->ipc_ns;
>>+
>>+	if (data->undo_list)
>>+		/* Should never happen */
>>+		return -EINVAL;
>>+
>>+	max_sem = ns->sc_semmsl;
>>+
>>+	array = kmalloc(sizeof(short)*max_sem, GFP_KERNEL);
>>+	if (array == NULL)
>>+		return -ENOMEM;
>>+
>>+	while (left) {
>>+		int nval;
>>+
>>+		nval = semundo_readline(&buf, &left, &semid, array, max_sem);
>>+		if (nval < 0) {
>>+			err = nval;
>>+			goto out;
>>+		}
>>+
>>+		err = semundo_update(ns, semid, array, nval);
>>+		if (err)
>>+			goto out;
>>+	}
>>+	err = count - left;
>>+
>>+out:
>>+	kfree(array);
>>+	return err;
>>+}
>>+
>> static int semundo_release(struct inode *inode, struct file *file)
>> {
>> 	struct seq_file *m = file->private_data;
>>@@ -1590,6 +1837,7 @@ static int semundo_release(struct inode 
>> const struct file_operations proc_semundo_operations = {
>> 	.open		= semundo_open,
>> 	.read		= seq_read,
>>+	.write		= semundo_write,
>> 	.llseek		= seq_lseek,
>> 	.release	= semundo_release,
>> };
>>
>>--
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs
       [not found]       ` <517f3f820806252208m1f5b91dfm38b8babff73adf72-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-06-26  6:07         ` Nadia Derbey
  0 siblings, 0 replies; 20+ messages in thread
From: Nadia Derbey @ 2008-06-26  6:07 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Michael Kerrisk wrote:
> On 6/25/08, Nadia.Derbey-6ktuUTfB/bM@public.gmane.org <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org> wrote:
> 
>>PATCH [02/06]
>>
>> This patch adds a new procfs interface to display the per-process semundo
>> data.
>>
>> A new per-PID file is added, called "semundo".
>> It contains one line per semaphore IPC where there is something to undo for
>> this process.
>> Then, each line contains the semid followed by each undo value
>> corresponding to each semaphores of the semaphores array.
>>
>> This interface will be particularly useful to allow a user access
>> these data, for example for checkpointing a process
>>
>> With this patch, the semundo file can only be accessed in read mode.
>> When this file is opened, if an undo_list exists for the target process, it
>> is accessed in an rcu read section, and its refcount is incremented, avoiding
>> that it be freed.
>> The reverse is done during the release operation, and the undo_list is
>> freed if the process reading the file was the last process accessing that
>> list.
>>
>> Signed-off-by: Pierre Peiffer <pierre.peiffer-6ktuUTfB/bM@public.gmane.org>
>> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
> 
> 
> Nadia,
> 
> This is a kernel-userland interface change, please do CC me, so that I
> can change the man pages if needed.

So sorry Michael, but looks like I forgot the complete documentation 
part: I also have an issue from Serge about a missing documentation in 
the Documentation directory.
Will fix that and take that opportunity to put you in the loop.

Regards,
Nadia

> 
> Cheers,
> 
> Michael
> 
> 
> 
> 
>> ---
>>  fs/proc/base.c     |    3 +
>>  fs/proc/internal.h |    1
>>  ipc/sem.c          |  127 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>>  3 files changed, 128 insertions(+), 3 deletions(-)
>>
>> Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
>> ===================================================================
>> --- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c    2008-06-24 09:05:09.000000000 +0200
>> +++ linux-2.6.26-rc5-mm3/fs/proc/base.c 2008-06-24 10:03:33.000000000 +0200
>> @@ -2525,6 +2525,9 @@ static const struct pid_entry tgid_base_
>>  #ifdef CONFIG_TASK_IO_ACCOUNTING
>>        INF("io",       S_IRUGO, tgid_io_accounting),
>>  #endif
>> +#ifdef CONFIG_SYSVIPC
>> +       REG("semundo",  S_IRUGO, semundo),
>> +#endif
>>  };
>>
>>  static int proc_tgid_base_readdir(struct file * filp,
>> Index: linux-2.6.26-rc5-mm3/fs/proc/internal.h
>> ===================================================================
>> --- linux-2.6.26-rc5-mm3.orig/fs/proc/internal.h        2008-06-24 09:05:09.000000000 +0200
>> +++ linux-2.6.26-rc5-mm3/fs/proc/internal.h     2008-06-24 10:04:19.000000000 +0200
>> @@ -65,6 +65,7 @@ extern const struct file_operations proc
>>  extern const struct file_operations proc_net_operations;
>>  extern const struct file_operations proc_kmsg_operations;
>>  extern const struct inode_operations proc_net_inode_operations;
>> +extern const struct file_operations proc_semundo_operations;
>>
>>  void free_proc_entry(struct proc_dir_entry *de);
>>
>> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
>> ===================================================================
>> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c 2008-06-24 09:37:33.000000000 +0200
>> +++ linux-2.6.26-rc5-mm3/ipc/sem.c      2008-06-24 10:59:46.000000000 +0200
>> @@ -97,6 +97,7 @@ static void freeary(struct ipc_namespace
>>  #ifdef CONFIG_PROC_FS
>>  static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
>>  #endif
>> +static void free_semundo_list(struct sem_undo_list *, struct ipc_namespace *);
>>
>>  #define SEMMSL_FAST    256 /* 512 bytes on stack */
>>  #define SEMOPM_FAST    64  /* ~ 372 bytes on stack */
>> @@ -1282,8 +1283,14 @@ void exit_sem(struct task_struct *tsk)
>>        rcu_assign_pointer(tsk->sysvsem.undo_list, NULL);
>>        synchronize_rcu();
>>
>> -       if (!atomic_dec_and_test(&ulp->refcnt))
>> -               return;
>> +       if (atomic_dec_and_test(&ulp->refcnt))
>> +               free_semundo_list(ulp, tsk->nsproxy->ipc_ns);
>> +}
>> +
>> +static void free_semundo_list(struct sem_undo_list *ulp,
>> +                               struct ipc_namespace *ipc_ns)
>> +{
>> +       BUG_ON(atomic_read(&ulp->refcnt));
>>
>>        for (;;) {
>>                struct sem_array *sma;
>> @@ -1303,7 +1310,7 @@ void exit_sem(struct task_struct *tsk)
>>                if (semid == -1)
>>                        break;
>>
>> -               sma = sem_lock_check(tsk->nsproxy->ipc_ns, un->semid);
>> +               sma = sem_lock_check(ipc_ns, un->semid);
>>
>>                /* exit_sem raced with IPC_RMID, nothing to do */
>>                if (IS_ERR(sma))
>> @@ -1384,4 +1391,118 @@ static int sysvipc_sem_proc_show(struct
>>                          sma->sem_otime,
>>                          sma->sem_ctime);
>>  }
>> +
>> +struct undo_list_data {
>> +       struct sem_undo_list *undo_list;
>> +       struct ipc_namespace *ipc_ns;
>> +};
>> +
>> +/* iterator */
>> +static void *semundo_start(struct seq_file *m, loff_t *ppos)
>> +{
>> +       return NULL;
>> +}
>> +
>> +static void *semundo_next(struct seq_file *m, void *v, loff_t *ppos)
>> +{
>> +       return NULL;
>> +}
>> +
>> +static void semundo_stop(struct seq_file *m, void *v)
>> +{
>> +       return;
>> +}
>> +
>> +static int semundo_show(struct seq_file *m, void *v)
>> +{
>> +       return 0;
>> +}
>> +
>> +static struct seq_operations semundo_op = {
>> +       .start  = semundo_start,
>> +       .next   = semundo_next,
>> +       .stop   = semundo_stop,
>> +       .show   = semundo_show
>> +};
>> +
>> +static struct sem_undo_list *get_proc_ulp(struct task_struct *tsk)
>> +{
>> +       struct sem_undo_list *ulp;
>> +
>> +       rcu_read_lock();
>> +       ulp = rcu_dereference(tsk->sysvsem.undo_list);
>> +       if (ulp)
>> +               if (!atomic_inc_not_zero(&ulp->refcnt))
>> +                       ulp = NULL;
>> +       rcu_read_unlock();
>> +       return ulp;
>> +}
>> +
>> +static void put_proc_ulp(struct sem_undo_list *ulp,
>> +                                       struct ipc_namespace *ns)
>> +{
>> +       if (ulp && atomic_dec_and_test(&ulp->refcnt))
>> +               free_semundo_list(ulp, ns);
>> +}
>> +
>> +/*
>> + * semundo_open: open operation for /proc/<PID>/semundo file
>> + */
>> +static int semundo_open(struct inode *inode, struct file *file)
>> +{
>> +       struct task_struct *task;
>> +       struct sem_undo_list *ulp;
>> +       struct undo_list_data *data;
>> +       struct ipc_namespace *ns;
>> +       int ret = 0;
>> +
>> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +       if (!data)
>> +               return -ENOMEM;
>> +
>> +       task = get_pid_task(PROC_I(inode)->pid, PIDTYPE_PID);
>> +       if (!task) {
>> +               ret = -EINVAL;
>> +               goto out_err;
>> +       }
>> +
>> +       ulp = get_proc_ulp(task);
>> +       ns = get_ipc_ns(task->nsproxy->ipc_ns);
>> +       put_task_struct(task);
>> +
>> +       ret = seq_open(file, &semundo_op);
>> +       if (!ret) {
>> +               struct seq_file *m = file->private_data;
>> +               data->undo_list = ulp;
>> +               data->ipc_ns = ns;
>> +               m->private = data;
>> +               return 0;
>> +       }
>> +
>> +       put_proc_ulp(ulp, ns);
>> +       put_ipc_ns(ns);
>> +out_err:
>> +       kfree(data);
>> +       return ret;
>> +}
>> +
>> +static int semundo_release(struct inode *inode, struct file *file)
>> +{
>> +       struct seq_file *m = file->private_data;
>> +       struct undo_list_data *data = m->private;
>> +       struct sem_undo_list *ulp = data->undo_list;
>> +       struct ipc_namespace *ns = data->ipc_ns;
>> +
>> +       put_proc_ulp(ulp, ns);
>> +       put_ipc_ns(ns);
>> +       kfree(data);
>> +       return seq_release(inode, file);
>> +}
>> +
>> +const struct file_operations proc_semundo_operations = {
>> +       .open           = semundo_open,
>> +       .read           = seq_read,
>> +       .llseek         = seq_lseek,
>> +       .release        = semundo_release,
>> +};
>>  #endif
>>
>>
>> --
>> _______________________________________________
>> Containers mailing list
>> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers
>>
> 
> 
> 


-- 
===============================================================
Name.......... Nadia DERBEY
Organization.. BULL/DT/OSwR&D/Linux
---------------------------------------------------------------
Email......... mailto:Nadia.Derbey-6ktuUTfB/bM@public.gmane.org
Address....... BULL, B.P. 208, 38432 Echirolles Cedex, France
Tel........... (33) 76 29 77 62 [Internal Bull: (229) 77 62]
Telex,Fax..... 980648 F - (33) 76 29 76 00
Internal Bull. Mail: FREC-B1208
===============================================================

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
       [not found]   ` <20080625135540.271934000-6ktuUTfB/bM@public.gmane.org>
  2008-06-25 21:09     ` Serge E. Hallyn
@ 2008-06-27 14:06     ` Serge E. Hallyn
       [not found]       ` <20080627140607.GA20581-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 20+ messages in thread
From: Serge E. Hallyn @ 2008-06-27 14:06 UTC (permalink / raw)
  To: Nadia.Derbey-6ktuUTfB/bM
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> PATCH [06/06]
> 
> This patch introduces the .write seq operation for /proc/pid/semundo.
> 
> In order to simplify the locking strategy, the write operation is only allowed
> to 'current'.
> 
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>

Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

Have you written a patch to cryo in light of the new (target==current)
restriction?

thanks,
-serge

> 
> ---
>  fs/proc/base.c |    2 
>  ipc/sem.c      |  250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 250 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 13:21:44.000000000 +0200
> @@ -2526,7 +2526,7 @@ static const struct pid_entry tgid_base_
>  	INF("io",	S_IRUGO, tgid_io_accounting),
>  #endif
>  #ifdef CONFIG_SYSVIPC
> -	REG("semundo",  S_IRUGO, semundo),
> +	REG("semundo",  S_IWUSR|S_IRUGO, semundo),
>  #endif
>  };
> 
> Index: linux-2.6.26-rc5-mm3/ipc/sem.c
> ===================================================================
> --- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
> +++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-25 08:56:07.000000000 +0200
> @@ -1554,7 +1554,27 @@ static int semundo_open(struct inode *in
>  		goto out_err;
>  	}
> 
> -	ulp = get_proc_ulp(task);
> +	if (file->f_flags & O_WRONLY || file->f_flags & O_RDWR) {
> +		/*
> +		 * Writing is only allowed to current
> +		 */
> +		if (task != current) {
> +			put_task_struct(task);
> +			ret = -EPERM;
> +			goto out_err;
> +		}
> +		/*
> +		 * No need to increment the undo_list's refcnt: only current
> +		 * can be freeing it through exit, and we are current.
> +		 * Only signal through the ulp NULL pointer that it won't be
> +		 * necessary to decrement the refcnt during release.
> +		 * Actually, in this path the undo_list will be gotten during
> +		 * the write operation.
> +		 */
> +		ulp = NULL;
> +	} else
> +		ulp = get_proc_ulp(task);
> +
>  	ns = get_ipc_ns(task->nsproxy->ipc_ns);
>  	put_task_struct(task);
> 
> @@ -1574,6 +1594,233 @@ out_err:
>  	return ret;
>  }
> 
> +/* Skip all spaces at the beginning of the buffer */
> +static inline int skip_space(const char __user **buf, size_t *len)
> +{
> +	char c = 0;
> +	while (*len) {
> +		if (get_user(c, *buf))
> +			return -EFAULT;
> +		if (c != '\t' && c != ' ')
> +			break;
> +		--*len;
> +		++*buf;
> +	}
> +	return c;
> +}
> +
> +/* Retrieve the first numerical value contained in the string.
> + * Note: The value is supposed to be a 32-bit integer.
> + */
> +static inline int get_next_value(const char __user **buf, size_t *len, int *val)
> +{
> +#define BUFLEN 11
> +	int err, neg = 0, left;
> +	char s[BUFLEN], *p;
> +
> +	err = skip_space(buf, len);
> +	if (err < 0)
> +		return err;
> +	if (!*len)
> +		return INT_MAX;
> +	if (err == '\n') {
> +		++*buf;
> +		--*len;
> +		return INT_MAX;
> +	}
> +	if (err == '-') {
> +		++*buf;
> +		--*len;
> +		neg = 1;
> +	}
> +
> +	left = *len;
> +	if (left > sizeof(s) - 1)
> +		left = sizeof(s) - 1;
> +	if (copy_from_user(s, *buf, left))
> +		return -EFAULT;
> +
> +	s[left] = 0;
> +	p = s;
> +	if (*p < '0' || *p > '9')
> +		return -EINVAL;
> +
> +	*val = simple_strtoul(p, &p, 0);
> +	if (neg)
> +		*val = -(*val);
> +
> +	left = p-s;
> +	(*len) -= left;
> +	(*buf) += left;
> +
> +	return 0;
> +#undef BUFLEN
> +}
> +
> +/*
> + * Reads a line from /proc/self/semundo.
> + * Returns the number of undo values read (or errcode upon failure).
> + * @id: pointer to the semid (filled in with 1st field in the line)
> + * @array: semundo values (filled in iwth remaining fields in the line).
> + * @array_len: max # of expected semundo values
> + */
> +static inline int semundo_readline(const char __user **buf, size_t *left,
> +				   int *id, short *array, int array_len)
> +{
> +	int i, val, err;
> +
> +	/* Read semid */
> +	err = get_next_value(buf, left, id);
> +	if (err)
> +		return err;
> +
> +	/* Read all (semundo-) values of a full line */
> +	for (i = 0; ; i++) {
> +		err = get_next_value(buf, left, &val);
> +		if (err < 0)
> +			return err;
> +		/* reached end of line or end of buffer */
> +		if (err == INT_MAX)
> +			break;
> +		/* Return an error if we get more values than expected */
> +		if (i < array_len)
> +			array[i] = val;
> +		else
> +			return -EINVAL;
> +	}
> +	return i;
> +}
> +
> +/*
> + * sets or updates the undo values for the undo_list of a given semaphore id.
> + * This is exactly the same code sequence as sys_semtimedop if we have undos.
> + */
> +static inline int semundo_update(struct ipc_namespace *ns, int semid,
> +				short *array, int size)
> +{
> +	struct sem_undo *un;
> +	struct sem_array *sma;
> +	int ret = 0;
> +
> +	un = find_alloc_undo(ns, semid);
> +	if (IS_ERR(un)) {
> +		ret = PTR_ERR(un);
> +		goto out;
> +	}
> +
> +	/* lookup the sem_array */
> +	sma = sem_lock(ns, semid);
> +	if (IS_ERR(sma)) {
> +		ret = PTR_ERR(sma);
> +		rcu_read_unlock();
> +		goto out;
> +	}
> +
> +	/*
> +	 * find_alloc_undo opened an rcu read section to protect un.
> +	 * Releasing it here is safe:
> +	 *    . sem_lock is held, so we are protected against IPC_RMID
> +	 *    . the refcnt won't fall to 0 since exit_sem only operates on
> +	 *      current and we are the current.
> +	 */
> +	rcu_read_unlock();
> +
> +	/*
> +	 * semid identifiers are not unique - find_alloc_undo() may have
> +	 * allocated an undo structure, it was invalidated by an RMID and now
> +	 * a new array received the same id.
> +	 * Check and fail.
> +	 * This case can be detected checking un->semid. The existance of
> +	 * "un" itself is guaranteed by rcu.
> +	 */
> +	if (un->semid == -1) {
> +		ret = -EIDRM;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * If the number of values given does not match the number of
> +	 * semaphores in the array, consider this as an error.
> +	 */
> +	if (size != sma->sem_nsems) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* update the undo values */
> +	while (--size >= 0)
> +		un->semadj[size] = array[size];
> +
> +	/* maybe some queued-up processes were waiting for this */
> +	update_queue(sma);
> +
> +out_unlock:
> +	sem_unlock(sma);
> +out:
> +	return ret;
> +}
> +
> +/*
> + * write operation for /proc/self/semundo file
> + *
> + * Only current is allowed to write to its semundo file.
> + *
> + * The expected string format is:
> + * "<semID> <val1> <val2> ... <valN>"
> + *
> + * It sets (or updates) the sem_undo list for 'current' and the target
> + * <semID>, to the given 'undo' values.
> + *
> + * <semID> must match an existing semaphore array.
> + * The number of values following <semID> must match the number of semaphores
> + * in the corresponding array.
> + *
> + * Multiple semID's can be passed simultaneously: newline ('\n') is considered
> + * as a separator in that case.
> + *
> + * Note: it is not allowed to set the sem_undo list for a given semID using
> + *       mutliple write calls.
> + */
> +static ssize_t semundo_write(struct file *file, const char __user *buf,
> +			     size_t count, loff_t *ppos)
> +{
> +	struct seq_file *m = file->private_data;
> +	short *array;
> +	int err, max_sem, semid = 0;
> +	size_t left = count;
> +	struct undo_list_data *data = m->private;
> +	struct ipc_namespace *ns = data->ipc_ns;
> +
> +	if (data->undo_list)
> +		/* Should never happen */
> +		return -EINVAL;
> +
> +	max_sem = ns->sc_semmsl;
> +
> +	array = kmalloc(sizeof(short)*max_sem, GFP_KERNEL);
> +	if (array == NULL)
> +		return -ENOMEM;
> +
> +	while (left) {
> +		int nval;
> +
> +		nval = semundo_readline(&buf, &left, &semid, array, max_sem);
> +		if (nval < 0) {
> +			err = nval;
> +			goto out;
> +		}
> +
> +		err = semundo_update(ns, semid, array, nval);
> +		if (err)
> +			goto out;
> +	}
> +	err = count - left;
> +
> +out:
> +	kfree(array);
> +	return err;
> +}
> +
>  static int semundo_release(struct inode *inode, struct file *file)
>  {
>  	struct seq_file *m = file->private_data;
> @@ -1590,6 +1837,7 @@ static int semundo_release(struct inode 
>  const struct file_operations proc_semundo_operations = {
>  	.open		= semundo_open,
>  	.read		= seq_read,
> +	.write		= semundo_write,
>  	.llseek		= seq_lseek,
>  	.release	= semundo_release,
>  };
> 
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
       [not found]       ` <20080627140607.GA20581-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-06-27 14:12         ` Nadia Derbey
  2008-06-30  8:37         ` Nadia Derbey
  1 sibling, 0 replies; 20+ messages in thread
From: Nadia Derbey @ 2008-06-27 14:12 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge E. Hallyn wrote:
> Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> 
>>PATCH [06/06]
>>
>>This patch introduces the .write seq operation for /proc/pid/semundo.
>>
>>In order to simplify the locking strategy, the write operation is only allowed
>>to 'current'.
>>
>>
>>Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
> 
> 
> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
> Have you written a patch to cryo in light of the new (target==current)
> restriction?

Nop, will do that.

Regards,
Nadia

> 
> thanks,
> -serge
> 
> 
>>---
>> fs/proc/base.c |    2 
>> ipc/sem.c      |  250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 250 insertions(+), 2 deletions(-)
>>
>>Index: linux-2.6.26-rc5-mm3/fs/proc/base.c
>>===================================================================
>>--- linux-2.6.26-rc5-mm3.orig/fs/proc/base.c	2008-06-24 10:03:33.000000000 +0200
>>+++ linux-2.6.26-rc5-mm3/fs/proc/base.c	2008-06-24 13:21:44.000000000 +0200
>>@@ -2526,7 +2526,7 @@ static const struct pid_entry tgid_base_
>> 	INF("io",	S_IRUGO, tgid_io_accounting),
>> #endif
>> #ifdef CONFIG_SYSVIPC
>>-	REG("semundo",  S_IRUGO, semundo),
>>+	REG("semundo",  S_IWUSR|S_IRUGO, semundo),
>> #endif
>> };
>>
>>Index: linux-2.6.26-rc5-mm3/ipc/sem.c
>>===================================================================
>>--- linux-2.6.26-rc5-mm3.orig/ipc/sem.c	2008-06-24 12:59:15.000000000 +0200
>>+++ linux-2.6.26-rc5-mm3/ipc/sem.c	2008-06-25 08:56:07.000000000 +0200
>>@@ -1554,7 +1554,27 @@ static int semundo_open(struct inode *in
>> 		goto out_err;
>> 	}
>>
>>-	ulp = get_proc_ulp(task);
>>+	if (file->f_flags & O_WRONLY || file->f_flags & O_RDWR) {
>>+		/*
>>+		 * Writing is only allowed to current
>>+		 */
>>+		if (task != current) {
>>+			put_task_struct(task);
>>+			ret = -EPERM;
>>+			goto out_err;
>>+		}
>>+		/*
>>+		 * No need to increment the undo_list's refcnt: only current
>>+		 * can be freeing it through exit, and we are current.
>>+		 * Only signal through the ulp NULL pointer that it won't be
>>+		 * necessary to decrement the refcnt during release.
>>+		 * Actually, in this path the undo_list will be gotten during
>>+		 * the write operation.
>>+		 */
>>+		ulp = NULL;
>>+	} else
>>+		ulp = get_proc_ulp(task);
>>+
>> 	ns = get_ipc_ns(task->nsproxy->ipc_ns);
>> 	put_task_struct(task);
>>
>>@@ -1574,6 +1594,233 @@ out_err:
>> 	return ret;
>> }
>>
>>+/* Skip all spaces at the beginning of the buffer */
>>+static inline int skip_space(const char __user **buf, size_t *len)
>>+{
>>+	char c = 0;
>>+	while (*len) {
>>+		if (get_user(c, *buf))
>>+			return -EFAULT;
>>+		if (c != '\t' && c != ' ')
>>+			break;
>>+		--*len;
>>+		++*buf;
>>+	}
>>+	return c;
>>+}
>>+
>>+/* Retrieve the first numerical value contained in the string.
>>+ * Note: The value is supposed to be a 32-bit integer.
>>+ */
>>+static inline int get_next_value(const char __user **buf, size_t *len, int *val)
>>+{
>>+#define BUFLEN 11
>>+	int err, neg = 0, left;
>>+	char s[BUFLEN], *p;
>>+
>>+	err = skip_space(buf, len);
>>+	if (err < 0)
>>+		return err;
>>+	if (!*len)
>>+		return INT_MAX;
>>+	if (err == '\n') {
>>+		++*buf;
>>+		--*len;
>>+		return INT_MAX;
>>+	}
>>+	if (err == '-') {
>>+		++*buf;
>>+		--*len;
>>+		neg = 1;
>>+	}
>>+
>>+	left = *len;
>>+	if (left > sizeof(s) - 1)
>>+		left = sizeof(s) - 1;
>>+	if (copy_from_user(s, *buf, left))
>>+		return -EFAULT;
>>+
>>+	s[left] = 0;
>>+	p = s;
>>+	if (*p < '0' || *p > '9')
>>+		return -EINVAL;
>>+
>>+	*val = simple_strtoul(p, &p, 0);
>>+	if (neg)
>>+		*val = -(*val);
>>+
>>+	left = p-s;
>>+	(*len) -= left;
>>+	(*buf) += left;
>>+
>>+	return 0;
>>+#undef BUFLEN
>>+}
>>+
>>+/*
>>+ * Reads a line from /proc/self/semundo.
>>+ * Returns the number of undo values read (or errcode upon failure).
>>+ * @id: pointer to the semid (filled in with 1st field in the line)
>>+ * @array: semundo values (filled in iwth remaining fields in the line).
>>+ * @array_len: max # of expected semundo values
>>+ */
>>+static inline int semundo_readline(const char __user **buf, size_t *left,
>>+				   int *id, short *array, int array_len)
>>+{
>>+	int i, val, err;
>>+
>>+	/* Read semid */
>>+	err = get_next_value(buf, left, id);
>>+	if (err)
>>+		return err;
>>+
>>+	/* Read all (semundo-) values of a full line */
>>+	for (i = 0; ; i++) {
>>+		err = get_next_value(buf, left, &val);
>>+		if (err < 0)
>>+			return err;
>>+		/* reached end of line or end of buffer */
>>+		if (err == INT_MAX)
>>+			break;
>>+		/* Return an error if we get more values than expected */
>>+		if (i < array_len)
>>+			array[i] = val;
>>+		else
>>+			return -EINVAL;
>>+	}
>>+	return i;
>>+}
>>+
>>+/*
>>+ * sets or updates the undo values for the undo_list of a given semaphore id.
>>+ * This is exactly the same code sequence as sys_semtimedop if we have undos.
>>+ */
>>+static inline int semundo_update(struct ipc_namespace *ns, int semid,
>>+				short *array, int size)
>>+{
>>+	struct sem_undo *un;
>>+	struct sem_array *sma;
>>+	int ret = 0;
>>+
>>+	un = find_alloc_undo(ns, semid);
>>+	if (IS_ERR(un)) {
>>+		ret = PTR_ERR(un);
>>+		goto out;
>>+	}
>>+
>>+	/* lookup the sem_array */
>>+	sma = sem_lock(ns, semid);
>>+	if (IS_ERR(sma)) {
>>+		ret = PTR_ERR(sma);
>>+		rcu_read_unlock();
>>+		goto out;
>>+	}
>>+
>>+	/*
>>+	 * find_alloc_undo opened an rcu read section to protect un.
>>+	 * Releasing it here is safe:
>>+	 *    . sem_lock is held, so we are protected against IPC_RMID
>>+	 *    . the refcnt won't fall to 0 since exit_sem only operates on
>>+	 *      current and we are the current.
>>+	 */
>>+	rcu_read_unlock();
>>+
>>+	/*
>>+	 * semid identifiers are not unique - find_alloc_undo() may have
>>+	 * allocated an undo structure, it was invalidated by an RMID and now
>>+	 * a new array received the same id.
>>+	 * Check and fail.
>>+	 * This case can be detected checking un->semid. The existance of
>>+	 * "un" itself is guaranteed by rcu.
>>+	 */
>>+	if (un->semid == -1) {
>>+		ret = -EIDRM;
>>+		goto out_unlock;
>>+	}
>>+
>>+	/*
>>+	 * If the number of values given does not match the number of
>>+	 * semaphores in the array, consider this as an error.
>>+	 */
>>+	if (size != sma->sem_nsems) {
>>+		ret = -EINVAL;
>>+		goto out_unlock;
>>+	}
>>+
>>+	/* update the undo values */
>>+	while (--size >= 0)
>>+		un->semadj[size] = array[size];
>>+
>>+	/* maybe some queued-up processes were waiting for this */
>>+	update_queue(sma);
>>+
>>+out_unlock:
>>+	sem_unlock(sma);
>>+out:
>>+	return ret;
>>+}
>>+
>>+/*
>>+ * write operation for /proc/self/semundo file
>>+ *
>>+ * Only current is allowed to write to its semundo file.
>>+ *
>>+ * The expected string format is:
>>+ * "<semID> <val1> <val2> ... <valN>"
>>+ *
>>+ * It sets (or updates) the sem_undo list for 'current' and the target
>>+ * <semID>, to the given 'undo' values.
>>+ *
>>+ * <semID> must match an existing semaphore array.
>>+ * The number of values following <semID> must match the number of semaphores
>>+ * in the corresponding array.
>>+ *
>>+ * Multiple semID's can be passed simultaneously: newline ('\n') is considered
>>+ * as a separator in that case.
>>+ *
>>+ * Note: it is not allowed to set the sem_undo list for a given semID using
>>+ *       mutliple write calls.
>>+ */
>>+static ssize_t semundo_write(struct file *file, const char __user *buf,
>>+			     size_t count, loff_t *ppos)
>>+{
>>+	struct seq_file *m = file->private_data;
>>+	short *array;
>>+	int err, max_sem, semid = 0;
>>+	size_t left = count;
>>+	struct undo_list_data *data = m->private;
>>+	struct ipc_namespace *ns = data->ipc_ns;
>>+
>>+	if (data->undo_list)
>>+		/* Should never happen */
>>+		return -EINVAL;
>>+
>>+	max_sem = ns->sc_semmsl;
>>+
>>+	array = kmalloc(sizeof(short)*max_sem, GFP_KERNEL);
>>+	if (array == NULL)
>>+		return -ENOMEM;
>>+
>>+	while (left) {
>>+		int nval;
>>+
>>+		nval = semundo_readline(&buf, &left, &semid, array, max_sem);
>>+		if (nval < 0) {
>>+			err = nval;
>>+			goto out;
>>+		}
>>+
>>+		err = semundo_update(ns, semid, array, nval);
>>+		if (err)
>>+			goto out;
>>+	}
>>+	err = count - left;
>>+
>>+out:
>>+	kfree(array);
>>+	return err;
>>+}
>>+
>> static int semundo_release(struct inode *inode, struct file *file)
>> {
>> 	struct seq_file *m = file->private_data;
>>@@ -1590,6 +1837,7 @@ static int semundo_release(struct inode 
>> const struct file_operations proc_semundo_operations = {
>> 	.open		= semundo_open,
>> 	.read		= seq_read,
>>+	.write		= semundo_write,
>> 	.llseek		= seq_lseek,
>> 	.release	= semundo_release,
>> };
>>
>>--
> 
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo
       [not found]       ` <20080627140607.GA20581-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2008-06-27 14:12         ` Nadia Derbey
@ 2008-06-30  8:37         ` Nadia Derbey
  1 sibling, 0 replies; 20+ messages in thread
From: Nadia Derbey @ 2008-06-30  8:37 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Serge E. Hallyn wrote:
> Quoting Nadia.Derbey-6ktuUTfB/bM@public.gmane.org (Nadia.Derbey-6ktuUTfB/bM@public.gmane.org):
> 
>>PATCH [06/06]
>>
>>This patch introduces the .write seq operation for /proc/pid/semundo.
>>
>>In order to simplify the locking strategy, the write operation is only allowed
>>to 'current'.
>>
>>
>>Signed-off-by: Nadia Derbey <Nadia.Derbey-6ktuUTfB/bM@public.gmane.org>
> 
> 
> Acked-by: Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> 
> Have you written a patch to cryo in light of the new (target==current)
> restriction?
> 

Serge,

Finally, after having a look at cryo code, it appeared that the semundo 
proc file was already recreated in "injection mode" for current task.
So, nothing to be changed... except that I found a bug at the end of the 
  undo list restoring phase.
Sending the patch right now.

Regards,
Nadia

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2008-06-30  8:37 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-25 13:49 [RFC PATCH 0/6] SYSVIPC/semaphores - allow saving/restoring a process' semundo_list Nadia.Derbey-6ktuUTfB/bM
2008-06-25 13:49 ` [RFC PATCH 1/6] IPC/sem: RCU-protect the process semundo list Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135538.385496000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 20:33     ` Serge E. Hallyn
2008-06-25 13:49 ` [RFC PATCH 2/6] IPC/sem: per <pid> semundo file in procfs Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135538.762662000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 20:33     ` Serge E. Hallyn
2008-06-26  5:08     ` Michael Kerrisk
     [not found]       ` <517f3f820806252208m1f5b91dfm38b8babff73adf72-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-06-26  6:07         ` Nadia Derbey
2008-06-25 13:49 ` [RFC PATCH 3/6] IPC/sem: start/stop operations for /proc/pid/semundo Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135539.139605000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 20:39     ` Serge E. Hallyn
2008-06-25 13:49 ` [RFC PATCH 4/6] IPC/sem: next " Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135539.519489000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 20:57     ` Serge E. Hallyn
     [not found]       ` <20080625205719.GD16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-06-26  5:35         ` Nadia Derbey
2008-06-25 13:49 ` [RFC PATCH 5/6] IPC/sem: .show operation " Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135539.893049000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 20:58     ` Serge E. Hallyn
2008-06-25 13:49 ` [RFC PATCH 6/6] IPC/sem: .write operation for /proc/<self>/semundo Nadia.Derbey-6ktuUTfB/bM
     [not found]   ` <20080625135540.271934000-6ktuUTfB/bM@public.gmane.org>
2008-06-25 21:09     ` Serge E. Hallyn
     [not found]       ` <20080625210937.GF16374-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-06-26  5:44         ` Nadia Derbey
2008-06-27 14:06     ` Serge E. Hallyn
     [not found]       ` <20080627140607.GA20581-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-06-27 14:12         ` Nadia Derbey
2008-06-30  8:37         ` Nadia Derbey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.