All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily
@ 2013-05-16  1:07 Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 01/11] ipc: move rcu lock out of ipc_addid Davidlohr Bueso
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:07 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

This patchset continues the work that began in the sysv ipc semaphore scaling 
series: https://lkml.org/lkml/2013/3/20/546

Just like semaphores used to be, sysv shared memory and msg queues also abuse the ipc
lock, unnecessarily holding it for operations such as permission and security checks. This
patchset mostly deals with mqueues, and while shared mem can be done in a very similar way,
I want to get these patches out in the open first. It also does some pending cleanups,
mostly focused on the two level locking we have in ipc code, taking care of ipc_addid() 
and ipcctl_pre_down_nolock() - yes there are still functions that need to be updated as well.

I have tried to split each patch to be as readable and specific as possible, specially when
shortening the critical regions, going one function at a time.

Patch 1 moves the locking to be explicitly done by the callers of ipc_addid.
It updates msg, sem and shm.

Patches 2-3: are just wrappers around the ipc lock, initially suggested by Linus.

Patch 4 is similar to patch 1, moving the rcu and rw_mutex locking out of
ipcctl_pre_down_nolock so that the callers explicitly deals with them. It updates msg, sem
and shm.

Patch 5 shortens the critical region in msgctl_down(), using the lockless
ipcctl_pre_down() function and only acquiring the ipc lock for RMID and SET commands.

Patch 6 simply moves the what-should-be lockless logic of *_INFO and *_STAT commands
out of msgctl() into a new function, msgctl_lockless().

Patch 7 introduces the necessary wrappers around ipc_obtain_object[_check]()
that will later enable us to separately lookup the ipc object without holding the lock.

Patch 8 updates the previously added msgctl_nolock() to actually be lockless, reducing
the critical region for the STAT commands.

Patch 9 redoes the locking for msgsend().

Patch 10 redoes the locking for msgrcv().

Patch 11 removes the now unused msg_lock and msg_lock_check functions, replaced by
a smarter combination of rcu, ipc_obtain_object and ipc_object_lock.

Davidlohr Bueso (11):
  ipc: move rcu lock out of ipc_addid
  ipc: introduce ipc object locking helpers
  ipc: close open coded spin lock calls
  ipc: move locking out of ipcctl_pre_down_nolock
  ipc,msg: shorten critical region in semctl_down
  ipc,msg: introduce msgctl_nolock
  ipc,msg: introduce lockless functions to obtain the ipc object
  ipc,msg: make msgctl_nolock lockless
  ipc,msg: reduce critical region in msgsnd
  ipc,msg: make shorten critical region in msgrcv
  ipc: remove unused functions

 ipc/msg.c  | 227 ++++++++++++++++++++++++++++++++++++++-----------------------
 ipc/sem.c  |  42 +++++++-----
 ipc/shm.c  |  32 ++++++---
 ipc/util.c |  25 ++-----
 ipc/util.h |  22 ++++--
 5 files changed, 211 insertions(+), 137 deletions(-)

Thanks,
Davidlohr

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 01/11] ipc: move rcu lock out of ipc_addid
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 02/11] ipc: introduce ipc object locking helpers Davidlohr Bueso
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Make all callers explicitly take and release the RCU read lock.

This addresses the two level locking seen in newary(), newseg()
and newqueue(). For the last two, explicitly unlock the ipc object
and the rcu lock, instead of calling the custom shm_unlock and
msg_unlock functions. The next patch will deal with the open
coded locking for ->perm.lock

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c  | 9 +++++----
 ipc/sem.c  | 2 ++
 ipc/shm.c  | 7 ++++++-
 ipc/util.c | 4 +---
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index d0c6d96..8270511 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -199,11 +199,11 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 		return retval;
 	}
 
-	/*
-	 * ipc_addid() locks msq
-	 */
+	/* ipc_addid() locks msq upon success. */
+	rcu_read_lock();
 	id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
 	if (id < 0) {
+		rcu_read_unlock();
 		security_msg_queue_free(msq);
 		ipc_rcu_putref(msq);
 		return id;
@@ -218,7 +218,8 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 	INIT_LIST_HEAD(&msq->q_receivers);
 	INIT_LIST_HEAD(&msq->q_senders);
 
-	msg_unlock(msq);
+	spin_unlock(&msq->q_perm.lock);
+	rcu_read_unlock();
 
 	return msq->q_perm.id;
 }
diff --git a/ipc/sem.c b/ipc/sem.c
index a7e40ed..cee96e6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -407,8 +407,10 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 		return retval;
 	}
 
+	rcu_read_lock();
 	id = ipc_addid(&sem_ids(ns), &sma->sem_perm, ns->sc_semmni);
 	if (id < 0) {
+		rcu_read_unlock();
 		security_sem_free(sma);
 		ipc_rcu_putref(sma);
 		return id;
diff --git a/ipc/shm.c b/ipc/shm.c
index 7e199fa..11dec70 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -521,9 +521,11 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	if (IS_ERR(file))
 		goto no_file;
 
+	rcu_read_lock();
 	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
 	if (id < 0) {
 		error = id;
+		rcu_read_unlock();
 		goto no_id;
 	}
 
@@ -535,6 +537,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	shp->shm_nattch = 0;
 	shp->shm_file = file;
 	shp->shm_creator = current;
+
 	/*
 	 * shmid gets reported as "inode#" in /proc/pid/maps.
 	 * proc-ps tools use this. Changing this will break them.
@@ -543,7 +546,9 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 
 	ns->shm_tot += numpages;
 	error = shp->shm_perm.id;
-	shm_unlock(shp);
+
+	spin_unlock(&shp->shm_perm.lock);
+	rcu_read_unlock();
 	return error;
 
 no_id:
diff --git a/ipc/util.c b/ipc/util.c
index 809ec5e..92f0242 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -246,9 +246,8 @@ int ipc_get_maxid(struct ipc_ids *ids)
  *	is returned. The 'new' entry is returned in a locked state on success.
  *	On failure the entry is not locked and a negative err-code is returned.
  *
- *	Called with ipc_ids.rw_mutex held as a writer.
+ *	Called with RCU read lock and writer ipc_ids.rw_mutex held.
  */
- 
 int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 {
 	kuid_t euid;
@@ -266,7 +265,6 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 
 	spin_lock_init(&new->lock);
 	new->deleted = 0;
-	rcu_read_lock();
 	spin_lock(&new->lock);
 
 	id = idr_alloc(&ids->ipcs_idr, new,
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 02/11] ipc: introduce ipc object locking helpers
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 01/11] ipc: move rcu lock out of ipc_addid Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 03/11] ipc: close open coded spin lock calls Davidlohr Bueso
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Simple helpers around the (kern_ipc_perm *)->lock spinlock.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/util.h | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/ipc/util.h b/ipc/util.h
index 2b0bdd5..da65e8a 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -159,23 +159,33 @@ static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
 	return uid / SEQ_MULTIPLIER != ipcp->seq;
 }
 
-static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
+static inline void ipc_lock_object(struct kern_ipc_perm *perm)
 {
-	rcu_read_lock();
 	spin_lock(&perm->lock);
 }
 
-static inline void ipc_unlock(struct kern_ipc_perm *perm)
+static inline void ipc_unlock_object(struct kern_ipc_perm *perm)
 {
 	spin_unlock(&perm->lock);
-	rcu_read_unlock();
 }
 
-static inline void ipc_lock_object(struct kern_ipc_perm *perm)
+static inline void ipc_assert_locked_object(struct kern_ipc_perm *perm)
 {
+	assert_spin_locked(&perm->lock);
+}
+
+static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
+{
+	rcu_read_lock();
 	spin_lock(&perm->lock);
 }
 
+static inline void ipc_unlock(struct kern_ipc_perm *perm)
+{
+	spin_unlock(&perm->lock);
+	rcu_read_unlock();
+}
+
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
 struct kern_ipc_perm *ipc_obtain_object_check(struct ipc_ids *ids, int id);
 int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 03/11] ipc: close open coded spin lock calls
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 01/11] ipc: move rcu lock out of ipc_addid Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 02/11] ipc: introduce ipc object locking helpers Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock Davidlohr Bueso
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c  |  2 +-
 ipc/sem.c  | 14 +++++++-------
 ipc/shm.c  |  4 ++--
 ipc/util.h |  4 ++--
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 8270511..7a6f344 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -218,7 +218,7 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
 	INIT_LIST_HEAD(&msq->q_receivers);
 	INIT_LIST_HEAD(&msq->q_senders);
 
-	spin_unlock(&msq->q_perm.lock);
+	ipc_unlock_object(&msq->q_perm);
 	rcu_read_unlock();
 
 	return msq->q_perm.id;
diff --git a/ipc/sem.c b/ipc/sem.c
index cee96e6..7854fc8 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -246,7 +246,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		 * their critical section while the array lock is held.
 		 */
  lock_array:
-		spin_lock(&sma->sem_perm.lock);
+		ipc_lock_object(&sma->sem_perm);
 		for (i = 0; i < sma->sem_nsems; i++) {
 			struct sem *sem = sma->sem_base + i;
 			spin_unlock_wait(&sem->lock);
@@ -259,7 +259,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
 	if (locknum == -1) {
-		spin_unlock(&sma->sem_perm.lock);
+		ipc_unlock_object(&sma->sem_perm);
 	} else {
 		struct sem *sem = sma->sem_base + locknum;
 		spin_unlock(&sem->lock);
@@ -857,7 +857,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	int i;
 
 	/* Free the existing undo structures for this semaphore set.  */
-	assert_spin_locked(&sma->sem_perm.lock);
+	ipc_assert_locked_object(&sma->sem_perm);
 	list_for_each_entry_safe(un, tu, &sma->list_id, list_id) {
 		list_del(&un->list_id);
 		spin_lock(&un->ulp->lock);
@@ -1055,7 +1055,7 @@ static int semctl_setval(struct ipc_namespace *ns, int semid, int semnum,
 
 	curr = &sma->sem_base[semnum];
 
-	assert_spin_locked(&sma->sem_perm.lock);
+	ipc_assert_locked_object(&sma->sem_perm);
 	list_for_each_entry(un, &sma->list_id, list_id)
 		un->semadj[semnum] = 0;
 
@@ -1184,7 +1184,7 @@ static int semctl_main(struct ipc_namespace *ns, int semid, int semnum,
 		for (i = 0; i < nsems; i++)
 			sma->sem_base[i].semval = sem_io[i];
 
-		assert_spin_locked(&sma->sem_perm.lock);
+		ipc_assert_locked_object(&sma->sem_perm);
 		list_for_each_entry(un, &sma->list_id, list_id) {
 			for (i = 0; i < nsems; i++)
 				un->semadj[i] = 0;
@@ -1481,7 +1481,7 @@ static struct sem_undo *find_alloc_undo(struct ipc_namespace *ns, int semid)
 	new->semid = semid;
 	assert_spin_locked(&ulp->lock);
 	list_add_rcu(&new->list_proc, &ulp->list_proc);
-	assert_spin_locked(&sma->sem_perm.lock);
+	ipc_assert_locked_object(&sma->sem_perm);
 	list_add(&new->list_id, &sma->list_id);
 	un = new;
 
@@ -1818,7 +1818,7 @@ void exit_sem(struct task_struct *tsk)
 		}
 
 		/* remove un from the linked lists */
-		assert_spin_locked(&sma->sem_perm.lock);
+		ipc_assert_locked_object(&sma->sem_perm);
 		list_del(&un->list_id);
 
 		spin_lock(&ulp->lock);
diff --git a/ipc/shm.c b/ipc/shm.c
index 11dec70..1fd3d05f 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -141,7 +141,7 @@ static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
 static inline void shm_lock_by_ptr(struct shmid_kernel *ipcp)
 {
 	rcu_read_lock();
-	spin_lock(&ipcp->shm_perm.lock);
+	ipc_lock_object(&ipcp->shm_perm);
 }
 
 static inline struct shmid_kernel *shm_lock_check(struct ipc_namespace *ns,
@@ -547,7 +547,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 	ns->shm_tot += numpages;
 	error = shp->shm_perm.id;
 
-	spin_unlock(&shp->shm_perm.lock);
+	ipc_unlock_object(&shp->shm_perm);
 	rcu_read_unlock();
 	return error;
 
diff --git a/ipc/util.h b/ipc/util.h
index da65e8a..b6a6a88 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -177,12 +177,12 @@ static inline void ipc_assert_locked_object(struct kern_ipc_perm *perm)
 static inline void ipc_lock_by_ptr(struct kern_ipc_perm *perm)
 {
 	rcu_read_lock();
-	spin_lock(&perm->lock);
+	ipc_lock_object(perm);
 }
 
 static inline void ipc_unlock(struct kern_ipc_perm *perm)
 {
-	spin_unlock(&perm->lock);
+	ipc_unlock_object(perm);
 	rcu_read_unlock();
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 03/11] ipc: close open coded spin lock calls Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16 17:55   ` Andi Kleen
  2013-05-24 20:16   ` Andrew Morton
  2013-05-16  1:08 ` [PATCH 05/11] ipc,msg: shorten critical region in msgctl_down Davidlohr Bueso
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them, creating
another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down()) explicitly
lock and unlock the rwsem and rcu lock.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c  | 24 +++++++++++++++++-------
 ipc/sem.c  | 27 ++++++++++++++++-----------
 ipc/shm.c  | 23 +++++++++++++++++------
 ipc/util.c | 21 ++++++---------------
 4 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 7a6f344..4eaeb08 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 			return -EFAULT;
 	}
 
+	down_write(&msg_ids(ns).rw_mutex);
+	rcu_read_lock();
+
 	ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
 			       &msqid64.msg_perm, msqid64.msg_qbytes);
-	if (IS_ERR(ipcp))
-		return PTR_ERR(ipcp);
+	if (IS_ERR(ipcp)) {
+		err = PTR_ERR(ipcp);
+		/* the ipc lock is not held upon failure */
+		goto out_unlock1;
+	}
 
 	msq = container_of(ipcp, struct msg_queue, q_perm);
 
 	err = security_msg_queue_msgctl(msq, cmd);
 	if (err)
-		goto out_unlock;
+		goto out_unlock0;
 
 	switch (cmd) {
 	case IPC_RMID:
+		/* freeque unlocks the ipc object and rcu */
 		freeque(ns, ipcp);
 		goto out_up;
 	case IPC_SET:
 		if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
 		    !capable(CAP_SYS_RESOURCE)) {
 			err = -EPERM;
-			goto out_unlock;
+			goto out_unlock0;
 		}
 
 		err = ipc_update_perm(&msqid64.msg_perm, ipcp);
 		if (err)
-			goto out_unlock;
+			goto out_unlock0;
 
 		msq->q_qbytes = msqid64.msg_qbytes;
 
@@ -450,8 +457,11 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 	default:
 		err = -EINVAL;
 	}
-out_unlock:
-	msg_unlock(msq);
+
+out_unlock0:
+	ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+	rcu_read_unlock();
 out_up:
 	up_write(&msg_ids(ns).rw_mutex);
 	return err;
diff --git a/ipc/sem.c b/ipc/sem.c
index 7854fc8..f7d99f9 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1274,39 +1274,44 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
 			return -EFAULT;
 	}
 
+	down_write(&sem_ids(ns).rw_mutex);
+	rcu_read_lock();
+
 	ipcp = ipcctl_pre_down_nolock(ns, &sem_ids(ns), semid, cmd,
 				      &semid64.sem_perm, 0);
-	if (IS_ERR(ipcp))
-		return PTR_ERR(ipcp);
+	if (IS_ERR(ipcp)) {
+		err = PTR_ERR(ipcp);
+		/* the ipc lock is not held upon failure */
+		goto out_unlock1;
+	}
 
 	sma = container_of(ipcp, struct sem_array, sem_perm);
 
 	err = security_sem_semctl(sma, cmd);
-	if (err) {
-		rcu_read_unlock();
-		goto out_up;
-	}
+	if (err)
+		goto out_unlock1;
 
-	switch(cmd){
+	switch (cmd) {
 	case IPC_RMID:
 		sem_lock(sma, NULL, -1);
+		/* freeary unlocks the ipc object and rcu */
 		freeary(ns, ipcp);
 		goto out_up;
 	case IPC_SET:
 		sem_lock(sma, NULL, -1);
 		err = ipc_update_perm(&semid64.sem_perm, ipcp);
 		if (err)
-			goto out_unlock;
+			goto out_unlock0;
 		sma->sem_ctime = get_seconds();
 		break;
 	default:
-		rcu_read_unlock();
 		err = -EINVAL;
-		goto out_up;
+		goto out_unlock1;
 	}
 
-out_unlock:
+out_unlock0:
 	sem_unlock(sma, -1);
+out_unlock1:
 	rcu_read_unlock();
 out_up:
 	up_write(&sem_ids(ns).rw_mutex);
diff --git a/ipc/shm.c b/ipc/shm.c
index 1fd3d05f..078cc19 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -759,31 +759,42 @@ static int shmctl_down(struct ipc_namespace *ns, int shmid, int cmd,
 			return -EFAULT;
 	}
 
+	down_write(&shm_ids(ns).rw_mutex);
+	rcu_read_lock();
+
 	ipcp = ipcctl_pre_down(ns, &shm_ids(ns), shmid, cmd,
 			       &shmid64.shm_perm, 0);
-	if (IS_ERR(ipcp))
-		return PTR_ERR(ipcp);
+	if (IS_ERR(ipcp)) {
+		err = PTR_ERR(ipcp);
+		/* the ipc lock is not held upon failure */
+		goto out_unlock1;
+	}
 
 	shp = container_of(ipcp, struct shmid_kernel, shm_perm);
 
 	err = security_shm_shmctl(shp, cmd);
 	if (err)
-		goto out_unlock;
+		goto out_unlock0;
+
 	switch (cmd) {
 	case IPC_RMID:
+		/* do_shm_rmid unlocks the ipc object and rcu */
 		do_shm_rmid(ns, ipcp);
 		goto out_up;
 	case IPC_SET:
 		err = ipc_update_perm(&shmid64.shm_perm, ipcp);
 		if (err)
-			goto out_unlock;
+			goto out_unlock0;
 		shp->shm_ctim = get_seconds();
 		break;
 	default:
 		err = -EINVAL;
 	}
-out_unlock:
-	shm_unlock(shp);
+
+out_unlock0:
+	ipc_unlock_object(&shp->shm_perm);
+out_unlock1:
+	rcu_read_unlock();
 out_up:
 	up_write(&shm_ids(ns).rw_mutex);
 	return err;
diff --git a/ipc/util.c b/ipc/util.c
index 92f0242..a746abb 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -745,8 +745,10 @@ int ipc_update_perm(struct ipc64_perm *in, struct kern_ipc_perm *out)
  * It must be called without any lock held and
  *  - retrieves the ipc with the given id in the given table.
  *  - performs some audit and permission check, depending on the given cmd
- *  - returns the ipc with both ipc and rw_mutex locks held in case of success
+ *  - returns the ipc with the ipc lock held in case of success
  *    or an err-code without any lock held otherwise.
+ *
+ * Call holding the both the rw_mutex and the rcu read lock.
  */
 struct kern_ipc_perm *ipcctl_pre_down(struct ipc_namespace *ns,
 				      struct ipc_ids *ids, int id, int cmd,
@@ -771,13 +773,10 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 	int err = -EPERM;
 	struct kern_ipc_perm *ipcp;
 
-	down_write(&ids->rw_mutex);
-	rcu_read_lock();
-
 	ipcp = ipc_obtain_object_check(ids, id);
 	if (IS_ERR(ipcp)) {
 		err = PTR_ERR(ipcp);
-		goto out_up;
+		goto err;
 	}
 
 	audit_ipc_obj(ipcp);
@@ -788,16 +787,8 @@ struct kern_ipc_perm *ipcctl_pre_down_nolock(struct ipc_namespace *ns,
 	euid = current_euid();
 	if (uid_eq(euid, ipcp->cuid) || uid_eq(euid, ipcp->uid)  ||
 	    ns_capable(ns->user_ns, CAP_SYS_ADMIN))
-		return ipcp;
-
-out_up:
-	/*
-	 * Unsuccessful lookup, unlock and return
-	 * the corresponding error.
-	 */
-	rcu_read_unlock();
-	up_write(&ids->rw_mutex);
-
+		return ipcp; /* successful lookup */
+err:
 	return ERR_PTR(err);
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 05/11] ipc,msg: shorten critical region in msgctl_down
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 06/11] ipc,msg: introduce msgctl_nolock Davidlohr Bueso
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Instead of holding the ipc lock for the entire function, use
the ipcctl_pre_down_nolock and only acquire the lock for
specific commands: RMID and SET.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 4eaeb08..264ebe0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -412,11 +412,10 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 	down_write(&msg_ids(ns).rw_mutex);
 	rcu_read_lock();
 
-	ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
-			       &msqid64.msg_perm, msqid64.msg_qbytes);
+	ipcp = ipcctl_pre_down_nolock(ns, &msg_ids(ns), msqid, cmd,
+				      &msqid64.msg_perm, msqid64.msg_qbytes);
 	if (IS_ERR(ipcp)) {
 		err = PTR_ERR(ipcp);
-		/* the ipc lock is not held upon failure */
 		goto out_unlock1;
 	}
 
@@ -424,10 +423,11 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 
 	err = security_msg_queue_msgctl(msq, cmd);
 	if (err)
-		goto out_unlock0;
+		goto out_unlock1;
 
 	switch (cmd) {
 	case IPC_RMID:
+		ipc_lock_object(&msq->q_perm);
 		/* freeque unlocks the ipc object and rcu */
 		freeque(ns, ipcp);
 		goto out_up;
@@ -435,9 +435,10 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 		if (msqid64.msg_qbytes > ns->msg_ctlmnb &&
 		    !capable(CAP_SYS_RESOURCE)) {
 			err = -EPERM;
-			goto out_unlock0;
+			goto out_unlock1;
 		}
 
+		ipc_lock_object(&msq->q_perm);
 		err = ipc_update_perm(&msqid64.msg_perm, ipcp);
 		if (err)
 			goto out_unlock0;
@@ -456,6 +457,7 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
 		break;
 	default:
 		err = -EINVAL;
+		goto out_unlock1;
 	}
 
 out_unlock0:
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 06/11] ipc,msg: introduce msgctl_nolock
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 05/11] ipc,msg: shorten critical region in msgctl_down Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 07/11] ipc,msg: introduce lockless functions to obtain the ipc object Davidlohr Bueso
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Similar to semctl, when calling msgctl, the *_INFO and *_STAT
commands can be performed without acquiring the ipc object.

Add a msgctl_nolock() function and move the logic of *_INFO
and *_STAT out of msgctl(). This change still takes the lock
and it will be properly lockless in the next patch

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 49 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 264ebe0..20361b5 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -469,17 +469,11 @@ out_up:
 	return err;
 }
 
-SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
+static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
+			 int cmd, int version, void __user *buf)
 {
+	int err;
 	struct msg_queue *msq;
-	int err, version;
-	struct ipc_namespace *ns;
-
-	if (msqid < 0 || cmd < 0)
-		return -EINVAL;
-
-	version = ipc_parse_version(&cmd);
-	ns = current->nsproxy->ipc_ns;
 
 	switch (cmd) {
 	case IPC_INFO:
@@ -490,6 +484,7 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
 
 		if (!buf)
 			return -EFAULT;
+
 		/*
 		 * We must not return kernel stack data.
 		 * due to padding, it's not enough
@@ -521,7 +516,8 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
 			return -EFAULT;
 		return (max_id < 0) ? 0 : max_id;
 	}
-	case MSG_STAT:	/* msqid is an index rather than a msg queue id */
+
+	case MSG_STAT:
 	case IPC_STAT:
 	{
 		struct msqid64_ds tbuf;
@@ -565,19 +561,42 @@ SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
 			return -EFAULT;
 		return success_return;
 	}
-	case IPC_SET:
-	case IPC_RMID:
-		err = msgctl_down(ns, msqid, cmd, buf, version);
-		return err;
+
 	default:
-		return  -EINVAL;
+		return -EINVAL;
 	}
 
+	return err;
 out_unlock:
 	msg_unlock(msq);
 	return err;
 }
 
+SYSCALL_DEFINE3(msgctl, int, msqid, int, cmd, struct msqid_ds __user *, buf)
+{
+	int version;
+	struct ipc_namespace *ns;
+
+	if (msqid < 0 || cmd < 0)
+		return -EINVAL;
+
+	version = ipc_parse_version(&cmd);
+	ns = current->nsproxy->ipc_ns;
+
+	switch (cmd) {
+	case IPC_INFO:
+	case MSG_INFO:
+	case MSG_STAT:	/* msqid is an index rather than a msg queue id */
+	case IPC_STAT:
+		return msgctl_nolock(ns, msqid, cmd, version, buf);
+	case IPC_SET:
+	case IPC_RMID:
+		return msgctl_down(ns, msqid, cmd, buf, version);
+	default:
+		return  -EINVAL;
+	}
+}
+
 static int testmsg(struct msg_msg *msg, long type, int mode)
 {
 	switch(mode)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 07/11] ipc,msg: introduce lockless functions to obtain the ipc object
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 06/11] ipc,msg: introduce msgctl_nolock Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 08/11] ipc,msg: make msgctl_nolock lockless Davidlohr Bueso
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

Add msq_obtain_object() and msq_obtain_object_check(), which will
allow us to get the ipc object without acquiring the lock. Just
as with semaphores, these functions are basically wrappers around
ipc_obtain_object*().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/ipc/msg.c b/ipc/msg.c
index 20361b5..9ff295b 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -166,6 +166,27 @@ static inline struct msg_queue *msg_lock_check(struct ipc_namespace *ns,
 	return container_of(ipcp, struct msg_queue, q_perm);
 }
 
+static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
+{
+	struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
+
+	if (IS_ERR(ipcp))
+		return ERR_CAST(ipcp);
+
+	return container_of(ipcp, struct msg_queue, q_perm);
+}
+
+static inline struct msg_queue *msq_obtain_object_check(struct ipc_namespace *ns,
+							int id)
+{
+	struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&msg_ids(ns), id);
+
+	if (IS_ERR(ipcp))
+		return ERR_CAST(ipcp);
+
+	return container_of(ipcp, struct msg_queue, q_perm);
+}
+
 static inline void msg_rmid(struct ipc_namespace *ns, struct msg_queue *s)
 {
 	ipc_rmid(&msg_ids(ns), &s->q_perm);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 08/11] ipc,msg: make msgctl_nolock lockless
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (6 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 07/11] ipc,msg: introduce lockless functions to obtain the ipc object Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 09/11] ipc,msg: shorten critical region in msgsnd Davidlohr Bueso
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

While the INFO cmd doesn't take the ipc lock, the STAT
commands do acquire it unnecessarily. We can do the
permissions and security checks only holding the
rcu lock.

This function now mimics semctl_nolock().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 9ff295b..2d89b00 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -547,17 +547,25 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
 		if (!buf)
 			return -EFAULT;
 
+		memset(&tbuf, 0, sizeof(tbuf));
+
+		rcu_read_lock();
 		if (cmd == MSG_STAT) {
-			msq = msg_lock(ns, msqid);
-			if (IS_ERR(msq))
-				return PTR_ERR(msq);
+			msq = msq_obtain_object(ns, msqid);
+			if (IS_ERR(msq)) {
+				err = PTR_ERR(msq);
+				goto out_unlock;
+			}
 			success_return = msq->q_perm.id;
 		} else {
-			msq = msg_lock_check(ns, msqid);
-			if (IS_ERR(msq))
-				return PTR_ERR(msq);
+			msq = msq_obtain_object_check(ns, msqid);
+			if (IS_ERR(msq)) {
+				err = PTR_ERR(msq);
+				goto out_unlock;
+			}
 			success_return = 0;
 		}
+
 		err = -EACCES;
 		if (ipcperms(ns, &msq->q_perm, S_IRUGO))
 			goto out_unlock;
@@ -566,8 +574,6 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
 		if (err)
 			goto out_unlock;
 
-		memset(&tbuf, 0, sizeof(tbuf));
-
 		kernel_to_ipc64_perm(&msq->q_perm, &tbuf.msg_perm);
 		tbuf.msg_stime  = msq->q_stime;
 		tbuf.msg_rtime  = msq->q_rtime;
@@ -577,7 +583,8 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
 		tbuf.msg_qbytes = msq->q_qbytes;
 		tbuf.msg_lspid  = msq->q_lspid;
 		tbuf.msg_lrpid  = msq->q_lrpid;
-		msg_unlock(msq);
+		rcu_read_unlock();
+
 		if (copy_msqid_to_user(buf, &tbuf, version))
 			return -EFAULT;
 		return success_return;
@@ -589,7 +596,7 @@ static int msgctl_nolock(struct ipc_namespace *ns, int msqid,
 
 	return err;
 out_unlock:
-	msg_unlock(msq);
+	rcu_read_unlock();
 	return err;
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 09/11] ipc,msg: shorten critical region in msgsnd
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (7 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 08/11] ipc,msg: make msgctl_nolock lockless Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 10/11] ipc,msg: shorten critical region in msgrcv Davidlohr Bueso
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

do_msgsnd() is another function that does too many things
with the ipc object acquired. Take it only when needed when
actually updating msq.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 2d89b00..e60f0e0 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -700,10 +700,11 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 	msg->m_type = mtype;
 	msg->m_ts = msgsz;
 
-	msq = msg_lock_check(ns, msqid);
+	rcu_read_lock();
+	msq = msq_obtain_object_check(ns, msqid);
 	if (IS_ERR(msq)) {
 		err = PTR_ERR(msq);
-		goto out_free;
+		goto out_unlock1;
 	}
 
 	for (;;) {
@@ -711,11 +712,11 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 
 		err = -EACCES;
 		if (ipcperms(ns, &msq->q_perm, S_IWUGO))
-			goto out_unlock_free;
+			goto out_unlock1;
 
 		err = security_msg_queue_msgsnd(msq, msg, msgflg);
 		if (err)
-			goto out_unlock_free;
+			goto out_unlock1;
 
 		if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
 				1 + msq->q_qnum <= msq->q_qbytes) {
@@ -725,32 +726,41 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		/* queue full, wait: */
 		if (msgflg & IPC_NOWAIT) {
 			err = -EAGAIN;
-			goto out_unlock_free;
+			goto out_unlock1;
 		}
+
+		ipc_lock_object(&msq->q_perm);
 		ss_add(msq, &s);
 
 		if (!ipc_rcu_getref(msq)) {
 			err = -EIDRM;
-			goto out_unlock_free;
+			goto out_unlock0;
 		}
 
-		msg_unlock(msq);
+		ipc_unlock_object(&msq->q_perm);
+		rcu_read_unlock();
 		schedule();
 
-		ipc_lock_by_ptr(&msq->q_perm);
+		rcu_read_lock();
+		ipc_lock_object(&msq->q_perm);
+
 		ipc_rcu_putref(msq);
 		if (msq->q_perm.deleted) {
 			err = -EIDRM;
-			goto out_unlock_free;
+			goto out_unlock0;
 		}
+
 		ss_del(&s);
 
 		if (signal_pending(current)) {
 			err = -ERESTARTNOHAND;
-			goto out_unlock_free;
+			goto out_unlock0;
 		}
+
+		ipc_unlock_object(&msq->q_perm);
 	}
 
+	ipc_lock_object(&msq->q_perm);
 	msq->q_lspid = task_tgid_vnr(current);
 	msq->q_stime = get_seconds();
 
@@ -766,9 +776,10 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 	err = 0;
 	msg = NULL;
 
-out_unlock_free:
-	msg_unlock(msq);
-out_free:
+out_unlock0:
+	ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+	rcu_read_unlock();
 	if (msg != NULL)
 		free_msg(msg);
 	return err;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/11] ipc,msg: shorten critical region in msgrcv
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (8 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 09/11] ipc,msg: shorten critical region in msgsnd Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16  1:08 ` [PATCH 11/11] ipc: remove unused functions Davidlohr Bueso
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

do_msgrcv() is the last msg queue function that abuses the ipc lock
Take it only when needed when actually updating msq.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 57 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index e60f0e0..1654be4 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -887,21 +887,19 @@ static struct msg_msg *find_msg(struct msg_queue *msq, long *msgtyp, int mode)
 	return ERR_PTR(-EAGAIN);
 }
 
-
-long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
-	       int msgflg,
+long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, int msgflg,
 	       long (*msg_handler)(void __user *, struct msg_msg *, size_t))
 {
-	struct msg_queue *msq;
-	struct msg_msg *msg;
 	int mode;
+	struct msg_queue *msq;
 	struct ipc_namespace *ns;
-	struct msg_msg *copy = NULL;
+	struct msg_msg *msg, *copy = NULL;
 
 	ns = current->nsproxy->ipc_ns;
 
 	if (msqid < 0 || (long) bufsz < 0)
 		return -EINVAL;
+
 	if (msgflg & MSG_COPY) {
 		copy = prepare_copy(buf, min_t(size_t, bufsz, ns->msg_ctlmax));
 		if (IS_ERR(copy))
@@ -909,8 +907,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 	}
 	mode = convert_mode(&msgtyp, msgflg);
 
-	msq = msg_lock_check(ns, msqid);
+	rcu_read_lock();
+	msq = msq_obtain_object_check(ns, msqid);
 	if (IS_ERR(msq)) {
+		rcu_read_unlock();
 		free_copy(copy);
 		return PTR_ERR(msq);
 	}
@@ -920,10 +920,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 
 		msg = ERR_PTR(-EACCES);
 		if (ipcperms(ns, &msq->q_perm, S_IRUGO))
-			goto out_unlock;
+			goto out_unlock1;
 
 		msg = find_msg(msq, &msgtyp, mode);
-
 		if (!IS_ERR(msg)) {
 			/*
 			 * Found a suitable message.
@@ -931,7 +930,7 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 			 */
 			if ((bufsz < msg->m_ts) && !(msgflg & MSG_NOERROR)) {
 				msg = ERR_PTR(-E2BIG);
-				goto out_unlock;
+				goto out_unlock1;
 			}
 			/*
 			 * If we are copying, then do not unlink message and do
@@ -939,8 +938,10 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 			 */
 			if (msgflg & MSG_COPY) {
 				msg = copy_msg(msg, copy);
-				goto out_unlock;
+				goto out_unlock1;
 			}
+
+			ipc_lock_object(&msq->q_perm);
 			list_del(&msg->m_list);
 			msq->q_qnum--;
 			msq->q_rtime = get_seconds();
@@ -949,14 +950,17 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 			atomic_sub(msg->m_ts, &ns->msg_bytes);
 			atomic_dec(&ns->msg_hdrs);
 			ss_wakeup(&msq->q_senders, 0);
-			msg_unlock(msq);
-			break;
+
+			goto out_unlock0;
 		}
+
 		/* No message waiting. Wait for a message */
 		if (msgflg & IPC_NOWAIT) {
 			msg = ERR_PTR(-ENOMSG);
-			goto out_unlock;
+			goto out_unlock1;
 		}
+
+		ipc_lock_object(&msq->q_perm);
 		list_add_tail(&msr_d.r_list, &msq->q_receivers);
 		msr_d.r_tsk = current;
 		msr_d.r_msgtype = msgtyp;
@@ -967,8 +971,9 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 			msr_d.r_maxsize = bufsz;
 		msr_d.r_msg = ERR_PTR(-EAGAIN);
 		current->state = TASK_INTERRUPTIBLE;
-		msg_unlock(msq);
 
+		ipc_unlock_object(&msq->q_perm);
+		rcu_read_unlock();
 		schedule();
 
 		/* Lockless receive, part 1:
@@ -998,32 +1003,34 @@ long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp,
 		 * If there is a message or an error then accept it without
 		 * locking.
 		 */
-		if (msg != ERR_PTR(-EAGAIN)) {
-			rcu_read_unlock();
-			break;
-		}
+		if (msg != ERR_PTR(-EAGAIN))
+			goto out_unlock1;
 
 		/* Lockless receive, part 3:
 		 * Acquire the queue spinlock.
 		 */
-		ipc_lock_by_ptr(&msq->q_perm);
-		rcu_read_unlock();
+		ipc_lock_object(&msq->q_perm);
 
 		/* Lockless receive, part 4:
 		 * Repeat test after acquiring the spinlock.
 		 */
 		msg = (struct msg_msg*)msr_d.r_msg;
 		if (msg != ERR_PTR(-EAGAIN))
-			goto out_unlock;
+			goto out_unlock0;
 
 		list_del(&msr_d.r_list);
 		if (signal_pending(current)) {
 			msg = ERR_PTR(-ERESTARTNOHAND);
-out_unlock:
-			msg_unlock(msq);
-			break;
+			goto out_unlock0;
 		}
+
+		ipc_unlock_object(&msq->q_perm);
 	}
+
+out_unlock0:
+	ipc_unlock_object(&msq->q_perm);
+out_unlock1:
+	rcu_read_unlock();
 	if (IS_ERR(msg)) {
 		free_copy(copy);
 		return PTR_ERR(msg);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/11] ipc: remove unused functions
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (9 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 10/11] ipc,msg: shorten critical region in msgrcv Davidlohr Bueso
@ 2013-05-16  1:08 ` Davidlohr Bueso
  2013-05-16 14:12 ` [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Linus Torvalds
  2013-05-24  1:45 ` Davidlohr Bueso
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16  1:08 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel, Davidlohr Bueso

We can now drop the msg_lock and msg_lock_check functions along
with a bogus comment introduced previously in semctl_down.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
---
 ipc/msg.c | 25 -------------------------
 ipc/sem.c |  1 -
 2 files changed, 26 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 1654be4..3b7b4b5 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -141,31 +141,6 @@ void __init msg_init(void)
 				IPC_MSG_IDS, sysvipc_msg_proc_show);
 }
 
-/*
- * msg_lock_(check_) routines are called in the paths where the rw_mutex
- * is not held.
- */
-static inline struct msg_queue *msg_lock(struct ipc_namespace *ns, int id)
-{
-	struct kern_ipc_perm *ipcp = ipc_lock(&msg_ids(ns), id);
-
-	if (IS_ERR(ipcp))
-		return (struct msg_queue *)ipcp;
-
-	return container_of(ipcp, struct msg_queue, q_perm);
-}
-
-static inline struct msg_queue *msg_lock_check(struct ipc_namespace *ns,
-						int id)
-{
-	struct kern_ipc_perm *ipcp = ipc_lock_check(&msg_ids(ns), id);
-
-	if (IS_ERR(ipcp))
-		return (struct msg_queue *)ipcp;
-
-	return container_of(ipcp, struct msg_queue, q_perm);
-}
-
 static inline struct msg_queue *msq_obtain_object(struct ipc_namespace *ns, int id)
 {
 	struct kern_ipc_perm *ipcp = ipc_obtain_object(&msg_ids(ns), id);
diff --git a/ipc/sem.c b/ipc/sem.c
index f7d99f9..8223f17 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1281,7 +1281,6 @@ static int semctl_down(struct ipc_namespace *ns, int semid,
 				      &semid64.sem_perm, 0);
 	if (IS_ERR(ipcp)) {
 		err = PTR_ERR(ipcp);
-		/* the ipc lock is not held upon failure */
 		goto out_unlock1;
 	}
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (10 preceding siblings ...)
  2013-05-16  1:08 ` [PATCH 11/11] ipc: remove unused functions Davidlohr Bueso
@ 2013-05-16 14:12 ` Linus Torvalds
  2013-05-16 16:41   ` Davidlohr Bueso
  2013-05-24  1:45 ` Davidlohr Bueso
  12 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2013-05-16 14:12 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, Rik van Riel, Linux Kernel Mailing List

On Wed, May 15, 2013 at 6:07 PM, Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:
> This patchset continues the work that began in the sysv ipc semaphore scaling

Looks fine to me, but with the problems we had on the semaphore side,
and since we're out of the merge window, I'm not going to take it into
3.10 (I assume you weren't expecting that either). But hopefully it
can go into Andrew''s -mm tree and get some testing.

              Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily
  2013-05-16 14:12 ` [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Linus Torvalds
@ 2013-05-16 16:41   ` Davidlohr Bueso
  0 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-16 16:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Rik van Riel, Linux Kernel Mailing List

On Thu, 2013-05-16 at 07:12 -0700, Linus Torvalds wrote:
> On Wed, May 15, 2013 at 6:07 PM, Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:
> > This patchset continues the work that began in the sysv ipc semaphore scaling
> 
> Looks fine to me, but with the problems we had on the semaphore side,
> and since we're out of the merge window, I'm not going to take it into
> 3.10 (I assume you weren't expecting that either). But hopefully it
> can go into Andrew''s -mm tree and get some testing.

Yes, it is intended for Andrew's tree.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock
  2013-05-16  1:08 ` [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock Davidlohr Bueso
@ 2013-05-16 17:55   ` Andi Kleen
  2013-05-24 20:16   ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2013-05-16 17:55 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: akpm, torvalds, riel, linux-kernel

Davidlohr Bueso <davidlohr.bueso@hp.com> writes:

> This function currently acquires both the rw_mutex and the rcu lock on
> successful lookups, leaving the callers to explicitly unlock them, creating
> another two level locking situation.

Note that the rcu read lock is not a real lock, so it doesn't have the
usual ordering problems.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily
  2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
                   ` (11 preceding siblings ...)
  2013-05-16 14:12 ` [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Linus Torvalds
@ 2013-05-24  1:45 ` Davidlohr Bueso
  12 siblings, 0 replies; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-24  1:45 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, riel, linux-kernel

ping, Andrew?

On Wed, 2013-05-15 at 18:07 -0700, Davidlohr Bueso wrote:
> This patchset continues the work that began in the sysv ipc semaphore scaling 
> series: https://lkml.org/lkml/2013/3/20/546
> 
> Just like semaphores used to be, sysv shared memory and msg queues also abuse the ipc
> lock, unnecessarily holding it for operations such as permission and security checks. This
> patchset mostly deals with mqueues, and while shared mem can be done in a very similar way,
> I want to get these patches out in the open first. It also does some pending cleanups,
> mostly focused on the two level locking we have in ipc code, taking care of ipc_addid() 
> and ipcctl_pre_down_nolock() - yes there are still functions that need to be updated as well.
> 
> I have tried to split each patch to be as readable and specific as possible, specially when
> shortening the critical regions, going one function at a time.
> 
> Patch 1 moves the locking to be explicitly done by the callers of ipc_addid.
> It updates msg, sem and shm.
> 
> Patches 2-3: are just wrappers around the ipc lock, initially suggested by Linus.
> 
> Patch 4 is similar to patch 1, moving the rcu and rw_mutex locking out of
> ipcctl_pre_down_nolock so that the callers explicitly deals with them. It updates msg, sem
> and shm.
> 
> Patch 5 shortens the critical region in msgctl_down(), using the lockless
> ipcctl_pre_down() function and only acquiring the ipc lock for RMID and SET commands.
> 
> Patch 6 simply moves the what-should-be lockless logic of *_INFO and *_STAT commands
> out of msgctl() into a new function, msgctl_lockless().
> 
> Patch 7 introduces the necessary wrappers around ipc_obtain_object[_check]()
> that will later enable us to separately lookup the ipc object without holding the lock.
> 
> Patch 8 updates the previously added msgctl_nolock() to actually be lockless, reducing
> the critical region for the STAT commands.
> 
> Patch 9 redoes the locking for msgsend().
> 
> Patch 10 redoes the locking for msgrcv().
> 
> Patch 11 removes the now unused msg_lock and msg_lock_check functions, replaced by
> a smarter combination of rcu, ipc_obtain_object and ipc_object_lock.
> 
> Davidlohr Bueso (11):
>   ipc: move rcu lock out of ipc_addid
>   ipc: introduce ipc object locking helpers
>   ipc: close open coded spin lock calls
>   ipc: move locking out of ipcctl_pre_down_nolock
>   ipc,msg: shorten critical region in semctl_down
>   ipc,msg: introduce msgctl_nolock
>   ipc,msg: introduce lockless functions to obtain the ipc object
>   ipc,msg: make msgctl_nolock lockless
>   ipc,msg: reduce critical region in msgsnd
>   ipc,msg: make shorten critical region in msgrcv
>   ipc: remove unused functions
> 
>  ipc/msg.c  | 227 ++++++++++++++++++++++++++++++++++++++-----------------------
>  ipc/sem.c  |  42 +++++++-----
>  ipc/shm.c  |  32 ++++++---
>  ipc/util.c |  25 ++-----
>  ipc/util.h |  22 ++++--
>  5 files changed, 211 insertions(+), 137 deletions(-)
> 
> Thanks,
> Davidlohr
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock
  2013-05-16  1:08 ` [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock Davidlohr Bueso
  2013-05-16 17:55   ` Andi Kleen
@ 2013-05-24 20:16   ` Andrew Morton
  2013-05-24 22:21     ` Davidlohr Bueso
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2013-05-24 20:16 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: torvalds, riel, linux-kernel

On Wed, 15 May 2013 18:08:03 -0700 Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:

> This function currently acquires both the rw_mutex and the rcu lock on
> successful lookups, leaving the callers to explicitly unlock them, creating
> another two level locking situation.
> 
> Make the callers (including those that still use ipcctl_pre_down()) explicitly
> lock and unlock the rwsem and rcu lock.
> 
> ...
>
> @@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
>  			return -EFAULT;
>  	}
>  
> +	down_write(&msg_ids(ns).rw_mutex);
> +	rcu_read_lock();
> +
>  	ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
>  			       &msqid64.msg_perm, msqid64.msg_qbytes);
> -	if (IS_ERR(ipcp))
> -		return PTR_ERR(ipcp);
> +	if (IS_ERR(ipcp)) {
> +		err = PTR_ERR(ipcp);
> +		/* the ipc lock is not held upon failure */

Terms like "the ipc lock" are unnecessarily vague.  It's better to
identify the lock by name, eg msg_queue.q_perm.lock.

Where should readers go to understand the overall locking scheme?  A
description of the overall object hierarchy and the role which the
various locks play?

Have you done any performance testing of this patchset?  Just from
squinting at it, I'd expect the effects to be small...


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock
  2013-05-24 20:16   ` Andrew Morton
@ 2013-05-24 22:21     ` Davidlohr Bueso
  2013-05-24 22:29       ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Davidlohr Bueso @ 2013-05-24 22:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, riel, linux-kernel

On Fri, 2013-05-24 at 13:16 -0700, Andrew Morton wrote:
> On Wed, 15 May 2013 18:08:03 -0700 Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:
> 
> > This function currently acquires both the rw_mutex and the rcu lock on
> > successful lookups, leaving the callers to explicitly unlock them, creating
> > another two level locking situation.
> > 
> > Make the callers (including those that still use ipcctl_pre_down()) explicitly
> > lock and unlock the rwsem and rcu lock.
> > 
> > ...
> >
> > @@ -409,31 +409,38 @@ static int msgctl_down(struct ipc_namespace *ns, int msqid, int cmd,
> >  			return -EFAULT;
> >  	}
> >  
> > +	down_write(&msg_ids(ns).rw_mutex);
> > +	rcu_read_lock();
> > +
> >  	ipcp = ipcctl_pre_down(ns, &msg_ids(ns), msqid, cmd,
> >  			       &msqid64.msg_perm, msqid64.msg_qbytes);
> > -	if (IS_ERR(ipcp))
> > -		return PTR_ERR(ipcp);
> > +	if (IS_ERR(ipcp)) {
> > +		err = PTR_ERR(ipcp);
> > +		/* the ipc lock is not held upon failure */
> 
> Terms like "the ipc lock" are unnecessarily vague.  It's better to
> identify the lock by name, eg msg_queue.q_perm.lock.

Ok, I can send a patch to rephrase that to perm.lock when I send the shm
patchset (which will be very similar to this one).

> 
> Where should readers go to understand the overall locking scheme?  A
> description of the overall object hierarchy and the role which the
> various locks play?

That can be done, how about something like
Documentation/ipc-locking.txt?

> 
> Have you done any performance testing of this patchset?  Just from
> squinting at it, I'd expect the effects to be small...
> 

Right, I don't expect much performance benefits. (a) unlike sems, I
haven't seen mqueues ever show up as any source of contention, and (b) I
think sysv mqueues have mostly been replaced by posix ones...

For testing, I did run these patches with ipccmd
(http://code.google.com/p/ipcmd/), pgbench, aim7 and Oracle on large
machines - no regressions but nothing new in terms of performance.

I suspect that shm could have a little more impact, but haven't looked
too much into it.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock
  2013-05-24 22:21     ` Davidlohr Bueso
@ 2013-05-24 22:29       ` Andrew Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2013-05-24 22:29 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: torvalds, riel, linux-kernel

On Fri, 24 May 2013 15:21:36 -0700 Davidlohr Bueso <davidlohr.bueso@hp.com> wrote:

> > 
> > Where should readers go to understand the overall locking scheme?  A
> > description of the overall object hierarchy and the role which the
> > various locks play?
> 
> That can be done, how about something like
> Documentation/ipc-locking.txt?

I find that code comments at definition sites work better, if possible.
Standalone documentation ends up being too verbose and never gets
updated or read...


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-05-24 22:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-16  1:07 [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 01/11] ipc: move rcu lock out of ipc_addid Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 02/11] ipc: introduce ipc object locking helpers Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 03/11] ipc: close open coded spin lock calls Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 04/11] ipc: move locking out of ipcctl_pre_down_nolock Davidlohr Bueso
2013-05-16 17:55   ` Andi Kleen
2013-05-24 20:16   ` Andrew Morton
2013-05-24 22:21     ` Davidlohr Bueso
2013-05-24 22:29       ` Andrew Morton
2013-05-16  1:08 ` [PATCH 05/11] ipc,msg: shorten critical region in msgctl_down Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 06/11] ipc,msg: introduce msgctl_nolock Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 07/11] ipc,msg: introduce lockless functions to obtain the ipc object Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 08/11] ipc,msg: make msgctl_nolock lockless Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 09/11] ipc,msg: shorten critical region in msgsnd Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 10/11] ipc,msg: shorten critical region in msgrcv Davidlohr Bueso
2013-05-16  1:08 ` [PATCH 11/11] ipc: remove unused functions Davidlohr Bueso
2013-05-16 14:12 ` [PATCH 00/11] sysv mqueue: do not hold the ipc lock unnecessarily Linus Torvalds
2013-05-16 16:41   ` Davidlohr Bueso
2013-05-24  1:45 ` Davidlohr Bueso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.