linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rik van Riel <riel@surriel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Michel Lespinasse <walken@google.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	torvalds@linux-foundation.org, davidlohr.bueso@hp.com,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	hhuang@redhat.com, jason.low2@hp.com, lwoodman@redhat.com,
	chegu_vinod@hp.com, Dave Jones <davej@redhat.com>,
	benisty.e@gmail.com, Ingo Molnar <mingo@redhat.com>
Subject: [PATCH v2 -mm -next] ipc,sem: fix lockdep false positive
Date: Thu, 28 Mar 2013 16:23:37 -0400	[thread overview]
Message-ID: <20130328162337.3003ccd4@cuia.bos.redhat.com> (raw)
In-Reply-To: <1364373750.5053.54.camel@laptop>

On Wed, 27 Mar 2013 09:42:30 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> I since realized there's an ordering problem with ->global_locked, we
> need to use spin_is_locked() or somesuch.
> 
> Two competing sma_lock() operations will screw over the separate
> variable.

I created a worse version of the stress test, one that will have
the first thread always do two semops at a time (creating a complex
operation), and have all the other threads do one at a time. All
the threads issue down and then up, so there is a lot of sleeping
and waking back up with this test.

The patch below should fix the last lockdep issue (the other issue
being fixed by the RCU patch), and is running the various semop
tests just fine on a 48 CPU system.

The locking is a little more complicated than I would have liked,
but a few hundred million semaphore operations suggest there are
probably no race conditions left...

---8<---

Subject: [PATCH -mm -next] ipc,sem: change locking scheme to make lockdep happy

Unfortunately the locking scheme originally proposed has false positives
with lockdep.  This can be fixed by changing the code to only ever take
one lock, and making sure that other relevant locks are not locked, before
entering a critical section.

For the "global lock" case, this is done by taking the sem_array lock,
and then (potentially) waiting for all the semaphore's spinlocks to be
unlocked.

For the "local lock" case, we wait on the sem_array's lock to be free,
before taking the semaphore local lock. To prevent races, we need to
check again after we have taken the local lock.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 ipc/sem.c |   55 ++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index 36500a6..87b74d5 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -320,24 +320,39 @@ void __init sem_init (void)
 }
 
 /*
- * If the sem_array contains just one semaphore, or if multiple
- * semops are performed in one syscall, or if there are complex
- * operations pending, the whole sem_array is locked.
- * If one semop is performed on an array with multiple semaphores,
- * get a shared lock on the array, and lock the individual semaphore.
+ * If the request contains only one semaphore operation, and there are
+ * no complex transactions pending, lock only the semaphore involved.
+ * Otherwise, lock the entire semaphore array, since we either have
+ * multiple semaphores in our own semops, or we need to look at
+ * semaphores from other pending complex operations.
  *
  * Carefully guard against sma->complex_count changing between zero
  * and non-zero while we are spinning for the lock. The value of
  * sma->complex_count cannot change while we are holding the lock,
  * so sem_unlock should be fine.
+ *
+ * The global lock path checks that all the local locks have been released,
+ * checking each local lock once. This means that the local lock paths
+ * cannot start their critical sections while the global lock is held.
  */
 static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 			      int nsops)
 {
 	int locknum;
+ again:
 	if (nsops == 1 && !sma->complex_count) {
 		struct sem *sem = sma->sem_base + sops->sem_num;
 
+		/*
+		 * Another process is holding the global lock on the
+		 * sem_array. Wait for that process to release the lock,
+		 * before acquiring our lock.
+		 */
+		if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
+			spin_unlock_wait(&sma->sem_perm.lock);
+			goto again;
+		}
+
 		/* Lock just the semaphore we are interested in. */
 		spin_lock(&sem->lock);
 
@@ -347,17 +362,33 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 		 */
 		if (unlikely(sma->complex_count)) {
 			spin_unlock(&sem->lock);
-			goto lock_all;
+			goto lock_array;
+		}
+
+		/*
+		 * Another process is holding the global lock on the
+		 * sem_array; we cannot enter our critical section,
+		 * but have to wait for the global lock to be released.
+		 */
+		if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
+			spin_unlock(&sem->lock);
+			goto again;
 		}
+
 		locknum = sops->sem_num;
 	} else {
 		int i;
-		/* Lock the sem_array, and all the semaphore locks */
- lock_all:
+		/*
+		 * Lock the semaphore array, and wait for all of the
+		 * individual semaphore locks to go away.  The code
+		 * above ensures no new single-lock holders will enter
+		 * their critical section while the array lock is held.
+		 */
+ lock_array:
 		spin_lock(&sma->sem_perm.lock);
 		for (i = 0; i < sma->sem_nsems; i++) {
 			struct sem *sem = sma->sem_base + i;
-			spin_lock(&sem->lock);
+			spin_unlock_wait(&sem->lock);
 		}
 		locknum = -1;
 	}
@@ -367,11 +398,6 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
 static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
 	if (locknum == -1) {
-		int i;
-		for (i = 0; i < sma->sem_nsems; i++) {
-			struct sem *sem = sma->sem_base + i;
-			spin_unlock(&sem->lock);
-		}
 		spin_unlock(&sma->sem_perm.lock);
 	} else {
 		struct sem *sem = sma->sem_base + locknum;
@@ -558,7 +584,6 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	for (i = 0; i < nsems; i++) {
 		INIT_LIST_HEAD(&sma->sem_base[i].sem_pending);
 		spin_lock_init(&sma->sem_base[i].lock);
-		spin_lock(&sma->sem_base[i].lock);
 	}
 
 	sma->complex_count = 0;

  parent reply	other threads:[~2013-03-28 20:27 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-20 19:55 ipc,sem: sysv semaphore scalability Rik van Riel
2013-03-20 19:55 ` [PATCH 1/7] ipc: remove bogus lock comment for ipc_checkid Rik van Riel
2013-03-20 19:55 ` [PATCH 2/7] ipc: introduce obtaining a lockless ipc object Rik van Riel
2013-03-20 19:55 ` [PATCH 3/7] ipc: introduce lockless pre_down ipcctl Rik van Riel
2013-03-20 19:55 ` [PATCH 4/7] ipc,sem: do not hold ipc lock more than necessary Rik van Riel
2013-03-20 19:55 ` [PATCH 5/7] ipc,sem: open code and rename sem_lock Rik van Riel
2013-03-22  1:14   ` Davidlohr Bueso
2013-03-20 19:55 ` [PATCH 6/7] ipc,sem: have only one list in struct sem_queue Rik van Riel
2013-03-22  1:14   ` Davidlohr Bueso
2013-03-20 19:55 ` [PATCH 7/7] ipc,sem: fine grained locking for semtimedop Rik van Riel
2013-03-22  1:14   ` Davidlohr Bueso
2013-03-22 23:01   ` Michel Lespinasse
2013-03-22 23:38     ` Rik van Riel
2013-03-22 23:42     ` [PATCH 7/7 part3] fix for sem_lock Rik van Riel
2013-03-20 20:49 ` ipc,sem: sysv semaphore scalability Linus Torvalds
2013-03-20 20:56   ` Linus Torvalds
2013-03-20 20:57   ` Davidlohr Bueso
2013-03-21 21:10 ` Andrew Morton
2013-03-21 21:47   ` Peter Hurley
2013-03-21 21:50   ` Peter Hurley
2013-03-21 22:01     ` Andrew Morton
2013-03-22  3:38       ` Rik van Riel
2013-03-26 19:28   ` Dave Jones
2013-03-26 19:43     ` Andrew Morton
2013-03-29 16:17       ` Dave Jones
2013-03-29 18:00         ` Linus Torvalds
2013-03-29 18:04           ` Dave Jones
2013-03-29 18:10             ` Linus Torvalds
2013-03-29 18:43         ` Linus Torvalds
2013-03-29 19:06           ` Dave Jones
2013-03-29 19:13             ` Linus Torvalds
2013-03-29 19:26             ` Linus Torvalds
2013-03-29 19:36               ` Peter Hurley
2013-04-02 16:08                 ` Sasha Levin
2013-04-02 17:24                   ` Linus Torvalds
2013-04-02 17:52                   ` Linus Torvalds
2013-04-02 19:53                     ` Sasha Levin
2013-04-02 20:00                       ` Dave Jones
2013-03-29 19:33           ` Peter Hurley
2013-03-29 19:54             ` Linus Torvalds
2013-04-01  7:40           ` Stanislav Kinsbursky
2013-03-29 20:41         ` Linus Torvalds
2013-03-29 21:12           ` Linus Torvalds
2013-03-29 23:16             ` Linus Torvalds
2013-03-30  1:36               ` Emmanuel Benisty
2013-03-30  2:08                 ` Davidlohr Bueso
2013-03-30  3:02                   ` Emmanuel Benisty
2013-03-30  3:46                     ` Linus Torvalds
2013-03-30  4:33                       ` Emmanuel Benisty
2013-03-30  5:10                         ` Linus Torvalds
2013-03-30  5:57                           ` Emmanuel Benisty
2013-03-30 17:22                             ` Linus Torvalds
2013-03-31  2:38                               ` Emmanuel Benisty
2013-03-31  5:01                         ` Davidlohr Bueso
2013-03-31 13:45                           ` Rik van Riel
2013-03-31 17:10                             ` Linus Torvalds
2013-03-31 17:02                           ` Emmanuel Benisty
2013-03-30  2:09                 ` Linus Torvalds
2013-03-30  2:55                   ` Davidlohr Bueso
2013-03-29 19:01       ` Dave Jones
2013-05-03 15:03         ` Peter Hurley
2013-03-22  1:12 ` Davidlohr Bueso
2013-03-22  1:23   ` Linus Torvalds
2013-03-22  3:40     ` Rik van Riel
2013-03-22  7:30 ` Mike Galbraith
2013-03-22 11:04 ` Emmanuel Benisty
2013-03-22 15:37   ` Linus Torvalds
2013-03-23  3:19     ` Emmanuel Benisty
2013-03-23 19:45       ` Linus Torvalds
2013-03-24 13:46         ` Emmanuel Benisty
2013-03-24 17:10           ` Linus Torvalds
2013-03-25 13:47             ` Emmanuel Benisty
2013-03-25 14:00               ` Rik van Riel
2013-03-25 14:03                 ` Rik van Riel
2013-03-25 15:20                   ` Emmanuel Benisty
2013-03-25 15:53                     ` Rik van Riel
2013-03-25 17:09                       ` Emmanuel Benisty
2013-03-25 14:01               ` Rik van Riel
2013-03-25 14:21                 ` Emmanuel Benisty
2013-03-26 17:59               ` Davidlohr Bueso
2013-03-26 18:14                 ` Rik van Riel
2013-03-26 18:35                 ` Andrew Morton
2013-04-16 23:30                   ` Andrew Morton
2013-05-04 15:55       ` Jörn Engel
2013-05-04 18:12         ` Borislav Petkov
2013-05-06 14:47           ` Jörn Engel
2013-03-22 17:51 ` Davidlohr Bueso
2013-03-25 20:21 ` Sasha Levin
2013-03-25 20:38   ` [PATCH -mm -next] ipc,sem: fix lockdep false positive Rik van Riel
2013-03-25 21:42     ` Michel Lespinasse
2013-03-25 21:51       ` Michel Lespinasse
2013-03-25 21:56         ` Sasha Levin
2013-03-25 21:52       ` Sasha Levin
2013-03-26 13:19       ` Peter Zijlstra
2013-03-26 13:40         ` Michel Lespinasse
2013-03-26 14:27           ` Peter Zijlstra
2013-03-26 15:19             ` Rik van Riel
2013-03-27  8:40               ` Peter Zijlstra
2013-03-27  8:42               ` Peter Zijlstra
2013-03-27 11:22                 ` Michel Lespinasse
2013-03-27 12:02                   ` Peter Zijlstra
2013-03-27 20:00                 ` Rik van Riel
2013-03-28 20:23                 ` Rik van Riel [this message]
2013-03-29  2:50                   ` [PATCH v2 " Michel Lespinasse
2013-03-29  9:57                     ` Peter Zijlstra
2013-03-29 13:21                       ` Michel Lespinasse
2013-03-29 12:07                     ` Rik van Riel
2013-03-29 13:08                       ` Michel Lespinasse
2013-03-29 13:24                         ` Rik van Riel
2013-03-29 13:55                     ` [PATCH v3 " Rik van Riel
2013-03-29 13:59                       ` Michel Lespinasse
2013-03-26 14:25         ` [PATCH " Rik van Riel
2013-03-26 17:33 ` ipc,sem: sysv semaphore scalability Sasha Levin
2013-03-26 17:51   ` Davidlohr Bueso
2013-03-26 18:07     ` Sasha Levin
2013-03-26 18:17       ` Rik van Riel
2013-03-26 20:00       ` [PATCH -mm -next] ipc,sem: untangle RCU locking with find_alloc_undo Rik van Riel
2013-04-05  4:38         ` Mike Galbraith
2013-04-05 13:21           ` Rik van Riel
2013-04-05 16:26             ` Mike Galbraith
2013-04-16 12:37             ` Mike Galbraith
2013-03-26 17:55   ` ipc,sem: sysv semaphore scalability Paul E. McKenney
2013-03-28 15:32   ` [PATCH -mm -next] ipc,sem: untangle RCU locking with find_alloc_undo Rik van Riel
2013-03-28 21:05     ` Davidlohr Bueso
2013-03-29  1:00     ` Michel Lespinasse
2013-03-29  1:14       ` Sasha Levin
2013-03-30 13:35     ` Sasha Levin
2013-03-31  1:30       ` Rik van Riel
2013-03-31  4:09         ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130328162337.3003ccd4@cuia.bos.redhat.com \
    --to=riel@surriel.com \
    --cc=akpm@linux-foundation.org \
    --cc=benisty.e@gmail.com \
    --cc=chegu_vinod@hp.com \
    --cc=davej@redhat.com \
    --cc=davidlohr.bueso@hp.com \
    --cc=hhuang@redhat.com \
    --cc=jason.low2@hp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lwoodman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sasha.levin@oracle.com \
    --cc=torvalds@linux-foundation.org \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).