public inbox for linux-kernel@vger.kernel.org
* [PATCH] 2.5.3 remove global semaphore_lock spin lock.
@ 2002-01-31 23:01 Bob Miller
  2002-01-31 23:55 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Bob Miller @ 2002-01-31 23:01 UTC (permalink / raw)
  To: linux-kernel

Below is a patch for i386 that replaces the global spin lock semaphore_lock
with the rwlock_t embedded in the wait_queue_head_t in struct semaphore.

None of the data protected by semaphore_lock is global, and there is
no need to restrict the system to dealing with only one semaphore at
a time.

This removes two lock round trips from __down() and __down_interruptible().
It also reduces the number of cache lines touched by one (the cache line
holding semaphore_lock).
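
To make this concrete, here is the __down() hot path before and after,
condensed from the diff below (an illustrative sketch only, not a
separate patch; the sleepers bookkeeping and state changes are elided):

	/* before: the global lock plus three trips through sem->wait.lock */
	add_wait_queue_exclusive(&sem->wait, &wait);	/* takes sem->wait.lock */
	spin_lock_irq(&semaphore_lock);			/* global lock */
	...
	spin_unlock_irq(&semaphore_lock);
	remove_wait_queue(&sem->wait, &wait);		/* takes sem->wait.lock again */
	wake_up(&sem->wait);				/* and once more */

	/* after: a single acquisition of the embedded waitqueue lock */
	wq_write_lock_irqsave(&sem->wait.lock, flags);
	add_wait_queue_exclusive_locked(&sem->wait, &wait);
	...
	remove_wait_queue_locked(&sem->wait, &wait);
	wake_up_locked(&sem->wait);
	wq_write_unlock_irqrestore(&sem->wait.lock, flags);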

This work has only been done for i386 because that is the only arch I have
access to for testing.  The same changes could be applied to arm, ia64, s390,
s390x and sparc if requested.

-- 
Bob Miller					Email: rem@osdl.org
Open Software Development Lab			Phone: 503.626.2455 Ext. 17


diff -ru linux.2.5.3-orig/arch/i386/kernel/semaphore.c linux.2.5.3/arch/i386/kernel/semaphore.c
--- linux.2.5.3-orig/arch/i386/kernel/semaphore.c	Fri Nov 30 08:54:18 2001
+++ linux.2.5.3/arch/i386/kernel/semaphore.c	Thu Jan 31 14:18:43 2002
@@ -28,7 +28,8 @@
  * the increment operation.
  *
  * "sleeping" and the contention routine ordering is
- * protected by the semaphore spinlock.
+ * protected by the read/write lock in the semaphore's
+ * waitqueue head.
  *
  * Note that these functions are only called when there is
  * contention on the lock, and as such all this is the
@@ -52,39 +53,41 @@
 	wake_up(&sem->wait);
 }
 
-static spinlock_t semaphore_lock = SPIN_LOCK_UNLOCKED;
-
 void __down(struct semaphore * sem)
 {
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
+	unsigned long flags;
+
 	tsk->state = TASK_UNINTERRUPTIBLE;
-	add_wait_queue_exclusive(&sem->wait, &wait);
+	wq_write_lock_irqsave(&sem->wait.lock, flags);
+	add_wait_queue_exclusive_locked(&sem->wait, &wait);
 
-	spin_lock_irq(&semaphore_lock);
 	sem->sleepers++;
 	for (;;) {
 		int sleepers = sem->sleepers;
 
 		/*
 		 * Add "everybody else" into it. They aren't
-		 * playing, because we own the spinlock.
+		 * playing, because we own the wait lock
+		 * exclusively.
 		 */
 		if (!atomic_add_negative(sleepers - 1, &sem->count)) {
 			sem->sleepers = 0;
 			break;
 		}
 		sem->sleepers = 1;	/* us - see -1 above */
-		spin_unlock_irq(&semaphore_lock);
+		wq_write_unlock_irqrestore(&sem->wait.lock, flags);
 
 		schedule();
+
+		wq_write_lock_irqsave(&sem->wait.lock, flags);
 		tsk->state = TASK_UNINTERRUPTIBLE;
-		spin_lock_irq(&semaphore_lock);
 	}
-	spin_unlock_irq(&semaphore_lock);
-	remove_wait_queue(&sem->wait, &wait);
+	remove_wait_queue_locked(&sem->wait, &wait);
+	wake_up_locked(&sem->wait);
+	wq_write_unlock_irqrestore(&sem->wait.lock, flags);
 	tsk->state = TASK_RUNNING;
-	wake_up(&sem->wait);
 }
 
 int __down_interruptible(struct semaphore * sem)
@@ -92,11 +95,13 @@
 	int retval = 0;
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
+	unsigned long flags;
+
 	tsk->state = TASK_INTERRUPTIBLE;
-	add_wait_queue_exclusive(&sem->wait, &wait);
+	wq_write_lock_irqsave(&sem->wait.lock, flags);
+	add_wait_queue_exclusive_locked(&sem->wait, &wait);
 
-	spin_lock_irq(&semaphore_lock);
-	sem->sleepers ++;
+	sem->sleepers++;
 	for (;;) {
 		int sleepers = sem->sleepers;
 
@@ -116,25 +121,27 @@
 
 		/*
 		 * Add "everybody else" into it. They aren't
-		 * playing, because we own the spinlock. The
-		 * "-1" is because we're still hoping to get
-		 * the lock.
+		 * playing, because we own the wait lock
+		 * exclusively. The "-1" is because we're still
+		 * hoping to get the semaphore.
 		 */
 		if (!atomic_add_negative(sleepers - 1, &sem->count)) {
 			sem->sleepers = 0;
 			break;
 		}
 		sem->sleepers = 1;	/* us - see -1 above */
-		spin_unlock_irq(&semaphore_lock);
+		wq_write_unlock_irqrestore(&sem->wait.lock, flags);
 
 		schedule();
+
+		wq_write_lock_irqsave(&sem->wait.lock, flags);
 		tsk->state = TASK_INTERRUPTIBLE;
-		spin_lock_irq(&semaphore_lock);
 	}
-	spin_unlock_irq(&semaphore_lock);
+	remove_wait_queue_locked(&sem->wait, &wait);
+	wake_up_locked(&sem->wait);
+	wq_write_unlock_irqrestore(&sem->wait.lock, flags);
+
 	tsk->state = TASK_RUNNING;
-	remove_wait_queue(&sem->wait, &wait);
-	wake_up(&sem->wait);
 	return retval;
 }
 
@@ -151,18 +158,19 @@
 	int sleepers;
 	unsigned long flags;
 
-	spin_lock_irqsave(&semaphore_lock, flags);
+	wq_write_lock_irqsave(&sem->wait.lock, flags);
 	sleepers = sem->sleepers + 1;
 	sem->sleepers = 0;
 
 	/*
 	 * Add "everybody else" and us into it. They aren't
-	 * playing, because we own the spinlock.
+	 * playing, because we own the wait lock exclusively.
 	 */
-	if (!atomic_add_negative(sleepers, &sem->count))
-		wake_up(&sem->wait);
+	if (!atomic_add_negative(sleepers, &sem->count)) {
+		wake_up_locked(&sem->wait);
+	}
 
-	spin_unlock_irqrestore(&semaphore_lock, flags);
+	wq_write_unlock_irqrestore(&sem->wait.lock, flags);
 	return 1;
 }
 
diff -ru linux.2.5.3-orig/include/linux/sched.h linux.2.5.3/include/linux/sched.h
--- linux.2.5.3-orig/include/linux/sched.h	Tue Jan 29 21:41:12 2002
+++ linux.2.5.3/include/linux/sched.h	Thu Jan 31 10:35:37 2002
@@ -537,6 +537,7 @@
 
 extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
+extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr));
 extern void FASTCALL(sleep_on(wait_queue_head_t *q));
 extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
 				      signed long timeout));
@@ -556,6 +557,7 @@
 #define wake_up_interruptible_all(x)	__wake_up((x),TASK_INTERRUPTIBLE, 0)
 #define wake_up_interruptible_sync(x)	__wake_up_sync((x),TASK_INTERRUPTIBLE, 1)
 #define wake_up_interruptible_sync_nr(x) __wake_up_sync((x),TASK_INTERRUPTIBLE,  nr)
+#define wake_up_locked(x)		__wake_up_locked((x),TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1)
 asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struct rusage * ru);
 
 extern int in_group_p(gid_t);
@@ -757,7 +759,9 @@
 
 extern void FASTCALL(add_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(add_wait_queue_exclusive_locked(wait_queue_head_t *q, wait_queue_t * wait));
 extern void FASTCALL(remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait));
+extern void FASTCALL(remove_wait_queue_locked(wait_queue_head_t *q, wait_queue_t * wait));
 
 extern void wait_task_inactive(task_t * p);
 extern void kick_if_running(task_t * p);
diff -ru linux.2.5.3-orig/kernel/fork.c linux.2.5.3/kernel/fork.c
--- linux.2.5.3-orig/kernel/fork.c	Mon Jan 28 15:11:45 2002
+++ linux.2.5.3/kernel/fork.c	Thu Jan 31 09:55:20 2002
@@ -59,6 +59,15 @@
 	wq_write_unlock_irqrestore(&q->lock, flags);
 }
 
+/*
+ * Must be called with wait queue lock held in write mode.
+ */
+void add_wait_queue_exclusive_locked(wait_queue_head_t *q, wait_queue_t * wait)
+{
+	wait->flags |= WQ_FLAG_EXCLUSIVE;
+	__add_wait_queue_tail(q, wait);
+}
+
 void remove_wait_queue(wait_queue_head_t *q, wait_queue_t * wait)
 {
 	unsigned long flags;
@@ -68,6 +77,14 @@
 	wq_write_unlock_irqrestore(&q->lock, flags);
 }
 
+/*
+ * Must be called with wait queue lock held in write mode.
+ */
+void remove_wait_queue_locked(wait_queue_head_t *q, wait_queue_t * wait)
+{
+	__remove_wait_queue(q, wait);
+}
+
 void __init fork_init(unsigned long mempages)
 {
 	/*
diff -ru linux.2.5.3-orig/kernel/sched.c linux.2.5.3/kernel/sched.c
--- linux.2.5.3-orig/kernel/sched.c	Mon Jan 28 15:12:47 2002
+++ linux.2.5.3/kernel/sched.c	Thu Jan 31 09:55:20 2002
@@ -743,6 +743,17 @@
 	}
 }
 
+/*
+ * Same as __wake_up but called with the wait_queue_head_t lock held
+ * in at least read mode.
+ */
+void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
+{
+	if (q) {
+		__wake_up_common(q, mode, nr, 0);
+	}
+}
+
 void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr)
 {
 	if (q) {


* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-01-31 23:01 [PATCH] 2.5.3 remove global semaphore_lock spin lock Bob Miller
@ 2002-01-31 23:55 ` Andrew Morton
  2002-02-01  3:41   ` Jeff Garzik
  2002-02-01 20:52   ` Bob Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2002-01-31 23:55 UTC (permalink / raw)
  To: Bob Miller; +Cc: linux-kernel

Bob Miller wrote:
> 
> Below is a patch for i386 that replaces the global spin lock semaphore_lock,
> with the rwlock_t embedded in the wait_queue_head_t in the struct semaphore.
> 

Looks sane.  In practice, the speedup is unmeasurable, but...

> ...
> +       unsigned long flags;
> +       wq_write_lock_irqsave(&sem->wait.lock, flags);
> -       spin_lock_irq(&semaphore_lock);

I rather dislike spin_lock_irq(), because it's fragile (makes
assumptions about the caller's state).  But in this case,
it's probably a reasonable micro-optimisation to not have to
save the flags.  Nobody should be calling down() with local
interrupts disabled.

> ...
> +/*
> + * Same as __wake_up but called with the wait_queue_head_t lock held
> + * in at least read mode.
> + */
> +void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
> +{
> +       if (q) {

I don't think we need to test `q' here.  It's a new function,
and we don't need to support broken callers.  So __wake_up_locked()
can become a macro that calls __wake_up_common() directly.

> +               __wake_up_common(q, mode, nr, 0);

This one breaks the camel's back :)

Let's un-inline __wake_up_common and EXPORT_SYMBOL it.
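
Something like this, perhaps -- an untested sketch of that suggestion,
with the four-argument __wake_up_common() signature taken from the call
in the patch (exact placement of the export is illustrative):

	/* include/linux/sched.h */
	extern void FASTCALL(__wake_up_common(wait_queue_head_t *q,
				unsigned int mode, int nr, int sync));
	#define wake_up_locked(x) \
		__wake_up_common((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, 0)

	/* kernel/sched.c: drop the static inline, then export it */
	void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
			      int nr, int sync)
	{
		...
	}
	EXPORT_SYMBOL(__wake_up_common);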

It'd be good if you could also verify that the code still
works when the use-rwlocks-for-waitqueues option is turned
on.   (wait.h:USE_RW_WAIT_QUEUE_SPINLOCK)

 
-


* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-01-31 23:55 ` Andrew Morton
@ 2002-02-01  3:41   ` Jeff Garzik
  2002-02-01 15:58     ` Jens Axboe
  2002-02-01 20:52   ` Bob Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2002-02-01  3:41 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Bob Miller, linux-kernel

On Thu, Jan 31, 2002 at 03:55:02PM -0800, Andrew Morton wrote:
> > +       unsigned long flags;
> > +       wq_write_lock_irqsave(&sem->wait.lock, flags);
> > -       spin_lock_irq(&semaphore_lock);
> 
> I rather dislike spin_lock_irq(), because it's fragile (makes

It's less flexible for architectures, too.

spin_lock_irqsave is considered 100% portable AFAIK, and I make it my
own policy to s/_irq/_irqsave/ when the opportunity strikes in my PCI
drivers.

	Jeff




* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-02-01  3:41   ` Jeff Garzik
@ 2002-02-01 15:58     ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2002-02-01 15:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, Bob Miller, linux-kernel

On Thu, Jan 31 2002, Jeff Garzik wrote:
> On Thu, Jan 31, 2002 at 03:55:02PM -0800, Andrew Morton wrote:
> > > +       unsigned long flags;
> > > +       wq_write_lock_irqsave(&sem->wait.lock, flags);
> > > -       spin_lock_irq(&semaphore_lock);
> > 
> > I rather dislike spin_lock_irq(), because it's fragile (makes
> 
> It's less flexible for architectures, too.
> 
> spin_lock_irqsave is considered 100% portable AFAIK, and I make it my
> own policy to s/_irq/_irqsave/ when the opportunity strikes in my PCI
> drivers.

spin_lock_irq is cheaper, though, and sometimes you _know_ it's safe to
use.  For instance, if the function in question can block (i.e. it is
never called with interrupts disabled), then using spin_lock_irq is
always safe.
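
For reference, the two forms side by side (a generic illustration, not
code from the patch; "lock" is just a placeholder):

	/* cheaper: unconditionally re-enables interrupts on unlock, so it
	 * is only safe when the caller is known to run with interrupts
	 * enabled -- e.g. any path that may block, such as down() */
	spin_lock_irq(&lock);
	/* ... critical section ... */
	spin_unlock_irq(&lock);

	/* safe from any caller context: saves and restores the caller's
	 * interrupt state, at the cost of the extra flags handling */
	unsigned long flags;
	spin_lock_irqsave(&lock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&lock, flags);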

I've heard this portability argument before; anyone care to outline
_what_ the problem allegedly is?  A major part of the kernel uses
spin_lock_irq, and I suspect we would be seeing lots of request list
corruption if it did not work.

-- 
Jens Axboe



* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-01-31 23:55 ` Andrew Morton
  2002-02-01  3:41   ` Jeff Garzik
@ 2002-02-01 20:52   ` Bob Miller
  2002-02-01 21:05     ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Bob Miller @ 2002-02-01 20:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Jan 31, 2002 at 03:55:02PM -0800, Andrew Morton wrote:
> Bob Miller wrote:
> > 
> > Below is a patch for i386 that replaces the global spin lock semaphore_lock,
> > with the rwlock_t embedded in the wait_queue_head_t in the struct semaphore.
> > 
> 
> Looks sane.  In practice, the speedup is unmeasurable, but...
> 
> > ...
> > +       unsigned long flags;
> > +       wq_write_lock_irqsave(&sem->wait.lock, flags);
> > -       spin_lock_irq(&semaphore_lock);
> 
> I rather dislike spin_lock_irq(), because it's fragile (makes
> assumptions about the caller's state).  But in this case,
> it's probably a reasonable micro-optimisation to not have to
> save the flags.  Nobody should be calling down() with local
> interrupts disabled.
> 
> > ...
> > +/*
> > + * Same as __wake_up but called with the wait_queue_head_t lock held
> > + * in at least read mode.
> > + */
> > +void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
> > +{
> > +       if (q) {
> 
> I don't think we need to test `q' here.  It's a new function,
> and we don't need to support broken callers.  So __wake_up_locked()
> can become a macro direct call to __wake_up_common().
> 
> > +               __wake_up_common(q, mode, nr, 0);
> 
> This one breaks the camel's back :)
> 
> Let's un-inline __wake_up_common and EXPORT_SYMBOL it.
> 
> It'd be good if you could also verify that the code still
> works when the use-rwlocks-for-waitqueues option is turned
> on.   (wait.h:USE_RW_WAIT_QUEUE_SPINLOCK)
> 
>  
Thanks for the feedback.  I've incorporated your comments.  Also, at your
suggestion I set wait.h:USE_RW_WAIT_QUEUE_SPINLOCK on a clean 2.5.3 system
to test.  The problem is that it OOPs on startup.  After I track that down
and test with my stuff I'll resubmit the patch.

Thanks for taking the time...

-- 
Bob Miller					Email: rem@osdl.org
Open Software Development Lab			Phone: 503.626.2455 Ext. 17


* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-02-01 20:52   ` Bob Miller
@ 2002-02-01 21:05     ` Andrew Morton
  2002-02-01 22:45       ` Bob Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2002-02-01 21:05 UTC (permalink / raw)
  To: Bob Miller; +Cc: linux-kernel

Bob Miller wrote:
> 
> Also, at your suggestion I set wait.h:USE_RW_WAIT_QUEUE_SPINLOCK on
> a clean 2.5.3 system to test.  The problem is that it OOPs on startup.

OK, someone broke it; possibly the scheduler changes.  Not surprising,
really.

It'd be nice to have it fixed, but I wouldn't suggest that you
bust a gut over it.   Certainly your patch shouldn't be held up by
this.  An oops trace would be useful.

-


* Re: [PATCH] 2.5.3 remove global semaphore_lock spin lock.
  2002-02-01 21:05     ` Andrew Morton
@ 2002-02-01 22:45       ` Bob Miller
  0 siblings, 0 replies; 7+ messages in thread
From: Bob Miller @ 2002-02-01 22:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Fri, Feb 01, 2002 at 01:05:22PM -0800, Andrew Morton wrote:
> Bob Miller wrote:
> > 
> > Also, at your suggestion I set wait.h:USE_RW_WAIT_QUEUE_SPINLOCK on
> > a clean 2.5.3 system to test.  The problem is that it OOPs on startup.
> 
> OK, someone broke it; possibly the scheduler changes.  Not surprising,
> really.
> 
> It'd be nice to have it fixed, but I wouldn't suggest that you
> bust a gut over it.   Certainly your patch shouldn't be held up by
> this.  An oops trace would be useful.
> 
> -
I found the problem... in kernel/sched.c: around the 2.4.7 time frame
wait_for_completion() and other code was added.  It uses a new
completion structure that has a wait_queue_head_t embedded in it.
wait_for_completion() and the other new code take that embedded lock
with spin_lock_*() calls, and those calls are what cause the oops.

In order to do some of the cleanup you suggested I needed to add
some macros to wait.h.  To fix wait_for_completion() and the others,
those same macros will be needed.  I'm going to fix the wait_for_completion()
stuff first and then get back to the sema stuff.
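
Roughly, the clash looks like this (paraphrased from memory, not a
verbatim quote of sched.c):

	/* wait_for_completion() takes the waitqueue lock embedded in the
	 * completion using the plain spinlock primitives: */
	spin_lock_irq(&x->wait.lock);
	...
	spin_unlock_irq(&x->wait.lock);

	/* with USE_RW_WAIT_QUEUE_SPINLOCK that lock is an rwlock_t, so the
	 * spin_lock_*() calls stomp on it; the fix is to go through wq_
	 * wrappers instead, e.g. something along the lines of: */
	wq_write_lock_irq(&x->wait.lock);
	...
	wq_write_unlock_irq(&x->wait.lock);

(wq_write_lock_irq()/wq_write_unlock_irq() are just example names for
the macros being added to wait.h; the exact names may differ.)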

-- 
Bob Miller					Email: rem@osdl.org
Open Software Development Lab			Phone: 503.626.2455 Ext. 17

