[RFC] New locking primitive for 2.5

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC] New locking primitive for 2.5
@ 2002-02-07 15:38 Martin Wirth
  2002-02-07 18:04 ` Daniel Phillips
                   ` (4 more replies)
  0 siblings, 5 replies; 61+ messages in thread
From: Martin Wirth @ 2002-02-07 15:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, torvalds, mingo, rml, nigel

This is a request for comment on a new locking primitive
called a combilock.

The goal of this development is:

1. To allow for a better SMP scalability of semaphores used as Mutex
2. As a replacement for long held spinlocks in an preemptible kernel

The new lock uses a combination of a spinlock and a (mutex-)semaphore.
You can lock it for short-term issues in a spin-lock mode:

        combi_spin_lock(struct combilock *x)
        combi_spin_unlock(struct combilock *x)

and for longer lasting tasks in a sleeping mode by:

        combi_mutex_lock(struct combilock *x)
        combi_mutex_unlock(struct combilock *x)

If a spin_lock request is blocked by a mutex_lock call, the spin_lock
attempt also sleeps i.e. behaves like a semaphore.
If you gained ownership of the lock, you can switch between spin-mode
and mutex-(ie.e sleeping) mode by calling:

        combi_to_mutex_mode(struct combilock *x)
        combi_to_spin_mode(struct combilock *x)

without loosing the lock. So you may start with a spin-lock and relax
to a sleeping lock if for example you need to call a non-atomic kmalloc.

This approach is less automatic than a first_spin_then_sleep mutex,
but normally the programmer knows better if he is going to do quick
things, or maybe unbounded stuff.



Note: For a preemptible kernel this approach could lead to much less
scheduling ping-pong also for UP if a spinlock is replaced by a
combilock instead of a semaphore.


Limitations:
  1. In the current implementation the combilock is not irq-save

  2. Although the combilock shares some advantages with a
  spin-lock (no unnecessary scheduling for short time locking) it may
  behave like a semaphore on entry also if you call combi_spin_lock.
  For example

        spin_lock(&slock);
        combi_spin_lock(&clock);

  is a BUG because combi_spin_lock may sleep while holding slock!


Open questions:

  * Does it make sense to also provide irq-save versions of the
    locking functions? This means you could use the unlock functions
    from interrupt context. But the main use in this situation is
    completion handling and there are already (new) completion handlers
    available. So I don't think this is a must have.

  * I there any need to provide non exclusive versions of the waiting
    functions? For real-time applications this would lead to an
    automatic selection of the highest priority task that's waiting
    for the lock. But on the other hand it leads to a lot of unnecessary
    scheduling and I doubt it's really worth it. But maybe there are
    other good reasons to wakeup all waiters.

Possible optimizations:

    If a further extension to a priority inheritance scheme is discarded
    the owner may be replace by a simple flag. And if this is done,
    one possibly could group together the owner flag and the waitqueue
    spinlock to a single 3-state lock. But on architectures without a
    cmpxchg command this may give no real performance win. And also on
    i386 this would need a tweaking of the waitqueue code.

To really take any benefit from a preemptible kernel a lot of spin locks
will have to be replaced by mutex locks. The combi-lock approach may
convince more people who typically fear the higher scheduling pressure
of sleeping locks to do so, if they can decide on each instance which
approach (spin of sleep) will be taken.



Here comes the code:

diff -ruP linux-2.5.3/include/linux/combilock.h
linux/include/linux/combilock.h
--- linux-2.5.3/include/linux/combilock.h	Thu Jan  1 01:00:00 1970
+++ linux/include/linux/combilock.h	Wed Feb  6 14:09:23 2002
@@ -0,0 +1,86 @@
+#ifndef __LINUX_COMBILOCK_H
+#define __LINUX_COMBILOCK_H
+
+/*
+* combilock data structure.
+* See kernel/sched.c for details.
+*/
+
+
+#include <linux/wait.h>
+#include <asm/current.h>
+
+struct combilock {
+	struct task_struct volatile   *owner;
+	wait_queue_head_t             wait;
+};
+
+#define COMBILOCK_INITIALIZER(work) \
+	{ NULL, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+
+#define DECLARE_COMBILOCK(work) \
+	struct combilock work = COMBILOCK_INITIALIZER(work)
+
+static inline void init_combilock(struct combilock *x)
+{
+	x->owner = NULL;
+	init_waitqueue_head(&x->wait);
+}
+
+extern int  FASTCALL(combi_mutex_trylock(struct combilock *x));
+extern void FASTCALL(combi_mutex_lock(struct combilock *x));
+extern int  FASTCALL(combi_mutex_lock_interruptible(struct combilock
*x));
+extern void FASTCALL(combi_mutex_unlock(struct combilock *x));
+extern void FASTCALL(__combi_wait(struct combilock *x));
+
+static inline struct task_struct volatile *combi_owner(struct combilock
*x)
+{
+	return x->owner;
+}
+
+static inline void combi_spin_lock(struct combilock *x)
+{
+	spin_lock(&x->wait.lock);
+	if (unlikely(x->owner))
+		__combi_wait(x);
+}
+
+static inline int combi_spin_trylock(struct combilock *x)
+{
+	if (unlikely(spin_trylock(&x->wait.lock)))
+		return 1;
+	if (unlikely(x->owner))
+		__combi_wait(x);
+	return 0;
+}
+
+static inline void combi_spin_unlock(struct combilock *x)
+{
+	spin_unlock(&x->wait.lock);
+}
+
+static inline void combi_to_mutex_mode(struct combilock *x)
+{
+	if (likely(!x->owner)) {
+		x->owner=current;
+		spin_unlock(&x->wait.lock);
+	}
+}
+
+static inline void combi_to_spin_mode(struct combilock *x)
+{
+	if (likely(x->owner)) {
+		spin_lock(&x->wait.lock);
+		x->owner=NULL;
+	}
+}
+
+static inline void combi_generic_unlock(struct combilock *x)
+{
+	if (likely(!x->owner))
+		combi_spin_unlock(x);
+	else
+		combi_mutex_unlock(x);
+}
+
+#endif
diff -ruP -X /home/adlex/linuxdiffpattern.txt linux-2.5.3/kernel/ksyms.c
linux/kernel/ksyms.c
--- linux-2.5.3/kernel/ksyms.c	Tue Jan 29 19:47:10 2002
+++ linux/kernel/ksyms.c	Wed Feb  6 15:35:26 2002
@@ -46,6 +46,7 @@
 #include <linux/tty.h>
 #include <linux/in6.h>
 #include <linux/completion.h>
+#include <linux/combilock.h>
 #include <linux/seq_file.h>
 #include <asm/checksum.h>

@@ -371,6 +372,13 @@
 /* completion handling */
 EXPORT_SYMBOL(wait_for_completion);
 EXPORT_SYMBOL(complete);
+
+/* combilock non-inline functions */
+EXPORT_SYMBOL(combi_mutex_trylock);
+EXPORT_SYMBOL(combi_mutex_lock);
+EXPORT_SYMBOL(combi_mutex_lock_interruptible);
+EXPORT_SYMBOL(combi_mutex_unlock);
+EXPORT_SYMBOL(__combi_wait);

 /* The notion of irq probe/assignment is foreign to S/390 */

diff -ruP -X /home/adlex/linuxdiffpattern.txt linux-2.5.3/kernel/sched.c
linux/kernel/sched.c
--- linux-2.5.3/kernel/sched.c	Tue Jan 29 00:12:47 2002
+++ linux/kernel/sched.c	Wed Feb  6 13:36:19 2002
@@ -19,6 +19,7 @@
 #include <linux/smp_lock.h>
 #include <linux/interrupt.h>
 #include <asm/mmu_context.h>
+#include <linux/combilock.h>

 #define BITMAP_SIZE ((((MAX_PRIO+7)/8)+sizeof(long)-1)/sizeof(long))

@@ -782,6 +783,89 @@
 	x->done--;
 	spin_unlock_irq(&x->wait.lock);
 }
+
+
+
+
+
+
+/*
+ *  Helper functions assume we hold x->wait.lock
+ */
+void __combi_wait(struct combilock *x)
+{
+	DECLARE_WAITQUEUE(wait, current);
+
+	wait.flags |= WQ_FLAG_EXCLUSIVE;
+	__add_wait_queue_tail(&x->wait, &wait);
+	do {
+	__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock(&x->wait.lock);
+		schedule();
+		spin_lock(&x->wait.lock);
+	} while (x->owner);
+	__remove_wait_queue(&x->wait, &wait);
+}
+
+int combi_mutex_lock_interruptible(struct combilock *x)
+{
+	int res=0;
+	spin_lock(&x->wait.lock);
+	if (unlikely(x->owner)) {
+		DECLARE_WAITQUEUE(wait, current);
+
+		wait.flags |= WQ_FLAG_EXCLUSIVE;
+		__add_wait_queue_tail(&x->wait, &wait);
+		for (;;) {
+			__set_current_state(TASK_INTERRUPTIBLE);
+			spin_unlock(&x->wait.lock);
+			schedule();
+			spin_lock(&x->wait.lock);
+			if (likely(!x->owner)) {
+				x->owner=current;
+				break;
+			}
+			if (unlikely(signal_pending(current))) {
+				res=1;
+				break;
+			}
+		}
+		__remove_wait_queue(&x->wait, &wait);
+	}
+	spin_unlock(&x->wait.lock);
+	return res;
+}
+
+int combi_mutex_trylock(struct combilock *x)
+{
+	spin_lock(&x->wait.lock);
+	if (unlikely(x->owner)) {
+		spin_unlock(&x->wait.lock);
+		return 1;
+	}
+	x->owner=current;
+	spin_unlock(&x->wait.lock);
+	return 0;
+}
+
+void combi_mutex_lock(struct combilock *x)
+{
+	spin_lock(&x->wait.lock);
+	if (unlikely(x->owner))
+		__combi_wait(x);
+	x->owner=current;
+	spin_unlock(&x->wait.lock);
+}
+
+void combi_mutex_unlock(struct combilock *x)
+{
+	spin_lock(&x->wait.lock);
+	x->owner=NULL;
+	__wake_up_common(&x->wait, TASK_UNINTERRUPTIBLE |
TASK_INTERRUPTIBLE,1, 0);
+	spin_unlock(&x->wait.lock);
+}
+
+

 #define	SLEEP_ON_VAR				\
 	unsigned long flags;			\

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
@ 2002-02-07 18:04 ` Daniel Phillips
  2002-02-07 18:06   ` Richard Gooch
  2002-02-07 18:22 ` Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Daniel Phillips @ 2002-02-07 18:04 UTC (permalink / raw)
  To: Martin Wirth, linux-kernel; +Cc: akpm, torvalds, mingo, rml, nigel

On February 7, 2002 04:38 pm, Martin Wirth wrote:
> The new lock uses a combination of a spinlock and a (mutex-)semaphore.

Spinaphore :-)

-- 
Daniel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:04 ` Daniel Phillips
@ 2002-02-07 18:06   ` Richard Gooch
  0 siblings, 0 replies; 61+ messages in thread
From: Richard Gooch @ 2002-02-07 18:06 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Martin Wirth, linux-kernel, akpm, torvalds, mingo, rml, nigel

Daniel Phillips writes:
> On February 7, 2002 04:38 pm, Martin Wirth wrote:
> > The new lock uses a combination of a spinlock and a (mutex-)semaphore.
> 
> Spinaphore :-)

Or mutabloat :-(

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
  2002-02-07 18:04 ` Daniel Phillips
@ 2002-02-07 18:22 ` Christoph Hellwig
  2002-02-07 19:33   ` Daniel Phillips
                     ` (2 more replies)
  2002-02-07 18:40 ` Robert Love
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 61+ messages in thread
From: Christoph Hellwig @ 2002-02-07 18:22 UTC (permalink / raw)
  To: Martin Wirth; +Cc: akpm, torvalds, mingo, rml, nigel, linux-kernel

In article <3C629F91.2869CB1F@dlr.de> you wrote:
> The new lock uses a combination of a spinlock and a (mutex-)semaphore.
> You can lock it for short-term issues in a spin-lock mode:
>
>         combi_spin_lock(struct combilock *x)
>         combi_spin_unlock(struct combilock *x)
>
> and for longer lasting tasks in a sleeping mode by:
>
>         combi_mutex_lock(struct combilock *x)
>         combi_mutex_unlock(struct combilock *x)

I think this API is really ugly.  If both pathes actually do the same,
just with different defaults, one lock function with a flag would be
much nicer.  Also why do we need two unlock functions?

What about the following instead:

	combi_lock(struct combilock *x, int spin);
	combi_unlock(struct combilock *x);

> If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> attempt also sleeps i.e. behaves like a semaphore.
> If you gained ownership of the lock, you can switch between spin-mode
> and mutex-(ie.e sleeping) mode by calling:
>
>         combi_to_mutex_mode(struct combilock *x)
>         combi_to_spin_mode(struct combilock *x)
>
> without loosing the lock. So you may start with a spin-lock and relax
> to a sleeping lock if for example you need to call a non-atomic kmalloc.

This looks really ugly.  I'd really prefer an automatic fallback from
spinning to sleeping after some timeout like e.g. solaris adaptive
mutices.

>   * Does it make sense to also provide irq-save versions of the
>     locking functions? This means you could use the unlock functions
>     from interrupt context. But the main use in this situation is
>     completion handling and there are already (new) completion handlers
>     available. So I don't think this is a must have.

You are no supposed to sleep in irq context, so irq-save combi-locks
don't make that much sense, IMHO.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
  2002-02-07 18:04 ` Daniel Phillips
  2002-02-07 18:22 ` Christoph Hellwig
@ 2002-02-07 18:40 ` Robert Love
  2002-02-07 19:25   ` Andrew Morton
                     ` (3 more replies)
  2002-02-07 19:56 ` yodaiken
  2002-02-08 19:22 ` Horst von Brand
  4 siblings, 4 replies; 61+ messages in thread
From: Robert Love @ 2002-02-07 18:40 UTC (permalink / raw)
  To: Martin Wirth; +Cc: linux-kernel, akpm, torvalds, mingo, nigel

On Thu, 2002-02-07 at 10:38, Martin Wirth wrote:
> This is a request for comment on a new locking primitive
> called a combilock.

Interesting ...

The question I raise is, how many locks do we have where we have a
single resource we lock where in some codepaths the lock is used for
short duration and in other places the lock is long-duration?

It would be useful to identify a few locks where this would benefit and
apply the appropriate combi variant and do some benchmarking.

Some of the talk I've heard has been toward an adaptive lock.  These are
locks like Solaris's that can spin or sleep, usually depending on the
state of the lock's holder.  Another alternative, which I prefer since
it is much less overhead, is a lock that spins-then-sleeps
unconditionally.

> If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> attempt also sleeps i.e. behaves like a semaphore.
> If you gained ownership of the lock, you can switch between spin-mode
> and mutex-(ie.e sleeping) mode by calling:

This can be bad.  What if I grab a spinlock in a codepath where only a
spinlock is appropriate (i.e. somewhere I can't sleep, like an irq
handler) -- and then I sleep?

> Note: For a preemptible kernel this approach could lead to much less
> scheduling ping-pong also for UP if a spinlock is replaced by a
> combilock instead of a semaphore.

Very true ... but only assuming we can find locks where there are
differing profiles of use.

> Open questions:
> 
>   * Does it make sense to also provide irq-save versions of the
>     locking functions? This means you could use the unlock functions
>     from interrupt context. But the main use in this situation is
>     completion handling and there are already (new) completion handlers
>     available. So I don't think this is a must have.

You can't sleep in an interrupt request handler, so this wouldn't make a
lot of sense.

>   * I there any need to provide non exclusive versions of the waiting
>     functions? For real-time applications this would lead to an
>     automatic selection of the highest priority task that's waiting
>     for the lock. But on the other hand it leads to a lot of unnecessary
>     scheduling and I doubt it's really worth it. But maybe there are
>     other good reasons to wakeup all waiters.

Non-exclusive wakeups are common ... see the uses of normal wake_up vs
wake_up_exclusive.

> To really take any benefit from a preemptible kernel a lot of spin locks
> will have to be replaced by mutex locks. The combi-lock approach may
> convince more people who typically fear the higher scheduling pressure
> of sleeping locks to do so, if they can decide on each instance which
> approach (spin of sleep) will be taken.

We shouldn't engage in wholesale changing of spinlocks to semaphores
without a priority-inheritance mechanism.  And _that_ is the bigger
issue ...

	Robert Love

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:40 ` Robert Love
@ 2002-02-07 19:25   ` Andrew Morton
  2002-02-07 19:51     ` Dave Hansen
                       ` (2 more replies)
  2002-02-07 19:58   ` yodaiken
                     ` (2 subsequent siblings)
  3 siblings, 3 replies; 61+ messages in thread
From: Andrew Morton @ 2002-02-07 19:25 UTC (permalink / raw)
  To: Robert Love; +Cc: Martin Wirth, linux-kernel, mingo, nigel

Robert Love wrote:
> 
> On Thu, 2002-02-07 at 10:38, Martin Wirth wrote:
> > This is a request for comment on a new locking primitive
> > called a combilock.
> 
> Interesting ...
> 
> The question I raise is, how many locks do we have where we have a
> single resource we lock where in some codepaths the lock is used for
> short duration and in other places the lock is long-duration?

Quite a few.  Significant ones.  pagemap_lru_lock and lru_list_lock
come to mind.

> It would be useful to identify a few locks where this would benefit and
> apply the appropriate combi variant and do some benchmarking.
> 
> Some of the talk I've heard has been toward an adaptive lock.  These are
> locks like Solaris's that can spin or sleep, usually depending on the
> state of the lock's holder.  Another alternative, which I prefer since
> it is much less overhead, is a lock that spins-then-sleeps
> unconditionally.

I dunno.  The spin-a-bit-then-sleep lock has always struck me as
i_dont_know_what_the_fuck_im_doing_lock().  Martin's approach puts
the decision in the hands of the programmer, rather than saying
"Oh gee I goofed" at runtime.

I need to think about all of this some more...
 
> ...
> 
> > To really take any benefit from a preemptible kernel a lot of spin locks
> > will have to be replaced by mutex locks. The combi-lock approach may
> > convince more people who typically fear the higher scheduling pressure
> > of sleeping locks to do so, if they can decide on each instance which
> > approach (spin of sleep) will be taken.
> 
> We shouldn't engage in wholesale changing of spinlocks to semaphores
> without a priority-inheritance mechanism.  And _that_ is the bigger
> issue ...

hmmm.

Let's back off a bit.  What are we trying to achieve here?  What
problem are we trying to solve?  Is it to allow preemptability
inside the infamous long-held locks?   If so then I'd favour
a piecemeal approach to handling each one, rather than magic
bullets.  Now it may be that certain of the locks are best handled
via a new primitive, but that's not obviously true at this time, to me.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:22 ` Christoph Hellwig
@ 2002-02-07 19:33   ` Daniel Phillips
  2002-02-07 19:55   ` Mark Frazer
  2002-02-08 12:24   ` Denis Vlasenko
  2 siblings, 0 replies; 61+ messages in thread
From: Daniel Phillips @ 2002-02-07 19:33 UTC (permalink / raw)
  To: Christoph Hellwig, Martin Wirth
  Cc: akpm, torvalds, mingo, rml, nigel, linux-kernel

On February 7, 2002 07:22 pm, Christoph Hellwig wrote:
> In article <3C629F91.2869CB1F@dlr.de> you wrote:
> > If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> > attempt also sleeps i.e. behaves like a semaphore.
> > If you gained ownership of the lock, you can switch between spin-mode
> > and mutex-(ie.e sleeping) mode by calling:
> >
> >         combi_to_mutex_mode(struct combilock *x)
> >         combi_to_spin_mode(struct combilock *x)
> >
> > without loosing the lock. So you may start with a spin-lock and relax
> > to a sleeping lock if for example you need to call a non-atomic kmalloc.
> 
> This looks really ugly.  I'd really prefer an automatic fallback from
> spinning to sleeping after some timeout like e.g. solaris adaptive
> mutices.

Look closer at what he said.  You'd take the lock, then you might decide you
had to do something on a slow/blocking path, for example, a copy to/from user,
so you could do that without leaving waiters spinning, or having to
acquire a different lock and re-establish the state.  It bears thinking
about.

-- 
Daniel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:25   ` Andrew Morton
@ 2002-02-07 19:51     ` Dave Hansen
  2002-02-07 20:06       ` Andrew Morton
  2002-02-07 21:27     ` Ingo Molnar
  2002-02-08  8:20     ` Nigel Gamble
  2 siblings, 1 reply; 61+ messages in thread
From: Dave Hansen @ 2002-02-07 19:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, Martin Wirth, linux-kernel, mingo, nigel

Andrew Morton wrote:
> Robert Love wrote:
>>On Thu, 2002-02-07 at 10:38, Martin Wirth wrote:
>>Some of the talk I've heard has been toward an adaptive lock.  These are
>>locks like Solaris's that can spin or sleep, usually depending on the
>>state of the lock's holder.  Another alternative, which I prefer since
>>it is much less overhead, is a lock that spins-then-sleeps
>>unconditionally.
> I dunno.  The spin-a-bit-then-sleep lock has always struck me as
> i_dont_know_what_the_fuck_im_doing_lock().  Martin's approach puts
> the decision in the hands of the programmer, rather than saying
> "Oh gee I goofed" at runtime.

The spin-then-sleep lock could be interesting as a replacement for the 
BKL in places where a semaphore causes performance degredation.  In 
quite a few places where we replaced the BKL with a more finely grained 
semapore (not a spinlock because we needed to sleep during the hold), 
instead of spinning for a bit, it would schedule instead.  This was bad 
:).  Spin-then-sleep would be great behaviour in this situation.

-- 
Dave Hansen
haveblue@us.ibm.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:22 ` Christoph Hellwig
  2002-02-07 19:33   ` Daniel Phillips
@ 2002-02-07 19:55   ` Mark Frazer
  2002-02-08 12:24   ` Denis Vlasenko
  2 siblings, 0 replies; 61+ messages in thread
From: Mark Frazer @ 2002-02-07 19:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin Wirth, akpm, torvalds, mingo, rml, nigel, linux-kernel

Christoph Hellwig <hch@ns.caldera.de> [02/02/07 14:41]:
> What about the following instead:
> 
> 	combi_lock(struct combilock *x, int spin);
> 	combi_unlock(struct combilock *x);

how about
        combi_lock (struct combilock *x, int canblock, int shouldblock);

Where the should block flag is copied to the mutex once it's held by
the caller to indicate to new threads grabbing the lock how long the
lock will be held for.

For locks that are held for some short duration tasks and some long
duration tasks, the holder should indicate how long the lock will be held.
I'd consider that an improvement over the "spin for a while then block"
idea.

Interrupt handlers can't block, so we need a flag to never block.

-mark

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
                   ` (2 preceding siblings ...)
  2002-02-07 18:40 ` Robert Love
@ 2002-02-07 19:56 ` yodaiken
  2002-02-07 22:09   ` Ingo Molnar
  2002-02-08 12:47   ` Denis Vlasenko
  2002-02-08 19:22 ` Horst von Brand
  4 siblings, 2 replies; 61+ messages in thread
From: yodaiken @ 2002-02-07 19:56 UTC (permalink / raw)
  To: Martin Wirth; +Cc: linux-kernel, akpm, torvalds, mingo, rml, nigel

On Thu, Feb 07, 2002 at 04:38:57PM +0100, Martin Wirth wrote:
> This is a request for comment on a new locking primitive
> called a combilock.
> 
> The goal of this development is:
> 
> 1. To allow for a better SMP scalability of semaphores used as Mutex
> 2. As a replacement for long held spinlocks in an preemptible kernel
> 
> The new lock uses a combination of a spinlock and a (mutex-)semaphore.
> You can lock it for short-term issues in a spin-lock mode:
> 
>         combi_spin_lock(struct combilock *x)
>         combi_spin_unlock(struct combilock *x)
> 
> and for longer lasting tasks in a sleeping mode by:
> 
>         combi_mutex_lock(struct combilock *x)
>         combi_mutex_unlock(struct combilock *x)
> 
> If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> attempt also sleeps i.e. behaves like a semaphore.

So what's the difference between combi_spin and combi_mutex?
combi_spin becomes
	if not mutex locked, spin
	else sleep
Bizzare

The entire concept is revolting.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:40 ` Robert Love
  2002-02-07 19:25   ` Andrew Morton
@ 2002-02-07 19:58   ` yodaiken
  2002-02-07 20:08     ` Robert Love
  2002-02-07 20:49   ` Martin Wirth
  2002-02-08  8:34   ` Martin Wirth
  3 siblings, 1 reply; 61+ messages in thread
From: yodaiken @ 2002-02-07 19:58 UTC (permalink / raw)
  To: Robert Love; +Cc: Martin Wirth, linux-kernel, akpm, torvalds, mingo, nigel

On Thu, Feb 07, 2002 at 01:40:59PM -0500, Robert Love wrote:
> > To really take any benefit from a preemptible kernel a lot of spin locks
> > will have to be replaced by mutex locks. The combi-lock approach may
> > convince more people who typically fear the higher scheduling pressure
> > of sleeping locks to do so, if they can decide on each instance which
> > approach (spin of sleep) will be taken.
> 
> We shouldn't engage in wholesale changing of spinlocks to semaphores
> without a priority-inheritance mechanism.  And _that_ is the bigger
> issue ...

Cool. We can then have the Solaris "this usually doesn't fail on test" priority
inherit read/write lock.  I can hardly wait.


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 21:27     ` Ingo Molnar
@ 2002-02-07 19:59       ` Andrew Morton
  0 siblings, 0 replies; 61+ messages in thread
From: Andrew Morton @ 2002-02-07 19:59 UTC (permalink / raw)
  To: mingo; +Cc: Robert Love, Martin Wirth, linux-kernel, nigel

Ingo Molnar wrote:
> 
> On Thu, 7 Feb 2002, Andrew Morton wrote:
> 
> > Quite a few.  Significant ones.  pagemap_lru_lock and lru_list_lock
> > come to mind.
> 
> ugh. Are you sure we want to *sleep* with something like pagemap_lru_lock
> held?

Not guilty :)  I was answering rml's question.

I suspect lru_list_lock is the shining example.  We often
take it for very short periods and occasionally take it
for enormous periods.

> That pretty much brings all pagecache related operations to a
> grinding halt.

yup.  We'd go into a sheduling storm until we find the process
which holds the lock.

> I think complex spinlocked sections should be simplified
> rather.

yes.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:51     ` Dave Hansen
@ 2002-02-07 20:06       ` Andrew Morton
  2002-02-07 20:11         ` Robert Love
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2002-02-07 20:06 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Robert Love, Martin Wirth, linux-kernel, mingo, nigel

Dave Hansen wrote:
> 
> Andrew Morton wrote:
> > Robert Love wrote:
> >>On Thu, 2002-02-07 at 10:38, Martin Wirth wrote:
> >>Some of the talk I've heard has been toward an adaptive lock.  These are
> >>locks like Solaris's that can spin or sleep, usually depending on the
> >>state of the lock's holder.  Another alternative, which I prefer since
> >>it is much less overhead, is a lock that spins-then-sleeps
> >>unconditionally.
> > I dunno.  The spin-a-bit-then-sleep lock has always struck me as
> > i_dont_know_what_the_fuck_im_doing_lock().  Martin's approach puts
> > the decision in the hands of the programmer, rather than saying
> > "Oh gee I goofed" at runtime.
> 
> The spin-then-sleep lock could be interesting as a replacement for the
> BKL in places where a semaphore causes performance degredation.  In
> quite a few places where we replaced the BKL with a more finely grained
> semapore (not a spinlock because we needed to sleep during the hold),
> instead of spinning for a bit, it would schedule instead.  This was bad
> :).  Spin-then-sleep would be great behaviour in this situation.

But surely you *knew*, from inspection, which code paths needed
a spinning lock, and which code paths needed a sleeping lock?

Assuming the answer is "yes" then a nice fix would be to use
two separate locks - one which spins and one which sleeps.

Now, if the resource which is being protected truly cannot
be split up into spin-protected and sleep-protected sections
then a lock which can be atomically converted from spinning to
sleeping at the programmer's discretion seems appropriate.

A dynamic lock which says "we've spun for too long, let's sleep"
seems to be a tradeoff between programmer effort and efficiency,
and a bad one at that.

Possibly the locks could become more adaptive, and could, at
each call site, "learn" the expected spintime.  But it all seems
too baroque to me.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:58   ` yodaiken
@ 2002-02-07 20:08     ` Robert Love
  2002-02-07 20:15       ` yodaiken
  0 siblings, 1 reply; 61+ messages in thread
From: Robert Love @ 2002-02-07 20:08 UTC (permalink / raw)
  To: yodaiken; +Cc: Martin Wirth, linux-kernel, akpm, torvalds, mingo, nigel

On Thu, 2002-02-07 at 14:58, yodaiken@fsmlabs.com wrote:

> On Thu, Feb 07, 2002 at 01:40:59PM -0500, Robert Love wrote:
> > We shouldn't engage in wholesale changing of spinlocks to semaphores
> > without a priority-inheritance mechanism.  And _that_ is the bigger
> > issue ...
> 
> Cool. We can then have the Solaris "this usually doesn't fail on test" priority
> inherit read/write lock.  I can hardly wait.

Or, we could do things right and not.

	Robert Love


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:06       ` Andrew Morton
@ 2002-02-07 20:11         ` Robert Love
  0 siblings, 0 replies; 61+ messages in thread
From: Robert Love @ 2002-02-07 20:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Hansen, Martin Wirth, linux-kernel, mingo, nigel

On Thu, 2002-02-07 at 15:06, Andrew Morton wrote:

> A dynamic lock which says "we've spun for too long, let's sleep"
> seems to be a tradeoff between programmer effort and efficiency,
> and a bad one at that.

I'm not so sure.  What if we can't _know_ how long the lock will be held
because we don't know the status of the holder?  What if _he_ is
sleeping on some other lock or their are a lot of contending processes?

Certainly I agree, we need to put forth effort into designing things
right and with a minimal amount of lock held time.

> Possibly the locks could become more adaptive, and could, at
> each call site, "learn" the expected spintime.  But it all seems
> too baroque to me.

Agreed, this is much too much ;-)

	Robert Love

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:08     ` Robert Love
@ 2002-02-07 20:15       ` yodaiken
  2002-02-07 20:20         ` Robert Love
  0 siblings, 1 reply; 61+ messages in thread
From: yodaiken @ 2002-02-07 20:15 UTC (permalink / raw)
  To: Robert Love
  Cc: yodaiken, Martin Wirth, linux-kernel, akpm, torvalds, mingo,
	nigel

On Thu, Feb 07, 2002 at 03:08:02PM -0500, Robert Love wrote:
> On Thu, 2002-02-07 at 14:58, yodaiken@fsmlabs.com wrote:
> 
> > On Thu, Feb 07, 2002 at 01:40:59PM -0500, Robert Love wrote:
> > > We shouldn't engage in wholesale changing of spinlocks to semaphores
> > > without a priority-inheritance mechanism.  And _that_ is the bigger
> > > issue ...
> > 
> > Cool. We can then have the Solaris "this usually doesn't fail on test" priority
> > inherit read/write lock.  I can hardly wait.
> 
> Or, we could do things right and not.

I'd love to hear how things could be done right here. 
There seem to be 3 choices for reader writer locks
	1. Do the right thing and say no to inheritance: and this
	means no inheritance on mutexes either.
	2. Use the Solaris - "sometimes kinda works" method.
	3. Make readers/writer locks very slow and expensive e.g
	a complete list of reader identities that with atomic insert/delete
	and with check for uniqueness on insert! Not to mention the write
	promotion, any interactions between the "favor writes" design it should
	have and inheritance, links for a mutex inheriting lock to follow down
	the complete tree of paths from the r/w lock ...


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:15       ` yodaiken
@ 2002-02-07 20:20         ` Robert Love
  2002-02-07 20:36           ` yodaiken
  0 siblings, 1 reply; 61+ messages in thread
From: Robert Love @ 2002-02-07 20:20 UTC (permalink / raw)
  To: yodaiken; +Cc: Martin Wirth, linux-kernel, akpm, torvalds, mingo, nigel

On Thu, 2002-02-07 at 15:15, yodaiken@fsmlabs.com wrote:

> I'd love to hear how things could be done right here. 
> There seem to be 3 choices for reader writer locks

Assuming there is no

	4. a solution that works

(and I do not assume that) we can just not do inheritance under
reader-writer locks and that means they remain as spin locks.  Normal
spin locks remain proper candidates.

I never mentioned anything about reader-writer locks in my original
email.  Most of the long-held locks I am considering are not in this
category anyway ...

	Robert Love

P.S. If this is going to turn into another priority-inheritance flame, I
am stopping here.  Let's take it off-list or just drop it, please.  I'd
much prefer to discuss the current combilock issue which is at hand. ;)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 22:09   ` Ingo Molnar
@ 2002-02-07 20:31     ` yodaiken
  2002-02-07 20:57       ` Andrew Morton
  2002-02-08 12:31     ` Christoph Hellwig
  1 sibling, 1 reply; 61+ messages in thread
From: yodaiken @ 2002-02-07 20:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: yodaiken, Martin Wirth, linux-kernel, akpm, torvalds, rml, nigel

On Thu, Feb 07, 2002 at 11:09:16PM +0100, Ingo Molnar wrote:
> 
> On Thu, 7 Feb 2002, yodaiken wrote:
> 
> > So what's the difference between combi_spin and combi_mutex?
> > combi_spin becomes
> > 	if not mutex locked, spin
> > 	else sleep
> > Bizzare
> 
> no, the real optimization is that when spin meets spin, they will not
> mutex. If a mutex-user has it then spins turn into mutex, but that (is
> supposed to) happen rarely.

It seems like what you want is:
	if the lock is about to be released, spin, else sleep.
But what is proposed is
	if the lock is locked as a mutex, sleep, else spin
although I doubt either of these work - they seem like attempts to avoid
designing the code.

> 
> i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> generic_file_llseek() could use the spin variant.
> 
> this is a real performance problem, i've seen scheduling storms in
> dbench-type runs due to llseek taking the inode semaphore.
	llseek:
		atomic_enquee request
		if no room gotta sleep
		else if trylock mutex
			return
		     else
			do work
			loop:
			     process any pending requests
			     release lock;
			     if pending_requests && !(trylock mutex) goto loop

			
			
> whether combi-locks truly bring performance benefits remains to be seen,
> but the patch definitely needs to provide some working example and some
> hard numbers for some real workload.

I think it's a lot easier to propose lock structures than to work on
reducing synchronization problems.

> 
> 	Ingo

-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:20         ` Robert Love
@ 2002-02-07 20:36           ` yodaiken
  2002-02-07 20:57             ` Daniel Phillips
  0 siblings, 1 reply; 61+ messages in thread
From: yodaiken @ 2002-02-07 20:36 UTC (permalink / raw)
  To: Robert Love
  Cc: yodaiken, Martin Wirth, linux-kernel, akpm, torvalds, mingo,
	nigel

On Thu, Feb 07, 2002 at 03:20:24PM -0500, Robert Love wrote:
> On Thu, 2002-02-07 at 15:15, yodaiken@fsmlabs.com wrote:
> 
> > I'd love to hear how things could be done right here. 
> > There seem to be 3 choices for reader writer locks
> 
> Assuming there is no
> 
> 	4. a solution that works
> 
> (and I do not assume that) we can just not do inheritance under

> reader-writer locks and that means they remain as spin locks.  Normal
> spin locks remain proper candidates.
> 
> I never mentioned anything about reader-writer locks in my original
> email.  Most of the long-held locks I am considering are not in this
> category anyway ...

I'm content to let it drop here, but I simply observe that you keep bringing
up the glorious future of inheritance without addressing any of the hard
problems. My contention is that the very capable Solaris engineers did not find the
(4) above because it does not exist.

> P.S. If this is going to turn into another priority-inheritance flame, I
> am stopping here.  Let's take it off-list or just drop it, please.  I'd
> much prefer to discuss the current combilock issue which is at hand. ;)

It's the same issue.


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [RFC] New locking primitive for 2.5
  2002-02-07 18:40 ` Robert Love
  2002-02-07 19:25   ` Andrew Morton
  2002-02-07 19:58   ` yodaiken
@ 2002-02-07 20:49   ` Martin Wirth
  2002-02-08  8:34   ` Martin Wirth
  3 siblings, 0 replies; 61+ messages in thread
From: Martin Wirth @ 2002-02-07 20:49 UTC (permalink / raw)
  Cc: Robert Love, linux-kernel, akpm, mingo


Christoph Hellwig  wrote:
> I think this API is really ugly.  If both pathes actually do the same,
> just with different defaults, one lock function with a flag would be
> much nicer.  
Just to use plain numbers is not very instructive, so you ask for a
macro 
definition like COMBI_LOCK_SPIN_MODE ????? 


> Also why do we need two unlock functions?
There is the generic_unlock function, if you forgot in which mode you
are.
The main reason is performance for the spin mode: combi_spin_unlock is
just
a spin_unlock, no test, no branch. So you are faster if you know what
you did
a few lines of code before ;-)


Robert Love wrote:
> > If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> > attempt also sleeps i.e. behaves like a semaphore.
> > If you gained ownership of the lock, you can switch between spin-mode
> > and mutex-(ie.e sleeping) mode by calling:
> 
> This can be bad.  What if I grab a spinlock in a codepath where only a
> spinlock is appropriate (i.e. somewhere I can't sleep, like an irq
> handler) -- and then I sleep?

As noted in my initial e-mail the current implementation is not for
use in irq-handlers or BHs etc. 
> 
> > Open questions:
> >
> >   * Does it make sense to also provide irq-save versions of the
> >     locking functions? This means you could use the unlock functions
> >     from interrupt context. But the main use in this situation is
> >     completion handling and there are already (new) completion handlers
> >     available. So I don't think this is a must have
> 
> You can't sleep in an interrupt request handler\x03, so this wouldn't make a
> lot of sense.

You of course were only allowed to call the unlock() functions!!
Therefore you could use them to free a resource from the handler
(but that's very much completion handling, see above).

> We shouldn't engage in wholesale changing of spinlocks to semaphores
> without a priority-inheritance mechanism.  And _that_ is the bigger
> issue ...

The combilock at least can be used to narrow the time windows for
priority
inversion because for most purposes you would use the spin mode. I
thinking
about some extension in this direction (that's why the owner field is a
pointer
to the owning task btw.).



> As for combi lock itself, it would be great, if it were possible to
> detect whether lock is held by thread running on the same CPU and sleep
> if so. This would allow for implementing interrupts as separate threads,
> etc.

That the e.g. the aproach of Solaris which results in about 5 time
higher 
latencies from a hardware interrupt to the waiting process.


    Martin Wirth

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:31     ` yodaiken
@ 2002-02-07 20:57       ` Andrew Morton
  2002-02-07 21:02         ` yodaiken
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2002-02-07 20:57 UTC (permalink / raw)
  To: yodaiken; +Cc: Ingo Molnar, Martin Wirth, linux-kernel, rml, nigel

yodaiken@fsmlabs.com wrote:
> 
>         llseek:
>                 atomic_enquee request
>                 if no room gotta sleep
>                 else if trylock mutex
>                         return
>                      else
>                         do work
>                         loop:
>                              process any pending requests
>                              release lock;
>                              if pending_requests && !(trylock mutex) goto loop

This is how printk() works.  It was a very powerful and satisfactory
solution to a nasty locking/atomicity problem.  It'd be nice to have
a more generic way of expressing that solution.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:36           ` yodaiken
@ 2002-02-07 20:57             ` Daniel Phillips
  2002-02-07 21:00               ` yodaiken
  0 siblings, 1 reply; 61+ messages in thread
From: Daniel Phillips @ 2002-02-07 20:57 UTC (permalink / raw)
  To: yodaiken, Robert Love
  Cc: yodaiken, Martin Wirth, linux-kernel, akpm, torvalds, mingo,
	nigel

On February 7, 2002 09:36 pm, yodaiken@fsmlabs.com wrote:
> > P.S. If this is going to turn into another priority-inheritance flame, I
> > am stopping here.  Let's take it off-list or just drop it, please.  I'd
> > much prefer to discuss the current combilock issue which is at hand. ;)
> 
> It's the same issue.

Not necessarily, look at Ingo's observation about replacing semaphores with 
combi-locks as opposed to replacing spinlocks with combi-locks.

-- 
Daniel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:57             ` Daniel Phillips
@ 2002-02-07 21:00               ` yodaiken
  2002-02-07 21:10                 ` Daniel Phillips
  0 siblings, 1 reply; 61+ messages in thread
From: yodaiken @ 2002-02-07 21:00 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: yodaiken, Robert Love, Martin Wirth, linux-kernel, akpm, torvalds,
	mingo, nigel

On Thu, Feb 07, 2002 at 09:57:35PM +0100, Daniel Phillips wrote:
> On February 7, 2002 09:36 pm, yodaiken@fsmlabs.com wrote:
> > > P.S. If this is going to turn into another priority-inheritance flame, I
> > > am stopping here.  Let's take it off-list or just drop it, please.  I'd
> > > much prefer to discuss the current combilock issue which is at hand. ;)
> > 
> > It's the same issue.
> 
> Not necessarily, look at Ingo's observation about replacing semaphores with 
> combi-locks as opposed to replacing spinlocks with combi-locks.

The underlying issue is an attempt to find a magic trick that will make
hard synchronization problems go away. The result is usually something that
makes hard synchronization problems more obscure. Ingo points to a case,
apparently triggered only by a wierd benchmark artefact where a queue of very
short term operations builds up a queue of expensive process reschedules. The
problem is that the same semaphore is used to for slow and fast operations
and the solution is to split them apart somehow or to conclude that its not
an important case. 

As Ingo points out, you need some actual positive results here, not a plausibility
argument.

-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 20:57       ` Andrew Morton
@ 2002-02-07 21:02         ` yodaiken
  0 siblings, 0 replies; 61+ messages in thread
From: yodaiken @ 2002-02-07 21:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: yodaiken, Ingo Molnar, Martin Wirth, linux-kernel, rml, nigel

On Thu, Feb 07, 2002 at 12:57:19PM -0800, Andrew Morton wrote:
> yodaiken@fsmlabs.com wrote:
> > 
> >         llseek:
> >                 atomic_enquee request
> >                 if no room gotta sleep
> >                 else if trylock mutex
> >                         return
> >                      else
> >                         do work
> >                         loop:
> >                              process any pending requests
> >                              release lock;
> >                              if pending_requests && !(trylock mutex) goto loop
> 
> This is how printk() works.  It was a very powerful and satisfactory
> solution to a nasty locking/atomicity problem.  It'd be nice to have
> a more generic way of expressing that solution.

note how I put in the goto so Ingo would be more happy with it -)

> 
> -

-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 21:00               ` yodaiken
@ 2002-02-07 21:10                 ` Daniel Phillips
  0 siblings, 0 replies; 61+ messages in thread
From: Daniel Phillips @ 2002-02-07 21:10 UTC (permalink / raw)
  To: yodaiken
  Cc: yodaiken, Robert Love, Martin Wirth, linux-kernel, akpm, torvalds,
	mingo, nigel

On February 7, 2002 10:00 pm, yodaiken@fsmlabs.com wrote:
> As Ingo points out, you need some actual positive results here, not a plausibility
> argument.

Yup.

-- 
Daniel

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:25   ` Andrew Morton
  2002-02-07 19:51     ` Dave Hansen
@ 2002-02-07 21:27     ` Ingo Molnar
  2002-02-07 19:59       ` Andrew Morton
  2002-02-08  8:20     ` Nigel Gamble
  2 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2002-02-07 21:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, Martin Wirth, linux-kernel, nigel

On Thu, 7 Feb 2002, Andrew Morton wrote:

> Quite a few.  Significant ones.  pagemap_lru_lock and lru_list_lock
> come to mind.

ugh. Are you sure we want to *sleep* with something like pagemap_lru_lock
held? That pretty much brings all pagecache related operations to a
grinding halt. I think complex spinlocked sections should be simplified
rather.

	Ingo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:56 ` yodaiken
@ 2002-02-07 22:09   ` Ingo Molnar
  2002-02-07 20:31     ` yodaiken
  2002-02-08 12:31     ` Christoph Hellwig
  2002-02-08 12:47   ` Denis Vlasenko
  1 sibling, 2 replies; 61+ messages in thread
From: Ingo Molnar @ 2002-02-07 22:09 UTC (permalink / raw)
  To: yodaiken; +Cc: Martin Wirth, linux-kernel, akpm, torvalds, rml, nigel

On Thu, 7 Feb 2002, yodaiken wrote:

> So what's the difference between combi_spin and combi_mutex?
> combi_spin becomes
> 	if not mutex locked, spin
> 	else sleep
> Bizzare

no, the real optimization is that when spin meets spin, they will not
mutex. If a mutex-user has it then spins turn into mutex, but that (is
supposed to) happen rarely.

i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
generic_file_llseek() could use the spin variant.

this is a real performance problem, i've seen scheduling storms in
dbench-type runs due to llseek taking the inode semaphore.

whether combi-locks truly bring performance benefits remains to be seen,
but the patch definitely needs to provide some working example and some
hard numbers for some real workload.

	Ingo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:25   ` Andrew Morton
  2002-02-07 19:51     ` Dave Hansen
  2002-02-07 21:27     ` Ingo Molnar
@ 2002-02-08  8:20     ` Nigel Gamble
  2002-02-08 17:06       ` Larry McVoy
  2 siblings, 1 reply; 61+ messages in thread
From: Nigel Gamble @ 2002-02-08  8:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Robert Love, Martin Wirth, linux-kernel, mingo

On Thu, 7 Feb 2002, Andrew Morton wrote:
> I dunno.  The spin-a-bit-then-sleep lock has always struck me as
> i_dont_know_what_the_fuck_im_doing_lock().  Martin's approach puts
> the decision in the hands of the programmer, rather than saying
> "Oh gee I goofed" at runtime.

I completely agree, and I couldn't have put it better!  Kernel
programmers really should know exactly why, what, where and for how long
they are holding a lock.

This is why, incidently, I don't like any of the so-called lockless
schemes, including the original unix kernel monitor lock (i.e. only one
kernel thread active at a time), because they encourage unmaintainable
code where the critical sections are invisible to everyone and are
easily broken when someone accidently inserts a blocking function into
one of the invisible critical sections.

Nigel Gamble                                    nigel@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:40 ` Robert Love
                     ` (2 preceding siblings ...)
  2002-02-07 20:49   ` Martin Wirth
@ 2002-02-08  8:34   ` Martin Wirth
  2002-02-08 18:28     ` Linus Torvalds
  3 siblings, 1 reply; 61+ messages in thread
From: Martin Wirth @ 2002-02-08  8:34 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel, akpm, torvalds, mingo, haveblue

Robert Love wrote:

> Some of the talk I've heard has been toward an adaptive lock.  These
> are locks like Solaris's that can spin or sleep, usually depending on
> the state of the lock's holder.  Another alternative, which I prefer
> since it is much less overhead, is a lock that spins-then-sleeps
> unconditionally.

Dave Hanson wrote:

> he spin-then-sleep lock could be interesting as a replacement for the 
> BKL in places where a semaphore causes performance degredation.  In 
> quite a few places where we replaced the BKL with a more finely grained 
> semapore (not a spinlock because we needed to sleep during the hold), 
> instead of spinning for a bit, it would schedule instead.  This was bad 
> :).  Spin-then-sleep would be great behaviour in this situation.


Wouldn't it be sufficient to include the following patch of code
at the beginning of __combi_wait(...):

        if (smp_processor_id() != owner->cpu) {
                int cnt=MAX_LOOP_CNT;
retry:
                spin_unlock(&x->wait.lock) 
                do {
                        barrier();
                while (--cnt && x->owner);
                spin_lock(&x->wait.lock);
                if (!x->owner)
                        return;
                if (cnt)
                        goto retry;
        }
       then the sleep code of __combi_wait(...)

If one fears that the owner (or current if the kernel is made
preemptible) migrated to the same cpu while we are spinning 
for x->owner and hence may
make no progress, one could let the waiting loop last about a typical
process switch time and add an outer loop that checks if the cpu
of the owner is still different.


  Martin Wirth

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 18:22 ` Christoph Hellwig
  2002-02-07 19:33   ` Daniel Phillips
  2002-02-07 19:55   ` Mark Frazer
@ 2002-02-08 12:24   ` Denis Vlasenko
  2 siblings, 0 replies; 61+ messages in thread
From: Denis Vlasenko @ 2002-02-08 12:24 UTC (permalink / raw)
  To: Christoph Hellwig, Martin Wirth; +Cc: akpm, mingo, rml, nigel, linux-kernel

On 7 February 2002 16:22, Christoph Hellwig wrote:
> In article <3C629F91.2869CB1F@dlr.de> you wrote:
> > The new lock uses a combination of a spinlock and a (mutex-)semaphore.
> > You can lock it for short-term issues in a spin-lock mode:
> >
> >         combi_spin_lock(struct combilock *x)
> >         combi_spin_unlock(struct combilock *x)
> >
> > and for longer lasting tasks in a sleeping mode by:
> >
> >         combi_mutex_lock(struct combilock *x)
> >         combi_mutex_unlock(struct combilock *x)
>
> I think this API is really ugly.  If both pathes actually do the same,
> just with different defaults, one lock function with a flag would be
> much nicer.  Also why do we need two unlock functions?
>
> What about the following instead:
>
> 	combi_lock(struct combilock *x, int spin);
> 	combi_unlock(struct combilock *x);

What is easier to read:
	combi_lock(zzzt_clock, 1);	
	// go grepping .h to find out what this 1 means
or
	combi_spin_lock(zzzt_clock);
?

OTOH single unlock() looks good.
--
vda

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 22:09   ` Ingo Molnar
  2002-02-07 20:31     ` yodaiken
@ 2002-02-08 12:31     ` Christoph Hellwig
  2002-02-08 16:51       ` Nigel Gamble
                         ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Christoph Hellwig @ 2002-02-08 12:31 UTC (permalink / raw)
  To: Ingo Molnar, yodaiken
  Cc: Martin Wirth, linux-kernel, akpm, torvalds, rml, nigel

In article <Pine.LNX.4.33.0202072305480.2976-100000@localhost.localdomain> you wrote:
> i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> generic_file_llseek() could use the spin variant.

No.  i_sem should be split into a spinlock for short-time accessed
fields that get written to even if the file content is only read (i.e.
atime) and a read-write semaphore.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 19:56 ` yodaiken
  2002-02-07 22:09   ` Ingo Molnar
@ 2002-02-08 12:47   ` Denis Vlasenko
  2002-02-08 15:13     ` yodaiken
  1 sibling, 1 reply; 61+ messages in thread
From: Denis Vlasenko @ 2002-02-08 12:47 UTC (permalink / raw)
  To: yodaiken, Martin Wirth; +Cc: linux-kernel, akpm, mingo, rml, nigel

On 7 February 2002 17:56, yodaiken@fsmlabs.com wrote:
> > If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> > attempt also sleeps i.e. behaves like a semaphore.
>
> So what's the difference between combi_spin and combi_mutex?
> combi_spin becomes
> 	if not mutex locked, spin
> 	else sleep
> Bizzare

combi_spin_lock():
If not mutex locked, spin - will be released shortly
else sleep - may take long time before released
 * lock released *
spin lock it!     <=== this is the difference -
                       combi_mutex_lock would mutex lock it here

What's wrong with this?
--
vda

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 12:47   ` Denis Vlasenko
@ 2002-02-08 15:13     ` yodaiken
  0 siblings, 0 replies; 61+ messages in thread
From: yodaiken @ 2002-02-08 15:13 UTC (permalink / raw)
  To: Denis Vlasenko
  Cc: yodaiken, Martin Wirth, linux-kernel, akpm, mingo, rml, nigel

On Fri, Feb 08, 2002 at 10:47:24AM -0200, Denis Vlasenko wrote:
> On 7 February 2002 17:56, yodaiken@fsmlabs.com wrote:
> > > If a spin_lock request is blocked by a mutex_lock call, the spin_lock
> > > attempt also sleeps i.e. behaves like a semaphore.
> >
> > So what's the difference between combi_spin and combi_mutex?
> > combi_spin becomes
> > 	if not mutex locked, spin
> > 	else sleep
> > Bizzare
> 
> combi_spin_lock():
> If not mutex locked, spin - will be released shortly
> else sleep - may take long time before released
>  * lock released *
> spin lock it!     <=== this is the difference -
>                        combi_mutex_lock would mutex lock it here
> 
> What's wrong with this?

In the elegant words of Andrew Morton, this is a "I don't know what
the fuck I'm doing lock".


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 12:31     ` Christoph Hellwig
@ 2002-02-08 16:51       ` Nigel Gamble
  2002-02-08 18:41       ` Andrew Morton
  2002-02-08 20:14       ` Anton Altaparmakov
  2 siblings, 0 replies; 61+ messages in thread
From: Nigel Gamble @ 2002-02-08 16:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, yodaiken, Martin Wirth, linux-kernel, akpm, torvalds,
	rml

On Fri, 8 Feb 2002, Christoph Hellwig wrote:
> In article <Pine.LNX.4.33.0202072305480.2976-100000@localhost.localdomain> you wrote:
> > i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> > generic_file_llseek() could use the spin variant.
>
> No.  i_sem should be split into a spinlock for short-time accessed
> fields that get written to even if the file content is only read (i.e.
> atime) and a read-write semaphore.

Read-write semaphores should never be used.  As others have pointed out,
they cause really intractable priority inversion problems (because a
high priority writer will often have to wait for an unbounded number of
lower priority readers, some of which may have called a blocking
function while holding the read lock).

Note that I'm not talking about read-write spinlocks, which are (or
should be) held for a short, bounded time and can't be held over a
blocking call, so they are not quite so problematic.

Nigel Gamble                                    nigel@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08  8:20     ` Nigel Gamble
@ 2002-02-08 17:06       ` Larry McVoy
  0 siblings, 0 replies; 61+ messages in thread
From: Larry McVoy @ 2002-02-08 17:06 UTC (permalink / raw)
  To: Nigel Gamble
  Cc: Andrew Morton, Robert Love, Martin Wirth, linux-kernel, mingo

On Fri, Feb 08, 2002 at 12:20:37AM -0800, Nigel Gamble wrote:
> On Thu, 7 Feb 2002, Andrew Morton wrote:
> > I dunno.  The spin-a-bit-then-sleep lock has always struck me as
> > i_dont_know_what_the_fuck_im_doing_lock().  Martin's approach puts
> > the decision in the hands of the programmer, rather than saying
> > "Oh gee I goofed" at runtime.
> 
> I completely agree, and I couldn't have put it better!  Kernel
> programmers really should know exactly why, what, where and for how long
> they are holding a lock.

Should != do.

And any kernel programmer who says they do in a fine grained multithreaded
kernel is full of it.  Look at IRIX, look at Solaris, and show me someone
who says they know for a fact how long they hold each lock and I'll show
you a liar.

Furthermore, while adaptive spin-then-sleep locks may look stupid, I think
you may be missing the point.  If you are running an SMP kernel on a UP,
you want the lock to sleep immediately.  If you are running an SMP kernel
on an SMP, then you want to spin if the lock is held by some other CPU
but sleep if it is held by this CPU.  I suspect that that is what was
really meant by spin-a-bit-then-sleep, it just got lost in translation.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:28     ` Linus Torvalds
@ 2002-02-08 18:12       ` Martin Wirth
  2002-02-08 18:33         ` Alexander Viro
  2002-02-08 20:02         ` Linus Torvalds
  0 siblings, 2 replies; 61+ messages in thread
From: Martin Wirth @ 2002-02-08 18:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Robert Love, linux-kernel, akpm, mingo, haveblue

Linus Torvalds wrote:
> 
> Now, before we go further, can people explain _why_ we want this?
> 
> If something is getting a lot of short contention as a semaphore, maybe
> it's just broken locking. Let's not work around it with a new locking
> primitive just because we can.
> 
> What is the _existing_ problem this is trying to solve, and why?
> 
>                 Linus

There are currently several attempts discussed to push out the
BKL and replace it by a semaphore e.g. the next step Robert Love
planned for his ll_seek patch (replace the BKL by inode i_sem). 

The naive replacement BKL -> semaphore is surely bad.
Now on the other extreme you may always find a splitting of
the data you want to protected into short locked and long locked
sections like Christoph Hellwig suggested for i_sem:
> 
> No.  i_sem should be split into a spinlock for short-time accessed
> fields that get written to even if the file content is only read (i.e.
> atime) and a read-write semaphore.
> 
But this approach needs a lot a proper documentation and discipline.

So for most BKL removal work I suggested the combi-lock which scales
better than a semaphore but is more manageable than splitted locking. 

   Martin

P.S.: I posted the combi-lock as a practical RFC to bring the discussion
about lock scalability, latency and possible preemptiblity of the
linux kernel back to a concrete technical level (if you look at the 
"[2.4.17/18] VM and swap - it's really unusable" thread some weeks 
ago you can surely imagine why). And the simple reason is that for
my real time data acquisition systems I really would like to get rid
of my SunOS 5.8 and W2K machines. But the truth is that the standard
linux kernel is a real (worst case) latency hog even when compared with
these two bloated OS, at least for my applications. Only Andrew
Morton's (full)-ll patch boosts linux to comparable performance 
(less than 1-2 ms latency under heavy io-load). (I know that SunOS 5.8
and W2K also can have larger latencies under special circumstances,
but not for a simple streaming data acquisition with some online
visualization).

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08  8:34   ` Martin Wirth
@ 2002-02-08 18:28     ` Linus Torvalds
  2002-02-08 18:12       ` Martin Wirth
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2002-02-08 18:28 UTC (permalink / raw)
  To: Martin Wirth; +Cc: Robert Love, linux-kernel, akpm, mingo, haveblue

Now, before we go further, can people explain _why_ we want this?

If something is getting a lot of short contention as a semaphore, maybe
it's just broken locking. Let's not work around it with a new locking
primitive just because we can.

What is the _existing_ problem this is trying to solve, and why?

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:12       ` Martin Wirth
@ 2002-02-08 18:33         ` Alexander Viro
  2002-02-08 20:02         ` Linus Torvalds
  1 sibling, 0 replies; 61+ messages in thread
From: Alexander Viro @ 2002-02-08 18:33 UTC (permalink / raw)
  To: Martin Wirth
  Cc: Linus Torvalds, Robert Love, linux-kernel, akpm, mingo, haveblue

On Fri, 8 Feb 2002, Martin Wirth wrote:

> But this approach needs a lot a proper documentation and discipline.

... and this attitude is my main beef with B^WLSE crowd.

Folks, if you have nothing else to do - *don't* mess with locking just
for the heck of it.  And yes, you are supposed to understand WTF you
are doing.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 12:31     ` Christoph Hellwig
  2002-02-08 16:51       ` Nigel Gamble
@ 2002-02-08 18:41       ` Andrew Morton
  2002-02-08 20:47         ` Ingo Molnar
  2002-02-08 20:14       ` Anton Altaparmakov
  2 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2002-02-08 18:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ingo Molnar, yodaiken, Martin Wirth, linux-kernel, torvalds, rml,
	nigel

Christoph Hellwig wrote:
> 
> In article <Pine.LNX.4.33.0202072305480.2976-100000@localhost.localdomain> you wrote:
> > i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> > generic_file_llseek() could use the spin variant.
> 
> No.  i_sem should be split into a spinlock for short-time accessed
> fields that get written to even if the file content is only read (i.e.
> atime) and a read-write semaphore.

I don't see any strong reason for taking i_sem in the generic seek
functions.  The only thing we seem to need to protect in there
is the non-atomic access to 64-bit i_size on 32-bit platforms,
for which a spinlock is appropriate.

I'd be interested in hearing more details on the regression which
Ingo has seen due to the introduction of i_sem locking in llseek.
Not just for "I told you so" value, but for the body of knowledge :)

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:02         ` Linus Torvalds
@ 2002-02-08 18:54           ` Andrew Morton
  2002-02-08 19:11             ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Andrew Morton @ 2002-02-08 18:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Wirth, Robert Love, linux-kernel, mingo, haveblue

Linus Torvalds wrote:
> 
> On Fri, 8 Feb 2002, Martin Wirth wrote:
> >
> > There are currently several attempts discussed to push out the
> > BKL and replace it by a semaphore e.g. the next step Robert Love
> > planned for his ll_seek patch (replace the BKL by inode i_sem).
> 
> But that won't have any contention anyway, so it's a non-issue.
> 

Yesterday, Ingo said:

> i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> generic_file_llseek() could use the spin variant.
>
> this is a real performance problem, i've seen scheduling storms in
> dbench-type runs due to llseek taking the inode semaphore.

-

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:47         ` Ingo Molnar
@ 2002-02-08 18:56           ` Alexander Viro
  2002-02-08 20:59             ` Ingo Molnar
  0 siblings, 1 reply; 61+ messages in thread
From: Alexander Viro @ 2002-02-08 18:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Christoph Hellwig, yodaiken, Martin Wirth,
	linux-kernel, torvalds, rml, nigel

On Fri, 8 Feb 2002, Ingo Molnar wrote:

> i'd suggest 64-bit update instructions on x86 as well, they do exist.
> spinlock only for the truly hopeless cases like SMP boxes composed of
> i486's. We really want llseek() to scale ...

Ingo, are you sure that you actually saw llseek() causing problems?
And not, say it, ext2_get_block()?

If you've got a heavy holder of some lock + lots of guys who grab it
for a short periods, the real trouble is the former, not the latter.

I'm going to send ext2-without-BKL patches to Linus - tonight or tomorrow.
I really wonder what effect that would have on the things.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:59             ` Ingo Molnar
@ 2002-02-08 19:10               ` Alexander Viro
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Viro @ 2002-02-08 19:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Christoph Hellwig, yodaiken, Martin Wirth,
	linux-kernel, torvalds, rml, nigel



On Fri, 8 Feb 2002, Ingo Molnar wrote:

> > I'm going to send ext2-without-BKL patches to Linus - tonight or
> > tomorrow. I really wonder what effect that would have on the things.
> 
> oh, that is a really cool thing!
> 
> llseek() is unrelated, and i think pretty gross. Is there any other reason
> to llseek()'s i_sem usage other than the 64-bit atomic update of the file
> offset value? We can do lockless, SMP-correct 64-bit updates on x86 pretty
> easily.

Umm...  Wait a second.  You've seen the problems on ->i_sem variant
of llseek()?  My apologies - I've misparsed you.

I seriously suspect that BKL-for-lseek() will be good enough once we
kill BKL in ext2_get_block() and friends.  IOW, I doubt that
generic_file_lseek() showing up in BKL contention is the real
problem...


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:54           ` Andrew Morton
@ 2002-02-08 19:11             ` Linus Torvalds
  2002-02-08 19:21               ` Alexander Viro
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2002-02-08 19:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin Wirth, Robert Love, linux-kernel, mingo, haveblue



On Fri, 8 Feb 2002, Andrew Morton wrote:
>
> Yesterday, Ingo said:
>
> > i think one example *could* be to turn inode->i_sem into a combi-lock. Eg.
> > generic_file_llseek() could use the spin variant.
> >
> > this is a real performance problem, i've seen scheduling storms in
> > dbench-type runs due to llseek taking the inode semaphore.

... so just make it a spinlock instead.

The semaphore is overkill, as the only thing we're really protecting is
one 64-bit access against other updates.

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 19:11             ` Linus Torvalds
@ 2002-02-08 19:21               ` Alexander Viro
  2002-02-08 19:36                 ` Robert Love
  2002-02-08 21:23                 ` Ingo Molnar
  0 siblings, 2 replies; 61+ messages in thread
From: Alexander Viro @ 2002-02-08 19:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Martin Wirth, Robert Love, linux-kernel, mingo,
	haveblue



On Fri, 8 Feb 2002, Linus Torvalds wrote:

> ... so just make it a spinlock instead.
> 
> The semaphore is overkill, as the only thing we're really protecting is
> one 64-bit access against other updates.

I'm not sure that we really need a separate spinlock here.  BKL might
be just fine, provided that we remove it from real hogs.  And we can
do it now.

Had anyone actually seen lseek() vs. lseek() contention prior to the
switch to ->i_sem-based variant?  If the mix looked like
	infrequently called BKL hog + many lseeks()
almost all contention cases would have lseek() spinning while
a hog holds BKL.  And real problem here is a hog...


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
                   ` (3 preceding siblings ...)
  2002-02-07 19:56 ` yodaiken
@ 2002-02-08 19:22 ` Horst von Brand
  4 siblings, 0 replies; 61+ messages in thread
From: Horst von Brand @ 2002-02-08 19:22 UTC (permalink / raw)
  To: Martin Wirth; +Cc: linux-kernel

Martin Wirth <Martin.Wirth@dlr.de> said:
> This is a request for comment on a new locking primitive
> called a combilock.
> 
> The goal of this development is:
> 
> 1. To allow for a better SMP scalability of semaphores used as Mutex
> 2. As a replacement for long held spinlocks in an preemptible kernel
> 
> The new lock uses a combination of a spinlock and a (mutex-)semaphore.
> You can lock it for short-term issues in a spin-lock mode:
> 
>         combi_spin_lock(struct combilock *x)
>         combi_spin_unlock(struct combilock *x)
> 
> and for longer lasting tasks in a sleeping mode by:
> 
>         combi_mutex_lock(struct combilock *x)
>         combi_mutex_unlock(struct combilock *x)

Can you sleep if acquired as the spinlock?

Is there any measurable (or at least plausible reason why there should be)
performance improvement? (No, "should make preemptible kernel faster"
doesn't cut it at all for me). Or any hope that it will substantially
simplify kernel programming with _no_ performance degradation by replacing
both semphores and spinlocks?
-- 
Horst von Brand			     http://counter.li.org # 22616

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 19:21               ` Alexander Viro
@ 2002-02-08 19:36                 ` Robert Love
  2002-02-09  0:18                   ` Alexander Viro
  2002-02-08 21:23                 ` Ingo Molnar
  1 sibling, 1 reply; 61+ messages in thread
From: Robert Love @ 2002-02-08 19:36 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Andrew Morton, Martin Wirth, linux-kernel, mingo,
	haveblue

On Fri, 2002-02-08 at 14:21, Alexander Viro wrote:

> Had anyone actually seen lseek() vs. lseek() contention prior to the
> switch to ->i_sem-based variant?

Yes, I did, even on my 2-way.

Additionally, when I posted the remove-bkl-llseek patch, someone from
SGI noted that on a 24-processor NUMA IA-64 machine, _50%_ of machine
time was spent spinning on the BKL in llseek-intense operations.

The bkl is not held for a long time, but it is acquired often, and there
are definitely workloads that show a big hit with the BKL in there.

	Robert Love

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:12       ` Martin Wirth
  2002-02-08 18:33         ` Alexander Viro
@ 2002-02-08 20:02         ` Linus Torvalds
  2002-02-08 18:54           ` Andrew Morton
  1 sibling, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2002-02-08 20:02 UTC (permalink / raw)
  To: Martin Wirth; +Cc: Robert Love, linux-kernel, akpm, mingo, haveblue



On Fri, 8 Feb 2002, Martin Wirth wrote:
>
> There are currently several attempts discussed to push out the
> BKL and replace it by a semaphore e.g. the next step Robert Love
> planned for his ll_seek patch (replace the BKL by inode i_sem).

But that won't have any contention anyway, so it's a non-issue.

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 21:36                   ` Linus Torvalds
@ 2002-02-08 20:04                     ` Jeff Garzik
  2002-02-08 21:16                       ` Mike Fedyk
  2002-02-08 21:40                       ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 61+ messages in thread
From: Jeff Garzik @ 2002-02-08 20:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Alexander Viro, Andrew Morton, Martin Wirth,
	Robert Love, linux-kernel, haveblue

Linus Torvalds wrote:
> 
> On Fri, 8 Feb 2002, Ingo Molnar wrote:
> >
> > and regarding the reintroduction of BKL, *please* do not just use a global
> > locks around such pieces of code, lock bouncing sucks on SMP, even if
> > there is no overhead.
> 
> I'd suggest not having a lock at all, but instead add two functions: one
> to read a 64-bit value atomically, the other to write it atomically (and
> they'd be atomic only wrt each other, no memory barriers etc implied).
> 
> On 64-bit architectures that's just a direct dereference, and even on x86
> it's just a "cmpxchg8b".

Are there architectures out there that absolutely must implement this
with a spinlock?  Your suggested API of functions to read/write 64-bit
values atomically would work for such a case, but still I am just
curious.

	Jeff


-- 
Jeff Garzik      | "I went through my candy like hot oatmeal
Building 1024    |  through an internally-buttered weasel."
MandrakeSoft     |             - goats.com

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 12:31     ` Christoph Hellwig
  2002-02-08 16:51       ` Nigel Gamble
  2002-02-08 18:41       ` Andrew Morton
@ 2002-02-08 20:14       ` Anton Altaparmakov
  2002-02-08 20:38         ` yodaiken
  2002-02-08 21:55         ` Anton Altaparmakov
  2 siblings, 2 replies; 61+ messages in thread
From: Anton Altaparmakov @ 2002-02-08 20:14 UTC (permalink / raw)
  To: nigel
  Cc: Christoph Hellwig, Ingo Molnar, yodaiken, Martin Wirth,
	linux-kernel, akpm, torvalds, rml

At 16:51 08/02/02, Nigel Gamble wrote:
> > No.  i_sem should be split into a spinlock for short-time accessed
> > fields that get written to even if the file content is only read (i.e.
> > atime) and a read-write semaphore.
>
>Read-write semaphores should never be used.  As others have pointed out,
>they cause really intractable priority inversion problems (because a
>high priority writer will often have to wait for an unbounded number of
>lower priority readers, some of which may have called a blocking
>function while holding the read lock).
>
>Note that I'm not talking about read-write spinlocks, which are (or
>should be) held for a short, bounded time and can't be held over a
>blocking call, so they are not quite so problematic.

Read-write semaphores have their use and the current Linux implementation 
(big reader/occasional writer) guarantees that the writer is not starved as 
incoming read locks are put to sleep as soon as a write lock request comes 
in, even if that is sleeping waiting for the old readlocks to be released 
(unless the readers are holding the semaphore forever in which case this is 
the programmers fault and not the rw semaphore implementations fault). [I 
should add I only am familliar with the i386 implementation but I assume 
the others are the same.]

The value of allowing multiple cpus to read the same data simultaneously by 
far offsets the priority problems IMVHO. At least the way I am using rw 
semaphores in ntfs it is. Readlocks are grabbed loads and loads of times to 
serialize meta data access in the page cache while writelocks are a minute 
number in comparison and because the data required to be accessed may not 
be cached in memory (page cache page is not read in, is swapped out, 
whatever) a disk access may be required which means a rw spin lock is no 
good. In fact ntfs would be the perfect candidate for automatic rw combi 
locks where the locking switches from spinning to sleeping if the code path 
reaches a disk access. I can't use a manually controlled lock as the page 
cache lookups are done via the mm/filemap.c access functions which are the 
only ones who can know if a disk access is required or not so ntfs would 
never know if it is going to sleep or not so unless the locks had 
autodetection of whether to spin or sleep they would be useless.

I guess the point I am trying to make is that both rw semaphores and combi 
locks are not bad per se but, as all other locking mechanisms, they should 
be used in situations appropriate for their locktype and their 
implementation...

Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:14       ` Anton Altaparmakov
@ 2002-02-08 20:38         ` yodaiken
  2002-02-08 21:55         ` Anton Altaparmakov
  1 sibling, 0 replies; 61+ messages in thread
From: yodaiken @ 2002-02-08 20:38 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: nigel, Christoph Hellwig, Ingo Molnar, yodaiken, Martin Wirth,
	linux-kernel, akpm, torvalds, rml

On Fri, Feb 08, 2002 at 08:14:32PM +0000, Anton Altaparmakov wrote:
> At 16:51 08/02/02, Nigel Gamble wrote:
> >Read-write semaphores should never be used.  As others have pointed out,
> >they cause really intractable priority inversion problems (because a
...
> 
> Read-write semaphores have their use and the current Linux implementation 

Here's the context: the preemption patch puts pressure on Linux to move
from BKL to semaphores and then it will be seen that semaphores need
to have dynamic priority inherit to sort-of-work, and then it will be seen
that read/write lock is a problem! 


> The value of allowing multiple cpus to read the same data simultaneously by 
> far offsets the priority problems IMVHO. At least the way I am using rw 
> semaphores in ntfs it is. Readlocks are grabbed loads and loads of times to 
> serialize meta data access in the page cache while writelocks are a minute 
> number in comparison and because the data required to be accessed may not 

this is absolutely correct. However, once the decision has been made or
fallen into to go to a priority inherit scheme, Linux will find itself
in the same bind as Solaris.

> be cached in memory (page cache page is not read in, is swapped out, 
> whatever) a disk access may be required which means a rw spin lock is no 
> good. In fact ntfs would be the perfect candidate for automatic rw combi 
> locks where the locking switches from spinning to sleeping if the code path 
> reaches a disk access. I can't use a manually controlled lock as the page 

Seem like the lock is simply grabbed way to far up. 


-- 
---------------------------------------------------------
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:41       ` Andrew Morton
@ 2002-02-08 20:47         ` Ingo Molnar
  2002-02-08 18:56           ` Alexander Viro
  0 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2002-02-08 20:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, yodaiken, Martin Wirth, linux-kernel, torvalds,
	rml, nigel

On Fri, 8 Feb 2002, Andrew Morton wrote:

> I'd be interested in hearing more details on the regression which Ingo
> has seen due to the introduction of i_sem locking in llseek. [...]

i saw heavy scheduling during dbench runs (even if just running 6 threads
on an 8 CPU box), and checked out the source of the scheduling storm -
most of it was due to llseek()'s down().

i also wrote a dbench-alike load simulator for pagecache scalability,
there i saw this in an even more prominent way, 200k/sec reschedules in a
situation when there should be none.

i'd suggest 64-bit update instructions on x86 as well, they do exist.
spinlock only for the truly hopeless cases like SMP boxes composed of
i486's. We really want llseek() to scale ...

	Ingo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 18:56           ` Alexander Viro
@ 2002-02-08 20:59             ` Ingo Molnar
  2002-02-08 19:10               ` Alexander Viro
  0 siblings, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2002-02-08 20:59 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Andrew Morton, Christoph Hellwig, yodaiken, Martin Wirth,
	linux-kernel, torvalds, rml, nigel

On Fri, 8 Feb 2002, Alexander Viro wrote:

> > i'd suggest 64-bit update instructions on x86 as well, they do exist.
> > spinlock only for the truly hopeless cases like SMP boxes composed of
> > i486's. We really want llseek() to scale ...
>
> Ingo, are you sure that you actually saw llseek() causing problems?

yes.

> And not, say it, ext2_get_block()?

no. I saw ext2_get_block() overhead in other workloads, but not this one.
(this was a fully cached pagecache-only workload.)

> If you've got a heavy holder of some lock + lots of guys who grab it
> for a short periods, the real trouble is the former, not the latter.

no, i simply had code that called llseek() pretty often for the same inode
(big database file), and on multiple CPUs it was just way too easy for one
semaphore user to cause another one to block.

> I'm going to send ext2-without-BKL patches to Linus - tonight or
> tomorrow. I really wonder what effect that would have on the things.

oh, that is a really cool thing!

llseek() is unrelated, and i think pretty gross. Is there any other reason
to llseek()'s i_sem usage other than the 64-bit atomic update of the file
offset value? We can do lockless, SMP-correct 64-bit updates on x86 pretty
easily.

	Ingo

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:04                     ` Jeff Garzik
@ 2002-02-08 21:16                       ` Mike Fedyk
  2002-02-09  0:09                         ` Alan Cox
  2002-02-08 21:40                       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 61+ messages in thread
From: Mike Fedyk @ 2002-02-08 21:16 UTC (permalink / raw)
  To: linux-kernel

On Fri, Feb 08, 2002 at 03:04:34PM -0500, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > 
> > On Fri, 8 Feb 2002, Ingo Molnar wrote:
> > >
> > > and regarding the reintroduction of BKL, *please* do not just use a global
> > > locks around such pieces of code, lock bouncing sucks on SMP, even if
> > > there is no overhead.
> > 
> > I'd suggest not having a lock at all, but instead add two functions: one
> > to read a 64-bit value atomically, the other to write it atomically (and
> > they'd be atomic only wrt each other, no memory barriers etc implied).
> > 
> > On 64-bit architectures that's just a direct dereference, and even on x86
> > it's just a "cmpxchg8b".
> 
> Are there architectures out there that absolutely must implement this
> with a spinlock?  Your suggested API of functions to read/write 64-bit
> values atomically would work for such a case, but still I am just
> curious.
> 

SMP 486s would need that (if there is such a beast).  What point does x86
get the 64 bit instructions?  If after 586, then it would definately need a
spinlock or somesuch in those functions.

Mike

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 19:21               ` Alexander Viro
  2002-02-08 19:36                 ` Robert Love
@ 2002-02-08 21:23                 ` Ingo Molnar
  2002-02-08 21:36                   ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Ingo Molnar @ 2002-02-08 21:23 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Linus Torvalds, Andrew Morton, Martin Wirth, Robert Love,
	linux-kernel, haveblue


On Fri, 8 Feb 2002, Alexander Viro wrote:

> Had anyone actually seen lseek() vs. lseek() contention prior to the
> switch to ->i_sem-based variant? [...]

yes, i've seen this for years. (if you accept dbench overhead.)

and regarding the reintroduction of BKL, *please* do not just use a global
locks around such pieces of code, lock bouncing sucks on SMP, even if
there is no overhead.

	Ingo


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 21:23                 ` Ingo Molnar
@ 2002-02-08 21:36                   ` Linus Torvalds
  2002-02-08 20:04                     ` Jeff Garzik
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2002-02-08 21:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Viro, Andrew Morton, Martin Wirth, Robert Love,
	linux-kernel, haveblue

On Fri, 8 Feb 2002, Ingo Molnar wrote:
>
> and regarding the reintroduction of BKL, *please* do not just use a global
> locks around such pieces of code, lock bouncing sucks on SMP, even if
> there is no overhead.

I'd suggest not having a lock at all, but instead add two functions: one
to read a 64-bit value atomically, the other to write it atomically (and
they'd be atomic only wrt each other, no memory barriers etc implied).

On 64-bit architectures that's just a direct dereference, and even on x86
it's just a "cmpxchg8b".

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:04                     ` Jeff Garzik
  2002-02-08 21:16                       ` Mike Fedyk
@ 2002-02-08 21:40                       ` Benjamin Herrenschmidt
  2002-02-09 19:32                         ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Benjamin Herrenschmidt @ 2002-02-08 21:40 UTC (permalink / raw)
  To: Linus Torvalds, Jeff Garzik
  Cc: Ingo Molnar, Alexander Viro, Andrew Morton, Martin Wirth,
	Robert Love, linux-kernel, haveblue

>Are there architectures out there that absolutely must implement this
>with a spinlock?  Your suggested API of functions to read/write 64-bit
>values atomically would work for such a case, but still I am just
>curious.

At least PPC32 can't do that without a spinlock_irq

Ben.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 20:14       ` Anton Altaparmakov
  2002-02-08 20:38         ` yodaiken
@ 2002-02-08 21:55         ` Anton Altaparmakov
  1 sibling, 0 replies; 61+ messages in thread
From: Anton Altaparmakov @ 2002-02-08 21:55 UTC (permalink / raw)
  To: yodaiken
  Cc: nigel, Christoph Hellwig, Ingo Molnar, yodaiken, Martin Wirth,
	linux-kernel, akpm, torvalds, rml

At 20:38 08/02/02, yodaiken@fsmlabs.com wrote:
>On Fri, Feb 08, 2002 at 08:14:32PM +0000, Anton Altaparmakov wrote:
> > The value of allowing multiple cpus to read the same data 
> simultaneously by
> > far offsets the priority problems IMVHO. At least the way I am using rw
> > semaphores in ntfs it is. Readlocks are grabbed loads and loads of 
> times to
> > serialize meta data access in the page cache while writelocks are a minute
> > number in comparison and because the data required to be accessed may not
>
>this is absolutely correct. However, once the decision has been made or
>fallen into to go to a priority inherit scheme, Linux will find itself
>in the same bind as Solaris.
>
> > be cached in memory (page cache page is not read in, is swapped out,
> > whatever) a disk access may be required which means a rw spin lock is no
> > good. In fact ntfs would be the perfect candidate for automatic rw combi
> > locks where the locking switches from spinning to sleeping if the code 
> path
> > reaches a disk access. I can't use a manually controlled lock as the page
>
>Seem like the lock is simply grabbed way to far up.

It cannot be taken any later. NTFS is a database both in memory and on disk 
and in fact the in memory meta data is the on-disk structures (i.e. no 
conversion between the two is performed, except in few speed optimization 
cases and in cases where meta data is compressed on disk).

Because everything is ogranized in database records, I need to grab the 
lock as soon as I intend to access the record (in fact the lock itself is 
taken by map_mft_record(READ or WRITE) which returns a locked record, that 
is mapped and pinned in memory, to the caller. I can then do all sorts of 
things with the record, like search it, parse data from it, etc. When 
finished I unmap_mft_record(READ or WRITE) which undoes everything 
map_mft_record() did, i.e. unpins the record, unmaps it and releases the lock.

All transactions with the on disk storage are done via the page cache and 
read_cache_page() using a special readpage() and a special async io 
completion handler.

Because of the abstraction of access layers it is undesirable to operate on 
the same locks in different layers, hence the top most layer is the one 
locking and unlocking the records. The lower layers don't even know the 
locks exist. Also pushing the lock deep into the layers is not actually 
possible as it would probably need to be taken in reac_cache_page() (the 
only place where the code knows if the page is present in memory or not and 
a disk access is required) which is a VFS/MM function and hence that is not 
an option.

Further, I may need to allocate memory in order to store decompressed 
metadata for example but I won't know if I need to do that until I have 
already locked the record and accessed it to determine if the metadata is 
compressed or not.

So basically I can only obtain sufficient info for deciding what lock I 
need after I have already accessed parts of the record, but to access it I 
need to have locked it. Chicken and egg situation... So the locks are 
always semaphores and are taken in the top layer.

A combi lock would allow spinning in the case where it turns out the code 
paths will not hit disk or sleep in kmalloc, etc and sleeping in the 
disk/kmalloc hit case.

(Yes I know atomic kmalloc exists but I actually need vmalloc in some cases 
depending how much memory I need to handle the metadata...)

Best regards,

Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Linux NTFS Maintainer / WWW: http://linux-ntfs.sf.net/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-09  0:09                         ` Alan Cox
@ 2002-02-09  0:05                           ` Mike Fedyk
  0 siblings, 0 replies; 61+ messages in thread
From: Mike Fedyk @ 2002-02-09  0:05 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Sat, Feb 09, 2002 at 12:09:04AM +0000, Alan Cox wrote:
> > SMP 486s would need that (if there is such a beast).  What point does x86
> > get the 64 bit instructions?  If after 586, then it would definitely need a
> > spin-lock or some-such in those functions.
> 
> There are SMP 486 class x86 machines that are MP 1.1 compliant. They are
> sufficiently rare that I think its quite acceptable to "implement" a
> cmpxchg8b macro on 486 as spin_lock_irqsave/blah/spin_unlock_irqrestore. It
> would be wrong to cripple the other 99.99% of SMP users
> 

Sorry, I only meant to say that the only question is where the split should
be between spin-lock and 64bit instruction...

This would be included in the appropriate config option.

Mike

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 21:16                       ` Mike Fedyk
@ 2002-02-09  0:09                         ` Alan Cox
  2002-02-09  0:05                           ` Mike Fedyk
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Cox @ 2002-02-09  0:09 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

> SMP 486s would need that (if there is such a beast).  What point does x86
> get the 64 bit instructions?  If after 586, then it would definately need a
> spinlock or somesuch in those functions.

There are SMP 486 class x86 machines that are MP 1.1 compliant. They are
sufficiently rare that I think its quite acceptable to "implement" a
cmpxchg8b macro on 486 as spin_lock_irqsave/blah/spin_unlock_irqrestore. It
would be wrong to cripple the other 99.99% of SMP users

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 19:36                 ` Robert Love
@ 2002-02-09  0:18                   ` Alexander Viro
  0 siblings, 0 replies; 61+ messages in thread
From: Alexander Viro @ 2002-02-09  0:18 UTC (permalink / raw)
  To: Robert Love
  Cc: Linus Torvalds, Andrew Morton, Martin Wirth, linux-kernel, mingo,
	haveblue

On 8 Feb 2002, Robert Love wrote:

> On Fri, 2002-02-08 at 14:21, Alexander Viro wrote:
> 
> > Had anyone actually seen lseek() vs. lseek() contention prior to the
> > switch to ->i_sem-based variant?
> 
> Yes, I did, even on my 2-way.
> 
> Additionally, when I posted the remove-bkl-llseek patch, someone from
> SGI noted that on a 24-processor NUMA IA-64 machine, _50%_ of machine
> time was spent spinning on the BKL in llseek-intense operations.

That doesn't say _anything_ about the source of contention.

> The bkl is not held for a long time, but it is acquired often, and there
> are definitely workloads that show a big hit with the BKL in there.

Sigh...  OK, here's an exercise:

You have a spinlock and two functions - hog() and light().  Both grab that
spinlock.  light() releases it and returns fast.  hog() spends quite a
while in critical area and then releases the lock.

a) show that if lock gets contended then most of the time it will be
held by hog()

b) show that time spent spinning in light() depends mostly on the amount
of time spent in hog()

c) replace light() with light1(), ..., lightN().  Compare the effect
of removing lock from one of them with that of removing it from hog().

d) characterize the dependence of light()/light() contention upon the
amount of time spent in hog().  (hint: attempts to call light()
while hog() is running will lead to accumulation of contenders and when
hog() is done they _will_ figth each other).

General rules:
	* go for hogs first, don't try to pick the light ones.
	* one spinning most is not necessary the cause of contention.
	* even when you see real light/light contention - check the
history immediately before; you may find the hog that had had held them
back.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [RFC] New locking primitive for 2.5
  2002-02-08 21:40                       ` Benjamin Herrenschmidt
@ 2002-02-09 19:32                         ` Linus Torvalds
  0 siblings, 0 replies; 61+ messages in thread
From: Linus Torvalds @ 2002-02-09 19:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jeff Garzik, Ingo Molnar, Alexander Viro, Andrew Morton,
	Martin Wirth, Robert Love, linux-kernel, haveblue



On Fri, 8 Feb 2002, Benjamin Herrenschmidt wrote:
>
> At least PPC32 can't do that without a spinlock_irq

We don't need to be irq-safe for this, though. Just specify it to be
process safe - which means that on UP it boils down to at most maybe
having to protect against preemption.

		Linus


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2002-02-09 17:47 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-02-07 15:38 [RFC] New locking primitive for 2.5 Martin Wirth
2002-02-07 18:04 ` Daniel Phillips
2002-02-07 18:06   ` Richard Gooch
2002-02-07 18:22 ` Christoph Hellwig
2002-02-07 19:33   ` Daniel Phillips
2002-02-07 19:55   ` Mark Frazer
2002-02-08 12:24   ` Denis Vlasenko
2002-02-07 18:40 ` Robert Love
2002-02-07 19:25   ` Andrew Morton
2002-02-07 19:51     ` Dave Hansen
2002-02-07 20:06       ` Andrew Morton
2002-02-07 20:11         ` Robert Love
2002-02-07 21:27     ` Ingo Molnar
2002-02-07 19:59       ` Andrew Morton
2002-02-08  8:20     ` Nigel Gamble
2002-02-08 17:06       ` Larry McVoy
2002-02-07 19:58   ` yodaiken
2002-02-07 20:08     ` Robert Love
2002-02-07 20:15       ` yodaiken
2002-02-07 20:20         ` Robert Love
2002-02-07 20:36           ` yodaiken
2002-02-07 20:57             ` Daniel Phillips
2002-02-07 21:00               ` yodaiken
2002-02-07 21:10                 ` Daniel Phillips
2002-02-07 20:49   ` Martin Wirth
2002-02-08  8:34   ` Martin Wirth
2002-02-08 18:28     ` Linus Torvalds
2002-02-08 18:12       ` Martin Wirth
2002-02-08 18:33         ` Alexander Viro
2002-02-08 20:02         ` Linus Torvalds
2002-02-08 18:54           ` Andrew Morton
2002-02-08 19:11             ` Linus Torvalds
2002-02-08 19:21               ` Alexander Viro
2002-02-08 19:36                 ` Robert Love
2002-02-09  0:18                   ` Alexander Viro
2002-02-08 21:23                 ` Ingo Molnar
2002-02-08 21:36                   ` Linus Torvalds
2002-02-08 20:04                     ` Jeff Garzik
2002-02-08 21:16                       ` Mike Fedyk
2002-02-09  0:09                         ` Alan Cox
2002-02-09  0:05                           ` Mike Fedyk
2002-02-08 21:40                       ` Benjamin Herrenschmidt
2002-02-09 19:32                         ` Linus Torvalds
2002-02-07 19:56 ` yodaiken
2002-02-07 22:09   ` Ingo Molnar
2002-02-07 20:31     ` yodaiken
2002-02-07 20:57       ` Andrew Morton
2002-02-07 21:02         ` yodaiken
2002-02-08 12:31     ` Christoph Hellwig
2002-02-08 16:51       ` Nigel Gamble
2002-02-08 18:41       ` Andrew Morton
2002-02-08 20:47         ` Ingo Molnar
2002-02-08 18:56           ` Alexander Viro
2002-02-08 20:59             ` Ingo Molnar
2002-02-08 19:10               ` Alexander Viro
2002-02-08 20:14       ` Anton Altaparmakov
2002-02-08 20:38         ` yodaiken
2002-02-08 21:55         ` Anton Altaparmakov
2002-02-08 12:47   ` Denis Vlasenko
2002-02-08 15:13     ` yodaiken
2002-02-08 19:22 ` Horst von Brand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox