Re: RFC: patch to allow lock-free traversal of lists with insertion

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-09 15:45 Paul McKenney
  2001-10-09 17:00 ` Richard Henderson
  2001-10-10  2:05 ` [Lse-tech] " Andrea Arcangeli
  0 siblings, 2 replies; 67+ messages in thread
From: Paul McKenney @ 2001-10-09 15:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: linux-kernel, lse-tech

> On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> > This is a proposal to provide a wmb()-like primitive that enables
> > lock-free traversal of lists while elements are concurrently being
> > inserted into these lists.
>
> I've discussed this with you before and you continue to have
> completely missed the point.

It would not be the first point that I have completely missed, but
please read on.  I have discussed this algorithm with Alpha architects,
who tell me that it is sound.

> Alpha requires that you issue read-after-read memory barriers on
> the reader side if you require ordering between reads.  That is
> the extent of the weakness of the memory ordering.

I agree that Alpha requires "mb" instructions to be executed on the
reading CPUs if the reading CPUs are to observe some other CPU's writes
occuring in order.  And I agree that the usual way that this is done
is to insert "mb" instructions between the reads on the read side.

However, if the reading CPU executes an "mb" instruction between the
time that the writing CPU executes the "wmb" and the time that the writing
CPU executes the second store, then the reading CPU is guaranteed to
see the writes in order.  Here is how this happens:

     Initial values: a = 0, p = &a, b = 1.

     Writing CPU              Reading CPU

     1) b = 2;

     2) Execute "wmb" instruction

     3) Send a bunch of IPIs

                         4) Receive IPI

                         5) Execute "mb" instruction

                         6) Indicate completion

     7) Detect completion

     8) p = &b

The combination of steps 2 and 5 guarantee that the reading CPU will
invalidate the cacheline containing the old value of "b" before it
can possibly reference the new value of "p".  The CPU must read "p"
before "b", since it can't know where "p" points before reading it.

> Sparc64 is the same way.

I can believe that "membar #StoreStore" and friends operate in the same
way that the Alpha memory-ordering instructions do.  However, some of
the code in Linux seems to rely on "membar #SYNC" waiting for outstanding
invalidations to complete.  If this is the case, then "membar #SYNC"
could be used to segregate writes when the corresponding reads are
implicitly ordered by data dependencies, as they are during pointer
dereferences.

> This crap will never be applied.  Your algorithms are simply broken
> if you do not ensure proper read ordering via the rmb() macro.

Please see the example above.  I do believe that my algorithms are
reliably forcing proper read ordering using IPIs, just in an different
way.  Please note that I have discussed this algorithm with Alpha
architects, who believe that it is sound.

But they (and I) may well be confused.  If so, could you please
show me what I am missing?

                              Thanx, Paul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-09 15:45 RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney
@ 2001-10-09 17:00 ` Richard Henderson
  2001-10-10  3:33   ` Paul Mackerras
  2001-10-10  2:05 ` [Lse-tech] " Andrea Arcangeli
  1 sibling, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2001-10-09 17:00 UTC (permalink / raw)
  To: Paul McKenney; +Cc: linux-kernel, lse-tech

On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above.  I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in an different
> way.

I wasn't suggesting that the IPI wouldn't work -- it will.
But it will be _extremely_ slow.

I am suggesting that the lock-free algorithms should add the
read barriers, and that failure to do so indicates that they
are incomplete.  If nothing else, it documents where the real
dependancies are.


r~

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-09 17:00 ` Richard Henderson
@ 2001-10-10  3:33   ` Paul Mackerras
  2001-10-10 17:02     ` Richard Henderson
  0 siblings, 1 reply; 67+ messages in thread
From: Paul Mackerras @ 2001-10-10  3:33 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Paul McKenney, linux-kernel, lse-tech

Richard Henderson writes:

> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete.  If nothing else, it documents where the real
> dependancies are.

Please, let's not go adding rmb's in places where there is already an
ordering forced by a data dependency - that will hurt performance
unnecessarily on x86, ppc, sparc, ia64, etc.

It seems to me that there are two viable alternatives:

1. Define an rmbdd() which is a no-op on all architectures except for
   alpha, where it is an rmb.  Richard can then have the job of
   finding all the places where an rmbdd is needed, which sounds like
   one of the smellier labors of Hercules to me. :)  

2. Use Paul McKenney's scheme.

I personally don't really mind which gets chosen.  Scheme 1 will
result in intermittent hard-to-find bugs on alpha (since the vast
majority of kernel hackers will not understand where or why rmbdd's
are required), but if Richard prefers that to scheme 2, it's his call
IMHO.

Regards,
Paul.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  3:33   ` Paul Mackerras
@ 2001-10-10 17:02     ` Richard Henderson
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2001-10-10 17:02 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Paul McKenney, linux-kernel, lse-tech

On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> 1. Define an rmbdd() which is a no-op on all architectures except for
>    alpha, where it is an rmb.  Richard can then have the job of
>    finding all the places where an rmbdd is needed, which sounds like
>    one of the smellier labors of Hercules to me. :)  

I don't think it's actually all that bad.  There won't be all
that many places that require the rmbdd, and they'll pretty
much exactly correspond to the places in which you have to put
wmb for all architectures anyway.


r~

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-09 15:45 RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney
  2001-10-09 17:00 ` Richard Henderson
@ 2001-10-10  2:05 ` Andrea Arcangeli
  2001-10-10  5:05   ` Linus Torvalds
  2001-10-10 13:24   ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Ivan Kokshaysky
  1 sibling, 2 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-10-10  2:05 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Richard Henderson, linux-kernel, lse-tech, Jay.Estabrook,
	Peter Rival

On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above.  I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in an different
> way.  Please note that I have discussed this algorithm with Alpha
> architects, who believe that it is sound.

The IPI way is certainly safe.

The point here is that it is suprisingly that alpha needs this IPI
unlike all other architectures. So while the IPI is certainly safe we
wouldn't expect it to be necessary on alpha either.

Now my only worry is that when you worked on this years ago with the
alpha architects there were old chips, old caches and old machines (ev5
maybe?).

So before changing any code, I would prefer to double check with the
current alpha architects that the read dependency really isn't enough to
enforce read ordering without the need of rmb also on the beleeding edge
ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
but we wouldn't be affected with any recent hardware compiling for
EV6/EV67.  Jay, Peter, comments?

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  2:05 ` [Lse-tech] " Andrea Arcangeli
@ 2001-10-10  5:05   ` Linus Torvalds
  2001-10-10  5:17     ` BALBIR SINGH
                       ` (3 more replies)
  2001-10-10 13:24   ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Ivan Kokshaysky
  1 sibling, 4 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10  5:05 UTC (permalink / raw)
  To: linux-kernel

In article <20011010040502.A726@athlon.random>,
Andrea Arcangeli  <andrea@suse.de> wrote:
>On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
>> Please see the example above.  I do believe that my algorithms are
>> reliably forcing proper read ordering using IPIs, just in an different
>> way.  Please note that I have discussed this algorithm with Alpha
>> architects, who believe that it is sound.
>
>The IPI way is certainly safe.

Now, before people get all excited, what is this particular code
actually _good_ for?

Creating a lock-free list that allows insertion concurrently with lookup
is _easy_.

But what's the point? If you insert stuff, you eventually have to remove
it. What goes up must come down. Insert-inane-quote-here.

And THAT is the hard part. Doing lookup without locks ends up being
pretty much worthless, because you need the locks for the removal
anyway, at which point the whole thing looks pretty moot.

Did I miss something?

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:05   ` Linus Torvalds
@ 2001-10-10  5:17     ` BALBIR SINGH
  2001-10-10  5:29       ` Davide Libenzi
  2001-10-10  5:46       ` Linus Torvalds
  2001-10-10  6:16     ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul Mackerras
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 67+ messages in thread
From: BALBIR SINGH @ 2001-10-10  5:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

Linus Torvalds wrote:

>In article <20011010040502.A726@athlon.random>,
>Andrea Arcangeli  <andrea@suse.de> wrote:
>
>>On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
>>
>>>Please see the example above.  I do believe that my algorithms are
>>>reliably forcing proper read ordering using IPIs, just in an different
>>>way.  Please note that I have discussed this algorithm with Alpha
>>>architects, who believe that it is sound.
>>>
>>The IPI way is certainly safe.
>>
>
>Now, before people get all excited, what is this particular code
>actually _good_ for?
>
>Creating a lock-free list that allows insertion concurrently with lookup
>is _easy_.
>
>But what's the point? If you insert stuff, you eventually have to remove
>it. What goes up must come down. Insert-inane-quote-here.
>
>And THAT is the hard part. Doing lookup without locks ends up being
>pretty much worthless, because you need the locks for the removal
>anyway, at which point the whole thing looks pretty moot.
>
>Did I miss something?
>
>		Linus
>

What about cases like the pci device list or any other such list. Sometimes
you do not care if somebody added something, while you were looking through
the list as long as you do not get illegal addresses or data.
Wouldn't this be very useful there? Most of these lists come up
at system startup and change rearly, but we look through them often.

Me too, Did I miss something?

Balbir

>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>




[-- Attachment #2: Wipro_Disclaimer.txt --]
[-- Type: text/plain, Size: 853 bytes --]

----------------------------------------------------------------------------------------------------------------------
Information transmitted by this E-MAIL is proprietary to Wipro and/or its Customers and
is intended for use only by the individual or entity to which it is
addressed, and may contain information that is privileged, confidential or
exempt from disclosure under applicable law. If you are not the intended
recipient or it appears that this mail has been forwarded to you without
proper authority, you are notified that any use or dissemination of this
information in any manner is strictly prohibited. In such cases, please
notify us immediately at mailto:mailadmin@wipro.com and delete this mail
from your records.
----------------------------------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:17     ` BALBIR SINGH
@ 2001-10-10  5:29       ` Davide Libenzi
  2001-10-10  5:46       ` Linus Torvalds
  1 sibling, 0 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-10-10  5:29 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Linus Torvalds, linux-kernel

On Wed, 10 Oct 2001, BALBIR SINGH wrote:

> What about cases like the pci device list or any other such list. Sometimes
> you do not care if somebody added something, while you were looking through
> the list as long as you do not get illegal addresses or data.
> Wouldn't this be very useful there? Most of these lists come up
> at system startup and change rearly, but we look through them often.
>
> Me too, Did I miss something?

What means that changes rarely that you're going to use rarely_lock() ?
If you're going to remove even only 1 element in 1000000 jiffies then
you've to lock.
If your list can only grow, that's different.




- Davide



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:17     ` BALBIR SINGH
  2001-10-10  5:29       ` Davide Libenzi
@ 2001-10-10  5:46       ` Linus Torvalds
  2001-10-10  6:01         ` BALBIR SINGH
  2001-10-10  7:14         ` kdb requires kallsyms Kirill Ratkin
  1 sibling, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10  5:46 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: linux-kernel

On Wed, 10 Oct 2001, BALBIR SINGH wrote:
> >
> >And THAT is the hard part. Doing lookup without locks ends up being
> >pretty much worthless, because you need the locks for the removal
> >anyway, at which point the whole thing looks pretty moot.
>
> What about cases like the pci device list or any other such list. Sometimes
> you do not care if somebody added something, while you were looking through
> the list as long as you do not get illegal addresses or data.
> Wouldn't this be very useful there? Most of these lists come up
> at system startup and change rearly, but we look through them often.

It's not about "change rarely". You cannot use a non-locking lookup if
they _ever_ remove anything.

I can't think of many lists like that. The PCI lists certainly are both
add/remove: cardbus bridges and hotplug-PCI means that they are not just
purely "enumerate at bootup".

Sure, maybe there are _some_ things that don't need to ever be removed,
but I can't think of any interesting data structure off-hand. Truly static
stuff tends to be allocated in an array that is sized once - array lookups
are much faster than traversing a linked list anyway.

So the linked list approach tends to make sense for things that _aren't_
just static, but I don't know of anything that only grows and grows. In
fact, that would sound like a horrible memory leak to me if we had
something like that. Even slow growth is bad if you want up-times measured
in years.

Now, in all fairness I can imagine hacky lock-less removals too. To get
them to work, you have to (a) change the "next" pointer to point to the
next->next (and have some serialization between removals, but removals
only, to make sure you don't have next->next also going away from you) and
(b) leave the old "next" pointer (and thus the data structure) around
until you can _prove_ that nobody is looking anything up any more, and
that the now-defunct data structure can truly be removed.

However, apart from locking, there aren't all that many ways to "prove"
that non-use. You could probably do it with interesting atomic sequence
numbers etc, although by the time you generate atomic sequence numbers
your lookup is already getting heavier - to the point where locking
probably isn't so bad an idea any more.

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:46       ` Linus Torvalds
@ 2001-10-10  6:01         ` BALBIR SINGH
  2001-10-10 15:23           ` Victor Yodaiken
  2001-10-10  7:14         ` kdb requires kallsyms Kirill Ratkin
  1 sibling, 1 reply; 67+ messages in thread
From: BALBIR SINGH @ 2001-10-10  6:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]

Linus Torvalds wrote:

>On Wed, 10 Oct 2001, BALBIR SINGH wrote:
>
>>>And THAT is the hard part. Doing lookup without locks ends up being
>>>pretty much worthless, because you need the locks for the removal
>>>anyway, at which point the whole thing looks pretty moot.
>>>
>>What about cases like the pci device list or any other such list. Sometimes
>>you do not care if somebody added something, while you were looking through
>>the list as long as you do not get illegal addresses or data.
>>Wouldn't this be very useful there? Most of these lists come up
>>at system startup and change rearly, but we look through them often.
>>
>
>It's not about "change rarely". You cannot use a non-locking lookup if
>they _ever_ remove anything.
>
>I can't think of many lists like that. The PCI lists certainly are both
>add/remove: cardbus bridges and hotplug-PCI means that they are not just
>purely "enumerate at bootup".
>

I agree, I just thought of one case quickly. Assume that somebody did a cat /proc/pci.
Meanwhile somebody is adding a new pci device (hotplug PCI) simultaneously (which is very rare).
I would not care if the new device showed up in the output of /proc/pci this time. It would
definitely show up next time. Meanwhile locking the list (just in case it changes) is an
overhead in the case above. I was referring to these cases in my earlier mail.
   

>
>Sure, maybe there are _some_ things that don't need to ever be removed,
>but I can't think of any interesting data structure off-hand. Truly static
>stuff tends to be allocated in an array that is sized once - array lookups
>are much faster than traversing a linked list anyway.
>
>So the linked list approach tends to make sense for things that _aren't_
>just static, but I don't know of anything that only grows and grows. In
>fact, that would sound like a horrible memory leak to me if we had
>something like that. Even slow growth is bad if you want up-times measured
>in years.
>
>Now, in all fairness I can imagine hacky lock-less removals too. To get
>them to work, you have to (a) change the "next" pointer to point to the
>next->next (and have some serialization between removals, but removals
>only, to make sure you don't have next->next also going away from you) and
>(b) leave the old "next" pointer (and thus the data structure) around
>until you can _prove_ that nobody is looking anything up any more, and
>that the now-defunct data structure can truly be removed.
>
The problem with removals is that somebody could be adding an element simulataneously
to the "next" element. Which might cause problems. I hope I correctly understood
the scenario here.

>
>However, apart from locking, there aren't all that many ways to "prove"
>that non-use. You could probably do it with interesting atomic sequence
>numbers etc, although by the time you generate atomic sequence numbers
>your lookup is already getting heavier - to the point where locking
>probably isn't so bad an idea any more.
>
>		Linus
>
>
Balbir



[-- Attachment #2: Wipro_Disclaimer.txt --]
[-- Type: text/plain, Size: 853 bytes --]

----------------------------------------------------------------------------------------------------------------------
Information transmitted by this E-MAIL is proprietary to Wipro and/or its Customers and
is intended for use only by the individual or entity to which it is
addressed, and may contain information that is privileged, confidential or
exempt from disclosure under applicable law. If you are not the intended
recipient or it appears that this mail has been forwarded to you without
proper authority, you are notified that any use or dissemination of this
information in any manner is strictly prohibited. In such cases, please
notify us immediately at mailto:mailadmin@wipro.com and delete this mail
from your records.
----------------------------------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  6:01         ` BALBIR SINGH
@ 2001-10-10 15:23           ` Victor Yodaiken
  0 siblings, 0 replies; 67+ messages in thread
From: Victor Yodaiken @ 2001-10-10 15:23 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Linus Torvalds, linux-kernel

On Wed, Oct 10, 2001 at 11:31:08AM +0530, BALBIR SINGH wrote:
> Linus Torvalds wrote:
> >I can't think of many lists like that. The PCI lists certainly are both
> >add/remove: cardbus bridges and hotplug-PCI means that they are not just
> >purely "enumerate at bootup".
> >
> 
> I agree, I just thought of one case quickly. Assume that somebody did a cat /proc/pci.
> Meanwhile somebody is adding a new pci device (hotplug PCI) simultaneously (which is very rare).
> I would not care if the new device showed up in the output of /proc/pci this time. It would
> definitely show up next time. Meanwhile locking the list (just in case it changes) is an
> overhead in the case above. I was referring to these cases in my earlier mail.

So you make the data structure and algorithms more complex and hard to maintain in
order to get an undetectable improvement in the speed of something that almost
never happens and and that is not even in the same neighborhood as being a 
bottleneck?




^ permalink raw reply	[flat|nested] 67+ messages in thread

* kdb requires kallsyms
  2001-10-10  5:46       ` Linus Torvalds
  2001-10-10  6:01         ` BALBIR SINGH
@ 2001-10-10  7:14         ` Kirill Ratkin
  2001-10-10  7:38           ` BALBIR SINGH
  1 sibling, 1 reply; 67+ messages in thread
From: Kirill Ratkin @ 2001-10-10  7:14 UTC (permalink / raw)
  To: linux-kernel

Hi. 

I've trouble with kdb. I've patched my kernel (stable
2.4.10) and tried to make kernel with kdb (from SGI)
and I see :
/bin/sh: /sbin/kallsyms: No such file or directory
make[1]: *** [kallsyms] error 127

Where can I find this kallsyms?

Regards,

__________________________________________________
Do You Yahoo!?
Make a great connection at Yahoo! Personals.
http://personals.yahoo.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: kdb requires kallsyms
  2001-10-10  7:14         ` kdb requires kallsyms Kirill Ratkin
@ 2001-10-10  7:38           ` BALBIR SINGH
  0 siblings, 0 replies; 67+ messages in thread
From: BALBIR SINGH @ 2001-10-10  7:38 UTC (permalink / raw)
  To: Kirill Ratkin; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 848 bytes --]

Kirill Ratkin wrote:

>Hi. 
>
>I've trouble with kdb. I've patched my kernel (stable
>2.4.10) and tried to make kernel with kdb (from SGI)
>and I see :
>/bin/sh: /sbin/kallsyms: No such file or directory
>make[1]: *** [kallsyms] error 127
>
>Where can I find this kallsyms?
>
modutils provides this executable. please use
rpm -q --whatprovides `type <name of file>' to see which package provides
the executable.

Regards,
Balbir

>
>Regards,
>
>
>
>
>
>__________________________________________________
>Do You Yahoo!?
>Make a great connection at Yahoo! Personals.
>http://personals.yahoo.com
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/
>




[-- Attachment #2: Wipro_Disclaimer.txt --]
[-- Type: text/plain, Size: 853 bytes --]

----------------------------------------------------------------------------------------------------------------------
Information transmitted by this E-MAIL is proprietary to Wipro and/or its Customers and
is intended for use only by the individual or entity to which it is
addressed, and may contain information that is privileged, confidential or
exempt from disclosure under applicable law. If you are not the intended
recipient or it appears that this mail has been forwarded to you without
proper authority, you are notified that any use or dissemination of this
information in any manner is strictly prohibited. In such cases, please
notify us immediately at mailto:mailadmin@wipro.com and delete this mail
from your records.
----------------------------------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:05   ` Linus Torvalds
  2001-10-10  5:17     ` BALBIR SINGH
@ 2001-10-10  6:16     ` Paul Mackerras
  2001-10-10  6:30       ` Linus Torvalds
  2001-10-10  7:36     ` Paul Mackerras
  2001-10-10 11:54     ` Keith Owens
  3 siblings, 1 reply; 67+ messages in thread
From: Paul Mackerras @ 2001-10-10  6:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds writes:

> And THAT is the hard part. Doing lookup without locks ends up being
> pretty much worthless, because you need the locks for the removal
> anyway, at which point the whole thing looks pretty moot.
> 
> Did I miss something?

I believe this all becomes (much more) useful when you are doing
read-copy-update.

There is an assumption that anyone modifying the list (inserting or
deleting) would take a lock first, so the deletion is just a pointer
assignment.  Any reader traversing the list (without a lock) sees
either the old pointer or the new, which is fine.

The difficulty is in making sure that no reader is still inspecting
the list element you just removed before you free it, or modify any
field that the reader would be looking at (particularly the `next'
field :).  One way of doing that is to defer the free or modification
to a quiescent point.  If you have a separate `next_free' field, you
could safely put the element on a list of elements to be freed at the
next quiescent point.

Paul.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  6:16     ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul Mackerras
@ 2001-10-10  6:30       ` Linus Torvalds
  0 siblings, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10  6:30 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux-kernel

On Wed, 10 Oct 2001, Paul Mackerras wrote:
>
> There is an assumption that anyone modifying the list (inserting or
> deleting) would take a lock first, so the deletion is just a pointer
> assignment.  Any reader traversing the list (without a lock) sees
> either the old pointer or the new, which is fine.

Right, that's not the problem.

> The difficulty is in making sure that no reader is still inspecting
> the list element you just removed before you free it, or modify any
> field that the reader would be looking at (particularly the `next'
> field :).

..which implies _some_ sort of locking, even if it may be deferred.

The locking can be per-CPU, whatever - have a per-CPU count for "this many
traversals in progress", and have every lookup increment it before
starting, and decrement it after stopping.

Then the deleter can actually free the thing once he has seen every CPU go
down to zero, with the proper memory barriers.

Yes, I see that it can work. But it sounds like a _lot_ of complexity for
not very much gain.

Right now, you already have to have eight CPU's to see locking as a large
problem in normal life. People have normal patches to bring that easily up
to 16. Then how much hard-to-debug-with-subtle-memory-ordering-issues code
do you want to handle the few machines that aren't in that range?

I'm just trying to inject a bit of realism here.

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:05   ` Linus Torvalds
  2001-10-10  5:17     ` BALBIR SINGH
  2001-10-10  6:16     ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul Mackerras
@ 2001-10-10  7:36     ` Paul Mackerras
  2001-10-10 15:54       ` Victor Yodaiken
  2001-10-10 11:54     ` Keith Owens
  3 siblings, 1 reply; 67+ messages in thread
From: Paul Mackerras @ 2001-10-10  7:36 UTC (permalink / raw)
  To: torvalds, linux-kernel

Linus Torvalds writes:

> And THAT is the hard part. Doing lookup without locks ends up being
> pretty much worthless, because you need the locks for the removal
> anyway, at which point the whole thing looks pretty moot.
> 
> Did I miss something?

I believe this all becomes (much more) useful when you are doing
read-copy-update.

There is an assumption that anyone modifying the list (inserting or
deleting) would take a lock first, so the deletion is just a pointer
assignment.  Any reader traversing the list (without a lock) sees
either the old pointer or the new, which is fine.

The difficulty is in making sure that no reader is still inspecting
the list element you just removed before you free it, or modify any
field that the reader would be looking at (particularly the `next'
field :).  One way of doing that is to defer the free or modification
to a quiescent point.  If you have a separate `next_free' field, you
could safely put the element on a list of elements to be freed at the
next quiescent point.

Paul.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  7:36     ` Paul Mackerras
@ 2001-10-10 15:54       ` Victor Yodaiken
  2001-10-10 21:56         ` Keith Owens
  0 siblings, 1 reply; 67+ messages in thread
From: Victor Yodaiken @ 2001-10-10 15:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: torvalds, linux-kernel

On Wed, Oct 10, 2001 at 05:36:18PM +1000, Paul Mackerras wrote:
> Linus Torvalds writes:
> 
> > And THAT is the hard part. Doing lookup without locks ends up being
> > pretty much worthless, because you need the locks for the removal
> > anyway, at which point the whole thing looks pretty moot.
> > 
> > Did I miss something?
> 
> I believe this all becomes (much more) useful when you are doing
> read-copy-update.
> 
> There is an assumption that anyone modifying the list (inserting or
> deleting) would take a lock first, so the deletion is just a pointer
> assignment.  Any reader traversing the list (without a lock) sees
> either the old pointer or the new, which is fine.
> 
> The difficulty is in making sure that no reader is still inspecting
> the list element you just removed before you free it, or modify any
> field that the reader would be looking at (particularly the `next'
> field :).  One way of doing that is to defer the free or modification
> to a quiescent point.  If you have a separate `next_free' field, you
> could safely put the element on a list of elements to be freed at the
> next quiescent point.


And the "next quiescent point" must be a synchronization point and that must
have spinlocks around it!
Although I kind of like the idea of
	normal operation create mess by avoiding synchronization
	when system seems idle, get BKL, and clean up. 


> 
> Paul.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 15:54       ` Victor Yodaiken
@ 2001-10-10 21:56         ` Keith Owens
  2001-10-10 22:24           ` Victor Yodaiken
  0 siblings, 1 reply; 67+ messages in thread
From: Keith Owens @ 2001-10-10 21:56 UTC (permalink / raw)
  To: Victor Yodaiken; +Cc: Paul Mackerras, torvalds, linux-kernel

On Wed, 10 Oct 2001 09:54:36 -0600, 
Victor Yodaiken <yodaiken@fsmlabs.com> wrote:
>Although I kind of like the idea of
>	normal operation create mess by avoiding synchronization
>	when system seems idle, get BKL, and clean up. 

That does not work.  A process can read an entry from a list then
perform an operation that puts the process to sleep.  When it wakes up
again, how can it tell if the list has been changed?  How can the
cleanup code tell if any process has slept while it was traversing a
list?  An idle system does not prevent processes from sleeping at the
wrong point.

Don't even think of requiring "processes must not sleep while
traversing a lock free list".  Programmers cannot get "processes must
not sleep while holding a lock" correct and that error is much easier
to detect.

The inability for update code to know when a lock free list is being
traversed is the reason that most lock free work requires type stable
storage.  You can mark a list entry as free and put it on a free chain
but you can never unmap the storage nor use the free entry for data of
a different type.  That is the only way to guarantee that a process can
safely determine if the list has changed under it during traversal
while still allowing the process to be interrupted or to sleep.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 21:56         ` Keith Owens
@ 2001-10-10 22:24           ` Victor Yodaiken
  2001-10-10 23:46             ` David S. Miller
  0 siblings, 1 reply; 67+ messages in thread
From: Victor Yodaiken @ 2001-10-10 22:24 UTC (permalink / raw)
  To: Keith Owens; +Cc: Victor Yodaiken, Paul Mackerras, torvalds, linux-kernel

On Thu, Oct 11, 2001 at 07:56:43AM +1000, Keith Owens wrote:
> On Wed, 10 Oct 2001 09:54:36 -0600, 
> Victor Yodaiken <yodaiken@fsmlabs.com> wrote:
> >Although I kind of like the idea of
> >	normal operation create mess by avoiding synchronization
> >	when system seems idle, get BKL, and clean up. 
> 
> That does not work.  A process can read an entry from a list then
> perform an operation that puts the process to sleep.  When it wakes up
> again, how can it tell if the list has been changed?  How can the

In general you're right, and always its better to 
reduce contention than to come up with silly algorithms for 
reducing the cost of contention, but  if you want to live
dangerously:

reader:
	atomic increment read_count
	do stuff; skip queue elements marked
		as zombie
	atomic decrement read_count


writer:
	spin lock queue
	to delete element
	mark element as zombie
	unspin

cleanup:
	spin lock queue
	if(read_count == 0){
		get big kernel lock
		if(read_count is still 0)
			// now nobody will be able to get to queue
			clean up queue
		unlock kernel
		}
	unlock spin


So you accomplished making the code harder to read and maintain, slowing
down worst case, and maybe reducing average case read spinlock delay.
For some very carefully selected cases this may be a win, but ...




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 22:24           ` Victor Yodaiken
@ 2001-10-10 23:46             ` David S. Miller
  2001-10-11  0:24               ` Davide Libenzi
  0 siblings, 1 reply; 67+ messages in thread
From: David S. Miller @ 2001-10-10 23:46 UTC (permalink / raw)
  To: yodaiken; +Cc: kaos, paulus, torvalds, linux-kernel

   From: Victor Yodaiken <yodaiken@fsmlabs.com>
   Date: Wed, 10 Oct 2001 16:24:19 -0600

   In general you're right, and always its better to 
   reduce contention than to come up with silly algorithms for 
   reducing the cost of contention,

I want to second this and remind people that the "cost" of spinlocks
is mostly not "spinning idly waiting for lock", rather the big cost
is shuffling the dirty cacheline ownership between the processors.

Any scheme involving shared data which is written (the read counts
in the various "lockless" schemes are examples) have the same "cost"
assosciated with them.

In short, I see no performance gain from the lockless algorithms
even in the places where they can be applied.

I spent some time oogling over lockless algorithms a few years ago,
but I stopped once I realized where the true costs were.  In my view,
the lockless algorithms perhaps are a win in parallel processing
environments (in fact, the supercomputing field is where a lot of the
lockless algorithm research comes from) but not in the kinds of places
and with the kinds of data structure usage the Linux kernel has.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 23:46             ` David S. Miller
@ 2001-10-11  0:24               ` Davide Libenzi
  0 siblings, 0 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-10-11  0:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: yodaiken, kaos, paulus, torvalds, linux-kernel

On Wed, 10 Oct 2001, David S. Miller wrote:

>    From: Victor Yodaiken <yodaiken@fsmlabs.com>
>    Date: Wed, 10 Oct 2001 16:24:19 -0600
>
>    In general you're right, and always its better to
>    reduce contention than to come up with silly algorithms for
>    reducing the cost of contention,
>
> I want to second this and remind people that the "cost" of spinlocks
> is mostly not "spinning idly waiting for lock", rather the big cost
> is shuffling the dirty cacheline ownership between the processors.
>
> Any scheme involving shared data which is written (the read counts
> in the various "lockless" schemes are examples) have the same "cost"
> assosciated with them.
>
> In short, I see no performance gain from the lockless algorithms
> even in the places where they can be applied.
>
> I spent some time oogling over lockless algorithms a few years ago,
> but I stopped once I realized where the true costs were.  In my view,
> the lockless algorithms perhaps are a win in parallel processing
> environments (in fact, the supercomputing field is where a lot of the
> lockless algorithm research comes from) but not in the kinds of places
> and with the kinds of data structure usage the Linux kernel has.

Agree.
To have a complete list handling you've to end up locking or
write stressing counter variables to simulate locks.
Just only having special handled lists that removes write ops from irq
handlers can make you save a big cost of ..._irqsave()/restore().
And these can be realized by having an extra list of insert/remove ops
that is filled with no locks by irq handlers and is truncated/flushed by a
__xchg() done by readers.




- Davide



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  5:05   ` Linus Torvalds
                       ` (2 preceding siblings ...)
  2001-10-10  7:36     ` Paul Mackerras
@ 2001-10-10 11:54     ` Keith Owens
  2001-10-10 13:42       ` AIC7XXX war
  3 siblings, 1 reply; 67+ messages in thread
From: Keith Owens @ 2001-10-10 11:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Wed, 10 Oct 2001 05:05:10 +0000 (UTC), 
torvalds@transmeta.com (Linus Torvalds) wrote:
>Now, before people get all excited, what is this particular code
>actually _good_ for?
>
>Creating a lock-free list that allows insertion concurrently with lookup
>is _easy_.
>
>But what's the point? If you insert stuff, you eventually have to remove
>it. What goes up must come down. Insert-inane-quote-here.
>
>And THAT is the hard part. Doing lookup without locks ends up being
>pretty much worthless, because you need the locks for the removal
>anyway, at which point the whole thing looks pretty moot.

<pedantic>

It is possible to do completely lock free queue management, I have done
it (once).  There are several major requirements :-

* Storage must never change its type (type stable storage).  Once a
  block has been assigned as struct foo, it must always contain struct
  foo.  This can be relaxed slightly as long as the data types have a
  common header format.

  The biggest problem this causes is that storage can never be released
  and unmapped.  You never know if somebody is going to follow an old
  pointer to access storage that you freed.  You must put the storage
  on a free list for struct foo instead of unmapping it, to maintain
  the type stable storage.

* Every piece of code that traverses the data tree for structure foo
  must take a copy of each struct foo into local storage.  After taking
  a copy it must verify that its local copy is self consistent, via
  validity indicators in each struct.  The validity indicators are
  typically generation numbers, whatever you use, the indicators must
  be atomically updated and atomically read, with suitable wmb and rmb
  barriers for machines with weak memory ordering.

* After following a pointer in your local copy of struct foo to access
  another data area, you must verify that the master version of your
  local copy has not been changed.  If the master copy has changed then
  the pointer you just followed is no longer reliable.  In almost every
  case you have to start the traversal from the beginning.  It makes
  for very convoluted reader and updater code.

</pedantic>

The only reason I used lock free read and update was to hook into an
existing system that mandated sub second responses.  Some of the new
operations that were performed during list traversal and update were
subject to unbounded delays that were completely outside my control.  I
could not afford to lock the data structures during traversal because
any delay in the new operations would completely stall the existing
system, also the existing system had to be able to retrieve and delete
data for the new operations at any time.

I don't recommend doing lock free unless you have absolutely no
alternative.  It requires convoluted code, it is difficult to debug, it
is expensive on memory bandwidth (forget zero copy) and tends to be
expensive on storage as well.

If you are interested in lock free work, take a look at
http://www.cs.pitt.edu/~moir/papers.html, particularly Practical
Implementations of Non-Blocking Synchronization Primitives.  But
beware, that paper has an error which caused intermittent bugs that
took me 5 months to track down.  Section 3.3, proc Copy, line 6 should
be

6:     y := addr->data[i];

^ permalink raw reply	[flat|nested] 67+ messages in thread

* AIC7XXX
  2001-10-10 11:54     ` Keith Owens
@ 2001-10-10 13:42       ` war
  2001-10-10 21:40         ` AIC7XXX Luigi Genoni
  0 siblings, 1 reply; 67+ messages in thread
From: war @ 2001-10-10 13:42 UTC (permalink / raw)
  To: linux-kernel

I'll try the new one out, but when I tried it out when it was first
introduced into the kernel, it had a lot of errors accessing my SCSI cdrom
drives.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: AIC7XXX
  2001-10-10 13:42       ` AIC7XXX war
@ 2001-10-10 21:40         ` Luigi Genoni
  0 siblings, 0 replies; 67+ messages in thread
From: Luigi Genoni @ 2001-10-10 21:40 UTC (permalink / raw)
  To: war; +Cc: linux-kernel

I had similar problems at the beginning, but now this new driver is well
working, ate less for me.

Luigi

On Wed, 10 Oct 2001, war wrote:

> I'll try the new one out, but when I tried it out when it was first
> introduced into the kernel, it had a lot of errors accessing my SCSI cdrom
> drives.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  2:05 ` [Lse-tech] " Andrea Arcangeli
  2001-10-10  5:05   ` Linus Torvalds
@ 2001-10-10 13:24   ` Ivan Kokshaysky
  2001-10-10 13:41     ` Andrea Arcangeli
  1 sibling, 1 reply; 67+ messages in thread
From: Ivan Kokshaysky @ 2001-10-10 13:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Paul McKenney, Richard Henderson, linux-kernel, lse-tech,
	Jay.Estabrook, Peter Rival

On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> So before changing any code, I would prefer to double check with the
> current alpha architects that the read dependency really isn't enough to
> enforce read ordering without the need of rmb also on the beleeding edge
> ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
> wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
> but we wouldn't be affected with any recent hardware compiling for
> EV6/EV67.  Jay, Peter, comments?

21264 Compiler Writer's Guide [appendix C] explicitly says that the
second load cannot issue if its address depends on a result of previous
load until that result is available. I refuse to believe that it isn't
true for older alphas, especially because they are strictly in-order
machines, unlike ev6.

I suspect some confusion here - probably that architect meant loads
to independent addresses. Of course, in this case mb() is required
to assure ordering.

Ivan.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 13:24   ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Ivan Kokshaysky
@ 2001-10-10 13:41     ` Andrea Arcangeli
  0 siblings, 0 replies; 67+ messages in thread
From: Andrea Arcangeli @ 2001-10-10 13:41 UTC (permalink / raw)
  To: Ivan Kokshaysky
  Cc: Paul McKenney, Richard Henderson, linux-kernel, lse-tech,
	Jay.Estabrook, Peter Rival

On Wed, Oct 10, 2001 at 05:24:31PM +0400, Ivan Kokshaysky wrote:
> On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> > So before changing any code, I would prefer to double check with the
> > current alpha architects that the read dependency really isn't enough to
> > enforce read ordering without the need of rmb also on the beleeding edge
> > ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
> > wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
> > but we wouldn't be affected with any recent hardware compiling for
> > EV6/EV67.  Jay, Peter, comments?
> 
> 21264 Compiler Writer's Guide [appendix C] explicitly says that the
> second load cannot issue if its address depends on a result of previous
> load until that result is available. I refuse to believe that it isn't

Fine, btw I also recall to have read something on those lines, and not
even in the 21264 manual but in the alpha reference manual that would
apply to all the chips but I didn't find it with a short lookup. Thanks
for checking!

> true for older alphas, especially because they are strictly in-order
> machines, unlike ev6.

Yes, it sounds strange. However According to Paul this would not be the
cpu but a cache coherency issue. rmb() would enforce the cache coherency
etc... so maybe the issue is related to old SMP motherboard etc... not
even to the cpus ... dunno. But as said it sounded very strange that
also new chips and new boards have such a weird reodering trouble.

> I suspect some confusion here - probably that architect meant loads
> to independent addresses. Of course, in this case mb() is required
> to assure ordering.
> 
> Ivan.


Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10  4:43 Paul McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul McKenney @ 2001-10-10  4:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Rival, Jay.Estabrook, linux-kernel, lse-tech,
	Richard Henderson


> The point here is that it is suprisingly that alpha needs this IPI
> unlike all other architectures. So while the IPI is certainly safe we
> wouldn't expect it to be necessary on alpha either.
>
> Now my only worry is that when you worked on this years ago with the
> alpha architects there were old chips, old caches and old machines (ev5
> maybe?).
>
> So before changing any code, I would prefer to double check with the
> current alpha architects that the read dependency really isn't enough to
> enforce read ordering without the need of rmb also on the beleeding edge
> ev6/ev67/etc.. cores. So potentially as worse we'd need to redefine
> wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
> but we wouldn't be affected with any recent hardware compiling for
> EV6/EV67.  Jay, Peter, comments?

Hello, Andrea!

This is a very good point.  I will double-check with the people I know.
I agree that it would also be very good to get an independent check.
There is clearly no use in adding expensive IPIs to the kernel except
where they are absolutely necessary!

                              Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10  6:54 Dipankar Sarma
  0 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10  6:54 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

In article <Pine.LNX.4.33.0110092236210.1305-100000@penguin.transmeta.com> Linus Torvalds wrote:

> Now, in all fairness I can imagine hacky lock-less removals too. To get
> them to work, you have to (a) change the "next" pointer to point to the
> next->next (and have some serialization between removals, but removals
> only, to make sure you don't have next->next also going away from you) and
> (b) leave the old "next" pointer (and thus the data structure) around
> until you can _prove_ that nobody is looking anything up any more, and
> that the now-defunct data structure can truly be removed.

This is exactly what Read-Copy Update does. You keep the old data
structure around until you are sure that all the CPUs have lost
references to it by context switching. There are more than
one relatively non-intrusive ways of detecting completion of such a cycle
and "deleted" data can be freed afterwards.

The details are in http://lse.sourceforge.net/locking/rcupdate.html.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10  7:06 Dipankar Sarma
  2001-10-10  7:21 ` BALBIR SINGH
  0 siblings, 1 reply; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10  7:06 UTC (permalink / raw)
  To: balbir.singh; +Cc: linux-kernel

In article <3BC3D9ED.6050901@wipro.com> BALBIR SINGH wrote:

> What about cases like the pci device list or any other such list. Sometimes
> you do not care if somebody added something, while you were looking through
> the list as long as you do not get illegal addresses or data.
> Wouldn't this be very useful there? Most of these lists come up
> at system startup and change rearly, but we look through them often.

How often does the linux kernel need to go through the PCI device
list ? Looking at only SCSI code, it seems that not very often.
If that is the case in general, optimizing it for lockless
lookup (assuming that you use RCU or something to support deletion),
is probably an overkill.

One example of where it is useful is maintenance of route information
in either storage or network. Route information changes infrequently but
needs to be looked up for every I/O and being able to do lockless
lookup here is a good gain.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  7:06 Dipankar Sarma
@ 2001-10-10  7:21 ` BALBIR SINGH
  2001-10-10  9:06   ` Dipankar Sarma
  0 siblings, 1 reply; 67+ messages in thread
From: BALBIR SINGH @ 2001-10-10  7:21 UTC (permalink / raw)
  To: dipankar; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

Dipankar Sarma wrote:

>In article <3BC3D9ED.6050901@wipro.com> BALBIR SINGH wrote:
>
>>What about cases like the pci device list or any other such list. Sometimes
>>you do not care if somebody added something, while you were looking through
>>the list as long as you do not get illegal addresses or data.
>>Wouldn't this be very useful there? Most of these lists come up
>>at system startup and change rearly, but we look through them often.
>>
>
>How often does the linux kernel need to go through the PCI device
>list ? Looking at only SCSI code, it seems that not very often.
>If that is the case in general, optimizing it for lockless
>lookup (assuming that you use RCU or something to support deletion),
>is probably an overkill.
>
Almost all pci drivers use the pci_find_slot or some functionality that
requires a scan of all the pci devices (to identify their device). I agree
that it is not very often. 

>
>One example of where it is useful is maintenance of route information
>in either storage or network. Route information changes infrequently but
>needs to be looked up for every I/O and being able to do lockless
>lookup here is a good gain.
>
I have a question, can this kind of locking used in cases where an interrupt
context may be involved. For example looking through the list of timers, we
disable interrupts and grab a lock using spin_lock_irqsave(&timerlist_lock, flags)

Should we just use __cli() with the RCU or something similar? or  can RCU
be used in such cases?

Thanks,
Balbir

>
>
>Thanks
>Dipankar
>




[-- Attachment #2: Wipro_Disclaimer.txt --]
[-- Type: text/plain, Size: 853 bytes --]

----------------------------------------------------------------------------------------------------------------------
Information transmitted by this E-MAIL is proprietary to Wipro and/or its Customers and
is intended for use only by the individual or entity to which it is
addressed, and may contain information that is privileged, confidential or
exempt from disclosure under applicable law. If you are not the intended
recipient or it appears that this mail has been forwarded to you without
proper authority, you are notified that any use or dissemination of this
information in any manner is strictly prohibited. In such cases, please
notify us immediately at mailto:mailadmin@wipro.com and delete this mail
from your records.
----------------------------------------------------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  7:21 ` BALBIR SINGH
@ 2001-10-10  9:06   ` Dipankar Sarma
  0 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10  9:06 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: linux-kernel

On Wed, Oct 10, 2001 at 12:51:05PM +0530, BALBIR SINGH wrote:
> Dipankar Sarma wrote:
> >One example of where it is useful is maintenance of route information
> >in either storage or network. Route information changes infrequently but
> >needs to be looked up for every I/O and being able to do lockless
> >lookup here is a good gain.
> >
> I have a question, can this kind of locking used in cases where an interrupt
> context may be involved. For example looking through the list of timers, we
> disable interrupts and grab a lock using spin_lock_irqsave(&timerlist_lock, flags)

I don't know about this specific case (timer list), but in general,
yes, you can use lockless traversal with involvement of interrupt
context as long as you can make sure that you see a consistent list
if interrupted by the relevant interrupt during any point.

> 
> Should we just use __cli() with the RCU or something similar? or  can RCU
> be used in such cases?

You can use RCU without blocking the relevant interrupt as long as you
can make sure that the interrupt cannot find the list in an inconsistent
state. For example, if you insert an entry by updating a single
"next" pointer, you should be safe. Any interrupt happening before
this instruction would see the old copy of data and the ones after
the instruction would see the new copy.

As far as deletion using RCU is concerned, it is safe. If you see
the old copy of the data in the interrupt handler, that means this 
CPU was interrupted before the "deletion" was scheduled. If it is 
the new copy, you don't care.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10  7:58 Dipankar Sarma
  0 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10  7:58 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, Paul McKenney

In article <Pine.LNX.4.33.0110092323450.1360-100000@penguin.transmeta.com> Linus Torvalds wrote:
> On Wed, 10 Oct 2001, Paul Mackerras wrote:
>> The difficulty is in making sure that no reader is still inspecting
>> the list element you just removed before you free it, or modify any
>> field that the reader would be looking at (particularly the `next'
>> field :).

> ..which implies _some_ sort of locking, even if it may be deferred.

> The locking can be per-CPU, whatever - have a per-CPU count for "this many
> traversals in progress", and have every lookup increment it before
> starting, and decrement it after stopping.

> Then the deleter can actually free the thing once he has seen every CPU go
> down to zero, with the proper memory barriers.

> Yes, I see that it can work. But it sounds like a _lot_ of complexity for
> not very much gain.

It can get really ugly since the deleter has to keep
checking periodically for zero-traversal-count. During that peiod more
deletions can happen and the resulting in the need to maintain
deleters as well.

An alternative approach is to divide the timeline
of the kernel into cycles - periods where every CPU does atleast
one context switch thereby losing reference to any pointer to
kernel data. You can now batch all the deletions prior to the
start of a period and finish them after the end of that period. Any
new deletions that happens during  such a period get batched to the
next such period. Any new traversal will see not see the older
copy of the data since the global pointer(s) was updated.

> Right now, you already have to have eight CPU's to see locking as a large
> problem in normal life. People have normal patches to bring that easily up
> to 16. Then how much hard-to-debug-with-subtle-memory-ordering-issues code
> do you want to handle the few machines that aren't in that range?

Yes, various degrees of weakness of memory ordering will make writing code
harder, but it can be dealt with be defining primitives that
behaves uniformly across different architectures. As for large machines
are concerned, I hope the answer is "yes as long as it doesn't adversely
affect the lower end" :-)

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

[parent not found: <20011010182730.0077454b.rusty@rustcorp.com.au>]

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
       [not found] <20011010182730.0077454b.rusty@rustcorp.com.au>
@ 2001-10-10  9:36 ` Linus Torvalds
  2001-10-11  6:50   ` Rusty Russell
  0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10  9:36 UTC (permalink / raw)
  To: Rusty Russell; +Cc: paulus, linux-kernel

On Wed, 10 Oct 2001, Rusty Russell wrote:
>
> If noone *holds* a reference, you can remove it "sometime later",
> where "sometime later" is (for example) after every CPU has scheduled.

Ehh.. One of those readers can hold on to the thing while waiting for
something else to happen.

Looking up a data structure and copying it to user space or similar is
_the_ most common operation for any lookup. You MUST NOT free it just
because we scheduled away. Scheduling points have zero meaning in real
life.

So you'll need a reference count to actually keep such a data structure
alive _over_ a schedule. Or all the readers need to copy the data too
before they actually start using it. At which point you've made your code
a _lot_ slower than the locking version.

Yeah, I can see that your data structure can be made to work by limiting
how you can use it ("you must never hold on to a entry over a schedule,
reference-counting is a no-no, and you have to stand on your left foot
when you look at it sideways").

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10  9:36 ` Linus Torvalds
@ 2001-10-11  6:50   ` Rusty Russell
  0 siblings, 0 replies; 67+ messages in thread
From: Rusty Russell @ 2001-10-11  6:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: paulus, linux-kernel

On Wed, 10 Oct 2001 02:36:13 -0700 (PDT)
Linus Torvalds <torvalds@transmeta.com> wrote:

> 
> On Wed, 10 Oct 2001, Rusty Russell wrote:
> >
> > If noone *holds* a reference, you can remove it "sometime later",
> > where "sometime later" is (for example) after every CPU has scheduled.
> 
> Ehh.. One of those readers can hold on to the thing while waiting for
> something else to happen.
> 
> Looking up a data structure and copying it to user space or similar is
> _the_ most common operation for any lookup.

Woah!  So now we're abandoning spinlocks for semaphores, given your
enlightened reasoning?  What are you going to call your new OS?

If you needed reference counts before, you need them still.  But you can
frequently get rid of the spinlock/rwlock protecting the list as a whole.

No, it's not possible everywhere.  But naive profiling doesn't show the
damage from grabbing a read lock, since the penalty is paid by other CPUs
(running unrelated code) as well as the one dirtying the cache, so the
penalty gets smeared across the profile.

As Dave says: the cost is not spinning for the lock, it's the cacheline,
and that's the same with a rwlock as a spinlock, and it's going to get
*worse* with new architectures.

Rusty.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10 10:06 Dipankar Sarma
  2001-10-10 10:18 ` Linus Torvalds
  0 siblings, 1 reply; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10 10:06 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, Paul McKenney

In article <Pine.LNX.4.33.0110100230170.1518-100000@penguin.transmeta.com> Linus Torvalds wrote:

> On Wed, 10 Oct 2001, Rusty Russell wrote:
>>
>> If noone *holds* a reference, you can remove it "sometime later",
>> where "sometime later" is (for example) after every CPU has scheduled.

> Ehh.. One of those readers can hold on to the thing while waiting for
> something else to happen.

> Looking up a data structure and copying it to user space or similar is
> _the_ most common operation for any lookup. You MUST NOT free it just
> because we scheduled away. Scheduling points have zero meaning in real
> life.

How does locking inside the kernel help you here ? You could face 
the same situation even if you protect the data by locking. 
Perhaps, I am missing something here. So, a specific example would help.

> So you'll need a reference count to actually keep such a data structure
> alive _over_ a schedule. Or all the readers need to copy the data too
> before they actually start using it. At which point you've made your code
> a _lot_ slower than the locking version.

Only relevant issue I can see here is preemption - if you hold
a reference and get preempted out it is still a context switch
and the logic of "when the data can be really deleted" must take
into account such context switches. Alternatively, you could
disable preemption while traversing such lists.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 10:06 Dipankar Sarma
@ 2001-10-10 10:18 ` Linus Torvalds
  2001-10-10 11:43   ` Dipankar Sarma
  2001-10-12  3:27   ` Rusty Russell
  0 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10 10:18 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: linux-kernel, Paul McKenney

On Wed, 10 Oct 2001, Dipankar Sarma wrote:
>
> How does locking inside the kernel help you here ? You could face
> the same situation even if you protect the data by locking.
> Perhaps, I am missing something here. So, a specific example would help.

Oh, absolutely.

In the kernel, the regular usage is that we

	lock
	lookup
	increment usage count
	unlock

and then the freeing is done with either

	if (atomic_dec_and_lock()) {
		unhash
		unlock;
		free
	}

or

	lock
	unhash
	unlock
	if (atomic_dec()) {
		free
	}

that's the basic pattern for most data structures.

Yes, there are data structures that don't have refcounts. They aren't on
any global lists either, or they are buggy.

Now, with the lockless approach, the issue is that you _cannot_ just free
the thing on next schedule. You have to honor the reference count, which
means that either you have different "bins" for all different kinds of
users of RCU data structures (different "free" functions), or they all
have the reference count in a generic place for RCU.

The latter is probably pretty doable.

> Only relevant issue I can see here is preemption - if you hold
> a reference and get preempted out it is still a context switch
> and the logic of "when the data can be really deleted" must take
> into account such context switches. Alternatively, you could
> disable preemption while traversing such lists.

You have to disable pre-emption some way. The kernel does that by default
by not pre-empting kernel space, but it's easy enough to do other ways.
And you can (have to, in fact) use cpu-local data structures for it.

Locking does the same thing.

For RCU to work, it has to do all the things that locking guarantees. It
can try to do more of them locally, but that reference count increment
_is_ going to be a global atomic update, so it's not as if you can avoid
them all.

However, it should be noted that even if you add refcounts to RCU, which
makes them useful outside of atomic readers, you'll still find that most
of the really core kernel data structures need locking anyway: locking not
only protects the list traversal, it also is used to guarantee uniqueness
which is important for a lot of the data structures - inodes, buffers,
pages, etc all absolutely depend on the fact that the data structure is
unique (ie there had better _not_ be two pages in the page cache with the
same indexes, even if one process removes an old version at the same time
as another process tries to look it up and create it if necessary).

So I'm still at a loss for what this could actually be _used_. IP routing?
Certainly not sockets (which have uniqueness requirements).

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 10:18 ` Linus Torvalds
@ 2001-10-10 11:43   ` Dipankar Sarma
  2001-10-12  3:27   ` Rusty Russell
  1 sibling, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-10 11:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Paul McKenney

On Wed, Oct 10, 2001 at 03:18:23AM -0700, Linus Torvalds wrote:
> 
> On Wed, 10 Oct 2001, Dipankar Sarma wrote:
> >
> > How does locking inside the kernel help you here ? You could face
> > the same situation even if you protect the data by locking.
> > Perhaps, I am missing something here. So, a specific example would help.
> 
> Oh, absolutely.
> 
> In the kernel, the regular usage is that we
> 
> 	lock
> 	lookup
> 	increment usage count
> 	unlock
> 
> and then the freeing is done with either
> 
> 	if (atomic_dec_and_lock()) {
> 		unhash
> 		unlock;
> 		free
> 	}
> 
> that's the basic pattern for most data structures.
> 
> Yes, there are data structures that don't have refcounts. They aren't on
> any global lists either, or they are buggy.

Ok, I get it now. You are talking about the situation where we
may not be able to update *all* the global pointers to the
data before they are "deferred deleted". I think once such example is the
ipv4 route cache dst entry we have to honor the refcounts.

> 
> Now, with the lockless approach, the issue is that you _cannot_ just free
> the thing on next schedule. You have to honor the reference count, which
> means that either you have different "bins" for all different kinds of
> users of RCU data structures (different "free" functions), or they all
> have the reference count in a generic place for RCU.
> 
> The latter is probably pretty doable.

Agreed. One approach I took for my experimental ipv4 route
cache code was to do "deferred delete" *after* refcount becomes
zero. I am not sure whether this is possible in all scenerios, but
the my understanding is that if you remove the global pointers
(like hash table links) to data to prevent new references and
do a "deferred delete" after refcount becomes zero, it would work.
Of course, I am in no position to make a generalized comment on this.


> > Only relevant issue I can see here is preemption - if you hold
> > a reference and get preempted out it is still a context switch
> > and the logic of "when the data can be really deleted" must take
> > into account such context switches. Alternatively, you could
> > disable preemption while traversing such lists.
> 
> You have to disable pre-emption some way. The kernel does that by default
> by not pre-empting kernel space, but it's easy enough to do other ways.
> And you can (have to, in fact) use cpu-local data structures for it.

That is what the preemption patch allows (atleast when I last looked
at it).

> However, it should be noted that even if you add refcounts to RCU, which
> makes them useful outside of atomic readers, you'll still find that most
> of the really core kernel data structures need locking anyway: locking not
> only protects the list traversal, it also is used to guarantee uniqueness
> which is important for a lot of the data structures - inodes, buffers,
> pages, etc all absolutely depend on the fact that the data structure is
> unique (ie there had better _not_ be two pages in the page cache with the
> same indexes, even if one process removes an old version at the same time
> as another process tries to look it up and create it if necessary).

Uniqueness can be ensured even with RCU, albeit it is not very
straitforward. One common technique we used was to mark elements
deleted *before* updating global pointers. However this usually
requires you to lock the element (not the list itself) while
reading its contents.

> 
> So I'm still at a loss for what this could actually be _used_. IP routing?
> Certainly not sockets (which have uniqueness requirements).

Yes. We used it in IP routing in DYNIX/ptx back then at Sequent as well
as for Multipath I/O routes for storage. That is all I can remember,
but Paul will remember and know more about it. Paul ?

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 10:18 ` Linus Torvalds
  2001-10-10 11:43   ` Dipankar Sarma
@ 2001-10-12  3:27   ` Rusty Russell
  2001-10-12 16:56     ` Linus Torvalds
  1 sibling, 1 reply; 67+ messages in thread
From: Rusty Russell @ 2001-10-12  3:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dipankar, linux-kernel, paul.mckenney

On Wed, 10 Oct 2001 03:18:23 -0700 (PDT)
Linus Torvalds <torvalds@transmeta.com> wrote:

> 	lock
> 	unhash
> 	unlock
> 	if (atomic_dec()) {
> 		free
> 	}

And the read side is:
	lock
	loopup
	atomic_inc
	unlock

With RCU, read side is:
	loopup
	atomic_inc

Write is:
	lock /* protect against other writes */
	unhash /* be careful with wmb()s here... */
	unlock
	call_rcu(if (atomic_dec()) free) /* ie. does this later */

Clear?
Rusty.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12  3:27   ` Rusty Russell
@ 2001-10-12 16:56     ` Linus Torvalds
  2001-10-12 18:53       ` Dipankar Sarma
  2001-10-13  7:25       ` Rusty Russell
  0 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-12 16:56 UTC (permalink / raw)
  To: Rusty Russell; +Cc: dipankar, linux-kernel, paul.mckenney

On Fri, 12 Oct 2001, Rusty Russell wrote:
>
> And the read side is:
> 	lock
> 	loopup
> 	atomic_inc
> 	unlock
>
> With RCU, read side is:
> 	loopup
> 	atomic_inc

Yes. With maybe

	non_preempt()
	..
	preempt()

around it for the pre-emption patches.

However, you also need to make your free _free_ be aware of the count.
Which means that the current RCU patch is really unusable for this. You
need to have the "count" always in a generic place (put it with the hash),
and your schedule-time free needs to do

	if (atomic_read(&count))
		skip_this_do_it_next_time

which starts getting complicated (it means your RCU free now has to have a
notion of "next time" - just leaving the RCU active will slow down
scheduling for as long as any reader holds on to an entry). So your
unread() path probably has to be

	if (atomic_dec_and_test(&count))
		free_it()

and the act of hashing should add a count and unhashing should delete a
count (so that the reader doesn't free it while it is hashed).

Do that, and the RCU patches may start looking usable for the real world.

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12 16:56     ` Linus Torvalds
@ 2001-10-12 18:53       ` Dipankar Sarma
  2001-10-13  7:25       ` Rusty Russell
  1 sibling, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-12 18:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rusty Russell, linux-kernel, paul.mckenney

On Fri, Oct 12, 2001 at 09:56:58AM -0700, Linus Torvalds wrote:
> 
> Yes. With maybe
> 
> 	non_preempt()
> 	..
> 	preempt()
> 
> around it for the pre-emption patches.

Yes. While in the read side, preemption will have to be disabled
to prevent local pointers being carried across context switches.
It isn't impossible to allow preemption, but that makes the
quiescent point detection logic much more complicated.

> 
> However, you also need to make your free _free_ be aware of the count.
> Which means that the current RCU patch is really unusable for this. You
> need to have the "count" always in a generic place (put it with the hash),
> and your schedule-time free needs to do
> 
> 	if (atomic_read(&count))
> 		skip_this_do_it_next_time
> 
> which starts getting complicated (it means your RCU free now has to have a
> notion of "next time" - just leaving the RCU active will slow down
> scheduling for as long as any reader holds on to an entry). So your
> unread() path probably has to be
> 
> 	if (atomic_dec_and_test(&count))
> 		free_it()
> 
> and the act of hashing should add a count and unhashing should delete a
> count (so that the reader doesn't free it while it is hashed).

Perhaps I am missing something here but shouldn't the refcount based 
schemes anyway have to do this with or without RCU ? If you do

	unhash
	if (atomic_dec_and_test(&count))
		free_it();

the isn't it that if refcount is not 0, sometime later it will have to be 
cleaned up by some garbage collection scheme ? Whatever that scheme is, 
it still needs to be made certain that the element is not back in the 
hash table.  It seems to me that with RCU the same logic can be made use of.
So instead of doing 

	if (atomic_dec_and_test(&count))
		rcu_free_it()

you may do 

	if (atomic_dec_and_test(&count))
		free_it();

where in free_it()

	if (atomic_read(&count))
		return;
	rcu_free_it();

Of course, there may be more complicated freeing schemes for which
we may need additional logic just like you suggested.

> 
> Do that, and the RCU patches may start looking usable for the real world.
> 

One RCU example for refcount + hash table is at
http://lse.sourceforge.net/locking/patches/rt_rcu-2.4.6-02.patch
(ipv4 route cache). 

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12 16:56     ` Linus Torvalds
  2001-10-12 18:53       ` Dipankar Sarma
@ 2001-10-13  7:25       ` Rusty Russell
  1 sibling, 0 replies; 67+ messages in thread
From: Rusty Russell @ 2001-10-13  7:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dipankar, linux-kernel, paul.mckenney

In message <Pine.LNX.4.33.0110120948540.31692-100000@penguin.transmeta.com> you
 write:
> Yes. With maybe
> 
> 	non_preempt()
> 	..
> 	preempt()
> 
> around it for the pre-emption patches.

Sure, if they want pre-emption on SMP.  Of course, then they'll want
priority inheritence.

> However, you also need to make your free _free_ be aware of the count.
> Which means that the current RCU patch is really unusable for this. You
> need to have the "count" always in a generic place (put it with the hash),
> and your schedule-time free needs to do
> 
> 	if (atomic_read(&count))
> 		skip_this_do_it_next_time

WTF?  I'll spell it out for you again:

	static inline void foo_put(struct foo *foo)
	{
		if (atomic_dec_and_test(foo->use))
			kfree(foo);
	}

Write side normal:

	lock
	unhash(foo)
	unlock
	foo_put(foo)

Write side RCU:

	lock
	unhash(foo)
	unlock
	rcu_call(foo_put, foo);
		/* ie. call foo_put(foo) "later". */

That's all.  Really.

> Do that, and the RCU patches may start looking usable for the real world.

I know you're under strain, but think harder please.

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10 15:24 Paul McKenney
  2001-10-10 16:58 ` Andrea Arcangeli
  0 siblings, 1 reply; 67+ messages in thread
From: Paul McKenney @ 2001-10-10 15:24 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Rival, Ivan Kokshaysky, Jay.Estabrook, linux-kernel,
	lse-tech, Richard Henderson, cardoza, woodward


> > true for older alphas, especially because they are strictly in-order
> > machines, unlike ev6.
>
> Yes, it sounds strange. However According to Paul this would not be the
> cpu but a cache coherency issue. rmb() would enforce the cache coherency
> etc... so maybe the issue is related to old SMP motherboard etc... not
> even to the cpus ... dunno. But as said it sounded very strange that
> also new chips and new boards have such a weird reodering trouble.

It sounded strange to me, too.  ;-)  And my first reading of the
Alpha AXP Archtecture RM didn't help me much.

It was indeed a cache coherency issue.  The architect I talked to felt
that it was a feature rather than a bug.  I have an email in to him.
In the meantime, Compaq's patent #6,108,737 leads me to believe that
others in DEC/Compaq also believe it to be a feature.  The paragraph
starting at column 2 line 20 of the body of this patent states:

     In a weakly-ordered system, an order is imposed between selected
     sets of memory reference operations, while other operations are
     considered unordered.  One or more MB instructions are used to
     indicate the required order.  In the case of an MB instruction
     defined by the Alpha (R) 21264 processor instruction set, the MB
     denotes that all memory reference instructions above the MB (i.e.,
     pre-MB instructions) are ordered before all reference instructions
     after the MB (i.e., post-MB instructions).  However, no order
     is required between reference instructions that are not separated
     by an MB.

(The patent talks about the WMB instruction later on.)

In other words, if there is no MB, the CPU is not required to maintain
ordering.  Regardless of data dependencies or anything else.

There is also an application note at

     http://www.openvms.compaq.com/wizard/wiz_2637.html

which states:

     For instance, your producer must issue a "memory barrier" instruction
     after writing the data to shared memory and before inserting it on
     the queue; likewise, your consumer must issue a memory barrier
     instruction after removing an item from the queue and before reading
     from its memory.  Otherwise, you risk seeing stale data, since,
     while the Alpha processor does provide coherent memory, it does
     not provide implicit ordering of reads and writes.  (That is, the
     write of the producer's data might reach memory after the write of
     the queue, such that the consumer might read the new item from the
     queue but get the previous values from the item's memory.

Note that they require a memory barrier (rmb()) between the time the
item is removed from the queue and the time that the data in the item
is referenced, despite the fact that there is a data dependency between
the dequeueing and the dereferencing.  So, again, data dependency does
-not- substitute for an MB on Alpha.

Comments from DEC/Compaq people who know Alpha architecture details?

                              Thanx, Paul

> > I suspect some confusion here - probably that architect meant loads
> > to independent addresses. Of course, in this case mb() is required
> > to assure ordering.
> >
> > Ivan.
>
>
> Andrea
>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 15:24 Paul McKenney
@ 2001-10-10 16:58 ` Andrea Arcangeli
  2001-10-10 17:25   ` Linus Torvalds
  0 siblings, 1 reply; 67+ messages in thread
From: Andrea Arcangeli @ 2001-10-10 16:58 UTC (permalink / raw)
  To: Paul McKenney
  Cc: Peter Rival, Ivan Kokshaysky, Jay.Estabrook, linux-kernel,
	lse-tech, Richard Henderson, cardoza, woodward

On Wed, Oct 10, 2001 at 08:24:11AM -0700, Paul McKenney wrote:
> which states:
> 
>      For instance, your producer must issue a "memory barrier" instruction
>      after writing the data to shared memory and before inserting it on
>      the queue; likewise, your consumer must issue a memory barrier
>      instruction after removing an item from the queue and before reading
>      from its memory.  Otherwise, you risk seeing stale data, since,
>      while the Alpha processor does provide coherent memory, it does
>      not provide implicit ordering of reads and writes.  (That is, the
>      write of the producer's data might reach memory after the write of
>      the queue, such that the consumer might read the new item from the
>      queue but get the previous values from the item's memory.
> 
> Note that they require a memory barrier (rmb()) between the time the
> item is removed from the queue and the time that the data in the item
> is referenced, despite the fact that there is a data dependency between
> the dequeueing and the dereferencing.  So, again, data dependency does
> -not- substitute for an MB on Alpha.

This is very explicit indeed.

In short I guess what happens is that the reader may have old data in
its cache, and the rmb() makes sure the cache used after the rmb() is
not older than the cache used before the rmb().

However the more I think about it the more I suspect we'd better use
rmb() in all readers in the common code, rather than defining rmbdd with
IPI on alpha, maybe alpha after all isn't the only one that needs the
rmb() but it's the only one defining the behaviour in detail? I can very
well imagine other archs also not invalidating all caches of all other
cpus after a write followed by wmb, the synchronization can be delayed
safely if the other cpus are readers, and they didn't issued an explicit
rmb. So what alpha is doing can really be seen as a "feature" that
allows to delay synchronization of caches after writes and wmb, unless
an explicit order is requested by the reader via rmb (or better mb on alpha).

The fact is that if there's old data in cache, a data dependency isn't
different than non data dependency if the cpu caches aren't synchronized
or invalidated during wmb run by the producer.

Andrea

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 16:58 ` Andrea Arcangeli
@ 2001-10-10 17:25   ` Linus Torvalds
  2001-10-12  5:06     ` Rusty Russell
  0 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-10-10 17:25 UTC (permalink / raw)
  To: linux-kernel

In article <20011010185848.D726@athlon.random>,
Andrea Arcangeli  <andrea@suse.de> wrote:
>
>However the more I think about it the more I suspect we'd better use
>rmb() in all readers in the common code

Absolutely.  It's not that expensive an operation on sane hardware.  And
it's definitely conceptually the only right thing to do - we're saying
that we're doing a read that depends on a previous read having seen
previous memory.  Ergo, "rmb()". 

Of course, right now Linux only exports a subset of the potential memory
barriers, and maybe we should export a fuller set - allowing CPU's that
have stricter ordering to possibly make it a no-op.  But thinking about
even something like x86, I don't see where Intel would guarantee that
two reads (data-dependent or not) would have some implicit memory
ordering. 

Re-ordering reads with data dependencies is hard, but it happens quite
naturally in a CPU that does address speculation. I don't know of
anybody who does that, but I bet _somebody_ will. Maybe even the P4?

			Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-10 17:25   ` Linus Torvalds
@ 2001-10-12  5:06     ` Rusty Russell
  2001-10-12 16:28       ` Linus Torvalds
  0 siblings, 1 reply; 67+ messages in thread
From: Rusty Russell @ 2001-10-12  5:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Wed, 10 Oct 2001 17:25:22 +0000 (UTC)
torvalds@transmeta.com (Linus Torvalds) wrote:

> Absolutely.  It's not that expensive an operation on sane hardware.  And
> it's definitely conceptually the only right thing to do - we're saying
> that we're doing a read that depends on a previous read having seen
> previous memory.  Ergo, "rmb()". 

Accessing pointer contents after you dereference the pointer is "obvious":
we've been trying to get Richard to understand the problem for FIVE MONTHS,
and he's not stupid!

The PPC manual (thanks Paul M) clearly indicates rmbdd() is not neccessary.
That they mention it explicitly suggests it's going to happen on more
architectures: you are correct, we should sprinkle rmbdd() everywhere
(rmb() is heavy on current PPC) and I'll update the Kernel Locking Guide now
the rules have changed.[1]

Rusty.
[1] Aren't we lucky our documentation is so sparse noone can accuse us of being
    inconsistent? 8)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12  5:06     ` Rusty Russell
@ 2001-10-12 16:28       ` Linus Torvalds
  2001-10-12 19:50         ` Al Dunsmuir
                           ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-12 16:28 UTC (permalink / raw)
  To: Rusty Russell; +Cc: linux-kernel

On Fri, 12 Oct 2001, Rusty Russell wrote:
>
> The PPC manual (thanks Paul M) clearly indicates rmbdd() is not neccessary.
> That they mention it explicitly suggests it's going to happen on more
> architectures: you are correct, we should sprinkle rmbdd() everywhere
> (rmb() is heavy on current PPC) and I'll update the Kernel Locking Guide now
> the rules have changed.[1]

I would suggest re-naming "rmbdd()". I _assume_ that "dd" stands for "data
dependent", but quite frankly, "rmbdd" looks like the standard IBM "we
lost every vowel ever invented" kind of assembly lanaguage to me.

I'm sure that having programmed PPC assembly language, you find it very
natural (IBM motto: "We found five vowels hiding in a corner, and we used
them _all_ for the 'eieio' instruction so that we wouldn't have to use
them anywhere else").

But for us normal people, contractions that have more than 3 letters are
hard to remember. I wouldn't mind making the other memory barriers more
descriptive either, but with something like "mb()" at least you don't have
to look five times to make sure you got all the letters in the right
order..

(IBM motto: "If you can't read our assembly language, you must be
borderline dyslexic, and we don't want you to mess with it anyway").

So how about just going all the way and calling it what it is:
"data_dependent_read_barrier()", with a notice in the PPC docs about how
the PPC cannot speculate reads anyway, so it's a no-op.

(IBM motto: "TEN vowels? Don't you know vowels are scrd?")

And hey, if you want to, feel free to create the regular

	#define read_barrier()		rmb()
	#define write_barrier()		wmb()
	#define memory_barrier()	mb()

too. It's not as if we should ever have that many of them, and when we
_do_ have them, they might as well stand out to make it clear that we're
doing subtle things.. Although that "data-dependent read barrier" is a lot
more subtle than most.

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12 16:28       ` Linus Torvalds
@ 2001-10-12 19:50         ` Al Dunsmuir
  2001-10-13  1:07         ` Paul Mackerras
  2001-10-13  7:38         ` Rusty Russell
  2 siblings, 0 replies; 67+ messages in thread
From: Al Dunsmuir @ 2001-10-12 19:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel@vger.kernel.org

On Fri, 12 Oct 2001 09:28:07 -0700 (PDT), Linus Torvalds wrote:
>I'm sure that having programmed PPC assembly language, you find it very
>natural (IBM motto: "We found five vowels hiding in a corner, and we used
>them _all_ for the 'eieio' instruction so that we wouldn't have to use
>them anywhere else").

More likely a bad pun on the children's song "Old McDonald had a farm".

I have no insider knowledge, but I've worked for IBM (on S/390) for 22 years,
_am_ dyslexic, and enjoy bad puns <GRIN>

Al Dunsmuir


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12 16:28       ` Linus Torvalds
  2001-10-12 19:50         ` Al Dunsmuir
@ 2001-10-13  1:07         ` Paul Mackerras
  2001-10-13  1:54           ` Davide Libenzi
  2001-10-13  2:00           ` Linus Torvalds
  2001-10-13  7:38         ` Rusty Russell
  2 siblings, 2 replies; 67+ messages in thread
From: Paul Mackerras @ 2001-10-13  1:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds writes:

> So how about just going all the way and calling it what it is:
> "data_dependent_read_barrier()", with a notice in the PPC docs about how
> the PPC cannot speculate reads anyway, so it's a no-op.

To set the record straight, the PPC architecture spec says that
implementations *can* speculate reads, but they have to make it look
as if dependent loads have a read barrier between them.

And it isn't speculated reads that are the problem in the alpha case,
it's the fact that the cache can reorder invalidations that are
received from the bus.  That's why you can read the new value of p but
the old value of *p on one processor after another processor has just
done something like a = 1; wmb(); p = &a.

My impression from what Paul McKenney was saying was that on most
modern architectures other than alpha, dependent loads act as if they
have a read barrier between them.  What we need to know is which
architectures specify that behaviour in their architecture spec, as
against those which do that today but which might not do it tomorrow.

I just looked at the SPARC V9 specification; it has a formal
definition of the memory model which precludes reordering dependent
loads (once again this is an "as if" rule).  So on ppc and sparc64 we
have an assurance that we won't need an rmb() between dependent loads
in the future.

As for intel x86, is there a architecture spec that talks about things
like memory ordering?  My impression is that the x86 architecture is
pretty much defined by its implementations but I could well be wrong.

> too. It's not as if we should ever have that many of them, and when we
> _do_ have them, they might as well stand out to make it clear that we're
> doing subtle things.. Although that "data-dependent read barrier" is a lot
> more subtle than most.

Indeed...  getting the new p but the old *p is quite
counter-intuitive IMHO.

Paul.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  1:07         ` Paul Mackerras
@ 2001-10-13  1:54           ` Davide Libenzi
  2001-10-13  2:04             ` Linus Torvalds
  2001-10-13  2:00           ` Linus Torvalds
  1 sibling, 1 reply; 67+ messages in thread
From: Davide Libenzi @ 2001-10-13  1:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Linus Torvalds, linux-kernel

On Sat, 13 Oct 2001, Paul Mackerras wrote:

> Linus Torvalds writes:
>
> > So how about just going all the way and calling it what it is:
> > "data_dependent_read_barrier()", with a notice in the PPC docs about how
> > the PPC cannot speculate reads anyway, so it's a no-op.
>
> To set the record straight, the PPC architecture spec says that
> implementations *can* speculate reads, but they have to make it look
> as if dependent loads have a read barrier between them.
>
> And it isn't speculated reads that are the problem in the alpha case,
> it's the fact that the cache can reorder invalidations that are
> received from the bus.  That's why you can read the new value of p but
> the old value of *p on one processor after another processor has just
> done something like a = 1; wmb(); p = &a.

I think that necessary condition to have a reordering of bus sent
invalidations is to have a partitioned cache architecture.
I don't see any valid reason for a cache controller of a linear cache
architecture to reorder an invalidation stream coming from a single cpu.
If the invalidation sequence for cpu1 ( in a linear cache architecture )
is 1a, 1b, 1c, ... this case can happen :

1a, 2a, 1b, 2b, 2c, 1c, ...
|       |           |
.........................

but not this :

1a, 2a, 2b, 1c, 2c, 1b, ...
|           |       |
...............><........

> My impression from what Paul McKenney was saying was that on most
> modern architectures other than alpha, dependent loads act as if they
> have a read barrier between them.  What we need to know is which
> architectures specify that behaviour in their architecture spec, as
> against those which do that today but which might not do it tomorrow.

Uhmm, an architecture that with  a = *p;  schedule the load of  *p  before
the load of  p  sounds screwy to me.
The problem is that even if cpu1 schedule the load of  p  before the
load of  *p  and cpu2 does  a = 1; wmb(); p = &a; , it could happen that
even if from cpu2 the invalidation stream exit in order, cpu1 could see
the value of  p  before the value of  *p  due a reordering done by the
cache controller delivering the stream to cpu1.

- Davide

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  1:54           ` Davide Libenzi
@ 2001-10-13  2:04             ` Linus Torvalds
  2001-10-13  2:31               ` Davide Libenzi
  2001-10-13  2:49               ` Paul Mackerras
  0 siblings, 2 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-13  2:04 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Paul Mackerras, linux-kernel


On Fri, 12 Oct 2001, Davide Libenzi wrote:
>
> The problem is that even if cpu1 schedule the load of  p  before the
> load of  *p  and cpu2 does  a = 1; wmb(); p = &a; , it could happen that
> even if from cpu2 the invalidation stream exit in order, cpu1 could see
> the value of  p  before the value of  *p  due a reordering done by the
> cache controller delivering the stream to cpu1.

Umm - if that happens, your cache controller isn't honouring the wmb(),
and you have problems quite regardless of any load ordering on _any_ CPU.

Ehh?

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  2:04             ` Linus Torvalds
@ 2001-10-13  2:31               ` Davide Libenzi
  2001-10-13  2:46                 ` Davide Libenzi
  2001-10-13  3:30                 ` Linus Torvalds
  2001-10-13  2:49               ` Paul Mackerras
  1 sibling, 2 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-10-13  2:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Davide Libenzi, Paul Mackerras, linux-kernel

On Fri, 12 Oct 2001, Linus Torvalds wrote:

>
> On Fri, 12 Oct 2001, Davide Libenzi wrote:
> >
> > The problem is that even if cpu1 schedule the load of  p  before the
> > load of  *p  and cpu2 does  a = 1; wmb(); p = &a; , it could happen that
> > even if from cpu2 the invalidation stream exit in order, cpu1 could see
> > the value of  p  before the value of  *p  due a reordering done by the
> > cache controller delivering the stream to cpu1.
>
> Umm - if that happens, your cache controller isn't honouring the wmb(),
> and you have problems quite regardless of any load ordering on _any_ CPU.
>
> Ehh?

I'm searching the hp-parisc doc about it but it seems that even Paul
McKenney pointed out this.
Suppose that  p  and  *p  are on two different cache partitions and the
invalidation order that comes from the wmb() cpu is 1) *p  2) p
Suppose that the partition when  *p  lies is damn busy and the one where
p  lies is free.
The reader cpu could pickup the value of  p  before the value of  *p  by
reading the old value of  a
The barrier on the reader side should establish a checkpoint that enforce
the commit of  *p  before  p

- Davide

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  2:31               ` Davide Libenzi
@ 2001-10-13  2:46                 ` Davide Libenzi
  2001-10-13  3:30                 ` Linus Torvalds
  1 sibling, 0 replies; 67+ messages in thread
From: Davide Libenzi @ 2001-10-13  2:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Mackerras, lkml

On Fri, 12 Oct 2001, Davide Libenzi wrote:

> On Fri, 12 Oct 2001, Linus Torvalds wrote:
>
> >
> > On Fri, 12 Oct 2001, Davide Libenzi wrote:
> > >
> > > The problem is that even if cpu1 schedule the load of  p  before the
> > > load of  *p  and cpu2 does  a = 1; wmb(); p = &a; , it could happen that
> > > even if from cpu2 the invalidation stream exit in order, cpu1 could see
> > > the value of  p  before the value of  *p  due a reordering done by the
> > > cache controller delivering the stream to cpu1.
> >
> > Umm - if that happens, your cache controller isn't honouring the wmb(),
> > and you have problems quite regardless of any load ordering on _any_ CPU.
> >
> > Ehh?
>
> I'm searching the hp-parisc doc about it but it seems that even Paul
> McKenney pointed out this.

ops, alpha



- Davide



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  2:31               ` Davide Libenzi
  2001-10-13  2:46                 ` Davide Libenzi
@ 2001-10-13  3:30                 ` Linus Torvalds
  1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-13  3:30 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Paul Mackerras, linux-kernel


On Fri, 12 Oct 2001, Davide Libenzi wrote:
>
> Suppose that  p  and  *p  are on two different cache partitions and the
> invalidation order that comes from the wmb() cpu is 1) *p  2) p
> Suppose that the partition when  *p  lies is damn busy and the one where
> p  lies is free.
> The reader cpu could pickup the value of  p  before the value of  *p  by
> reading the old value of  a

Ahh.. I misunderstood. You are arguing for the rmb() even if the CPU
doesn't speculate addresses for loads. Yes, I agree.

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  2:04             ` Linus Torvalds
  2001-10-13  2:31               ` Davide Libenzi
@ 2001-10-13  2:49               ` Paul Mackerras
  1 sibling, 0 replies; 67+ messages in thread
From: Paul Mackerras @ 2001-10-13  2:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds writes:

> On Fri, 12 Oct 2001, Davide Libenzi wrote:
> >
> > The problem is that even if cpu1 schedule the load of  p  before the
> > load of  *p  and cpu2 does  a = 1; wmb(); p = &a; , it could happen that
> > even if from cpu2 the invalidation stream exit in order, cpu1 could see
> > the value of  p  before the value of  *p  due a reordering done by the
> > cache controller delivering the stream to cpu1.
> 
> Umm - if that happens, your cache controller isn't honouring the wmb(),
> and you have problems quite regardless of any load ordering on _any_ CPU.

Well yes.  But this is what happens on alpha apparently.

On alpha, it seems that the wmb only has an effect on the cache of the
processor doing the writes, it doesn't affect any other caches.  The
wmb ensures that the invalidates from the two writes go out onto the
bus in the right order, but the wmb itself doesn't go on the bus.
Thus the invalidates can get processed in the reverse order by a
receiving cache.  I presume that an rmb() on alpha must wait for all
outstanding invalidates to be processed by the cache.  But I'm not an
expert on the alpha by any means.

Paul.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13  1:07         ` Paul Mackerras
  2001-10-13  1:54           ` Davide Libenzi
@ 2001-10-13  2:00           ` Linus Torvalds
  1 sibling, 0 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-13  2:00 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux-kernel

On Sat, 13 Oct 2001, Paul Mackerras wrote:
>
> As for intel x86, is there a architecture spec that talks about things
> like memory ordering?  My impression is that the x86 architecture is
> pretty much defined by its implementations but I could well be wrong.

It is discussed in the multi-procesor management section, under "memory
ordering", and it does say that "reads can be carried out specilatively
and in any order".

HOWEVER, it does have what Intel calles "processor order", and it claims
that "writes by a single processor are observed in the same order by all
processors."

Which implies that if the _same_ CPU wrote *p and p, there is apparently
an ordering constraint there. But I don't think that's the case here. So
the x86 apparently needs a rmb().

(But Intel has redefined the memory ordering so many times that they might
redefine it in the future too and say that dependent loads are ok. I
suspect most of the definitions are of the type "Oh, it used to be ok in
the implementation even though it wasn't defined, and it turns out that
Windows doesn't work if we change it, so we'll define darkness to be the
new standard"..)

> Indeed...  getting the new p but the old *p is quite
> counter-intuitive IMHO.

I disagree. I think the alpha has it right - it says that if you care
about ordering, you have to tell so. No ifs, no buts, no special cases
("Oh, dependent loads are special").

Of course, I used to absolutely _hate_ the x86 ordering rules just because
they are so ad-hoc. But I'm getting more and more used to them, and the
Intel "write ordered with store-buffer forwarding" model ends up being
really nice for locking ;)

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-12 16:28       ` Linus Torvalds
  2001-10-12 19:50         ` Al Dunsmuir
  2001-10-13  1:07         ` Paul Mackerras
@ 2001-10-13  7:38         ` Rusty Russell
  2 siblings, 0 replies; 67+ messages in thread
From: Rusty Russell @ 2001-10-13  7:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dipankar, linux-kernel, paul.mckenney

In message <Pine.LNX.4.33.0110120919130.31677-100000@penguin.transmeta.com> you
 write:
> And hey, if you want to, feel free to create the regular
> 
> 	#define read_barrier()		rmb()
> 	#define write_barrier()		wmb()
> 	#define memory_barrier()	mb()

I agree... read_barrier_depends() then?

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10 16:00 Paul McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul McKenney @ 2001-10-10 16:00 UTC (permalink / raw)
  To: dipankar; +Cc: linux-kernel, Linus Torvalds


> > So I'm still at a loss for what this could actually be _used_. IP
routing?
> > Certainly not sockets (which have uniqueness requirements).
>
> Yes. We used it in IP routing in DYNIX/ptx back then at Sequent as well
> as for Multipath I/O routes for storage. That is all I can remember,
> but Paul will remember and know more about it. Paul ?

Hello!

RCU was used in the following portions of PTX:

o    Distributed lock manager: recovery, lists of callbacks used to
     report completions and error conditions to user processes, and
     lists of server and client lock data structures.

o    TCP/IP: routing tables, interface tables, and protocol-control-block
     lists.  As Linus suspected, RCU was -not- applied to the socket
     data structures.  ;-)

o    Storage-area network (SAN): routing tables and error-injection
     tables (used for stress testing).

o    Clustered journalling file systems: in-core inode lists and
     distributed-locking data structures.

o    Lock-contention measurement: data structure used to map from
     spinlock addresses to the corresponding measurement data.  (This
     was needed since PTX locks are one byte long, and you can't put
     much extra data into one byte.)

o    Application regions manager (which is a workload-management
     subsystem): maintains a list of "regions" into which processes
     may be confined.

o    Process management: per-process system-call tables and MP trace
     data structures used for debuggers that handle multi-threaded
     processes.

o    LAN drivers to resolve races between shutting down a LAN device
     and packets being received by that device.  (This use is in many
     ways similar to that of Rusty's, Anton's, and Greg's hotplug
     patch).

PTX used RCU on an as-needed basis.  K42 made more
pervasive use of a very similar technique.

RCU is definitely -not- a wholesale replacement for all locking.
For example, as Dipankar noted, it can be integrated with reference-count
schemes.  It -can- be used to replace reference counts, but only in
cases where no task blocks while holding a reference.

RCU works best in read-mostly data structures.  The most common use
is to allow lock-free dereferencing of pointers, for example, removing
the lock on a -list-, keeping locking/reference counts only on the
elements themselves.

In addition, when moving from one element in a list to the next, RCU
allows you to drop the refcnt/lock of the preceding element -before-
traversing the pointer to the next element.  This can make the
traversal code a bit simpler.

                              Thanx, Paul


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-10 21:44 Paul McKenney
  0 siblings, 0 replies; 67+ messages in thread
From: Paul McKenney @ 2001-10-10 21:44 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: cardoza, Peter Rival, Ivan Kokshaysky, Jay.Estabrook,
	linux-kernel, lse-tech, Richard Henderson, woodward

> This is very explicit indeed.
>
> In short I guess what happens is that the reader may have old data in
> its cache, and the rmb() makes sure the cache used after the rmb() is
> not older than the cache used before the rmb().

Yep!  This is what the Alpha architects eventually beat into my head
when I first encountered this issue.  ;-)

> However the more I think about it the more I suspect we'd better use
> rmb() in all readers in the common code, rather than defining rmbdd with
> IPI on alpha, maybe alpha after all isn't the only one that needs the
> rmb() but it's the only one defining the behaviour in detail? I can very
> well imagine other archs also not invalidating all caches of all other
> cpus after a write followed by wmb, the synchronization can be delayed
> safely if the other cpus are readers, and they didn't issued an explicit
> rmb. So what alpha is doing can really be seen as a "feature" that
> allows to delay synchronization of caches after writes and wmb, unless
> an explicit order is requested by the reader via rmb (or better mb on
alpha).

As I understand it, this "feature" allows the CPU to avoid stalling as
often, thereby increasing performance.  Which is presumably somewhat offset
by extra rmb()s on Alpha.

> The fact is that if there's old data in cache, a data dependency isn't
> different than non data dependency if the cpu caches aren't synchronized
> or invalidated during wmb run by the producer.

Right, and omitting this synchronization between CPU and cache presumably
decreases stalls and increases performance.

Here is my scorecard thus far on CPUs that run SMP Linux:

     SMP CPU        Data dependencies create implicit rmb()?

     Alpha          No
     i386      Yes
     IA64      Yes
     MIPS      Yes *
     MIPS64         Yes *
     PA-RISC        Yes *
     PPC       Yes
     S390      Yes
     S390X          Yes
     SPARC          Yes

     *MIPS & MIPS64: Still don't have my MIPS manual, basing this on
          earlier experience with MIPS.  Any comments from the SGI
          guys?

     *PA-RISC: Based on my reading of the PA RISC architecture manual,
          particularly the SYNC instruction and definition of the
          memory model.  However, this manual is not completely
          unambiguous.  Comments from someone in the know at HP?

     Non-SMP CPUs include: arm, cris, m68k, sh

So unless I missed something, Alpha stands alone here, as the only CPU
that does not provide some way for data dependencies to act as implicit
rmb()s.

Don't get me wrong -- I am quite impressed with the way Alpha avoids
stalls in many cases, while getting a level of synchronization that is
useful in many cases.  I just wish that they had also provided more heavy
weight memory barriers so that we aren't forced to choose between IPIs for
Alpha and lots of rmb()s sprinkled through the code, uglifying it and
slowing down all the other SMP CPUs.

Given the discussion on this topic over the past few days, and on earlier
discussions, I would argue that the Alpha semantics for wmb() and rmb()
are strongly counter-intuitive.  People seem to expect a wmb()-like
primitive that allows read-side data dependencies to act as implicit
rmb() instructions.  All the CPUs except for Alpha provide an instruction
that can act as such a wmb()-like instruction, and Alpha can simulate
such an instruction with IPIs.

And this line of reasoning lead me to send out the wmbdd() patch.  ;-)

Thoughts?

                              Thanx, Paul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-11 10:34 Dipankar Sarma
  0 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-11 10:34 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel

In article <20011010.164628.39155290.davem@redhat.com> David S. Miller wrote:
>    From: Victor Yodaiken <yodaiken@fsmlabs.com>
>    Date: Wed, 10 Oct 2001 16:24:19 -0600
>    
>    In general you're right, and always its better to 
>    reduce contention than to come up with silly algorithms for 
>    reducing the cost of contention,

> I want to second this and remind people that the "cost" of spinlocks
> is mostly not "spinning idly waiting for lock", rather the big cost
> is shuffling the dirty cacheline ownership between the processors.

Absolutely. Even reader-writer locks with read-mostly situations
can result in painful degradation because of the dirty cacheline
bouncing around.

> Any scheme involving shared data which is written (the read counts
> in the various "lockless" schemes are examples) have the same "cost"
> assosciated with them.
> In short, I see no performance gain from the lockless algorithms
> even in the places where they can be applied.

I think it depends on the lockless algorithm. If it requires you
to write to cachelines with same level of sharing as earlier
locking algorithm, it is no good. On the other hand, if you
use schemes that minimizes or ideally has no writes to shared 
data for maintaining readers, it should result in performance gains.

> I spent some time oogling over lockless algorithms a few years ago,
> but I stopped once I realized where the true costs were.  In my view,
> the lockless algorithms perhaps are a win in parallel processing
> environments (in fact, the supercomputing field is where a lot of the
> lockless algorithm research comes from) but not in the kinds of places
> and with the kinds of data structure usage the Linux kernel has.

I would agree to the extent that lockless algorithms cannot be used as 
a wholesale replacement for spin-waiting locks. However in key areas where
performance is critical, lockless techniques can be applied provided
they don't come with the same problems as locking.

I would point to Read-Copy Update as a lockless mutual exclusion
where you don't have to maintain global reader counts and other
shared cachelines. We have been experimenting with a few
places in the linux kernel where lookups can be speeded up
by not having any writes to shared cachelines.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
@ 2001-10-13 14:42 Paul McKenney
  2001-10-13 17:23 ` Linus Torvalds
  0 siblings, 1 reply; 67+ messages in thread
From: Paul McKenney @ 2001-10-13 14:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dipankar, linux-kernel, Rusty Russell

> >
> > And the read side is:
> >        lock
> >        loopup
> >        atomic_inc
> >        unlock
> >
> > With RCU, read side is:
> >        loopup
> >        atomic_inc
>
> Yes. With maybe
>
>          non_preempt()
>          ..
>          preempt()
>
> around it for the pre-emption patches.
>
> However, you also need to make your free _free_ be aware of the count.
> Which means that the current RCU patch is really unusable for this. You
> need to have the "count" always in a generic place (put it with the
hash),

???

Ah!  Are you thinking of the trick that associates a reference counter
with each pointer, and where one uses a double-compare-and-swap instruction
to do updates?  That is definitely -not- what we are proposing here, since
it is important to avoid writes during read-only traversals.

Instead, we use side information to deduce when it is no longer possible
for there to be any references to a given data structure.

It -is- possible to use RCU in Linux -without- reference counts.  See
the Maneesh Soni's FD-management patch and description at:

http://lse.sourceforge.net/locking/patches/files_struct_rcu-2.4.10-04.patch
http://lse.sourceforge.net/locking/files_struct_rcu.txt

The reference counts are needed -only- in cases where references to an
RCU-protected data structure may be held across a sleep.  Dipankar Sarma's
IPV4 route-cache lookup patch at:

http://lse.sourceforge.net/locking/patches/rt_rcu-2.4.6-02.patch

is an example use of RCU where reference counts are used, since entries
can be queued.

                              Thanx, Paul

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 14:42 Paul McKenney
@ 2001-10-13 17:23 ` Linus Torvalds
  2001-10-13 17:28   ` Linus Torvalds
                     ` (2 more replies)
  0 siblings, 3 replies; 67+ messages in thread
From: Linus Torvalds @ 2001-10-13 17:23 UTC (permalink / raw)
  To: Paul McKenney; +Cc: dipankar, linux-kernel, Rusty Russell


On Sat, 13 Oct 2001, Paul McKenney wrote:
>
> The reference counts are needed -only- in cases where references to an
> RCU-protected data structure may be held across a sleep.

My point is that
 (a) such uses are not very common.
 (b) the whole notiong of "scheduling point" is a lot too fragile to be
     acceptable for important data structures. It breaks with the
     pre-emption patches on UP, and we've seen many times how hard it is
     for developers to notice even when there _is_ an explicit "end
     critical region now"  kind of thing

Have you seen how many uses of implicitly blocking operations there have
been with interrupts disabled or spinlocks held? Now, if the data
structure doesn't even have a "ok, I'm done with you" thing, then those
kinds of mistakes are going to be not only impossible to find
automatically, but they are going to be even easier to make by mistake.

In short, RCU has several problems in my opinion:

 - nobody has shown a case where existing normal locking ends up being
   really a huge problem, and where RCU clearly helps.

 - RCU obviously has major subtle issues, ranging from memory ordering to
   "what is quiescent", ie the scheduling points. And "subtlety" directly
   translates into bugs and hard-to-debug seldom-seen problems that end up
   being "unexplainable".

In short, RCU seems to be a case of "hey, that's cool", but it's a
solution in search of a problem so severe that it is worth it.

		Linus


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 17:23 ` Linus Torvalds
@ 2001-10-13 17:28   ` Linus Torvalds
  2001-10-14  7:25     ` Dipankar Sarma
  2001-10-13 18:42   ` Andi Kleen
  2001-10-13 21:19   ` Rusty Russell
  2 siblings, 1 reply; 67+ messages in thread
From: Linus Torvalds @ 2001-10-13 17:28 UTC (permalink / raw)
  To: Paul McKenney; +Cc: dipankar, linux-kernel, Rusty Russell

On Sat, 13 Oct 2001, Linus Torvalds wrote:
>
> In short, RCU seems to be a case of "hey, that's cool", but it's a
> solution in search of a problem so severe that it is worth it.

Oh, and before people start telling me that RCU was successfully used in
AIX/projectX/xxxx/etc, you have to realize that I don't give a rats *ss
about the fact that there are OS's out there that are "more scalable".

The last time I looked, Solaris and AIX and all the rest of the "scalable"
systems were absolute pigs on smaller hardware, and the "scalability" in
them often translates into "we scale linearly to many CPU's by being
really bad even on one".

		Linus

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 17:28   ` Linus Torvalds
@ 2001-10-14  7:25     ` Dipankar Sarma
  0 siblings, 0 replies; 67+ messages in thread
From: Dipankar Sarma @ 2001-10-14  7:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul McKenney, linux-kernel, Rusty Russell

On Sat, Oct 13, 2001 at 10:28:13AM -0700, Linus Torvalds wrote:
> 
> On Sat, 13 Oct 2001, Linus Torvalds wrote:
> >
> > In short, RCU seems to be a case of "hey, that's cool", but it's a
> > solution in search of a problem so severe that it is worth it.
> 
> Oh, and before people start telling me that RCU was successfully used in
> AIX/projectX/xxxx/etc, you have to realize that I don't give a rats *ss
> about the fact that there are OS's out there that are "more scalable".

Absolutely. Those are different OSes, different environments and mostly
imprtantly different goals. We may draw on that experience, but still
need to prove that the ideas work for *Linux*.

> 
> The last time I looked, Solaris and AIX and all the rest of the "scalable"
> systems were absolute pigs on smaller hardware, and the "scalability" in
> them often translates into "we scale linearly to many CPU's by being
> really bad even on one".

No argument here at all. Big iron is only a part of what linux
does and we are very conscious of that fact. In fact, this makes
our work quite an interesting challenge.

Thanks
Dipankar
-- 
Dipankar Sarma  <dipankar@in.ibm.com> Project: http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 17:23 ` Linus Torvalds
  2001-10-13 17:28   ` Linus Torvalds
@ 2001-10-13 18:42   ` Andi Kleen
  2001-10-13 19:15     ` Alexander Viro
  2001-10-13 20:44     ` Rusty Russell
  2001-10-13 21:19   ` Rusty Russell
  2 siblings, 2 replies; 67+ messages in thread
From: Andi Kleen @ 2001-10-13 18:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, lse-tech, Paul.McKenney, rusty

In article <Pine.LNX.4.33.0110131015410.8707-100000@penguin.transmeta.com>,
Linus Torvalds <torvalds@transmeta.com> writes:

>  - nobody has shown a case where existing normal locking ends up being
>    really a huge problem, and where RCU clearly helps.

The poster child of such a case is module unloading. Keeping reference
counts for every even non sleeping use of a module is very painful. 
The current "fix" -- putting module count increases in all possible module 
callers to fix the unload races is slow and ugly and far too subtle to 
get everything right. Waiting quiescent periods before unloading is a nice 
alternative.

-Andi

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 18:42   ` Andi Kleen
@ 2001-10-13 19:15     ` Alexander Viro
  2001-10-13 20:44     ` Rusty Russell
  1 sibling, 0 replies; 67+ messages in thread
From: Alexander Viro @ 2001-10-13 19:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, linux-kernel, lse-tech, Paul.McKenney, rusty



On 13 Oct 2001, Andi Kleen wrote:

> In article <Pine.LNX.4.33.0110131015410.8707-100000@penguin.transmeta.com>,
> Linus Torvalds <torvalds@transmeta.com> writes:
> 
> >  - nobody has shown a case where existing normal locking ends up being
> >    really a huge problem, and where RCU clearly helps.
> 
> The poster child of such a case is module unloading. Keeping reference
> counts for every even non sleeping use of a module is very painful. 
> The current "fix" -- putting module count increases in all possible module 
> callers to fix the unload races is slow and ugly and far too subtle to 
> get everything right. Waiting quiescent periods before unloading is a nice 
> alternative.

... while quiescent stuff is _not_ subtle and not prone to breakage.  Right.
In the same world where Vomit-Making System is elegant, SGI "designs" are
and NT is The Wave Of Future(tm).  Pardon me, but I'll stay in our universe
and away from the drugs of such power.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 18:42   ` Andi Kleen
  2001-10-13 19:15     ` Alexander Viro
@ 2001-10-13 20:44     ` Rusty Russell
  1 sibling, 0 replies; 67+ messages in thread
From: Rusty Russell @ 2001-10-13 20:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, lse-tech, Paul.McKenney

In message <k23d4njs9x.fsf@zero.aec.at> you write:
> In article <Pine.LNX.4.33.0110131015410.8707-100000@penguin.transmeta.com>,
> Linus Torvalds <torvalds@transmeta.com> writes:
> 
> >  - nobody has shown a case where existing normal locking ends up being
> >    really a huge problem, and where RCU clearly helps.
> 
> The poster child of such a case is module unloading. Keeping reference
> counts for every even non sleeping use of a module is very painful. 

Well, module unloading requires only a small fraction of the read copy
update infrastructure (synchronize_kernel()), and can be implemented
without any scheduler changes, as it's not at all speed critical.

If nothing else, this thread has served to make more kernel hackers
aware of the technique, so they can try it themselves as desired.

Cheers,
Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion
  2001-10-13 17:23 ` Linus Torvalds
  2001-10-13 17:28   ` Linus Torvalds
  2001-10-13 18:42   ` Andi Kleen
@ 2001-10-13 21:19   ` Rusty Russell
  2 siblings, 0 replies; 67+ messages in thread
From: Rusty Russell @ 2001-10-13 21:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: dipankar, linux-kernel

In message <Pine.LNX.4.33.0110131015410.8707-100000@penguin.transmeta.com> you 
write:
> 
>  (b) the whole notiong of "scheduling point" is a lot too fragile to be
>      acceptable for important data structures. It breaks with the
>      pre-emption patches on UP, and we've seen many times how hard it is
>      for developers to notice even when there _is_ an explicit "end
>      critical region now"  kind of thing

Yeah, if you can't get locking right, you can't get RCU right.  I've
shown you that using it in place of standard locking is simple: the
ONLY added issue is being careful not to screw readers while doing the
actual insert/delete.

Where, if anywhere, is this worth it?  Good question: 3% on 4-way
dbench doesn't cut it in my book...

Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2001-10-14  7:19 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-10-09 15:45 RFC: patch to allow lock-free traversal of lists with insertion Paul McKenney
2001-10-09 17:00 ` Richard Henderson
2001-10-10  3:33   ` Paul Mackerras
2001-10-10 17:02     ` Richard Henderson
2001-10-10  2:05 ` [Lse-tech] " Andrea Arcangeli
2001-10-10  5:05   ` Linus Torvalds
2001-10-10  5:17     ` BALBIR SINGH
2001-10-10  5:29       ` Davide Libenzi
2001-10-10  5:46       ` Linus Torvalds
2001-10-10  6:01         ` BALBIR SINGH
2001-10-10 15:23           ` Victor Yodaiken
2001-10-10  7:14         ` kdb requires kallsyms Kirill Ratkin
2001-10-10  7:38           ` BALBIR SINGH
2001-10-10  6:16     ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Paul Mackerras
2001-10-10  6:30       ` Linus Torvalds
2001-10-10  7:36     ` Paul Mackerras
2001-10-10 15:54       ` Victor Yodaiken
2001-10-10 21:56         ` Keith Owens
2001-10-10 22:24           ` Victor Yodaiken
2001-10-10 23:46             ` David S. Miller
2001-10-11  0:24               ` Davide Libenzi
2001-10-10 11:54     ` Keith Owens
2001-10-10 13:42       ` AIC7XXX war
2001-10-10 21:40         ` AIC7XXX Luigi Genoni
2001-10-10 13:24   ` [Lse-tech] Re: RFC: patch to allow lock-free traversal of lists with insertion Ivan Kokshaysky
2001-10-10 13:41     ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2001-10-10  4:43 Paul McKenney
2001-10-10  6:54 Dipankar Sarma
2001-10-10  7:06 Dipankar Sarma
2001-10-10  7:21 ` BALBIR SINGH
2001-10-10  9:06   ` Dipankar Sarma
2001-10-10  7:58 Dipankar Sarma
     [not found] <20011010182730.0077454b.rusty@rustcorp.com.au>
2001-10-10  9:36 ` Linus Torvalds
2001-10-11  6:50   ` Rusty Russell
2001-10-10 10:06 Dipankar Sarma
2001-10-10 10:18 ` Linus Torvalds
2001-10-10 11:43   ` Dipankar Sarma
2001-10-12  3:27   ` Rusty Russell
2001-10-12 16:56     ` Linus Torvalds
2001-10-12 18:53       ` Dipankar Sarma
2001-10-13  7:25       ` Rusty Russell
2001-10-10 15:24 Paul McKenney
2001-10-10 16:58 ` Andrea Arcangeli
2001-10-10 17:25   ` Linus Torvalds
2001-10-12  5:06     ` Rusty Russell
2001-10-12 16:28       ` Linus Torvalds
2001-10-12 19:50         ` Al Dunsmuir
2001-10-13  1:07         ` Paul Mackerras
2001-10-13  1:54           ` Davide Libenzi
2001-10-13  2:04             ` Linus Torvalds
2001-10-13  2:31               ` Davide Libenzi
2001-10-13  2:46                 ` Davide Libenzi
2001-10-13  3:30                 ` Linus Torvalds
2001-10-13  2:49               ` Paul Mackerras
2001-10-13  2:00           ` Linus Torvalds
2001-10-13  7:38         ` Rusty Russell
2001-10-10 16:00 Paul McKenney
2001-10-10 21:44 Paul McKenney
2001-10-11 10:34 Dipankar Sarma
2001-10-13 14:42 Paul McKenney
2001-10-13 17:23 ` Linus Torvalds
2001-10-13 17:28   ` Linus Torvalds
2001-10-14  7:25     ` Dipankar Sarma
2001-10-13 18:42   ` Andi Kleen
2001-10-13 19:15     ` Alexander Viro
2001-10-13 20:44     ` Rusty Russell
2001-10-13 21:19   ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox