public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: William Lee Irwin III <wli@holomorphy.com>
To: Keith Owens <kaos@ocs.com.au>
Cc: Jesse Barnes <jbarnes@engr.sgi.com>,
	Chris Wright <chrisw@osdl.org>,
	Ravikiran G Thirumalai <kiran@in.ibm.com>,
	linux-kernel@vger.kernel.org, dipankar@in.ibm.com
Subject: Re: [RFC] Lock free fd lookup
Date: Fri, 16 Jul 2004 18:19:36 -0700	[thread overview]
Message-ID: <20040717011936.GK3411@holomorphy.com> (raw)
In-Reply-To: <903.1090025759@ocs3.ocs.com.au>

On Thu, 15 Jul 2004 23:27:32 -0700, William Lee Irwin III wrote:
>> The gist of all this is that busywait-free atomic updates are only
>> implementable for data structures that don't link through the object
>> but rather maintain an external index with a single pointer to elements
>> needing updates, like radix trees, B+ trees, arrays of pointers, and
>> open-addressed hashtables.

On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> There are algorithms for busywait-free (lock free) traversal and
> concurrent update of lists and structures that contain embedded
> pointers.  I once had to implement one on an O/S where the user space
> to kernel wait/alarm semantics could not satisfy the timing
> requirements of the calling code (don't ask).

Yes. I had in mind only RCU-based algorithms. Throwing more machinery at
the problem may get further.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The beauty of these lockfree algorithms on large structures is that
> nothing ever stalls indefinitely.  If the underlying SMP cache hardware
> is fair, everything running a lockfree list will make progress.  These
> algorithms do not suffer from reader vs. writer starvation.

That's a large assumption. NUMA hardware typically violates it.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The downside is that the algorithms rely on an extra sequence field per
> word.  They copy out the structure, modify the local copy, write back
> with an update of sequence field.  Writing back detects if any other
> update has invalidated the current structure, at which point the second
> update is discarded and the writer redrives its update.  Readers are
> guaranteed to see a consistent view of the copy, even if the master
> structure is being updated at the same time as it is being read.

I would classify this as "ticket-based".


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Bottom line, it can be done but ...
> * The structure size doubles with the addition of the sequence fields.
> * The hardware must support cmpxchg on the combination of the sequence
>   field and the data word that it protects.  LL/SC can be used instead
>   of cmpxchg if the hardware supports LL/SC.
> * If the hardware only supports cmpxchg4 then the sequence field is
>   restricted to 2 bytes, which increases the risk of wraparound.  If an
>   update is delayed for exactly the wraparound interval then the data
>   may be corrupted, that is a standard risk of small sequence fields.
> * The extra copy out just to read a structure needs more memory
>   bandwidth, plus local allocation for the copy.
> * The internal code of the API to traverse a list containing lockfree
>   structures is pretty messy, although that is hidden from the caller.
> * All writers must preserve enough state to redrive their update from
>   scratch if a concurrent update has changed the same structure.  Note,
>   only the curent structure, not the rest of the list.
> * It requires type stable storage.  That is, once a data area has been
>   allocated to a particular structure type, it always contains that
>   structure type, even when it has been freed from the list.  Each list
>   requires its own free pool, which can never be returned to the OS.

The last of these is particularly lethal.


On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Nice research topics, but not suitable for day to day work.  I only
> used the lockfree algorithms because I could not find an alternative on
> that particular OS.  I am not sure that the Linux kernel has any
> problems that require the additional complexity.
> For some light reading, the code I implemented was based on Practical
> Implementations of Non-Blocking Synchronization Primitives by Mark Moir
> http://research.sun.com/scalable/Papers/moir-podc97.ps.  Note that page
> 6, line 6 has an error.  Replace "y := z" with "y := addr->data[i];".
> I implemented a completely lockfree 2-3-4 tree with embedded pointers
> using Moir's algorithm.  The implementation allowed multiple concurrent
> readers and writers.  It even allowed concurrent list insert, delete
> and rebalancing, while the structures on the were being read and
> updated.

Thanks. That may come in handy later.

On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> 2-3-4 trees are self balancing, which gave decent lookup performance.
> Since red-black trees are logically equivalent to 2-3-4 trees it should
> be possible to use lockfree red-black trees.  However I could not come
> up with a lockfree red-black tree, mainly because an read-black insert
> requires atomic changes to multiple structures.  The 2-3-4 tree only
> needs atomic update to one structure at a time.

This actually appears to confirm my earlier assertion about the linkage
of the data structure. Is this conclusion what you had in mind?


-- wli

  reply	other threads:[~2004-07-17  1:21 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-14  4:53 [RFC] Refcounting of objects part of a lockfree collection Ravikiran G Thirumalai
2004-07-14  4:56 ` [RFC] Lock free fd lookup Ravikiran G Thirumalai
2004-07-14 15:17   ` Chris Wright
2004-07-15 14:22     ` Jesse Barnes
2004-07-15 16:10       ` Dipankar Sarma
2004-07-15 16:22         ` Jesse Barnes
2004-07-15 16:34         ` Chris Wright
2004-07-16  5:38           ` Ravikiran G Thirumalai
2004-07-16  6:27       ` William Lee Irwin III
2004-07-17  0:55         ` Keith Owens
2004-07-17  1:19           ` William Lee Irwin III [this message]
2004-07-17  2:12             ` Keith Owens
2004-07-17  2:34               ` William Lee Irwin III
2004-07-17  2:28             ` Keith Owens
2004-07-17  3:16               ` William Lee Irwin III
2004-07-17 13:48     ` Peter Zijlstra
2004-07-14  7:07 ` [RFC] Refcounting of objects part of a lockfree collection Greg KH
2004-07-14  8:26   ` Dipankar Sarma
2004-07-14 14:26     ` Greg KH
2004-07-14 15:22       ` Dipankar Sarma
2004-07-14 17:03         ` Greg KH
2004-07-14 17:49           ` Dipankar Sarma
2004-07-14 18:03             ` Greg KH
2004-07-15  6:21       ` Ravikiran G Thirumalai
2004-07-15  6:56         ` Dipankar Sarma
2004-07-14  8:57   ` Ravikiran G Thirumalai
2004-07-14 17:08     ` Greg KH
2004-07-14 18:17       ` Dipankar Sarma
2004-07-15  8:02       ` Ravikiran G Thirumalai
2004-07-15  9:36         ` Dipankar Sarma
2004-07-16 14:32         ` Greg KH
2004-07-16 15:50           ` Ravikiran G Thirumalai
  -- strict thread matches above, loose matches on Subject: below --
2004-07-17  8:50 [RFC] Lock free fd lookup Manfred Spraul
2004-07-17  9:30 ` William Lee Irwin III
2004-07-17 19:17 Albert Cahalan
2004-07-29  0:14 ` William Lee Irwin III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040717011936.GK3411@holomorphy.com \
    --to=wli@holomorphy.com \
    --cc=chrisw@osdl.org \
    --cc=dipankar@in.ibm.com \
    --cc=jbarnes@engr.sgi.com \
    --cc=kaos@ocs.com.au \
    --cc=kiran@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox