From: William Lee Irwin III <wli@holomorphy.com>
To: Keith Owens <kaos@ocs.com.au>
Cc: Jesse Barnes <jbarnes@engr.sgi.com>,
Chris Wright <chrisw@osdl.org>,
Ravikiran G Thirumalai <kiran@in.ibm.com>,
linux-kernel@vger.kernel.org, dipankar@in.ibm.com
Subject: Re: [RFC] Lock free fd lookup
Date: Fri, 16 Jul 2004 18:19:36 -0700 [thread overview]
Message-ID: <20040717011936.GK3411@holomorphy.com> (raw)
In-Reply-To: <903.1090025759@ocs3.ocs.com.au>
On Thu, 15 Jul 2004 23:27:32 -0700, William Lee Irwin III wrote:
>> The gist of all this is that busywait-free atomic updates are only
>> implementable for data structures that don't link through the object
>> but rather maintain an external index with a single pointer to elements
>> needing updates, like radix trees, B+ trees, arrays of pointers, and
>> open-addressed hashtables.
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> There are algorithms for busywait-free (lock free) traversal and
> concurrent update of lists and structures that contain embedded
> pointers. I once had to implement one on an O/S where the user space
> to kernel wait/alarm semantics could not satisfy the timing
> requirements of the calling code (don't ask).
Yes. I had in mind only RCU-based algorithms. Throwing more machinery at
the problem may get further.
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The beauty of these lockfree algorithms on large structures is that
> nothing ever stalls indefinitely. If the underlying SMP cache hardware
> is fair, everything running a lockfree list will make progress. These
> algorithms do not suffer from reader vs. writer starvation.
That's a large assumption. NUMA hardware typically violates it.
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> The downside is that the algorithms rely on an extra sequence field per
> word. They copy out the structure, modify the local copy, write back
> with an update of sequence field. Writing back detects if any other
> update has invalidated the current structure, at which point the second
> update is discarded and the writer redrives its update. Readers are
> guaranteed to see a consistent view of the copy, even if the master
> structure is being updated at the same time as it is being read.
I would classify this as "ticket-based".
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Bottom line, it can be done but ...
> * The structure size doubles with the addition of the sequence fields.
> * The hardware must support cmpxchg on the combination of the sequence
> field and the data word that it protects. LL/SC can be used instead
> of cmpxchg if the hardware supports LL/SC.
> * If the hardware only supports cmpxchg4 then the sequence field is
> restricted to 2 bytes, which increases the risk of wraparound. If an
> update is delayed for exactly the wraparound interval then the data
> may be corrupted, that is a standard risk of small sequence fields.
> * The extra copy out just to read a structure needs more memory
> bandwidth, plus local allocation for the copy.
> * The internal code of the API to traverse a list containing lockfree
> structures is pretty messy, although that is hidden from the caller.
> * All writers must preserve enough state to redrive their update from
> scratch if a concurrent update has changed the same structure. Note,
> only the curent structure, not the rest of the list.
> * It requires type stable storage. That is, once a data area has been
> allocated to a particular structure type, it always contains that
> structure type, even when it has been freed from the list. Each list
> requires its own free pool, which can never be returned to the OS.
The last of these is particularly lethal.
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> Nice research topics, but not suitable for day to day work. I only
> used the lockfree algorithms because I could not find an alternative on
> that particular OS. I am not sure that the Linux kernel has any
> problems that require the additional complexity.
> For some light reading, the code I implemented was based on Practical
> Implementations of Non-Blocking Synchronization Primitives by Mark Moir
> http://research.sun.com/scalable/Papers/moir-podc97.ps. Note that page
> 6, line 6 has an error. Replace "y := z" with "y := addr->data[i];".
> I implemented a completely lockfree 2-3-4 tree with embedded pointers
> using Moir's algorithm. The implementation allowed multiple concurrent
> readers and writers. It even allowed concurrent list insert, delete
> and rebalancing, while the structures on the were being read and
> updated.
Thanks. That may come in handy later.
On Sat, Jul 17, 2004 at 10:55:59AM +1000, Keith Owens wrote:
> 2-3-4 trees are self balancing, which gave decent lookup performance.
> Since red-black trees are logically equivalent to 2-3-4 trees it should
> be possible to use lockfree red-black trees. However I could not come
> up with a lockfree red-black tree, mainly because an read-black insert
> requires atomic changes to multiple structures. The 2-3-4 tree only
> needs atomic update to one structure at a time.
This actually appears to confirm my earlier assertion about the linkage
of the data structure. Is this conclusion what you had in mind?
-- wli
next prev parent reply other threads:[~2004-07-17 1:21 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-07-14 4:53 [RFC] Refcounting of objects part of a lockfree collection Ravikiran G Thirumalai
2004-07-14 4:56 ` [RFC] Lock free fd lookup Ravikiran G Thirumalai
2004-07-14 15:17 ` Chris Wright
2004-07-15 14:22 ` Jesse Barnes
2004-07-15 16:10 ` Dipankar Sarma
2004-07-15 16:22 ` Jesse Barnes
2004-07-15 16:34 ` Chris Wright
2004-07-16 5:38 ` Ravikiran G Thirumalai
2004-07-16 6:27 ` William Lee Irwin III
2004-07-17 0:55 ` Keith Owens
2004-07-17 1:19 ` William Lee Irwin III [this message]
2004-07-17 2:12 ` Keith Owens
2004-07-17 2:34 ` William Lee Irwin III
2004-07-17 2:28 ` Keith Owens
2004-07-17 3:16 ` William Lee Irwin III
2004-07-17 13:48 ` Peter Zijlstra
2004-07-14 7:07 ` [RFC] Refcounting of objects part of a lockfree collection Greg KH
2004-07-14 8:26 ` Dipankar Sarma
2004-07-14 14:26 ` Greg KH
2004-07-14 15:22 ` Dipankar Sarma
2004-07-14 17:03 ` Greg KH
2004-07-14 17:49 ` Dipankar Sarma
2004-07-14 18:03 ` Greg KH
2004-07-15 6:21 ` Ravikiran G Thirumalai
2004-07-15 6:56 ` Dipankar Sarma
2004-07-14 8:57 ` Ravikiran G Thirumalai
2004-07-14 17:08 ` Greg KH
2004-07-14 18:17 ` Dipankar Sarma
2004-07-15 8:02 ` Ravikiran G Thirumalai
2004-07-15 9:36 ` Dipankar Sarma
2004-07-16 14:32 ` Greg KH
2004-07-16 15:50 ` Ravikiran G Thirumalai
-- strict thread matches above, loose matches on Subject: below --
2004-07-17 8:50 [RFC] Lock free fd lookup Manfred Spraul
2004-07-17 9:30 ` William Lee Irwin III
2004-07-17 19:17 Albert Cahalan
2004-07-29 0:14 ` William Lee Irwin III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20040717011936.GK3411@holomorphy.com \
--to=wli@holomorphy.com \
--cc=chrisw@osdl.org \
--cc=dipankar@in.ibm.com \
--cc=jbarnes@engr.sgi.com \
--cc=kaos@ocs.com.au \
--cc=kiran@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.