Re: [RFC] Lock free fd lookup - William Lee Irwin III

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: William Lee Irwin III <wli@holomorphy.com>
To: Jesse Barnes <jbarnes@engr.sgi.com>
Cc: Chris Wright <chrisw@osdl.org>,
	Ravikiran G Thirumalai <kiran@in.ibm.com>,
	linux-kernel@vger.kernel.org, dipankar@in.ibm.com
Subject: Re: [RFC] Lock free fd lookup
Date: Thu, 15 Jul 2004 23:27:32 -0700	[thread overview]
Message-ID: <20040716062732.GG3411@holomorphy.com> (raw)
In-Reply-To: <200407151022.53084.jbarnes@engr.sgi.com>

On Wednesday, July 14, 2004 11:17 am, Chris Wright wrote:
>> I'm curious, how much of the performance improvement is from RCU usage
>> vs. making the basic syncronization primitive aware of a reader and
>> writer distinction?  Do you have benchmark for simply moving to rwlock_t?

On Thu, Jul 15, 2004 at 10:22:53AM -0400, Jesse Barnes wrote:
> That's a good point.  Also, even though the implementation may be 'lockless', 
> there are still a lot of cachelines bouncing around, whether due to atomic 
> counters or cmpxchg (in fact the latter will be worse than simple atomics).
> It seems to me that RCU is basically rwlocks on steroids, which means that 
> using it requires the same care to avoid starvation and/or other scalability 
> problems (i.e. we'd better be really sure that a given codepath really should 
> be using rwlocks before we change it).

Primarily not for either of your benefit(s):

files->lock actually started as an rwlock and was changed to a spinlock
on the grounds that cmpxchg for rwlocks sucked on Pentium 4.

Typical rwlock implementations don't have such starvation issues.
Normally attempts at acquiring them for write exclude new readers with
various forms of bias. Linux has come to rely on an unfair
implementation by means of recursion, both directly (I hear somewhere
in the net stack and haven't looked) and via recursive acquisition in
interrupt context (for signal delivery). The unfairness does cause
problems in practice, e.g. writers starving on tasklist_lock takes out
machines running large numbers of processes.

RCU does not have such fairness issues. While it is heavily read
biased, readers do not exclude writers. Things can, however, delay
freeing of memory if tasks don't voluntarily yield for long periods
of time (e.g. involuntary context switches caused by kernel preemption)
as the freeing of the data structures potentially referenced by readers
must be delayed until all cpus have gone through a voluntary context
switch. This generally resolves itself as waiting for memory involves a
voluntary context switch. Readers merely disable preemption, and only
writers require atomicity or mutual exclusion, which is more typical in
Linux in part owing to preferred data structures. I suspect this is in
part due to its initial usage for doubly-linked lists that are never
traversed backward. RCU is actually a lockless update protocol and can
be done with atomic updates on the write side as opposed to spinlocks
around the updates. This happily doesn't involve busywait on machines
able to implement such things directly, but also depends heavily on the
data structure.

For instance, an open-addressed hash table or a 4-way associative cache
could simply atomically update the pointers to the elements. To see why
things get much harder when you try to go completely lockfree with less
suitable data structures, a singly-linked list would require a cmpxchg
loop inside a retry loop, temporarily marking next pointers with some
invalid value that signals accessors to retry the read operation, and
some external method of preventing simultaneous removals of the same
element (e.g. refcounting), as the next pointer of the predecessor must
be what was expected during removal, the removed element's next pointer
remain be what was expected, and the updated predecessor must be what
was discovered in the list, and more still for cpus not supporting
cmpxchg or similar operations, though that's probably not an entirely
complete or accurate description of the headaches involved when the
data structure doesn't mesh well with lock-free atomic updates. The
gist of all this is that busywait-free atomic updates are only
implementable for data structures that don't link through the object
but rather maintain an external index with a single pointer to elements
needing updates, like radix trees, B+ trees, arrays of pointers, and
open-addressed hashtables. Otherwise, spinlocks on the write side are
probably better than the completely lock-free alternative for smaller
systems, though larger systems hate sharing the cachelines.

-- wli

next prev parent reply	other threads:[~2004-07-16  6:28 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-14  4:53 [RFC] Refcounting of objects part of a lockfree collection Ravikiran G Thirumalai
2004-07-14  4:56 ` [RFC] Lock free fd lookup Ravikiran G Thirumalai
2004-07-14 15:17   ` Chris Wright
2004-07-15 14:22     ` Jesse Barnes
2004-07-15 16:10       ` Dipankar Sarma
2004-07-15 16:22         ` Jesse Barnes
2004-07-15 16:34         ` Chris Wright
2004-07-16  5:38           ` Ravikiran G Thirumalai
2004-07-16  6:27       ` William Lee Irwin III [this message]
2004-07-17  0:55         ` Keith Owens
2004-07-17  1:19           ` William Lee Irwin III
2004-07-17  2:12             ` Keith Owens
2004-07-17  2:34               ` William Lee Irwin III
2004-07-17  2:28             ` Keith Owens
2004-07-17  3:16               ` William Lee Irwin III
2004-07-17 13:48     ` Peter Zijlstra
2004-07-14  7:07 ` [RFC] Refcounting of objects part of a lockfree collection Greg KH
2004-07-14  8:26   ` Dipankar Sarma
2004-07-14 14:26     ` Greg KH
2004-07-14 15:22       ` Dipankar Sarma
2004-07-14 17:03         ` Greg KH
2004-07-14 17:49           ` Dipankar Sarma
2004-07-14 18:03             ` Greg KH
2004-07-15  6:21       ` Ravikiran G Thirumalai
2004-07-15  6:56         ` Dipankar Sarma
2004-07-14  8:57   ` Ravikiran G Thirumalai
2004-07-14 17:08     ` Greg KH
2004-07-14 18:17       ` Dipankar Sarma
2004-07-15  8:02       ` Ravikiran G Thirumalai
2004-07-15  9:36         ` Dipankar Sarma
2004-07-16 14:32         ` Greg KH
2004-07-16 15:50           ` Ravikiran G Thirumalai
  -- strict thread matches above, loose matches on Subject: below --
2004-07-17  8:50 [RFC] Lock free fd lookup Manfred Spraul
2004-07-17  9:30 ` William Lee Irwin III
2004-07-17 19:17 Albert Cahalan
2004-07-29  0:14 ` William Lee Irwin III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040716062732.GG3411@holomorphy.com \
    --to=wli@holomorphy.com \
    --cc=chrisw@osdl.org \
    --cc=dipankar@in.ibm.com \
    --cc=jbarnes@engr.sgi.com \
    --cc=kiran@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox