Re: [PATCH 0/3] 64-bit futexes: Intro

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>, David Howells <dhowells@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/3] 64-bit futexes: Intro
Date: Fri, 6 Jun 2008 03:27:49 +0200	[thread overview]
Message-ID: <20080606012749.GA12187@wotan.suse.de> (raw)
In-Reply-To: <alpine.LFD.1.10.0806021246060.3141@woody.linux-foundation.org>

On Mon, Jun 02, 2008 at 01:22:57PM -0700, Linus Torvalds wrote:
> 
> [ I added Nick and DavidH to the Cc, since they have at least historically 
>   shown interest in locking algorithms ]
> 
> On Mon, 2 Jun 2008, Ingo Molnar wrote:
> > 
> > i suspect _any_ abstract locking functionality around a data structure 
> > can be implemented via atomic control over just a single user-space bit.
> 
> Well, theory is theory, and practice is different.
> 
> That's *especially* true when it comes to locking, which is just so 
> tightly coupled to the exact details of which atomics are fast on a 
> particular architecture.
> 
> Also, different people will want to see different performance profiles. 
> 
> For example, it makes a *huge* difference whether you are strictly fair or 
> not. I can almost guarantee that a 100% fair implementation may be really 
> "nice" from a theoretical standpoint, but suck really badly in practice, 
> because if you want best performance, then you want the lock to have a 
> very strong CPU affinity - and the faster you can do your lock ops, the 
> more of a unfairness and CPU affinity they get!
> 
> And rwlocks in particular are actually much more interesting in this 
> respect, because they not only have that CPU affinity fairness, but also 
> the reader-vs-writer fairness. You optimize for one load, and it may give 
> you almost perfect performance, but at the cost of sucking at another 
> load.
> 
> For example, some loads are almost entirely read-read locks, with only 
> very occasional write locks for some uncommon config change thing. Do you 
> want to optimize for that? Maybe. And yes, you can make those uncontended 
> read-read locks go really quickly, but then (especially if you continue to 
> let reads through even when writers want to contend), that can slow down 
> writers a *lot*, to the point of starvation.
> 
> Different CPU's will also show different patterns. 
> 
> Anyway, I was busy most of the weekend, but I've now coded up a partial 
> actual example implementation. Its' probably buggy. Uli is very correct in 
> saying that it's easy to screw up futex'es, but even in the absense of 
> futexes, it's just easy to screw up any threaded logic.
> 
> But if anybody wants to look at a rough draft that at least limps along 
> _partially_, there's
> 
> 	http://git.kernel.org/?p=linux/kernel/git/torvalds/rwlock.git;a=summary
> 
> for you.
> 
> And I'll freely admit that 
>  (a) the above thing is pretty hacked up. No guarantees as to correctness 
>      or anything else.
>  (b) I looked _mainly_ at the "all readers" case, and considered that the 
>      primary goal. Which explains why it does really well on that 
>      particular load.
>  (c) However, I refuse to starve writers. In fact, my thing will always 
>      prefer a writer to any new readers, on the assumption that the sanest 
>      rwlock situation is the "tons of readers, occasional writer".
>  (d) it does ok for me on the write-write contention case too, but the 
>      random mixed "all threads do 5% writelocks" test-case seems to suck.
> 
> As an exmaple: the current glibc code I have seems to be uniformly and 
> boringly middle-of-the-road. It really doesn't seem to be horrible at all. 
> That said, the above thing _is_ 2-3x faster for me on that read-read case 
> (from single thread up to 20 threads - but just tested on one particular 
> machine), so if that is what you want to aim for, it's certainly easy to 
> beat.
> 
> But for the write case, while I can easily beat it for a low-thread count 
> with little contention, my example thing above benchmarks about equal for 
> bigger thread counts (caveat: again - I've only tested on one machine, 
> which is a single-socket dual-core thing. Caveat emptor).
> 
> And the pthreads implementation actually beats my hacked-up one at least 
> for the 5% random writelocks case. It's not beating it by a huge amount, 
> but it's not in the noise level either (maybe 15%). And I bet the pthreads 
> implementation is a hell of a lot better tested ;)

Had a bit of a look though this, seems pretty OK to me. Obviously with
rwlocks it *is*  impossible to do non-atomic unlocks, so I can't see
a way to significantly improve your code there.

What you *could* maybe do, to slightly speed up the reader fastpath, at
the expense of the writer fastpath, is to also have the active writer add
4 to the count too, so your unlock can start with a lock xadd -4, count
in order to get the write-intent on the cacheline straight up.

Though that's a pretty tiny "optmisation", and not going to be an amazing
win, even when it does save a state transition on the cacheline...

I'd be more interested to know why this code can't be evolved into a full
rwlock implementation? This is a rather standard (though neat) looking rwlock
-- so my question is what can the patented 64-bit futex locks do that this
can't, or what can they do faster?

next prev parent reply	other threads:[~2008-06-06  1:28 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-31  1:27 [PATCH 0/3] 64-bit futexes: Intro Ulrich Drepper
2008-05-31  2:13 ` Linus Torvalds
2008-05-31  3:14   ` Ulrich Drepper
2008-05-31  3:44     ` Linus Torvalds
2008-05-31  4:04       ` Ulrich Drepper
2008-05-31  4:16         ` Linus Torvalds
2008-05-31  4:23           ` Linus Torvalds
2008-05-31  4:38             ` Ulrich Drepper
2008-05-31  4:58               ` Linus Torvalds
2008-05-31 22:25               ` Linus Torvalds
2008-05-31 22:32                 ` Linus Torvalds
2008-06-02 18:54                 ` Ingo Molnar
2008-06-02 20:22                   ` Linus Torvalds
2008-06-02 23:03                     ` Ingo Molnar
2008-06-03  3:24                       ` Nick Piggin
2008-06-04 19:57                         ` Linus Torvalds
2008-06-04 20:38                           ` Linus Torvalds
2008-06-05  1:56                             ` Nick Piggin
2008-06-05  1:58                               ` Nick Piggin
2008-06-05  3:08                               ` Linus Torvalds
2008-06-05  4:29                                 ` Nick Piggin
2008-06-05  1:45                           ` Nick Piggin
2008-06-06  1:27                     ` Nick Piggin [this message]
2008-06-06  3:37                       ` Linus Torvalds
2008-06-06 11:53                         ` Nick Piggin
2008-06-06 15:01                           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080606012749.GA12187@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=drepper@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.