Re: [PATCH 0/3] 64-bit futexes: Intro

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/3] 64-bit futexes: Intro
Date: Tue, 3 Jun 2008 05:24:04 +0200	[thread overview]
Message-ID: <20080603032403.GA17089@wotan.suse.de> (raw)
In-Reply-To: <20080602230316.GA24159@elte.hu>

On Tue, Jun 03, 2008 at 01:03:17AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > > That bit can be used as a lock and if all access to the state of 
> > > that atomic variable uses it, arbitrary higher-order atomic state 
> > > transitions can be derived from it. The cost would be a bit more 
> > > instructions in the fastpath, but there would still only be a single 
> > > atomic op (the acquire op), as the unlock would be a natural barrier 
> > > (on x86 at least).
> > 
> > No, "unlocks as a natural barrier" only works for exclusive kernel 
> > locks (spin_unlock and write_unlock). There we can just do a write to 
> > unlock. But for anything that wants to handle contention differently 
> > than just spinning, the unlock path needs to be able to do an atomic 
> > "unlock and test if I need to do something else", because it may need 
> > to wake things up.
> 
> yeah, indeed. Compared to all the other costs that have to be dealt with 
> here, having a second atomic op isnt all that much of an issue either, 
> especially on latest hw. An atomic op will probably never be as cheap as 
> a non-atomic op, but ~20 cycles is still plenty fast for most practical 
> purposes.

I think optimised our unlock_page in a way that it can do a
non-atomic unlock in the fastpath (no waiters) using 2 bits. In
practice it was still atomic but only because other page flags
operations could operate on ->flags at the same time.

I still have to get around to submitting the damn thing upstream,
but maybe if I bring it up here, I get the idea more reviewed :)

Anyway, the algorithm goes like this:

lock_page()
{
  if (test_and_set_bit_lock(PG_locked))
    lock_page_slow();
}

lock_page_slow()
{
  /* slowpath */
again:
  prepare_to_wait();
  set_bit(PG_waiters);
  if (test_and_set_bit_lock(PG_locked)) {
    schedule();
    goto again;
  }
}

wake_page_waiters()
{
  /* slowpath */
  clear_bit(PG_waiters);
  smp_mb__after_clear_bit(); /* order clear_bit store with wake_up_page load */
  wake_up_page(PG_locked);
}

unlock_page()
{
  clear_bit_unlock(PG_locked);
  if (unlikely(test_bit(PG_waiters)))
    wake_page_waiters();
}

We don't require any load/store barrier in the unlock_page fastpath
because the bits are in the same word, so cache coherency gives us a
sequential ordering anyway.

Now you notice the lock page slowpath can set bits without holding the
lock, at first glance you'd think the clear_bit to unlock has to be
atomic too then. But actually if we're careful, we can put them in seperate
parts of the word and use the sub-word operations on x86 to avoid the atomic
requirement. I'm not aware of any architecture in which operations to the
same word could be out of order.

Not only does this avoid all barriers (except acquire/release) in the
fastpaths, but it also avoids the unconditional load of the random
hash page waitqueue cacheline on unlock.

Anything applicable to your problem at hand?

next prev parent reply	other threads:[~2008-06-03  3:24 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-31  1:27 [PATCH 0/3] 64-bit futexes: Intro Ulrich Drepper
2008-05-31  2:13 ` Linus Torvalds
2008-05-31  3:14   ` Ulrich Drepper
2008-05-31  3:44     ` Linus Torvalds
2008-05-31  4:04       ` Ulrich Drepper
2008-05-31  4:16         ` Linus Torvalds
2008-05-31  4:23           ` Linus Torvalds
2008-05-31  4:38             ` Ulrich Drepper
2008-05-31  4:58               ` Linus Torvalds
2008-05-31 22:25               ` Linus Torvalds
2008-05-31 22:32                 ` Linus Torvalds
2008-06-02 18:54                 ` Ingo Molnar
2008-06-02 20:22                   ` Linus Torvalds
2008-06-02 23:03                     ` Ingo Molnar
2008-06-03  3:24                       ` Nick Piggin [this message]
2008-06-04 19:57                         ` Linus Torvalds
2008-06-04 20:38                           ` Linus Torvalds
2008-06-05  1:56                             ` Nick Piggin
2008-06-05  1:58                               ` Nick Piggin
2008-06-05  3:08                               ` Linus Torvalds
2008-06-05  4:29                                 ` Nick Piggin
2008-06-05  1:45                           ` Nick Piggin
2008-06-06  1:27                     ` Nick Piggin
2008-06-06  3:37                       ` Linus Torvalds
2008-06-06 11:53                         ` Nick Piggin
2008-06-06 15:01                           ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080603032403.GA17089@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=drepper@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.