Re: Futex queue_me/get_user ordering


From: Joe Seigh <jseigh_01@xemaps.com>
To: linux-kernel@vger.kernel.org
Subject: Re: Futex queue_me/get_user ordering
Date: Sun, 28 Nov 2004 12:36:57 -0500
Message-ID: <41AA0CB9.CB30715A@xemaps.com>
In-Reply-To: 20041126170649.GA8188@mail.shareable.org



Jamie Lokier wrote:
> 
> I've looked at the lost-wakeup problem with NPTL condition
> variables and 2.6 futex, with the help of Jakub's finely presented
> pseudo-code.  Unless I've made a mistake, it is fixable in userspace.
> 
> [ It might be more efficient to fix it in kernel space - on the other
>   hand, doing so might also make kernel futexes slower.  In general, I
>   prefer if the kernel futex semantics can be as "loose" as possible
>   to minimise the locking they are absolutely required to do.  Who
>   knows, we might come up with an algorithm that uses even less
>   cross-CPU traffic in the kernel, if the semantics permit it.
>   However, I appreciate that a more "atomic" kernel semantic is easier
>   to understand, and it is possible to implement that if it is really
>   worth doing.  I would like to see benchmarks proving it doesn't slow
>   down normal futex stress tests though.  It might not be slower at all. ]

[...]
>     5. Like 4, but in the kernel.  We change the kernel to _always_
>        retransmit a wakeup if it's received by the unqueue_me() in the
>        word-didn't-match branch.
> 
>        Effect: In the "Drowsy" state, a waiter may accept a WAKE token
>        but then it will offer it again so they are never lost from
>        "Sleeping" states.
> 
>        NOTE: This is NOT equivalent to changing the kernel to do
>        test-and-queue atomically.  With this change, a FUTEX_WAKE
>        operation can return to userspace _before_ the final
>        destination of the WAKE token decides to begin FUTEX_WAIT.
> 
>        This will result in spurious extra wakeups, erring too far the
>        other way, because of the difference from atomicity described
>        in the preceding paragraph.
> 
>        Therefore, I don't like this.  It would fix the NPTL condition
>        variables, but introduces two new problems:
> 
>            - It violates conservation of WAKE tokens (like energy and
>              momentum), which some other futex-using code may depend
>              on - unless the return value from FUTEX_WAIT is changed
>              to report 1 when it receives a token or 2 when it
>              forwards it successfully.
> 
>            - Some spurious wakeups at times when a wakeup is not
>              required.
> 
>            - No logical benefit over doing it in userspace, but
>              would take away flexibility if kernel always did it.
> 
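A toy, single-threaded model may make option 5's retransmit rule easier to follow. It is a sketch under simplifying assumptions, not kernel code. `struct toy_futex`, `toy_queue_me()`, `toy_unqueue_me()`, `toy_wake()`, `toy_wait_check()` and the `TOY_*` codes are all hypothetical stand-ins for the futex wait queue, `queue_me()`, `unqueue_me()`, FUTEX_WAKE and the word check inside FUTEX_WAIT. The FIFO wake order is an assumption, and the return codes 1 and 2 follow the return-value change suggested in the quoted text. Each call is one step of an interleaving that you drive by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one futex word and its wait queue; all names are made up. */
#define MAX_WAITERS      8
#define TOY_SLEEP        0  /* word matched: stay queued and sleep */
#define TOY_EWOULDBLOCK -1  /* word mismatched and nobody woke us */
#define TOY_RECEIVED     1  /* took a WAKE token, nobody to forward it to */
#define TOY_FORWARDED    2  /* took a WAKE token and passed it on */

struct toy_futex {
    unsigned word;
    int queue[MAX_WAITERS];  /* waiter ids, woken in FIFO order (assumed) */
    size_t len;
};

/* queue_me(): the waiter is queued before it reads the futex word. */
static void toy_queue_me(struct toy_futex *f, int id)
{
    assert(f->len < MAX_WAITERS);
    f->queue[f->len++] = id;
}

/* unqueue_me(): true if id was still queued, i.e. nobody had woken it. */
static bool toy_unqueue_me(struct toy_futex *f, int id)
{
    for (size_t i = 0; i < f->len; i++) {
        if (f->queue[i] != id)
            continue;
        for (size_t j = i + 1; j < f->len; j++)
            f->queue[j - 1] = f->queue[j];
        f->len--;
        return true;
    }
    return false;
}

/* FUTEX_WAKE: dequeue up to nr waiters from the head; return how many. */
static int toy_wake(struct toy_futex *f, int nr)
{
    int woken = 0;
    while (woken < nr && f->len > 0) {
        toy_unqueue_me(f, f->queue[0]);
        woken++;
    }
    return woken;
}

/*
 * Second half of FUTEX_WAIT, after queue_me(): compare the word.  In the
 * word-didn't-match branch, a failed unqueue means a waker already picked
 * this waiter while it was "Drowsy", so option 5 offers the token again.
 */
static int toy_wait_check(struct toy_futex *f, int id, unsigned expected)
{
    if (f->word == expected)
        return TOY_SLEEP;
    if (toy_unqueue_me(f, id))
        return TOY_EWOULDBLOCK;
    return toy_wake(f, 1) ? TOY_FORWARDED : TOY_RECEIVED;
}
```

To step through it: waiter 2 calls `toy_queue_me()` but has not yet read the word; waiter 1 queues behind it, sees the word still 0 and sleeps; the signaller sets the word to 1 and calls `toy_wake(&f, 1)`, which picks waiter 2 at the head of the queue. Waiter 2 then sees the mismatch, fails to unqueue itself, and forwards the token to waiter 1 (`TOY_FORWARDED`). In this toy scenario, without the final `toy_wake()` in `toy_wait_check()`, waiter 1 would stay queued with no wakeup left to deliver.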

I think this is similar to a solution that I proposed elsewhere: you wake up
some other thread, if any, that is waiting on the futex.  This breaks what you
call conservation of WAKE tokens, but as far as I can tell, wait morphing with
FUTEX_CMP_REQUEUE already does that.  A FUTEX_WAIT that has been requeued onto
another futex could return EINTR instead of zero (one of the reasons you can't
loop on EINTRs in the cond wait code).

I did an alternative lock-free implementation of pthread condition variables,
with a workaround of sorts for the futex wake preemption problem I mentioned
earlier.  It gives a 3x to 200x performance improvement, depending on the
workload, so naturally I would be interested in a solution that doesn't require
a userspace bottleneck.

Joe Seigh

