From: Michel Lespinasse <walken@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Darren Hart <dvhltc@us.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation
Date: Tue, 17 Nov 2009 20:21:28 -0800
Message-ID: <20091118042128.GC23808@google.com>
In-Reply-To: <alpine.LFD.2.01.0911170708110.9384@localhost.localdomain>

On Tue, Nov 17, 2009 at 07:24:09AM -0800, Linus Torvalds wrote:
> The FUTEX_SET_WAIT concept seems well-defined, although it sounds more 
> like a FUTEX_CMPXCHG_WAIT to me than a "SET" operation. I'm not entirely 
> sure that we really want to do the CMPXCHG in the kernel rather than in 
> user space, since lock stealing generally isn't a problem, but I don't 
> think it's _wrong_ to add this concept.
> 
> In fact, CMPXCHG is generally seen to be the "fundamental" base for 
> implementing locking, so in that sense it makes perfect sense to have it 
> as a FUTEX model.

My first version used that name, but it did *NOT* block if val2 (now
renamed setval) was already set in the futex. It turned out that blocking
in that situation helps my use case, so I changed the operation accordingly
and renamed it FUTEX_SET_WAIT (I had a CAS model in mind, though it is
still similar to cmpxchg in that it just returns if uval is neither 'val'
nor 'setval').
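To make the semantics above concrete, here is a minimal userspace model of
the FUTEX_SET_WAIT decision, assuming a 32-bit futex word and C11 atomics.
It is a sketch, not the patch itself: set_wait_decide() and the
SW_* result names are hypothetical, and the model only reports what the
kernel would do rather than actually sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical result codes: SW_BLOCK = go to sleep,
 * SW_EWOULDBLOCK = return to userspace with -EWOULDBLOCK. */
enum set_wait_result { SW_BLOCK, SW_EWOULDBLOCK };

/*
 * Model of FUTEX_SET_WAIT(uaddr, val, setval). In the kernel, this check
 * and the decision to sleep would have to be atomic with respect to
 * wakers; here we only return the decision.
 */
static enum set_wait_result set_wait_decide(_Atomic uint32_t *uaddr,
                                            uint32_t val, uint32_t setval)
{
    uint32_t seen = val;

    /* uval == val: atomically replace it with setval, then block. */
    if (atomic_compare_exchange_strong(uaddr, &seen, setval))
        return SW_BLOCK;

    /* On failure, 'seen' holds the current value. If it is already
     * setval, block as well (the earlier CMPXCHG-style version
     * returned here instead). */
    if (seen == setval)
        return SW_BLOCK;

    /* Neither val nor setval: do not block. */
    return SW_EWOULDBLOCK;
}
```

The second branch is the change described above: a waiter that finds the
futex already marked with setval still goes to sleep instead of bouncing
back to userspace.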

> That said, I personally think the adaptive wait model is (a) more likely 
> to fix many performance issues and (b) a bit more high-level concept, so I 
> like Peter's patch too, but I don't see that the patches would really be 
> mutually exclusive.
> 
> Of course, it's possible that Michel's performance problem is fixed by the 
> adaptive approach too, in which case the FUTEX_SET_WAIT (or _CMPXCHG_WAIT) 
> patch is just fundamentally less interesting. But some people do need 
> fairness - even when it's bad for performance - so...
> 
> One thing that does strike me is that _if_ we want to do both interfaces, 
> then I would assume that we quite likely also want to have an adaptive 
> version of the FUTEX_SET|CMPXCHG_WAIT thing. Which perhaps implies that 
> the "ADAPTIVE" part should be a bitflag in the command value?

I like the adaptive approach as well, though I'm not sure yet if it'd work
for us. I can try it but it'll take a bit of time.
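Linus's suggestion of making "ADAPTIVE" a bitflag in the command value
follows the way the existing futex op argument already carries modifiers
(FUTEX_PRIVATE_FLAG and FUTEX_CLOCK_REALTIME are masked off to recover the
command). A minimal sketch of such decoding follows; the constants for
FUTEX_WAIT, FUTEX_WAKE and the two real flags mirror <linux/futex.h>, while
HYP_FUTEX_ADAPTIVE_FLAG, HYP_FUTEX_CMD_MASK and decode_futex_op() are
hypothetical and not part of any kernel ABI.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror <linux/futex.h>. */
#define FUTEX_WAIT           0
#define FUTEX_WAKE           1
#define FUTEX_PRIVATE_FLAG   128
#define FUTEX_CLOCK_REALTIME 256

/* Hypothetical: an "adaptive" modifier bit, not a real kernel flag. */
#define HYP_FUTEX_ADAPTIVE_FLAG 512
#define HYP_FUTEX_CMD_MASK \
    (~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME | HYP_FUTEX_ADAPTIVE_FLAG))

/* The new bit must not collide with the existing modifiers. */
static_assert((HYP_FUTEX_ADAPTIVE_FLAG &
               (FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)) == 0,
              "adaptive flag overlaps an existing futex modifier");

struct futex_op {
    int cmd;             /* base command, e.g. FUTEX_WAIT */
    bool private_futex;  /* FUTEX_PRIVATE_FLAG was set */
    bool adaptive;       /* hypothetical adaptive-spin modifier was set */
};

/* Split an op value into its command and modifier bits. */
static struct futex_op decode_futex_op(int op)
{
    struct futex_op d = {
        .cmd = op & HYP_FUTEX_CMD_MASK,
        .private_futex = (op & FUTEX_PRIVATE_FLAG) != 0,
        .adaptive = (op & HYP_FUTEX_ADAPTIVE_FLAG) != 0,
    };
    return d;
}
```

With this layout, any wait-style command (including a SET/CMPXCHG variant)
could gain an adaptive version without adding a separate op number.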


One difficulty with adaptive spinning is that we want to avoid deadlocks.
If two threads end up spinning in-kernel waiting for each other, we had
better have preemption enabled, or detect and handle the situation somehow.


Another aspect I dislike is that this would impose a fixed format on the
futex for storing the TID. I would prefer to have several bits in the futex
available for userspace to use however it wants. 8 bits would likely be
enough, which leaves 24 for the TID: enough for us, but I have no idea
whether that's good enough for upstream inclusion. If that's not possible,
one possible compromise could be:

- userspace passes a TID (which it extracted from the futex value; but kernel
  does not necessarily know how)
- kernel spins until that TID goes to sleep, or the futex value is not equal
  to val or setval anymore
- if val != setval and the futex value is val, set it to setval
- if the futex value is setval, block; otherwise return -EWOULDBLOCK.

If the lock got stolen from a different thread, userspace can decide to
retry with or without adaptive spinning.
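The compromise steps above, combined with the 8-bit/24-bit split mentioned
earlier, might be modeled in userspace roughly as follows. This is a sketch
under stated assumptions: the bit layout (TID in the low 24 bits),
spin_set_wait(), the owner_running callback standing in for "has that TID
gone to sleep?", and the spin_budget cap are all hypothetical. The budget
is not part of the proposal; it is one simple way to keep two mutual
spinners from spinning forever, per the deadlock concern above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high 8 bits free for userspace, low 24 bits = TID. */
#define TID_BITS 24u
#define TID_MASK ((1u << TID_BITS) - 1u)

static inline uint32_t pack_futex(uint32_t user_bits, uint32_t tid)
{
    assert(user_bits <= 0xffu && tid <= TID_MASK);
    return (user_bits << TID_BITS) | tid;
}

/* Userspace extracts the TID itself; the kernel need not know the layout. */
static inline uint32_t futex_tid(uint32_t v) { return v & TID_MASK; }

/* Hypothetical stand-in for "is this thread currently on a CPU?" */
typedef bool (*owner_running_fn)(uint32_t tid);

/* Trivial stub for testing: the owner is always asleep. */
static bool owner_never_running(uint32_t tid) { (void)tid; return false; }

enum spin_wait_result { SPW_BLOCK, SPW_EWOULDBLOCK };

static enum spin_wait_result
spin_set_wait(_Atomic uint32_t *uaddr, uint32_t val, uint32_t setval,
              uint32_t tid, owner_running_fn owner_running,
              unsigned long spin_budget)
{
    /* Step 1: spin while the named owner runs and the futex still holds
     * val or setval (bounded by spin_budget). */
    while (spin_budget > 0 && owner_running(tid)) {
        uint32_t now = atomic_load(uaddr);
        if (now != val && now != setval)
            break;
        spin_budget--;
    }

    /* Step 2: if val != setval and the futex still holds val, set setval. */
    uint32_t cur;
    if (val != setval) {
        cur = val;
        /* On failure, cur receives the value actually observed. */
        if (atomic_compare_exchange_strong(uaddr, &cur, setval))
            cur = setval;
    } else {
        cur = atomic_load(uaddr);
    }

    /* Step 3: block only if the futex now holds setval. */
    return cur == setval ? SPW_BLOCK : SPW_EWOULDBLOCK;
}
```

On SPW_EWOULDBLOCK, userspace would re-read the futex and, as described
above, decide whether to retry with or without adaptive spinning.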

That would be the most generic interface I can think of, though it's
starting to be a LOT of parameters - actually, too many to pass through
the _syscall6 interface.


I also like Darren's suggestion to do a FUTEX_SET_WAIT_REQUEUE_PI,
but it's hitting the same 'too many parameters' limitation as well :/

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

Thread overview: 33+ messages
2009-11-17  7:46 [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-17  8:18 ` Ingo Molnar
2009-11-17  8:55   ` Peter Zijlstra
2009-11-17 16:16     ` Darren Hart
2009-11-18  3:37       ` Michel Lespinasse
2009-11-18  5:29         ` Darren Hart
2009-11-24 14:39         ` [PATCH 0/3] perf bench: Add new benchmark for futex subsystem Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 1/3] perf bench: Add wrappers for atomic operation of GCC Hitoshi Mitake
2009-11-24 16:20           ` Darren Hart
2009-11-26  5:44             ` Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 2/3] perf bench: Add new files for futex performance test Hitoshi Mitake
2009-11-24 16:33           ` Darren Hart
2009-11-26  5:53             ` Hitoshi Mitake
2009-11-26  5:56               ` [PATCH] futextest: Make locktest() in harness.h more general Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 3/3] perf bench: Fix misc files to build files related to futex Hitoshi Mitake
2009-11-18 22:13       ` [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-19  6:51         ` Darren Hart
2009-11-19 17:03         ` Darren Hart
     [not found]           ` <8d20b11a0911191325u49624854u6132594f13b0718c@mail.gmail.com>
2009-11-19 23:13             ` Darren Hart
2009-11-21  2:36               ` Michel Lespinasse
2009-11-23 17:21                 ` Darren Hart
2009-11-17 17:24     ` Ingo Molnar
2009-11-17 17:27       ` Darren Hart
2009-11-18  1:49       ` Hitoshi Mitake
2009-11-17  8:50 ` Peter Zijlstra
2009-11-17 15:24   ` Linus Torvalds
2009-11-18  4:21     ` Michel Lespinasse [this message]
2009-11-18  5:40       ` Darren Hart
2009-11-30 22:09   ` Darren Hart
2009-12-03  6:55   ` [PATCH] futex: add FUTEX_SET_WAIT operation (and ADAPTIVE) Darren Hart
2009-11-17 17:22 ` [PATCH] futex: add FUTEX_SET_WAIT operation Darren Hart
2009-11-18  3:29   ` Michel Lespinasse
2009-11-18  0:13 ` Darren Hart