public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Darren Hart <darren@dvhart.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michael Kerrisk <mtk.manpages@googlemail.com>,
	Davidlohr Bueso <dave@stgolabs.net>, Chris Mason <clm@fb.com>,
	"Carlos O'Donell" <carlos@redhat.com>,
	Torvald Riegel <triegel@redhat.com>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [RFC patch 4/7] futex: Add support for attached futexes
Date: Sun, 3 Apr 2016 13:16:28 +0200
Message-ID: <20160403111628.GA16916@gmail.com>
In-Reply-To: <20160402110035.753145539@linutronix.de>


* Thomas Gleixner <tglx@linutronix.de> wrote:

> The standard futex mechanism in the Linux kernel uses a global hash to store
> transient state. Collisions on that hash can lead to performance degradation
> and on real-time enabled kernels even to priority inversions.
> 
> To guarantee futexes without collisions on the global kernel hash, we provide
> a mechanism to attach to a futex. This creates futex private state which
> avoids hash collisions and on NUMA systems also cross node memory access.
> 
> To utilize this mechanism each thread has to attach to the futex before any
> other operations on that futex.
> 
> The inner workings are as follows:
> 
> Attach:
> 
>     sys_futex(FUTEX_ATTACH | FUTEX_ATTACHED, uaddr, ....);
> 
>     If this is the first attach to uaddr then a 'global state' object is
>     created. This global state contains a futex hash bucket and a futex_q
>     object which is enqueued into the global hash for reference so subsequent
>     attachers can find it. Each attacher takes a reference count on the
>     'global state' object and hashes 'uaddr' into a thread local hash. This
>     thread local hash is lock free and dynamically expanded to avoid
>     collisions. Each populated entry in the thread local hash stores 'uaddr'
>     and a pointer to the 'global state' object.
> 
> Futex ops:
> 
>     sys_futex(FUTEX_XXX | FUTEX_ATTACHED, uaddr, ....);
> 
>     If the attached flag is set, then 'uaddr' is hashed and the thread local
>     hash is checked whether the hash entry contains 'uaddr'. If no, an error
>     code is returned. If yes, the hash slot number is stored in the futex key
>     which is used for further operations on the futex. When the hash bucket is
>     looked up then attached futexes will use the slot number to retrieve the
>     pointer to the 'global state' object and use the embedded hash bucket for
>     the operation. Non-attached futexes just use the global hash as before.
> 
> Detach:
> 
>     sys_futex(FUTEX_DETACH | FUTEX_ATTACHED, uaddr, ....);
>    
>     Detach removes the entry in the thread local hash and decrements the
>     refcount on the 'global state' object. Once the refcount drops to zero the
>     'global state' object is removed from the global hash and destroyed.
> 
>     Thread exit cleans up the thread local hash and the 'global state' objects
>     as we do for other futex related storage already.
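
The attach / futex op / detach life cycle described above can be modelled in
plain userspace C. This is only a rough sketch under stated assumptions, not
the code from the RFC patch: every name (model_attach(), struct global_state,
TL_SLOTS, ...) is hypothetical, the thread local hash has a fixed size instead
of being dynamically expanded, the "global hash" is a simple list, and there
is no locking or lock-free handling at all.

```c
/*
 * Userspace model of the attach/lookup/detach life cycle. All names and
 * sizes are hypothetical; this is not the code from the RFC patch.
 * Single-threaded on purpose: no locking, no lock-free tricks.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TL_SLOTS 16	/* fixed size; the RFC expands the hash dynamically */

struct global_state {		/* stands in for the 'global state' object */
	const void *uaddr;
	int refcount;
	struct global_state *next;
};

struct tl_entry {		/* one populated thread local hash entry */
	const void *uaddr;
	struct global_state *gs;
};

struct thread_hash {
	struct tl_entry slot[TL_SLOTS];
};

static struct global_state *global_list;	/* models the global hash */

static size_t slot_of(const void *uaddr)
{
	return ((uintptr_t)uaddr >> 2) % TL_SLOTS;
}

/* Attach: find or create the global state, take a reference, fill the slot. */
static int model_attach(struct thread_hash *th, const void *uaddr)
{
	size_t i = slot_of(uaddr);
	struct global_state *gs;

	if (!uaddr || th->slot[i].uaddr)
		return -1;	/* already attached, or a collision (no expansion here) */

	for (gs = global_list; gs; gs = gs->next)
		if (gs->uaddr == uaddr)
			break;
	if (!gs) {
		gs = calloc(1, sizeof(*gs));
		if (!gs)
			return -1;
		gs->uaddr = uaddr;
		gs->next = global_list;
		global_list = gs;
	}
	gs->refcount++;
	th->slot[i].uaddr = uaddr;
	th->slot[i].gs = gs;
	return 0;
}

/* Futex op: return the slot number for an attached uaddr, or -1. */
static int model_lookup(const struct thread_hash *th, const void *uaddr)
{
	size_t i = slot_of(uaddr);

	return (uaddr && th->slot[i].uaddr == uaddr) ? (int)i : -1;
}

/* Detach: clear the slot, drop the reference, destroy on the last put. */
static int model_detach(struct thread_hash *th, const void *uaddr)
{
	int i = model_lookup(th, uaddr);
	struct global_state *gs, **pp;

	if (i < 0)
		return -1;
	gs = th->slot[i].gs;
	th->slot[i].uaddr = NULL;
	th->slot[i].gs = NULL;

	assert(gs->refcount > 0);
	if (--gs->refcount == 0) {
		/* Unlink from the model's global hash, then free. */
		for (pp = &global_list; *pp != gs; pp = &(*pp)->next)
			;
		*pp = gs->next;
		free(gs);
	}
	return 0;
}
```

In this model two "threads" are simply two struct thread_hash instances: both
attach to the same address and share one refcounted global_state, a lookup on
a non-attached address returns an error, and the last detach removes the
object from the list.
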
> 
> The thread local hash and the 'global state' object are allocated on the node
> on which the attaching thread runs.
> 
> Attached mode works with all futex operations and with both private and shared
> futexes. For operations which involve two futexes, i.e. FUTEX_REQUEUE_* both
> futexes have to be either attached or detached (like FUTEX_PRIVATE).
> 
> Why not auto attaching?
> 
>     Auto attaching has the following problems:
> 
>      - Memory consumption
>      - Life time issues
>      - Performance issues due to the necessary allocations

But those are mostly setup-only costs, right?

So I don't think this conclusion is necessarily true, even on smaller systems:

>     So, no. It must be opt-in and reserved for explicit isolation purposes.
> 
> A modified version of 'perf bench futex hash' shows the following results:

and look at the very measurable performance advantages on a small NUMA system:

  Before:

  >  Averaged 1451441 operations/sec (+- 3.65%), total secs = 60

  After:

  >  Averaged 1709712 operations/sec (+- 4.67%), total secs = 60

  > That's a performance increase of 18%.
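
For reference, the 18% figure follows directly from the two averages quoted
above. A minimal helper (percent_gain() is a made-up name, used here only to
spell out the arithmetic):

```c
/* Relative throughput gain in percent; illustration only. */
static double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

percent_gain(1451441.0, 1709712.0) evaluates to roughly 17.8, i.e. the quoted
18% after rounding.
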

... and I suspect that on a larger NUMA system the speedup is probably a lot more 
pronounced.

Also, the thing is, allocation/deallocation costs are a second order concern IMHO,
because most futex usage consists of lock/unlock operations.

So my prediction: in real life, large systems will want to have collision-free
futexes most of the time, and they won't want to modify every futex-using
application or library. So this is mostly a kernel-side system sizing
question/decision, not really a user-side, system-purpose policy question.

So making an ABI distinction, offloading the decision to every single application
that wants to use it, and hardcoding it into actual application source code via an
ABI is pretty much the _WORST_ way to go about it IMHO...

So how about this: don't add any ABI details, but make futexes auto-attached on 
NUMA systems (and obviously PREEMPT_RT systems)?

I.e. make it a build time or boot time decision at most, and don't start a messy
'should we use attached futexes or not' decision on the ABI side, which we know
from Linux ABI history won't be answered and utilized very well by applications!
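
A minimal sketch of what such a kernel-side decision could look like, written
as plain C so it can be tried in userspace. Everything here is an assumption
made for illustration: struct sys_desc, enum attach_override and
futex_auto_attach() do not exist in the kernel, and the override stands in for
whatever build time or boot time switch would actually be chosen.

```c
#include <stdbool.h>

/* Hypothetical build/boot time override; not an existing kernel option. */
enum attach_override { ATTACH_DEFAULT, ATTACH_FORCE_OFF, ATTACH_FORCE_ON };

/* Hypothetical summary of the system properties the policy looks at. */
struct sys_desc {
	int numa_nodes;		/* number of NUMA nodes */
	bool preempt_rt;	/* real-time enabled kernel */
};

/* Decide once, system-wide, whether futexes get attached automatically. */
static bool futex_auto_attach(const struct sys_desc *sys,
			      enum attach_override ovr)
{
	if (ovr == ATTACH_FORCE_ON)
		return true;
	if (ovr == ATTACH_FORCE_OFF)
		return false;

	/* Default: auto-attach on NUMA systems and on PREEMPT_RT systems. */
	return sys->preempt_rt || sys->numa_nodes > 1;
}
```

The point of the sketch is only that the inputs are system properties, so no
application or library has to carry the decision in its source code.
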

Thanks,

	Ingo

