public inbox for linux-api@vger.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: Waiman Long <waiman.long@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Darren Hart <dvhart@linux.intel.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Andi Kleen <andi@firstfloor.org>, Ingo Molnar <mingo@kernel.org>,
	Davidlohr Bueso <davidlohr@hp.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	Jason Low <jason.low2@hp.com>,
	Scott J Norton <scott.norton@hp.com>,
	Robert Haas <robertmhaas@gmail.com>
Subject: Re: [RFC PATCH 0/5] futex: introduce an optimistic spinning futex
Date: Tue, 22 Jul 2014 22:52:39 +0200 (CEST)
Message-ID: <alpine.DEB.2.10.1407222243210.23352@nanos>
In-Reply-To: <53CEC8AC.7020700@hp.com>

On Tue, 22 Jul 2014, Waiman Long wrote:

> On 07/22/2014 05:59 AM, Thomas Gleixner wrote:
> > On Tue, 22 Jul 2014, Peter Zijlstra wrote:
> > > On Tue, Jul 22, 2014 at 10:39:17AM +0200, Thomas Gleixner wrote:
> > > > On Tue, 22 Jul 2014, Peter Zijlstra wrote:
> > > > > Anyway, there is one big fail in the entire futex stack that we 'need'
> > > > > to sort some day and that is NUMA. Some people (again database people)
> > > > > explicitly do not use futexes and instead use sysvsem because of this.
> > > > > 
> > > > > > The problem with numa futexes is that because they're vaddr based
> > > > > > there is no (persistent) node information. You always end up having
> > > > > > to fall back to looking in all nodes before you can guarantee there
> > > > > > is no matching futex.
> > > > > > 
> > > > > > One way to achieve it is by extending the futex value to include a
> > > > > > node number, but that's obviously a complete ABI break. Then again,
> > > > > > it should be pretty straight fwd, since the node number doesn't need
> > > > > > to be part of the actual atomic update part, just part of the
> > > > > > userspace storage.
> > > > So you want per node hash buckets, right? Fair enough, but how do you
> > > > make sure, that no thread/process on a different node is fiddling with
> > > > that "node bound" futex as well?
> > > You don't and that should work just as well, just slower. But since the
> > > node id is in the futex 'value' we'll always end up in the right
> > > > node-hash, even if it's a remote one.
> > > 
> > > So yes, per node hashes, and a persistent futex->node map.
> > Which works fine as long as you only have the futex_q on the stack of
> > the blocked task. If user space is lying to you, then you just end up
> > with a bunch of threads sleeping forever. Who cares?
> > 
> > But if you create independent kernel state, which we have with
> > pi_state and which you need for finegrained locking and further
> > spinning fun, you open up another can of worms. Simply because this
> > would enable rogue user space to create multiple instances of the
> > kernel internal state. I can predict the CVEs resulting from that
> > even without using a crystal ball.
> > 
> > Thanks,
> > 
> > 	tglx
> 
> I think NUMA futex, if implemented, is a completely independent piece that
> has no direct relationship with the optimistic spinning futex. It should be a
> separate patch and not be mixed with the optimistic spinning patch, which
> would only make the latter more complicated.

Bullshit. Of course it handles separate issues, but Peter is
completely right, that the NUMA aspect is a far bigger issue than the
optimistic spinning stuff. Do you have an idea what the costs of cross
node memory access and cacheline bouncing are? Obviously not, as your
only interest seems to be to slap optimistic spinning onto every place
which deals with locking.

And if you had tried to read _AND_ understand the discussion above,
you might have noticed that providing NUMA awareness requires a lot of
the functionality which is needed for optimistic spinning as well.

But no, you did not even take the time to think about it, you just
claim that it makes your optimistic stuff more complicated. Just get
it, there is a world outside of optimistic spinning.

Thanks,

	tglx
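
For readers following the per-node hash discussion quoted above, here is a
minimal userspace sketch of the lookup Peter describes: a node number stored
next to the futex word selects a per-node hash table, and the address picks
the bucket within it. Nothing here is taken from the patch set or the kernel;
struct numa_futex, node_hash, futex_bucket(), the table sizes and the hash
constant are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES         4                    /* hypothetical node count */
#define BUCKET_BITS      8                    /* hypothetical table size */
#define BUCKETS_PER_NODE (1u << BUCKET_BITS)

/*
 * Hypothetical userspace layout: the node number lives next to the
 * futex word and is not part of the atomically updated value.
 */
struct numa_futex {
	uint32_t val;   /* the futex word itself */
	uint32_t node;  /* persistent node hint, only read by the lookup */
};

struct bucket {
	unsigned int nr_waiters;
};

/* One hash table per node instead of a single global one. */
static struct bucket node_hash[NR_NODES][BUCKETS_PER_NODE];

/* Simple multiplicative hash of the address; the constant is arbitrary. */
static uint32_t hash_addr(uintptr_t addr)
{
	uint32_t h = (uint32_t)(addr >> 2) * 0x9E3779B1u;

	return h >> (32 - BUCKET_BITS);  /* keep the top BUCKET_BITS bits */
}

/* Pick the bucket from the stored node number, then from the address. */
static struct bucket *futex_bucket(const struct numa_futex *f)
{
	uint32_t node;

	assert(f != NULL);
	node = f->node % NR_NODES;  /* keep a bogus node id inside the array */

	return &node_hash[node][hash_addr((uintptr_t)f)];
}
```

Because the node field is plain userspace memory, a waiter and a waker that
disagree on it simply hash to different buckets, which is the "threads
sleeping forever" case mentioned above. The modulo only keeps the index in
range; it does nothing about the concern over independent kernel state such
as pi_state.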
