Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)


From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Nicholas Miell <nmiell@comcast.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, josh@joshtriplett.org,
	dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, linux-kernel@vger.kernel.org,
	Nick Piggin <npiggin@suse.de>,
	Chris Friesen <cfriesen@nortel.com>,
	Frédéric Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)
Date: Thu, 4 Mar 2010 11:03:09 -0500
Message-ID: <20100304160309.GA12634@Krystal>
In-Reply-To: <20100304155218.GA26468@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > 
> > > I am proposing this patch for the 2.6.34 merge window, as I think it is 
> > > ready for inclusion.
> > 
> > It's a bit late for this merge window i think.
> 
> OK, no problem. Thanks for taking time to review the patch. See below for
> response to your comments.
> 
> > 
> > > Here is an implementation of a new system call, sys_membarrier(), which 
> > > executes a memory barrier on all threads of the current process. It can be 
> > > used to distribute the cost of user-space memory barriers asymmetrically by 
> > > transforming pairs of memory barriers into pairs consisting of 
> > > sys_membarrier() and a compiler barrier. For synchronization primitives that 
> > > distinguish between read-side and write-side (e.g. userspace RCU, rwlocks), 
> > > the read-side can be accelerated significantly by moving the bulk of the 
> > > memory barrier overhead to the write-side.
> > 
> > Why is this such a low level and still special-purpose facility?
> > 
> > Synchronization facilities for high-performance threading may want to do a bit 
> > more than just execute a barrier instruction on another CPU that has a 
> > relevant thread running.
> 
> Yep, I'm aware of that.
> 
> > 
> > You cited signal based numbers:
> > 
> >  > (what we have now, with dynamic sys_membarrier check, expedited scheme)
> >  > memory barriers in reader: 907693804 reads, 817793 writes
> >  > sys_membarrier scheme:    4316818891 reads, 503790 writes
> >  >
> >  > (dynamic sys_membarrier check, non-expedited scheme)
> >  > memory barriers in reader: 907693804 reads, 817793 writes
> >  > sys_membarrier scheme:    8698725501 reads,    313 writes
> > 
> > Much of that signal handler overhead is i think due to:
> > 
> >  - FPU/SSE context save/restore
> >  - the need to wake up, run and deschedule all threads
> 
> This second point hurts, especially if we have more threads than processors.
> 
> > 
> > Instead i'd suggest for you to try to implement user-space RCU speedups not 
> > via the new sys_membarrier() syscall, but via two new signal extensions:
> > 
> >  - SA_NOFPU: on x86 to skip the FPU/SSE save/restore, for such fast in/out special 
> >    purpose signal handlers? (can whip up a quick patch for you if you want)
> 
> This could help.
> 
> > 
> >  - SA_RUNNING: a way to signal only running threads - as a way for user-space 
> >    based concurrency control mechanisms to deschedule running threads (or, like
> >    in your case, to implement barrier / garbage collection schemes).
> > 
> >    ( Note: to properly sync back you'll also need an sa_info field to tell
> >      target tasks how many tasks were woken up. That way a futex can be used 
> >      as a semaphore to signal back to the issuing thread, and make it all 
> >      properly event triggered and nicely scalable. Also, queued signals are a 
> >      must for such a scheme. )
> 
> Ah, nice! I wondered how you'd propose to deal with that one. It was actually my
> main problem: how to wait for all running threads to complete their execution.
> This added sa_info count and futex usage will indeed deal with the problem. And
> rt_sigqueueinfo() will ensure that we don't collapse multiple concurrent
> requests for execution of the same signal. For syncing back, I think we can do
> this without modifying sa_info. Simply passing a pointer to the counter to
> increment in the sigval value to rt_sigqueueinfo() should do the trick.

Hrm, I overlooked the fact that this counter must be written by the signal
sender. So we probably need to add a field to sa_info as you proposed.
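
For illustration, here is a minimal sketch of the acknowledgment scheme
discussed above. It uses only existing POSIX interfaces (sigqueue() and
SA_SIGINFO), not the proposed SA_RUNNING extension, and it signals the calling
process itself to stay short. The struct barrier_req and
queue_barrier_requests() names are hypothetical, and using SIGRTMIN assumes
the application leaves that signal alone. Here the sender can fill in the
expected count because it knows how many signals it queued; with SA_RUNNING
only the kernel would know that number, which is why an extra field would be
needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-request state; not a kernel or liburcu structure. */
struct barrier_req {
	atomic_int acks;	/* incremented once per handler invocation */
	int expected;		/* written by the sender: signals it queued */
};

/* Handler: execute a full memory barrier, then acknowledge the request. */
static void barrier_handler(int sig, siginfo_t *info, void *uctx)
{
	struct barrier_req *req = info->si_value.sival_ptr;

	(void)sig;
	(void)uctx;
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add(&req->acks, 1);
}

/*
 * Queue n requests to the calling process while the signal is blocked,
 * unblock it, and wait for every acknowledgment.
 * Returns the number of acknowledgments, or -1 on error.
 */
static int queue_barrier_requests(int n)
{
	struct barrier_req req;
	struct sigaction sa, old_sa;
	union sigval val;
	sigset_t set;
	int sig = SIGRTMIN;	/* assumption: the application does not use it */
	int i;

	atomic_init(&req.acks, 0);
	val.sival_ptr = &req;	/* pointer to the counter travels in sigval */

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = barrier_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, &old_sa) != 0)
		return -1;

	sigemptyset(&set);
	sigaddset(&set, sig);
	if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
		return -1;

	/* Real-time signals are queued: n requests yield n handler runs. */
	for (i = 0; i < n; i++)
		if (sigqueue(getpid(), sig, val) != 0)
			break;
	req.expected = i;	/* only wait for what was actually queued */

	sigprocmask(SIG_UNBLOCK, &set, NULL);

	/* Busy-wait for brevity; a futex wait would replace this loop. */
	while (atomic_load(&req.acks) < req.expected)
		;

	sigaction(sig, &old_sa, NULL);
	return i == n ? atomic_load(&req.acks) : -1;
}
```

Calling queue_barrier_requests(3) should return 3: the three queued requests
are not collapsed into one, and each handler run is counted separately.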

Thanks,

Mathieu

> 
> > 
> > My estimation is that it will be _much_ faster than the naive signal based 
> > approach - maybe even quite comparable to an open-coded sys_membarrier():
> 
> Yes, especially given that your proposal permits sending all signals in
> "broadcast to all running threads" mode, in a single system call.
> 
> > 
> >  - as most of the overhead in a real scenario ought to be the IPI sending and 
> >    latency - not the syscall entry/exit. (with a signal approach we'd still go
> >    into target thread user-mode, so one more syscall exit+re-entry)
> > 
> >  - or for the common case where there are no other threads running, we are 
> >    just in/out of SA_RUNNING without having to do any synchronization. In that
> >    case it should be quite close to sys_membarrier() - modulo some minimal 
> >    signal API overhead. [which we could optimize some more, if it's visible in
> >    your benchmarks.]
> > 
> > Signals per se are pretty scalable these days - now that most of the fastpaths 
> > are decoupled from tasklist_lock and everything is RCU-ized.
> > 
> > Further benefits are:
> > 
> >  - both SA_NOFPU and SA_RUNNING could be used by a _lot_ more user-space 
> >    facilities than just user-space RCU.
> > 
> >  - synergetic effects: growing some real high-performance facility based on 
> >    signals would ensure further signal speedups in the future as well. 
> >    Currently any server app that runs into signal limitations tends to shy 
> >    away from them and use some different (and often inferior) signalling 
> >    scheme. It would be better to extend signals with 'lightweight' capabilities 
> >    as well.
> > 
> > All in all, signals are used by like 99.9% of Linux apps, while 
> > sys_membarrier() would be used only by [WAG] 0.00001% of them.
> > 
> > So before we can merge this (at least via the RCU tree, which you have sent it 
> > to), i'd like to see you try _much_, _MUCH_ harder to fix the very obvious 
> > signal overhead performance problems you have demoed via the numbers above so 
> > nicely.
> 
> I think we can start with the SA_RUNNING+modified sa_info approach to signal
> only running threads. I expect that much of the benefit will come from there.
> Then, from that point, we can see if SA_NOFPU provides a significant performance
> improvement.
> 
> Now, a very basic question: in the signal-based approach I currently use, I
> reserve SIGUSR1 _from my liburcu library_ (yeah, that's pretty ugly). The
> problem is: how can I reserve new signal numbers from a library point of view
> without having the applications use them too? We have room left in the rt
> signals numbers, so maybe this is a lesser problem than with standard signals,
> which are quite full, but the problem of making sure the application does not
> conflict stays.
> 
> > 
> > If _that_ fails, and if we get all the fruits of that, _then_ we might 
> > perhaps, with a lot of hesitation, concede defeat and think about adding yet 
> > another syscall.
> > 
> > I know it's cool to add a brand new syscall - but, unfortunately, in practice 
> > it doesn't help Linux apps all that much. (at least until we have tools/klibc/ 
> > or so.)
> > 
> > [ There's also a few small cleanliness details i noticed in your patch: enums
> >   are a tiny bit nicer for ABIs than #define's, the #ifdef SMP is ugly, etc. - 
> >   but it doesn't really matter much as i think we should concentrate on the
> >   scalability problems of signals first. ]
> 
> OK, let's do that.
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com
