Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Nicholas Miell <nmiell@comcast.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, josh@joshtriplett.org,
	dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, linux-kernel@vger.kernel.org,
	Nick Piggin <npiggin@suse.de>,
	Chris Friesen <cfriesen@nortel.com>,
	Frédéric Weisbecker <fweisbec@gmail.com>
Subject: Re: [PATCH -tip] introduce sys_membarrier(): process-wide memory barrier (v9)
Date: Thu, 4 Mar 2010 11:03:09 -0500
Message-ID: <20100304160309.GA12634@Krystal>
In-Reply-To: <20100304155218.GA26468@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> > 
> > > I am proposing this patch for the 2.6.34 merge window, as I think it is 
> > > ready for inclusion.
> > 
> > It's a bit late for this merge window i think.
> 
> OK, no problem. Thanks for taking time to review the patch. See below for
> response to your comments.
> 
> > 
> > > Here is an implementation of a new system call, sys_membarrier(), which 
> > > executes a memory barrier on all threads of the current process. It can be 
> > > used to distribute the cost of user-space memory barriers asymmetrically by 
> > > transforming pairs of memory barriers into pairs consisting of 
> > > sys_membarrier() and a compiler barrier. For synchronization primitives that 
> > > distinguish between read-side and write-side (e.g. userspace RCU, rwlocks), 
> > > the read-side can be accelerated significantly by moving the bulk of the 
> > > memory barrier overhead to the write-side.
> > 
> > Why is this such a low level and still special-purpose facility?
> > 
> > Synchronization facilities for high-performance threading may want to do a bit 
> > more than just execute a barrier instruction on another CPU that has a 
> > relevant thread running.
> 
> Yep, I'm aware of that.
> 
> > 
> > You cited signal based numbers:
> > 
> >  > (what we have now, with dynamic sys_membarrier check, expedited scheme)
> >  > memory barriers in reader: 907693804 reads, 817793 writes
> >  > sys_membarrier scheme:    4316818891 reads, 503790 writes
> >  >
> >  > (dynamic sys_membarrier check, non-expedited scheme)
> >  > memory barriers in reader: 907693804 reads, 817793 writes
> >  > sys_membarrier scheme:    8698725501 reads,    313 writes
> > 
> > Much of that signal handler overhead is i think due to:
> > 
> >  - FPU/SSE context save/restore
> >  - the need to wake up, run and deschedule all threads
> 
> This second point hurts, especially if we have more threads than processors.
> 
> > 
> > Instead i'd suggest for you to try to implement user-space RCU speedups not 
> > via the new sys_membarrier() syscall, but via two new signal extensions:
> > 
> >  - SA_NOFPU: on x86 to skip the FPU/SSE save/restore, for such fast in/out special 
> >    purpose signal handlers? (can whip up a quick patch for you if you want)
> 
> This could help.
> 
> > 
> >  - SA_RUNNING: a way to signal only running threads - as a way for user-space 
> >    based concurrency control mechanisms to deschedule running threads (or, like
> >    in your case, to implement barrier / garbage collection schemes).
> > 
> >    ( Note: to properly sync back you'll also need an sa_info field to tell
> >      target tasks how many tasks were woken up. That way a futex can be used 
> >      as a semaphore to signal back to the issuing thread, and make it all 
> >      properly event triggered and nicely scalable. Also, queued signals are a 
> >      must for such a scheme. )
> 
> Ah, nice! I wondered how you'd propose to deal with that one. It was actually my
> main problem: how to wait for all running threads to complete their execution.
> This added sa_info count and futex usage will indeed deal with the problem. And
> rt_sigqueueinfo() will ensure that we don't collapse multiple concurrent
> requests for execution of the same signal. For syncing back, I think we can do
> this without modifying sa_info. Simply passing a pointer to the counter to
> increment in the sigval value to rt_sigqueueinfo() should do the trick.

Hrm, I overlooked the fact that this counter must be written by the signal
sender. So we probably need to add a field to sa_info as you proposed.
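
A minimal sketch of the counter-pointer idea with the existing API, assuming a
single-threaded sender that signals its own process. BARRIER_SIG and the helper
names are hypothetical, and the sched_yield() loop only stands in for the futex
wait discussed above. Here the sender knows how many signals it queued because
it queued each one itself; with an SA_RUNNING-style broadcast only the kernel
would know that number, which is why the count would have to come back through
a field filled in by the sender side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical signal choice; a library would still have to reserve it. */
#define BARRIER_SIG SIGRTMIN

/*
 * Handler: execute a full memory barrier, then acknowledge by incrementing
 * the counter whose address was passed in the queued sigval.
 */
static void barrier_handler(int sig, siginfo_t *info, void *ctx)
{
	atomic_int *acks = info->si_value.sival_ptr;

	(void)sig;
	(void)ctx;
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add(acks, 1);
}

/* Install the handler with SA_SIGINFO so si_value is available. */
static int install_barrier_handler(void)
{
	struct sigaction sa = { 0 };

	sa.sa_sigaction = barrier_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(BARRIER_SIG, &sa, NULL);
}

/*
 * Queue nr requests to the current process and wait until each one has been
 * acknowledged. Real-time signals are queued, so nr requests produce nr
 * handler runs instead of being collapsed into one.
 * Returns the number of acknowledgements, or -1 if queueing failed.
 */
static int request_barriers(int nr)
{
	atomic_int acks = 0;
	union sigval val = { .sival_ptr = &acks };

	for (int i = 0; i < nr; i++)
		if (sigqueue(getpid(), BARRIER_SIG, val) != 0)
			return -1;

	/* Placeholder for a futex-based wait. */
	while (atomic_load(&acks) < nr)
		sched_yield();

	return atomic_load(&acks);
}
```

Calling install_barrier_handler() once and then request_barriers(n) runs the
handler n times. In a multi-threaded process, sigqueue() to the process
delivers each signal to an arbitrary thread that does not block it, so this
sketch does not by itself reach every running thread.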

Thanks,

Mathieu

> 
> > 
> > My estimation is that it will be _much_ faster than the naive signal based 
> > approach - maybe even quite comparable to an open-coded sys_membarrier():
> 
> Yes, especially given that your proposal allows sending all signals in
> "broadcast to all running threads" mode, in a single system call.
> 
> > 
> >  - as most of the overhead in a real scenario ought to be the IPI sending and 
> >    latency - not the syscall entry/exit. (with a signal approach we'd still go
> >    into target thread user-mode, so one more syscall exit+re-entry)
> > 
> >  - or for the common case where there are no other threads running, we are 
> >    just in/out of SA_RUNNING without having to do any synchronization. In that
> >    case it should be quite close to sys_membarrier() - modulo some minimal 
> >    signal API overhead. [which we could optimize some more, if it's visible in
> >    your benchmarks.]
> > 
> > Signals per se are pretty scalable these days - now that most of the fastpaths 
> > are decoupled from tasklist_lock and everything is RCU-ized.
> > 
> > Further benefits are:
> > 
> >  - both SA_NOFPU and SA_RUNNING could be used by a _lot_ more user-space 
> >    facilities than just user-space RCU.
> > 
> >  - synergetic effects: growing some real high-performance facility based on 
> >    signals would ensure further signal speedups in the future as well. 
> >    Currently any server app that runs into signal limitations tends to shy 
> >    away from them and use some different (and often inferior) signalling 
> >    scheme. It would be better to extend signals with 'lightweight' capabilities 
> >    as well.
> > 
> > All in all, signals are used by like 99.9% of Linux apps, while 
> > sys_membarrier() would be used only by [WAG] 0.00001% of them.
> > 
> > So before we can merge this (at least via the RCU tree, which you have sent it 
> > to), i'd like to see you try _much_, _MUCH_ harder to fix the very obvious 
> > signal overhead performance problems you have demoed via the numbers above so 
> > nicely.
> 
> I think we can start with the SA_RUNNING+modified sa_info approach to signal
> only running threads. I expect that much of the benefit will come from there.
> Then, from that point, we can see if SA_NOFPU provides a significant performance
> improvement.
> 
> Now, a very basic question: in the signal-based approach I currently use, I
> reserve SIGUSR1 _from my liburcu library_ (yeah, that's pretty ugly). The
> problem is: how can I reserve new signal numbers from a library point of view
> without the application using them too? We have room left in the rt
> signal numbers, so maybe this is a lesser problem than with standard signals,
> which are quite full, but the problem of making sure the application does not
> conflict remains.
> 
> > 
> > If _that_ fails, and if we get all the fruits of that, _then_ we might 
> > perhaps, with a lot of hesitation, concede defeat and think about adding yet 
> > another syscall.
> > 
> > I know it's cool to add a brand new syscall - but, unfortunately, in practice 
> > it doesn't help Linux apps all that much. (at least until we have tools/klibc/ 
> > or so.)
> > 
> > [ There's also a few small cleanliness details i noticed in your patch: enums
> >   are a tiny bit nicer for ABIs than #define's, the #ifdef SMP is ugly, etc. - 
> >   but it doesn't really matter much as i think we should concentrate on the
> >   scalability problems of signals first. ]
> 
> OK, let's do that.
> 
> Thanks,
> 
> Mathieu
> 
> > 
> > Thanks,
> > 
> > 	Ingo
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency Consultant
EfficiOS Inc.
http://www.efficios.com
