All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy
Date: Sun, 4 Oct 2009 17:12:16 -0400	[thread overview]
Message-ID: <20091004211216.GA23650@Krystal> (raw)
In-Reply-To: <20091004203639.GH6764@linux.vnet.ibm.com>

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Sun, Oct 04, 2009 at 10:37:45AM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> > > On Wed, Sep 23, 2009 at 01:48:20PM -0400, Mathieu Desnoyers wrote:
> > > > Hi,
> > > > 
> > > > When implementing the call_rcu() "worker thread" in userspace, I ran
> > > > into the problem that it had to be woken up periodically to check if
> > > > there are any callbacks to execute. However, I easily imagine that this
> > > > does not fit well with the "green computing" definition.
> > > > 
> > > > Therefore, I've looked at ways to have the call_rcu() callers waking up
> > > > this worker thread when callbacks are enqueued. However, I don't want to
> > > > take any lock and the fast path (when no wake up is required) should not
> > > > cause any cache-line exchange.
> > > > 
> > > > Here are the primitives I've created. I'd like to have feedback on my
> > > > futex use, just to make sure I did not do any incorrect assumptions.
> > > > 
> > > > This could also be eventually used in the QSBR Userspace RCU quiescent
> > > > state and in mb/signal userspace RCU when exiting RCU read-side C.S. to
> > > > ensure synchronize_rcu() does not busy-wait for too long.
> > > > 
> > > > /*
> > > >  * Wake-up any waiting defer thread. Called from many concurrent threads.
> > > >  */
> > > > static void wake_up_defer(void)
> > > > {
> > > >         if (unlikely(atomic_read(&defer_thread_futex) == -1))
> > > >                 atomic_set(&defer_thread_futex, 0);
> > > >                 futex(&defer_thread_futex, FUTEX_WAKE,
> > > >                       0, NULL, NULL, 0);
> > > > }
> > > > 
> > > > /*
> > > >  * Defer thread waiting. Single thread.
> > > >  */
> > > > static void wait_defer(void)
> > > > {
> > > >         atomic_dec(&defer_thread_futex);
> > > >         if (atomic_read(&defer_thread_futex) == -1)
> > > >                 futex(&defer_thread_futex, FUTEX_WAIT, -1,
> > > >                       NULL, NULL, 0);
> > > > }
> > > 
> > > The standard approach would be to use pthread_cond_wait() and
> > > pthread_cond_broadcast().  Unfortunately, this would require holding a
> > > pthread_mutex_lock across both operations, which would not necessarily
> > > be so good for wake-up-side scalability.
> > 
> > The pthread_cond_broadcast() mutex is really a bugger when it comes to
> > execute it at each rcu_read_unlock(). We could as well use a mutex to
> > protect the whole read-side.. :-(
> > 
> > > That said, without this sort of heavy-locking approach, wakeup races
> > > are quite difficult to avoid.
> > 
> > I did a formal model of my futex-based wait/wakeup. The main idea is
> > that the waiter:
> > 
> > - Set itself to "waiting"
> > - Checks the "real condition" for which it will wait (e.g. queues empty
> >   when used for rcu callbacks, no more ongoing old reader thread C.S.
> >   when used in synchronize_rcu())
> > - Calls sys_futex if the variable have not changed.
> > 
> > And the waker:
> > - sets the "real condition" waking up the waiter (enqueuing, or
> >   rcu_read_unlock())
> > - check if the waiter must be woken up, if so, wake it up by setting the
> >   state to "running" and calling sys_futex.
> > 
> > But as you say, wakeup races are difficult (but not impossible!) to
> > avoid. This is why I resorted to a formal model of the wait/wakeup
> > scheme to ensure that we cannot end up in a situation where a waker
> > races with the waiter and does not wake it up when it should. This is
> > nothing fancy (does not model memory and instruction reordering
> > automatically), but I figure that memory barriers are required between
> > almost every steps of this algorithm, so by adding smp_mb() I end up
> > ensure sequential behavior. I added test cases in the model to ensure
> > that incorrect memory reordering _would_ cause errors by doing the
> > reordering by hand in error-injection runs.
> 
> My question is whether pthread_cond_wait() and pthread_cond_broadcast()
> can substitute for the raw call to futex.  Unless I am missing something
> (which I quite possibly am), the kernel will serialize on the futex
> anyway, so serialization in user-mode code does not add much additional
> pain.

The kernel sys_futex implementation only takes per-bucket spinlocks. So
this is far from the cost of a global mutex in pthread_cond. Moreover,
my scheme does not require to take any mutex in the fast path (when
there is no waiter to wake up), which makes performances appropriate for
use in rcu read-side. It's a simple memory barrier, variable read, test
and branch in this case.

> 
> > The model is available at:
> > http://www.lttng.org/cgi-bin/gitweb.cgi?p=userspace-rcu.git;a=tree;f=futex-wakeup;h=4ddeaeb2784165cb0465d4ca9f7d27acb562eae3;hb=refs/heads/formal-model
> > 
> > (this is in the formal-model branch of the urcu tree, futex-wakeup
> > subdir)
> > 
> > This is modeling this snippet of code :
> > 
> > static int defer_thread_futex;
> > 
> > /*
> >  * Wake-up any waiting defer thread. Called from many concurrent threads.
> >  */
> > static void wake_up_defer(void)
> > {
> >         if (unlikely(uatomic_read(&defer_thread_futex) == -1)) {
> >                 uatomic_set(&defer_thread_futex, 0);
> >                 futex(&defer_thread_futex, FUTEX_WAKE, 1,
> >                       NULL, NULL, 0);
> >         }
> > }
> > 
> > static void enqueue(void *callback)	/* not the actual types */
> > {
> > 	add_to_queue(callback);
> > 	smp_mb();
> > 	wake_up_defer();
> > }
> > 
> > /*
> >  * rcu_defer_num_callbacks() returns the total number of callbacks
> >  * enqueued.
> >  */
> > 
> > /*
> >  * Defer thread waiting. Single thread.
> >  */
> > static void wait_defer(void)
> > {
> >         uatomic_dec(&defer_thread_futex);
> >         smp_mb();       /* Write futex before read queue */
> >         if (rcu_defer_num_callbacks()) {
> >                 smp_mb();       /* Read queue before write futex */
> >                 /* Callbacks are queued, don't wait. */
> >                 uatomic_set(&defer_thread_futex, 0);
> >         } else {
> >                 smp_rmb();      /* Read queue before read futex */
> >                 if (uatomic_read(&defer_thread_futex) == -1)
> >                         futex(&defer_thread_futex, FUTEX_WAIT, -1,
> >                               NULL, NULL, 0);
> >         }
> > }
> > 
> > 
> > Comments are welcome,
> 
> I will take a look after further recovery from jetlag.  Not yet competent
> to review this kind of stuff.  Give me a few days.  ;-)

No problem, thanks for looking at this,

Mathieu

> 
> 							Thanx, Paul

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-10-04 21:12 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-23 17:48 [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy Mathieu Desnoyers
2009-09-23 18:04 ` Chris Friesen
2009-09-23 19:03   ` Mathieu Desnoyers
2009-09-23 22:32     ` Mathieu Desnoyers
2009-09-23 23:12       ` Chris Friesen
2009-09-23 23:28         ` Mathieu Desnoyers
2009-09-26  7:05           ` Mathieu Desnoyers
2009-09-28  7:11           ` Michael Schnell
2009-09-28 10:58             ` Michael Schnell
2009-09-28 11:01             ` Michael Schnell
2009-10-01 14:40 ` Paul E. McKenney
2009-10-04 14:37   ` Mathieu Desnoyers
2009-10-04 20:36     ` Paul E. McKenney
2009-10-04 21:12       ` Mathieu Desnoyers [this message]
     [not found]     ` <4AC99D55.8000102@lumino.de>
     [not found]       ` <20091005125533.GA1857@Krystal>
2009-10-05 13:22         ` Mathieu Desnoyers
2009-10-05 22:21           ` Mathieu Desnoyers
2009-10-07  7:22         ` Michael Schnell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091004211216.GA23650@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.