Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Michael Schnell <mschnell@lumino.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Darren Hart <dvhltc@us.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy
Date: Mon, 5 Oct 2009 18:21:49 -0400	[thread overview]
Message-ID: <20091005222149.GB21419@Krystal> (raw)
In-Reply-To: <20091005132236.GA7416@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> (added some CC's, given the length of the answer. Thanks for asking)  ;)
> (Sorry for duplicate, messed up LKML email in original post)
> 
> * Michael Schnell (mschnell@lumino.de) wrote:
> > I still don't understand what the advantage of FUTEX is, if the thread
> > to be waked is always blocked, and thus the fast path is not in use.
> > 
> > -Michael
> 
> Hrm, your assumption about the common case does not seem to fit my
> scenarios. Typically, the wakeup will find the waiter not blocked, and
> thus skip the call to sys_futex. Here is why.
> 
> I use this scheme in two different implementations:
> 
> 1) in call_rcu(), to wake up the worker thread after adding work to the
> queue.
> 
> This worker thread, when woken up, sleeps for a few milliseconds before
> starting to dequeue work. Therefore, if the system is relatively busy,
> call_rcu() will usually see the worker thread while it's sleeping (and
> therefore _not_ waiting on the futex). Also, if work is enqueued while
> the worker thread is executing past RCU callbacks, the worker thread
> will detect it and won't wait on the futex.
> 
> Therefore, this is, by design, a very unlikely event to have call_rcu()
> calling sys_futex.
> 
> 2) in rcu_read_unlock(), to wake up synchronize_rcu() waiting on past
>    reader's grace periods.
> 
> Here, synchronize_rcu() busy-waits for the reader's G.P. to end, and if
> this does not work, after a few attempts (like the pthread mutexes), it
> adapts and uses sys_futex. The waker only need to call sys_futex if
> there is a synchronize_rcu() currently running which had to call
> sys_futex after a few active attempts failed.
> 
> As you see, in both cases, the common case, "fast path", is to find the
> futex unlocked and not having to take any lock.
> 
> 
> Now, about the slow path. I think it's worth discussing too. Indeed,
> sys_futex takes a per-address spinlock, which happens to serialize all
> sys_futex operations on the wakeup side. Therefore, for wakeup designs
> relying on calling sys_futex for wakeup very frequently, this is really
> bad.
> 
> There might be ways to mitigate this problem though: changing the
> sys_futex implementation to use lock-free lists might help.
> 
> One advantage of calling sys_futex without holding a userspace mutex is
> that the contention duration on the per-address spinlock is much shorter
> than the contention on the mutex, because of the system call execution
> overhead.
> 
> We could probably turn the sys_futex-dependent locking scheme into
> something more portable and manage to keep the same fast-path behavior
> if we replace the way I use sys_futex by a pthread cond var, e.g. :
> 
> Instead of having:
> 
> 
> static inline void wake_up_gp(void)
> {
>         if (unlikely(uatomic_read(&gp_futex) == -1)) {
>                 uatomic_set(&gp_futex, 0);
>                 futex(&gp_futex, FUTEX_WAKE, 1,
>                       NULL, NULL, 0);
>         }
> }
> 
> static void wait_gp(void)
> {
> 	uatomic_dec(&gp_futex);
>         smp_mb();
> 	if (!num_old_reader_gp()) {
>         	smp_mb();
> 		atomic_set(&gp_futex, 0);
> 		return;
> 	}
>         /* Read reader_gp before read futex */
>         smp_rmb();
>         if (uatomic_read(&gp_futex) == -1)
>                 futex(&gp_futex, FUTEX_WAIT, -1,
>                       NULL, NULL, 0);
> }
> 
> We could have:
> 
> static inline void wake_up_gp(void)
> {
> 	/* just released old reader gp */
> 	smp_mb();
>         if (unlikely(uatomic_read(&gp_wait) == -1)) {
>                 uatomic_set(&gp_wait, 0);
>                 pthread_cond_broadcast(&rcucond);
>         }
> }
> 
> static void wait_gp(void)
> {
> 	uatomic_dec(&gp_wait);
>         smp_mb();
> 	if (!num_old_reader_gp()) {
>         	smp_mb();
> 		atomic_set(&gp_wait, 0);
> 		goto unlock;
> 	}
>         /* Read reader_gp before read futex */
>         smp_rmb();
> 	pthread_mutex_lock(&rcumutex);
>         if (uatomic_read(&gp_wait) == -1) {
>                 pthread_cond_wait(&rcucond, &rcumutex);
> 		pthread_mutex_unlock(&rcumutex);
> 	}
> }
> 
> Is that what you had in mind ?
> 

Hrm, thinking about it, the pthread_cond_wait/pthread_cond_broadcast
scheme I proposed above is racy.

If a wait_gp executes concurrently with wake_up_gp, we can end up doing:

gp_wait = 0

*    wait_gp:       uatomic_dec(&gp_wait);
*    wait_gp:       smp_mb()
*    wait_gp:       if (!num_old_reader_gp()) { (false)
*    wait_gp:       pthread_mutex_lock(&rcumutex);
*    wait_gp:       if (uatomic_read(&gp_wait) == -1) { (true)
* wake_up_gp:  unlikely(uatomic_read(&gp_wait) == -1) { (true)
* wake_up_gp:  uatomic_set(&gp_wait, 0);
* wait_up_gp:  pthread_cond_broadcast(&rcucond);
*    wait_gp:       pthread_cond_wait(&rcucond, &rcumutex);

might wait forever (or until the next wakeup).

We would need an extra mutex in the wakeup, e.g.:

static inline void wake_up_gp(void)
{
	/* just released old reader gp */
	smp_mb();
        if (unlikely(uatomic_read(&gp_wait) == -1)) {
		pthread_mutex_lock(&rcumutex);
                uatomic_set(&gp_wait, 0);
                pthread_cond_broadcast(&rcucond);
		pthread_mutex_unlock(&rcumutex);
        }
}

static void wait_gp(void)
{
	uatomic_dec(&gp_wait);
        smp_mb();
	if (!num_old_reader_gp()) {
        	smp_mb();
		atomic_set(&gp_wait, 0);
		goto unlock;
	}
        /* Read reader_gp before read futex */
        smp_rmb();
	pthread_mutex_lock(&rcumutex);
        if (uatomic_read(&gp_wait) == -1) {
                pthread_cond_wait(&rcucond, &rcumutex);
		pthread_mutex_unlock(&rcumutex);
	}
}

This should work better, but I'll need to do a formal model to convince
myself.

Mathieu

> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

next prev parent reply	other threads:[~2009-10-05 22:22 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-23 17:48 [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy Mathieu Desnoyers
2009-09-23 18:04 ` Chris Friesen
2009-09-23 19:03   ` Mathieu Desnoyers
2009-09-23 22:32     ` Mathieu Desnoyers
2009-09-23 23:12       ` Chris Friesen
2009-09-23 23:28         ` Mathieu Desnoyers
2009-09-26  7:05           ` Mathieu Desnoyers
2009-09-28  7:11           ` Michael Schnell
2009-09-28 10:58             ` Michael Schnell
2009-09-28 11:01             ` Michael Schnell
2009-10-01 14:40 ` Paul E. McKenney
2009-10-04 14:37   ` Mathieu Desnoyers
2009-10-04 20:36     ` Paul E. McKenney
2009-10-04 21:12       ` Mathieu Desnoyers
     [not found]     ` <4AC99D55.8000102@lumino.de>
     [not found]       ` <20091005125533.GA1857@Krystal>
2009-10-05 13:22         ` Mathieu Desnoyers
2009-10-05 22:21           ` Mathieu Desnoyers [this message]
2009-10-07  7:22         ` Michael Schnell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091005222149.GB21419@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mschnell@lumino.de \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.