public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Michael Schnell <mschnell@lumino.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Darren Hart <dvhltc@us.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy
Date: Mon, 5 Oct 2009 18:21:49 -0400	[thread overview]
Message-ID: <20091005222149.GB21419@Krystal> (raw)
In-Reply-To: <20091005132236.GA7416@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> (added some CC's, given the length of the answer. Thanks for asking)  ;)
> (Sorry for duplicate, messed up LKML email in original post)
> 
> * Michael Schnell (mschnell@lumino.de) wrote:
> > I still don't understand what the advantage of FUTEX is, if the thread
> > to be waked is always blocked, and thus the fast path is not in use.
> > 
> > -Michael
> 
> Hrm, your assumption about the common case does not seem to fit my
> scenarios. Typically, the wakeup will find the waiter not blocked, and
> thus skip the call to sys_futex. Here is why.
> 
> I use this scheme in two different implementations:
> 
> 1) in call_rcu(), to wake up the worker thread after adding work to the
> queue.
> 
> This worker thread, when woken up, sleeps for a few milliseconds before
> starting to dequeue work. Therefore, if the system is relatively busy,
> call_rcu() will usually see the worker thread while it's sleeping (and
> therefore _not_ waiting on the futex). Also, if work is enqueued while
> the worker thread is executing past RCU callbacks, the worker thread
> will detect it and won't wait on the futex.
> 
> Therefore, this is, by design, a very unlikely event to have call_rcu()
> calling sys_futex.
> 
> 2) in rcu_read_unlock(), to wake up synchronize_rcu() waiting on past
>    reader's grace periods.
> 
> Here, synchronize_rcu() busy-waits for the reader's G.P. to end, and if
> this does not work, after a few attempts (like the pthread mutexes), it
> adapts and uses sys_futex. The waker only need to call sys_futex if
> there is a synchronize_rcu() currently running which had to call
> sys_futex after a few active attempts failed.
> 
> As you see, in both cases, the common case, "fast path", is to find the
> futex unlocked and not having to take any lock.
> 
> 
> Now, about the slow path. I think it's worth discussing too. Indeed,
> sys_futex takes a per-address spinlock, which happens to serialize all
> sys_futex operations on the wakeup side. Therefore, for wakeup designs
> relying on calling sys_futex for wakeup very frequently, this is really
> bad.
> 
> There might be ways to mitigate this problem though: changing the
> sys_futex implementation to use lock-free lists might help.
> 
> One advantage of calling sys_futex without holding a userspace mutex is
> that the contention duration on the per-address spinlock is much shorter
> than the contention on the mutex, because of the system call execution
> overhead.
> 
> We could probably turn the sys_futex-dependent locking scheme into
> something more portable and manage to keep the same fast-path behavior
> if we replace the way I use sys_futex by a pthread cond var, e.g. :
> 
> Instead of having:
> 
> 
> static inline void wake_up_gp(void)
> {
>         if (unlikely(uatomic_read(&gp_futex) == -1)) {
>                 uatomic_set(&gp_futex, 0);
>                 futex(&gp_futex, FUTEX_WAKE, 1,
>                       NULL, NULL, 0);
>         }
> }
> 
> static void wait_gp(void)
> {
> 	uatomic_dec(&gp_futex);
>         smp_mb();
> 	if (!num_old_reader_gp()) {
>         	smp_mb();
> 		atomic_set(&gp_futex, 0);
> 		return;
> 	}
>         /* Read reader_gp before read futex */
>         smp_rmb();
>         if (uatomic_read(&gp_futex) == -1)
>                 futex(&gp_futex, FUTEX_WAIT, -1,
>                       NULL, NULL, 0);
> }
> 
> We could have:
> 
> static inline void wake_up_gp(void)
> {
> 	/* just released old reader gp */
> 	smp_mb();
>         if (unlikely(uatomic_read(&gp_wait) == -1)) {
>                 uatomic_set(&gp_wait, 0);
>                 pthread_cond_broadcast(&rcucond);
>         }
> }
> 
> static void wait_gp(void)
> {
> 	uatomic_dec(&gp_wait);
>         smp_mb();
> 	if (!num_old_reader_gp()) {
>         	smp_mb();
> 		atomic_set(&gp_wait, 0);
> 		goto unlock;
> 	}
>         /* Read reader_gp before read futex */
>         smp_rmb();
> 	pthread_mutex_lock(&rcumutex);
>         if (uatomic_read(&gp_wait) == -1) {
>                 pthread_cond_wait(&rcucond, &rcumutex);
> 		pthread_mutex_unlock(&rcumutex);
> 	}
> }
> 
> Is that what you had in mind ?
> 

Hrm, thinking about it, the pthread_cond_wait/pthread_cond_broadcast
scheme I proposed above is racy.

If a wait_gp executes concurrently with wake_up_gp, we can end up doing:

gp_wait = 0

*    wait_gp:       uatomic_dec(&gp_wait);
*    wait_gp:       smp_mb()
*    wait_gp:       if (!num_old_reader_gp()) { (false)
*    wait_gp:       pthread_mutex_lock(&rcumutex);
*    wait_gp:       if (uatomic_read(&gp_wait) == -1) { (true)
* wake_up_gp:  unlikely(uatomic_read(&gp_wait) == -1) { (true)
* wake_up_gp:  uatomic_set(&gp_wait, 0);
* wait_up_gp:  pthread_cond_broadcast(&rcucond);
*    wait_gp:       pthread_cond_wait(&rcucond, &rcumutex);

might wait forever (or until the next wakeup).

We would need an extra mutex in the wakeup, e.g.:

static inline void wake_up_gp(void)
{
	/* just released old reader gp */
	smp_mb();
        if (unlikely(uatomic_read(&gp_wait) == -1)) {
		pthread_mutex_lock(&rcumutex);
                uatomic_set(&gp_wait, 0);
                pthread_cond_broadcast(&rcucond);
		pthread_mutex_unlock(&rcumutex);
        }
}

static void wait_gp(void)
{
	uatomic_dec(&gp_wait);
        smp_mb();
	if (!num_old_reader_gp()) {
        	smp_mb();
		atomic_set(&gp_wait, 0);
		goto unlock;
	}
        /* Read reader_gp before read futex */
        smp_rmb();
	pthread_mutex_lock(&rcumutex);
        if (uatomic_read(&gp_wait) == -1) {
                pthread_cond_wait(&rcucond, &rcumutex);
		pthread_mutex_unlock(&rcumutex);
	}
}

This should work better, but I'll need to do a formal model to convince
myself.

Mathieu

> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2009-10-05 22:22 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-23 17:48 [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy Mathieu Desnoyers
2009-09-23 18:04 ` Chris Friesen
2009-09-23 19:03   ` Mathieu Desnoyers
2009-09-23 22:32     ` Mathieu Desnoyers
2009-09-23 23:12       ` Chris Friesen
2009-09-23 23:28         ` Mathieu Desnoyers
2009-09-26  7:05           ` Mathieu Desnoyers
2009-09-28  7:11           ` Michael Schnell
2009-09-28 10:58             ` Michael Schnell
2009-09-28 11:01             ` Michael Schnell
2009-10-01 14:40 ` Paul E. McKenney
2009-10-04 14:37   ` Mathieu Desnoyers
2009-10-04 20:36     ` Paul E. McKenney
2009-10-04 21:12       ` Mathieu Desnoyers
     [not found]     ` <4AC99D55.8000102@lumino.de>
     [not found]       ` <20091005125533.GA1857@Krystal>
2009-10-05 13:22         ` Mathieu Desnoyers
2009-10-05 22:21           ` Mathieu Desnoyers [this message]
2009-10-07  7:22         ` Michael Schnell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091005222149.GB21419@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mschnell@lumino.de \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox