From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Chris Friesen <cfriesen@nortel.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy
Date: Wed, 23 Sep 2009 18:32:04 -0400 [thread overview]
Message-ID: <20090923223204.GA2921@Krystal> (raw)
In-Reply-To: <20090923190337.GA16983@Krystal>
* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Chris Friesen (cfriesen@nortel.com) wrote:
> > On 09/23/2009 11:48 AM, Mathieu Desnoyers wrote:
> >
> > > Here are the primitives I've created. I'd like to have feedback on my
> > > futex use, just to make sure I did not do any incorrect assumptions.
> >
> > > /*
> > > * Wake-up any waiting defer thread. Called from many concurrent threads.
> > > */
> > > static void wake_up_defer(void)
> > > {
> > > if (unlikely(atomic_read(&defer_thread_futex) == -1))
> > > atomic_set(&defer_thread_futex, 0);
> > > futex(&defer_thread_futex, FUTEX_WAKE,
> > > 0, NULL, NULL, 0);
> > > }
> >
> > Is it a problem if multiple threads all hit the defer_thread_futex==-1
> > case simultaneously?
>
> It should not be, since what we'll have is, e.g.:
>
> thread 1 calls wakeup
> thread 2 calls wakeup
> thread 3 calls wait
>
> (thread 3 is waiting on the futex, defer_thread_futex = -1)
> - thread 1 sees defer_thread_futex==-1
> - thread 2 sees defer_thread_futex==-1
> - thread 1 sets defer_thread_futex = 0
> - thread 2 sets defer_thread_futex = 0
> - thread 1 calls futex() to wake up the waiter, expect 0
> - thread 2 calls futex() to wake up the waiter, expect 0
>
> Basically, what happens in this scenario is that the first futex()
> call will wake up any waiter, and the second will be a no-op.
>
> Let's complicate this, if we have thread 3 running wait_defer()
> concurrently:
>
> - thread 3 decrements defer_thread_futex
> - thread 1 sees defer_thread_futex==-1
> - thread 2 sees defer_thread_futex==-1
> - thread 1 sets defer_thread_futex = 0
> - thread 2 sets defer_thread_futex = 0
> - thread 1 calls futex() to wake up the waiter, expect 0
> - thread 2 calls futex() to wake up the waiter, expect 0
> - thread 3 calls futex() to wait, expect -1
> Returns immediately because defer_thread_futex == 0
>
> Other scenario, where thread decrements defer_thread_futex a bit later:
>
> - thread 1 sees defer_thread_futex==0
> - thread 2 sees defer_thread_futex==0
> - thread 3 decrements defer_thread_futex
> - thread 3 tests defer_thread_futex==-1
> - thread 3 calls futex() to wait, expect -1
>
> In this scenario, we have to notice that if threads 1/2 enqueued tasks
> to do before checking defer_thread_futex, these tasks would not be seen
> by the waiter thread.
>
> So correct memory ordering of:
>
> - wake_up_defer:
> * queue callbacks to perform (1)
> * wake up (2)
>
> - wait_defer:
> * for (;;)
> * wait for futex (3)
> * sleep 100ms (wait for more callbacks to be enqueued)
> * dequeue callbacks, execute them (4)
>
>
> actually matters. I'll have to be really careful about that (unless we
> just accept that tasks to perform could be queued for a while, however,
> I'd like to give an upper bound to the delay between batch callback
> execution).
>
> Ensuring that 1 is written before 2, and that 4 is done before 3 seems a
> bit racy. (I have to got out for lunch now, so I'll have to review the
> ordering afterward)
>
I knew I needed to think about it a bit more. Here is the proposed
algorithm hopefully fixing the race identified in the 3rd scenario
above. The idea is to perform the "check for empty queue" between the
&defer_thread_futex decrement and the test in wait_defer. It skips the
futex call and proceed if the list is non-empty.
As I am drilling into the problem, it looks very much like an attempt to
implement efficient wait queues in userspace based on sys_futex().
/*
* Wake-up any waiting defer thread. Called from many concurrent
* threads.
*/
static void wake_up_defer(void)
{
if (unlikely(atomic_read(&defer_thread_futex) == -1)) {
atomic_set(&defer_thread_futex, 0);
futex(&defer_thread_futex, FUTEX_WAKE, 0,
NULL, NULL, 0);
}
}
/*
* Defer thread waiting. Single thread.
*/
static void wait_defer(void)
{
atomic_dec(&defer_thread_futex);
smp_mb(); /* Write futex before read queue */
if (rcu_defer_num_callbacks()) {
smp_mb(); /* Read queue before write futex */
/* Callbacks are queued, don't wait. */
atomic_set(&defer_thread_futex, 0);
} else {
smp_rmb(); /* Read queue before read futex */
if (atomic_read(&defer_thread_futex) == -1)
futex(&defer_thread_futex, FUTEX_WAIT, -1,
NULL, NULL, 0);
}
}
- call_rcu():
* queue callbacks to perform
* smp_mb()
* wake_up_defer()
- defer thread:
* for (;;)
* wait_defer()
* sleep 100ms (wait for more callbacks to be enqueued)
* dequeue callbacks, execute them
The goal here is that if call_rcu() enqueues a callback (even if it
races with defer thread going to sleep), there should not be a
potentially infinite delay before it gets executed. Therefore, being
blocked in sys_futex while there is a callback to execute, without any
hope to be woken up unless another callback is queued, would not meet
that design requirement. I think that checking the number of queued
callbacks within wait_defer() as I propose here should address this
situation.
Comments ?
Mathieu
>
> > If so, maybe this should use an atomic
> > test-and-set operation so that only one thread actually calls futex().
>
> It's not a matter if many threads wake up the waiter, so I don't think
> the test-and-set is required. The benefit of using a simple test here is
> that we don't have to bring the cache-line in exclusive mode to the
> local CPU to perform the test. It can stay shared.
>
> >
> > > /*
> > > * Defer thread waiting. Single thread.
> > > */
> > > static void wait_defer(void)
> > > {
> > > atomic_dec(&defer_thread_futex);
> > > if (atomic_read(&defer_thread_futex) == -1)
> > > futex(&defer_thread_futex, FUTEX_WAIT, -1,
> > > NULL, NULL, 0);
> > > }
> >
> > Is it a problem if the value of defer_thread_futex changes to zero after
> > the dec but before the test?
>
> No. That's not a problem, because this means there is a concurrent "wake
> up". Seeing a value of "0" here will skip over the futex wait and go on.
> The concurrent futex wakeup call will simply be a no-op in that case.
>
> Thanks for the comments,
>
> Mathieu
>
> >
> > Chris
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2009-09-23 22:32 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-23 17:48 [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy Mathieu Desnoyers
2009-09-23 18:04 ` Chris Friesen
2009-09-23 19:03 ` Mathieu Desnoyers
2009-09-23 22:32 ` Mathieu Desnoyers [this message]
2009-09-23 23:12 ` Chris Friesen
2009-09-23 23:28 ` Mathieu Desnoyers
2009-09-26 7:05 ` Mathieu Desnoyers
2009-09-28 7:11 ` Michael Schnell
2009-09-28 10:58 ` Michael Schnell
2009-09-28 11:01 ` Michael Schnell
2009-10-01 14:40 ` Paul E. McKenney
2009-10-04 14:37 ` Mathieu Desnoyers
2009-10-04 20:36 ` Paul E. McKenney
2009-10-04 21:12 ` Mathieu Desnoyers
[not found] ` <4AC99D55.8000102@lumino.de>
[not found] ` <20091005125533.GA1857@Krystal>
2009-10-05 13:22 ` Mathieu Desnoyers
2009-10-05 22:21 ` Mathieu Desnoyers
2009-10-07 7:22 ` Michael Schnell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090923223204.GA2921@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=cfriesen@nortel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.