public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Darren Hart <dvhltc@us.ibm.com>
To: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation
Date: Tue, 17 Nov 2009 09:22:18 -0800	[thread overview]
Message-ID: <4B02DBCA.4050307@us.ibm.com> (raw)
In-Reply-To: <20091117074655.GA14023@google.com>

Michel Lespinasse wrote:
> Hi,
> 

Hi Michel,

Thanks for the excellent writeup.  The concept looks reasonable, and 
useful, to me.  Just a few thoughts, comments below.

 > ...

> 
> By doing the futex value update atomically with the kernel's inspection
> of it to decide to wait, we avoid the time window where the futex has
> been set to the 'please wake me up' state, but the thread has not been
> queued onto the hash bucket yet. This has two effects:
> - Avoids a futex syscall with the FUTEX_WAKE operation if there is no
>   thread to be woken yet

This also reduces lock contention on the hash-bucket locks, another plus.

> - In the heavily contended case, avoids waking an extra thread that's
>   only likely to make the contention problem worse.

I'm not seeing this. What is the extra thread that would be woken which 
isn't with FUTEX_SET_WAIT?

 > ...

> 
> Signed-off-by: Michel Lespinasse <walken@google.com>
> 
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 1e5a26d..c5e887d 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -20,6 +20,7 @@
>  #define FUTEX_WAKE_BITSET	10
>  #define FUTEX_WAIT_REQUEUE_PI	11
>  #define FUTEX_CMP_REQUEUE_PI	12
> +#define FUTEX_SET_WAIT		13
> 
>  #define FUTEX_PRIVATE_FLAG	128
>  #define FUTEX_CLOCK_REALTIME	256
> @@ -39,6 +40,7 @@
>  					 FUTEX_PRIVATE_FLAG)
>  #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
>  					 FUTEX_PRIVATE_FLAG)
> +#define FUTEX_SET_WAIT_PRIVATE		(FUTEX_SET_WAIT | FUTEX_PRIVATE_FLAG)
> 
>  /*
>   * Support for robust futexes: the kernel cleans up held futexes at
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index a8cc4e1..a199606 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -25,6 +25,7 @@ struct restart_block {
>  		struct {
>  			u32 *uaddr;
>  			u32 val;
> +			u32 val2;

It's a nitpic, but val2 is used in the futex syscall arguments already 
and another variable of the same name that is actually initially derived 
from uaddr2... is more likely to confuse than not.  Perhaps "setval"? 
Throughout the patch.


> @@ -1722,52 +1723,61 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
>  	 *
>  	 * The basic logical guarantee of a futex is that it blocks ONLY
>  	 * if cond(var) is known to be true at the time of blocking, for
> -	 * any cond.  If we queued after testing *uaddr, that would open
> -	 * a race condition where we could block indefinitely with
> +	 * any cond.  If we locked the hash-bucket after testing *uaddr, that
> +	 * would open a race condition where we could block indefinitely with
>  	 * cond(var) false, which would violate the guarantee.
>  	 *
> -	 * A consequence is that futex_wait() can return zero and absorb
> -	 * a wakeup when *uaddr != val on entry to the syscall.  This is
> -	 * rare, but normal.
> +	 * On the other hand, we insert q and release the hash-bucket only
> +	 * after testing *uaddr. This guarantees that futex_wait() will NOT
> +	 * absorb a wakeup if *uaddr does not match the desired values
> +	 * while the syscall executes.
>  	 */
>  retry:
>  	q->key = FUTEX_KEY_INIT;
> -	ret = get_futex_key(uaddr, fshared, &q->key, VERIFY_READ);
> +	ret = get_futex_key(uaddr, fshared, &q->key,
> +			    (val == val2) ? VERIFY_READ : VERIFY_WRITE);

Have you compared the performance of FUTEX_WAIT before and after the 
application of this patch? I'd be interested to see your test results on 
a prepatched kernel (with the FUTEX_SET_WAIT side commented out of course).

>  	if (unlikely(ret != 0))
>  		return ret;
> 
>  retry_private:
>  	*hb = queue_lock(q);
> 
> -	ret = get_futex_value_locked(&uval, uaddr);
> -
> -	if (ret) {
> +	pagefault_disable();
> +	if (unlikely(__copy_from_user_inatomic(&uval, uaddr, sizeof(u32)))) {
> +		pagefault_enable();

What about the addition of val2 makes it so we have to expand 
get_futex_value_locked() here with the nested fault handling?

>  		queue_unlock(q, *hb);
> -

Superfluous whitespace change

>  		ret = get_user(uval, uaddr);
> +	fault_common:

Inconsistent label indentation with the rest of the file.

>  		if (ret)
>  			goto out;
> -
>  		if (!fshared)
>  			goto retry_private;
> -
>  		put_futex_key(fshared, &q->key);
>  		goto retry;
>  	}
> -

Several of superfluous whitespace changes

> -	if (uval != val) {
> -		queue_unlock(q, *hb);
> -		ret = -EWOULDBLOCK;
> +	if (val != val2 && uval == val) {
> +		uval = futex_atomic_cmpxchg_inatomic(uaddr, val, val2);
> +		if (unlikely(uval == -EFAULT)) {
> +			pagefault_enable();
> +			queue_unlock(q, *hb);
> +			ret = fault_in_user_writeable(uaddr);
> +			goto fault_common;
> +		}
>  	}
> +	pagefault_enable();
> +
> +	if (uval == val || uval == val2)
> +		return 0;  /* success */

If the comment is necessary, please give it its own line (I've done a 
lot of futex commentary cleanup recently and am a little sensitive to 
maintaining that :-).

I'd rather not add another return point to the code, even if it saves us 
an if statement. You've already tested for uval == val above, perhaps 
this test can be integrated in the above blocks and then use the common 
out: label?

Now I need to review Peter's Adaptive bits and think on how these two 
relate...

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

  parent reply	other threads:[~2009-11-17 17:22 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-17  7:46 [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-17  8:18 ` Ingo Molnar
2009-11-17  8:55   ` Peter Zijlstra
2009-11-17 16:16     ` Darren Hart
2009-11-18  3:37       ` Michel Lespinasse
2009-11-18  5:29         ` Darren Hart
2009-11-24 14:39         ` [PATCH 0/3] perf bench: Add new benchmark for futex subsystem Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 1/3] perf bench: Add wrappers for atomic operation of GCC Hitoshi Mitake
2009-11-24 16:20           ` Darren Hart
2009-11-26  5:44             ` Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 2/3] perf bench: Add new files for futex performance test Hitoshi Mitake
2009-11-24 16:33           ` Darren Hart
2009-11-26  5:53             ` Hitoshi Mitake
2009-11-26  5:56               ` [PATCH] futextest: Make locktest() in harness.h more general Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 3/3] perf bench: Fix misc files to build files related to futex Hitoshi Mitake
2009-11-18 22:13       ` [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-19  6:51         ` Darren Hart
2009-11-19 17:03         ` Darren Hart
     [not found]           ` <8d20b11a0911191325u49624854u6132594f13b0718c@mail.gmail.com>
2009-11-19 23:13             ` Darren Hart
2009-11-21  2:36               ` Michel Lespinasse
2009-11-23 17:21                 ` Darren Hart
2009-11-17 17:24     ` Ingo Molnar
2009-11-17 17:27       ` Darren Hart
2009-11-18  1:49       ` Hitoshi Mitake
2009-11-17  8:50 ` Peter Zijlstra
2009-11-17 15:24   ` Linus Torvalds
2009-11-18  4:21     ` Michel Lespinasse
2009-11-18  5:40       ` Darren Hart
2009-11-30 22:09   ` Darren Hart
2009-12-03  6:55   ` [PATCH] futex: add FUTEX_SET_WAIT operation (and ADAPTIVE) Darren Hart
2009-11-17 17:22 ` Darren Hart [this message]
2009-11-18  3:29   ` [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-18  0:13 ` Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B02DBCA.4050307@us.ibm.com \
    --to=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox