Re: [PATCH] futex: add FUTEX_SET_WAIT operation (and ADAPTIVE)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Darren Hart <dvhltc@us.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Michel Lespinasse <walken@google.com>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation (and ADAPTIVE)
Date: Wed, 02 Dec 2009 22:55:25 -0800	[thread overview]
Message-ID: <4B1760DD.6060606@us.ibm.com> (raw)
In-Reply-To: <1258447807.7816.20.camel@laptop>

Peter Zijlstra wrote:
> --- 
> Subject: futex: implement adaptive spin
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Tue Jan 20 14:40:36 CET 2009
> 
> Implement kernel side adaptive spining for futexes.

Hi Peter,

I'm working on reconciling the SET_WAIT patch from Michel and your
ADAPTIVE_WAIT patch. So far I've forward ported the adaptive patch and
cleaned up a few things. I'd like your take on the updated adaptive spin
in futex_wait() below.

Also, in the adaptive case, we have to decide how to handle the val !=
uval case after the adaptive spin has failed. Otherwise
futex_wait_setup() will return EWOULDBLOCK. Just checking for the flag
may not be enough if the lock is subsequently released as we should then
acquire it. We face many of the same problems here as we do with
FUTEX_LOCK_PI, except we don't have the rtmutex code to handle some of
them for us.

I'm starting to think this may be best implemented as a new function and
op code, maybe something like FUTEX_LOCK which could take ADAPTIVE as a
flag. FUTEX_LOCK would demux to futex_lock() in do_futex. This op would
define the futex value policy like PI and Robust futexes do. We would
basicaly be implementing a mutex, indicated by the LOCK term (as with
FUTEX_LOCK_PI), as opposed to the WAIT term which is really meant to be
a simple wait queue. This would keep some complexity out of the wait
paths.

> 
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
> 
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a 
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
> 
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
> 
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
> 
>   void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
>                           void *lock, struct thread_info *owner);
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  include/linux/futex.h |    2 +
>  include/linux/sched.h |    1 
>  kernel/futex.c        |   96 ++++++++++++++++++++++++++++++++++++++++++--------
>  kernel/sched.c        |   59 ++++++++++++++++++++++++++++++
>  4 files changed, 143 insertions(+), 15 deletions(-)
> 

> ...

> +static int futex_wait(u32 __user *uaddr, int flags,
>  		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
>  {
>  	struct task_struct *curr = current;
> @@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
>  	if (!bitset)
>  		return -EINVAL;
> 
> +#ifdef CONFIG_SMP
> +	if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
> +		goto skip_adaptive;
> +
> +	preempt_disable();
> +	for (;;) {
> +		struct thread_info *owner;
> +		u32 curval = 0, newval = task_pid_vnr(curr);
> +
> +		owner = futex_owner(uaddr);
> +		if (owner && futex_spin_on_owner(uaddr, owner))
> +			break;
> +
> +		if (get_futex_value_locked(&uval, uaddr))
> +			break;
> +
> +		curval |= uval & ~FUTEX_TID_MASK;
> +		newval |= uval & ~FUTEX_TID_MASK;
> +
> +		if (cmpxchg_futex_value_locked(uaddr, curval, newval)
> +				== newval)
> +			return 0;
> +
> +		if (!owner && (need_resched() || rt_task(curr)))
> +			break;
> +
> +		cpu_relax();
> +	}
> +	preempt_enable();
> +skip_adaptive:
> +#endif
> +

Maybe something more like:

#ifdef CONFIG_SMP
	if ((flags & FLAGS_ADAPTIVE) && futex_cmpxchg_enabled) {
		preempt_disable();
		tid = task_pid_vnr(current);
		for (;;) {
			struct thread_info *owner;
			u32 uval, curval, newval;

			owner = futex_owner(uaddr);
			if (owner && futex_spin_on_owner(uaddr, owner))
				break;

			if (get_futex_value_locked(&uval, uaddr))
				break;

			/*
			 * Preserve robust and waiters bits.
			 */
			curval = uval & ~FUTEX_TID_MASK;
			newval = tid | curval;

			if (cmpxchg_futex_value_locked(uaddr, curval, newval)
					== curval)
				return 0;

			/*
			 * Adaptive futexes manage the owner atomically. We
			 * are not in danger of deadlock by preempting a pending
			 * owner. Abort the loop if our time slice has expired.
			 * RT Threads can spin until they preempt the owner, at
			 * which point they will break out of the loop anyway.
			 */
			if (need_resched())
				break;

			/* Do we need this here? */
			cpu_relax();
		}
		preempt_enable();
	}
#endif

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

next prev parent reply	other threads:[~2009-12-03  6:55 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-17  7:46 [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-17  8:18 ` Ingo Molnar
2009-11-17  8:55   ` Peter Zijlstra
2009-11-17 16:16     ` Darren Hart
2009-11-18  3:37       ` Michel Lespinasse
2009-11-18  5:29         ` Darren Hart
2009-11-24 14:39         ` [PATCH 0/3] perf bench: Add new benchmark for futex subsystem Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 1/3] perf bench: Add wrappers for atomic operation of GCC Hitoshi Mitake
2009-11-24 16:20           ` Darren Hart
2009-11-26  5:44             ` Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 2/3] perf bench: Add new files for futex performance test Hitoshi Mitake
2009-11-24 16:33           ` Darren Hart
2009-11-26  5:53             ` Hitoshi Mitake
2009-11-26  5:56               ` [PATCH] futextest: Make locktest() in harness.h more general Hitoshi Mitake
2009-11-24 14:39         ` [PATCH 3/3] perf bench: Fix misc files to build files related to futex Hitoshi Mitake
2009-11-18 22:13       ` [PATCH] futex: add FUTEX_SET_WAIT operation Michel Lespinasse
2009-11-19  6:51         ` Darren Hart
2009-11-19 17:03         ` Darren Hart
     [not found]           ` <8d20b11a0911191325u49624854u6132594f13b0718c@mail.gmail.com>
2009-11-19 23:13             ` Darren Hart
2009-11-21  2:36               ` Michel Lespinasse
2009-11-23 17:21                 ` Darren Hart
2009-11-17 17:24     ` Ingo Molnar
2009-11-17 17:27       ` Darren Hart
2009-11-18  1:49       ` Hitoshi Mitake
2009-11-17  8:50 ` Peter Zijlstra
2009-11-17 15:24   ` Linus Torvalds
2009-11-18  4:21     ` Michel Lespinasse
2009-11-18  5:40       ` Darren Hart
2009-11-30 22:09   ` Darren Hart
2009-12-03  6:55   ` Darren Hart [this message]
2009-11-17 17:22 ` Darren Hart
2009-11-18  3:29   ` Michel Lespinasse
2009-11-18  0:13 ` Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B1760DD.6060606@us.ibm.com \
    --to=dvhltc@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox