[PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning
@ 2010-05-06  6:24 Darren Hart
  2010-05-06  6:24 ` [PATCH 1/4] futex: replace fshared and clockrt with combined flags Darren Hart
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-06  6:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity

RFC - NOT FOR INCLUSION

The following patch series implements a new experimental kernel side futex mutex
via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
allows for multiple spinners until the lock is released or the owner is
descheduled. The patch currently allows the user to specify if they want
spinning or not, but the final version will only provide the adaptive variant as
blocking locks are already very efficiently implemented with the existing
operations.

This version greatly outperforms the last, and actually outperforms adaptive
pthread mutexes for long lock hold times.  The primary difference from the
previous implementation was userspace optimization, although many kernel-side
improvements were made as well.

I'm using the futex_lock branch of my futextest test suite to gather results.
The testcases (futex_lock.c, futex_wait_2.c, pthread_mutex_2.c, and
futex_wait_tid.c) are under performance/ and can be checked out here:

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

At Avi's suggestion, I prepared plots of multiple thread counts, they are
available at the URL below. These plots are all from runs with a period of
1,000 or 10,000 instructions, with one plot per each of several duty-cycles.
For V6 I have added two new comparison tests, "thread_wait_tid" which uses
FUTEX_WAIT/WAKE to implement a mutex just as "futex_wait" does, but it uses
the TID|FUTEX_WAITERS futex value policy to illustrate the overhead over a
simple 0,1,2 policy. Second, "pthread_mutex_pi" uses a PTHREAD_PRIO_INHERIT
mutex which also uses the TID|FUTEX_WAITERS policy and has a higher overhead
set of futex op codes (FUTEX_LOCK_PI and FUTEX_UNLOCK_PI).

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v6/v6c-p1000-logs/plots/
http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v6/v6c-p10000-logs/plots/

As illustrated in the above plots:
o FUTEX_LOCK_ADAPTIVE performs significantly better than FUTEX_LOCK for the
  shorter period over all duty-cycles and comparable for the longer period on
  all but the 32% duty-cycle, where it again out-performed the non-adaptive
  version.
o PI pthread mutex underperformed every other implementaion by one or two orders
  of magnitude. This is surely due to the significant overhead of the PI futex
  operations.
o The futex_wait_tid test illustrates the additional overhead imposed by the
  TID|FUTEX_WAITERS policy which requires the use of | and & operators as well
  as more conditionals and even cmpxchg loops. This overhead becomes very
  apparent in higher thread counts. The new FUTEX_LOCK operations outperform
  the futex_wait_tid in most scenarios.
o The only consistent win for FUTEX_LOCK_ADAPTIVE is at the 32% duty cycle,
  where it outperforms every other implementation.
o The adaptive futex_wait test served as a reference to verify my general
  approach was not grossly inferior to the hand-written asm employed by
  pthreads. Notice that futex_wait*-adaptive is mostly on par with
  pthread_mutex*-adaptive.

Given the necessity of the TID|FUTEX_WAITERS policy with a kernel-side adaptive
spin implementation, I feel this implementation is pretty well optimized.

Next steps:

o Improve workload definition. The current workload is a cpu-hog. It runs a
  fixed set of instructions in a loop, with some percentage of those being
  contained within the lock/unlock block. Adding sleeps to reduce CPU overhead
  just added so much scheduling overhead that the numbers dropped absurdly for
  both normal and adaptive. I'd appreciate any assistance in preparing a
  test-case for a real-world usage scenario. I will work with some of IBM's
  product teams to do so, and would welcome any input from others with an
  interest in kernel assisted userspace spinlocks.

o Attempt the kernel assisted userspace spinning version proposed by Avi, Alan,
  and Ulrich for comparison. This involves providing information about the
  running state of each thread to userspace. Where exactly this memory should
  be allocated is still unclear. For a proof of concept I will like create a
  simple array indexed by bid and a vdso style API to access this information.

Finally, the V6 diffstat:
 include/linux/futex.h |    6 +
 include/linux/sched.h |    1 +
 kernel/futex.c        |  572 ++++++++++++++++++++++++++++++++++++-------------
 kernel/sched.c        |   66 ++++++
 4 files changed, 494 insertions(+), 151 deletions(-)

Thanks,

Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] futex: replace fshared and clockrt with combined flags
  2010-05-06  6:24 [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-05-06  6:24 ` Darren Hart
  2010-05-06  6:24 ` [PATCH 2/4] futex: add futex_q static initializer Darren Hart
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-06  6:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity

In the early days we passed the mmap sem around. That became the
"int fshared" with the fast gup improvements. Then we added
"int clockrt" in places. This patch unifies these options as "flags" and
cleans up various calls which had fshared in their signature but no
longer used it.

V6: Whitespace fixes
V4: Fix an operator precedence bug

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |  215 +++++++++++++++++++++++++++-----------------------------
 1 files changed, 104 insertions(+), 111 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..ab3fb7c 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -69,6 +69,14 @@ int __read_mostly futex_cmpxchg_enabled;
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
 /*
+ * Futex flags used to encode options to functions and preserve them across
+ * restarts.
+ */
+#define FLAGS_SHARED		0x01
+#define FLAGS_CLOCKRT		0x02
+#define FLAGS_HAS_TIMEOUT	0x04
+
+/*
  * Priority Inheritance state:
  */
 struct futex_pi_state {
@@ -283,7 +291,7 @@ again:
 }
 
 static inline
-void put_futex_key(int fshared, union futex_key *key)
+void put_futex_key(union futex_key *key)
 {
 	drop_futex_key_refs(key);
 }
@@ -878,7 +886,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 /*
  * Wake up waiters matching bitset queued on this futex (uaddr).
  */
-static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
+static int futex_wake(u32 __user *uaddr, int flags, int nr_wake, u32 bitset)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -889,7 +897,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -915,7 +923,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	}
 
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 out:
 	return ret;
 }
@@ -925,7 +933,7 @@ out:
  * to this virtual address:
  */
 static int
-futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+futex_wake_op(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
 	      int nr_wake, int nr_wake2, int op)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -935,10 +943,10 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	int ret, op_ret;
 
 retry:
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -970,11 +978,11 @@ retry_private:
 		if (ret)
 			goto out_put_keys;
 
-		if (!fshared)
+		if (!(flags & FLAGS_SHARED))
 			goto retry_private;
 
-		put_futex_key(fshared, &key2);
-		put_futex_key(fshared, &key1);
+		put_futex_key(&key2);
+		put_futex_key(&key1);
 		goto retry;
 	}
 
@@ -1004,9 +1012,9 @@ retry_private:
 
 	double_unlock_hb(hb1, hb2);
 out_put_keys:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 out_put_key1:
-	put_futex_key(fshared, &key1);
+	put_futex_key(&key1);
 out:
 	return ret;
 }
@@ -1140,11 +1148,13 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 
 /**
  * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
- * uaddr1:	source futex user address
- * uaddr2:	target futex user address
- * nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
- * nr_requeue:	number of waiters to requeue (0-INT_MAX)
- * requeue_pi:	if we are attempting to requeue from a non-pi futex to a
+ * @uaddr1:	source futex user address
+ * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
+ *		the same type, no requeueing from private to shared, etc.
+ * @uaddr2:	target futex user address
+ * @nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
+ * @nr_requeue:	number of waiters to requeue (0-INT_MAX)
+ * @requeue_pi:	if we are attempting to requeue from a non-pi futex to a
  * 		pi futex (pi to pi requeue is not supported)
  *
  * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
@@ -1154,7 +1164,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
  * >=0 - on success, the number of tasks requeued or woken
  *  <0 - on error
  */
-static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+static int futex_requeue(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
 			 int nr_wake, int nr_requeue, u32 *cmpval,
 			 int requeue_pi)
 {
@@ -1197,10 +1207,10 @@ retry:
 		pi_state = NULL;
 	}
 
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -1222,11 +1232,11 @@ retry_private:
 			if (ret)
 				goto out_put_keys;
 
-			if (!fshared)
+			if (!(flags & FLAGS_SHARED))
 				goto retry_private;
 
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			goto retry;
 		}
 		if (curval != *cmpval) {
@@ -1266,8 +1276,8 @@ retry_private:
 			break;
 		case -EFAULT:
 			double_unlock_hb(hb1, hb2);
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			ret = fault_in_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
@@ -1275,8 +1285,8 @@ retry_private:
 		case -EAGAIN:
 			/* The owner was exiting, try again. */
 			double_unlock_hb(hb1, hb2);
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			cond_resched();
 			goto retry;
 		default:
@@ -1358,9 +1368,9 @@ out_unlock:
 		drop_futex_key_refs(&key1);
 
 out_put_keys:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 out_put_key1:
-	put_futex_key(fshared, &key1);
+	put_futex_key(&key1);
 out:
 	if (pi_state != NULL)
 		free_pi_state(pi_state);
@@ -1500,7 +1510,7 @@ static void unqueue_me_pi(struct futex_q *q)
  * private futexes.
  */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner, int fshared)
+				struct task_struct *newowner)
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
@@ -1593,20 +1603,11 @@ handle_fault:
 	goto retry;
 }
 
-/*
- * In case we must use restart_block to restart a futex_wait,
- * we encode in the 'flags' shared capability
- */
-#define FLAGS_SHARED		0x01
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
-
 static long futex_wait_restart(struct restart_block *restart);
 
 /**
  * fixup_owner() - Post lock pi_state and corner case management
  * @uaddr:	user address of the futex
- * @fshared:	whether the futex is shared (1) or not (0)
  * @q:		futex_q (contains pi_state and access to the rt_mutex)
  * @locked:	if the attempt to take the rt_mutex succeeded (1) or not (0)
  *
@@ -1619,8 +1620,7 @@ static long futex_wait_restart(struct restart_block *restart);
  *  0 - success, lock not taken
  * <0 - on error (-EFAULT)
  */
-static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
-		       int locked)
+static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 {
 	struct task_struct *owner;
 	int ret = 0;
@@ -1631,7 +1631,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
 		 * did a lock-steal - fix up the PI-state in that case:
 		 */
 		if (q->pi_state->owner != current)
-			ret = fixup_pi_state_owner(uaddr, q, current, fshared);
+			ret = fixup_pi_state_owner(uaddr, q, current);
 		goto out;
 	}
 
@@ -1658,7 +1658,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
 		 * lock. Fix the state up.
 		 */
 		owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-		ret = fixup_pi_state_owner(uaddr, q, owner, fshared);
+		ret = fixup_pi_state_owner(uaddr, q, owner);
 		goto out;
 	}
 
@@ -1721,7 +1721,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
  * futex_wait_setup() - Prepare to wait on a futex
  * @uaddr:	the futex userspace address
  * @val:	the expected value
- * @fshared:	whether the futex is shared (1) or not (0)
+ * @flags:	futex flags (FLAGS_SHARED, etc.)
  * @q:		the associated futex_q
  * @hb:		storage for hash_bucket pointer to be returned to caller
  *
@@ -1734,7 +1734,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
  *  0 - uaddr contains val and hb has been locked
  * <1 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlcoked
  */
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
+static int futex_wait_setup(u32 __user *uaddr, u32 val, int flags,
 			   struct futex_q *q, struct futex_hash_bucket **hb)
 {
 	u32 uval;
@@ -1759,7 +1759,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
 	 */
 retry:
 	q->key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q->key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -1775,10 +1775,10 @@ retry_private:
 		if (ret)
 			goto out;
 
-		if (!fshared)
+		if (!(flags & FLAGS_SHARED))
 			goto retry_private;
 
-		put_futex_key(fshared, &q->key);
+		put_futex_key(&q->key);
 		goto retry;
 	}
 
@@ -1789,12 +1789,12 @@ retry_private:
 
 out:
 	if (ret)
-		put_futex_key(fshared, &q->key);
+		put_futex_key(&q->key);
 	return ret;
 }
 
-static int futex_wait(u32 __user *uaddr, int fshared,
-		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
+		      u32 bitset)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct restart_block *restart;
@@ -1813,8 +1813,9 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 	if (abs_time) {
 		to = &timeout;
 
-		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
-				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
 		hrtimer_init_sleeper(to, current);
 		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
 					     current->timer_slack_ns);
@@ -1822,7 +1823,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 
 retry:
 	/* Prepare to wait on uaddr. */
-	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
 		goto out;
 
@@ -1842,7 +1843,7 @@ retry:
 	 * victim of a spurious wakeup as well.
 	 */
 	if (!signal_pending(current)) {
-		put_futex_key(fshared, &q.key);
+		put_futex_key(&q.key);
 		goto retry;
 	}
 
@@ -1856,17 +1857,12 @@ retry:
 	restart->futex.val = val;
 	restart->futex.time = abs_time->tv64;
 	restart->futex.bitset = bitset;
-	restart->futex.flags = FLAGS_HAS_TIMEOUT;
-
-	if (fshared)
-		restart->futex.flags |= FLAGS_SHARED;
-	if (clockrt)
-		restart->futex.flags |= FLAGS_CLOCKRT;
+	restart->futex.flags = flags;
 
 	ret = -ERESTART_RESTARTBLOCK;
 
 out_put_key:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out:
 	if (to) {
 		hrtimer_cancel(&to->timer);
@@ -1879,7 +1875,6 @@ out:
 static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
-	int fshared = 0;
 	ktime_t t, *tp = NULL;
 
 	if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
@@ -1887,11 +1882,9 @@ static long futex_wait_restart(struct restart_block *restart)
 		tp = &t;
 	}
 	restart->fn = do_no_restart_syscall;
-	if (restart->futex.flags & FLAGS_SHARED)
-		fshared = 1;
-	return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
-				restart->futex.bitset,
-				restart->futex.flags & FLAGS_CLOCKRT);
+
+	return (long)futex_wait(uaddr, restart->futex.flags,
+				restart->futex.val, tp, restart->futex.bitset);
 }
 
 
@@ -1901,8 +1894,8 @@ static long futex_wait_restart(struct restart_block *restart)
  * if there are waiters then it will block, it does PI, etc. (Due to
  * races the kernel might see a 0 value of the futex too.)
  */
-static int futex_lock_pi(u32 __user *uaddr, int fshared,
-			 int detect, ktime_t *time, int trylock)
+static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
+			 ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct futex_hash_bucket *hb;
@@ -1925,7 +1918,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	q.requeue_pi_key = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q.key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1947,7 +1940,7 @@ retry_private:
 			 * exit to complete.
 			 */
 			queue_unlock(&q, hb);
-			put_futex_key(fshared, &q.key);
+			put_futex_key(&q.key);
 			cond_resched();
 			goto retry;
 		default:
@@ -1977,7 +1970,7 @@ retry_private:
 	 * Fixup the pi_state owner and possibly acquire the lock if we
 	 * haven't already.
 	 */
-	res = fixup_owner(uaddr, fshared, &q, !ret);
+	res = fixup_owner(uaddr, &q, !ret);
 	/*
 	 * If fixup_owner() returned an error, proprogate that.  If it acquired
 	 * the lock, clear our -ETIMEDOUT or -EINTR.
@@ -2001,7 +1994,7 @@ out_unlock_put_key:
 	queue_unlock(&q, hb);
 
 out_put_key:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out:
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
@@ -2014,10 +2007,10 @@ uaddr_faulted:
 	if (ret)
 		goto out_put_key;
 
-	if (!fshared)
+	if (!(flags & FLAGS_SHARED))
 		goto retry_private;
 
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 	goto retry;
 }
 
@@ -2026,7 +2019,7 @@ uaddr_faulted:
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-static int futex_unlock_pi(u32 __user *uaddr, int fshared)
+static int futex_unlock_pi(u32 __user *uaddr, int flags)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -2044,7 +2037,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2099,14 +2092,14 @@ retry:
 
 out_unlock:
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 
 out:
 	return ret;
 
 pi_faulted:
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 
 	ret = fault_in_user_writeable(uaddr);
 	if (!ret)
@@ -2166,7 +2159,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 /**
  * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
  * @uaddr:	the futex we initially wait on (non-pi)
- * @fshared:	whether the futexes are shared (1) or not (0).  They must be
+ * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
  * 		the same type, no requeueing from private to shared, etc.
  * @val:	the expected value of uaddr
  * @abs_time:	absolute timeout
@@ -2204,9 +2197,9 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  *  0 - On success
  * <0 - On error
  */
-static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
+static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 				 u32 val, ktime_t *abs_time, u32 bitset,
-				 int clockrt, u32 __user *uaddr2)
+				 u32 __user *uaddr2)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct rt_mutex_waiter rt_waiter;
@@ -2221,8 +2214,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 
 	if (abs_time) {
 		to = &timeout;
-		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
-				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
 		hrtimer_init_sleeper(to, current);
 		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
 					     current->timer_slack_ns);
@@ -2236,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	rt_waiter.task = NULL;
 
 	key2 = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2246,7 +2240,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	q.requeue_pi_key = &key2;
 
 	/* Prepare to wait on uaddr. */
-	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
 		goto out_key2;
 
@@ -2274,8 +2268,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 		 */
 		if (q.pi_state && (q.pi_state->owner != current)) {
 			spin_lock(q.lock_ptr);
-			ret = fixup_pi_state_owner(uaddr2, &q, current,
-						   fshared);
+			ret = fixup_pi_state_owner(uaddr2, &q, current);
 			spin_unlock(q.lock_ptr);
 		}
 	} else {
@@ -2294,7 +2287,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 		 * Fixup the pi_state owner and possibly acquire the lock if we
 		 * haven't already.
 		 */
-		res = fixup_owner(uaddr2, fshared, &q, !ret);
+		res = fixup_owner(uaddr2, &q, !ret);
 		/*
 		 * If fixup_owner() returned an error, proprogate that.  If it
 		 * acquired the lock, clear -ETIMEDOUT or -EINTR.
@@ -2325,9 +2318,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	}
 
 out_put_keys:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out_key2:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 
 out:
 	if (to) {
@@ -2551,58 +2544,58 @@ void exit_robust_list(struct task_struct *curr)
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
-	int clockrt, ret = -ENOSYS;
+	int ret = -ENOSYS;
 	int cmd = op & FUTEX_CMD_MASK;
-	int fshared = 0;
+	int flags = 0;
 
 	if (!(op & FUTEX_PRIVATE_FLAG))
-		fshared = 1;
+		flags |= FLAGS_SHARED;
 
-	clockrt = op & FUTEX_CLOCK_REALTIME;
-	if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
-		return -ENOSYS;
+	if (op & FUTEX_CLOCK_REALTIME) {
+		flags |= FLAGS_CLOCKRT;
+		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+			return -ENOSYS;
+	}
 
 	switch (cmd) {
 	case FUTEX_WAIT:
 		val3 = FUTEX_BITSET_MATCH_ANY;
 	case FUTEX_WAIT_BITSET:
-		ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+		ret = futex_wait(uaddr, flags, val, timeout, val3);
 		break;
 	case FUTEX_WAKE:
 		val3 = FUTEX_BITSET_MATCH_ANY;
 	case FUTEX_WAKE_BITSET:
-		ret = futex_wake(uaddr, fshared, val, val3);
+		ret = futex_wake(uaddr, flags, val, val3);
 		break;
 	case FUTEX_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
 		break;
 	case FUTEX_CMP_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
-				    0);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
 		break;
 	case FUTEX_WAKE_OP:
-		ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
+		ret = futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
 		break;
 	case FUTEX_LOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
+			ret = futex_lock_pi(uaddr, flags, val, timeout, 0);
 		break;
 	case FUTEX_UNLOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_unlock_pi(uaddr, fshared);
+			ret = futex_unlock_pi(uaddr, flags);
 		break;
 	case FUTEX_TRYLOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
+			ret = futex_lock_pi(uaddr, flags, 0, timeout, 1);
 		break;
 	case FUTEX_WAIT_REQUEUE_PI:
 		val3 = FUTEX_BITSET_MATCH_ANY;
-		ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
-					    clockrt, uaddr2);
+		ret = futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
+					    uaddr2);
 		break;
 	case FUTEX_CMP_REQUEUE_PI:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
-				    1);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
 		break;
 	default:
 		ret = -ENOSYS;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] futex: add futex_q static initializer
  2010-05-06  6:24 [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
  2010-05-06  6:24 ` [PATCH 1/4] futex: replace fshared and clockrt with combined flags Darren Hart
@ 2010-05-06  6:24 ` Darren Hart
  2010-05-06  6:24 ` [PATCH 3/4] futex: refactor futex_lock_pi_atomic Darren Hart
  2010-05-06  6:24 ` [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
  3 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-06  6:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity

The futex_q struct has grown considerably over the last year or so. I
believe it now merits a static initializer to avoid uninitialized data
errors (having just spent more time than I care to admit debugging
an uninitialized q.bitset in an experimental new op code).

I originally planned on following the __THING_INITIALIZER/DECLARE_THING
method, but since we already had FUTEX_KEY_INIT, and I personally prefer
that method, I went that route.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ab3fb7c..f06bd97 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -130,6 +130,12 @@ struct futex_q {
 	u32 bitset;
 };
 
+#define FUTEX_Q_INIT \
+	{ /* list gets initialized in queue_me()*/ \
+	  .task = NULL, NULL, FUTEX_KEY_INIT \
+	, NULL, NULL, NULL, FUTEX_BITSET_MATCH_ANY }
+
+
 /*
  * Hash buckets are shared by all the futex_keys that hash to the same
  * location.  Each key may have multiple futex_q structures, one for each task
@@ -1799,16 +1805,13 @@ static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct restart_block *restart;
 	struct futex_hash_bucket *hb;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int ret;
 
 	if (!bitset)
 		return -EINVAL;
 
-	q.pi_state = NULL;
 	q.bitset = bitset;
-	q.rt_waiter = NULL;
-	q.requeue_pi_key = NULL;
 
 	if (abs_time) {
 		to = &timeout;
@@ -1899,7 +1902,7 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct futex_hash_bucket *hb;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int res, ret;
 
 	if (refill_pi_state_cache())
@@ -1913,9 +1916,6 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
 		hrtimer_set_expires(&to->timer, *time);
 	}
 
-	q.pi_state = NULL;
-	q.rt_waiter = NULL;
-	q.requeue_pi_key = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
@@ -2206,7 +2206,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 	struct rt_mutex *pi_mutex = NULL;
 	struct futex_hash_bucket *hb;
 	union futex_key key2;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int res, ret;
 
 	if (!bitset)
@@ -2234,7 +2234,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 	if (unlikely(ret != 0))
 		goto out;
 
-	q.pi_state = NULL;
 	q.bitset = bitset;
 	q.rt_waiter = &rt_waiter;
 	q.requeue_pi_key = &key2;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] futex: refactor futex_lock_pi_atomic
  2010-05-06  6:24 [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
  2010-05-06  6:24 ` [PATCH 1/4] futex: replace fshared and clockrt with combined flags Darren Hart
  2010-05-06  6:24 ` [PATCH 2/4] futex: add futex_q static initializer Darren Hart
@ 2010-05-06  6:24 ` Darren Hart
  2010-05-06  6:24 ` [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
  3 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-06  6:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity

Prepare for FUTEX_LOCK by refactoring futex_lock_pi_atomic() into
lock_futex_atomic() and lock_pi_futex_atomic(). The name change is meant to
reflect the naming convention in futex.c that futex_*() functions map directly
to futex op codes and the others are internal helper functions.

V6: Whitespace cleanup

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>

===================================================================
---
 kernel/futex.c |   79 +++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f06bd97..75c87fc 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -625,30 +625,23 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 }
 
 /**
- * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
- * @uaddr:		the pi futex user address
- * @hb:			the pi futex hash bucket
- * @key:		the futex key associated with uaddr and hb
- * @ps:			the pi_state pointer where we store the result of the
- *			lookup
- * @task:		the task to perform the atomic lock work for.  This will
- *			be "current" except in the case of requeue pi.
- * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
+ * lock_futex_atomic() - Try to acquire the futex lock atomically
+ * @uaddr:       user address of the futex
+ * @task:        the task to perform the atomic lock work for.
+ * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * The hb->lock shall be held by the caller.
  *
  * Returns:
  *  0 - ready to wait
  *  1 - acquired the lock
  * <0 - error
- *
- * The hb->lock and futex_key refs shall be held by the caller.
  */
-static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
-				union futex_key *key,
-				struct futex_pi_state **ps,
-				struct task_struct *task, int set_waiters)
+static int lock_futex_atomic(u32 __user *uaddr, struct task_struct *task,
+			     int set_waiters)
 {
-	int lock_taken, ret, ownerdied = 0;
 	u32 uval, newval, curval;
+	int lock_taken, ret;
 
 retry:
 	ret = lock_taken = 0;
@@ -695,10 +688,10 @@ retry:
 	 *
 	 * This is safe as we are protected by the hash bucket lock !
 	 */
-	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+	if (unlikely((curval & FUTEX_OWNER_DIED) ||
+		     !(curval & FUTEX_TID_MASK))) {
 		/* Keep the OWNER_DIED bit */
 		newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
-		ownerdied = 0;
 		lock_taken = 1;
 	}
 
@@ -715,10 +708,46 @@ retry:
 	if (unlikely(lock_taken))
 		return 1;
 
+	return ret;
+}
+
+/**
+ * lock_pi_futex_atomic() - Atomic work required to acquire a pi aware futex
+ * @uaddr:		the pi futex user address
+ * @hb:			the pi futex hash bucket
+ * @key:		the futex key associated with uaddr and hb
+ * @ps:			the pi_state pointer where we store the result of the
+ *			lookup
+ * @task:		the task to perform the atomic lock work for.  This will
+ *			be "current" except in the case of requeue pi.
+ * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * Returns:
+ *  0 - ready to wait
+ *  1 - acquired the lock
+ * <0 - error
+ *
+ * The hb->lock and futex_key refs shall be held by the caller.
+ */
+static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
+				union futex_key *key,
+				struct futex_pi_state **ps,
+				struct task_struct *task, int set_waiters)
+{
+	u32 uval;
+	int ret;
+
+retry:
+	ret = lock_futex_atomic(uaddr, task, set_waiters);
+	if (ret)
+		return ret;
+
 	/*
 	 * We dont have the lock. Look up the PI state (or create it if
 	 * we are the first waiter):
 	 */
+	if (get_futex_value_locked(&uval, uaddr))
+		return -EFAULT;
 	ret = lookup_pi_state(uval, hb, key, ps);
 
 	if (unlikely(ret)) {
@@ -729,7 +758,7 @@ retry:
 			 * OWNER_DIED bit is set to figure out whether
 			 * this is a robust futex or not.
 			 */
-			if (get_futex_value_locked(&curval, uaddr))
+			if (get_futex_value_locked(&uval, uaddr))
 				return -EFAULT;
 
 			/*
@@ -737,10 +766,8 @@ retry:
 			 * futex. The code above will take the futex
 			 * and return happy.
 			 */
-			if (curval & FUTEX_OWNER_DIED) {
-				ownerdied = 1;
+			if (uval & FUTEX_OWNER_DIED)
 				goto retry;
-			}
 		default:
 			break;
 		}
@@ -1100,7 +1127,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
  *
  * Try and get the lock on behalf of the top waiter if we can do it atomically.
  * Wake the top waiter if we succeed.  If the caller specified set_waiters,
- * then direct futex_lock_pi_atomic() to force setting the FUTEX_WAITERS bit.
+ * then direct lock_pi_futex_atomic() to force setting the FUTEX_WAITERS bit.
  * hb1 and hb2 must be held by the caller.
  *
  * Returns:
@@ -1124,7 +1151,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	/*
 	 * Find the top_waiter and determine if there are additional waiters.
 	 * If the caller intends to requeue more than 1 waiter to pifutex,
-	 * force futex_lock_pi_atomic() to set the FUTEX_WAITERS bit now,
+	 * force lock_pi_futex_atomic() to set the FUTEX_WAITERS bit now,
 	 * as we have means to handle the possible fault.  If not, don't set
 	 * the bit unecessarily as it will force the subsequent unlock to enter
 	 * the kernel.
@@ -1144,7 +1171,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	 * the contended case or if set_waiters is 1.  The pi_state is returned
 	 * in ps in contended cases.
 	 */
-	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
+	ret = lock_pi_futex_atomic(pifutex, hb2, key2, ps, top_waiter->task,
 				   set_waiters);
 	if (ret == 1)
 		requeue_pi_wake_futex(top_waiter, key2, hb2);
@@ -1925,7 +1952,7 @@ retry:
 retry_private:
 	hb = queue_lock(&q);
 
-	ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
+	ret = lock_pi_futex_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
 	if (unlikely(ret)) {
 		switch (ret) {
 		case 1:
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-06  6:24 [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (2 preceding siblings ...)
  2010-05-06  6:24 ` [PATCH 3/4] futex: refactor futex_lock_pi_atomic Darren Hart
@ 2010-05-06  6:24 ` Darren Hart
  2010-05-07 16:20   ` Thomas Gleixner
  3 siblings, 1 reply; 14+ messages in thread
From: Darren Hart @ 2010-05-06  6:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity,
	Peter Zijlstra

Add a non-pi TID value based futex locking mechanism. This enables the
use of adaptive spinning which was problematic with the basic FUTEX_WAIT
operation.

V6: Remove refill_pi_state() cruft from futex_lock()
    Replace futex_owner() with value checking in owner spin
    Fix spin_on_owner usage
    Test value before cmpxchg
    Break possible retry loop with hb lock held in lock_futex_atomic()
    Whitespace cleanup
V5: Handle timeout inside adaptive lock spin (needs optimization)
    Fix task_struct ref counting
    Allow CLOCK_RT and default to CLOCK_MONOTONIC for FUTEX_LOCK
    Cleanup futex_lock ret handling (EINTR) cruft
V4: Remove update_futex_waiters() and perform all waiter clearing in
    userspace. Turns out to be generally better to do an O(1) operation in
    userspace that forces one extra syscall than to do an O(N) operation dur
    every syscall. Go figure :).
    Cleanup timeout when exiting adaptive loop.
    Perform adaptive spinning after retry:
    General cleanups.
V3: Fix some locking bugs
    Use cmpxchg directly in the adaptive loop
    Move the adaptive loop into its own function
    Use FUTEX_WAIT instead of FUTEX_UNLOCK
    Add update_futex_waiters()
V2: Fix preempt-count breakage

This patch is a forward port of the following patch from Peter Zijlstra
with some bug fixes and a reworking of the futex op codes.

> futex: Implement kernel-side adaptive spinning for futexes
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
>   void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
>                           void *lock, struct thread_info *owner);
>
> Peter Zijlstra's original patch forward ported by Darren Hart.
>

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 include/linux/futex.h |    6 +
 include/linux/sched.h |    1 +
 kernel/futex.c        |  261 ++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched.c        |   66 +++++++++++++
 4 files changed, 329 insertions(+), 5 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..3ca93d1 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,8 @@
 #define FUTEX_WAKE_BITSET	10
 #define FUTEX_WAIT_REQUEUE_PI	11
 #define FUTEX_CMP_REQUEUE_PI	12
+#define FUTEX_LOCK		13
+#define FUTEX_LOCK_ADAPTIVE	14
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
@@ -39,6 +41,9 @@
 					 FUTEX_PRIVATE_FLAG)
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PRIVATE		(FUTEX_LOCK | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_ADAPTIVE_PRIVATE	(FUTEX_LOCK_ADAPTIVE | \
+					 FUTEX_PRIVATE_FLAG)
 
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
@@ -177,6 +182,7 @@ union futex_key {
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);
 extern int futex_cmpxchg_enabled;
+extern struct thread_info *futex_owner(u32 __user *uaddr);
 #else
 static inline void exit_robust_list(struct task_struct *curr)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dad7f66..885d659 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -362,6 +362,7 @@ extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
 extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner);
 
 struct nsproxy;
 struct user_namespace;
diff --git a/kernel/futex.c b/kernel/futex.c
index 75c87fc..f9e56b9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -75,6 +75,7 @@ int __read_mostly futex_cmpxchg_enabled;
 #define FLAGS_SHARED		0x01
 #define FLAGS_CLOCKRT		0x02
 #define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_ADAPTIVE		0x08
 
 /*
  * Priority Inheritance state:
@@ -368,6 +369,43 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)
 	return ret ? -EFAULT : 0;
 }
 
+#ifdef CONFIG_SMP
+/**
+ * futex_owner() - Lookup the thread_info struct of ther futex owner
+ * @uaddr: The user address containing the TID of the owner
+ *
+ * If a non-NULL ti is returned, the caller must call
+ * put_task_struct(ti->task).
+ * */
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+	struct thread_info *ti = NULL;
+	struct task_struct *p;
+	pid_t pid;
+	u32 uval;
+
+	if (get_futex_value_locked(&uval, uaddr))
+		return NULL;
+
+	pid = uval & FUTEX_TID_MASK;
+	rcu_read_lock();
+	p = find_task_by_vpid(pid);
+	if (p) {
+		const struct cred *cred, *pcred;
+
+		cred = current_cred();
+		pcred = __task_cred(p);
+		if (cred->euid == pcred->euid || cred->euid == pcred->uid) {
+			ti = task_thread_info(p);
+			get_task_struct(ti->task);
+		}
+	}
+	rcu_read_unlock();
+
+	return ti;
+}
+#endif
+
 
 /*
  * PI code:
@@ -643,7 +681,6 @@ static int lock_futex_atomic(u32 __user *uaddr, struct task_struct *task,
 	u32 uval, newval, curval;
 	int lock_taken, ret;
 
-retry:
 	ret = lock_taken = 0;
 
 	/*
@@ -700,7 +737,7 @@ retry:
 	if (unlikely(curval == -EFAULT))
 		return -EFAULT;
 	if (unlikely(curval != uval))
-		goto retry;
+		return -EAGAIN;
 
 	/*
 	 * We took the lock due to owner died take over.
@@ -880,7 +917,7 @@ static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
 
 	/*
 	 * There is no waiter, so we unlock the futex. The owner died
-	 * bit has not to be preserved here. We are the owner:
+	 * bit does not need to be preserved here. We are the owner:
 	 */
 	oldval = cmpxchg_futex_value_locked(uaddr, uval, 0);
 
@@ -2356,6 +2393,212 @@ out:
 	return ret;
 }
 
+/**
+ * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
+ * @uaddr: the futex user address
+ * @timeout: absolute timeout or NULL if none
+ *
+ * Try to acquire a futex lock in a loop until the owner changes or the owner
+ * is descheduled. To lock the futex, set the value to the current TID.
+ *
+ * Returns:
+ *  0 - Gave up, lock not acquired
+ *  1 - Futex lock acquired
+ * <0 - On error
+ */
+static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
+{
+	struct thread_info *owner = NULL;
+	pid_t curtid, ownertid = 0;
+	ktime_t now;
+	int ret = 0;
+	u32 curval;
+
+	curtid = task_pid_vnr(current);
+
+	for (;;) {
+		if (get_user(curval, uaddr)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		/*
+		 * Try to acquire the lock.
+		 */
+		if (curval == 0)
+			curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
+
+		if (curval == 0) {
+			ret = 1;
+			break;
+		}
+
+		if (unlikely(curval == -EFAULT)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		/*
+		 * Futex locks manage the owner atomically. We are not in
+		 * danger of deadlock by preempting a pending owner. Abort the
+		 * loop if our time slice has expired.  RT Threads can spin
+		 * until they preempt the owner, at which point they will break
+		 * out of the loop anyway.
+		 */
+		if (need_resched())
+			break;
+
+		if (timeout) {
+			now = ktime_get();
+			if (timeout->tv64 < now.tv64)
+				break;
+		}
+
+		cpu_relax();
+
+		if ((curval & FUTEX_TID_MASK) != ownertid) {
+			if (owner)
+				put_task_struct(owner->task);
+			owner = futex_owner(uaddr);
+		}
+		if (owner && !futex_spin_on_owner(uaddr, owner))
+			break;
+	}
+
+	if (owner)
+		put_task_struct(owner->task);
+
+	return ret;
+}
+
+/**
+ * futex_lock() - Acquire the lock and set the futex value
+ * @uaddr:  the futex user address
+ * @flags:  futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
+ * @detect: detect deadlock (1) or not (0)
+ * @time:   absolute timeout
+ *
+ * futex_(un)lock() define a futex value policy and implement a full mutex. The
+ * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
+ * FUTEX_OWNER_DIED as appropriate.
+ *
+ * Userspace tried a 0 -> TID atomic transition of the futex value and failed.
+ * Try to acquire the lock in the kernel, blocking if necessary. Return to
+ * userspace with the lock held and the futex value set accordingly (or after a
+ * timeout).
+ *
+ * Returns:
+ *  0 - On success
+ * <0 - On error
+ */
+static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
+{
+	struct hrtimer_sleeper timeout, *to = NULL;
+	struct futex_hash_bucket *hb;
+	struct futex_q q = FUTEX_Q_INIT;
+	int ret = 0;
+
+	if (time) {
+		to = &timeout;
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires(&to->timer, *time);
+	}
+
+retry:
+#ifdef CONFIG_SMP
+	if (flags & FLAGS_ADAPTIVE) {
+		preempt_disable();
+		ret = trylock_futex_adaptive(uaddr, time);
+		preempt_enable();
+
+		/* We got the lock. */
+		if (ret == 1) {
+			ret = 0;
+			goto out;
+		}
+
+		/* We encountered an error, -EFAULT or -ETIMEDOUT */
+		if (ret)
+			goto out;
+	}
+#endif
+	q.key = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
+	if (unlikely(ret != 0))
+		goto out;
+
+	hb = queue_lock(&q);
+
+	ret = lock_futex_atomic(uaddr, current, 0);
+	if (unlikely(ret)) {
+		switch (ret) {
+		case 1:
+			/* We got the lock. */
+			ret = 0;
+			goto out_unlock_put_key;
+		case -EAGAIN:
+			queue_unlock(&q, hb);
+			put_futex_key(&q.key);
+			goto retry;
+		case -EFAULT:
+			goto uaddr_faulted;
+		default:
+			goto out_unlock_put_key;
+		}
+	}
+	/*
+	 * Only actually queue now that the atomic ops are done:
+	 */
+	futex_wait_queue_me(hb, &q, to);
+
+	ret = unqueue_me(&q);
+
+	/* If we were woken by the owner, try and acquire the lock. */
+	if (!ret)
+		goto retry;
+
+	ret = -ETIMEDOUT;
+	if (to && !to->task)
+		goto out_put_key;
+
+	/*
+	 * We expect signal_pending(current), but we might be the
+	 * victim of a spurious wakeup as well.
+	 */
+
+	ret = -ERESTARTNOINTR;
+	if (!signal_pending(current)) {
+		put_futex_key(&q.key);
+		goto retry;
+	}
+
+	goto out_put_key;
+
+out_unlock_put_key:
+	queue_unlock(&q, hb);
+
+out_put_key:
+	put_futex_key(&q.key);
+out:
+	if (to)
+		destroy_hrtimer_on_stack(&to->timer);
+	return ret;
+
+uaddr_faulted:
+	queue_unlock(&q, hb);
+
+	ret = fault_in_user_writeable(uaddr);
+	if (ret)
+		goto out_put_key;
+
+	put_futex_key(&q.key);
+	goto retry;
+}
+
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
@@ -2579,7 +2822,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 
 	if (op & FUTEX_CLOCK_REALTIME) {
 		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+		    cmd != FUTEX_LOCK && cmd != FUTEX_LOCK_ADAPTIVE)
 			return -ENOSYS;
 	}
 
@@ -2623,6 +2867,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	case FUTEX_CMP_REQUEUE_PI:
 		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
 		break;
+	case FUTEX_LOCK_ADAPTIVE:
+		flags |= FLAGS_ADAPTIVE;
+	case FUTEX_LOCK:
+		if (futex_cmpxchg_enabled)
+			ret = futex_lock(uaddr, flags, val, timeout);
+		break;
 	default:
 		ret = -ENOSYS;
 	}
@@ -2641,7 +2891,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
 		      cmd == FUTEX_WAIT_BITSET ||
-		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
+		      cmd == FUTEX_WAIT_REQUEUE_PI ||
+		      cmd == FUTEX_LOCK || cmd == FUTEX_LOCK_ADAPTIVE)) {
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..20b8d99 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3821,6 +3821,72 @@ out:
 }
 #endif
 
+#ifdef CONFIG_FUTEX
+#include <linux/futex.h>
+
+int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
+{
+	u32 ownertid, uval;
+	unsigned int cpu;
+	struct rq *rq;
+
+	if (!sched_feat(OWNER_SPIN))
+		return 0;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/*
+	 * Need to access the cpu field knowing that
+	 * DEBUG_PAGEALLOC could have unmapped it if
+	 * the mutex owner just released it and exited.
+	 */
+	if (probe_kernel_address(&owner->cpu, cpu))
+		goto out;
+#else
+	cpu = owner->cpu;
+#endif
+
+	/*
+	 * Even if the access succeeded (likely case),
+	 * the cpu field may no longer be valid.
+	 */
+	if (cpu >= nr_cpumask_bits)
+		goto out;
+
+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;
+
+	rq = cpu_rq(cpu);
+
+	if (get_user(ownertid, uaddr))
+		return -EFAULT;
+	ownertid &= FUTEX_TID_MASK;
+
+	for (;;) {
+		/*
+		 * Owner changed, break to re-assess state.
+		 */
+		if (get_user(uval, uaddr))
+			return -EFAULT;
+		if ((uval & FUTEX_TID_MASK) != ownertid)
+			break;
+
+		/*
+		 * Is that owner really running on that cpu?
+		 */
+		if (task_thread_info(rq->curr) != owner || need_resched())
+			return 0;
+
+		cpu_relax();
+	}
+out:
+	return 1;
+}
+#endif
+
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-06  6:24 ` [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-05-07 16:20   ` Thomas Gleixner
  2010-05-07 16:24     ` Peter Zijlstra
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Thomas Gleixner @ 2010-05-07 16:20 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity, Peter Zijlstra

On Wed, 5 May 2010, Darren Hart wrote:

> Add a non-pi TID value based futex locking mechanism. This enables the
> use of adaptive spinning which was problematic with the basic FUTEX_WAIT
> operation.

You still do way too much work in that spin code with way too much
code lines.

Can you try the following (completely uncompiled/untested) patch ?

Thanks,

	tglx
Subject: futex-simplify.patch
From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 07 May 2010 17:56:38 +0200

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |   42 ++++++++++++------------------------
 kernel/sched.c |   66 ---------------------------------------------------------
 2 files changed, 14 insertions(+), 94 deletions(-)

Index: linux-2.6-tip/kernel/futex.c
===================================================================
--- linux-2.6-tip.orig/kernel/futex.c
+++ linux-2.6-tip/kernel/futex.c
@@ -2406,11 +2406,10 @@ out:
  *  1 - Futex lock acquired
  * <0 - On error
  */
-static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
+static int trylock_futex_adaptive(u32 __user *uaddr, struct hrtimer_sleeper *to)
 {
 	struct thread_info *owner = NULL;
 	pid_t curtid, ownertid = 0;
-	ktime_t now;
 	int ret = 0;
 	u32 curval;
 
@@ -2425,43 +2424,32 @@ static int trylock_futex_adaptive(u32 __
 		/*
 		 * Try to acquire the lock.
 		 */
-		if (curval == 0)
+		if (curval == 0) {
 			curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
 
-		if (curval == 0) {
-			ret = 1;
-			break;
-		}
+			if (curval == 0) {
+				ret = 1;
+				break;
+			}
 
-		if (unlikely(curval == -EFAULT)) {
-			ret = -EFAULT;
-			break;
+			if (unlikely(curval == -EFAULT)) {
+				ret = -EFAULT;
+				break;
+			}
 		}
 
-		/*
-		 * Futex locks manage the owner atomically. We are not in
-		 * danger of deadlock by preempting a pending owner. Abort the
-		 * loop if our time slice has expired.  RT Threads can spin
-		 * until they preempt the owner, at which point they will break
-		 * out of the loop anyway.
-		 */
-		if (need_resched())
+		if (to && !to->task) {
+			ret = -ETIMEOUT;
 			break;
-
-		if (timeout) {
-			now = ktime_get();
-			if (timeout->tv64 < now.tv64)
-				break;
 		}
 
-		cpu_relax();
-
 		if ((curval & FUTEX_TID_MASK) != ownertid) {
 			if (owner)
 				put_task_struct(owner->task);
 			owner = futex_owner(uaddr);
 		}
-		if (owner && !futex_spin_on_owner(uaddr, owner))
+
+		if (!owner->oncpu)
 			break;
 	}
 
@@ -2510,9 +2498,7 @@ static int futex_lock(u32 __user *uaddr,
 retry:
 #ifdef CONFIG_SMP
 	if (flags & FLAGS_ADAPTIVE) {
-		preempt_disable();
 		ret = trylock_futex_adaptive(uaddr, time);
-		preempt_enable();
 
 		/* We got the lock. */
 		if (ret == 1) {
Index: linux-2.6-tip/kernel/sched.c
===================================================================
--- linux-2.6-tip.orig/kernel/sched.c
+++ linux-2.6-tip/kernel/sched.c
@@ -3821,72 +3821,6 @@ out:
 }
 #endif
 
-#ifdef CONFIG_FUTEX
-#include <linux/futex.h>
-
-int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
-{
-	u32 ownertid, uval;
-	unsigned int cpu;
-	struct rq *rq;
-
-	if (!sched_feat(OWNER_SPIN))
-		return 0;
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	/*
-	 * Need to access the cpu field knowing that
-	 * DEBUG_PAGEALLOC could have unmapped it if
-	 * the mutex owner just released it and exited.
-	 */
-	if (probe_kernel_address(&owner->cpu, cpu))
-		goto out;
-#else
-	cpu = owner->cpu;
-#endif
-
-	/*
-	 * Even if the access succeeded (likely case),
-	 * the cpu field may no longer be valid.
-	 */
-	if (cpu >= nr_cpumask_bits)
-		goto out;
-
-	/*
-	 * We need to validate that we can do a
-	 * get_cpu() and that we have the percpu area.
-	 */
-	if (!cpu_online(cpu))
-		goto out;
-
-	rq = cpu_rq(cpu);
-
-	if (get_user(ownertid, uaddr))
-		return -EFAULT;
-	ownertid &= FUTEX_TID_MASK;
-
-	for (;;) {
-		/*
-		 * Owner changed, break to re-assess state.
-		 */
-		if (get_user(uval, uaddr))
-			return -EFAULT;
-		if ((uval & FUTEX_TID_MASK) != ownertid)
-			break;
-
-		/*
-		 * Is that owner really running on that cpu?
-		 */
-		if (task_thread_info(rq->curr) != owner || need_resched())
-			return 0;
-
-		cpu_relax();
-	}
-out:
-	return 1;
-}
-#endif
-
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:20   ` Thomas Gleixner
@ 2010-05-07 16:24     ` Peter Zijlstra
  2010-05-07 16:30       ` Thomas Gleixner
  2010-05-07 16:52     ` Darren Hart
  2010-05-07 19:11     ` Darren Hart
  2 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2010-05-07 16:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity

On Fri, 2010-05-07 at 18:20 +0200, Thomas Gleixner wrote:
> +               if (!owner->oncpu)

can't do that, that relies on __ARCH_WANT_UNLOCKED_CTXSW

Please keep the code as near mutex_spin_on_owner() as possible.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:24     ` Peter Zijlstra
@ 2010-05-07 16:30       ` Thomas Gleixner
  2010-05-07 16:35         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2010-05-07 16:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Darren Hart, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity

On Fri, 7 May 2010, Peter Zijlstra wrote:

> On Fri, 2010-05-07 at 18:20 +0200, Thomas Gleixner wrote:
> > +               if (!owner->oncpu)
> 
> can't do that, that relies on __ARCH_WANT_UNLOCKED_CTXSW

Grr, yes.
 
> Please keep the code as near mutex_spin_on_owner() as possible.

There is no reason why we can't make that unconditional.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:30       ` Thomas Gleixner
@ 2010-05-07 16:35         ` Peter Zijlstra
  2010-05-07 16:43           ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2010-05-07 16:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity

On Fri, 2010-05-07 at 18:30 +0200, Thomas Gleixner wrote:
> > Please keep the code as near mutex_spin_on_owner() as possible.
> 
> There is no reason why we can't make that unconditional.
> 
Sure, but lets do that in a separate series.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:35         ` Peter Zijlstra
@ 2010-05-07 16:43           ` Thomas Gleixner
  2010-05-07 19:05             ` Darren Hart
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2010-05-07 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Darren Hart, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity


On Fri, 7 May 2010, Peter Zijlstra wrote:

> On Fri, 2010-05-07 at 18:30 +0200, Thomas Gleixner wrote:
> > > Please keep the code as near mutex_spin_on_owner() as possible.
> > 
> > There is no reason why we can't make that unconditional.
> > 
> Sure, but lets do that in a separate series.

Sure. I'm not touching mutex_spin_on_owner() now. It's just for
testing now.

Thanks,

	tglx
---
Index: linux-2.6-tip/kernel/sched.c
===================================================================
--- linux-2.6-tip.orig/kernel/sched.c
+++ linux-2.6-tip/kernel/sched.c
@@ -841,6 +841,10 @@ static inline int task_running(struct rq
 
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
+#ifdef CONFIG_SMP
+	next->oncpu = 1;
+	prev->oncpu = 0;
+#endif
 }
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
@@ -2628,7 +2632,7 @@ void sched_fork(struct task_struct *p, i
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
 #endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
 	p->oncpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT
@@ -5250,7 +5254,7 @@ void __cpuinit init_idle(struct task_str
 	__set_task_cpu(idle, cpu);
 
 	rq->curr = rq->idle = idle;
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
 	idle->oncpu = 1;
 #endif
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
Index: linux-2.6-tip/include/linux/sched.h
===================================================================
--- linux-2.6-tip.orig/include/linux/sched.h
+++ linux-2.6-tip/include/linux/sched.h
@@ -1178,10 +1178,8 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
 	int oncpu;
 #endif
-#endif
 
 	int prio, static_prio, normal_prio;
 	unsigned int rt_priority;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:43           ` Thomas Gleixner
@ 2010-05-07 19:05             ` Darren Hart
  0 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-07 19:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity

Thomas Gleixner wrote:
> On Fri, 7 May 2010, Peter Zijlstra wrote:
> 
>> On Fri, 2010-05-07 at 18:30 +0200, Thomas Gleixner wrote:
>>>> Please keep the code as near mutex_spin_on_owner() as possible.
>>> There is no reason why we can't make that unconditional.
>>>
>> Sure, but lets do that in a separate series.
> 
> Sure. I'm not touching mutex_spin_on_owner() now. It's just for
> testing now.
> 
> Thanks,
> 
> 	tglx


One bug below, see patch below for fix.


> ---
> Index: linux-2.6-tip/kernel/sched.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/sched.c
> +++ linux-2.6-tip/kernel/sched.c
> @@ -841,6 +841,10 @@ static inline int task_running(struct rq
> 
>  static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
>  {
> +#ifdef CONFIG_SMP
> +	next->oncpu = 1;
> +	prev->oncpu = 0;

no prev in context, moved to finish_lock_switch:

How's this?

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 include/linux/sched.h |    2 --
 kernel/sched.c        |   10 ++++++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 885d659..3fb8a45 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1178,10 +1178,8 @@ struct task_struct {
 	int lock_depth;		/* BKL lock depth */
 
 #ifdef CONFIG_SMP
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
 	int oncpu;
 #endif
-#endif
 
 	int prio, static_prio, normal_prio;
 	unsigned int rt_priority;
diff --git a/kernel/sched.c b/kernel/sched.c
index 20b8d99..9915bdf 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -841,10 +841,16 @@ static inline int task_running(struct rq *rq, struct task_struct *p)
 
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
+#ifdef CONFIG_SMP
+	next->oncpu = 1;
+#endif
 }
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_SMP
+	prev->oncpu = 0;
+#endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = current;
@@ -2628,7 +2634,7 @@ void sched_fork(struct task_struct *p, int clone_flags)
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
 #endif
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
 	p->oncpu = 0;
 #endif
 #ifdef CONFIG_PREEMPT
@@ -5316,7 +5322,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu)
 	__set_task_cpu(idle, cpu);
 
 	rq->curr = rq->idle = idle;
-#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
+#if defined(CONFIG_SMP)
 	idle->oncpu = 1;
 #endif
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
-- 
1.6.3.3


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:20   ` Thomas Gleixner
  2010-05-07 16:24     ` Peter Zijlstra
@ 2010-05-07 16:52     ` Darren Hart
  2010-05-07 19:11     ` Darren Hart
  2 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-07 16:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity, Peter Zijlstra

Thomas Gleixner wrote:
> On Wed, 5 May 2010, Darren Hart wrote:
> 
>> Add a non-pi TID value based futex locking mechanism. This enables the
>> use of adaptive spinning which was problematic with the basic FUTEX_WAIT
>> operation.
> 
> You still do way too much work in that spin code with way too much
> code lines.
> 
> Can you try the following (completely uncompiled/untested) patch ?
> 
> Thanks,
> 
> 	tglx
> Subject: futex-simplify.patch
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 07 May 2010 17:56:38 +0200
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |   42 ++++++++++++------------------------
>  kernel/sched.c |   66 ---------------------------------------------------------
>  2 files changed, 14 insertions(+), 94 deletions(-)
> 


> +		if (to && !to->task) {
> +			ret = -ETIMEOUT;
>  			break;

I had hoped to be able to do it like this too. Unfortunately, we can't 
arm the timer until after we set TASK_INTERRUPTIBLE, and we don't do 
that until after we give up on spinning.


> -
> -		if (timeout) {
> -			now = ktime_get();
> -			if (timeout->tv64 < now.tv64)
> -				break;
>  		}

This bit clearly needs work. Either via some interpolated time 
calculation using something like jiffies, or via another timer that sets 
a bit we can check in here.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-05-07 16:20   ` Thomas Gleixner
  2010-05-07 16:24     ` Peter Zijlstra
  2010-05-07 16:52     ` Darren Hart
@ 2010-05-07 19:11     ` Darren Hart
  2 siblings, 0 replies; 14+ messages in thread
From: Darren Hart @ 2010-05-07 19:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Ulrich Drepper, Alan Cox, Avi Kivity, Peter Zijlstra

Thomas Gleixner wrote:
> On Wed, 5 May 2010, Darren Hart wrote:
> 
>> Add a non-pi TID value based futex locking mechanism. This enables the
>> use of adaptive spinning which was problematic with the basic FUTEX_WAIT
>> operation.
> 
> You still do way too much work in that spin code with way too much
> code lines.
> 
> Can you try the following (completely uncompiled/untested) patch ?

After some tweaking this patch results in very little change in the
results - but does cut down the complexity of the solution quite a bit.

Two questions inline (not included in the new patch) and a new version
of the patch below.

> 
> Thanks,
> 
> 	tglx
> Subject: futex-simplify.patch
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 07 May 2010 17:56:38 +0200
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |   42 ++++++++++++------------------------
>  kernel/sched.c |   66 ---------------------------------------------------------
>  2 files changed, 14 insertions(+), 94 deletions(-)
> 
> Index: linux-2.6-tip/kernel/futex.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/futex.c
> +++ linux-2.6-tip/kernel/futex.c
> @@ -2406,11 +2406,10 @@ out:
>   *  1 - Futex lock acquired
>   * <0 - On error
>   */
> -static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
> +static int trylock_futex_adaptive(u32 __user *uaddr, struct hrtimer_sleeper *to)
>  {
>  	struct thread_info *owner = NULL;
>  	pid_t curtid, ownertid = 0;
> -	ktime_t now;
>  	int ret = 0;
>  	u32 curval;
> 
> @@ -2425,43 +2424,32 @@ static int trylock_futex_adaptive(u32 __
>  		/*
>  		 * Try to acquire the lock.
>  		 */
> -		if (curval == 0)
> +		if (curval == 0) {
>  			curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
> 
> -		if (curval == 0) {
> -			ret = 1;
> -			break;
> -		}
> +			if (curval == 0) {
> +				ret = 1;
> +				break;
> +			}
> 
> -		if (unlikely(curval == -EFAULT)) {
> -			ret = -EFAULT;
> -			break;
> +			if (unlikely(curval == -EFAULT)) {
> +				ret = -EFAULT;
> +				break;
> +			}
>  		}
> 
> -		/*
> -		 * Futex locks manage the owner atomically. We are not in
> -		 * danger of deadlock by preempting a pending owner. Abort the
> -		 * loop if our time slice has expired.  RT Threads can spin
> -		 * until they preempt the owner, at which point they will break
> -		 * out of the loop anyway.
> -		 */
> -		if (need_resched())
> +		if (to && !to->task) {
> +			ret = -ETIMEOUT;
>  			break;
> -
> -		if (timeout) {
> -			now = ktime_get();
> -			if (timeout->tv64 < now.tv64)
> -				break;
>  		}
> 
> -		cpu_relax();


Why remove this?


> -
>  		if ((curval & FUTEX_TID_MASK) != ownertid) {
>  			if (owner)
>  				put_task_struct(owner->task);
>  			owner = futex_owner(uaddr);
>  		}
> -		if (owner && !futex_spin_on_owner(uaddr, owner))
> +
> +		if (!owner->oncpu)
>  			break;
>  	}
> 
> @@ -2510,9 +2498,7 @@ static int futex_lock(u32 __user *uaddr,
>  retry:
>  #ifdef CONFIG_SMP
>  	if (flags & FLAGS_ADAPTIVE) {
> -		preempt_disable();
>  		ret = trylock_futex_adaptive(uaddr, time);
> -		preempt_enable();


Removing the preempt_disable() makes it more likely for a spinner to get
preempted while spinning. Putting these back and adding need_resched()
in trylock_futex_adaptive() will make it much more likely for a task to
queue up on the futex before getting descheduled.


> 
>  		/* We got the lock. */
>  		if (ret == 1) {
> Index: linux-2.6-tip/kernel/sched.c
> ===================================================================
> --- linux-2.6-tip.orig/kernel/sched.c
> +++ linux-2.6-tip/kernel/sched.c
> @@ -3821,72 +3821,6 @@ out:
>  }
>  #endif
> 
> -#ifdef CONFIG_FUTEX
> -#include <linux/futex.h>
> -
> -int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
> -{
> -	u32 ownertid, uval;
> -	unsigned int cpu;
> -	struct rq *rq;
> -
> -	if (!sched_feat(OWNER_SPIN))
> -		return 0;
> -
> -#ifdef CONFIG_DEBUG_PAGEALLOC
> -	/*
> -	 * Need to access the cpu field knowing that
> -	 * DEBUG_PAGEALLOC could have unmapped it if
> -	 * the mutex owner just released it and exited.
> -	 */
> -	if (probe_kernel_address(&owner->cpu, cpu))
> -		goto out;
> -#else
> -	cpu = owner->cpu;
> -#endif
> -
> -	/*
> -	 * Even if the access succeeded (likely case),
> -	 * the cpu field may no longer be valid.
> -	 */
> -	if (cpu >= nr_cpumask_bits)
> -		goto out;
> -
> -	/*
> -	 * We need to validate that we can do a
> -	 * get_cpu() and that we have the percpu area.
> -	 */
> -	if (!cpu_online(cpu))
> -		goto out;
> -
> -	rq = cpu_rq(cpu);
> -
> -	if (get_user(ownertid, uaddr))
> -		return -EFAULT;
> -	ownertid &= FUTEX_TID_MASK;
> -
> -	for (;;) {
> -		/*
> -		 * Owner changed, break to re-assess state.
> -		 */
> -		if (get_user(uval, uaddr))
> -			return -EFAULT;
> -		if ((uval & FUTEX_TID_MASK) != ownertid)
> -			break;
> -
> -		/*
> -		 * Is that owner really running on that cpu?
> -		 */
> -		if (task_thread_info(rq->curr) != owner || need_resched())
> -			return 0;
> -
> -		cpu_relax();
> -	}
> -out:
> -	return 1;
> -}
> -#endif
> -
>  #ifdef CONFIG_PREEMPT
>  /*
>   * this is the entry point to schedule() from in-kernel preemption


Note: this version has been modified by Darren Hart to:
o remove the timeout handling changes as the timer is not yet armed.
o "owner" conversion from ti to task
o trylock_futex_adaptive owner null check

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 include/linux/futex.h |    1 -
 kernel/futex.c        |   66 ++++++++++++++++--------------------------------
 kernel/sched.c        |   66 -------------------------------------------------
 3 files changed, 22 insertions(+), 111 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 3ca93d1..ac374f2 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -182,7 +182,6 @@ union futex_key {
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);
 extern int futex_cmpxchg_enabled;
-extern struct thread_info *futex_owner(u32 __user *uaddr);
 #else
 static inline void exit_robust_list(struct task_struct *curr)
 {
diff --git a/kernel/futex.c b/kernel/futex.c
index f9e56b9..37f763f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -371,16 +371,15 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)
 
 #ifdef CONFIG_SMP
 /**
- * futex_owner() - Lookup the thread_info struct of ther futex owner
+ * futex_owner() - Lookup the task struct of ther futex owner
  * @uaddr: The user address containing the TID of the owner
  *
- * If a non-NULL ti is returned, the caller must call
- * put_task_struct(ti->task).
+ * If a non-NULL task is returned, the caller must call
+ * put_task_struct(task).
  * */
-struct thread_info *futex_owner(u32 __user *uaddr)
+struct task_struct *futex_owner(u32 __user *uaddr)
 {
-	struct thread_info *ti = NULL;
-	struct task_struct *p;
+	struct task_struct *p = NULL;
 	pid_t pid;
 	u32 uval;
 
@@ -389,20 +388,11 @@ struct thread_info *futex_owner(u32 __user *uaddr)
 
 	pid = uval & FUTEX_TID_MASK;
 	rcu_read_lock();
-	p = find_task_by_vpid(pid);
-	if (p) {
-		const struct cred *cred, *pcred;
-
-		cred = current_cred();
-		pcred = __task_cred(p);
-		if (cred->euid == pcred->euid || cred->euid == pcred->uid) {
-			ti = task_thread_info(p);
-			get_task_struct(ti->task);
-		}
-	}
+	if ((p = find_task_by_vpid(pid)))
+		get_task_struct(p);
 	rcu_read_unlock();
 
-	return ti;
+	return p;
 }
 #endif
 
@@ -2408,7 +2398,7 @@ out:
  */
 static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
 {
-	struct thread_info *owner = NULL;
+	struct task_struct *owner = NULL;
 	pid_t curtid, ownertid = 0;
 	ktime_t now;
 	int ret = 0;
@@ -2425,48 +2415,38 @@ static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
 		/*
 		 * Try to acquire the lock.
 		 */
-		if (curval == 0)
+		if (curval == 0) {
 			curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
 
-		if (curval == 0) {
-			ret = 1;
-			break;
-		}
+			if (curval == 0) {
+				ret = 1;
+				break;
+			}
 
-		if (unlikely(curval == -EFAULT)) {
-			ret = -EFAULT;
-			break;
+			if (unlikely(curval == -EFAULT)) {
+				ret = -EFAULT;
+				break;
+			}
 		}
 
-		/*
-		 * Futex locks manage the owner atomically. We are not in
-		 * danger of deadlock by preempting a pending owner. Abort the
-		 * loop if our time slice has expired.  RT Threads can spin
-		 * until they preempt the owner, at which point they will break
-		 * out of the loop anyway.
-		 */
-		if (need_resched())
-			break;
-
 		if (timeout) {
 			now = ktime_get();
 			if (timeout->tv64 < now.tv64)
 				break;
 		}
 
-		cpu_relax();
-
 		if ((curval & FUTEX_TID_MASK) != ownertid) {
 			if (owner)
-				put_task_struct(owner->task);
+				put_task_struct(owner);
 			owner = futex_owner(uaddr);
 		}
-		if (owner && !futex_spin_on_owner(uaddr, owner))
+
+		if (owner && !owner->oncpu)
 			break;
 	}
 
 	if (owner)
-		put_task_struct(owner->task);
+		put_task_struct(owner);
 
 	return ret;
 }
@@ -2510,9 +2490,7 @@ static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
 retry:
 #ifdef CONFIG_SMP
 	if (flags & FLAGS_ADAPTIVE) {
-		preempt_disable();
 		ret = trylock_futex_adaptive(uaddr, time);
-		preempt_enable();
 
 		/* We got the lock. */
 		if (ret == 1) {
diff --git a/kernel/sched.c b/kernel/sched.c
index 9915bdf..84424e6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3827,72 +3827,6 @@ out:
 }
 #endif
 
-#ifdef CONFIG_FUTEX
-#include <linux/futex.h>
-
-int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
-{
-	u32 ownertid, uval;
-	unsigned int cpu;
-	struct rq *rq;
-
-	if (!sched_feat(OWNER_SPIN))
-		return 0;
-
-#ifdef CONFIG_DEBUG_PAGEALLOC
-	/*
-	 * Need to access the cpu field knowing that
-	 * DEBUG_PAGEALLOC could have unmapped it if
-	 * the mutex owner just released it and exited.
-	 */
-	if (probe_kernel_address(&owner->cpu, cpu))
-		goto out;
-#else
-	cpu = owner->cpu;
-#endif
-
-	/*
-	 * Even if the access succeeded (likely case),
-	 * the cpu field may no longer be valid.
-	 */
-	if (cpu >= nr_cpumask_bits)
-		goto out;
-
-	/*
-	 * We need to validate that we can do a
-	 * get_cpu() and that we have the percpu area.
-	 */
-	if (!cpu_online(cpu))
-		goto out;
-
-	rq = cpu_rq(cpu);
-
-	if (get_user(ownertid, uaddr))
-		return -EFAULT;
-	ownertid &= FUTEX_TID_MASK;
-
-	for (;;) {
-		/*
-		 * Owner changed, break to re-assess state.
-		 */
-		if (get_user(uval, uaddr))
-			return -EFAULT;
-		if ((uval & FUTEX_TID_MASK) != ownertid)
-			break;
-
-		/*
-		 * Is that owner really running on that cpu?
-		 */
-		if (task_thread_info(rq->curr) != owner || need_resched())
-			return 0;
-
-		cpu_relax();
-	}
-out:
-	return 1;
-}
-#endif
-
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
-- 
1.6.3.3


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH V5 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning
@ 2010-04-09  5:15 dvhltc
  2010-04-09  5:15 ` [PATCH 4/4] futex: Add " dvhltc
  0 siblings, 1 reply; 14+ messages in thread
From: dvhltc @ 2010-04-09  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity

NOT FOR INCLUSION

The following patch series implements a new experimental kernel side futex mutex
via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
allows for multiple spinners until the lock is released or the owner is
descheduled. The patch currently allows the user to specify if they want
spinning or not, but the final version will only provide the adaptive variant as
blocking locks are already very efficiently implemented with the existing
operations.

This version greatly outperforms the last, and actually shows some benefit using
adaptive locks! The primary difference in the implementations was the removal of
any attempt to limit the number of spinners as we have no means to track
spinners. This version also allows spinners to continue spinning if the owner of
a lock changes, and only aborts if the current owner deschedules. Aborting the
spin if the owner changes proved ineffectual as this locking mechanism does a
lot of lock stealing, so many of the loops would be a single cycle, and then
move on to block. Something to note is that the adaptive runs can make
significantly more syscalls than the non-adaptive version, and still show
significant improvement.

Normal Lock
===========
# ./futex_lock -i100000000 -n8 -p1000 -d10 
futex_lock: Measure FUTEX_LOCK operations per second
	Arguments: iterations=100000000 threads=8 adaptive=no
	           period=1000 duty-cycle=10%
Result: 606 Kiter/s
lock calls:      100000000
lock syscalls:   35729444 (35.73%)
unlock calls:    100000000
unlock syscalls: 41740294 (41.74%)

Adaptive Lock
=============
# ./futex_lock -i100000000 -n8 -p1000 -d10 -a
futex_lock: Measure FUTEX_LOCK operations per second
	Arguments: iterations=100000000 threads=8 adaptive=yes
	           period=1000 duty-cycle=10%
Result: 749 Kiter/s
lock calls:      100000000
lock syscalls:   72257679 (72.26%)
unlock calls:    100000000
unlock syscalls: 72265869 (72.27%)

I'm using the futex_lock branch of my futextest test suite to gather results.
The test is performance/futex_lock.c and can be checked out here:

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

At Avi's suggestion, I prepared plots of multiple thread counts, they are
available at the URL below. These plots are all from runs with a period of 1000
instructions, with one plot per each of several duty-cycles.

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v5/1000-period/plots.html

Now that an advantage can be shown using FUTEX_LOCK_ADAPTIVE over FUTEX_LOCK,
the next steps as I see them are:

o Try and show improvement of FUTEX_LOCK_ADAPTIVE over FUTEX_WAIT based
  implementations (pthread_mutex specifically).

o Improve workload definition. The current workload is a cpu-hog. It runs a
  fixed set of instructions in a loop, with some percentage of those being
  contained within the lock/unlock block. Adding sleeps to reduce CPU overhead
  just added so much scheduling overhead that the numbers dropped absurdly for
  both normal and adaptive. I'd appreciate any assistance in preparing a
  test-case for a real-world usage scenario. I will work with some of IBM's
  product teams to do so, and would welcome any input from others with an
  interest in kernel assisted userspace spinlocks.

o Attempt the kernel assisted userspace spinning version proposed by Avi, Alan,
  and Ulrich for comparison.

Thanks,

Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-09  5:15 [PATCH V5 0/4][RFC] futex: " dvhltc
@ 2010-04-09  5:15 ` dvhltc
  0 siblings, 0 replies; 14+ messages in thread
From: dvhltc @ 2010-04-09  5:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason, John Cooper,
	Chris Wright, Ulrich Drepper, Alan Cox, Avi Kivity,
	Peter Zijlstra

From: Darren Hart <dvhltc@us.ibm.com>

Add a non-pi TID value based futex locking mechanism. This enables the
use of adaptive spinning which was problematic with the basic FUTEX_WAIT
operation.

V5: Handle timeout inside adaptive lock spin (needs optimization)
    Fix task_struct ref counting
    Allow CLOCK_RT and default to CLOCK_MONOTONIC for FUTEX_LOCK
    Cleanup futex_lock ret handling (EINTR) cruft
    Remove notion of aggressive adaptive spinning
    Continue spinning if owner changes, only abort spin if owner deschedules
V4: Remove update_futex_waiters() and perform all waiter clearing in
    userspace. Turns out to be generally better to do an O(1) operation in
    userspace that forces one extra syscall than to do an O(N) operation during
    every syscall. Go figure :).
    Cleanup timeout when exiting adaptive loop.
    Perform adaptive spinning after retry:
    General cleanups.
V3: Fix some locking bugs
    Use cmpxchg directly in the adaptive loop
    Move the adaptive loop into its own function
    Use FUTEX_WAIT instead of FUTEX_UNLOCK
    Add update_futex_waiters()
V2: Fix preempt-count breakage

This patch is a forward port of the following patch from Peter Zijlstra
with some bug fixes and a reworking of the futex op codes.

> futex: Implement kernel-side adaptive spinning for futexes
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
>   void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
>                           void *lock, struct thread_info *owner);
>
> Peter Zijlstra's original patch forward ported by Darren Hart.
>

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 include/linux/futex.h |    6 +
 include/linux/sched.h |    1 +
 kernel/futex.c        |  311 +++++++++++++++++++++++++++++++++++++++++++------
 kernel/sched.c        |   66 +++++++++++
 4 files changed, 349 insertions(+), 35 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..3ca93d1 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,8 @@
 #define FUTEX_WAKE_BITSET	10
 #define FUTEX_WAIT_REQUEUE_PI	11
 #define FUTEX_CMP_REQUEUE_PI	12
+#define FUTEX_LOCK		13
+#define FUTEX_LOCK_ADAPTIVE	14
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
@@ -39,6 +41,9 @@
 					 FUTEX_PRIVATE_FLAG)
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PRIVATE		(FUTEX_LOCK | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_ADAPTIVE_PRIVATE	(FUTEX_LOCK_ADAPTIVE | \
+					 FUTEX_PRIVATE_FLAG)
 
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
@@ -177,6 +182,7 @@ union futex_key {
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);
 extern int futex_cmpxchg_enabled;
+extern struct thread_info *futex_owner(u32 __user *uaddr);
 #else
 static inline void exit_robust_list(struct task_struct *curr)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dad7f66..885d659 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -362,6 +362,7 @@ extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
 extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner);
 
 struct nsproxy;
 struct user_namespace;
diff --git a/kernel/futex.c b/kernel/futex.c
index 8c1bb16..2b757b7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -75,6 +75,7 @@ int __read_mostly futex_cmpxchg_enabled;
 #define FLAGS_SHARED		0x01
 #define FLAGS_CLOCKRT		0x02
 #define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_ADAPTIVE		0x08
 
 /*
  * Priority Inheritance state:
@@ -368,6 +369,43 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)
 	return ret ? -EFAULT : 0;
 }
 
+#ifdef CONFIG_SMP
+/**
+ * futex_owner() - Lookup the thread_info struct of ther futex owner
+ * @uaddr: The user address containing the TID of the owner
+ *
+ * If a non-NULL ti is returned, the caller must call 
+ * put_task_struct(ti->task).
+ * */
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+	struct thread_info *ti = NULL;
+	struct task_struct *p;
+	pid_t pid;
+	u32 uval;
+
+	if (get_futex_value_locked(&uval, uaddr))
+		return NULL;
+
+	pid = uval & FUTEX_TID_MASK;
+	rcu_read_lock();
+	p = find_task_by_vpid(pid);
+	if (p) {
+		const struct cred *cred, *pcred;
+
+		cred = current_cred();
+		pcred = __task_cred(p);
+		if (cred->euid == pcred->euid || cred->euid == pcred->uid) {
+			ti = task_thread_info(p);
+			get_task_struct(ti->task);
+		}
+	}
+	rcu_read_unlock();
+
+	return ti;
+}
+#endif
+
 
 /*
  * PI code:
@@ -717,7 +755,7 @@ retry:
  * @hb:			the pi futex hash bucket
  * @key:		the futex key associated with uaddr and hb
  * @ps:			the pi_state pointer where we store the result of the
- *			lookup
+ *			lookup. If non-NULL, hb and key must also be non-NULL.
  * @task:		the task to perform the atomic lock work for.  This will
  *			be "current" except in the case of requeue pi.
  * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
@@ -727,7 +765,8 @@ retry:
  *  1 - acquired the lock
  * <0 - error
  *
- * The hb->lock and futex_key refs shall be held by the caller.
+ * The hb->lock and futex_key refs shall be held by the caller if the pi_state
+ * pointer is non-NULL.
  */
 static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 				union futex_key *key,
@@ -741,37 +780,32 @@ retry:
 	ret = lock_futex_atomic(uaddr, task, set_waiters);
 	if (ret)
 		return ret;
-
-	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
+ 	/*
+	 * We don't have the lock.
+	 * Look up the PI state (or create it if we are the first waiter).
+	 * futex_lock_atomic() calls us with ps==NULL for non-pi atomic futex
+	 * acquisition.
 	 */
-	if (get_futex_value_locked(&uval, uaddr))
-		return -EFAULT;
-	ret = lookup_pi_state(uval, hb, key, ps);
+	if (ps)
+		ret = lookup_pi_state(uval, hb, key, ps);
 
-	if (unlikely(ret)) {
-		switch (ret) {
-		case -ESRCH:
-			/*
-			 * No owner found for this futex. Check if the
-			 * OWNER_DIED bit is set to figure out whether
-			 * this is a robust futex or not.
-			 */
-			if (get_futex_value_locked(&uval, uaddr))
-				return -EFAULT;
+	if (!ps || unlikely(ret) == -ESRCH) {
+		/*
+		 * No owner found for this futex. Check if the
+		 * OWNER_DIED bit is set to figure out whether
+		 * this is a robust futex or not.
+		 */
+		if (get_futex_value_locked(&uval, uaddr))
+			return -EFAULT;
 
-			/*
-			 * We simply start over in case of a robust
-			 * futex. The code above will take the futex
-			 * and return happy.
-			 */
-			if (uval & FUTEX_OWNER_DIED)
-				goto retry;
-		default:
-			break;
-		}
-	}
+		/*
+		 * We simply start over in case of a robust
+		 * futex. The code above will take the futex
+		 * and return happy.
+		 */
+		if (uval & FUTEX_OWNER_DIED)
+			goto retry;
+  	}
 
 	return ret;
 }
@@ -874,13 +908,13 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	return 0;
 }
 
-static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
+static int unlock_futex(u32 __user *uaddr, u32 uval)
 {
 	u32 oldval;
 
 	/*
 	 * There is no waiter, so we unlock the futex. The owner died
-	 * bit has not to be preserved here. We are the owner:
+	 * bit does not need to be preserved here. We are the owner:
 	 */
 	oldval = cmpxchg_futex_value_locked(uaddr, uval, 0);
 
@@ -2112,7 +2146,7 @@ retry:
 	 * No waiters - kernel unlocks the futex:
 	 */
 	if (!(uval & FUTEX_OWNER_DIED)) {
-		ret = unlock_futex_pi(uaddr, uval);
+		ret = unlock_futex(uaddr, uval);
 		if (ret == -EFAULT)
 			goto pi_faulted;
 	}
@@ -2356,6 +2390,205 @@ out:
 	return ret;
 }
 
+/**
+ * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
+ * @uaddr: the futex user address
+ * @timeout: absolute timeout or NULL if none
+ *
+ * Try to acquire a futex lock in a loop until the owner changes or the owner
+ * is descheduled. To lock the futex, set the value to the current TID.
+ *
+ * Returns:
+ *  0 - Gave up, lock not acquired
+ *  1 - Futex lock acquired
+ * <0 - On error
+ */
+static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
+{
+	struct thread_info *owner = NULL;
+	pid_t curtid, ownertid = 0;
+	ktime_t now;
+	int ret = 0;
+	u32 curval;
+
+	curtid = task_pid_vnr(current); 
+
+	for (;;) {
+		/*
+		 * Try to acquire the lock. If we acquire it or error out,
+		 * break to enable preemption and handle ret outside the loop.
+		 */
+		curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
+
+		if (curval == 0) {
+			ret = 1;
+			break;
+		}
+
+		if (unlikely(curval == -EFAULT)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		/*
+		 * Futex locks manage the owner atomically. We are not in
+		 * danger of deadlock by preempting a pending owner. Abort the
+		 * loop if our time slice has expired.  RT Threads can spin
+		 * until they preempt the owner, at which point they will break
+		 * out of the loop anyway.
+		 */
+		if (need_resched())
+			break;
+
+		if (timeout) {
+			now = ktime_get();
+			if (timeout->tv64 < now.tv64)
+				break;
+		}
+
+		cpu_relax();
+
+		if ((curval & FUTEX_TID_MASK) != ownertid) {
+			if (owner)
+				put_task_struct(owner->task);
+			owner = futex_owner(uaddr);
+		}
+		if (owner && !futex_spin_on_owner(uaddr, owner))
+			break;
+	}
+
+	if (owner)
+		put_task_struct(owner->task);
+
+	return ret;
+}
+
+/**
+ * futex_lock() - Acquire the lock and set the futex value
+ * @uaddr:  the futex user address
+ * @flags:  futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
+ * @detect: detect deadlock (1) or not (0)
+ * @time:   absolute timeout
+ *
+ * futex_(un)lock() define a futex value policy and implement a full mutex. The
+ * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
+ * FUTEX_OWNER_DIED as appropriate.
+ *
+ * Userspace tried a 0 -> TID atomic transition of the futex value and failed.
+ * Try to acquire the lock in the kernel, blocking if necessary. Return to
+ * userspace with the lock held and the futex value set accordingly (or after a
+ * timeout).
+ *
+ * Returns:
+ *  0 - On success
+ * <0 - On error
+ */
+static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
+{
+	struct hrtimer_sleeper timeout, *to = NULL;
+	struct futex_hash_bucket *hb;
+	struct futex_q q = FUTEX_Q_INIT;
+	int ret = 0;
+
+	if (refill_pi_state_cache())
+		return -ENOMEM;
+
+	if (time) {
+		to = &timeout;
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires(&to->timer, *time);
+	}
+
+retry:
+#ifdef CONFIG_SMP
+	if (flags & FLAGS_ADAPTIVE) {
+		preempt_disable();
+		ret = trylock_futex_adaptive(uaddr, time);
+		preempt_enable();
+
+		/* We got the lock. */
+		if (ret == 1) {
+			ret = 0;
+			goto out;
+		}
+
+		/* We encountered an error, -EFAULT or -ETIMEDOUT */
+		if (ret)
+			goto out;
+	}
+#endif
+	q.key = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
+	if (unlikely(ret != 0))
+		goto out;
+
+	hb = queue_lock(&q);
+
+	ret = lock_futex_atomic(uaddr, current, 0);
+	if (unlikely(ret)) {
+		switch (ret) {
+		case 1:
+			/* We got the lock. */
+			ret = 0;
+			goto out_unlock_put_key;
+		case -EFAULT:
+			goto uaddr_faulted;
+		default:
+			goto out_unlock_put_key;
+		}
+	}
+	/*
+	 * Only actually queue now that the atomic ops are done:
+	 */
+	futex_wait_queue_me(hb, &q, to);
+
+	ret = unqueue_me(&q);
+
+	/* If we were woken by the owner, try and acquire the lock. */
+	if (!ret)
+		goto retry;
+
+	ret = -ETIMEDOUT;
+	if (to && !to->task)
+		goto out_put_key;
+
+	/*
+	 * We expect signal_pending(current), but we might be the
+	 * victim of a spurious wakeup as well.
+         */
+	ret = -ERESTARTNOINTR;
+	if (!signal_pending(current)) {
+		put_futex_key(&q.key);
+		goto retry;
+	}
+
+	goto out_put_key;
+
+out_unlock_put_key:
+	queue_unlock(&q, hb);
+
+out_put_key:
+	put_futex_key(&q.key);
+out:
+	if (to)
+		destroy_hrtimer_on_stack(&to->timer);
+	return ret;
+
+uaddr_faulted:
+	queue_unlock(&q, hb);
+
+	ret = fault_in_user_writeable(uaddr);
+	if (ret)
+		goto out_put_key;
+
+	put_futex_key(&q.key);
+	goto retry;
+}
+
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
@@ -2579,7 +2812,8 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 
 	if (op & FUTEX_CLOCK_REALTIME) {
 		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+		    cmd != FUTEX_LOCK && cmd != FUTEX_LOCK_ADAPTIVE)
 			return -ENOSYS;
 	}
 
@@ -2623,6 +2857,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	case FUTEX_CMP_REQUEUE_PI:
 		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
 		break;
+	case FUTEX_LOCK_ADAPTIVE:
+		flags |= FLAGS_ADAPTIVE;
+	case FUTEX_LOCK:
+		if (futex_cmpxchg_enabled)
+			ret = futex_lock(uaddr, flags, val, timeout);
+		break;
 	default:
 		ret = -ENOSYS;
 	}
@@ -2641,7 +2881,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
 		      cmd == FUTEX_WAIT_BITSET ||
-		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
+		      cmd == FUTEX_WAIT_REQUEUE_PI ||
+		      cmd == FUTEX_LOCK || cmd == FUTEX_LOCK_ADAPTIVE)) {
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
diff --git a/kernel/sched.c b/kernel/sched.c
index 9ab3cd7..4bb519e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3818,6 +3818,72 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
 out:
 	return 1;
 }
+
+#ifdef CONFIG_FUTEX
+#include <linux/futex.h>
+
+int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
+{
+	u32 ownertid, uval;
+	unsigned int cpu;
+	struct rq *rq;
+
+	if (!sched_feat(OWNER_SPIN))
+		return 0;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/*
+	 * Need to access the cpu field knowing that
+	 * DEBUG_PAGEALLOC could have unmapped it if
+	 * the mutex owner just released it and exited.
+	 */
+	if (probe_kernel_address(&owner->cpu, cpu))
+		goto out;
+#else
+	cpu = owner->cpu;
+#endif
+
+	/*
+	 * Even if the access succeeded (likely case),
+	 * the cpu field may no longer be valid.
+	 */
+	if (cpu >= nr_cpumask_bits)
+		goto out;
+
+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;
+
+	rq = cpu_rq(cpu);
+
+	if (get_user(ownertid, uaddr))
+		return -EFAULT;
+	ownertid &= FUTEX_TID_MASK;
+
+	for (;;) {
+		/*
+		 * Owner changed, break to re-assess state.
+		 */
+		if (get_user(uval, uaddr))
+			return -EFAULT;
+		if ((uval & FUTEX_TID_MASK) != ownertid)
+			break;
+
+		/*
+		 * Is that owner really running on that cpu?
+		 */
+		if (task_thread_info(rq->curr) != owner || need_resched())
+			return 0;
+
+		cpu_relax();
+	}
+out:
+	return 1;
+}
+#endif
 #endif
 
 #ifdef CONFIG_PREEMPT
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-05-07 19:12 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-06  6:24 [PATCH V6 0/4][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
2010-05-06  6:24 ` [PATCH 1/4] futex: replace fshared and clockrt with combined flags Darren Hart
2010-05-06  6:24 ` [PATCH 2/4] futex: add futex_q static initializer Darren Hart
2010-05-06  6:24 ` [PATCH 3/4] futex: refactor futex_lock_pi_atomic Darren Hart
2010-05-06  6:24 ` [PATCH 4/4] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
2010-05-07 16:20   ` Thomas Gleixner
2010-05-07 16:24     ` Peter Zijlstra
2010-05-07 16:30       ` Thomas Gleixner
2010-05-07 16:35         ` Peter Zijlstra
2010-05-07 16:43           ` Thomas Gleixner
2010-05-07 19:05             ` Darren Hart
2010-05-07 16:52     ` Darren Hart
2010-05-07 19:11     ` Darren Hart
  -- strict thread matches above, loose matches on Subject: below --
2010-04-09  5:15 [PATCH V5 0/4][RFC] futex: " dvhltc
2010-04-09  5:15 ` [PATCH 4/4] futex: Add " dvhltc

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox