All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] futex: Wakeup optimizations
@ 2013-12-19 20:45 Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 1/4] futex: Misc cleanups Davidlohr Bueso
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr

Changes from v3:
 - Fix SOB tags.

Changes from v2 [http://lwn.net/Articles/575449/]:
 - Reordered SOB tags to reflect me as primary author.

 - Improved ordering guarantee comments for patch 4.

 - Rebased patch 4 against Linus' tree (this patch didn't
   apply after the recent futex changes/fixes).

Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
 - Removed patch "futex: Check for pi futex_q only once".

 - Cleaned up ifdefs for larger hash table.

 - Added a doc patch from tglx that describes the futex 
   ordering guarantees.

 - Improved the lockless plist check for the wake calls.
   Based on the community feedback, the necessary abstractions
   and barriers are added to maintain ordering guarantees.
   Code documentation is also updated.

 - Removed patch "sched,futex: Provide delayed wakeup list".
   Based on feedback from PeterZ, I will look into this as
   a separate issue once the other patches are settled.
 

We have been dealing with a customer database workload on large
12Tb, 240 core 16 socket NUMA system that exhibits high amounts 
of contention on some of the locks that serialize internal futex 
data structures. This workload specially suffers in the wakeup 
paths, where waiting on the corresponding hb->lock can account for 
up to ~60% of the time. The result of such calls can mostly be 
classified as (i) nothing to wake up and (ii) wakeup large amount 
of tasks.

Before these patches are applied, we can see this pathological behavior:

 37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--97.14%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.70%-- 0x7f383fbdea1f
             |          |           yyy

 43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
            --- _raw_spin_lock
             |
             |--53.74%-- futex_wake
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          |
             |          |--99.40%-- 0x7fe7d44a4c05
             |          |           zzz
             |--45.90%-- futex_wait_setup
             |          futex_wait
             |          do_futex
             |          sys_futex
             |          system_call_fastpath
             |          0x7fe7ba315789
             |          syscall


With these patches, contention is practically non existent:

 0.10%     49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
               --- _raw_spin_lock
                |
                |--76.06%-- futex_wait_setup
                |          futex_wait
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--99.90%-- 0x7f3165e63789
                |          |          syscall|
                           ...
                |--6.27%-- futex_wake
                |          do_futex
                |          sys_futex
                |          system_call_fastpath
                |          |
                |          |--54.56%-- 0x7f317fff2c05
                ...

Patch 1 is a cleanup.

Patch 2 addresses the well known issue of the global hash table.
By creating a larger and NUMA aware table, we can reduce the false
sharing and collisions, thus reducing the chance of different futexes 
using hb->lock.

Patch 3 documents the futex ordering guarantees.

Patch 4 reduces contention on the corresponding hb->lock by not trying to
acquire it if there are no blocked tasks in the waitqueue.
This particularly deals with point (i) above, where we see that it is not
uncommon for up to 90% of wakeup calls end up returning 0, indicating that no
tasks were woken.

This patchset has also been tested on smaller systems for a variety of
benchmarks, including java workloads, kernel builds and custom bang-the-hell-out-of
hb locks programs. So far, no functional or performance regressions have been seen.
Furthermore, no issues were found when running the different tests in the futextest 
suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/

This patchset applies on top of Linus' tree as of v3.13-rc4 (f7556698).

Special thanks to Scott Norton, Tom Vanden, Mark Ray and Aswin Chandramouleeswaran
for help presenting, debugging and analyzing the data.

  futex: Misc cleanups
  futex: Larger hash table
  futex: Document ordering guarantees
  futex: Avoid taking hb lock if nothing to wakeup

 kernel/futex.c | 236 +++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 196 insertions(+), 40 deletions(-)

-- 
1.8.1.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/4] futex: Misc cleanups
  2013-12-19 20:45 [PATCH v4 0/4] futex: Wakeup optimizations Davidlohr Bueso
@ 2013-12-19 20:45 ` Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 2/4] futex: Larger hash table Davidlohr Bueso
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr, Ingo Molnar

From: Jason Low <jason.low2@hp.com>

- Remove unnecessary head variables.
- Delete unused parameter in queue_unlock().

Cc: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/futex.c | 39 ++++++++++++---------------------------
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f6ff019..085f5fa 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -598,13 +598,10 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 {
 	struct futex_pi_state *pi_state = NULL;
 	struct futex_q *this, *next;
-	struct plist_head *head;
 	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
 
-	head = &hb->chain;
-
-	plist_for_each_entry_safe(this, next, head, list) {
+	plist_for_each_entry_safe(this, next, &hb->chain, list) {
 		if (match_futex(&this->key, key)) {
 			/*
 			 * Another waiter already exists - bump up
@@ -986,7 +983,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
-	struct plist_head *head;
 	union futex_key key = FUTEX_KEY_INIT;
 	int ret;
 
@@ -999,9 +995,8 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 
 	hb = hash_futex(&key);
 	spin_lock(&hb->lock);
-	head = &hb->chain;
 
-	plist_for_each_entry_safe(this, next, head, list) {
+	plist_for_each_entry_safe(this, next, &hb->chain, list) {
 		if (match_futex (&this->key, &key)) {
 			if (this->pi_state || this->rt_waiter) {
 				ret = -EINVAL;
@@ -1034,7 +1029,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb1, *hb2;
-	struct plist_head *head;
 	struct futex_q *this, *next;
 	int ret, op_ret;
 
@@ -1082,9 +1076,7 @@ retry_private:
 		goto retry;
 	}
 
-	head = &hb1->chain;
-
-	plist_for_each_entry_safe(this, next, head, list) {
+	plist_for_each_entry_safe(this, next, &hb1->chain, list) {
 		if (match_futex (&this->key, &key1)) {
 			if (this->pi_state || this->rt_waiter) {
 				ret = -EINVAL;
@@ -1097,10 +1089,8 @@ retry_private:
 	}
 
 	if (op_ret > 0) {
-		head = &hb2->chain;
-
 		op_ret = 0;
-		plist_for_each_entry_safe(this, next, head, list) {
+		plist_for_each_entry_safe(this, next, &hb2->chain, list) {
 			if (match_futex (&this->key, &key2)) {
 				if (this->pi_state || this->rt_waiter) {
 					ret = -EINVAL;
@@ -1270,7 +1260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
 	int drop_count = 0, task_count = 0, ret;
 	struct futex_pi_state *pi_state = NULL;
 	struct futex_hash_bucket *hb1, *hb2;
-	struct plist_head *head1;
 	struct futex_q *this, *next;
 	u32 curval2;
 
@@ -1393,8 +1382,7 @@ retry_private:
 		}
 	}
 
-	head1 = &hb1->chain;
-	plist_for_each_entry_safe(this, next, head1, list) {
+	plist_for_each_entry_safe(this, next, &hb1->chain, list) {
 		if (task_count - nr_wake >= nr_requeue)
 			break;
 
@@ -1494,7 +1482,7 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
 }
 
 static inline void
-queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
+queue_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
@@ -1867,7 +1855,7 @@ retry_private:
 	ret = get_futex_value_locked(&uval, uaddr);
 
 	if (ret) {
-		queue_unlock(q, *hb);
+		queue_unlock(*hb);
 
 		ret = get_user(uval, uaddr);
 		if (ret)
@@ -1881,7 +1869,7 @@ retry_private:
 	}
 
 	if (uval != val) {
-		queue_unlock(q, *hb);
+		queue_unlock(*hb);
 		ret = -EWOULDBLOCK;
 	}
 
@@ -2029,7 +2017,7 @@ retry_private:
 			 * Task is exiting and we just wait for the
 			 * exit to complete.
 			 */
-			queue_unlock(&q, hb);
+			queue_unlock(hb);
 			put_futex_key(&q.key);
 			cond_resched();
 			goto retry;
@@ -2081,7 +2069,7 @@ retry_private:
 	goto out_put_key;
 
 out_unlock_put_key:
-	queue_unlock(&q, hb);
+	queue_unlock(hb);
 
 out_put_key:
 	put_futex_key(&q.key);
@@ -2091,7 +2079,7 @@ out:
 	return ret != -EINTR ? ret : -ERESTARTNOINTR;
 
 uaddr_faulted:
-	queue_unlock(&q, hb);
+	queue_unlock(hb);
 
 	ret = fault_in_user_writeable(uaddr);
 	if (ret)
@@ -2113,7 +2101,6 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
-	struct plist_head *head;
 	union futex_key key = FUTEX_KEY_INIT;
 	u32 uval, vpid = task_pid_vnr(current);
 	int ret;
@@ -2153,9 +2140,7 @@ retry:
 	 * Ok, other tasks may need to be woken up - check waiters
 	 * and do the wakeup if necessary:
 	 */
-	head = &hb->chain;
-
-	plist_for_each_entry_safe(this, next, head, list) {
+	plist_for_each_entry_safe(this, next, &hb->chain, list) {
 		if (!match_futex (&this->key, &key))
 			continue;
 		ret = wake_futex_pi(uaddr, uval, this);
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/4] futex: Larger hash table
  2013-12-19 20:45 [PATCH v4 0/4] futex: Wakeup optimizations Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 1/4] futex: Misc cleanups Davidlohr Bueso
@ 2013-12-19 20:45 ` Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 3/4] futex: Document ordering guarantees Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
  3 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr, Ingo Molnar

From: Davidlohr Bueso <davidlohr@hp.com>

Currently, the futex global hash table suffers from it's fixed, smallish
(for today's standards) size of 256 entries, as well as its lack of NUMA
awareness. Large systems, using many futexes, can be prone to high amounts
of collisions; where these futexes hash to the same bucket and lead to
extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
reality when we have multiple hb->locks residing on the same cacheline and
different futexes hash to adjacent buckets.

This patch keeps the current static size of 16 entries for small systems,
or otherwise, 256 * ncpus (or larger as we need to round the number to a
power of 2). Note that this number of CPUs accounts for all CPUs that can
ever be available in the system, taking into consideration things like
hotpluging. While we do impose extra overhead at bootup by making the hash
table larger, this is a one time thing, and does not shadow the benefits
of this patch.

Furthermore, as suggested by tglx, by cache aligning the hash buckets we can
avoid access across cacheline boundaries and also avoid massive cache line
bouncing if multiple cpus are hammering away at different hash buckets which
happen to reside in the same cache line.

Also, similar to other core kernel components (pid, dcache, tcp), by using
alloc_large_system_hash() we benefit from its NUMA awareness and thus the
table is distributed among the nodes instead of in a single one.

For a custom microbenchmark that pounds on the uaddr hashing -- making the wait
path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of
futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server:

+---------+--------------------+------------------------+-----------------------+-------------------------------+
| threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
+---------+--------------------+------------------------+-----------------------+-------------------------------+
|     512 |		 32426 | 50531  (+55.8%)	| 255274  (+687.2%)	| 292553  (+802.2%)		|
|     256 |		 65360 | 99588  (+52.3%)	| 443563  (+578.6%)	| 508088  (+677.3%)		|
|     128 |		125635 | 200075 (+59.2%)	| 742613  (+491.1%)	| 835452  (+564.9%)		|
|      80 |		193559 | 323425 (+67.1%)	| 1028147 (+431.1%)	| 1130304 (+483.9%)		|
|      64 |		247667 | 443740 (+79.1%)	| 997300  (+302.6%)	| 1145494 (+362.5%)		|
|      32 |		628412 | 721401 (+14.7%)	| 965996  (+53.7%)	| 1122115 (+78.5%)		|
+---------+--------------------+------------------------+-----------------------+-------------------------------+

Cc: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Reviewed-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-and-tested-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/futex.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 085f5fa..577481d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -63,6 +63,7 @@
 #include <linux/sched/rt.h>
 #include <linux/hugetlb.h>
 #include <linux/freezer.h>
+#include <linux/bootmem.h>
 
 #include <asm/futex.h>
 
@@ -70,8 +71,6 @@
 
 int __read_mostly futex_cmpxchg_enabled;
 
-#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
-
 /*
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
@@ -149,9 +148,11 @@ static const struct futex_q futex_q_init = {
 struct futex_hash_bucket {
 	spinlock_t lock;
 	struct plist_head chain;
-};
+} ____cacheline_aligned_in_smp;
 
-static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
+static unsigned long __read_mostly futex_hashsize;
+
+static struct futex_hash_bucket *futex_queues;
 
 /*
  * We hash on the keys returned from get_futex_key (see below).
@@ -161,7 +162,7 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
 	u32 hash = jhash2((u32*)&key->both.word,
 			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
 			  key->both.offset);
-	return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
+	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
 /*
@@ -2719,7 +2720,18 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 static int __init futex_init(void)
 {
 	u32 curval;
-	int i;
+	unsigned long i;
+
+#if CONFIG_BASE_SMALL
+	futex_hashsize = 16;
+#else
+	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
+#endif
+
+	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
+					       futex_hashsize, 0,
+					       futex_hashsize < 256 ? HASH_SMALL : 0,
+					       NULL, NULL, futex_hashsize, futex_hashsize);
 
 	/*
 	 * This will fail and we want it. Some arch implementations do
@@ -2734,7 +2746,7 @@ static int __init futex_init(void)
 	if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
 		futex_cmpxchg_enabled = 1;
 
-	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
+	for (i = 0; i < futex_hashsize; i++) {
 		plist_head_init(&futex_queues[i].chain);
 		spin_lock_init(&futex_queues[i].lock);
 	}
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 20:45 [PATCH v4 0/4] futex: Wakeup optimizations Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 1/4] futex: Misc cleanups Davidlohr Bueso
  2013-12-19 20:45 ` [PATCH v4 2/4] futex: Larger hash table Davidlohr Bueso
@ 2013-12-19 20:45 ` Davidlohr Bueso
  2013-12-19 20:51   ` Randy Dunlap
  2013-12-19 20:45 ` [PATCH v4 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso
  3 siblings, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr, Ingo Molnar

From: Thomas Gleixner <tglx@linutronix.de>

That's essential, if you want to hack on futexes.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 577481d..af1fc31 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -69,6 +69,63 @@
 
 #include "locking/rtmutex_common.h"
 
+/*
+ * Basic futex operation and ordering guarantees:
+ *
+ * The waiter reads the futex value in user space and calls
+ * futex_wait(). It computes the hash bucket and acquires the hash
+ * bucket lock. After that it reads the futex user space value again
+ * and verifies that the data has not changed. If it has not changed
+ * it enqueues itself into the hash bucket, releases the hash
+ * bucket lock and schedules.
+ *
+ * The waker side modifies the user space value of the futex and calls
+ * futex_wake(). It computes the hash bucket and acquires the hash
+ * bucket lock. Then it looks for waiters on that futex in the hash
+ * bucket and wakes them.
+ *
+ * Note that the spin_lock serializes waiters and wakers, so that the
+ * following scenario is avoided:
+ *
+ * CPU 0                               CPU 1
+ * val = *futex;
+ * sys_futex(WAIT, futex, val);
+ *   futex_wait(futex, val);
+ *   uval = *futex;
+ *                                     *futex = newval;
+ *                                     sys_futex(WAKE, futex);
+ *                                       futex_wake(futex);
+ *                                       if (queue_empty())
+ *                                         return;
+ *   if (uval == val)
+ *      lock(hash_bucket(futex));
+ *      queue();
+ *     unlock(hash_bucket(futex));
+ *     schedule();
+ *
+ * This would cause the waiter on CPU 0 to wait forever because it
+ * missed the transition of the user space value from val to newval
+ * and the waker did not find the waiter in the hash bucket queue.
+ * The spinlock serializes that:
+ *
+ * CPU 0                               CPU 1
+ * val = *futex;
+ * sys_futex(WAIT, futex, val);
+ *   futex_wait(futex, val);
+ *   lock(hash_bucket(futex));
+ *   uval = *futex;
+ *                                     *futex = newval;
+ *                                     sys_futex(WAKE, futex);
+ *                                       futex_wake(futex);
+ *                                       lock(hash_bucket(futex));
+ *   if (uval == val)
+ *      queue();
+ *     unlock(hash_bucket(futex));
+ *     schedule();                       if (!queue_empty())
+ *                                         wake_waiters(futex);
+ *                                       unlock(hash_bucket(futex));
+ */
+
 int __read_mostly futex_cmpxchg_enabled;
 
 /*
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 4/4] futex: Avoid taking hb lock if nothing to wakeup
  2013-12-19 20:45 [PATCH v4 0/4] futex: Wakeup optimizations Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2013-12-19 20:45 ` [PATCH v4 3/4] futex: Document ordering guarantees Davidlohr Bueso
@ 2013-12-19 20:45 ` Davidlohr Bueso
  3 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 20:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	davidlohr, Ingo Molnar

From: Davidlohr Bueso <davidloh@hp.com>

In futex_wake() there is clearly no point in taking the hb->lock if
we know beforehand that there are no tasks to be woken. This comes
at the smaller cost of doing some atomic operations to keep track of
the list's size. Specifically, increment the counter when an element is
added to the list, and decrement when it is removed. Of course, if the
counter is 0, then there are no tasks blocked on a futex. Some special
considerations:

- increment the counter at queue_lock() as we always end up calling
  queue_me() which adds the element to the list. Upon any error,
  queue_unlock() is called for housekeeping, for which we decrement
  to mach the increment done in queue_lock().

- decrement the counter at __unqueue_me() to reflect when an element is
  removed from the queue for wakeup related purposes.

We make sure that the futex ordering guarantees are preserved, ensuring
that waiters either observes the changed user space value before blocking or
is woken by a concurrent waker. This is done by relying on the barriers in
atomic_inc() -- for archs that do have implict mb in atomic_inc() we
explicitly add them through a set of new functions that are introduced:
futex_get_mm(), hb_waiters_inc() and hb_waiters_dec(). For more details
please refer to the updated comments in the code and related discussion:
https://lkml.org/lkml/2013/11/26/556

Furthermore, the overhead of new barriers is trivial. The following are some
results, from a quad-core x86_64 laptop, measuring the latency of nthread
wakeups (1 at a time), for 1000 runs:

+----------+-----------------------------+------------------------------+
| nthreads | baseline time (ms) [stddev] | atomicops time (ms) [stddev] |
+----------+-----------------------------+------------------------------+
|      512 | 2.8360 [0.5168]             | 3.8150 [1.3293]              |
|      256 | 2.5080 [0.6375]             | 2.5980 [0.9079]              |
|      128 | 1.0200 [0.4264]             | 1.5180 [0.4902]              |
|       64 | 0.7890 [0.2667]             | 0.4020 [0.2447]              |
|       32 | 0.1150 [0.0184]             | 0.1490 [0.1156]              |
+----------+-----------------------------+------------------------------+

Special thanks to tglx for careful review and feedback.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Reviewed-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-and-tested-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
---
 kernel/futex.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 122 insertions(+), 20 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index af1fc31..201af6b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -84,8 +84,11 @@
  * bucket lock. Then it looks for waiters on that futex in the hash
  * bucket and wakes them.
  *
- * Note that the spin_lock serializes waiters and wakers, so that the
- * following scenario is avoided:
+ * In scenarios where wakeups are called and no tasks are blocked on a futex,
+ * taking the hb spinlock can be avoided and simply return. In order for this
+ * optimization to work, ordering guarantees must exist so that the waiter
+ * being added to the list is acknowledged when the list is concurrently being
+ * checked by the waker, avoiding scenarios like the following:
  *
  * CPU 0                               CPU 1
  * val = *futex;
@@ -106,24 +109,49 @@
  * This would cause the waiter on CPU 0 to wait forever because it
  * missed the transition of the user space value from val to newval
  * and the waker did not find the waiter in the hash bucket queue.
- * The spinlock serializes that:
+ *
+ * The correct serialization ensures that a waiter either observes
+ * the changed user space value before blocking or is woken by a
+ * concurrent waker:
  *
  * CPU 0                               CPU 1
  * val = *futex;
  * sys_futex(WAIT, futex, val);
  *   futex_wait(futex, val);
- *   lock(hash_bucket(futex));
- *   uval = *futex;
- *                                     *futex = newval;
- *                                     sys_futex(WAKE, futex);
- *                                       futex_wake(futex);
- *                                       lock(hash_bucket(futex));
+ *
+ *   waiters++;
+ *   mb(); (A) <-- paired with -.
+ *                              |
+ *   lock(hash_bucket(futex));  |
+ *                              |
+ *   uval = *futex;             |
+ *                              |        *futex = newval;
+ *                              |        sys_futex(WAKE, futex);
+ *                              |          futex_wake(futex);
+ *                              |
+ *                              `------->   mb(); (B)
  *   if (uval == val)
- *      queue();
+ *     queue();
  *     unlock(hash_bucket(futex));
- *     schedule();                       if (!queue_empty())
- *                                         wake_waiters(futex);
- *                                       unlock(hash_bucket(futex));
+ *     schedule();                         if (waiters)
+ *                                           lock(hash_bucket(futex));
+ *                                           wake_waiters(futex);
+ *                                           unlock(hash_bucket(futex));
+ *
+ * Where (A) orders the waiters increment and the futex value read; and
+ * where (B) orders the write to futex and the waiters read.
+ *
+ * This yields the following case (where X:=waiters, Y:=futex):
+ *
+ *	X = Y = 0
+ *
+ *	w[X]=1		w[Y]=1
+ *	MB		MB
+ *	r[Y]=y		r[X]=x
+ *
+ * Which guarantees that x==0 && y==0 is impossible; which translates back into
+ * the guarantee that we cannot both miss the futex variable change and the
+ * enqueue.
  */
 
 int __read_mostly futex_cmpxchg_enabled;
@@ -203,6 +231,7 @@ static const struct futex_q futex_q_init = {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
+	atomic_t waiters;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
@@ -211,6 +240,53 @@ static unsigned long __read_mostly futex_hashsize;
 
 static struct futex_hash_bucket *futex_queues;
 
+static inline void futex_get_mm(union futex_key *key)
+{
+	atomic_inc(&key->private.mm->mm_count);
+#ifdef CONFIG_SMP
+	/*
+	 * Ensure futex_get_mm() implies a full barrier such that
+	 * get_futex_key() implies a full barrier. This is relied upon
+	 * as full barrier (B), see the ordering comment above.
+	 */
+	smp_mb__after_atomic_inc();
+#endif
+}
+
+/*
+ * Reflects a new waiter being added to the waitqueue.
+ */
+static inline void hb_waiters_inc(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	atomic_inc(&hb->waiters);
+	/*
+	 * Full barrier (A), see the ordering comment above.
+	 */
+	smp_mb__after_atomic_inc();
+#endif
+}
+
+/*
+ * Reflects a waiter being removed from the waitqueue by wakeup
+ * paths.
+ */
+static inline void hb_waiters_dec(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	atomic_dec(&hb->waiters);
+#endif
+}
+
+static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	return atomic_read(&hb->waiters);
+#else
+	return 1;
+#endif
+}
+
 /*
  * We hash on the keys returned from get_futex_key (see below).
  */
@@ -237,6 +313,8 @@ static inline int match_futex(union futex_key *key1, union futex_key *key2)
  * Take a reference to the resource addressed by a key.
  * Can be called while holding spinlocks.
  *
+ * Implies a full memory barrier; relied upon as (B), see the comment above
+ * about ordering.
  */
 static void get_futex_key_refs(union futex_key *key)
 {
@@ -245,10 +323,10 @@ static void get_futex_key_refs(union futex_key *key)
 
 	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
 	case FUT_OFF_INODE:
-		ihold(key->shared.inode);
+		ihold(key->shared.inode); /* implies MB */
 		break;
 	case FUT_OFF_MMSHARED:
-		atomic_inc(&key->private.mm->mm_count);
+		futex_get_mm(key); /* implies MB */
 		break;
 	}
 }
@@ -292,6 +370,8 @@ static void drop_futex_key_refs(union futex_key *key)
  * We can usually work out the index without swapping in the page.
  *
  * lock_page() might sleep, the caller should not hold a spinlock.
+ *
+ * Implies a full memory barrier (B).
  */
 static int
 get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
@@ -322,7 +402,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	if (!fshared) {
 		key->private.mm = mm;
 		key->private.address = address;
-		get_futex_key_refs(key);
+		get_futex_key_refs(key); /* implies MB (B) */
 		return 0;
 	}
 
@@ -429,7 +509,7 @@ again:
 		key->shared.pgoff = basepage_index(page);
 	}
 
-	get_futex_key_refs(key);
+	get_futex_key_refs(key); /* implies MB (B) */
 
 out:
 	unlock_page(page_head);
@@ -893,6 +973,7 @@ static void __unqueue_futex(struct futex_q *q)
 
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
 	plist_del(&q->list, &hb->chain);
+	hb_waiters_dec(hb);
 }
 
 /*
@@ -1052,6 +1133,11 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 		goto out;
 
 	hb = hash_futex(&key);
+
+	/* Make sure we really have tasks to wakeup */
+	if (!hb_waiters_pending(hb))
+		goto out_put_key;
+
 	spin_lock(&hb->lock);
 
 	plist_for_each_entry_safe(this, next, &hb->chain, list) {
@@ -1072,6 +1158,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	}
 
 	spin_unlock(&hb->lock);
+out_put_key:
 	put_futex_key(&key);
 out:
 	return ret;
@@ -1190,7 +1277,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	 */
 	if (likely(&hb1->chain != &hb2->chain)) {
 		plist_del(&q->list, &hb1->chain);
+		hb_waiters_dec(hb1);
 		plist_add(&q->list, &hb2->chain);
+		hb_waiters_inc(hb2);
 		q->lock_ptr = &hb2->lock;
 	}
 	get_futex_key_refs(key2);
@@ -1533,17 +1622,28 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
 	struct futex_hash_bucket *hb;
 
 	hb = hash_futex(&q->key);
+
+	/*
+	 * Increment the counter before taking the lock so that
+	 * a potential waker won't miss a to-be-slept task that is
+	 * waiting for the spinlock. This is safe as all queue_lock()
+	 * users end up calling queue_me(). Similarly, for housekeeping,
+	 * decrement the counter at queue_unlock() when some error has
+	 * occurred and we don't end up adding the task to the list.
+	 */
+	hb_waiters_inc(hb);
+
 	q->lock_ptr = &hb->lock;
 
 	spin_lock(&hb->lock);
 	return hb;
 }
 
-static inline void
-queue_unlock(struct futex_hash_bucket *hb)
+static inline void queue_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
+	hb_waiters_dec(hb);
 }
 
 /**
@@ -2275,6 +2375,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		 * Unqueue the futex_q and determine which it was.
 		 */
 		plist_del(&q->list, &hb->chain);
+		hb_waiters_dec(hb);
 
 		/* Handle spurious wakeups gracefully */
 		ret = -EWOULDBLOCK;
@@ -2290,7 +2391,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
  * @uaddr:	the futex we initially wait on (non-pi)
  * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
- * 		the same type, no requeueing from private to shared, etc.
+ *		the same type, no requeueing from private to shared, etc.
  * @val:	the expected value of uaddr
  * @abs_time:	absolute timeout
  * @bitset:	32 bit wakeup bitset set by userspace, defaults to all
@@ -2804,6 +2905,7 @@ static int __init futex_init(void)
 		futex_cmpxchg_enabled = 1;
 
 	for (i = 0; i < futex_hashsize; i++) {
+		atomic_set(&futex_queues[i].waiters, 0);
 		plist_head_init(&futex_queues[i].chain);
 		spin_lock_init(&futex_queues[i].lock);
 	}
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 20:45 ` [PATCH v4 3/4] futex: Document ordering guarantees Davidlohr Bueso
@ 2013-12-19 20:51   ` Randy Dunlap
  2013-12-19 21:00     ` Davidlohr Bueso
  0 siblings, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2013-12-19 20:51 UTC (permalink / raw)
  To: Davidlohr Bueso, linux-kernel
  Cc: mingo, dvhart, peterz, tglx, paulmck, efault, jeffm, torvalds,
	jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	Ingo Molnar

On 12/19/13 12:45, Davidlohr Bueso wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> That's essential, if you want to hack on futexes.
> 
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Darren Hart <dvhart@linux.intel.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Jeff Mahoney <jeffm@suse.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Scott Norton <scott.norton@hp.com>
> Cc: Tom Vaden <tom.vaden@hp.com>
> Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> Cc: Waiman Long <Waiman.Long@hp.com>
> Cc: Jason Low <jason.low2@hp.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> ---
>  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 577481d..af1fc31 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -69,6 +69,63 @@
>  
>  #include "locking/rtmutex_common.h"
>  
> +/*
> + * Basic futex operation and ordering guarantees:
> + *
> + * The waiter reads the futex value in user space and calls
> + * futex_wait(). It computes the hash bucket and acquires the hash

does                It
refer to "the waiter" or to futex_wait()?
I read it as referring to the waiter, but ISTM that the comments are using It
to refer to futex_wait()... ???

Thanks.

> + * bucket lock. After that it reads the futex user space value again
> + * and verifies that the data has not changed. If it has not changed
> + * it enqueues itself into the hash bucket, releases the hash
> + * bucket lock and schedules.
> + *
> + * The waker side modifies the user space value of the futex and calls
> + * futex_wake(). It computes the hash bucket and acquires the hash
> + * bucket lock. Then it looks for waiters on that futex in the hash
> + * bucket and wakes them.
> + *
> + * Note that the spin_lock serializes waiters and wakers, so that the
> + * following scenario is avoided:
> + *

-- 
~Randy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 20:51   ` Randy Dunlap
@ 2013-12-19 21:00     ` Davidlohr Bueso
  2013-12-19 21:07       ` Darren Hart
  0 siblings, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-19 21:00 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, mingo, dvhart, peterz, tglx, paulmck, efault, jeffm,
	torvalds, jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	Ingo Molnar

On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
> On 12/19/13 12:45, Davidlohr Bueso wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > That's essential, if you want to hack on futexes.
> > 
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Darren Hart <dvhart@linux.intel.com>
> > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Cc: Jeff Mahoney <jeffm@suse.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Scott Norton <scott.norton@hp.com>
> > Cc: Tom Vaden <tom.vaden@hp.com>
> > Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> > Cc: Waiman Long <Waiman.Long@hp.com>
> > Cc: Jason Low <jason.low2@hp.com>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > ---
> >  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 57 insertions(+)
> > 
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 577481d..af1fc31 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -69,6 +69,63 @@
> >  
> >  #include "locking/rtmutex_common.h"
> >  
> > +/*
> > + * Basic futex operation and ordering guarantees:
> > + *
> > + * The waiter reads the futex value in user space and calls
> > + * futex_wait(). It computes the hash bucket and acquires the hash
> 
> does                It
> refer to "the waiter" or to futex_wait()?
> I read it as referring to the waiter, but ISTM that the comments are using It
> to refer to futex_wait()... ???
> 

Yes, it refers to futex_wait(), which IMO in a futex context is pretty
clear.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 21:00     ` Davidlohr Bueso
@ 2013-12-19 21:07       ` Darren Hart
  2013-12-19 23:05         ` Randy Dunlap
  0 siblings, 1 reply; 12+ messages in thread
From: Darren Hart @ 2013-12-19 21:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Randy Dunlap, linux-kernel, mingo, peterz, tglx, paulmck, efault,
	jeffm, torvalds, jason.low2, Waiman.Long, tom.vaden, scott.norton,
	aswin, Ingo Molnar

On Thu, 2013-12-19 at 13:00 -0800, Davidlohr Bueso wrote:
> On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
> > On 12/19/13 12:45, Davidlohr Bueso wrote:
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > That's essential, if you want to hack on futexes.
> > > 
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Darren Hart <dvhart@linux.intel.com>
> > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: Mike Galbraith <efault@gmx.de>
> > > Cc: Jeff Mahoney <jeffm@suse.com>
> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > Cc: Scott Norton <scott.norton@hp.com>
> > > Cc: Tom Vaden <tom.vaden@hp.com>
> > > Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> > > Cc: Waiman Long <Waiman.Long@hp.com>
> > > Cc: Jason Low <jason.low2@hp.com>
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > > Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > > ---
> > >  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 57 insertions(+)
> > > 
> > > diff --git a/kernel/futex.c b/kernel/futex.c
> > > index 577481d..af1fc31 100644
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -69,6 +69,63 @@
> > >  
> > >  #include "locking/rtmutex_common.h"
> > >  
> > > +/*
> > > + * Basic futex operation and ordering guarantees:
> > > + *
> > > + * The waiter reads the futex value in user space and calls
> > > + * futex_wait(). It computes the hash bucket and acquires the hash
> > 
> > does                It
> > refer to "the waiter" or to futex_wait()?
> > I read it as referring to the waiter, but ISTM that the comments are using It
> > to refer to futex_wait()... ???
> > 
> 
> Yes, it refers to futex_wait(), which IMO in a futex context is pretty
> clear.


Agreed.


-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 21:07       ` Darren Hart
@ 2013-12-19 23:05         ` Randy Dunlap
  2013-12-19 23:19           ` Darren Hart
  0 siblings, 1 reply; 12+ messages in thread
From: Randy Dunlap @ 2013-12-19 23:05 UTC (permalink / raw)
  To: Darren Hart, Davidlohr Bueso
  Cc: linux-kernel, mingo, peterz, tglx, paulmck, efault, jeffm,
	torvalds, jason.low2, Waiman.Long, tom.vaden, scott.norton, aswin,
	Ingo Molnar

On 12/19/13 13:07, Darren Hart wrote:
> On Thu, 2013-12-19 at 13:00 -0800, Davidlohr Bueso wrote:
>> On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
>>> On 12/19/13 12:45, Davidlohr Bueso wrote:
>>>> From: Thomas Gleixner <tglx@linutronix.de>
>>>>
>>>> That's essential, if you want to hack on futexes.
>>>>
>>>> Cc: Ingo Molnar <mingo@kernel.org>
>>>> Cc: Darren Hart <dvhart@linux.intel.com>
>>>> Acked-by: Peter Zijlstra <peterz@infradead.org>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>> Cc: Mike Galbraith <efault@gmx.de>
>>>> Cc: Jeff Mahoney <jeffm@suse.com>
>>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>>>> Cc: Scott Norton <scott.norton@hp.com>
>>>> Cc: Tom Vaden <tom.vaden@hp.com>
>>>> Cc: Aswin Chandramouleeswaran <aswin@hp.com>
>>>> Cc: Waiman Long <Waiman.Long@hp.com>
>>>> Cc: Jason Low <jason.low2@hp.com>
>>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>>> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
>>>> ---
>>>>  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 57 insertions(+)
>>>>
>>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>>> index 577481d..af1fc31 100644
>>>> --- a/kernel/futex.c
>>>> +++ b/kernel/futex.c
>>>> @@ -69,6 +69,63 @@
>>>>  
>>>>  #include "locking/rtmutex_common.h"
>>>>  
>>>> +/*
>>>> + * Basic futex operation and ordering guarantees:
>>>> + *
>>>> + * The waiter reads the futex value in user space and calls
>>>> + * futex_wait(). It computes the hash bucket and acquires the hash
>>>
>>> does                It
>>> refer to "the waiter" or to futex_wait()?
>>> I read it as referring to the waiter, but ISTM that the comments are using It
>>> to refer to futex_wait()... ???
>>>
>>
>> Yes, it refers to futex_wait(), which IMO in a futex context is pretty
>> clear.
> 
> 
> Agreed.
> 
> 

I'll just have to get by with disagreeing with both of you then.
I say that the pronoun's antecedent is ambiguous (unclear).

-- 
~Randy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 23:05         ` Randy Dunlap
@ 2013-12-19 23:19           ` Darren Hart
  2013-12-20  8:17             ` Ingo Molnar
  2013-12-22 23:24             ` Davidlohr Bueso
  0 siblings, 2 replies; 12+ messages in thread
From: Darren Hart @ 2013-12-19 23:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Davidlohr Bueso, linux-kernel, mingo, peterz, tglx, paulmck,
	efault, jeffm, torvalds, jason.low2, Waiman.Long, tom.vaden,
	scott.norton, aswin, Ingo Molnar

On Thu, 2013-12-19 at 15:05 -0800, Randy Dunlap wrote:
> On 12/19/13 13:07, Darren Hart wrote:
> > On Thu, 2013-12-19 at 13:00 -0800, Davidlohr Bueso wrote:
> >> On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
> >>> On 12/19/13 12:45, Davidlohr Bueso wrote:
> >>>> From: Thomas Gleixner <tglx@linutronix.de>
> >>>>
> >>>> That's essential, if you want to hack on futexes.
> >>>>
> >>>> Cc: Ingo Molnar <mingo@kernel.org>
> >>>> Cc: Darren Hart <dvhart@linux.intel.com>
> >>>> Acked-by: Peter Zijlstra <peterz@infradead.org>
> >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>>> Cc: Mike Galbraith <efault@gmx.de>
> >>>> Cc: Jeff Mahoney <jeffm@suse.com>
> >>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> >>>> Cc: Scott Norton <scott.norton@hp.com>
> >>>> Cc: Tom Vaden <tom.vaden@hp.com>
> >>>> Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> >>>> Cc: Waiman Long <Waiman.Long@hp.com>
> >>>> Cc: Jason Low <jason.low2@hp.com>
> >>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> >>>> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> >>>> ---
> >>>>  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 57 insertions(+)
> >>>>
> >>>> diff --git a/kernel/futex.c b/kernel/futex.c
> >>>> index 577481d..af1fc31 100644
> >>>> --- a/kernel/futex.c
> >>>> +++ b/kernel/futex.c
> >>>> @@ -69,6 +69,63 @@
> >>>>  
> >>>>  #include "locking/rtmutex_common.h"
> >>>>  
> >>>> +/*
> >>>> + * Basic futex operation and ordering guarantees:
> >>>> + *
> >>>> + * The waiter reads the futex value in user space and calls
> >>>> + * futex_wait(). It computes the hash bucket and acquires the hash
> >>>
> >>> does                It
> >>> refer to "the waiter" or to futex_wait()?
> >>> I read it as referring to the waiter, but ISTM that the comments are using It
> >>> to refer to futex_wait()... ???
> >>>
> >>
> >> Yes, it refers to futex_wait(), which IMO in a futex context is pretty
> >> clear.
> > 
> > 
> > Agreed.
> > 
> > 
> 
> I'll just have to get by with disagreeing with both of you then.
> I say that the pronoun's antecedent is ambiguous (unclear).
> 

You are correct from a purely grammatical perspective, no argument
there. I just feel that the comment is clear enough from the context -
although perhaps people that hack on futexes should be excluded from
having an opinion here as we are biased :-)

If Davidlohr is going to resubmit for other reasons, then by all means,
please correct this per Randy's suggestion - I just didn't want to put
him (Davidlohr ;-) through another round of patches for this alone.

Thanks for ensuring our docs are as clear as possible Randy!

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 23:19           ` Darren Hart
@ 2013-12-20  8:17             ` Ingo Molnar
  2013-12-22 23:24             ` Davidlohr Bueso
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2013-12-20  8:17 UTC (permalink / raw)
  To: Darren Hart
  Cc: Randy Dunlap, Davidlohr Bueso, linux-kernel, mingo, peterz, tglx,
	paulmck, efault, jeffm, torvalds, jason.low2, Waiman.Long,
	tom.vaden, scott.norton, aswin


* Darren Hart <dvhart@linux.intel.com> wrote:

> On Thu, 2013-12-19 at 15:05 -0800, Randy Dunlap wrote:
> > On 12/19/13 13:07, Darren Hart wrote:
> > > On Thu, 2013-12-19 at 13:00 -0800, Davidlohr Bueso wrote:
> > >> On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
> > >>> On 12/19/13 12:45, Davidlohr Bueso wrote:
> > >>>> From: Thomas Gleixner <tglx@linutronix.de>
> > >>>>
> > >>>> That's essential, if you want to hack on futexes.
> > >>>>
> > >>>> Cc: Ingo Molnar <mingo@kernel.org>
> > >>>> Cc: Darren Hart <dvhart@linux.intel.com>
> > >>>> Acked-by: Peter Zijlstra <peterz@infradead.org>
> > >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> > >>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >>>> Cc: Mike Galbraith <efault@gmx.de>
> > >>>> Cc: Jeff Mahoney <jeffm@suse.com>
> > >>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > >>>> Cc: Scott Norton <scott.norton@hp.com>
> > >>>> Cc: Tom Vaden <tom.vaden@hp.com>
> > >>>> Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> > >>>> Cc: Waiman Long <Waiman.Long@hp.com>
> > >>>> Cc: Jason Low <jason.low2@hp.com>
> > >>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > >>>> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > >>>> ---
> > >>>>  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>>  1 file changed, 57 insertions(+)
> > >>>>
> > >>>> diff --git a/kernel/futex.c b/kernel/futex.c
> > >>>> index 577481d..af1fc31 100644
> > >>>> --- a/kernel/futex.c
> > >>>> +++ b/kernel/futex.c
> > >>>> @@ -69,6 +69,63 @@
> > >>>>  
> > >>>>  #include "locking/rtmutex_common.h"
> > >>>>  
> > >>>> +/*
> > >>>> + * Basic futex operation and ordering guarantees:
> > >>>> + *
> > >>>> + * The waiter reads the futex value in user space and calls
> > >>>> + * futex_wait(). It computes the hash bucket and acquires the hash
> > >>>
> > >>> does                It
> > >>> refer to "the waiter" or to futex_wait()?
> > >>> I read it as referring to the waiter, but ISTM that the comments are using It
> > >>> to refer to futex_wait()... ???
> > >>>
> > >>
> > >> Yes, it refers to futex_wait(), which IMO in a futex context is pretty
> > >> clear.
> > > 
> > > 
> > > Agreed.
> > > 
> > > 
> > 
> > I'll just have to get by with disagreeing with both of you then.
> > I say that the pronoun's antecedent is ambiguous (unclear).
> > 
> 
> You are correct from a purely grammatical perspective, no argument 
> there. I just feel that the comment is clear enough from the context 
> - although perhaps people that hack on futexes should be excluded 
> from having an opinion here as we are biased :-)

It doesn't really matter that to _you_ (the futex expert) the sentence 
seems unambiguous, such comments are mostly meant for new people 
reading the code, such as Randy!

So guys please fix the comment instead of arguing about it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/4] futex: Document ordering guarantees
  2013-12-19 23:19           ` Darren Hart
  2013-12-20  8:17             ` Ingo Molnar
@ 2013-12-22 23:24             ` Davidlohr Bueso
  1 sibling, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2013-12-22 23:24 UTC (permalink / raw)
  To: Darren Hart
  Cc: Randy Dunlap, linux-kernel, mingo, peterz, tglx, paulmck, efault,
	jeffm, torvalds, jason.low2, Waiman.Long, tom.vaden, scott.norton,
	aswin, Ingo Molnar

On Thu, 2013-12-19 at 15:19 -0800, Darren Hart wrote:
> On Thu, 2013-12-19 at 15:05 -0800, Randy Dunlap wrote:
> > On 12/19/13 13:07, Darren Hart wrote:
> > > On Thu, 2013-12-19 at 13:00 -0800, Davidlohr Bueso wrote:
> > >> On Thu, 2013-12-19 at 12:51 -0800, Randy Dunlap wrote:
> > >>> On 12/19/13 12:45, Davidlohr Bueso wrote:
> > >>>> From: Thomas Gleixner <tglx@linutronix.de>
> > >>>>
> > >>>> That's essential, if you want to hack on futexes.
> > >>>>
> > >>>> Cc: Ingo Molnar <mingo@kernel.org>
> > >>>> Cc: Darren Hart <dvhart@linux.intel.com>
> > >>>> Acked-by: Peter Zijlstra <peterz@infradead.org>
> > >>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> > >>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >>>> Cc: Mike Galbraith <efault@gmx.de>
> > >>>> Cc: Jeff Mahoney <jeffm@suse.com>
> > >>>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > >>>> Cc: Scott Norton <scott.norton@hp.com>
> > >>>> Cc: Tom Vaden <tom.vaden@hp.com>
> > >>>> Cc: Aswin Chandramouleeswaran <aswin@hp.com>
> > >>>> Cc: Waiman Long <Waiman.Long@hp.com>
> > >>>> Cc: Jason Low <jason.low2@hp.com>
> > >>>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > >>>> Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
> > >>>> ---
> > >>>>  kernel/futex.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>>  1 file changed, 57 insertions(+)
> > >>>>
> > >>>> diff --git a/kernel/futex.c b/kernel/futex.c
> > >>>> index 577481d..af1fc31 100644
> > >>>> --- a/kernel/futex.c
> > >>>> +++ b/kernel/futex.c
> > >>>> @@ -69,6 +69,63 @@
> > >>>>  
> > >>>>  #include "locking/rtmutex_common.h"
> > >>>>  
> > >>>> +/*
> > >>>> + * Basic futex operation and ordering guarantees:
> > >>>> + *
> > >>>> + * The waiter reads the futex value in user space and calls
> > >>>> + * futex_wait(). It computes the hash bucket and acquires the hash
> > >>>
> > >>> does                It
> > >>> refer to "the waiter" or to futex_wait()?
> > >>> I read it as referring to the waiter, but ISTM that the comments are using It
> > >>> to refer to futex_wait()... ???
> > >>>
> > >>
> > >> Yes, it refers to futex_wait(), which IMO in a futex context is pretty
> > >> clear.
> > > 
> > > 
> > > Agreed.
> > > 
> > > 
> > 
> > I'll just have to get by with disagreeing with both of you then.
> > I say that the pronoun's antecedent is ambiguous (unclear).
> > 
> 
> You are correct from a purely grammatical perspective, no argument
> there. I just feel that the comment is clear enough from the context -
> although perhaps people that hack on futexes should be excluded from
> having an opinion here as we are biased :-)
> 
> If Davidlohr is going to resubmit for other reasons, then by all means,
> please correct this per Randy's suggestion - I just didn't want to put
> him (Davidlohr ;-) through another round of patches for this alone.
> 
> Thanks for ensuring our docs are as clear as possible Randy!

Absolutely.

I've found that we have the same grammatical error when describing
futex_wake() as well, so I've updated the doc to say "This function"
instead of "It".

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-12-22 23:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-19 20:45 [PATCH v4 0/4] futex: Wakeup optimizations Davidlohr Bueso
2013-12-19 20:45 ` [PATCH v4 1/4] futex: Misc cleanups Davidlohr Bueso
2013-12-19 20:45 ` [PATCH v4 2/4] futex: Larger hash table Davidlohr Bueso
2013-12-19 20:45 ` [PATCH v4 3/4] futex: Document ordering guarantees Davidlohr Bueso
2013-12-19 20:51   ` Randy Dunlap
2013-12-19 21:00     ` Davidlohr Bueso
2013-12-19 21:07       ` Darren Hart
2013-12-19 23:05         ` Randy Dunlap
2013-12-19 23:19           ` Darren Hart
2013-12-20  8:17             ` Ingo Molnar
2013-12-22 23:24             ` Davidlohr Bueso
2013-12-19 20:45 ` [PATCH v4 4/4] futex: Avoid taking hb lock if nothing to wakeup Davidlohr Bueso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.