From: Darren Hart <dvhart@linux.intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
Manfred Spraul <manfred@colorfullife.com>,
David Miller <davem@davemloft.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
Mike Galbraith <efault@gmx.de>
Subject: Re: [RFC][PATCH 2/3] futex: Reduce hash bucket lock contention
Date: Wed, 14 Sep 2011 08:46:03 -0700
Message-ID: <4E70CC3B.4000905@linux.intel.com>
In-Reply-To: <20110914133750.831707072@chello.nl>

On 09/14/2011 06:30 AM, Peter Zijlstra wrote:
> Use the brand spanking new wake_list to delay the futex wakeups until
> after we've released the hash bucket locks. This avoids the newly
> woken tasks from immediately getting stuck on the hb lock.
>
> This is esp. painful on -rt, where the hb lock is preemptible.
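To make the ordering concrete, here is a small userspace model of the pattern described above: waiters are unqueued and collected while the (modeled) hash bucket lock is held, and they are only woken once that lock has been released. Everything in it is hypothetical. `struct task`, `struct bucket` and `model_futex_wake()` are illustrations, the `locked` flag stands in for hb->lock, and the helpers only borrow the names wake_list_add()/wake_up_list() from the patch without reproducing the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these types and helpers are hypothetical
 * stand-ins, not the kernel's wake_list API. */
struct task {
	int id;
	bool queued;            /* still waiting on the bucket */
	int woken;              /* number of times this task was woken */
	struct task *wake_next; /* link used while on a wake list */
};

struct bucket {
	bool locked;            /* models hb->lock; the kernel uses a real lock */
	struct task *waiters[8];
	size_t nr_waiters;
};

struct wake_list {
	struct task *first;
};

/* Queue a task for a later wakeup; no wakeup happens here. */
static void wake_list_add(struct wake_list *wl, struct task *t)
{
	t->wake_next = wl->first; /* LIFO push; order does not matter here */
	wl->first = t;
}

static void wake_task(const struct bucket *hb, struct task *t)
{
	assert(!hb->locked);      /* the whole point: never wake under the lock */
	t->woken++;
}

/* Wake every queued task and leave the list empty. */
static void wake_up_list(const struct bucket *hb, struct wake_list *wl)
{
	struct task *t = wl->first;

	while (t) {
		struct task *next = t->wake_next;

		t->wake_next = NULL;
		wake_task(hb, t);
		t = next;
	}
	wl->first = NULL;
}

/* Analogue of futex_wake(): dequeue under the lock, wake after unlock. */
static int model_futex_wake(struct bucket *hb, int nr_wake)
{
	struct wake_list wl = { NULL };
	int ret = 0;

	hb->locked = true;                     /* spin_lock(&hb->lock) */
	while (hb->nr_waiters > 0 && ret < nr_wake) {
		/* pop from the end for brevity; the real code walks a plist */
		struct task *t = hb->waiters[--hb->nr_waiters];

		t->queued = false;             /* __unqueue_futex() */
		wake_list_add(&wl, t);
		ret++;
	}
	hb->locked = false;                    /* spin_unlock(&hb->lock) */

	wake_up_list(hb, &wl);                 /* deferred wakeups happen here */
	return ret;
}
```

The assert in wake_task() fires if a wakeup ever happens with the modeled lock held, which is exactly the situation the patch is meant to avoid.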
Nice!

Have you run this through the functional and performance tests from
futextest? Looks like I should also add a multiwake test to really
showcase this.

If you don't have it locally, I can set up a github repository for futextest
until korg is back... or do the testing myself... right.
>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Darren Hart <dvhart@linux.intel.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> kernel/futex.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> Index: linux-2.6/kernel/futex.c
> ===================================================================
> --- linux-2.6.orig/kernel/futex.c
> +++ linux-2.6/kernel/futex.c
> @@ -823,7 +823,7 @@ static void __unqueue_futex(struct futex
> * The hash bucket lock must be held when this is called.
> * Afterwards, the futex_q must not be accessed.
> */
> -static void wake_futex(struct futex_q *q)
> +static void wake_futex(struct wake_list_head *wake_list, struct futex_q *q)
This is a good opportunity to add proper kerneldoc to this function as well.
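For illustration, kerneldoc for the new signature might look roughly like the following. The wording is only a suggestion assembled from the existing comment and the patch, and it assumes wake_list_add() takes over the task reference that get_task_struct() used to hold. The opaque struct declarations exist only so the fragment stands alone outside the kernel tree.

```c
/* Opaque declarations so this fragment compiles on its own. */
struct wake_list_head;
struct futex_q;

/**
 * wake_futex() - Unqueue a futex_q and queue its task for a deferred wakeup
 * @wake_list:	List collecting tasks to wake once the hash bucket lock is dropped
 * @q:		The futex_q to unqueue
 *
 * The task is only added to @wake_list here; the caller wakes it later with
 * wake_up_list() after releasing the hash bucket lock. Afterwards, the
 * futex_q must not be accessed.
 *
 * Context: The hash bucket lock must be held when this is called.
 */
void wake_futex(struct wake_list_head *wake_list, struct futex_q *q);
```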
> {
> struct task_struct *p = q->task;
>
> @@ -834,7 +834,7 @@ static void wake_futex(struct futex_q *q
> * struct. Prevent this by holding a reference on p across the
> * wake up.
> */
> - get_task_struct(p);
> + wake_list_add(wake_list, p);
>
> __unqueue_futex(q);
> /*
> @@ -845,9 +845,6 @@ static void wake_futex(struct futex_q *q
> */
> smp_wmb();
> q->lock_ptr = NULL;
> -
> - wake_up_state(p, TASK_NORMAL);
> - put_task_struct(p);
> }
>
> static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
> @@ -964,6 +961,7 @@ futex_wake(u32 __user *uaddr, unsigned i
> struct futex_q *this, *next;
> struct plist_head *head;
> union futex_key key = FUTEX_KEY_INIT;
> + WAKE_LIST(wake_list);
> int ret;
>
> if (!bitset)
> @@ -988,7 +986,7 @@ futex_wake(u32 __user *uaddr, unsigned i
> if (!(this->bitset & bitset))
> continue;
>
> - wake_futex(this);
> + wake_futex(&wake_list, this);
I guess this is OK. I believe wake_futex_pi will always wake a single task,
so the list syntax might confuse newcomers... Would it make sense to have a
wake_futex_list() call? Thinking out loud...
> if (++ret >= nr_wake)
> break;
> }
> @@ -996,6 +994,8 @@ futex_wake(u32 __user *uaddr, unsigned i
>
> spin_unlock(&hb->lock);
> put_futex_key(&key);
> +
> + wake_up_list(&wake_list, TASK_NORMAL);
> out:
> return ret;
> }
> @@ -1012,6 +1012,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
> struct futex_hash_bucket *hb1, *hb2;
> struct plist_head *head;
> struct futex_q *this, *next;
> + WAKE_LIST(wake_list);
> int ret, op_ret;
>
> retry:
> @@ -1062,7 +1063,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
>
> plist_for_each_entry_safe(this, next, head, list) {
> if (match_futex (&this->key, &key1)) {
> - wake_futex(this);
> + wake_futex(&wake_list, this);
> if (++ret >= nr_wake)
> break;
> }
> @@ -1074,7 +1075,7 @@ futex_wake_op(u32 __user *uaddr1, unsign
> op_ret = 0;
> plist_for_each_entry_safe(this, next, head, list) {
> if (match_futex (&this->key, &key2)) {
> - wake_futex(this);
> + wake_futex(&wake_list, this);
> if (++op_ret >= nr_wake2)
> break;
> }
> @@ -1087,6 +1088,8 @@ futex_wake_op(u32 __user *uaddr1, unsign
> put_futex_key(&key2);
> out_put_key1:
> put_futex_key(&key1);
> +
> + wake_up_list(&wake_list, TASK_NORMAL);
> out:
> return ret;
> }
> @@ -1239,6 +1242,7 @@ static int futex_requeue(u32 __user *uad
> struct futex_hash_bucket *hb1, *hb2;
> struct plist_head *head1;
> struct futex_q *this, *next;
> + WAKE_LIST(wake_list);
> u32 curval2;
>
> if (requeue_pi) {
> @@ -1384,7 +1388,7 @@ static int futex_requeue(u32 __user *uad
> * woken by futex_unlock_pi().
> */
> if (++task_count <= nr_wake && !requeue_pi) {
> - wake_futex(this);
> + wake_futex(&wake_list, this);
> continue;
> }
>
> @@ -1437,6 +1441,7 @@ static int futex_requeue(u32 __user *uad
> put_futex_key(&key2);
> out_put_key1:
> put_futex_key(&key1);
> + wake_up_list(&wake_list, TASK_NORMAL);
> out:
> if (pi_state != NULL)
> free_pi_state(pi_state);
>
>
I _think_ requeue_pi is in the clear here as it uses
requeue_pi_wake_futex, which calls wake_up_state directly. Still, some
testing with futextest functional/futex_requeue_pi is in order.
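A reduced model of the branch this argument rests on: with requeue_pi set, the `++task_count <= nr_wake && !requeue_pi` test never takes the wake_futex() path, so nothing is added to wake_list and the trailing wake_up_list() has nothing to do (assuming it is a no-op on an empty list). `model_requeue()` and `struct requeue_result` are hypothetical; only the branch condition comes from the patch, and the model ignores nr_requeue and all PI-specific handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the wake/requeue split in futex_requeue(). */
struct requeue_result {
	int deferred; /* tasks put on wake_list, woken after the locks drop */
	int requeued; /* tasks moved to the second futex */
};

static struct requeue_result model_requeue(int nr_waiters, int nr_wake,
					   bool requeue_pi)
{
	struct requeue_result r = { 0, 0 };
	int task_count = 0;

	for (int i = 0; i < nr_waiters; i++) {
		/* Same condition as the patched loop. */
		if (++task_count <= nr_wake && !requeue_pi) {
			r.deferred++;   /* wake_futex(&wake_list, this) */
			continue;
		}
		r.requeued++;           /* requeue path; PI details omitted */
	}
	return r;
}
```

With requeue_pi the deferred count is always zero, which matches the expectation that requeue_pi_wake_futex() keeps doing its own direct wakeup.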
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel