From: Pierre PEIFFER <pierre.peiffer@bull.net>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>, jakub@redhat.com
Subject: [PATCH] 2.6.16 - futex: small optimization (?)
Date: Tue, 28 Mar 2006 09:37:27 +0200 [thread overview]
Message-ID: <4428E7B7.8040408@bull.net> (raw)
Hi,
I found a (optimization ?) problem in the futexes, during a futex_wake,
if the waiter has a higher priority than the waker.
In fact, in this case, the waiter is immediately scheduled and tries to
take a lock still held by the waker. This is specially expensive on UP
or if both threads are on the same CPU, due to the two task-switchings.
This produces an extra latency during a wakeup in pthread_cond_broadcast
or pthread_cond_signal, for example.
See below my detailed explanation.
I found a solution given by the patch, at the end of this mail. It works
for me on kernel 2.6.16, but the kernel hangs if I use it with -rt patch
from Ingo Molnar. So, I have a doubt on the correctness of the patch.
The idea is simple: in unqueue_me, I first check
"if (list_empty(&q->list))"
If yes => we were woken (the list is initialized in wake_futex).
Then, it immediately returns and let the waker drop the key_refs
(instead of the waiter).
====================================================================
Here is the detailed explanation:
Imagine two threads, on UP or on the same CPU: a futex-waker (thread A)
and a futex-waiter (thread B); thread B having a higher priority,
blocked and sleeping in futex_wait. Here is the scenario:
Thread A Thread B
(waker) (waiter with higher priority)
/* sleeping in futex_wait */
/* wake the futex */
/* (from futex_wake or futex_requeue) */
\_ wake_futex
\_ list_del_init(&q->list)
\_ wake_up_all (thread B)
=> Thread B, due to its higher priority is immediately
woken and sheduled
=> task-swith to thread B
/* sleeps */ /* awakes */
in futex_wait:
\_ ...
\_ unqueue_me
\_ lock_ptr = q->lock_ptr;
\_ if (lock_ptr != 0) {
/* TRUE */
\_ spin_lock(lock_ptr);
/* lock is still locked by the
waker, thread A, in either
futex_wake or futex_requeue
*/
=> task-switch to the lock owner, thread A
/* awakes */ /* sleeps */
\_q->lock_ptr = NULL;
/* back to futex_wake or futex_requeue */
\_ ...
\_ spin_unlock(&bh->lock);
/* this is q->lock_ptr in thread B */
/* => waiters are woken */
=> task-switch to the lock waiter, thread B
/* sleeps */ /* awakes */
\_ if (lock_ptr != q->lock_ptr) {
/* unfortunately, this is true */
/* q->lock_ptr in now NULL */
\_ spin_unlock(lock_ptr);
\_ goto retry;
\_ }
\_ lock_ptr = q->lock_ptr;
\_ if (lock_ptr != 0) {
/* FALSE now */
/* thread B can go on now */
=== end of scenario ===
So, there are two dummy and heavy task-switchings due to the
"if (lock_ptr != 0)" statement still true the first time in unqueue_me
where it should not.
Here is the patch I would like to propose, for comments.
Signed-off-by: Pierre Peiffer <Pierre.Peiffer@bull.net>
---
diff -uprN linux-2.6.16.ori/kernel/futex.c linux-2.6.16/kernel/futex.c
--- linux-2.6.16.ori/kernel/futex.c 2006-03-27 16:52:11.000000000 +0200
+++ linux-2.6.16/kernel/futex.c 2006-03-27 16:56:35.000000000 +0200
@@ -290,7 +290,7 @@ static int futex_wake(unsigned long uadd
struct futex_hash_bucket *bh;
struct list_head *head;
struct futex_q *this, *next;
- int ret;
+ int ret, drop_count=0;
down_read(¤t->mm->mmap_sem);
@@ -305,12 +305,15 @@ static int futex_wake(unsigned long uadd
list_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key)) {
wake_futex(this);
+ drop_count++;
if (++ret >= nr_wake)
break;
}
}
spin_unlock(&bh->lock);
+ while (--drop_count >= 0)
+ drop_key_refs(&key);
out:
up_read(¤t->mm->mmap_sem);
return ret;
@@ -327,6 +330,7 @@ static int futex_wake_op(unsigned long u
struct list_head *head;
struct futex_q *this, *next;
int ret, op_ret, attempt = 0;
+ int drop_count1 = 0, drop_count2 = 0;
retryfull:
down_read(¤t->mm->mmap_sem);
@@ -413,6 +417,7 @@ retry:
list_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key1)) {
wake_futex(this);
+ drop_count1++;
if (++ret >= nr_wake)
break;
}
@@ -425,6 +430,7 @@ retry:
list_for_each_entry_safe(this, next, head, list) {
if (match_futex (&this->key, &key2)) {
wake_futex(this);
+ drop_count2++;
if (++op_ret >= nr_wake2)
break;
}
@@ -435,6 +441,12 @@ retry:
spin_unlock(&bh1->lock);
if (bh1 != bh2)
spin_unlock(&bh2->lock);
+
+ /* drop_key_refs() must be called outside the spinlocks. */
+ while (--drop_count1 >= 0)
+ drop_key_refs(&key1);
+ while (--drop_count2 >= 0)
+ drop_key_refs(&key2);
out:
up_read(¤t->mm->mmap_sem);
return ret;
@@ -506,6 +518,7 @@ static int futex_requeue(unsigned long u
continue;
if (++ret <= nr_wake) {
wake_futex(this);
+ drop_count++;
} else {
list_move_tail(&this->list, &bh2->chain);
this->lock_ptr = &bh2->lock;
@@ -586,6 +599,9 @@ static int unqueue_me(struct futex_q *q)
int ret = 0;
spinlock_t *lock_ptr;
+ if (list_empty(&q->list))
+ return ret;
+
/* In the common case we don't take the spinlock, which is nice. */
retry:
lock_ptr = q->lock_ptr;
--
Pierre Peiffer
next reply other threads:[~2006-03-28 7:37 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-28 7:37 Pierre PEIFFER [this message]
2006-03-28 10:05 ` [PATCH] 2.6.16 - futex: small optimization (?) Eric Dumazet
2006-03-28 15:02 ` Ulrich Drepper
2006-03-28 22:46 ` Bill Davidsen
2006-03-29 15:26 ` Ingo Molnar
2006-03-30 20:27 ` Bill Davidsen
2006-03-31 6:01 ` Ingo Molnar
2006-03-31 14:50 ` Bill Davidsen
2006-03-31 18:15 ` Ingo Molnar
2006-03-29 13:18 ` Pierre PEIFFER
2006-03-29 15:26 ` Eric Dumazet
2006-03-30 14:51 ` Pierre PEIFFER
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4428E7B7.8040408@bull.net \
--to=pierre.peiffer@bull.net \
--cc=jakub@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.