From: Eric Dumazet <dada1@cosmosbay.com>
To: Pierre PEIFFER <pierre.peiffer@bull.net>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	jakub@redhat.com
Subject: Re: [PATCH] 2.6.16 - futex: small optimization (?)
Date: Tue, 28 Mar 2006 12:05:44 +0200
Message-ID: <44290A78.3050509@cosmosbay.com>
In-Reply-To: <4428E7B7.8040408@bull.net>

Pierre PEIFFER wrote:
> Hi,
> 
> 
> I found a problem (an optimization opportunity?) in the futexes, during a
> futex_wake, if the waiter has a higher priority than the waker.
> 
> In this case, the waiter is immediately scheduled and tries to take a
> lock still held by the waker. This is especially expensive on UP, or if
> both threads are on the same CPU, because of the two task switches.
> This adds latency to a wakeup in pthread_cond_broadcast or
> pthread_cond_signal, for example.
> 
> See below my detailed explanation.
> 
> I found a solution, given by the patch at the end of this mail. It works
> for me on kernel 2.6.16, but the kernel hangs if I use it with the -rt
> patch from Ingo Molnar. So I have doubts about the correctness of the patch.
> 
> The idea is simple: in unqueue_me, I first check
>     "if (list_empty(&q->list))"
> 
> If yes => we were woken (the list is initialized in wake_futex).
> Then it returns immediately and lets the waker drop the key_refs
> (instead of the waiter).
> 
> 

It's true that the futex code implies a lot of context switches (on the
kernel side but also on the user side).

Even if you change the kernel behavior in futex_wake(), you won't change the
fact that a typical pthread_cond_signal does:

1) lock cond var
lll_lock(cv->lock);
2) wake one waiter if necessary
FUTEX_WAKE(cv->wakeup_seq, 1);
3) unlock cond var

If a waiter process B has a higher priority than the waker process A, then
most probably B is scheduled before A has had a chance to unlock the cond var
(step 3).

So B will re-enter the kernel (because of the contended cond var lock), and A
will re-enter the kernel too, to futex_wake() process B again, but on the cond
var lock this time, not on the condvar wakeup_seq futex.
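
The sequence above can be tallied with a small model. This is only a sketch
that counts kernel entries on the signal path; it makes no real futex calls,
and struct futex_cost and signal_cost() are hypothetical names introduced for
illustration. It assumes the cond var lock is uncontended when A takes it.

```c
#include <stdbool.h>

/* Hypothetical tally of futex syscalls caused by one pthread_cond_signal. */
struct futex_cost {
    int waker_syscalls;   /* kernel entries made by the signalling thread A */
    int waiter_syscalls;  /* extra kernel entries made by the woken thread B */
};

/*
 * Model of the three steps from the mail. The flag says whether B runs
 * as soon as it is woken, i.e. before A reaches step 3.
 */
static struct futex_cost signal_cost(bool waiter_preempts_waker)
{
    struct futex_cost c = { 0, 0 };

    /* Step 1: lll_lock(cv->lock), assumed uncontended: no syscall. */

    /* Step 2: FUTEX_WAKE(cv->wakeup_seq, 1): one kernel entry for A. */
    c.waker_syscalls++;

    if (waiter_preempts_waker) {
        /* B finds cv->lock still held and has to wait on it in the kernel. */
        c.waiter_syscalls++;
        /* Step 3: A unlocks, sees a waiter, and must wake B a second time. */
        c.waker_syscalls++;
    }
    /* Otherwise step 3 is an uncontended unlock: no syscall. */

    return c;
}
```

Under these assumptions, signal_cost(true) reports two kernel entries for A
and one extra for B, against a single entry for A when B only runs after the
unlock.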

Each time a thread enters the futex kernel code, an expensive
find_extend_vma() lookup is done (expensive because of the read_lock, but also
because of the possibly large number of vm_area_struct in the mm_struct).

I wish the futex code had a special implementation for PTHREAD_SCOPE_PROCESS
futexes, where no vma lookups would be necessary at all. Most mutexes and
condvars have process-private scope (they are not shared by different
processes).
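
To make the difference concrete, here is a toy sketch, not kernel code. The
types toy_vma and toy_key and both functions are hypothetical, and the linear
scan only stands in for whatever lookup the kernel really performs. The point
it illustrates: a key that may refer to shared memory needs the mapping to be
found first, while a process-private key can be built from the address alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memory mapping; not the kernel's vm_area_struct. */
struct toy_vma {
    uintptr_t start, end;   /* covers [start, end) */
    bool shared;            /* mapping visible to other processes? */
    uintptr_t object_id;    /* stands for the backing object if shared */
};

/* Toy futex key: two words that identify the futex uniquely. */
struct toy_key {
    uintptr_t a;  /* backing object (shared) or address space id (private) */
    uintptr_t b;  /* offset in the object (shared) or virtual address */
};

/* General path: search the mappings, then decide how to build the key. */
static bool key_with_lookup(const struct toy_vma *vmas, size_t n,
                            uintptr_t mm_id, uintptr_t uaddr,
                            struct toy_key *key, size_t *visited)
{
    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (uaddr >= vmas[i].start && uaddr < vmas[i].end) {
            if (vmas[i].shared) {
                key->a = vmas[i].object_id;
                key->b = uaddr - vmas[i].start;
            } else {
                key->a = mm_id;
                key->b = uaddr;
            }
            return true;
        }
    }
    return false;  /* address is not mapped */
}

/* Process-private path: no lookup, the address itself is the key. */
static struct toy_key key_private(uintptr_t mm_id, uintptr_t uaddr)
{
    struct toy_key k = { mm_id, uaddr };
    return k;
}
```

For an address in a private mapping both functions yield the same key, but
only the first one has to visit the mappings (and, in the real kernel, take
the lock that protects them).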

Eric



