* NPTL mutex and the scheduling priority
@ 2006-06-12 8:10 Atsushi Nemoto
2006-06-12 12:23 ` Arjan van de Ven
0 siblings, 1 reply; 18+ messages in thread
From: Atsushi Nemoto @ 2006-06-12 8:10 UTC (permalink / raw)
To: linux-kernel
# This is a copy of a message posted to the libc-alpha ML. I want to
# hear from kernel people too ...
Hi. I found that NPTL mutexes do not seem to follow the scheduling
parameters. If some threads are blocked on a single mutex_lock, I
expect the thread with the highest priority to get the lock first,
but current NPTL behaviour is different.
Here is a sample program. It creates four FIFO-class threads with
different priorities, and these threads try to acquire the same mutex.
--- foo.c ---
#include <stdio.h>
#include <pthread.h>
#include <time.h>
static pthread_mutex_t mutex;
static volatile int val;
static void *thread_func(void *arg)
{
int v;
pthread_mutex_lock(&mutex);
v = val++;
pthread_mutex_unlock(&mutex);
printf("thread-%ld got %d\n", (long)arg, v);
return NULL;
}
int main(int argc, char **argv)
{
struct sched_param param;
struct timespec ts;
pthread_t tid[4];
pthread_attr_t attr;
int i;
#if 0
int policy;
pthread_getschedparam(pthread_self(), &policy, &param);
policy = SCHED_FIFO;
param.sched_priority = 99;
pthread_setschedparam(pthread_self(), policy, &param);
#endif
pthread_mutex_init(&mutex, NULL);
pthread_mutex_lock(&mutex);
pthread_attr_init(&attr);
pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
pthread_attr_getschedparam(&attr, &param);
pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
for (i = 0; i < sizeof(tid) / sizeof(tid[0]); i++) {
param.sched_priority = 50 + i * 10;
pthread_attr_setschedparam(&attr, &param);
pthread_create(&tid[i], &attr, thread_func, (void *)i);
printf("thread-%d pri %d\n", i, param.sched_priority);
}
ts.tv_sec = 3;
ts.tv_nsec = 0;
nanosleep(&ts, NULL);
val++;
pthread_mutex_unlock(&mutex);
for (i = 0; i < sizeof(tid) / sizeof(tid[0]); i++)
pthread_join(tid[i], NULL);
return 0;
}
--- foo.c ---
I thought the thread with the highest priority (thread-3) would get
the mutex first, so I expected:
thread-0 pri 50
thread-1 pri 60
thread-2 pri 70
thread-3 pri 80
thread-3 got 1
thread-2 got 2
thread-1 got 3
thread-0 got 4
but with NPTL (glibc 2.4, kernel 2.6.16, mips/i386) I got:
thread-0 pri 50
thread-1 pri 60
thread-2 pri 70
thread-3 pri 80
thread-3 got 4
thread-2 got 3
thread-1 got 2
thread-0 got 1
I can get the expected result with linuxthreads (glibc 2.3.6).
I also found that I can get the expected result with NPTL if I enable
the "#if 0" block in the sample program.
Is this a glibc/NPTL issue, or a kernel/futex issue? (Or is my
expectation wrong?)
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 8:10 NPTL mutex and the scheduling priority Atsushi Nemoto
@ 2006-06-12 12:23 ` Arjan van de Ven
2006-06-12 12:44 ` Jakub Jelinek
0 siblings, 1 reply; 18+ messages in thread
From: Arjan van de Ven @ 2006-06-12 12:23 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: linux-kernel
On Mon, 2006-06-12 at 17:10 +0900, Atsushi Nemoto wrote:
> # This is a copy of a message posted to the libc-alpha ML. I want to
> # hear from kernel people too ...
>
> Hi. I found that NPTL mutexes do not seem to follow the scheduling
> parameters. If some threads are blocked on a single mutex_lock, I
> expect the thread with the highest priority to get the lock first,
> but current NPTL behaviour is different.
you want to use the PI futexes that are in 2.6.17-rc5-mm tree
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 12:23 ` Arjan van de Ven
@ 2006-06-12 12:44 ` Jakub Jelinek
2006-06-12 15:24 ` Sébastien Dugué
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2006-06-12 12:44 UTC (permalink / raw)
To: Arjan van de Ven, Ingo Molnar; +Cc: Atsushi Nemoto, linux-kernel
On Mon, Jun 12, 2006 at 02:23:28PM +0200, Arjan van de Ven wrote:
> On Mon, 2006-06-12 at 17:10 +0900, Atsushi Nemoto wrote:
> > # This is a copy of a message posted to the libc-alpha ML. I want to
> > # hear from kernel people too ...
> >
> > Hi. I found that NPTL mutexes do not seem to follow the scheduling
> > parameters. If some threads are blocked on a single mutex_lock, I
> > expect the thread with the highest priority to get the lock first,
> > but current NPTL behaviour is different.
>
> you want to use the PI futexes that are in 2.6.17-rc5-mm tree
Even for normal mutexes, pthread_mutex_unlock and
pthread_cond_{signal,broadcast} are supposed to honor the RT priority and
scheduling policy when waking up:
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html
"If there are threads blocked on the mutex object referenced by mutex when
pthread_mutex_unlock() is called, resulting in the mutex becoming available,
the scheduling policy shall determine which thread shall acquire the mutex."
and similarly for condvars.
"Use PI" is not a valid answer for this.
Really FUTEX_WAKE/FUTEX_REQUEUE can't use a FIFO. I think there was a patch
floating around to use a plist there instead, which is one possibility,
another one is to keep the queue sorted by priority (and adjust whenever
priority changes - one thread can be waiting on at most one futex at a
time).
Jakub
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 12:44 ` Jakub Jelinek
@ 2006-06-12 15:24 ` Sébastien Dugué
2006-06-12 16:06 ` Atsushi Nemoto
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Sébastien Dugué @ 2006-06-12 15:24 UTC (permalink / raw)
To: Jakub Jelinek
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, linux-kernel,
Pierre PEIFFER
On Mon, 2006-06-12 at 08:44 -0400, Jakub Jelinek wrote:
> On Mon, Jun 12, 2006 at 02:23:28PM +0200, Arjan van de Ven wrote:
> > On Mon, 2006-06-12 at 17:10 +0900, Atsushi Nemoto wrote:
> > > # This is a copy of a message posted to the libc-alpha ML. I want to
> > > # hear from kernel people too ...
> > >
> > > Hi. I found that NPTL mutexes do not seem to follow the scheduling
> > > parameters. If some threads are blocked on a single mutex_lock, I
> > > expect the thread with the highest priority to get the lock first,
> > > but current NPTL behaviour is different.
> >
> > you want to use the PI futexes that are in 2.6.17-rc5-mm tree
>
> Even for normal mutices pthread_mutex_unlock and
> pthread_cond_{signal,broadcast} is supposed to honor the RT priority and
> scheduling policy when waking up:
> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html
> "If there are threads blocked on the mutex object referenced by mutex when
> pthread_mutex_unlock() is called, resulting in the mutex becoming available,
> the scheduling policy shall determine which thread shall acquire the mutex."
> and similarly for condvars.
> "Use PI" is not a valid answer for this.
> Really FUTEX_WAKE/FUTEX_REQUEUE can't use a FIFO. I think there was a patch
> floating around to use a plist there instead, which is one possibility,
> another one is to keep the queue sorted by priority (and adjust whenever
> priority changes - one thread can be waiting on at most one futex at a
> time).
>
The patch you refer to is at
http://marc.theaimsgroup.com/?l=linux-kernel&m=114725326712391&w=2
But maybe a better solution for condvars would be to implement
something like a futex_requeue_pi() to handle the broadcast and
only use PI futexes all along in glibc.
Any ideas?
Sébastien.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 15:24 ` Sébastien Dugué
@ 2006-06-12 16:06 ` Atsushi Nemoto
2006-09-07 8:11 ` Atsushi Nemoto
2006-06-13 8:39 ` Pierre Peiffer
2006-06-13 8:48 ` Jakub Jelinek
2 siblings, 1 reply; 18+ messages in thread
From: Atsushi Nemoto @ 2006-06-12 16:06 UTC (permalink / raw)
To: sebastien.dugue; +Cc: jakub, arjan, mingo, linux-kernel, pierre.peiffer
On Mon, 12 Jun 2006 17:24:28 +0200, Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> > > you want to use the PI futexes that are in 2.6.17-rc5-mm tree
> >
> > Even for normal mutices pthread_mutex_unlock and
> > pthread_cond_{signal,broadcast} is supposed to honor the RT priority and
> > scheduling policy when waking up:
> > http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html
> > "If there are threads blocked on the mutex object referenced by mutex when
> > pthread_mutex_unlock() is called, resulting in the mutex becoming available,
> > the scheduling policy shall determine which thread shall acquire the mutex."
> > and similarly for condvars.
> > "Use PI" is not a valid answer for this.
> > Really FUTEX_WAKE/FUTEX_REQUEUE can't use a FIFO. I think there was a patch
> > floating around to use a plist there instead, which is one possibility,
> > another one is to keep the queue sorted by priority (and adjust whenever
> > priority changes - one thread can be waiting on at most one futex at a
> > time).
> >
>
> The patch you refer to is at
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114725326712391&w=2
Thank you all. I'll look into PI futexes, which seem to be the right
direction, but I still welcome short-term (limited) solutions that
hopefully work with existing glibc. I'll look at the plist patch.
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 15:24 ` Sébastien Dugué
2006-06-12 16:06 ` Atsushi Nemoto
@ 2006-06-13 8:39 ` Pierre Peiffer
2006-06-13 8:48 ` Jakub Jelinek
2 siblings, 0 replies; 18+ messages in thread
From: Pierre Peiffer @ 2006-06-13 8:39 UTC (permalink / raw)
To: Sébastien Dugué
Cc: Jakub Jelinek, Arjan van de Ven, Ingo Molnar, Atsushi Nemoto,
linux-kernel
Sébastien Dugué a écrit :
> But maybe a better solution for condvars would be to implement
> something like a futex_requeue_pi() to handle the broadcast and
> only use PI futexes all along in glibc.
>
> Any ideas?
I'm currently thinking about it, and as far as I can see, it should be
technically feasible but not obvious.
In fact, PI-futex adds an rt-mutex behind each futex when there are
waiters. Each waiter is then queued twice: once in the chain list of
the hash-bucket, and once in the (ordered) wait_list of the rt-mutex.
What we want, with a futex_requeue_pi, is a requeue of some tasks from
(futex1, rt_mutex1) to (futex2, rt_mutex2), respecting the wait_list
order of rt_mutex1.wait-list.
=> this needs something like a rt_mutex_requeue, and given an element of
rt_mutex1.wait_list, we need to retrieve its futex_q to requeue it to
the second hash-bucket chain (of futex2).
Moreover, we must take care of the case where futex2 is not yet
locked (i.e. has no owner): there is neither a pi_state nor an rt_mutex
associated with futex2 yet ...
And during all of this, we must take care of several race conditions in
several places.
I'll continue my investigation, but I really wonder if futex_requeue_pi
will still be an "optimization" as it should be.
So comments from the experts are welcome ;-)
--
Pierre
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 15:24 ` Sébastien Dugué
2006-06-12 16:06 ` Atsushi Nemoto
2006-06-13 8:39 ` Pierre Peiffer
@ 2006-06-13 8:48 ` Jakub Jelinek
2006-06-13 12:04 ` Sébastien Dugué
2 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2006-06-13 8:48 UTC (permalink / raw)
To: Sebastien Dugue
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, linux-kernel,
Pierre PEIFFER
On Mon, Jun 12, 2006 at 05:24:28PM +0200, Sébastien Dugué wrote:
> The patch you refer to is at
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114725326712391&w=2
>
> But maybe a better solution for condvars would be to implement
> something like a futex_requeue_pi() to handle the broadcast and
> only use PI futexes all along in glibc.
FUTEX_REQUEUE certainly should be able to requeue from a normal futex
to a PI futex or vice versa; I don't think it is desirable to create
separate futex commands for that.
Now I'm not sure what you mean by "use PI futexes all along in glibc";
certainly you don't mean using them for normal mutexes, right?
FUTEX_LOCK_PI has effects that normal futexes shouldn't have.
The condvars can be also used with PP mutexes and using PI for the cv
internal lock unconditionally wouldn't be the right thing either.
Jakub
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-13 8:48 ` Jakub Jelinek
@ 2006-06-13 12:04 ` Sébastien Dugué
2006-06-13 12:56 ` Jakub Jelinek
0 siblings, 1 reply; 18+ messages in thread
From: Sébastien Dugué @ 2006-06-13 12:04 UTC (permalink / raw)
To: Jakub Jelinek
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, linux-kernel,
Pierre PEIFFER
On Tue, 2006-06-13 at 04:48 -0400, Jakub Jelinek wrote:
> On Mon, Jun 12, 2006 at 05:24:28PM +0200, Sébastien Dugué wrote:
> > The patch you refer to is at
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114725326712391&w=2
> >
> > But maybe a better solution for condvars would be to implement
> > something like a futex_requeue_pi() to handle the broadcast and
> > only use PI futexes all along in glibc.
>
> FUTEX_REQUEUE certainly should be able to requeue from normal futex
> to a PI futex or vice versa, I don't think it is desirable to create
> a separate futex cmds for that.
Indeed, that would be preferable but might get tricky.
> Now not sure what do you mean by "use PI futexes all along in glibc",
> certainly you don't mean using them for normal mutexes, right?
> FUTEX_LOCK_PI has effects the normal futexes shouldn't have.
> The condvars can be also used with PP mutexes and using PI for the cv
> internal lock unconditionally wouldn't be the right thing either.
I indeed meant using a PI futex for the cv __data.__futex, but now
I realize it's a Really Bad Idea.
To summarize (correct me if I'm wrong), we need a way in the broadcast
case to promote the cv __data.__futex type to the type of the external
mutex (PI, PP, normal) in the requeue path. Therefore we need the
ability to requeue waiters on a regular futex onto a PI futex.
Ingo, Thomas, is this feasible?
Sébastien.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-13 12:04 ` Sébastien Dugué
@ 2006-06-13 12:56 ` Jakub Jelinek
2006-06-14 13:19 ` Sébastien Dugué
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2006-06-13 12:56 UTC (permalink / raw)
To: Sebastien Dugue
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, Ulrich Drepper,
linux-kernel, Pierre PEIFFER
On Tue, Jun 13, 2006 at 02:04:32PM +0200, Sébastien Dugué wrote:
> > Now not sure what do you mean by "use PI futexes all along in glibc",
> > certainly you don't mean using them for normal mutexes, right?
> > FUTEX_LOCK_PI has effects the normal futexes shouldn't have.
> > The condvars can be also used with PP mutexes and using PI for the cv
> > internal lock unconditionally wouldn't be the right thing either.
>
> I effectively meant using a PI futex for the cv __data.__futex but now
> I realize it's a Really Bad Idea.
For __data.__futex? That is not just a really bad idea, it's not possible
at all. A PI futex has a hardcoded format (the owner tid plus 2 control bits
in the MSB), while the cv's __data.__futex needs an application-controlled
format (a counter).
> To summarize (correct me if I'm wrong), we need a way in the broadcast
> case to promote the cv __data.__futex type to the type of the external
> mutex (PI, PP, normal) in the requeue path. Therefore we need the
> ability to requeue waiters on a regular futex onto a PI futex.
We need more things:
Have FUTEX_WAKE/FUTEX_REQUEUE/FUTEX_WAKE_OP honor the scheduling policy/priorities
rather than use FIFO (this is needed not just for proper behavior
of pthread_mutex_unlock, but also for __data.__futex).
FUTEX_REQUEUE is used by pthread_cond_signal to requeue the __data.__futex
onto __data.__lock. So it all depends on what futex type is used by the
__data.__lock. A CV doesn't have a mutex associated with it at all times;
when there is no contention on it, it really doesn't matter. Maybe
it would be possible to use PI futex always (well, if FUTEX_LOCK_PI is
supported, otherwise of course only use normal lock) for the internal lock
though. All we want to ensure is that for the short time it is held
(no blocking operation should be done while __data.__lock is held
unless interrupted by signal) if the owning thread of __data.__lock is
scheduled away (or interrupted by signal) it doesn't cause priority
inversion. That even includes e.g. two threads with different priorities
calling pthread_cond_signal or pthread_cond_broadcast concurrently (and in that
case, we might not have an associated mutex at all).
But, for PI __data.__lock, we need:
1) FUTEX_REQUEUE being able to requeue non-PI futex to PI futex
2) FUTEX_WAKE_OP alternative that allows the target futex be a PI futex:
ATM NPTL uses FUTEX_OP_CLEAR_WAKE_IF_GT_ONE ((4 << 24) | 1)
FUTEX_WAKE_OP operation, i.e. atomically
FUTEX_WAKE (futex); int old = *lock; *lock = 0; if (old > 1) FUTEX_WAKE (lock);
but with PI __data.__lock, we want instead atomically:
FUTEX_WAKE (futex); /* This one is normal, non-PI */ FUTEX_UNLOCK_PI (lock);
(or perhaps:
FUTEX_WAKE (futex); int old = *lock;
if (old & FUTEX_WAITERS)
FUTEX_UNLOCK_PI (lock);
else if (old == gettid ())
*lock = 0;
else
FUTEX_UNLOCK_PI (lock);
).
Jakub
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-13 12:56 ` Jakub Jelinek
@ 2006-06-14 13:19 ` Sébastien Dugué
2006-06-14 13:28 ` Jakub Jelinek
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Sébastien Dugué @ 2006-06-14 13:19 UTC (permalink / raw)
To: Jakub Jelinek
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, Ulrich Drepper,
linux-kernel, Pierre PEIFFER
On Tue, 2006-06-13 at 08:56 -0400, Jakub Jelinek wrote:
> On Tue, Jun 13, 2006 at 02:04:32PM +0200, Sébastien Dugué wrote:
> > > Now not sure what do you mean by "use PI futexes all along in glibc",
> > > certainly you don't mean using them for normal mutexes, right?
> > > FUTEX_LOCK_PI has effects the normal futexes shouldn't have.
> > > The condvars can be also used with PP mutexes and using PI for the cv
> > > internal lock unconditionally wouldn't be the right thing either.
> >
> > I effectively meant using a PI futex for the cv __data.__futex but now
> > I realize it's a Really Bad Idea.
>
> For __data.__futex? That is not just a really bad idea, that's not possible
> at all. PI futex has a hardcoded format (owner tid plus 2 control bits in
> the MSB), while cv in __data.__futex needs to have an application controlled
> format (a counter).
Darn right, overlooked that!
>
> > To summarize (correct me if I'm wrong), we need a way in the broadcast
> > case to promote the cv __data.__futex type to the type of the external
> > mutex (PI, PP, normal) in the requeue path. Therefore we need the
> > ability to requeue waiters on a regular futex onto a PI futex.
>
> We need more things:
> Have FUTEX_WAKE/FUTEX_REQUEUE/FUTEX_WAKE_OP honor the scheduling policy/priorities
> rather than use FIFO (this is needed not just for proper behavior
> of pthread_mutex_unlock, but also for __data.__futex).
Well, this is what I tried to achieve by using plists.
>
> FUTEX_REQUEUE is used by pthread_cond_signal to requeue the __data.__futex
> onto __data.__lock.
You meant FUTEX_WAKE_OP, I guess. I could not find any place still
using FUTEX_REQUEUE in glibc 2.4.
> So it all depends on what futex type is used by the
> __data.__lock.
OK.
> A CV doesn't have a mutex associated to it at all times,
> when there is no contention on it, it really doesn't matter.
Right.
> Maybe
> it would be possible to use PI futex always (well, if FUTEX_LOCK_PI is
> supported, otherwise of course only use normal lock) for the internal lock
> though. All we want to ensure is that for the short time it is held
> (no blocking operation should be done while __data.__lock is held
> unless interrupted by signal) if the owning thread of __data.__lock is
> scheduled away (or interrupted by signal) it doesn't cause priority
> inversion. That even includes e.g. two threads with different priorities
> calling pthread_signal or pthread_broadcast concurrently (and in that
> case, we might not have an associated mutex at all).
> But, for PI __data.__lock, we need:
> 1) FUTEX_REQUEUE being able to requeue non-PI futex to PI futex
We're trying to look into that right now.
> 2) FUTEX_WAKE_OP alternative that allows the target futex be a PI futex:
> ATM NPTL uses FUTEX_OP_CLEAR_WAKE_IF_GT_ONE ((4 << 24) | 1)
> FUTEX_WAKE_OP operation, i.e. atomically
> FUTEX_WAKE (futex); int old = *lock; *lock = 0; if (old > 1) FUTEX_WAKE (lock);
> but with PI __data.__lock, we want instead atomically:
> FUTEX_WAKE (futex); /* This one is normal, non-PI */ FUTEX_UNLOCK_PI (lock);
> (or perhaps:
> FUTEX_WAKE (futex); int old = *lock;
> if (old & FUTEX_WAITERS)
> FUTEX_UNLOCK_PI (lock);
> else if (old == gettid ())
> *lock = 0;
> else
> FUTEX_UNLOCK_PI (lock);
> ).
Understood.
Thanks a lot for the info.
Sébastien.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-14 13:19 ` Sébastien Dugué
@ 2006-06-14 13:28 ` Jakub Jelinek
2006-06-14 13:38 ` Pierre Peiffer
2006-06-15 9:28 ` Pierre Peiffer
2 siblings, 0 replies; 18+ messages in thread
From: Jakub Jelinek @ 2006-06-14 13:28 UTC (permalink / raw)
To: Sebastien Dugue
Cc: Arjan van de Ven, Ingo Molnar, Atsushi Nemoto, Ulrich Drepper,
linux-kernel, Pierre PEIFFER
On Wed, Jun 14, 2006 at 03:19:40PM +0200, Sébastien Dugué wrote:
> > FUTEX_REQUEUE is used by pthread_cond_signal to requeue the __data.__futex
> > onto __data.__lock.
>
> You meant FUTEX_WAKE_OP, I guess. I could not find any place still
> using FUTEX_REQUEUE in glibc 2.4.
glibc 2.4 uses FUTEX_CMP_REQUEUE, true, but both FUTEX_REQUEUE and
FUTEX_CMP_REQUEUE should behave the same in this regard (after all, they are
implemented using the same futex_requeue routine in the kernel).
FUTEX_CMP_REQUEUE is used in pthread_cond_broadcast, FUTEX_WAKE_OP is used in
pthread_cond_signal. E.g. nptl/sysdeps/pthread/pthread_cond_broadcast.c:
...
  /* Wake everybody.  */
  pthread_mutex_t *mut = (pthread_mutex_t *) cond->__data.__mutex;
  /* lll_futex_requeue returns 0 for success and non-zero
     for errors.  */
  if (__builtin_expect (lll_futex_requeue (&cond->__data.__futex, 1,
                                           INT_MAX, &mut->__data.__lock,
                                           futex_val), 0))
    {
      /* The requeue functionality is not available.  */
    wake_all:
      lll_futex_wake (&cond->__data.__futex, INT_MAX);
    }
and nptl/sysdeps/pthread/pthread_cond_signal.c:
...
      /* Wake one.  */
      if (! __builtin_expect (lll_futex_wake_unlock (&cond->__data.__futex, 1,
                                                     1, &cond->__data.__lock),
                              0))
        return 0;
      lll_futex_wake (&cond->__data.__futex, 1);
    }
  /* We are done.  */
  lll_mutex_unlock (cond->__data.__lock);
Jakub
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-14 13:19 ` Sébastien Dugué
2006-06-14 13:28 ` Jakub Jelinek
@ 2006-06-14 13:38 ` Pierre Peiffer
2006-06-15 9:28 ` Pierre Peiffer
2 siblings, 0 replies; 18+ messages in thread
From: Pierre Peiffer @ 2006-06-14 13:38 UTC (permalink / raw)
To: Sébastien Dugué
Cc: Jakub Jelinek, Arjan van de Ven, Ingo Molnar, Atsushi Nemoto,
Ulrich Drepper, linux-kernel
Sébastien Dugué a écrit :
>
>> FUTEX_REQUEUE is used by pthread_cond_signal to requeue the __data.__futex
>> onto __data.__lock.
>
> You meant FUTEX_WAKE_OP, I guess. I could not find any place still
> using FUTEX_REQUEUE in glibc 2.4.
... FUTEX_CMP_REQUEUE ... ;-)
FUTEX_REQUEUE is obsolete ...
--
Pierre P.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-14 13:19 ` Sébastien Dugué
2006-06-14 13:28 ` Jakub Jelinek
2006-06-14 13:38 ` Pierre Peiffer
@ 2006-06-15 9:28 ` Pierre Peiffer
2 siblings, 0 replies; 18+ messages in thread
From: Pierre Peiffer @ 2006-06-15 9:28 UTC (permalink / raw)
To: Sébastien Dugué
Cc: Jakub Jelinek, Arjan van de Ven, Ingo Molnar, Atsushi Nemoto,
Ulrich Drepper, linux-kernel
Sébastien Dugué a écrit :
>> 1) FUTEX_REQUEUE being able to requeue non-PI futex to PI futex
>
> We're trying to look into that right now.
>
Humm, let me try to propose a kind of design for a futex_requeue_pi
which does that.
According to the existing code in -mm tree:
static int futex_requeue_pi(u32 __user *uaddr1, u32 __user *uaddr2,
int nr_wake, int nr_requeue, u32 *cmpval) {
1. for the first nr_wake task in the queue
=> call wake_futex(...) /* as today */
2. if it remains some tasks in the list to requeue
a) we must have a pi_state for the futex2.
*) futex2 has an owner and has some waiters (easy case):
=> walk the list hb2->chain, look for a futex_q matching the
key2 (as in lookup_pi_state).
=> retrieve the pi_state attached to this futex_q
OR *) futex2 has an owner but no waiter yet:
=> alloc a pi_state as in lookup_pi_state.
=> set FUTEX_WAITERS flag
OR *) futex2 has no owner and no waiter:
=> set the FUTEX_WAITERS flag on futex2
=> alloc a pi_state _without_ owner.
=> initialize the rtmutex without owner.
b) for each futex_q to requeue
*) requeue the futex_q from hb1 to hb2 (if not the same)
*) set futex_q->pi_state.
*) initialize the rt_mutex_waiter pointed by
futex_q->task->pi_blocked_on with the properties of task
futex_q->task (see futex_wait below).
*) queue it on the wait_list of pi_state->pi_mutex
c) if there was some waiters on futex2 before us
*) take care of prio propagation.
*) if the top_waiter of pi_state->pi_mutex has changed
=> update pi_state->owner->pi_waiters
}
in futex_wait:
==============
static int futex_wait(...) {
...
struct rt_mutex_waiter waiter; /* hum hum */
...
memset(&waiter, 0, sizeof(waiter));
current->pi_blocked_on = &waiter; /* will be used in case of requeue
on a PI-futex */
...
/* just before unqueue_me */
if (q->pi_state) {
/* we were requeued on a PI-futex */
1. do what is done at the end of futex_lock_pi
after "rt_mutex_timed_lock"
2. maybe: do what is done at the end of rtmutex_slowlock
(remove ourself from the wait_list, adjust prio, ...)
} else if (!unqueue_me(&q))
...
...
}
in futex_lock_pi:
=================
static int do_futex_lock_pi(u32 __user *uaddr, int detect, int trylock,
struct hrtimer_sleeper *to) {
...
/* take care of the case where the futex is free (no owner)
but there are some waiters that were requeued (futex_requeue_pi) */
if ((curval & FUTEX_WAITERS) && (curval & FUTEX_TID_MASK) == 0) {
1. make current the futex owner
newval = curval | current->pid;
inc_preempt_count();
curval = futex_atomic_cmpxchg_inatomic(uaddr, FUTEX_WAITERS,
newval);
dec_preempt_count();
/* handle faulty case */
2. search a futex_q whose key match the current one to retrieve
the pi_state.
3. make current the owner of the pi_state:
a) list_add(&pi_state->list, &current->pi_state_list);
b) pi_state->owner = current;
4. make current the owner of the pi_mutex:
a) pi_mutex->owner = current;
b) plist_add(&rt_mutex_top_waiter(&pi_mutex)->pi_list_entry,
&current->pi_waiters);
5. unlock all locks and return
}
...
}
... any kind of comments are welcome ...
Here are some from me:
1. The use of an rt_mutex_waiter in futex_wait does not look very clean
to me... (or not clean at all...). So I wonder if, for example,
mixing futex_q and rt_mutex more closely would not be more helpful in
this case?
2. In futex_requeue_pi: how do we know whether futex2 is a PI-futex or a
normal futex? If there are some waiters, we can check if there is a
pi_state linked to a futex_q, but otherwise?
Proposal: use two separate commands (FUTEX_CMP_REQUEUE and
FUTEX_CMP_REQUEUE_PI) and let glibc make the choice, as it knows which
kind of mutex/futex it uses.
Thanks.
Running for shelter now!
--
Pierre
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-06-12 16:06 ` Atsushi Nemoto
@ 2006-09-07 8:11 ` Atsushi Nemoto
2006-09-07 8:32 ` Jakub Jelinek
0 siblings, 1 reply; 18+ messages in thread
From: Atsushi Nemoto @ 2006-09-07 8:11 UTC (permalink / raw)
To: sebastien.dugue; +Cc: jakub, arjan, mingo, linux-kernel, pierre.peiffer
On Tue, 13 Jun 2006 01:06:28 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> > > Really FUTEX_WAKE/FUTEX_REQUEUE can't use a FIFO. I think there was a patch
> > > floating around to use a plist there instead, which is one possibility,
> > > another one is to keep the queue sorted by priority (and adjust whenever
> > > priority changes - one thread can be waiting on at most one futex at a
> > > time).
> > >
> >
> > The patch you refer to is at
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114725326712391&w=2
>
> Thank you all. I'll look into PI futexes which seems the right
> direction, but I still welcome short term (limited) solutions,
> hopefully work with existing glibc. I'll look at the plist patch.
Three months later, I have tried kernel 2.6.18 with a recent glibc. I
got the desired results for pthread_mutex_unlock and
pthread_cond_broadcast, with a PI mutex.
But pthread_cond_signal and sem_post still wake up a thread in FIFO
order, as you can guess.
With the plist patch (applied by hand), I can get the desired
behavior. Thank you. But it seems the patch lacks reordering on
priority changes.
Are there any patches or future plans to address the remaining
wakeup-order issues?
<off_topic>
BTW, when I tried to create a PI mutex on a kernel without PI-futex
support, pthread_mutexattr_setprotocol(PTHREAD_PRIO_INHERIT) returned
0 and pthread_mutex_init() returned ENOTSUP. This is not the right
behavior according to the manual ...
</off_topic>
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-09-07 8:11 ` Atsushi Nemoto
@ 2006-09-07 8:32 ` Jakub Jelinek
2006-09-07 9:30 ` Atsushi Nemoto
0 siblings, 1 reply; 18+ messages in thread
From: Jakub Jelinek @ 2006-09-07 8:32 UTC (permalink / raw)
To: Atsushi Nemoto
Cc: sebastien.dugue, arjan, mingo, linux-kernel, pierre.peiffer,
Ulrich Drepper
On Thu, Sep 07, 2006 at 05:11:58PM +0900, Atsushi Nemoto wrote:
> Three months after, I have tried kernel 2.6.18 with recent glibc. I
> got desired results for pthread_mutex_unlock and
> pthread_cond_broadcast, with PI-mutex.
>
> But pthread_cond_signal and sem_post still wakeup a thread in FIFO
> order, as you can guess.
>
> With the plist patch (applied by hand), I can get desired behavior.
> Thank you. But It seems the patch lacks reordering on priority
> changes.
Yes, either something like the plist patch for FUTEX_WAKE etc. or, if that
proves to be too slow for the usual case (non-RT threads), FIFO wakeup
initially and conversion to plist wakeup whenever first waiter with realtime
priority is added, is still needed. That will cure e.g. non-PI
pthread_mutex_unlock and sem_post. For pthread_cond_{signal,broadcast} we
need further kernel changes, so that the condvar's internal lock can be
always a PI lock.
> <off_topic>
> BTW, If I tried to create a PI mutex on a kernel without PI futex
> support, pthread_mutexattr_setprotocol(PTHREAD_PRIO_INHERIT) returned
> 0 and pthread_mutex_init() returned ENOTSUP. This is not a right
> behavior according to the manual ...
> </off_topic>
Why?
POSIX doesn't forbid ENOTSUP in pthread_mutex_init to my knowledge.
Jakub
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NPTL mutex and the scheduling priority
2006-09-07 8:32 ` Jakub Jelinek
@ 2006-09-07 9:30 ` Atsushi Nemoto
2006-09-07 9:37 ` Andreas Schwab
0 siblings, 1 reply; 18+ messages in thread
From: Atsushi Nemoto @ 2006-09-07 9:30 UTC (permalink / raw)
To: jakub; +Cc: sebastien.dugue, arjan, mingo, linux-kernel, pierre.peiffer,
drepper
On Thu, 7 Sep 2006 04:32:44 -0400, Jakub Jelinek <jakub@redhat.com> wrote:
> > But pthread_cond_signal and sem_post still wake up a thread in FIFO
> > order, as you can guess.
> >
> > With the plist patch (applied by hand), I can get the desired behavior.
> > Thank you. But it seems the patch lacks reordering on priority
> > changes.
>
> Yes, something is still needed: either the plist patch for FUTEX_WAKE etc.,
> or, if that proves too slow for the usual case (non-RT threads), FIFO wakeup
> initially with conversion to plist wakeup whenever the first waiter with
> realtime priority is added. That will cure e.g. non-PI
> pthread_mutex_unlock and sem_post. For pthread_cond_{signal,broadcast} we
> need further kernel changes, so that the condvar's internal lock can be
> always a PI lock.
Thank you, I'll stay tuned.
> > <off_topic>
> > BTW, if I tried to create a PI mutex on a kernel without PI futex
> > support, pthread_mutexattr_setprotocol(PTHREAD_PRIO_INHERIT) returned
> > 0 and pthread_mutex_init() returned ENOTSUP. This is not the right
> > behavior according to the manual ...
> > </off_topic>
>
> Why?
> POSIX doesn't forbid ENOTSUP in pthread_mutex_init to my knowledge.
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutexattr_setprotocol.html
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_init.html
From the ERRORS section of pthread_mutexattr_setprotocol:
The pthread_mutexattr_setprotocol() function shall fail if:
[ENOTSUP]
The value specified by protocol is an unsupported value.
And ENOTSUP is not enumerated in the ERRORS section of pthread_mutex_init.
---
Atsushi Nemoto
* Re: NPTL mutex and the scheduling priority
2006-09-07 9:30 ` Atsushi Nemoto
@ 2006-09-07 9:37 ` Andreas Schwab
2006-09-07 9:42 ` Atsushi Nemoto
0 siblings, 1 reply; 18+ messages in thread
From: Andreas Schwab @ 2006-09-07 9:37 UTC (permalink / raw)
To: Atsushi Nemoto
Cc: jakub, sebastien.dugue, arjan, mingo, linux-kernel,
pierre.peiffer, drepper
Atsushi Nemoto <nemoto@toshiba-tops.co.jp> writes:
> And ENOTSUP is not enumerated in ERRORS section of pthread_mutex_init.
POSIX does not forbid additional error conditions, as long as the
described conditions are properly reported with the documented error
numbers.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: NPTL mutex and the scheduling priority
2006-09-07 9:37 ` Andreas Schwab
@ 2006-09-07 9:42 ` Atsushi Nemoto
0 siblings, 0 replies; 18+ messages in thread
From: Atsushi Nemoto @ 2006-09-07 9:42 UTC (permalink / raw)
To: schwab
Cc: jakub, sebastien.dugue, arjan, mingo, linux-kernel,
pierre.peiffer, drepper
On Thu, 07 Sep 2006 11:37:54 +0200, Andreas Schwab <schwab@suse.de> wrote:
> > And ENOTSUP is not enumerated in ERRORS section of pthread_mutex_init.
>
> POSIX does not forbid additional error conditions, as long as the
> described conditions are properly reported with the documented error
> numbers.
Oh, I see the point. Thank you.
---
Atsushi Nemoto
Thread overview: 18+ messages
2006-06-12 8:10 NPTL mutex and the scheduling priority Atsushi Nemoto
2006-06-12 12:23 ` Arjan van de Ven
2006-06-12 12:44 ` Jakub Jelinek
2006-06-12 15:24 ` Sébastien Dugué
2006-06-12 16:06 ` Atsushi Nemoto
2006-09-07 8:11 ` Atsushi Nemoto
2006-09-07 8:32 ` Jakub Jelinek
2006-09-07 9:30 ` Atsushi Nemoto
2006-09-07 9:37 ` Andreas Schwab
2006-09-07 9:42 ` Atsushi Nemoto
2006-06-13 8:39 ` Pierre Peiffer
2006-06-13 8:48 ` Jakub Jelinek
2006-06-13 12:04 ` Sébastien Dugué
2006-06-13 12:56 ` Jakub Jelinek
2006-06-14 13:19 ` Sébastien Dugué
2006-06-14 13:28 ` Jakub Jelinek
2006-06-14 13:38 ` Pierre Peiffer
2006-06-15 9:28 ` Pierre Peiffer