* Re: New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
2006-07-06 13:07 New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed Esben Nielsen
@ 2006-07-06 12:34 ` Thomas Gleixner
2006-07-06 14:11 ` Esben Nielsen
0 siblings, 1 reply; 6+ messages in thread
From: Thomas Gleixner @ 2006-07-06 12:34 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Ingo Molnar, linux-kernel, Steven Rostedt
On Thu, 2006-07-06 at 14:07 +0100, Esben Nielsen wrote:
> So this is a real bug.
True
> In the previous mail I posted a fix for that problem (and other problems).
I haven't had much time to look at the patch, but I doubt that we need
such a complex hack to achieve that. I will look at it later.
tglx
^ permalink raw reply [flat|nested] 6+ messages in thread
* New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
@ 2006-07-06 13:07 Esben Nielsen
2006-07-06 12:34 ` Thomas Gleixner
0 siblings, 1 reply; 6+ messages in thread
From: Esben Nielsen @ 2006-07-06 13:07 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Steven Rostedt, Thomas Gleixner
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2152 bytes --]
Hi,
I finally got a glibc with PI mutexes compiled and started to work on my
old PriorityInheritanceTest tool. This tool previously used the blocker
device in the kernel to test the in-kernel PI mechanism; now it uses
pthread_mutex (the blocker calling code is still there in an #ifdef).
This is the idea:
Some SCHED_OTHER tasks go into a critical section and busy-loop for
1 ms. The critical section can be protected by one or more mutexes.
One SCHED_OTHER task takes mutex 0, another takes mutex 1 and then
mutex 0, a third takes mutex 2, mutex 1 and then mutex 0, etc. Once
they have mutex 0 they spin for 1 ms and release the mutexes again.
This is repeated without interruption.
A RT task now tries to take all the mutexes. How long does it have to
wait? A long time ago we calculated that on an SMP machine the worst
case should be 2^{number of mutexes}-1 ms, and on UP it should just be
{number of mutexes} ms. On my UP machine it is as predicted.
Now I have added a timeout as well, i.e. the RT task calls
pthread_mutex_timedlock(). If I set the timeout to 0.5 ms, the worst
case should be 0.5 ms, right? It isn't on 2.6.17-rt7 on my UP machine.
It is as if the timeout doesn't have any effect.
I predicted this bug in an earlier mail
(http://marc.theaimsgroup.com/?l=linux-kernel&m=115192381727078&w=2).
It is basically because in the event of a timeout the RT task doesn't
get the CPU until the boosted non-RT tasks are finished. And when the
RT task doesn't get the CPU it can't deboost the non-RT tasks...
Man page for pthread_mutex_timedlock:
"As a consequence of the priority inheritance rules (for mutexes initialized
with the PRIO_INHERIT protocol), if a timed mutex wait is terminated because
its timeout expires, the priority of the owner of the mutex shall be
adjusted as necessary to reflect the fact that this thread is no longer among
the threads waiting for the mutex."
So this is a real bug.
In the previous mail I posted a fix for that problem (and other problems).
I have attached the PriorityInheritanceTest program. To see the bug try
./test --timeout 500000 --samples 500 --tasks 2
on a UP machine.
Esben
[-- Attachment #2: Type: APPLICATION/x-gtar, Size: 9503 bytes --]
* Re: New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
2006-07-06 14:11 ` Esben Nielsen
@ 2006-07-06 13:32 ` Ingo Molnar
2006-07-06 16:20 ` Esben Nielsen
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2006-07-06 13:32 UTC (permalink / raw)
To: Esben Nielsen; +Cc: Thomas Gleixner, linux-kernel, Steven Rostedt
* Esben Nielsen <nielsen.esben@googlemail.com> wrote:
> It can run within try_to_wake_up(). But then the whole lock chain
> is traversed in an atomic section. That gives unpredictable overall
> system latencies, since the locks can be in userspace. So it has to
> run in some task. That task has to be high enough priority to preempt
> the boosted tasks, but it can't be so high priority that it bothers
> any higher-priority threads than those involved in this. So it can't
> be, for instance, a general priority-99 task we just use for this. We
> thus need something running at a slightly higher priority than the
> priority to which the tasks are boosted, but not a full +1 priority.
> I.e. we need to run it at priority "+0.5".
we could just queue the task in front of the other task in the runqueue,
and mark that task for reschedule if it's running currently. (Doing this
is not without precedent: we do something similar in wake_up_new_task()
to implement child-runs-first logic.)
Ingo
* Re: New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
2006-07-06 12:34 ` Thomas Gleixner
@ 2006-07-06 14:11 ` Esben Nielsen
2006-07-06 13:32 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Esben Nielsen @ 2006-07-06 14:11 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Esben Nielsen, Ingo Molnar, linux-kernel, Steven Rostedt
On Thu, 6 Jul 2006, Thomas Gleixner wrote:
> On Thu, 2006-07-06 at 14:07 +0100, Esben Nielsen wrote:
>> So this is a real bug.
>
> True
>
>> In the previous mail I posted a fix for that problem (and other problems).
>
> I haven't had much time to look at the patch, but I doubt that we need
> such a complex hack to achieve that. I will look at it later.
The problem is that the deboosting code has to run somewhere.
It can run within try_to_wake_up(). But then the whole lock chain is
traversed in an atomic section. That gives unpredictable overall system
latencies, since the locks can be in userspace.
So it has to run in some task. That task has to be high enough priority
to preempt the boosted tasks, but it can't be so high priority that it
bothers any higher-priority threads than those involved in this. So it
can't be, for instance, a general priority-99 task we just use for this.
We thus need something running at a slightly higher priority than the
priority to which the tasks are boosted, but not a full +1 priority.
I.e. we need to run it at priority "+0.5".
I also think that other things, like high-resolution timers and other
code doing "scheduler plumbing work" in the kernel, could benefit from
a +0.5 priority.
I have thought about some improvements:
1) Make a general TSK_LIFO flag. That would remove some of the direct
references in sched.c to the rtmutex system. In effect it will be the
same, but more useful to other subsystems.
2) Double the number of in-kernel priorities, i.e. simply add a number
of "hidden" priorities in which this kind of "plumbing" work can be run:

   Kernel priority   User space
   0                 hidden
   1                 RT 99
   2                 hidden
   3                 RT 98
   ...
   199               0
   ...
This might turn out cleaner than a TSK_LIFO flag. There would be no need
to hack the core scheduler code, which can have strange side effects.
But to be honest I don't think the hacks I have done are that bad -
except that they refer directly to the rtmutex subsystem. Also, adding
priorities would slow down the system.
>
> tglx
>
>
Esben
* Re: New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
2006-07-06 13:32 ` Ingo Molnar
@ 2006-07-06 16:20 ` Esben Nielsen
2006-07-07 0:02 ` Esben Nielsen
0 siblings, 1 reply; 6+ messages in thread
From: Esben Nielsen @ 2006-07-06 16:20 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Esben Nielsen, Thomas Gleixner, linux-kernel, Steven Rostedt
On Thu, 6 Jul 2006, Ingo Molnar wrote:
>
> * Esben Nielsen <nielsen.esben@googlemail.com> wrote:
>
>> It can run within try_to_wake_up(). But then the whole lock chain
>> is traversed in an atomic section. That gives unpredictable overall
>> system latencies, since the locks can be in userspace. So it has to
>> run in some task. That task has to be high enough priority to preempt
>> the boosted tasks, but it can't be so high priority that it bothers
>> any higher-priority threads than those involved in this. So it can't
>> be, for instance, a general priority-99 task we just use for this. We
>> thus need something running at a slightly higher priority than the
>> priority to which the tasks are boosted, but not a full +1 priority.
>> I.e. we need to run it at priority "+0.5".
>
> we could just queue the task in front of the other task in the runqueue,
> and mark that task for reschedule if it's running currently. (Doing this
> is not without precedent: we do something similar in wake_up_new_task()
> to implement child-runs-first logic.)
>
I think that is more or less what my patch does...
Esben
> Ingo
>
* Re: New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed
2006-07-06 16:20 ` Esben Nielsen
@ 2006-07-07 0:02 ` Esben Nielsen
0 siblings, 0 replies; 6+ messages in thread
From: Esben Nielsen @ 2006-07-07 0:02 UTC (permalink / raw)
To: Esben Nielsen
Cc: Ingo Molnar, Esben Nielsen, Thomas Gleixner, linux-kernel,
Steven Rostedt
On Thu, 6 Jul 2006, Esben Nielsen wrote:
> On Thu, 6 Jul 2006, Ingo Molnar wrote:
>
>>
>> * Esben Nielsen <nielsen.esben@googlemail.com> wrote:
>>
>> > It can run within try_to_wake_up(). But then the whole lock chain
>> > is traversed in an atomic section. That gives unpredictable overall
>> > system latencies, since the locks can be in userspace. So it has to
>> > run in some task. That task has to be high enough priority to preempt
>> > the boosted tasks, but it can't be so high priority that it bothers
>> > any higher-priority threads than those involved in this. So it can't
>> > be, for instance, a general priority-99 task we just use for this. We
>> > thus need something running at a slightly higher priority than the
>> > priority to which the tasks are boosted, but not a full +1 priority.
>> > I.e. we need to run it at priority "+0.5".
>>
>> we could just queue the task in front of the other task in the runqueue,
>> and mark that task for reschedule if it's running currently. (Doing this
>> is not without precedent: we do something similar in wake_up_new_task()
>> to implement child-runs-first logic.)
>>
> I think that is more or less what my patch does...
>
I was a bit in a hurry when I sent that comment:
What my patch does is ensure that rtmutex boosters keep their priority
and are scheduled to the head of the runqueue at their given priority.
What is ugly is that the scheduler core code knows about the rtmutex
stuff directly. The previous mail was about how to generalise this so
that other subsystems with similar needs can use it too.
Esben
> Esben
>
>> Ingo
>>
>
end of thread, other threads:[~2006-07-06 23:02 UTC | newest]
Thread overview: 6+ messages
2006-07-06 13:07 New PriorityInheritanceTest - bug in 2.6.17-rt7 confirmed Esben Nielsen
2006-07-06 12:34 ` Thomas Gleixner
2006-07-06 14:11 ` Esben Nielsen
2006-07-06 13:32 ` Ingo Molnar
2006-07-06 16:20 ` Esben Nielsen
2006-07-07 0:02 ` Esben Nielsen