* [Xenomai-core] Prio-inversion on cleanup?
@ 2006-06-27 16:44 Jan Kiszka
2006-06-28 14:45 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-27 16:44 UTC
To: xenomai-core
Hi,
could someone give this scenario a try (requires my recent patch series)
and tell me if you are also seeing excessive latencies:
Start:
irqloop (+xeno_irqbench) -P 99 -t 0
latency -p 200 -P 50
Terminate:
irqloop
The termination seems to cause high latencies to the (then highest-prio)
periodic timer test. This does not happen with irqloop -t 1
(kernel-based task), and I can reduce the effect by invoking
pthread_setschedparam(SCHED_OTHER) for the irqloop test thread right
before termination. Also, running and terminating another latency
instance with prio 99 does not have this effect.
Jan
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-27 16:44 [Xenomai-core] Prio-inversion on cleanup? Jan Kiszka
@ 2006-06-28 14:45 ` Jan Kiszka
2006-06-29 9:24 ` Jan Kiszka
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-28 14:45 UTC
To: xenomai-core
Jan Kiszka wrote:
> Hi,
>
> could someone give this scenario a try (requires my recent patch series)
> and tell me if you are also seeing excessive latencies:
>
> Start:
> irqloop (+xeno_irqbench) -P 99 -t 0
> latency -p 200 -P 50
>
> Terminate:
> irqloop
>
> The termination seems to cause high latencies to the (then highest-prio)
> periodic timer test. This does not happen with irqloop -t 1
> (kernel-based task), and I can reduce the effect by invoking
> pthread_setschedparam(SCHED_OTHER) for the irqloop test thread right
> before termination. Also, running and terminating another latency
> instance with prio 99 does not have this effect.
>
I extended the tracer with support for the new service
ipipe_trace_pid(pid, priority);
and recorded the following on irqloop cleanup:
> : func -11679+ 2.661 sys_rtdm_close+0x8 (losyscall_event+0xa3)
> : func -11677+ 4.045 _rtdm_close+0xe (sys_rtdm_close+0x11)
> :| * func -11673+ 4.601 __ipipe_restore_pipeline_head+0x8 (_rtdm_close+0x8c)
> : func -11668+ 1.323 rt_irqbench_close+0x9 [xeno_irqbench] (_rtdm_close+0xd1)
> : func -11667+ 2.345 rt_irqbench_stop+0x9 [xeno_irqbench] (rt_irqbench_close+0x21 [xeno_irqbench])
> : func -11664+ 3.383 _rtdm_synch_flush+0xa (rt_irqbench_close+0x2e [xeno_irqbench])
> :| * func -11661+ 4.751 xnsynch_flush+0xe (_rtdm_synch_flush+0x3b)
> :| * func -11656+ 3.894 xnpod_resume_thread+0xe (xnsynch_flush+0x76)
> :| * func -11652+ 1.503 xnpod_schedule+0xe (_rtdm_synch_flush+0x45)
> :| * func -11651+ 2.932 ipipe_trigger_irq+0xc (xnpod_schedule+0x29)
> :| * func -11648+ 2.375 memcpy+0xe (ipipe_trigger_irq+0x4a)
> :| * func -11645+ 3.270 __ipipe_handle_irq+0xe (ipipe_trigger_irq+0x4f)
> :| * func -11642+ 3.436 __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :| * func -11639+ 2.345 __ipipe_restore_pipeline_head+0x8 (_rtdm_synch_flush+0x5e)
> :| func -11636+ 1.714 __ipipe_walk_pipeline+0xe (__ipipe_restore_pipeline_head+0x67)
> :| func -11635+ 1.624 ipipe_suspend_domain+0xb (__ipipe_walk_pipeline+0x46)
> :| func -11633+ 3.864 __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x47)
> :| * func -11629+ 1.864 xnpod_schedule_handler+0x8 (__ipipe_sync_stage+0x115)
> :| * func -11627+ 3.684 xnpod_schedule+0xe (xnpod_schedule_handler+0x17)
> :| * [ 925] -<?>- 0 -11624+ 6.496 xnpod_schedule+0x4f6 (xnpod_schedule_handler+0x17)
> :| * func -11617+ 7.157 __switch_to+0xe (xnpod_schedule+0x612)
> :| * [ 926] -<?>- 99 -11610+ 9.157 xnpod_schedule+0x6e6 (xnpod_suspend_thread+0xed)
> :| * func -11601! 63.413 __ipipe_restore_pipeline_head+0x8 (rtdm_event_timedwait+0xea)
> :| func -11537+ 2.586 __ipipe_handle_irq+0xe (common_interrupt+0x18)
> :| func -11535+ 1.353 __ipipe_ack_common_irq+0xa (__ipipe_handle_irq+0x80)
> :| func -11533+ 2.165 ipipe_test_and_stall_pipeline_from+0x8 (__ipipe_ack_common_irq+0x16)
> :| * func -11531+ 3.218 mask_and_ack_8259A+0xb (__ipipe_ack_common_irq+0x3f)
> :| func -11528+ 2.030 __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :| * func -11526+ 2.225 xnintr_clock_handler+0x8 (__ipipe_dispatch_wired+0x7d)
> :| * func -11524+ 2.030 xnintr_irq_handler+0xb (xnintr_clock_handler+0x17)
> :| * func -11522+ 2.345 xnpod_announce_tick+0x8 (xnintr_irq_handler+0x24)
> :| * func -11519+ 2.624 xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0xf)
> :| * func -11517+ 1.390 xnthread_periodic_handler+0x8 (xntimer_do_tick_aperiodic+0x7c)
> :| * func -11515+ 6.977 xnpod_resume_thread+0xe (xnthread_periodic_handler+0x1b)
> :| * func -11508+ 3.383 xnpod_schedule+0xe (xnintr_irq_handler+0x5f)
> :| func -11505! 261.413 __ipipe_walk_pipeline+0xe (__ipipe_handle_irq+0x179)
> : func -11244+ 6.150 __ipipe_syscall_root+0x9 (system_call+0x20)
> :| func -11237+ 2.030 __ipipe_handle_irq+0xe (common_interrupt+0x18)
> :| func -11235+ 1.458 __ipipe_ack_common_irq+0xa (__ipipe_handle_irq+0x80)
> :| func -11234+ 1.909 ipipe_test_and_stall_pipeline_from+0x8 (__ipipe_ack_common_irq+0x16)
> :| * func -11232+ 3.172 mask_and_ack_8259A+0xb (__ipipe_ack_common_irq+0x3f)
> :| func -11229+ 2.436 __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :| * func -11226+ 1.984 xnintr_clock_handler+0x8 (__ipipe_dispatch_wired+0x7d)
> :| * func -11224+ 2.248 xnintr_irq_handler+0xb (xnintr_clock_handler+0x17)
> :| * func -11222+ 2.248 xnpod_announce_tick+0x8 (xnintr_irq_handler+0x24)
> :| * func -11220+ 3.969 xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0xf)
> :| * func -11216+ 6.736 xnthread_periodic_handler+0x8 (xntimer_do_tick_aperiodic+0x7c)
> :| func -11209+ 2.421 __ipipe_walk_pipeline+0xe (__ipipe_handle_irq+0x179)
> : func -11207+ 1.939 __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x55)
> : func -11205+ 2.977 hisyscall_event+0xe (__ipipe_dispatch_event+0x5e)
> : func -11202+ 4.015 xnshadow_relax+0xe (hisyscall_event+0x1ed)
> : func -11198+ 3.218 schedule_linux_call+0xb (xnshadow_relax+0x40)
> :| * func -11195+ 3.984 __ipipe_restore_pipeline_head+0x8 (schedule_linux_call+0x5e)
> : func -11191+ 2.105 rthal_apc_schedule+0x8 (schedule_linux_call+0x68)
> : func -11189+ 5.263 __ipipe_schedule_irq+0xa (rthal_apc_schedule+0x31)
> :| * func -11183+ 6.586 xnpod_schedule_runnable+0xe (xnshadow_relax+0x87)
> :| * func -11177+ 3.473 xnpod_suspend_thread+0xe (xnshadow_relax+0xb2)
> :| * func -11173+ 3.819 xnpod_schedule+0xe (xnpod_suspend_thread+0xed)
> :| * [ 926] -<?>- 99 -11169+ 4.812 xnpod_schedule+0x4f6 (xnpod_suspend_thread+0xed)
> :| * func -11165+ 6.060 __switch_to+0xe (xnpod_schedule+0x612)
> :| * [ 925] -<?>- 99 -11159+ 3.714 xnpod_schedule+0x6e6 (xnpod_schedule_handler+0x17)
> :| func -11155+ 5.263 __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x47)
> : *func -11150+ 3.383 rthal_apc_handler+0x8 (__ipipe_sync_stage+0xf2)
> : *func -11146+ 6.338 lostage_handler+0xa (rthal_apc_handler+0x2b)
> : *func -11140+ 1.466 wake_up_process+0x8 (lostage_handler+0xac)
> : *func -11138+ 1.849 try_to_wake_up+0xe (wake_up_process+0x14)
> : *func -11137+ 3.518 __ipipe_test_and_stall_root+0x8 (try_to_wake_up+0x1a)
> : *func -11133+ 3.954 sched_clock+0xa (try_to_wake_up+0x74)
> : *func -11129+ 2.481 enqueue_task+0xa (try_to_wake_up+0xc7)
> : *func -11127+ 2.015 __ipipe_restore_root+0x8 (try_to_wake_up+0x106)
> : *func -11125+ 3.729 preempt_schedule+0xb (try_to_wake_up+0x13d)
> :| *func -11121+ 2.751 __ipipe_unstall_iret_root+0x8 (restore_raw+0x0)
> : func -11118+ 7.488 irq_exit+0x8 (__ipipe_sync_stage+0xff)
> :| * func -11111+ 3.278 cleanup_instance+0xa (_rtdm_close+0x134)
> :| * func -11107+ 3.503 __ipipe_restore_pipeline_head+0x8 (cleanup_instance+0x4a)
> : func -11104+ 1.248 kfree+0xe (cleanup_instance+0x6d)
> : func -11103+ 1.924 __ipipe_test_and_stall_root+0x8 (kfree+0x1a)
> : *func -11101+ 4.060 kfree_debugcheck+0xa (kfree+0x27)
> : *func -11097+ 1.849 check_irq_off+0x8 (kfree+0x4c)
> : *func -11095+ 2.180 __ipipe_test_root+0x8 (check_irq_off+0xd)
> : *func -11093+ 1.533 cache_free_debugcheck+0xe (kfree+0x59)
> : *func -11091+ 2.541 kfree_debugcheck+0xa (cache_free_debugcheck+0x28)
> : *func -11089+ 2.165 dbg_redzone1+0x8 (cache_free_debugcheck+0x9f)
> : *func -11086+ 2.473 dbg_redzone2+0x8 (cache_free_debugcheck+0xb0)
> : *func -11084+ 1.406 dbg_redzone1+0x8 (cache_free_debugcheck+0xf7)
> : *func -11082+ 2.105 dbg_redzone2+0x8 (cache_free_debugcheck+0x106)
> : *func -11080+ 3.924 dbg_userword+0x8 (cache_free_debugcheck+0x11b)
> : *func -11076! 56.819 poison_obj+0xd (cache_free_debugcheck+0x197)
> : *func -11020+ 1.496 __ipipe_restore_root+0x8 (kfree+0x93)
> : *func -11018+ 6.631 __ipipe_unstall_root+0x8 (__ipipe_restore_root+0x2b)
> : func -11012+ 2.060 __ipipe_stall_root+0x8 (syscall_exit+0x5)
> : *func -11009+ 2.458 schedule+0xe (work_resched+0x6)
> : *func -11007+ 2.699 profile_hit+0x9 (schedule+0x6c)
> : *func -11004+ 3.518 sched_clock+0xa (schedule+0x113)
> : *func -11001+ 7.353 __ipipe_stall_root+0x8 (schedule+0x191)
> : *func -10993+ 2.834 __ipipe_dispatch_event+0xe (schedule+0x42f)
> : *func -10991+ 5.075 schedule_event+0xb (__ipipe_dispatch_event+0x5e)
> :| *func -10986+ 5.413 __switch_to+0xe (xnpod_schedule+0x612)
> :| *[ 925] -<?>- 99 -10980+ 4.120 xnpod_schedule+0x6e6 (xnpod_suspend_thread+0xed)
> :| **func -10976+ 3.947 __ipipe_restore_pipeline_head+0x8 (xnshadow_relax+0xd0)
> : *func -10972+ 1.330 ipipe_reenter_root+0xe (xnshadow_relax+0xf1)
> : *func -10971+ 6.796 __ipipe_unstall_root+0x8 (ipipe_reenter_root+0x2e)
> : func -10964+ 4.240 losyscall_event+0xe (__ipipe_dispatch_event+0x5e)
> : func -10960+ 3.443 sys_open+0x8 (syscall_call+0x7)
Explanation of the recorded pid entries before and after __switch_to():
[<pid>] <task name> <priority>
Here 925 is the main thread of irqloop, 926 the irq user-mode pthread.
The pids are resolved on trace printing. And as the process terminated,
we do not see their names anymore.
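To make the entry format concrete, here is a minimal sketch of pulling the three fields out of a printed trace line. It is not part of the tracer: parse_pid_entry() and struct trace_pid_entry are names made up for this illustration, and it assumes the task name contains no blanks (which holds for the unresolved "-<?>-" placeholder seen above).

```c
#include <stdio.h>
#include <string.h>

/* One "[<pid>] <task name> <priority>" entry as printed in the trace. */
struct trace_pid_entry {
    int pid;
    char name[32];
    int prio;
};

/*
 * Find the first "[<pid>] <name> <prio>" group in a trace line.
 * Returns 0 on success, -1 if the line carries no such entry
 * (bracketed module names like "[xeno_irqbench]" are skipped).
 */
static int parse_pid_entry(const char *line, struct trace_pid_entry *out)
{
    const char *p = strchr(line, '[');

    while (p) {
        /* %d skips the padding blanks inside "[  925]". */
        if (sscanf(p, "[%d] %31s %d", &out->pid, out->name, &out->prio) == 3)
            return 0;
        p = strchr(p + 1, '[');
    }
    return -1;
}
```

Feeding it the line recorded right after __switch_to() would yield pid 926, name "-<?>-" and priority 99.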
Regarding the trace:
The pthread is blocked on the irqbench device ioctl. On hitting ^C,
close() is invoked from the main thread for that device. The pthread is
woken up and obviously relaxed on some linux syscall (after being
interrupted twice by the periodic timer event of a "latency -p 300 -P
50" instance). This passes the control over to the main thread while
keeping the pthread prio of 99. And this prio seems to survive for the
following 11 ms (full trace available on request).
Any ideas what's going on?
Jan
PS: ipipe-tracer and xenomai-core patch for pid/context tracing will
follow later.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-28 14:45 ` Jan Kiszka
@ 2006-06-29 9:24 ` Jan Kiszka
2006-06-29 10:14 ` Philippe Gerum
2006-06-29 12:27 ` Gilles Chanteperdrix
0 siblings, 2 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29 9:24 UTC
To: xenomai-core
Jan Kiszka wrote:
> ...
> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> close() is invoked from the main thread for that device. The pthread is
> woken up and obviously relaxed on some linux syscall (after being
> interrupted twice by the periodic timer event of a "latency -p 300 -P
> 50" instance). This passes the control over to the main thread while
> keeping the pthread prio of 99. And this prio seems to survive for the
> following 11 ms (full trace available on request).
>
> Any ideas what's going on?
>
Ok, I think I finally understood the issue. It seems to lie deep in the
POSIX user-space lib, specifically its use of standard pthread services.
Let me first clarify my scenario:
A high-prio pthread of known and (theoretically) bounded workload shall
be started and stopped while a low-prio thread is already running. The
low-prio thread shall only be impacted by the real workload of the
high-prio one, not by its creation/destruction - at least not
significantly. To achieve this with the POSIX skin (actually this
applies to preempt-rt in theory as well), I have to create the thread
under SCHED_OTHER, raise its priority right before entering the
workload, and lower it again before leaving the thread.
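In code, that pattern would look roughly as follows. This is only a sketch using standard pthread calls: worker(), set_self_sched(), bounded_workload(), struct worker_result and the priority value of 99 are assumptions for illustration, not taken from irqloop.

```c
#include <pthread.h>
#include <sched.h>

#define HIGH_PRIO 99 /* assumed workload priority */

struct worker_result {
    int raise_err;    /* 0, or an errno value such as EPERM */
    int lower_err;    /* 0 if dropping back to SCHED_OTHER worked */
    int final_policy; /* policy reported after lowering, -1 on error */
};

/* Switch the calling thread to the given policy/priority. */
static int set_self_sched(int policy, int prio)
{
    struct sched_param param = { .sched_priority = prio };

    return pthread_setschedparam(pthread_self(), policy, &param);
}

/* Placeholder for the bounded high-priority job. */
static void bounded_workload(void)
{
}

/*
 * Thread body: the thread is created under SCHED_OTHER, raises itself
 * right before the workload and lowers itself again before leaving,
 * so that creation and teardown happen at low priority.
 */
static void *worker(void *arg)
{
    struct worker_result *res = arg;
    struct sched_param param;

    /* May fail with EPERM when run without real-time privileges. */
    res->raise_err = set_self_sched(SCHED_FIFO, HIGH_PRIO);
    bounded_workload();
    res->lower_err = set_self_sched(SCHED_OTHER, 0);

    if (pthread_getschedparam(pthread_self(), &res->final_policy, &param) != 0)
        res->final_policy = -1;
    return NULL;
}
```

The function is meant to be handed to pthread_create() with default attributes, e.g. pthread_create(&tid, NULL, worker, &res), followed by pthread_join().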
But, unfortunately, __wrap_pthread_setschedparam() has to call some
__real_pthread functions. One of them is
__real_pthread_setschedparam, and this one issues a linux syscall for
obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
is still issued under the original priority. And here we get bitten by
the prio-inheritance feature of the nucleus which, in my case, lets
significant parts of standard Linux execute under high priority,
delaying my other real-time threads.
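For readers not familiar with the __wrap/__real naming: it is the GNU ld --wrap convention, where references to a symbol are redirected to __wrap_<symbol> and __real_<symbol> resolves to the original. The following is a simplified sketch of such a wrapper, not the actual skin code. rt_core_setschedparam() and rt_shadow are hypothetical stand-ins for the real-time core's bookkeeping, USE_LD_WRAP is a made-up macro so the file also builds without the linker option, and the order of the two steps is a guess chosen to match the description above.

```c
#include <pthread.h>
#include <sched.h>

#ifdef USE_LD_WRAP
/* Provided by the linker when built with -Wl,--wrap=pthread_setschedparam. */
int __real_pthread_setschedparam(pthread_t thread, int policy,
                                 const struct sched_param *param);
#else
/* Standalone build: call the libc function directly. */
#define __real_pthread_setschedparam pthread_setschedparam
#endif

/* Hypothetical: what the real-time core believes about the thread. */
static struct {
    int policy;
    int prio;
} rt_shadow;

/* Hypothetical stand-in for the skin's own scheduling update. */
static int rt_core_setschedparam(int policy, const struct sched_param *param)
{
    rt_shadow.policy = policy;
    rt_shadow.prio = param->sched_priority;
    return 0;
}

int __wrap_pthread_setschedparam(pthread_t thread, int policy,
                                 const struct sched_param *param)
{
    /*
     * Regular Linux syscall, needed so that the pthread library keeps
     * track of the change. In the scenario above this is the part that
     * still runs under the thread's old (high) priority.
     */
    int err = __real_pthread_setschedparam(thread, policy, param);

    if (err)
        return err;

    /* Only afterwards does the real-time side see the new priority. */
    return rt_core_setschedparam(policy, param);
}
```

Built with -DUSE_LD_WRAP and -Wl,--wrap=pthread_setschedparam, every application call to pthread_setschedparam() would land in the wrapper first.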
Now I wonder how to resolve this, how to make pthread_setschedparam (a
rather central RT-service) really real-time safe? I would say we need
something like a lazy schedparam propagation to Linux which only takes
place when the thread enters secondary mode intentionally or no other RT
thread is ready. But I do not have a design for this at hand. Nasty.
[My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
would still be to switch this prio-inheritance off for the root thread.
But this was nack'ed by Philippe several times before... ;)]
Side note: the native skin does not seem to suffer from this effect as
it only tracks the current prio at Xenomai level.
Jan
PS: Gilles, what about marking those services of the POSIX skin in the doc
that may issue a Linux syscall (and under which conditions)?
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 9:24 ` Jan Kiszka
@ 2006-06-29 10:14 ` Philippe Gerum
2006-06-29 10:34 ` Jan Kiszka
2006-06-29 12:27 ` Gilles Chanteperdrix
1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 10:14 UTC
To: Jan Kiszka; +Cc: xenomai-core
On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > ...
> > The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> > close() is invoked from the main thread for that device. The pthread is
> > woken up and obviously relaxed on some linux syscall (after being
> > interrupted twice by the periodic timer event of a "latency -p 300 -P
> > 50" instance). This passes the control over to the main thread while
> > keeping the pthread prio of 99. And this prio seems to survive for the
> > following 11 ms (full trace available on request).
> >
> > Any ideas what's going on?
> >
>
> Ok, I think I finally understood the issue. It seems to lie deep in the
> POSIX user-space lib, specifically its use of standard pthread services.
> Let me first clarify my scenario:
>
> A high-prio pthread of known and (theoretically) bounded workload shall
> be started and stopped while a low-prio thread is already running. The
> low-prio thread shall only be impacted by the real workload of the
> high-prio one, not by its creation/destruction - at least not
> significantly. To achieve this with the POSIX skin (actually this
> applies to preempt-rt in theory as well), I have to create the thread
> under SCHED_OTHER, raise its priority right before entering the
> workload, and lower it again before leaving the thread.
>
> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> real_pthread functions to be called. One of them is
> __real_pthread_setschedparam, and this one issues a linux syscall for
> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> is still issued under the original priority. And here we get bitten by
> the prio-inheritance feature of the nucleus which, in my case, lets
> significant parts of standard Linux execute under high priority,
> delaying my other real-time threads.
>
> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> rather central RT-service) really real-time safe? I would say we need
> something like a lazy schedparam propagation to Linux which only takes
> place when the thread enters secondary mode intentionally or no other RT
> thread is ready. But I do not have a design for this at hand. Nasty.
>
> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> would still be to switch this prio-inheritance off for the root thread.
> But this was nack'ed by Philippe several times before... ;)]
>
I nacked the proposal to _always_ switch it off. Some applications
deeply need this.
> Side note: the native skin does not seem to suffer from this effect as
> it only tracks the current prio at Xenomai level.
>
Switching off priority adjustment for the root thread before moving a
SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
need to add a per-thread status bit to check whether we should run
xnpod_renice_root() or not for any given thread, and switch it on/off
from __wrap_pthread_setschedparam.
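As a toy model of that idea (plain C, no nucleus code): a per-thread status bit decides whether relaxing a thread boosts the root thread, and the bit is set before the thread is moved to SCHED_OTHER. All names below are hypothetical. model_renice_root() merely stands in for xnpod_renice_root(), whose real signature is not shown in this thread.

```c
/* Hypothetical per-thread status bit: do not boost the root thread. */
#define MODEL_NO_ROOT_BOOST 0x1u

/* Hypothetical priority of the root (Linux) thread when not boosted. */
#define MODEL_ROOT_IDLE_PRIO (-1)

struct model_thread {
    int prio;
    unsigned int status;
};

/* Current priority of the root thread in this model. */
static int model_root_prio = MODEL_ROOT_IDLE_PRIO;

/* Stand-in for xnpod_renice_root(). */
static void model_renice_root(int prio)
{
    model_root_prio = prio;
}

/* A thread enters Linux (secondary mode): boost root unless told not to. */
static void model_relax(const struct model_thread *t)
{
    if (!(t->status & MODEL_NO_ROOT_BOOST))
        model_renice_root(t->prio);
}

/*
 * What the setschedparam wrapper would do when lowering to SCHED_OTHER:
 * switch the boost off first, then issue the Linux call (modelled by
 * model_relax()), then record the new priority.
 */
static void model_demote_to_sched_other(struct model_thread *t)
{
    t->status |= MODEL_NO_ROOT_BOOST;
    model_relax(t);
    t->prio = 0;
}
```

With the bit clear, relaxing a prio-99 thread raises the root priority to 99. After model_demote_to_sched_other() the root priority is left untouched.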
--
Philippe.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 10:14 ` Philippe Gerum
@ 2006-06-29 10:34 ` Jan Kiszka
2006-06-29 10:48 ` Philippe Gerum
0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29 10:34 UTC
To: rpm; +Cc: xenomai-core
Philippe Gerum wrote:
> On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> ...
>>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
>>> close() is invoked from the main thread for that device. The pthread is
>>> woken up and obviously relaxed on some linux syscall (after being
>>> interrupted twice by the periodic timer event of a "latency -p 300 -P
>>> 50" instance). This passes the control over to the main thread while
>>> keeping the pthread prio of 99. And this prio seems to survive for the
>>> following 11 ms (full trace available on request).
>>>
>>> Any ideas what's going on?
>>>
>> Ok, I think I finally understood the issue. It seems to lie deep in the
>> POSIX user-space lib, specifically its use of standard pthread services.
>> Let me first clarify my scenario:
>>
>> A high-prio pthread of known and (theoretically) bounded workload shall
>> be started and stopped while a low-prio thread is already running. The
>> low-prio thread shall only be impacted by the real workload of the
>> high-prio one, not by its creation/destruction - at least not
>> significantly. To achieve this with the POSIX skin (actually this
>> applies to preempt-rt in theory as well), I have to create the thread
>> under SCHED_OTHER, raise its priority right before entering the
>> workload, and lower it again before leaving the thread.
>>
>> But, unfortunately, __wrap_pthread_setschedparam() depends on some
>> real_pthread functions to be called. One of them is
>> __real_pthread_setschedparam, and this one issues a linux syscall for
>> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
>> is still issued under the original priority. And here we get bitten by
>> the prio-inheritance feature of the nucleus which, in my case, lets
>> significant parts of standard Linux execute under high priority,
>> delaying my other real-time threads.
>>
>> Now I wonder how to resolve this, how to make pthread_setschedparam (a
>> rather central RT-service) really real-time safe? I would say we need
>> something like a lazy schedparam propagation to Linux which only takes
>> place when the thread enters secondary mode intentionally or no other RT
>> thread is ready. But I do not have a design for this at hand. Nasty.
>>
>> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
>> would still be to switch this prio-inheritance off for the root thread.
>> But this was nack'ed by Philippe several times before... ;)]
>>
>
> I nacked the proposal to _always_ switch it off. Some applications
> deeply need this.
I seem to remember asking for a CONFIG switch here. Some applications
actually benefit while others (I even think most) do not need it or even
easily screw themselves up during init/cleanup. You know, my old
concerns. :)
>
>> Side note: the native skin does not seem to suffer from this effect as
>> it only tracks the current prio at Xenomai level.
>>
>
> Switching off priority adjustment for the root thread before moving a
> SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> need to add a per-thread status bit to check whether we should run
> xnpod_renice_root() or not for any given thread, and switch it on/off
> from __wrap_pthread_setschedparam.
>
This doesn't sound bad and would probably help low-prio threads also in
some other scenarios.
Nevertheless, a syscall-less pthread_setschedparam would still be a good
idea as well, this time having the caller in mind who wishes to change
its priority without entering Linux.
Jan
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 10:34 ` Jan Kiszka
@ 2006-06-29 10:48 ` Philippe Gerum
2006-06-29 11:12 ` Philippe Gerum
2006-06-29 13:24 ` Philippe Gerum
0 siblings, 2 replies; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 10:48 UTC
To: Jan Kiszka; +Cc: xenomai-core
On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> ...
> >>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> >>> close() is invoked from the main thread for that device. The pthread is
> >>> woken up and obviously relaxed on some linux syscall (after being
> >>> interrupted twice by the periodic timer event of a "latency -p 300 -P
> >>> 50" instance). This passes the control over to the main thread while
> >>> keeping the pthread prio of 99. And this prio seems to survive for the
> >>> following 11 ms (full trace available on request).
> >>>
> >>> Any ideas what's going on?
> >>>
> >> Ok, I think I finally understood the issue. It seems to lie deep in the
> >> POSIX user-space lib, specifically its use of standard pthread services.
> >> Let me first clarify my scenario:
> >>
> >> A high-prio pthread of known and (theoretically) bounded workload shall
> >> be started and stopped while a low-prio thread is already running. The
> >> low-prio thread shall only be impacted by the real workload of the
> >> high-prio one, not by its creation/destruction - at least not
> >> significantly. To achieve this with the POSIX skin (actually this
> >> applies to preempt-rt in theory as well), I have to create the thread
> >> under SCHED_OTHER, raise its priority right before entering the
> >> workload, and lower it again before leaving the thread.
> >>
> >> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> >> real_pthread functions to be called. One of them is
> >> __real_pthread_setschedparam, and this one issues a linux syscall for
> >> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> >> is still issued under the original priority. And here we get bitten by
> >> the prio-inheritance feature of the nucleus which, in my case, lets
> >> significant parts of standard Linux execute under high priority,
> >> delaying my other real-time threads.
> >>
> >> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> >> rather central RT-service) really real-time safe? I would say we need
> >> something like a lazy schedparam propagation to Linux which only takes
> >> place when the thread enters secondary mode intentionally or no other RT
> >> thread is ready. But I do not have a design for this at hand. Nasty.
> >>
> >> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> >> would still be to switch this prio-inheritance off for the root thread.
> >> But this was nack'ed by Philippe several times before... ;)]
> >>
> >
> > I nacked the proposal to _always_ switch it off. Some applications
> > deeply need this.
>
> I seem to remember asking for a CONFIG switch here. Some applications
> actually benefit while others (I even think most) do not need it or even
> easily screw themselves up during init/cleanup. You know, my old
> concerns. :)
>
A dynamic switch is better there. You may want this behaviour to be
settable on a thread-by-thread basis.
> >
> >> Side note: the native skin does not seem to suffer from this effect as
> >> it only tracks the current prio at Xenomai level.
> >>
> >
> > Switching off priority adjustment for the root thread before moving a
> > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > need to add a per-thread status bit to check whether we should run
> > xnpod_renice_root() or not for any given thread, and switch it on/off
> > from __wrap_pthread_setschedparam.
> >
>
> This doesn't sound bad and would probably help low-prio threads also in
> some other scenarios.
>
I'm currently implementing that at nucleus level.
> Nevertheless, a syscall-less pthread_setschedparam would still be a good
> idea as well, this time having the caller in mind who wishes to change
> its priority without entering Linux.
Reading the comment Gilles put there, it's likely not possible to have a
syscall-less implementation on top of the NPTL. We need to give a chance
to the NPTL to track the priority update, otherwise,
pthread_getschedparam() is going to break.
>
> Jan
>
--
Philippe.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 10:48 ` Philippe Gerum
@ 2006-06-29 11:12 ` Philippe Gerum
2006-06-29 11:20 ` Jan Kiszka
2006-06-29 13:24 ` Philippe Gerum
1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 11:12 UTC
To: Jan Kiszka; +Cc: xenomai-core
On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
> On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
> > Philippe Gerum wrote:
> > > On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> > >> Jan Kiszka wrote:
> > >>> ...
> > >>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> > >>> close() is invoked from the main thread for that device. The pthread is
> > >>> woken up and obviously relaxed on some linux syscall (after being
> > >>> interrupted twice by the periodic timer event of a "latency -p 300 -P
> > >>> 50" instance). This passes the control over to the main thread while
> > >>> keeping the pthread prio of 99. And this prio seems to survive for the
> > >>> following 11 ms (full trace available on request).
> > >>>
> > >>> Any ideas what's going on?
> > >>>
> > >> Ok, I think I finally understood the issue. It seems to lie deep in the
> > >> POSIX user-space lib, specifically its use of standard pthread services.
> > >> Let me first clarify my scenario:
> > >>
> > >> A high-prio pthread of known and (theoretically) bounded workload shall
> > >> be started and stopped while a low-prio thread is already running. The
> > >> low-prio thread shall only be impacted by the real workload of the
> > >> high-prio one, not by its creation/destruction - at least not
> > >> significantly. To achieve this with the POSIX skin (actually this
> > >> applies to preempt-rt in theory as well), I have to create the thread
> > >> under SCHED_OTHER, raise its priority right before entering the
> > >> workload, and lower it again before leaving the thread.
> > >>
> > >> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> > >> real_pthread functions to be called. One of them is
> > >> __real_pthread_setschedparam, and this one issues a linux syscall for
> > >> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> > >> is still issued under the original priority. And here we get bitten by
> > >> the prio-inheritance feature of the nucleus which, in my case, lets
> > >> significant parts of standard Linux execute under high priority,
> > >> delaying my other real-time threads.
> > >>
> > >> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> > >> rather central RT-service) really real-time safe? I would say we need
> > >> something like a lazy schedparam propagation to Linux which only takes
> > >> place when the thread enters secondary mode intentionally or no other RT
> > >> thread is ready. But I do not have a design for this at hand. Nasty.
> > >>
> > >> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> > >> would still be to switch this prio-inheritance off for the root thread.
> > >> But this was nack'ed by Philippe several times before... ;)]
> > >>
> > >
> > > I nacked the proposal to _always_ switch it off. Some applications
> > > deeply need this.
> >
> > I seem to remember asking for a CONFIG switch here. Some applications
> > actually benefit while others (I even think most) do not need it or even
> > easily screw themselves up during init/cleanup. You know, my old
> > concerns. :)
> >
>
> A dynamic switch is better there. You may want this behaviour to be
> settable on a thread-by-thread basis.
>
Actually, we could have both, dynamic and static switches, just like the
interrupt shield.
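A rough sketch of how a static and a dynamic switch could combine, assuming a build-time default that individual threads may override at run time. MODEL_ROOT_BOOST_DEFAULT, enum boost_mode and root_boost_enabled() are invented names for illustration, not existing configuration options.

```c
#include <stdbool.h>

/* Static switch: build-time default (hypothetical option name). */
#ifndef MODEL_ROOT_BOOST_DEFAULT
#define MODEL_ROOT_BOOST_DEFAULT 1
#endif

/* Dynamic switch: per-thread setting, changeable at run time. */
enum boost_mode {
    BOOST_DEFAULT, /* follow the build-time default */
    BOOST_ON,      /* always boost the root thread for this thread */
    BOOST_OFF      /* never boost the root thread for this thread */
};

struct model_thread_cfg {
    enum boost_mode mode;
};

/* Decide whether this thread should boost the root thread. */
static bool root_boost_enabled(const struct model_thread_cfg *t)
{
    switch (t->mode) {
    case BOOST_ON:
        return true;
    case BOOST_OFF:
        return false;
    default:
        return MODEL_ROOT_BOOST_DEFAULT != 0;
    }
}
```

A thread that never touches its setting follows the build-time default, while one that sets BOOST_OFF before init/cleanup work avoids dragging Linux up to its priority.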
> > >
> > >> Side note: the native skin does not seem to suffer from this effect as
> > >> it only tracks the current prio at Xenomai level.
> > >>
> > >
> > > Switching off priority adjustment for the root thread before moving a
> > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > > need to add a per-thread status bit to check whether we should run
> > > xnpod_renice_root() or not for any given thread, and switch it on/off
> > > from __wrap_pthread_setschedparam.
> > >
> >
> > This doesn't sound bad and would probably help low-prio threads also in
> > some other scenarios.
> >
>
> I'm currently implementing that at nucleus level.
>
> > Nevertheless, a syscall-less pthread_setschedparam would still be a good
> > idea as well, this time having the caller in mind who wishes to change
> > its priority without entering Linux.
>
> Reading the comment Gilles put there, it's likely not possible to have a
> syscall-less implementation on top of the NPTL. We need to give a chance
> to the NPTL to track the priority update, otherwise,
> pthread_getschedparam() is going to break.
>
> >
> > Jan
> >
--
Philippe.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 11:12 ` Philippe Gerum
@ 2006-06-29 11:20 ` Jan Kiszka
0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29 11:20 UTC
To: rpm; +Cc: xenomai-core
Philippe Gerum wrote:
> On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
>> On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> ...
>>>>>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
>>>>>> close() is invoked from the main thread for that device. The pthread is
>>>>>> woken up and obviously relaxed on some linux syscall (after being
>>>>>> interrupted twice by the periodic timer event of a "latency -p 300 -P
>>>>>> 50" instance). This passes the control over to the main thread while
>>>>>> keeping the pthread prio of 99. And this prio seems to survive for the
>>>>>> following 11 ms (full trace available on request).
>>>>>>
>>>>>> Any ideas what's going on?
>>>>>>
>>>>> Ok, I think I finally understood the issue. It seems to lie deep in the
>>>>> POSIX user-space lib, specifically its use of standard pthread services.
>>>>> Let me first clarify my scenario:
>>>>>
>>>>> A high-prio pthread of known and (theoretically) bounded workload shall
>>>>> be started and stopped while a low-prio thread is already running. The
>>>>> low-prio thread shall only be impacted by the real workload of the
>>>>> high-prio one, not by its creation/destruction - at least not
>>>>> significantly. To achieve this with the POSIX skin (actually this
>>>>> applies to preempt-rt in theory as well), I have to create the thread
>>>>> under SCHED_OTHER, raise its priority right before entering the
>>>>> workload, and lower it again before leaving the thread.
>>>>>
>>>>> But, unfortunately, __wrap_pthread_setschedparam() depends on some
>>>>> real_pthread functions to be called. One of them is
>>>>> __real_pthread_setschedparam, and this one issues a linux syscall for
>>>>> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
>>>>> is still issued under the original priority. And here we get bitten by
>>>>> the prio-inheritance feature of the nucleus which, in my case, lets
>>>>> significant parts of standard Linux execute under high priority,
>>>>> delaying my other real-time threads.
>>>>>
>>>>> Now I wonder how to resolve this, how to make pthread_setschedparam (a
>>>>> rather central RT-service) really real-time safe? I would say we need
>>>>> something like a lazy schedparam propagation to Linux which only takes
>>>>> place when the thread enters secondary mode intentionally or no other RT
>>>>> thread is ready. But I do not have a design for this at hand. Nasty.
>>>>>
>>>>> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
>>>>> would still be to switch this prio-inheritance off for the root thread.
>>>>> But this was nack'ed by Philippe several times before... ;)]
>>>>>
>>>> I nacked the proposal to _always_ switch it off. Some applications
>>>> deeply need this.
>>> I seem to remember asking for a CONFIG switch here. Some applications
>>> actually benefit while others (I even think most) do not need it or even
>>> easily screw themselves up during init/cleanup. You know, my old
>>> concerns. :)
>>>
>> A dynamic switch is better there. You may want this behaviour to be
>> settable on a thread-by-thread basis.
>>
>
> Actually, we could have both, dynamic and static switches, just like the
> interrupt shield.
>
Hurray! (Otherwise, I would have kicked off a discussion about the
default setting ;-))
>>>>> Side note: the native skin does not seem to suffer from this effect as
>>>>> it only tracks the current prio at Xenomai level.
>>>>>
>>>> Switching off priority adjustment for the root thread before moving a
>>>> SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
>>>> need to add a per-thread status bit to check whether we should run
>>>> xnpod_renice_root() or not for any given thread, and switch it on/off
>>>> from __wrap_pthread_setschedparam.
>>>>
>>> This doesn't sound bad and would probably help low-prio threads also in
>>> some other scenarios.
>>>
>> I'm currently implementing that at nucleus level.
>>
>>> Nevertheless, a syscall-less pthread_setschedparam would still be a good
>>> idea as well, this time having the caller in mind who wishes to change
>>> its priority without entering Linux.
>> Reading the comment Gilles put there, it's likely not possible to have a
>> syscall-less implementation on top of the NPTL. We need to give a chance
>> to the NPTL to track the priority update, otherwise,
>> pthread_getschedparam() is going to break.
>>
Yep, saw that. I would start thinking about what we gain by wrapping
that service as well (e.g. by mirroring the state in a per-__thread
variable). Would likely work as long as the thread stays in primary
mode, but we still need to propagate the setting when entering secondary
mode. Maybe some soft-IRQ to Linux can help here. It could apply the
scheduler change on Linux re-entry.
Well, I'm probably overlooking tons of pitfalls, or all this becomes
terribly complicated.
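To make the mirroring idea a bit more concrete, here is a minimal sketch. It assumes a per-thread cache that is marked dirty on every change and flushed to Linux later. The names sched_mirror, mirror_set, mirror_get and mirror_flush are hypothetical, and the flush is an ordinary function here rather than a hook on the switch to secondary mode.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical per-thread mirror of the scheduling parameters. */
struct sched_mirror {
    int policy;
    int priority;
    bool dirty;   /* true while Linux/NPTL has not seen the latest change */
};

static __thread struct sched_mirror mirror = { SCHED_OTHER, 0, false };

/* Record a change without issuing any Linux syscall. */
static void mirror_set(int policy, int priority)
{
    mirror.policy = policy;
    mirror.priority = priority;
    mirror.dirty = true;
}

/* Read back the mirrored parameters; again no syscall is involved. */
static void mirror_get(int *policy, int *priority)
{
    assert(policy != NULL && priority != NULL);
    *policy = mirror.policy;
    *priority = mirror.priority;
}

/*
 * Propagate a pending change to Linux. In the scheme discussed here this
 * would run when the thread enters secondary mode; in this sketch the
 * caller has to invoke it explicitly. Returns 0 on success or when nothing
 * is pending, otherwise the pthread_setschedparam() error code.
 */
static int mirror_flush(void)
{
    struct sched_param param = { .sched_priority = mirror.priority };
    int err;

    if (!mirror.dirty)
        return 0;

    err = pthread_setschedparam(pthread_self(), mirror.policy, &param);
    if (err == 0)
        mirror.dirty = false;
    return err;
}
```

A thread would call mirror_set() around its workload and mirror_flush() at the point where it knowingly drops to Linux. Whether the flush can be triggered automatically on Linux re-entry is exactly the open question.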
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 9:24 ` Jan Kiszka
2006-06-29 10:14 ` Philippe Gerum
@ 2006-06-29 12:27 ` Gilles Chanteperdrix
1 sibling, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-06-29 12:27 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> (..)
> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> rather central RT-service) really real-time safe? I would say we need
> something like a lazy schedparam propagation to Linux which only takes
> place when the thread enters secondary mode intentionally or no other RT
> thread is ready. But I do not have a design for this at hand. Nasty.
A simple solution is to wrap pthread_getschedparam as well. It would
fall back to __real_pthread_getschedparam when the Xenomai POSIX skin
syscall returns -ESRCH. This would allow removing the offending call to
__real_pthread_setschedparam in __wrap_pthread_setschedparam.
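Roughly, the fallback could look like the sketch below. This is only an illustration: skin_getschedparam is a hypothetical stand-in for the skin syscall (here it always reports -ESRCH), wrapped_getschedparam stands for the wrapper, and the plain pthread_getschedparam plays the role of __real_pthread_getschedparam so that the example builds without linker wrapping.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical stand-in for the POSIX skin syscall. A real skin would look
 * the thread up in its own tables; this stub knows no thread at all and
 * therefore always reports -ESRCH.
 */
static int skin_getschedparam(pthread_t thread, int *policy,
                              struct sched_param *param)
{
    (void)thread;
    (void)policy;
    (void)param;
    return -ESRCH;
}

/*
 * Sketch of the wrapper: ask the skin first, and only fall back to the
 * regular libc service when the skin does not know the thread.
 * Returns 0 or a positive errno value, like pthread_getschedparam().
 */
static int wrapped_getschedparam(pthread_t thread, int *policy,
                                 struct sched_param *param)
{
    int err;

    assert(policy != NULL && param != NULL);

    err = -skin_getschedparam(thread, policy, param);
    if (err == ESRCH)
        /* Plays the role of __real_pthread_getschedparam(). */
        err = pthread_getschedparam(thread, policy, param);
    return err;
}
```

With the read side answered by the skin for shadow threads, the set side no longer has to keep the NPTL informed, which is what makes dropping the Linux syscall possible.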
>
> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> would still be to switch this prio-inheritance off for the root thread.
> But this was nack'ed by Philippe several times before... ;)]
>
> Side note: the native skin does not seem to suffer from this effect as
> it only tracks the current prio at Xenomai level.
>
> Jan
>
>
> PS: Gilles, what about marking those services of the POSIX in the doc
> that may issue a linux syscall (and under which conditions)?
Actually, syscalls that may be called only from a particular context or
that cause migration of their caller have a paragraph entitled "Valid
contexts". If pthread_setschedparam lacks this paragraph, it is probably
because its implementation changed without its documentation being updated.
--
Gilles Chanteperdrix.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 10:48 ` Philippe Gerum
2006-06-29 11:12 ` Philippe Gerum
@ 2006-06-29 13:24 ` Philippe Gerum
2006-06-29 16:03 ` Gilles Chanteperdrix
1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 13:24 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
> > > Switching off priority adjustment for the root thread before moving a
> > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > > need to add a per-thread status bit to check whether we should run
> > > xnpod_renice_root() or not for any given thread, and switch it on/off
> > > from __wrap_pthread_setschedparam.
> > >
> >
> > This doesn't sound bad and would probably help low-prio threads also in
> > some other scenarios.
> >
>
> I'm currently implementing that at nucleus level.
The priority coupling switch is in place now. The static config one is
called CONFIG_XENO_OPT_RPIDISABLE, and a dynamic flag has been added to
the xnthread status mask, namely XNRPIOFF (Root [thread] PI off). I've
only added the required support to control priority coupling between
the Xenomai and Linux schedulers, but refrained from choosing any
policy regarding how we are going to use it in the POSIX skin to solve
the pthread_setschedparam issue. I guess that more brain cycles are needed
there, and I'm cowardly leaving this to Gilles.
--
Philippe.
* Re: [Xenomai-core] Prio-inversion on cleanup?
2006-06-29 13:24 ` Philippe Gerum
@ 2006-06-29 16:03 ` Gilles Chanteperdrix
0 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-06-29 16:03 UTC (permalink / raw)
To: rpm; +Cc: xenomai
Philippe Gerum wrote:
> On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
> > > > Switching off priority adjustment for the root thread before moving a
> > > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > > > need to add a per-thread status bit to check whether we should run
> > > > xnpod_renice_root() or not for any given thread, and switch it on/off
> > > > from __wrap_pthread_setschedparam.
> > > >
> > >
> > > This doesn't sound bad and would probably help low-prio threads also in
> > > some other scenarios.
> > >
> >
> > I'm currently implementing that at nucleus level.
>
> The priority coupling switch is in place now. The static config one is
> called CONFIG_XENO_OPT_RPIDISABLE, and a dynamic flag has been added to
> the xnthread status mask, namely XNRPIOFF (Root [thread] PI off). I've
> only added the required support to control priority coupling between
> the Xenomai and Linux schedulers, but refrained from choosing any
> policy regarding how we are going to use it in the POSIX skin to solve
> the pthread_setschedparam issue. I guess that more brain cycles are needed
> there, and I'm cowardly leaving this to Gilles.
I have changed __wrap_pthread_setschedparam so that it no longer uses
__real_pthread_setschedparam, and added a wrapper for
pthread_getschedparam. I will add support for this XNRPIOFF bit to
pthread_set_mode_np. This avoids the need for further brain cycles.
--
Gilles Chanteperdrix.
Thread overview: 11+ messages
2006-06-27 16:44 [Xenomai-core] Prio-inversion on cleanup? Jan Kiszka
2006-06-28 14:45 ` Jan Kiszka
2006-06-29 9:24 ` Jan Kiszka
2006-06-29 10:14 ` Philippe Gerum
2006-06-29 10:34 ` Jan Kiszka
2006-06-29 10:48 ` Philippe Gerum
2006-06-29 11:12 ` Philippe Gerum
2006-06-29 11:20 ` Jan Kiszka
2006-06-29 13:24 ` Philippe Gerum
2006-06-29 16:03 ` Gilles Chanteperdrix
2006-06-29 12:27 ` Gilles Chanteperdrix