All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xenomai-core] Prio-inversion on cleanup?
@ 2006-06-27 16:44 Jan Kiszka
  2006-06-28 14:45 ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-27 16:44 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

Hi,

could someone give this scenario a try (requires my recent patch series)
and tell me if you are also seeing excessive latencies:

Start:
irqloop (+xeno_irqbench) -P 99 -t 0
latency -p 200 -P 50

Terminate:
irqloop

The termination seems to cause high latencies to the (then highest-prio)
periodic timer test. This does not happen with irqloop -t 1
(kernel-based task), and I can reduce the effect by invoking
pthread_setschedparam(SCHED_NORMAL) for the irqloop test thread right
before termination. Also, running and terminating another latency
instance with prio 99 does not have this effect.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-27 16:44 [Xenomai-core] Prio-inversion on cleanup? Jan Kiszka
@ 2006-06-28 14:45 ` Jan Kiszka
  2006-06-29  9:24   ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-28 14:45 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 11463 bytes --]

Jan Kiszka wrote:
> Hi,
> 
> could someone give this scenario a try (requires my recent patch series)
> and tell me if you are also seeing excessive latencies:
> 
> Start:
> irqloop (+xeno_irqbench) -P 99 -t 0
> latency -p 200 -P 50
> 
> Terminate:
> irqloop
> 
> The termination seems to cause high latencies to the (then highest-prio)
> periodic timer test. This does not happen with irqloop -t 1
> (kernel-based task), and I can reduce the effect by invoking
> pthread_setschedparam(SCHED_NORMAL) for the irqloop test thread right
> before termination. Also, running and terminating another latency
> instance with prio 99 does not have this effect.
> 

I extended the tracer with support for the new service

	ipipe_trace_pid(pid, priority);

and recorded the following on irqloop cleanup:

> :     func               -11679+   2.661  sys_rtdm_close+0x8 (losyscall_event+0xa3)
> :     func               -11677+   4.045  _rtdm_close+0xe (sys_rtdm_close+0x11)
> :|  * func               -11673+   4.601  __ipipe_restore_pipeline_head+0x8 (_rtdm_close+0x8c)
> :     func               -11668+   1.323  rt_irqbench_close+0x9 [xeno_irqbench] (_rtdm_close+0xd1)
> :     func               -11667+   2.345  rt_irqbench_stop+0x9 [xeno_irqbench] (rt_irqbench_close+0x21 [xeno_irqbench])
> :     func               -11664+   3.383  _rtdm_synch_flush+0xa (rt_irqbench_close+0x2e [xeno_irqbench])
> :|  * func               -11661+   4.751  xnsynch_flush+0xe (_rtdm_synch_flush+0x3b)
> :|  * func               -11656+   3.894  xnpod_resume_thread+0xe (xnsynch_flush+0x76)
> :|  * func               -11652+   1.503  xnpod_schedule+0xe (_rtdm_synch_flush+0x45)
> :|  * func               -11651+   2.932  ipipe_trigger_irq+0xc (xnpod_schedule+0x29)
> :|  * func               -11648+   2.375  memcpy+0xe (ipipe_trigger_irq+0x4a)
> :|  * func               -11645+   3.270  __ipipe_handle_irq+0xe (ipipe_trigger_irq+0x4f)
> :|  * func               -11642+   3.436  __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :|  * func               -11639+   2.345  __ipipe_restore_pipeline_head+0x8 (_rtdm_synch_flush+0x5e)
> :|    func               -11636+   1.714  __ipipe_walk_pipeline+0xe (__ipipe_restore_pipeline_head+0x67)
> :|    func               -11635+   1.624  ipipe_suspend_domain+0xb (__ipipe_walk_pipeline+0x46)
> :|    func               -11633+   3.864  __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x47)
> :|  * func               -11629+   1.864  xnpod_schedule_handler+0x8 (__ipipe_sync_stage+0x115)
> :|  * func               -11627+   3.684  xnpod_schedule+0xe (xnpod_schedule_handler+0x17)
> :|  * [  925] -<?>-    0 -11624+   6.496  xnpod_schedule+0x4f6 (xnpod_schedule_handler+0x17)
> :|  * func               -11617+   7.157  __switch_to+0xe (xnpod_schedule+0x612)
> :|  * [  926] -<?>-   99 -11610+   9.157  xnpod_schedule+0x6e6 (xnpod_suspend_thread+0xed)
> :|  * func               -11601!  63.413  __ipipe_restore_pipeline_head+0x8 (rtdm_event_timedwait+0xea)
> :|    func               -11537+   2.586  __ipipe_handle_irq+0xe (common_interrupt+0x18)
> :|    func               -11535+   1.353  __ipipe_ack_common_irq+0xa (__ipipe_handle_irq+0x80)
> :|    func               -11533+   2.165  ipipe_test_and_stall_pipeline_from+0x8 (__ipipe_ack_common_irq+0x16)
> :|  * func               -11531+   3.218  mask_and_ack_8259A+0xb (__ipipe_ack_common_irq+0x3f)
> :|    func               -11528+   2.030  __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :|  * func               -11526+   2.225  xnintr_clock_handler+0x8 (__ipipe_dispatch_wired+0x7d)
> :|  * func               -11524+   2.030  xnintr_irq_handler+0xb (xnintr_clock_handler+0x17)
> :|  * func               -11522+   2.345  xnpod_announce_tick+0x8 (xnintr_irq_handler+0x24)
> :|  * func               -11519+   2.624  xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0xf)
> :|  * func               -11517+   1.390  xnthread_periodic_handler+0x8 (xntimer_do_tick_aperiodic+0x7c)
> :|  * func               -11515+   6.977  xnpod_resume_thread+0xe (xnthread_periodic_handler+0x1b)
> :|  * func               -11508+   3.383  xnpod_schedule+0xe (xnintr_irq_handler+0x5f)
> :|    func               -11505! 261.413  __ipipe_walk_pipeline+0xe (__ipipe_handle_irq+0x179)
> :     func               -11244+   6.150  __ipipe_syscall_root+0x9 (system_call+0x20)
> :|    func               -11237+   2.030  __ipipe_handle_irq+0xe (common_interrupt+0x18)
> :|    func               -11235+   1.458  __ipipe_ack_common_irq+0xa (__ipipe_handle_irq+0x80)
> :|    func               -11234+   1.909  ipipe_test_and_stall_pipeline_from+0x8 (__ipipe_ack_common_irq+0x16)
> :|  * func               -11232+   3.172  mask_and_ack_8259A+0xb (__ipipe_ack_common_irq+0x3f)
> :|    func               -11229+   2.436  __ipipe_dispatch_wired+0xe (__ipipe_handle_irq+0x8a)
> :|  * func               -11226+   1.984  xnintr_clock_handler+0x8 (__ipipe_dispatch_wired+0x7d)
> :|  * func               -11224+   2.248  xnintr_irq_handler+0xb (xnintr_clock_handler+0x17)
> :|  * func               -11222+   2.248  xnpod_announce_tick+0x8 (xnintr_irq_handler+0x24)
> :|  * func               -11220+   3.969  xntimer_do_tick_aperiodic+0xe (xnpod_announce_tick+0xf)
> :|  * func               -11216+   6.736  xnthread_periodic_handler+0x8 (xntimer_do_tick_aperiodic+0x7c)
> :|    func               -11209+   2.421  __ipipe_walk_pipeline+0xe (__ipipe_handle_irq+0x179)
> :     func               -11207+   1.939  __ipipe_dispatch_event+0xe (__ipipe_syscall_root+0x55)
> :     func               -11205+   2.977  hisyscall_event+0xe (__ipipe_dispatch_event+0x5e)
> :     func               -11202+   4.015  xnshadow_relax+0xe (hisyscall_event+0x1ed)
> :     func               -11198+   3.218  schedule_linux_call+0xb (xnshadow_relax+0x40)
> :|  * func               -11195+   3.984  __ipipe_restore_pipeline_head+0x8 (schedule_linux_call+0x5e)
> :     func               -11191+   2.105  rthal_apc_schedule+0x8 (schedule_linux_call+0x68)
> :     func               -11189+   5.263  __ipipe_schedule_irq+0xa (rthal_apc_schedule+0x31)
> :|  * func               -11183+   6.586  xnpod_schedule_runnable+0xe (xnshadow_relax+0x87)
> :|  * func               -11177+   3.473  xnpod_suspend_thread+0xe (xnshadow_relax+0xb2)
> :|  * func               -11173+   3.819  xnpod_schedule+0xe (xnpod_suspend_thread+0xed)
> :|  * [  926] -<?>-   99 -11169+   4.812  xnpod_schedule+0x4f6 (xnpod_suspend_thread+0xed)
> :|  * func               -11165+   6.060  __switch_to+0xe (xnpod_schedule+0x612)
> :|  * [  925] -<?>-   99 -11159+   3.714  xnpod_schedule+0x6e6 (xnpod_schedule_handler+0x17)
> :|    func               -11155+   5.263  __ipipe_sync_stage+0xe (ipipe_suspend_domain+0x47)
> :    *func               -11150+   3.383  rthal_apc_handler+0x8 (__ipipe_sync_stage+0xf2)
> :    *func               -11146+   6.338  lostage_handler+0xa (rthal_apc_handler+0x2b)
> :    *func               -11140+   1.466  wake_up_process+0x8 (lostage_handler+0xac)
> :    *func               -11138+   1.849  try_to_wake_up+0xe (wake_up_process+0x14)
> :    *func               -11137+   3.518  __ipipe_test_and_stall_root+0x8 (try_to_wake_up+0x1a)
> :    *func               -11133+   3.954  sched_clock+0xa (try_to_wake_up+0x74)
> :    *func               -11129+   2.481  enqueue_task+0xa (try_to_wake_up+0xc7)
> :    *func               -11127+   2.015  __ipipe_restore_root+0x8 (try_to_wake_up+0x106)
> :    *func               -11125+   3.729  preempt_schedule+0xb (try_to_wake_up+0x13d)
> :|   *func               -11121+   2.751  __ipipe_unstall_iret_root+0x8 (restore_raw+0x0)
> :     func               -11118+   7.488  irq_exit+0x8 (__ipipe_sync_stage+0xff)
> :|  * func               -11111+   3.278  cleanup_instance+0xa (_rtdm_close+0x134)
> :|  * func               -11107+   3.503  __ipipe_restore_pipeline_head+0x8 (cleanup_instance+0x4a)
> :     func               -11104+   1.248  kfree+0xe (cleanup_instance+0x6d)
> :     func               -11103+   1.924  __ipipe_test_and_stall_root+0x8 (kfree+0x1a)
> :    *func               -11101+   4.060  kfree_debugcheck+0xa (kfree+0x27)
> :    *func               -11097+   1.849  check_irq_off+0x8 (kfree+0x4c)
> :    *func               -11095+   2.180  __ipipe_test_root+0x8 (check_irq_off+0xd)
> :    *func               -11093+   1.533  cache_free_debugcheck+0xe (kfree+0x59)
> :    *func               -11091+   2.541  kfree_debugcheck+0xa (cache_free_debugcheck+0x28)
> :    *func               -11089+   2.165  dbg_redzone1+0x8 (cache_free_debugcheck+0x9f)
> :    *func               -11086+   2.473  dbg_redzone2+0x8 (cache_free_debugcheck+0xb0)
> :    *func               -11084+   1.406  dbg_redzone1+0x8 (cache_free_debugcheck+0xf7)
> :    *func               -11082+   2.105  dbg_redzone2+0x8 (cache_free_debugcheck+0x106)
> :    *func               -11080+   3.924  dbg_userword+0x8 (cache_free_debugcheck+0x11b)
> :    *func               -11076!  56.819  poison_obj+0xd (cache_free_debugcheck+0x197)
> :    *func               -11020+   1.496  __ipipe_restore_root+0x8 (kfree+0x93)
> :    *func               -11018+   6.631  __ipipe_unstall_root+0x8 (__ipipe_restore_root+0x2b)
> :     func               -11012+   2.060  __ipipe_stall_root+0x8 (syscall_exit+0x5)
> :    *func               -11009+   2.458  schedule+0xe (work_resched+0x6)
> :    *func               -11007+   2.699  profile_hit+0x9 (schedule+0x6c)
> :    *func               -11004+   3.518  sched_clock+0xa (schedule+0x113)
> :    *func               -11001+   7.353  __ipipe_stall_root+0x8 (schedule+0x191)
> :    *func               -10993+   2.834  __ipipe_dispatch_event+0xe (schedule+0x42f)
> :    *func               -10991+   5.075  schedule_event+0xb (__ipipe_dispatch_event+0x5e)
> :|   *func               -10986+   5.413  __switch_to+0xe (xnpod_schedule+0x612)
> :|   *[  925] -<?>-   99 -10980+   4.120  xnpod_schedule+0x6e6 (xnpod_suspend_thread+0xed)
> :|  **func               -10976+   3.947  __ipipe_restore_pipeline_head+0x8 (xnshadow_relax+0xd0)
> :    *func               -10972+   1.330  ipipe_reenter_root+0xe (xnshadow_relax+0xf1)
> :    *func               -10971+   6.796  __ipipe_unstall_root+0x8 (ipipe_reenter_root+0x2e)
> :     func               -10964+   4.240  losyscall_event+0xe (__ipipe_dispatch_event+0x5e)
> :     func               -10960+   3.443  sys_open+0x8 (syscall_call+0x7)

Explanation of the recorded pid entries before and after __switch_to():

[<pid>] <task name> <priority>

Here 925 is the main thread of irqloop, 926 the irq user-mode pthread.
The pids are resolved on trace printing. And as the process terminated,
we do not see their names anymore.

Regarding the trace:

The pthread is blocked on the irqbench device ioctl. On hitting ^C,
close() is invoked from the main thread for that device. The pthread is
woken up and obviously relaxed on some linux syscall (after being
interrupted twice by the periodic timer event of a "latency -p 300 -P
50" instance). This passes the control over to the main thread while
keeping the pthread prio of 99. And this prio seems to survive for the
following 11 ms (full trace available on request).

Any ideas what's going on?

Jan


PS: ipipe-tracer and xenomai-core patch for pid/context tracing will
follow later.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-28 14:45 ` Jan Kiszka
@ 2006-06-29  9:24   ` Jan Kiszka
  2006-06-29 10:14     ` Philippe Gerum
  2006-06-29 12:27     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29  9:24 UTC (permalink / raw)
  To: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 2627 bytes --]

Jan Kiszka wrote:
> ...
> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> close() is invoked from the main thread for that device. The pthread is
> woken up and obviously relaxed on some linux syscall (after being
> interrupted twice by the periodic timer event of a "latency -p 300 -P
> 50" instance). This passes the control over to the main thread while
> keeping the pthread prio of 99. And this prio seems to survive for the
> following 11 ms (full trace available on request).
> 
> Any ideas what's going on?
> 

Ok, I think I finally understood the issue. It seems to lie deep in the
POSIX user-space lib, specifically its use of standard pthread services.
Let me first clarify my scenario:

A high-prio pthread of known and (theoretically) bounded workload shall
be started and stopped while a low-prio thread is already running. The
low-prio thread shall only be impacted by the real workload of the
high-prio one, not by its creation/destruction - at least not
significantly. To achieve this with the POSIX skin (actually this
applies to preempt-rt in theory as well), I have to create the thread
under SCHED_OTHER, raise its priority right before entering the
workload, and lower it again before leaving the thread.

But, unfortunately, __wrap_pthread_setschedparam() depends on some
real_pthread functions to be called. One of them is
__real_pthread_setschedparam, and this one issues a linux syscall for
obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
is still issued under the original priority. And here we get bitten by
the prio-inheritance feature of the nucleus which, in my case, lets
significant parts of standard Linux execute under high priority,
delaying my other real-time threads.

Now I wonder how to resolve this, how to make pthread_setschedparam (a
rather central RT-service) really real-time safe? I would say we need
something like a lazy schedparam propagation to Linux which only takes
place when the thread enters secondary mode intentionally or no other RT
thread is ready. But I do not have a design for this at hand. Nasty.

[My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
would still be to switch this prio-inheritance off for the root thread.
But this was nack'ed by Philippe several times before... ;)]

Side note: the native skin does not seem to suffer from this effect as
it only tracks the current prio at Xenomai level.

Jan


PS: Gilles, what about marking those services of the POSIX in the doc
that may issue a linux syscall (and under which conditions)?


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29  9:24   ` Jan Kiszka
@ 2006-06-29 10:14     ` Philippe Gerum
  2006-06-29 10:34       ` Jan Kiszka
  2006-06-29 12:27     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 10:14 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> Jan Kiszka wrote:
> > ...
> > The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> > close() is invoked from the main thread for that device. The pthread is
> > woken up and obviously relaxed on some linux syscall (after being
> > interrupted twice by the periodic timer event of a "latency -p 300 -P
> > 50" instance). This passes the control over to the main thread while
> > keeping the pthread prio of 99. And this prio seems to survive for the
> > following 11 ms (full trace available on request).
> > 
> > Any ideas what's going on?
> > 
> 
> Ok, I think I finally understood the issue. It seems to lie deep in the
> POSIX user-space lib, specifically its use of standard pthread services.
> Let me first clarify my scenario:
> 
> A high-prio pthread of known and (theoretically) bounded workload shall
> be started and stopped while a low-prio thread is already running. The
> low-prio thread shall only be impacted by the real workload of the
> high-prio one, not by its creation/destruction - at least not
> significantly. To achieve this with the POSIX skin (actually this
> applies to preempt-rt in theory as well), I have to create the thread
> under SCHED_OTHER, raise its priority right before entering the
> workload, and lower it again before leaving the thread.
> 
> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> real_pthread functions to be called. One of them is
> __real_pthread_setschedparam, and this one issues a linux syscall for
> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> is still issued under the original priority. And here we get bitten by
> the prio-inheritance feature of the nucleus which, in my case, lets
> significant parts of standard Linux execute under high priority,
> delaying my other real-time threads.
> 
> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> rather central RT-service) really real-time safe? I would say we need
> something like a lazy schedparam propagation to Linux which only takes
> place when the thread enters secondary mode intentionally or no other RT
> thread is ready. But I do not have a design for this at hand. Nasty.
> 
> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> would still be to switch this prio-inheritance off for the root thread.
> But this was nack'ed by Philippe several times before... ;)]
> 

I nacked the proposal to _always_ switch it off. Some applications
deeply need this.

> Side note: the native skin does not seem to suffer from this effect as
> it only tracks the current prio at Xenomai level.
> 

Switching off priority adjustment for the root thread before moving a
SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
need to add a per-thread status bit to check whether we should run
xnpod_renice_root() or not for any given thread, and switch it on/off
from __wrap_pthread_setschedparam.

-- 
Philippe.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 10:14     ` Philippe Gerum
@ 2006-06-29 10:34       ` Jan Kiszka
  2006-06-29 10:48         ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29 10:34 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3639 bytes --]

Philippe Gerum wrote:
> On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> ...
>>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
>>> close() is invoked from the main thread for that device. The pthread is
>>> woken up and obviously relaxed on some linux syscall (after being
>>> interrupted twice by the periodic timer event of a "latency -p 300 -P
>>> 50" instance). This passes the control over to the main thread while
>>> keeping the pthread prio of 99. And this prio seems to survive for the
>>> following 11 ms (full trace available on request).
>>>
>>> Any ideas what's going on?
>>>
>> Ok, I think I finally understood the issue. It seems to lie deep in the
>> POSIX user-space lib, specifically its use of standard pthread services.
>> Let me first clarify my scenario:
>>
>> A high-prio pthread of known and (theoretically) bounded workload shall
>> be started and stopped while a low-prio thread is already running. The
>> low-prio thread shall only be impacted by the real workload of the
>> high-prio one, not by its creation/destruction - at least not
>> significantly. To achieve this with the POSIX skin (actually this
>> applies to preempt-rt in theory as well), I have to create the thread
>> under SCHED_OTHER, raise its priority right before entering the
>> workload, and lower it again before leaving the thread.
>>
>> But, unfortunately, __wrap_pthread_setschedparam() depends on some
>> real_pthread functions to be called. One of them is
>> __real_pthread_setschedparam, and this one issues a linux syscall for
>> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
>> is still issued under the original priority. And here we get bitten by
>> the prio-inheritance feature of the nucleus which, in my case, lets
>> significant parts of standard Linux execute under high priority,
>> delaying my other real-time threads.
>>
>> Now I wonder how to resolve this, how to make pthread_setschedparam (a
>> rather central RT-service) really real-time safe? I would say we need
>> something like a lazy schedparam propagation to Linux which only takes
>> place when the thread enters secondary mode intentionally or no other RT
>> thread is ready. But I do not have a design for this at hand. Nasty.
>>
>> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
>> would still be to switch this prio-inheritance off for the root thread.
>> But this was nack'ed by Philippe several times before... ;)]
>>
> 
> I nacked the proposal to _always_ switch it off. Some applications
> deeply need this.

I think to remember asking for a CONFIG switch here. Some applications
actually benefit while others (I even think most) do not need it or even
easily screw themselves up during init/cleanup. You know, my old
concerns. :)

> 
>> Side note: the native skin does not seem to suffer from this effect as
>> it only tracks the current prio at Xenomai level.
>>
> 
> Switching off priority adjustment for the root thread before moving a
> SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> need to add a per-thread status bit to check whether we should run
> xnpod_renice_root() or not for any given thread, and switch it on/off
> from __wrap_pthread_setschedparam.
> 

This doesn't sound bad and would probably help low-prio threads also in
some other scenarios.

Nevertheless, a syscall-less pthread_setschedparam would still be a good
idea as well, this time having the caller in mind who wishes to change
its priority without entering Linux.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 10:34       ` Jan Kiszka
@ 2006-06-29 10:48         ` Philippe Gerum
  2006-06-29 11:12           ` Philippe Gerum
  2006-06-29 13:24           ` Philippe Gerum
  0 siblings, 2 replies; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 10:48 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> >> Jan Kiszka wrote:
> >>> ...
> >>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> >>> close() is invoked from the main thread for that device. The pthread is
> >>> woken up and obviously relaxed on some linux syscall (after being
> >>> interrupted twice by the periodic timer event of a "latency -p 300 -P
> >>> 50" instance). This passes the control over to the main thread while
> >>> keeping the pthread prio of 99. And this prio seems to survive for the
> >>> following 11 ms (full trace available on request).
> >>>
> >>> Any ideas what's going on?
> >>>
> >> Ok, I think I finally understood the issue. It seems to lie deep in the
> >> POSIX user-space lib, specifically its use of standard pthread services.
> >> Let me first clarify my scenario:
> >>
> >> A high-prio pthread of known and (theoretically) bounded workload shall
> >> be started and stopped while a low-prio thread is already running. The
> >> low-prio thread shall only be impacted by the real workload of the
> >> high-prio one, not by its creation/destruction - at least not
> >> significantly. To achieve this with the POSIX skin (actually this
> >> applies to preempt-rt in theory as well), I have to create the thread
> >> under SCHED_OTHER, raise its priority right before entering the
> >> workload, and lower it again before leaving the thread.
> >>
> >> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> >> real_pthread functions to be called. One of them is
> >> __real_pthread_setschedparam, and this one issues a linux syscall for
> >> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> >> is still issued under the original priority. And here we get bitten by
> >> the prio-inheritance feature of the nucleus which, in my case, lets
> >> significant parts of standard Linux execute under high priority,
> >> delaying my other real-time threads.
> >>
> >> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> >> rather central RT-service) really real-time safe? I would say we need
> >> something like a lazy schedparam propagation to Linux which only takes
> >> place when the thread enters secondary mode intentionally or no other RT
> >> thread is ready. But I do not have a design for this at hand. Nasty.
> >>
> >> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> >> would still be to switch this prio-inheritance off for the root thread.
> >> But this was nack'ed by Philippe several times before... ;)]
> >>
> > 
> > I nacked the proposal to _always_ switch it off. Some applications
> > deeply need this.
> 
> I think to remember asking for a CONFIG switch here. Some applications
> actually benefit while others (I even think most) do not need it or even
> easily screw themselves up during init/cleanup. You know, my old
> concerns. :)
> 

A dynamic switch is better there. You may want this behaviour to be
settable on a thread-by-thread basis.

> > 
> >> Side note: the native skin does not seem to suffer from this effect as
> >> it only tracks the current prio at Xenomai level.
> >>
> > 
> > Switching off priority adjustment for the root thread before moving a
> > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > need to add a per-thread status bit to check whether we should run
> > xnpod_renice_root() or not for any given thread, and switch it on/off
> > from __wrap_pthread_setschedparam.
> > 
> 
> This doesn't sound bad and would probably help low-prio threads also in
> some other scenarios.
> 

I'm currently implementing that at nucleus level.

> Nevertheless, a syscall-less pthread_setschedparam would still be a good
> idea as well, this time having the caller in mind who wishes to change
> its priority without entering Linux.

Reading the comment Gilles put there, it's likely not possible to have a
syscall-less implementation on top of the NPTL. We need to give a chance
to the NPTL to track the priority update, otherwise,
pthread_getschedparam() is going to break.

> 
> Jan
> 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 10:48         ` Philippe Gerum
@ 2006-06-29 11:12           ` Philippe Gerum
  2006-06-29 11:20             ` Jan Kiszka
  2006-06-29 13:24           ` Philippe Gerum
  1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 11:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
> On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
> > Philippe Gerum wrote:
> > > On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
> > >> Jan Kiszka wrote:
> > >>> ...
> > >>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
> > >>> close() is invoked from the main thread for that device. The pthread is
> > >>> woken up and obviously relaxed on some linux syscall (after being
> > >>> interrupted twice by the periodic timer event of a "latency -p 300 -P
> > >>> 50" instance). This passes the control over to the main thread while
> > >>> keeping the pthread prio of 99. And this prio seems to survive for the
> > >>> following 11 ms (full trace available on request).
> > >>>
> > >>> Any ideas what's going on?
> > >>>
> > >> Ok, I think I finally understood the issue. It seems to lie deep in the
> > >> POSIX user-space lib, specifically its use of standard pthread services.
> > >> Let me first clarify my scenario:
> > >>
> > >> A high-prio pthread of known and (theoretically) bounded workload shall
> > >> be started and stopped while a low-prio thread is already running. The
> > >> low-prio thread shall only be impacted by the real workload of the
> > >> high-prio one, not by its creation/destruction - at least not
> > >> significantly. To achieve this with the POSIX skin (actually this
> > >> applies to preempt-rt in theory as well), I have to create the thread
> > >> under SCHED_OTHER, raise its priority right before entering the
> > >> workload, and lower it again before leaving the thread.
> > >>
> > >> But, unfortunately, __wrap_pthread_setschedparam() depends on some
> > >> real_pthread functions to be called. One of them is
> > >> __real_pthread_setschedparam, and this one issues a linux syscall for
> > >> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
> > >> is still issued under the original priority. And here we get bitten by
> > >> the prio-inheritance feature of the nucleus which, in my case, lets
> > >> significant parts of standard Linux execute under high priority,
> > >> delaying my other real-time threads.
> > >>
> > >> Now I wonder how to resolve this, how to make pthread_setschedparam (a
> > >> rather central RT-service) really real-time safe? I would say we need
> > >> something like a lazy schedparam propagation to Linux which only takes
> > >> place when the thread enters secondary mode intentionally or no other RT
> > >> thread is ready. But I do not have a design for this at hand. Nasty.
> > >>
> > >> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
> > >> would still be to switch this prio-inheritance off for the root thread.
> > >> But this was nack'ed by Philippe several times before... ;)]
> > >>
> > > 
> > > I nacked the proposal to _always_ switch it off. Some applications
> > > deeply need this.
> > 
> > I think to remember asking for a CONFIG switch here. Some applications
> > actually benefit while others (I even think most) do not need it or even
> > easily screw themselves up during init/cleanup. You know, my old
> > concerns. :)
> > 
> 
> A dynamic switch is better there. You may want this behaviour to be
> settable on a thread-by-thread basis.
> 

Actually, we could have both, dynamic and static switches, just like the
interrupt shield.

> > > 
> > >> Side note: the native skin does not seem to suffer from this effect as
> > >> it only tracks the current prio at Xenomai level.
> > >>
> > > 
> > > Switching off priority adjustment for the root thread before moving a
> > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > > need to add a per-thread status bit to check whether we should run
> > > xnpod_renice_root() or not for any given thread, and switch it on/off
> > > from __wrap_pthread_setschedparam.
> > > 
> > 
> > This doesn't sound bad and would probably help low-prio threads also in
> > some other scenarios.
> > 
> 
> I'm currently implementing that at nucleus level.
> 
> > Nevertheless, a syscall-less pthread_setschedparam would still be a good
> > idea as well, this time having the caller in mind who wishes to change
> > its priority without entering Linux.
> 
> Reading the comment Gilles put there, it's likely not possible to have a
> syscall-less implementation on top of the NPTL. We need to give a chance
> to the NPTL to track the priority update, otherwise,
> pthread_getschedparam() is going to break.
> 
> > 
> > Jan
> > 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 11:12           ` Philippe Gerum
@ 2006-06-29 11:20             ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2006-06-29 11:20 UTC (permalink / raw)
  To: rpm; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 5098 bytes --]

Philippe Gerum wrote:
> On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
>> On Thu, 2006-06-29 at 12:34 +0200, Jan Kiszka wrote:
>>> Philippe Gerum wrote:
>>>> On Thu, 2006-06-29 at 11:24 +0200, Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> ...
>>>>>> The pthread is blocked on the irqbench device ioctl. On hitting ^C,
>>>>>> close() is invoked from the main thread for that device. The pthread is
>>>>>> woken up and obviously relaxed on some linux syscall (after being
>>>>>> interrupted twice by the periodic timer event of a "latency -p 300 -P
>>>>>> 50" instance). This passes the control over to the main thread while
>>>>>> keeping the pthread prio of 99. And this prio seems to survive for the
>>>>>> following 11 ms (full trace available on request).
>>>>>>
>>>>>> Any ideas what's going on?
>>>>>>
>>>>> Ok, I think I finally understood the issue. It seems to lie deep in the
>>>>> POSIX user-space lib, specifically its use of standard pthread services.
>>>>> Let me first clarify my scenario:
>>>>>
>>>>> A high-prio pthread of known and (theoretically) bounded workload shall
>>>>> be started and stopped while a low-prio thread is already running. The
>>>>> low-prio thread shall only be impacted by the real workload of the
>>>>> high-prio one, not by its creation/destruction - at least not
>>>>> significantly. To achieve this with the POSIX skin (actually this
>>>>> applies to preempt-rt in theory as well), I have to create the thread
>>>>> under SCHED_OTHER, raise its priority right before entering the
>>>>> workload, and lower it again before leaving the thread.
>>>>>
>>>>> But, unfortunately, __wrap_pthread_setschedparam() depends on some
>>>>> real_pthread functions to be called. One of them is
>>>>> __real_pthread_setschedparam, and this one issues a linux syscall for
>>>>> obvious reasons. When lowering the thread to SCHED_OTHER, this syscall
>>>>> is still issued under the original priority. And here we get bitten by
>>>>> the prio-inheritance feature of the nucleus which, in my case, lets
>>>>> significant parts of standard Linux execute under high priority,
>>>>> delaying my other real-time threads.
>>>>>
>>>>> Now I wonder how to resolve this, how to make pthread_setschedparam (a
>>>>> rather central RT-service) really real-time safe? I would say we need
>>>>> something like a lazy schedparam propagation to Linux which only takes
>>>>> place when the thread enters secondary mode intentionally or no other RT
>>>>> thread is ready. But I do not have a design for this at hand. Nasty.
>>>>>
>>>>> [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
>>>>> would still be to switch this prio-inheritance off for the root thread.
>>>>> But this was nack'ed by Philippe several times before... ;)]
>>>>>
>>>> I nacked the proposal to _always_ switch it off. Some applications
>>>> deeply need this.
>>> I think to remember asking for a CONFIG switch here. Some applications
>>> actually benefit while others (I even think most) do not need it or even
>>> easily screw themselves up during init/cleanup. You know, my old
>>> concerns. :)
>>>
>> A dynamic switch is better there. You may want this behaviour to be
>> settable on a thread-by-thread basis.
>>
> 
> Actually, we could have both, dynamic and static switches, just like the
> interrupt shield.
> 

Hurray! (Otherwise, I would have kicked off a discussion about the
default setting ;-))

>>>>> Side note: the native skin does not seem to suffer from this effect as
>>>>> it only tracks the current prio at Xenomai level.
>>>>>
>>>> Switching off priority adjustment for the root thread before moving a
>>>> SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
>>>> need to add a per-thread status bit to check whether we should run
>>>> xnpod_renice_root() or not for any given thread, and switch it on/off
>>>> from __wrap_pthread_setschedparam.
>>>>
>>> This doesn't sound bad and would probably help low-prio threads also in
>>> some other scenarios.
>>>
>> I'm currently implementing that at nucleus level.
>>
>>> Nevertheless, a syscall-less pthread_setschedparam would still be a good
>>> idea as well, this time having the caller in mind who wishes to change
>>> its priority without entering Linux.
>> Reading the comment Gilles put there, it's likely not possible to have a
>> syscall-less implementation on top of the NPTL. We need to give a chance
>> to the NPTL to track the priority update, otherwise,
>> pthread_getschedparam() is going to break.
>>

Yep, saw that. I would start thinking about what we gain by wrapping
that service as well (e.g. by mirroring the state in a per-__thread
variable). Would likely work as long as the thread stays in primary
mode, but we still need to propagate the setting when entering secondary
mode. Maybe some soft-IRQ to Linux can help here. It could apply the
scheduler change on Linux re-entry.

Well, I'm probably overseeing tons of pitfalls, or all this becomes
terribly complicated.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29  9:24   ` Jan Kiszka
  2006-06-29 10:14     ` Philippe Gerum
@ 2006-06-29 12:27     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-06-29 12:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
 > (..)
 > Now I wonder how to resolve this, how to make pthread_setschedparam (a
 > rather central RT-service) really real-time safe? I would say we need
 > something like a lazy schedparam propagation to Linux which only takes
 > place when the thread enters secondary mode intentionally or no other RT
 > thread is ready. But I do not have a design for this at hand. Nasty.

A simple solution is to wrap pthread_getschedparam as well. It would
fallback to __real_pthread_getschedparam when the xenomai posix skin
syscall returns -ESRCH. This would allow to remove the offending call to
__real_pthread_setschedparam in __wrap_pthread_setschedparam.

 > 
 > [My preferred way for every setup != CONFIG_PREEMPT_RT + CONFIG_XENOMAI
 > would still be to switch this prio-inheritance off for the root thread.
 > But this was nack'ed by Philippe several times before... ;)]
 > 
 > Side note: the native skin does not seem to suffer from this effect as
 > it only tracks the current prio at Xenomai level.
 > 
 > Jan
 > 
 > 
 > PS: Gilles, what about marking those services of the POSIX in the doc
 > that may issue a linux syscall (and under which conditions)?

Actually, syscalls that may be called only from a particular context or
that cause migration of their caller have a paragraph entitled "Valid
contexts", if pthread_setschedparam lack this paragraph, it is probably
because its implementation changed without its documentation.

-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 10:48         ` Philippe Gerum
  2006-06-29 11:12           ` Philippe Gerum
@ 2006-06-29 13:24           ` Philippe Gerum
  2006-06-29 16:03             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2006-06-29 13:24 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
> > > Switching off priority adjustment for the root thread before moving a
> > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
> > > need to add a per-thread status bit to check whether we should run
> > > xnpod_renice_root() or not for any given thread, and switch it on/off
> > > from __wrap_pthread_setschedparam.
> > > 
> > 
> > This doesn't sound bad and would probably help low-prio threads also in
> > some other scenarios.
> > 
> 
> I'm currently implementing that at nucleus level.

The priority coupling switch is in place now, the static config one is
called CONFIG_XENO_OPT_RPIDISABLE, and a dynamic flag as been added to
the xnthread status mask, namely XNRPIOFF (Root [thread] PI off). I've
only added the required support to control priority coupling between
both Xenomai and Linux schedulers, but refrained from choosing any
policy regarding how we are going to use it in the POSIX skin to solve
the pthread_setschedpram issue. I guess that more brain cycles are need
there, and I'm cowardly leaving this to Gilles.
 
-- 
Philippe.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xenomai-core] Prio-inversion on cleanup?
  2006-06-29 13:24           ` Philippe Gerum
@ 2006-06-29 16:03             ` Gilles Chanteperdrix
  0 siblings, 0 replies; 11+ messages in thread
From: Gilles Chanteperdrix @ 2006-06-29 16:03 UTC (permalink / raw)
  To: rpm; +Cc: xenomai

Philippe Gerum wrote:
 > On Thu, 2006-06-29 at 12:48 +0200, Philippe Gerum wrote:
 > > > > Switching off priority adjustment for the root thread before moving a
 > > > > SCHED_FIFO shadow to SCHED_OTHER would prevent this side-effect. We'd
 > > > > need to add a per-thread status bit to check whether we should run
 > > > > xnpod_renice_root() or not for any given thread, and switch it on/off
 > > > > from __wrap_pthread_setschedparam.
 > > > > 
 > > > 
 > > > This doesn't sound bad and would probably help low-prio threads also in
 > > > some other scenarios.
 > > > 
 > > 
 > > I'm currently implementing that at nucleus level.
 > 
 > The priority coupling switch is in place now, the static config one is
 > called CONFIG_XENO_OPT_RPIDISABLE, and a dynamic flag as been added to
 > the xnthread status mask, namely XNRPIOFF (Root [thread] PI off). I've
 > only added the required support to control priority coupling between
 > both Xenomai and Linux schedulers, but refrained from choosing any
 > policy regarding how we are going to use it in the POSIX skin to solve
 > the pthread_setschedpram issue. I guess that more brain cycles are need
 > there, and I'm cowardly leaving this to Gilles.

I have changed __wrap_pthread_setschedparam so that it no longer uses
__real_pthread_setschedparam, and added a wrapper for
pthread_getschedparam. I will add support for this XNRPIOFF bit to
pthread_set_mode_np. This avoid the need for brain cycles.

-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-06-29 16:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-27 16:44 [Xenomai-core] Prio-inversion on cleanup? Jan Kiszka
2006-06-28 14:45 ` Jan Kiszka
2006-06-29  9:24   ` Jan Kiszka
2006-06-29 10:14     ` Philippe Gerum
2006-06-29 10:34       ` Jan Kiszka
2006-06-29 10:48         ` Philippe Gerum
2006-06-29 11:12           ` Philippe Gerum
2006-06-29 11:20             ` Jan Kiszka
2006-06-29 13:24           ` Philippe Gerum
2006-06-29 16:03             ` Gilles Chanteperdrix
2006-06-29 12:27     ` Gilles Chanteperdrix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.