* [Xenomai-help] pit
@ 2008-02-12  0:45 Steven Seeger
  2008-02-12  7:51 ` Jan Kiszka
  2008-02-12  9:53 ` Philippe Gerum
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Seeger @ 2008-02-12  0:45 UTC (permalink / raw)
  To: xenomai


I compiled the kernel for 586 and am running the PIT timer. I still get
17000-18000 context switches per second, and now the irq0 handler is
taking up 11% of the CPU instead of only 5% when the two 8000 Hz tasks
are loaded but delayed on events. I think that the problem isn't with
the PIT, but with the tasks being periodic even though they are blocked.

Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
see about 9000 context switches per second (the two 8000 Hz tasks and
the 1000 Hz Linux interrupt).

With periodic timing, it's 5.4% when the tasks idle, with about 9000
context switches a second. When one of them becomes active, the irq0
handler uses 10% of the CPU and the sound task uses about 8%.
These are two kernel tasks.

Userspace stack size is set to 64k. I forgot to mention this to Philippe
earlier.

Perhaps the problem is the overhead that the timer handler incurs in
order to support multiple skins with individual timebases. It
sounds like, in order to save some CPU cycles, I may want to turn off
periodicity while threads are idle and also avoid setting threads
periodic when they can be driven some other way.

Steven



* Re: [Xenomai-help] pit
  2008-02-12  0:45 [Xenomai-help] pit Steven Seeger
@ 2008-02-12  7:51 ` Jan Kiszka
  2008-02-12  9:13   ` Philippe Gerum
  2008-02-12 13:14   ` Steven Seeger
  2008-02-12  9:53 ` Philippe Gerum
  1 sibling, 2 replies; 10+ messages in thread
From: Jan Kiszka @ 2008-02-12  7:51 UTC (permalink / raw)
  To: Steven Seeger; +Cc: xenomai


Steven Seeger wrote:
> I compiled the kernel for 586 and am running the PIT timer. I still get
> the 17000-18000 context switches per second, and now the irq0 handler is
> taking up 11% of the CPU instead of only 5% when the two 8000Hz tasks
> are loaded but delayed on events. I think that the problem isn't with
> pit, but with the tasks being periodic even though they are blocked. 

That makes sense: Periodic timers keep on firing. That would explain up 
to 16000 IRQ invocations per second. And the other 1000-2000 come from 
Linux?

As suggested earlier: you can reduce the number of IRQ events by basing
your periodic tasks on the same start date. Then both will be woken up
at the same times and their priorities will decide the execution order.
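
To make the effect of a shared start date concrete, here is a minimal, self-contained C model (not Xenomai code) that counts the distinct expiry instants of two periodic timers over one second. The function name `count_timer_irqs`, the nanosecond units, and the assumption that every distinct expiry instant costs exactly one timer interrupt are illustrative choices, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: count the distinct expiry instants of two periodic timers
 * within a one-second window. Assumes one timer IRQ per distinct instant,
 * so expiries that coincide share a single IRQ. All times in nanoseconds. */
static unsigned count_timer_irqs(uint64_t period_ns,
                                 uint64_t start_a_ns, uint64_t start_b_ns)
{
    const uint64_t window_ns = 1000000000ULL;
    uint64_t a = start_a_ns, b = start_b_ns;
    unsigned irqs = 0;

    assert(period_ns > 0);

    while (a < window_ns || b < window_ns) {
        uint64_t next = (a < b) ? a : b;  /* earliest pending expiry */

        irqs++;
        if (a == next)
            a += period_ns;               /* timer A fired at this instant */
        if (b == next)
            b += period_ns;               /* timer B fired at this instant */
    }
    return irqs;
}
```

With a 125000 ns period (8000 Hz), `count_timer_irqs(125000, 0, 0)` yields 8000, while `count_timer_irqs(125000, 0, 60000)` yields 16000; the 60000 ns offset is an arbitrary example value.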

> 
> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
> show about 9000 context switches per second. (the 2 8000 hz tasks and
> the 1000 hz linux interrupt.)

Do you need Linux at 1 kHz? You may even want to try NO_HZ.

> 
> With periodic timing, it's 5.4% when the tasks idle and about 9000
> context switches a second. When one of them becomes active, the irq0
> handler is using 10% of the CPU and the sound task is using about 8%.
> These are two kernel tasks.
> 
>  
> 
> Userspace stack size is set to 64k. I forgot to mention this to Philippe
> earlier.
> 
>  
> 
> Perhaps the problem is the overhead that the timer handler introduces
> being able to support multiple skins with individual timebases. It
> sounds like in order to save some cpu cycles, I may want to turn off
> periodicity while threads are idle and also avoid setting threads
> periodic when they can be driven some other way. 

I'm still wondering with what older numbers you compare all the nice 
stats you now generate. Neither older Xenomai nor RTAI provide 
comparable statistics. Are we doing fair comparisons here?

Jan




* Re: [Xenomai-help] pit
  2008-02-12  7:51 ` Jan Kiszka
@ 2008-02-12  9:13   ` Philippe Gerum
  2008-02-12 13:14   ` Steven Seeger
  1 sibling, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2008-02-12  9:13 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Steven Seeger wrote:
>> I compiled the kernel for 586 and am running the PIT timer. I still get
>> the 17000-18000 context switches per second, and now the irq0 handler is
>> taking up 11% of the CPU instead of only 5% when the two 8000Hz tasks
>> are loaded but delayed on events. I think that the problem isn't with
>> pit, but with the tasks being periodic even though they are blocked. 
> 
> That makes sense: Periodic timers keep on firing. That would explain up
> to 16000 IRQ invocations per second. And the other 1000-2000 come from
> Linux?
> 
> As suggested earlier: you can reduce the number of IRQ events by basing
> your periodic tasks on the same start date. Then both will be woken up
> at the same times and their priority will decide about the execution order.
> 
>>
>> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
>> show about 9000 context switches per second. (the 2 8000 hz tasks and
>> the 1000 hz linux interrupt.)
> 
> Do you need Linux at 1 KHz? You may even want to try NO_HZ.
> 
>>
>> With periodic timing, it's 5.4% when the tasks idle and about 9000
>> context switches a second. When one of them becomes active, the irq0
>> handler is using 10% of the CPU and the sound task is using about 8%.
>> These are two kernel tasks.
>>
>>  
>>
>> Userspace stack size is set to 64k. I forgot to mention this to Philippe
>> earlier.
>>
>>  
>>
>> Perhaps the problem is the overhead that the timer handler introduces
>> being able to support multiple skins with individual timebases. It
>> sounds like in order to save some cpu cycles, I may want to turn off
>> periodicity while threads are idle and also avoid setting threads
>> periodic when they can be driven some other way. 
> 
> I'm still wondering with what older numbers you compare all the nice
> stats you now generate. Neither older Xenomai nor RTAI provide
> comparable statistics. Are we doing fair comparisons here?
>

No, because RTAI charges interrupt load to the preempted task context.


-- 
Philippe.



* Re: [Xenomai-help] pit
  2008-02-12  0:45 [Xenomai-help] pit Steven Seeger
  2008-02-12  7:51 ` Jan Kiszka
@ 2008-02-12  9:53 ` Philippe Gerum
  2008-02-12 13:20   ` Steven Seeger
  1 sibling, 1 reply; 10+ messages in thread
From: Philippe Gerum @ 2008-02-12  9:53 UTC (permalink / raw)
  To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
> I compiled the kernel for 586 and am running the PIT timer. I still get
> the 17000-18000 context switches per second, and now the irq0 handler is
> taking up 11% of the CPU instead of only 5% when the two 8000Hz tasks
> are loaded but delayed on events. I think that the problem isn’t with
> pit, but with the tasks being periodic even though they are blocked.
>

RTAI (at least the version you used) has a single per-task internal
timer, which is not really a timer, but rather a "resume_time" field
that the RTAI core tests directly to know whether it should wake up a
delayed task.

Therefore, if your task used to call rt_task_make_periodic() on RTAI,
then just blocked on a semaphore with no timeout value, then this task
was dequeued from the timed task list, and for that reason, no oneshot
timer ticks had to be programmed to wake it up anymore. The drawback is
that you have no timer object, independent from the task itself.
Everything has to be related to this single "resume_time" field, on a
per-task basis. This is why the RTAI core has to save and restore this
value when nesting some timed operations for instance.

Xenomai has independent timers, which also means that if you call
rt_task_set_periodic() on a task, it will arm an internal per-task timer
(thread->ptimer) which will tick independently, regardless of what your
task is currently doing. So you will have timer ticks fired for that
task, even if it is blocked on some synchro with no timeout, in which
case, the tick handler will attempt to resume the task, but since the
DELAYED+BLOCKED wait states are conjunctive, it won't be able to.
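
The behaviour described above can be pictured with a small toy model in C. This is a sketch under stated assumptions, not Xenomai source: the `WAIT_*` bits, `struct model_thread` and the `model_*` functions are hypothetical names, and the model only captures the idea that a thread becomes runnable once every wait condition holding it has been lifted, while a periodic tick can lift only the time-based one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wait-state bits, named after the DELAYED and BLOCKED states
 * mentioned above. Toy model only, not Xenomai source code. */
enum {
    WAIT_DELAYED = 1u << 0,  /* waiting for a time-based wakeup */
    WAIT_BLOCKED = 1u << 1   /* waiting on a synchronization object */
};

struct model_thread {
    unsigned wait_state;       /* OR of the WAIT_* bits still holding the thread */
    unsigned long ticks_seen;  /* periodic timer ticks delivered so far */
};

/* A thread may run only once every wait condition has been cleared. */
static bool model_runnable(const struct model_thread *t)
{
    return t->wait_state == 0;
}

/* Periodic tick: fires regardless of what the thread is doing, but can
 * only lift the time-based wait condition. Returns true if the thread
 * became (or already was) runnable. */
static bool model_periodic_tick(struct model_thread *t)
{
    assert(t != NULL);
    t->ticks_seen++;
    t->wait_state &= ~(unsigned)WAIT_DELAYED;
    return model_runnable(t);
}

/* Signal on the synchronization object: lifts the blocking condition. */
static bool model_sync_signal(struct model_thread *t)
{
    assert(t != NULL);
    t->wait_state &= ~(unsigned)WAIT_BLOCKED;
    return model_runnable(t);
}
```

In this model, a thread whose `wait_state` is `WAIT_BLOCKED` still has `ticks_seen` incremented by every call to `model_periodic_tick()`, yet stays non-runnable until `model_sync_signal()` is called. That is the wasted tick work the message describes.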

I'd suggest that you choose whether your task has to undergo a periodic
timeline or not, i.e. whether it should call rt_task_wait_period() to
wait for the next timeslot, or block on some synchronization object to
resume its processing for the current period. Using both is one too many.

>  
> 
> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
> show about 9000 context switches per second. (the 2 8000 hz tasks and
> the 1000 hz linux interrupt.)
> 
>  
> 
> With periodic timing, it’s 5.4% when the tasks idle and about 9000
> context switches a second. When one of them becomes active, the irq0
> handler is using 10% of the CPU and the sound task is using about 8%.
> These are two kernel tasks.
> 
>  
> 
> Userspace stack size is set to 64k. I forgot to mention this to Philippe
> earlier.
> 
>  
> 
> Perhaps the problem is the overhead that the timer handler introduces
> being able to support multiple skins with individual timebases. It
> sounds like in order to save some cpu cycles, I may want to turn off
> periodicity while threads are idle and also avoid setting threads
> periodic when they can be driven some other way.
> 
>  
> 
> Steven
> 
>  
> 
> 


-- 
Philippe.



* Re: [Xenomai-help] pit
  2008-02-12  7:51 ` Jan Kiszka
  2008-02-12  9:13   ` Philippe Gerum
@ 2008-02-12 13:14   ` Steven Seeger
  2008-02-12 13:33     ` Jan Kiszka
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Seeger @ 2008-02-12 13:14 UTC (permalink / raw)
  To: jan.kiszka; +Cc: xenomai

> That makes sense: Periodic timers keep on firing. That would explain up
> to 16000 IRQ invocations per second. And the other 1000-2000 come from
> Linux?

I have linux set to "tickless" in one setting, and 1000Hz in another.
Weird.

> 
> As suggested earlier: you can reduce the number of IRQ events by basing
> your periodic tasks on the same start date. Then both will be woken up
> at the same times and their priority will decide about the execution order.

The problem here is that many tasks are periodic, but not always
required to run. Having them wake and wait for another period to do
nothing is also overhead.

> Do you need Linux at 1 KHz? You may even want to try NO_HZ.

It's set to "tickless."

> I'm still wondering with what older numbers you compare all the nice
> stats you now generate. Neither older Xenomai nor RTAI provide
> comparable statistics. Are we doing fair comparisons here?

Well, RTAI had that output where it would give load values in 1/10th of
a percent. (IIRC) The comparisons come from that. If, as Philippe says,
RTAI charges that load to the pre-empted context, then I'm not sure
where those numbers were coming from.

I do know that the system worked. I did some more experimenting and
timing of functions, and it seems the source of all my woes is syscalls,
namely mutexes. There are several different resource sub-systems that
are layered on each other in this application. The highest-level one
requires three mutex locks before doing a few I/O operations. It takes me
about 150 us to lock these three mutexes when nothing else is using them.

Steven




* Re: [Xenomai-help] pit
  2008-02-12  9:53 ` Philippe Gerum
@ 2008-02-12 13:20   ` Steven Seeger
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Seeger @ 2008-02-12 13:20 UTC (permalink / raw)
  To: rpm; +Cc: xenomai

> RTAI (at least the version you used) has a single per-task internal
> timer, which is not really a timer, but rather a "resume_time" field,
> the RTAI core is directly testing to know whether it should wake up a
> delayed task.
> 
> Therefore, if your task used to call rt_task_make_periodic() on RTAI,
> then just blocked on a semaphore with no timeout value, then this task
> was dequeued from the timed task list, and for that reason, no oneshot
> timer ticks had to be programmed to wake it up anymore. The drawback is
> that you have no timer object, independent from the task itself.
> Everything has to be related to this single "resume_time" field, on a
> per-task basis. This is why the RTAI core has to save and restore this
> value when nesting some timed operations for instance.
> 
> Xenomai has independent timers, which also means that if you call
> rt_task_set_periodic() on a task, it will arm an internal per-task timer
> (thread->ptimer) which will tick independently, regardless of what your
> task is currently doing. So you will have timer ticks fired for that
> task, even if it is blocked on some synchro with no timeout, in which
> case, the tick handler will attempt to resume the task, but since the
> DELAYED+BLOCKED wait states are conjunctive, it won't be able to.
> 
> I'd suggest that you choose whether your task has to undergo a periodic
> timeline or not, i.e. whether it should call rt_task_wait_period() to
> wait for the next timeslot, or block on some synchronization object to
> resume its processing for the current period. Using both is one too many.
> 

Hi Philippe. Thanks for your explanation. I have made a change to the
sound driver to remove the task periodic timer before waiting on a
synchronization object. The effect is that irq0 doesn't do as much work
unless the tasks are running. In the case of my high-load tasks, one
runs at a variable period (motor ramp-up/ramp-down control) and the
other runs every 2ms to take a couple A/D measurements. The problem is
that these two tasks running together take up too many resources. If the
measurement task runs every 3 ms, then it works fine. ROOT only has
about 20% of the CPU left to it. I will point out that I noticed worse
results when having the variable (faster) period task signal a cond for
the other one to run (which is at a lower priority!) than I did having
them both set periodic. 

As I stated in a previous email, I'm starting to suspect the latencies
in syscalls as the source of my problem.

By the way, I enabled periodic timing again using
rt_timer_set_mode(125000), and I notice better performance in terms of
the irq0 handler under load. I think maybe this is due to the number of
threads running and, as Jan suggested, the simultaneous start date.

Steven




* Re: [Xenomai-help] pit
  2008-02-12 13:14   ` Steven Seeger
@ 2008-02-12 13:33     ` Jan Kiszka
  2008-02-12 13:42       ` Steven Seeger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2008-02-12 13:33 UTC (permalink / raw)
  To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
>> That makes sense: Periodic timers keep on firing. That would explain up
>> to 16000 IRQ invocations per second. And the other 1000-2000 come from
>> Linux?
> 
> I have linux set to "tickless" in one setting, and 1000Hz in another.
> Weird.
> 
>> As suggested earlier: you can reduce the number of IRQ events by basing
>> your periodic tasks on the same start date. Then both will be woken up
>> at the same times and their priority will decide about the execution order.
> 
> The problem here is that many tasks are periodic, but not always
> required to run. Having them wake and wait for another period to do
> nothing is also overhead.

I'm not saying this. I'm saying that periodic task _timers_ fire anyway,
independent of the task waiting for them. So you should try to make them
fire at the same slots. That reduces the IRQ prologue/epilogue overhead
to 1, not n.

> 
>> Do you need Linux at 1 KHz? You may even want to try NO_HZ.
> 
> It's set to "tickless."
> 
>> I'm still wondering with what older numbers you compare all the nice
>> stats you now generate. Neither older Xenomai nor RTAI provide
>> comparable statistics. Are we doing fair comparisons here?
> 
> Well, RTAI had that output where it would give load values in 1/10th of
> a percent. (IIRC) The comparisons come from that. If, as Philippe says,
> RTAI charges that load to the pre-empted context, then I'm not sure
> where those numbers were coming from.

A fair comparison could be a non-real-time Linux benchmark that consumes
all the remaining CPU resources. Measure its execution time and you have
a reasonable metric for comparing the overall overhead. (The ROOT thread
CPU share with latest Xenomai should provide the same number, though.)
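
A minimal sketch of such a benchmark in C, under stated assumptions: `burn_cpu`, `time_workload_s` and `cpu_share_lost` are hypothetical helper names, the workload is an arbitrary summation loop, and `clock_gettime(CLOCK_MONOTONIC, ...)` is assumed to be available (POSIX). The idea is to time one fixed workload on an otherwise idle system and again with the real-time tasks loaded, then compare the two wall-clock times.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Fixed CPU-bound workload: sums 0 .. iterations-1. The volatile
 * accumulator keeps the compiler from optimizing the loop away. */
static uint64_t burn_cpu(uint64_t iterations)
{
    volatile uint64_t acc = 0;

    for (uint64_t i = 0; i < iterations; i++)
        acc += i;
    return acc;
}

/* Wall-clock seconds needed to complete the fixed workload. */
static double time_workload_s(uint64_t iterations)
{
    struct timespec t0, t1;
    int rc0 = clock_gettime(CLOCK_MONOTONIC, &t0);

    (void)burn_cpu(iterations);

    int rc1 = clock_gettime(CLOCK_MONOTONIC, &t1);

    assert(rc0 == 0 && rc1 == 0);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Fraction of the CPU that was not available to the benchmark, given the
 * time on an idle system (baseline_s) and under real-time load (loaded_s). */
static double cpu_share_lost(double baseline_s, double loaded_s)
{
    assert(baseline_s > 0.0 && loaded_s >= baseline_s);
    return 1.0 - baseline_s / loaded_s;
}
```

For example, hypothetical measurements of 10.0 s on the idle system and 12.5 s under load would give `cpu_share_lost(10.0, 12.5)` of about 0.2, i.e. roughly 20% of the CPU consumed by everything other than the benchmark. The same program can be run unchanged on both systems being compared.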

> 
> I do know that the system worked. I did some more experimenting and
> timing functions, and it seems the source of all my woes are syscalls.
> Namely, mutexes. There are several different resource sub-systems that
> layer off each other in this application. The highest-level one requires
> three mutex locks before doing a few IO operations. It takes me about
> 150 us to lock these three mutexes when nothing else is using them.

Lock nestings on a real-time system should be avoided; nesting levels >=
2 can generally be considered a fatal design mistake. Just imagine
what the worst-case waiting time for your task is if all those locks are
contended! Maybe it is also worth thinking about some lock-less sync
patterns for some of your scenarios.
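
As a rough illustration of that worst case, the following C sketch adds up the time a task could spend waiting when every lock in a nested chain is contended. It is a deliberately simplified bound: the function names and the example hold times are hypothetical, and the model assumes each lock is held by exactly one other task for at most a known time.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified bound: if a task must take n locks in sequence and each one is
 * currently held by some other task for up to hold_us[i] microseconds, the
 * task can wait for the sum of those hold times before it owns them all.
 * Scheduling overhead and chained blocking are ignored; they can only make
 * the real bound larger. */
static unsigned long worst_case_lock_wait_us(const unsigned long *hold_us,
                                             size_t n)
{
    unsigned long total = 0;

    assert(hold_us != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        total += hold_us[i];
    return total;
}

/* Does the worst-case wait still fit within one task period? */
static int fits_in_period(unsigned long wait_us, unsigned long period_us)
{
    return wait_us < period_us;
}
```

With made-up hold times of 40, 60 and 80 us for three nested locks, the bound is 180 us, which would not fit in the 125 us period of an 8000 Hz task.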

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] pit
  2008-02-12 13:33     ` Jan Kiszka
@ 2008-02-12 13:42       ` Steven Seeger
  2008-02-12 14:09         ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Seeger @ 2008-02-12 13:42 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

> I'm not saying this. I'm saying that periodic task _timers_ fire anyway,
> independent of the task waiting for them. So you should try to make them
> fire at the same slots. That reduces the IRQ prologue/epilogue overhead
> to 1, not n.

This makes sense, but if I simply disable the periodic timer then it
should have 0 timer overhead, and then I turn it periodic when I need
the task. The task timer won't fire if the periodic timer is disabled,
right?

> A fair comparison could be a non-real-time Linux benchmark that consumes
> all the remaining CPU resources. Measure its execution time and you have
> a reasonable metric for comparing the overall overhead. (The ROOT thread
> CPU share with latest Xenomai should provide the same number, though.)

I should really get the old flash and take some measurements as
comparison.

> Lock nestings on a real-time system should be avoided, nesting levels >=
> 2 can generally be considered as a fatal design mistake. Just imagine
> what the worst-case waiting time for your task is if all those locks are
> contended! Maybe it is also worth thinking about some lock-less sync
> patterns for some of your scenarios.

Actually I disagree in this case. The reason is that the three
levels aren't interlocked. So, level 1 is the core, level 2 is something
that uses the core, and level 3 is something that uses something that
uses the core. Each one takes a little longer than the one below it, but
there is a very small worst case time for each that is deterministic. As
this time is (or should be!) much smaller than the base timer period
(125us) then things should be ok. They were, after all, just fine on the
RTAI version of this app. I was very pleased with the jitter and
response even on a crappy non-realtime friendly Geode.

I am starting to think about certain things, though, in order to keep
the syscalls to a minimum. We'd like to use Xenomai mainly for the
debugging capabilities that RTAI lacked. Having everything all in one
context makes for easy development. Obviously the sound driver is in the
kernel space, but that's small and simple.

Steven




* Re: [Xenomai-help] pit
  2008-02-12 13:42       ` Steven Seeger
@ 2008-02-12 14:09         ` Jan Kiszka
  2008-02-12 14:57           ` Steven Seeger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2008-02-12 14:09 UTC (permalink / raw)
  To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
>> I'm not saying this. I'm saying that periodic task _timers_ fire anyway,
>> independent of the task waiting for them. So you should try to make them
>> fire at the same slots. That reduces the IRQ prologue/epilogue overhead
>> to 1, not n.
> 
> This makes sense, but if I simply disable the periodic timer then it
> should have 0 timer overhead, and then I turn it periodic when I need
> the task. The task timer won't fire if the periodic timer is disabled,
> right?

For sure, if there are system states where the periodic tasks do not
have to run, calling rt_task_set_periodic(..., TM_INFINITE) will help to
reduce unneeded load.
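
The saving can be estimated with a small C model. This is a sketch under stated assumptions rather than Xenomai code: `struct phase` and `count_ticks` are hypothetical names, and "disarming" stands for stopping the per-task periodic timer during phases in which the task is not needed, as the call mentioned above is meant to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One stretch of system time during which the task is either needed or idle. */
struct phase {
    unsigned duration_ms;
    bool task_needed;
};

/* Model only: number of periodic timer ticks delivered for one task over a
 * sequence of phases. If disarm_when_idle is set, the timer is assumed to be
 * stopped during phases in which the task is not needed. */
static unsigned long count_ticks(const struct phase *phases, size_t n,
                                 unsigned rate_hz, bool disarm_when_idle)
{
    unsigned long ticks = 0;

    assert(phases != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (disarm_when_idle && !phases[i].task_needed)
            continue;  /* timer stopped: no ticks in this phase */
        ticks += (unsigned long)phases[i].duration_ms * rate_hz / 1000;
    }
    return ticks;
}
```

For a hypothetical second made of 900 ms idle followed by 100 ms active, an 8000 Hz task timer delivers 8000 ticks if it is always armed and 800 ticks if it is disarmed while idle.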

> 
>> A fair comparison could be a non-real-time Linux benchmark that consumes
>> all the remaining CPU resources. Measure its execution time and you have
>> a reasonable metric for comparing the overall overhead. (The ROOT thread
>> CPU share with latest Xenomai should provide the same number, though.)
> 
> I should really get the old flash and take some measurements as
> comparison.
> 
>> Lock nestings on a real-time system should be avoided, nesting levels >=
>> 2 can generally be considered as a fatal design mistake. Just imagine
>> what the worst-case waiting time for your task is if all those locks are
>> contended! Maybe it is also worth thinking about some lock-less sync
>> patterns for some of your scenarios.
> 
> Actually I disagree in this case. The reason is that each of the three
> levels aren't interlocked. So, level 1 is the core, level 2 is something
> that uses the core, and level 3 is something that uses something that
> uses the core. Each one takes a little longer than the one below it, but
> there is a very small worst case time for each that is deterministic. As

Of course, the above was a rule of thumb, and there can always be
reasonable exceptions. But they are /generally/ few. :)

> this time is (or should be!) much smaller than the base timer period
> (125us) then things should be ok. They were, after all, just fine on the
> RTAI version of this app. I was very pleased with the jitter and
> response even on a crappy non-realtime friendly Geode.

I bet the overhead was not measurable because everything lived in kernel
space, right?

> 
> I am starting to think about certain things, though, in order to keep
> the syscalls to a minimum. We'd like to use Xenomai mainly for the
> debugging capabilities that RTAI lacked. Having everything all in one
> context makes for easy development. Obviously the sound driver is in the
> kernel space, but that's small and simple.

Keep another advantage in mind: going to user space allows you (or your
contractor) to distribute closed-source applications without consulting
costly lawyers - if that can help at all... :)

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-help] pit
  2008-02-12 14:09         ` Jan Kiszka
@ 2008-02-12 14:57           ` Steven Seeger
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Seeger @ 2008-02-12 14:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

> For sure, if there are system states where the periodic tasks do not
> have to run, calling rt_task_set_periodic(..., TM_INFINITE) will help to
> reduce unneeded load.

That's what I thought.

> I bet the overhead was not measurable because everything lived in kernel
> space, right?

It was negligible, but I don't know for certain if that's because it was
in kernel space. It seems to be the case, though. 
 
> Keep another advantage in mind: going to user space allows you (or your
> contractor) to distribute closed-source applications without consulting
> costly lawyers - if that can help at all... :)

The source code is distributed with each unit. On bad sectors of a flash
card. But hey, it's there.

I'm going out of town but will return next week. I'll be thinking about
the design and share some ideas with you all. I will also go kernel
fault hunting for Philippe now that I know what he wants. I appreciate
everyone's help and feel bad that at this juncture I have to disappear
for a while. I can tell that you're all anxious to help figure out
what's going on so we (you) can make Xenomai a wonderful project that
leads to peace on earth and clean energy. ;)

Steven



