* [Xenomai-help] pit
From: Steven Seeger @ 2008-02-12 0:45 UTC
To: xenomai

I compiled the kernel for 586 and am running the PIT timer. I still get
the 17000-18000 context switches per second, and now the irq0 handler is
taking up 11% of the CPU instead of only 5% when the two 8000 Hz tasks
are loaded but delayed on events. I think that the problem isn't with
the PIT, but with the tasks being periodic even though they are blocked.

Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
show about 9000 context switches per second (the two 8000 Hz tasks and
the 1000 Hz Linux interrupt).

With periodic timing, it's 5.4% when the tasks idle and about 9000
context switches a second. When one of them becomes active, the irq0
handler is using 10% of the CPU and the sound task is using about 8%.
These are two kernel tasks.

Userspace stack size is set to 64k. I forgot to mention this to Philippe
earlier.

Perhaps the problem is the overhead that the timer handler introduces by
being able to support multiple skins with individual timebases. It
sounds like in order to save some CPU cycles, I may want to turn off
periodicity while threads are idle and also avoid setting threads
periodic when they can be driven some other way.

Steven
* Re: [Xenomai-help] pit
From: Jan Kiszka @ 2008-02-12 7:51 UTC
To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
> I compiled the kernel for 586 and am running the PIT timer. I still get
> the 17000-18000 context switches per second, and now the irq0 handler is
> taking up 11% of the CPU instead of only 5% when the two 8000 Hz tasks
> are loaded but delayed on events. I think that the problem isn't with
> the PIT, but with the tasks being periodic even though they are blocked.

That makes sense: periodic timers keep on firing. That would explain up
to 16000 IRQ invocations per second. And the other 1000-2000 come from
Linux?

As suggested earlier: you can reduce the number of IRQ events by basing
your periodic tasks on the same start date. Then both will be woken up
at the same times and their priorities will decide the execution order.

> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
> show about 9000 context switches per second (the two 8000 Hz tasks and
> the 1000 Hz Linux interrupt).

Do you need Linux at 1 kHz? You may even want to try NO_HZ.

> With periodic timing, it's 5.4% when the tasks idle and about 9000
> context switches a second. When one of them becomes active, the irq0
> handler is using 10% of the CPU and the sound task is using about 8%.
> These are two kernel tasks.
>
> Userspace stack size is set to 64k. I forgot to mention this to Philippe
> earlier.
>
> Perhaps the problem is the overhead that the timer handler introduces by
> being able to support multiple skins with individual timebases. It
> sounds like in order to save some CPU cycles, I may want to turn off
> periodicity while threads are idle and also avoid setting threads
> periodic when they can be driven some other way.

I'm still wondering which older numbers you are comparing all the nice
stats you now generate against. Neither older Xenomai nor RTAI provides
comparable statistics. Are we doing fair comparisons here?

Jan
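
Jan's point about a shared start date can be checked with a small counting model. The sketch below is not Xenomai code: `count_timer_irqs` and its parameters are hypothetical names, and it assumes that every distinct expiry instant costs exactly one timer interrupt. It counts those instants for two periodic timers of equal period, given their start offsets.

```c
#include <stdint.h>

/* Hypothetical model, not Xenomai code: count the distinct instants at which
 * at least one of two periodic timers expires within [0, window_ns).
 * Each distinct instant stands for one timer interrupt. */
uint64_t count_timer_irqs(uint64_t period_ns, uint64_t offset_a_ns,
                          uint64_t offset_b_ns, uint64_t window_ns)
{
    uint64_t next_a = offset_a_ns;   /* next expiry of timer A */
    uint64_t next_b = offset_b_ns;   /* next expiry of timer B */
    uint64_t irqs = 0;

    while (next_a < window_ns || next_b < window_ns) {
        /* The earlier of the two pending expiries fires next. */
        uint64_t now = next_a < next_b ? next_a : next_b;

        irqs++;
        /* Timers expiring at the same instant share that interrupt. */
        if (next_a == now)
            next_a += period_ns;
        if (next_b == now)
            next_b += period_ns;
    }
    return irqs;
}
```

With a 125000 ns period (8000 Hz) over one second, calling it with offsets 0 and 0 gives 8000 interrupts, while offsets 0 and 50000 give 16000, which matches the "up to 16000 IRQ invocations per second" estimate above.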
* Re: [Xenomai-help] pit
From: Philippe Gerum @ 2008-02-12 9:13 UTC
To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Steven Seeger wrote:
>> I compiled the kernel for 586 and am running the PIT timer. I still get
>> the 17000-18000 context switches per second, and now the irq0 handler is
>> taking up 11% of the CPU instead of only 5% when the two 8000 Hz tasks
>> are loaded but delayed on events. I think that the problem isn't with
>> the PIT, but with the tasks being periodic even though they are blocked.
>
> That makes sense: periodic timers keep on firing. That would explain up
> to 16000 IRQ invocations per second. And the other 1000-2000 come from
> Linux?
>
> As suggested earlier: you can reduce the number of IRQ events by basing
> your periodic tasks on the same start date. Then both will be woken up
> at the same times and their priorities will decide the execution order.
>
>> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
>> show about 9000 context switches per second (the two 8000 Hz tasks and
>> the 1000 Hz Linux interrupt).
>
> Do you need Linux at 1 kHz? You may even want to try NO_HZ.
>
>> With periodic timing, it's 5.4% when the tasks idle and about 9000
>> context switches a second. When one of them becomes active, the irq0
>> handler is using 10% of the CPU and the sound task is using about 8%.
>> These are two kernel tasks.
>>
>> Userspace stack size is set to 64k. I forgot to mention this to Philippe
>> earlier.
>>
>> Perhaps the problem is the overhead that the timer handler introduces by
>> being able to support multiple skins with individual timebases. It
>> sounds like in order to save some CPU cycles, I may want to turn off
>> periodicity while threads are idle and also avoid setting threads
>> periodic when they can be driven some other way.
>
> I'm still wondering which older numbers you are comparing all the nice
> stats you now generate against. Neither older Xenomai nor RTAI provides
> comparable statistics. Are we doing fair comparisons here?

No, because RTAI charges interrupt load to the preempted task context.

-- 
Philippe.
* Re: [Xenomai-help] pit
From: Steven Seeger @ 2008-02-12 13:14 UTC
To: jan.kiszka; +Cc: xenomai

> That makes sense: periodic timers keep on firing. That would explain up
> to 16000 IRQ invocations per second. And the other 1000-2000 come from
> Linux?

I have Linux set to "tickless" in one setting, and 1000 Hz in another.
Weird.

> As suggested earlier: you can reduce the number of IRQ events by basing
> your periodic tasks on the same start date. Then both will be woken up
> at the same times and their priorities will decide the execution order.

The problem here is that many tasks are periodic, but not always
required to run. Having them wake and wait for another period to do
nothing is also overhead.

> Do you need Linux at 1 kHz? You may even want to try NO_HZ.

It's set to "tickless."

> I'm still wondering which older numbers you are comparing all the nice
> stats you now generate against. Neither older Xenomai nor RTAI provides
> comparable statistics. Are we doing fair comparisons here?

Well, RTAI had that output where it would give load values in 1/10th of
a percent (IIRC). The comparisons come from that. If, as Philippe says,
RTAI charges that load to the preempted context, then I'm not sure
where those numbers were coming from.

I do know that the system worked. I did some more experimenting and
timed some functions, and it seems the source of all my woes is
syscalls, namely mutexes. There are several different resource
sub-systems that are layered on top of each other in this application.
The highest-level one requires three mutex locks before doing a few I/O
operations. It takes me about 150 us to lock these three mutexes when
nothing else is using them.

Steven
* Re: [Xenomai-help] pit
From: Jan Kiszka @ 2008-02-12 13:33 UTC
To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
>> That makes sense: periodic timers keep on firing. That would explain up
>> to 16000 IRQ invocations per second. And the other 1000-2000 come from
>> Linux?
>
> I have Linux set to "tickless" in one setting, and 1000 Hz in another.
> Weird.
>
>> As suggested earlier: you can reduce the number of IRQ events by basing
>> your periodic tasks on the same start date. Then both will be woken up
>> at the same times and their priorities will decide the execution order.
>
> The problem here is that many tasks are periodic, but not always
> required to run. Having them wake and wait for another period to do
> nothing is also overhead.

I'm not saying that. I'm saying that periodic task _timers_ fire anyway,
independent of the task waiting for them. So you should try to make them
fire at the same slots. That reduces the IRQ prologue/epilogue overhead
to 1 per slot, not n.

>> Do you need Linux at 1 kHz? You may even want to try NO_HZ.
>
> It's set to "tickless."
>
>> I'm still wondering which older numbers you are comparing all the nice
>> stats you now generate against. Neither older Xenomai nor RTAI provides
>> comparable statistics. Are we doing fair comparisons here?
>
> Well, RTAI had that output where it would give load values in 1/10th of
> a percent (IIRC). The comparisons come from that. If, as Philippe says,
> RTAI charges that load to the preempted context, then I'm not sure
> where those numbers were coming from.

A fair comparison could be a non-real-time Linux benchmark that consumes
all the remaining CPU resources. Measure its execution time and you have
a reasonable metric for comparing the overall overhead. (The ROOT thread
CPU share with latest Xenomai should provide the same number, though.)

> I do know that the system worked. I did some more experimenting and
> timed some functions, and it seems the source of all my woes is
> syscalls, namely mutexes. There are several different resource
> sub-systems that are layered on top of each other in this application.
> The highest-level one requires three mutex locks before doing a few I/O
> operations. It takes me about 150 us to lock these three mutexes when
> nothing else is using them.

Lock nesting on a real-time system should be avoided; nesting levels >=
2 can generally be considered a fatal design mistake. Just imagine
what the worst-case waiting time for your task is if all those locks are
contended! Maybe it is also worth thinking about some lock-less sync
patterns for some of your scenarios.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
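
As one illustration of the lock-less sync patterns Jan mentions, here is a minimal single-producer/single-consumer ring buffer using C11 atomics. It is only a sketch under stated assumptions: exactly one thread pushes and exactly one thread pops, `RING_SIZE` is a power of two, and all names (`spsc_ring`, `ring_push`, `ring_pop`) are hypothetical. It is not taken from the application discussed in this thread and uses no Xenomai API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u   /* must be a power of two */

/* Hypothetical single-producer/single-consumer queue of samples. */
struct spsc_ring {
    _Atomic size_t head;          /* next slot to write, advanced by the producer */
    _Atomic size_t tail;          /* next slot to read, advanced by the consumer */
    int32_t slots[RING_SIZE];
};

void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: never blocks, returns false when the ring is full. */
bool ring_push(struct spsc_ring *r, int32_t value)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = value;
    /* Release: the slot write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: never blocks, returns false when the ring is empty. */
bool ring_pop(struct spsc_ring *r, int32_t *value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;
    *value = r->slots[tail & (RING_SIZE - 1)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Neither side takes a mutex or makes a syscall, so the cost of handing over a sample stays bounded. The price is that a full or empty ring has to be handled by the caller instead of by blocking.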
* Re: [Xenomai-help] pit
From: Steven Seeger @ 2008-02-12 13:42 UTC
To: Jan Kiszka; +Cc: xenomai

> I'm not saying that. I'm saying that periodic task _timers_ fire anyway,
> independent of the task waiting for them. So you should try to make them
> fire at the same slots. That reduces the IRQ prologue/epilogue overhead
> to 1 per slot, not n.

This makes sense, but if I simply disable the periodic timer then it
should have zero timer overhead, and then I turn it periodic again when
I need the task. The task timer won't fire if the periodic timer is
disabled, right?

> A fair comparison could be a non-real-time Linux benchmark that consumes
> all the remaining CPU resources. Measure its execution time and you have
> a reasonable metric for comparing the overall overhead. (The ROOT thread
> CPU share with latest Xenomai should provide the same number, though.)

I should really get the old flash and take some measurements for
comparison.

> Lock nesting on a real-time system should be avoided; nesting levels >=
> 2 can generally be considered a fatal design mistake. Just imagine
> what the worst-case waiting time for your task is if all those locks are
> contended! Maybe it is also worth thinking about some lock-less sync
> patterns for some of your scenarios.

Actually I disagree in this case. The reason is that the three levels
aren't interlocked. So, level 1 is the core, level 2 is something
that uses the core, and level 3 is something that uses something that
uses the core. Each one takes a little longer than the one below it, but
there is a very small worst-case time for each that is deterministic. As
this time is (or should be!) much smaller than the base timer period
(125 us), things should be OK. They were, after all, just fine on the
RTAI version of this app. I was very pleased with the jitter and
response even on a crappy non-realtime-friendly Geode.

I am starting to think about certain things, though, in order to keep
the syscalls to a minimum. We'd like to use Xenomai mainly for the
debugging capabilities that RTAI lacked. Having everything all in one
context makes for easy development. Obviously the sound driver is in
kernel space, but that's small and simple.

Steven
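
The budget argument above (the summed worst-case cost of the three lock levels should stay well below the 125 us base period) can be written down as a tiny check. This is a sketch only: the function names are hypothetical, and the per-level costs passed in are whatever the caller has measured. No per-level figures are given in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the worst-case cost of each nested lock level, in nanoseconds. */
uint64_t total_lock_cost_ns(const uint64_t *level_cost_ns, size_t levels)
{
    uint64_t total = 0;

    for (size_t i = 0; i < levels; i++)
        total += level_cost_ns[i];
    return total;
}

/* True when the summed cost is strictly smaller than one timer period. */
bool fits_in_period(const uint64_t *level_cost_ns, size_t levels,
                    uint64_t period_ns)
{
    return total_lock_cost_ns(level_cost_ns, levels) < period_ns;
}
```

As a usage example, assume the roughly 150 us reported earlier in the thread for locking all three mutexes is split evenly, 50000 ns per level (the even split is an assumption made here for illustration). Then `fits_in_period` with a 125000 ns period returns false: under that assumption the uncontended locking cost alone already exceeds one base period.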
* Re: [Xenomai-help] pit
From: Jan Kiszka @ 2008-02-12 14:09 UTC
To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
>> I'm not saying that. I'm saying that periodic task _timers_ fire anyway,
>> independent of the task waiting for them. So you should try to make them
>> fire at the same slots. That reduces the IRQ prologue/epilogue overhead
>> to 1 per slot, not n.
>
> This makes sense, but if I simply disable the periodic timer then it
> should have zero timer overhead, and then I turn it periodic again when
> I need the task. The task timer won't fire if the periodic timer is
> disabled, right?

For sure, if there are system states where the periodic tasks do not
have to run, calling rt_task_set_periodic(..., TM_INFINITE) will help to
reduce unneeded load.

>> A fair comparison could be a non-real-time Linux benchmark that consumes
>> all the remaining CPU resources. Measure its execution time and you have
>> a reasonable metric for comparing the overall overhead. (The ROOT thread
>> CPU share with latest Xenomai should provide the same number, though.)
>
> I should really get the old flash and take some measurements for
> comparison.
>
>> Lock nesting on a real-time system should be avoided; nesting levels >=
>> 2 can generally be considered a fatal design mistake. Just imagine
>> what the worst-case waiting time for your task is if all those locks are
>> contended! Maybe it is also worth thinking about some lock-less sync
>> patterns for some of your scenarios.
>
> Actually I disagree in this case. The reason is that the three levels
> aren't interlocked. So, level 1 is the core, level 2 is something
> that uses the core, and level 3 is something that uses something that
> uses the core. Each one takes a little longer than the one below it, but
> there is a very small worst-case time for each that is deterministic. As

Of course, the above was a rule of thumb, and there can always be
reasonable exceptions. But they are /generally/ few. :)

> this time is (or should be!) much smaller than the base timer period
> (125 us), things should be OK. They were, after all, just fine on the
> RTAI version of this app. I was very pleased with the jitter and
> response even on a crappy non-realtime-friendly Geode.

I bet the overhead was not measurable because everything lived in kernel
space, right?

> I am starting to think about certain things, though, in order to keep
> the syscalls to a minimum. We'd like to use Xenomai mainly for the
> debugging capabilities that RTAI lacked. Having everything all in one
> context makes for easy development. Obviously the sound driver is in
> kernel space, but that's small and simple.

Keep another advantage in mind: going to user space allows you (or your
contractor) to distribute closed-source applications without consulting
costly lawyers - if that can help at all... :)

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
* Re: [Xenomai-help] pit
From: Steven Seeger @ 2008-02-12 14:57 UTC
To: Jan Kiszka; +Cc: xenomai

> For sure, if there are system states where the periodic tasks do not
> have to run, calling rt_task_set_periodic(..., TM_INFINITE) will help to
> reduce unneeded load.

That's what I thought.

> I bet the overhead was not measurable because everything lived in kernel
> space, right?

It was negligible, but I don't know for certain if that's because it was
in kernel space. It seems to be the case, though.

> Keep another advantage in mind: going to user space allows you (or your
> contractor) to distribute closed-source applications without consulting
> costly lawyers - if that can help at all... :)

The source code is distributed with each unit. On bad sectors of a flash
card. But hey, it's there.

I'm going out of town but will return next week. I'll be thinking about
the design and will share some ideas with you all. I will also go kernel
fault hunting for Philippe now that I know what he wants. I appreciate
everyone's help and feel bad that at this juncture I have to disappear
for a while. I can tell that you're all anxious to help figure out
what's going on so we (you) can make Xenomai a wonderful project that
leads to peace on earth and clean energy. ;)

Steven
* Re: [Xenomai-help] pit
From: Philippe Gerum @ 2008-02-12 9:53 UTC
To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
> I compiled the kernel for 586 and am running the PIT timer. I still get
> the 17000-18000 context switches per second, and now the irq0 handler is
> taking up 11% of the CPU instead of only 5% when the two 8000 Hz tasks
> are loaded but delayed on events. I think that the problem isn't with
> the PIT, but with the tasks being periodic even though they are blocked.

RTAI (at least the version you used) has a single per-task internal
timer, which is not really a timer, but rather a "resume_time" field
that the RTAI core tests directly to know whether it should wake up a
delayed task.

Therefore, if your task used to call rt_task_make_periodic() on RTAI,
then just blocked on a semaphore with no timeout value, this task
was dequeued from the timed task list, and for that reason, no oneshot
timer ticks had to be programmed to wake it up anymore. The drawback is
that you have no timer object independent from the task itself.
Everything has to be related to this single "resume_time" field, on a
per-task basis. This is why the RTAI core has to save and restore this
value when nesting some timed operations, for instance.

Xenomai has independent timers, which also means that if you call
rt_task_set_periodic() on a task, it will arm an internal per-task timer
(thread->ptimer) which will tick independently, regardless of what your
task is currently doing. So you will have timer ticks fired for that
task even if it is blocked on some synchronization object with no
timeout. In that case, the tick handler will attempt to resume the task,
but since the DELAYED+BLOCKED wait states are conjunctive, it won't be
able to.

I'd suggest that you choose whether your task has to follow a periodic
timeline or not, i.e. whether it should call rt_task_wait_period() to
wait for the next timeslot, or block on some synchronization object to
resume its processing for the current period. Using both is one too many.

> Running in PIT mode with periodic timing on uses only 9.5% of the CPU. I
> show about 9000 context switches per second (the two 8000 Hz tasks and
> the 1000 Hz Linux interrupt).
>
> With periodic timing, it's 5.4% when the tasks idle and about 9000
> context switches a second. When one of them becomes active, the irq0
> handler is using 10% of the CPU and the sound task is using about 8%.
> These are two kernel tasks.
>
> Userspace stack size is set to 64k. I forgot to mention this to Philippe
> earlier.
>
> Perhaps the problem is the overhead that the timer handler introduces by
> being able to support multiple skins with individual timebases. It
> sounds like in order to save some CPU cycles, I may want to turn off
> periodicity while threads are idle and also avoid setting threads
> periodic when they can be driven some other way.
>
> Steven

-- 
Philippe.
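
The "conjunctive wait states" remark can be made concrete with a toy model. The sketch below is not the nucleus implementation: the state bits, the structure and the function names are all hypothetical. It only illustrates the rule Philippe describes: a thread becomes runnable once every wait condition is lifted, so a periodic tick that lifts only the delay cannot resume a thread that is also pending on a synchronization object, yet the tick itself is still taken.

```c
#include <stdbool.h>

/* Hypothetical state bits for this sketch; not the nucleus' own names. */
enum {
    ST_DELAYED = 1 << 0,   /* waiting for a timer, e.g. the next period */
    ST_BLOCKED = 1 << 1    /* pending on a synchronization object */
};

struct model_thread {
    unsigned state;        /* OR of the wait conditions holding the thread */
    unsigned long ticks;   /* periodic timer expiries seen for this thread */
};

/* A thread can run only once every wait condition has been lifted. */
bool is_runnable(const struct model_thread *t)
{
    return t->state == 0;
}

/* Periodic timer expiry: costs a tick, lifts the delay condition only.
 * Returns whether the thread actually resumed. */
bool on_timer_tick(struct model_thread *t)
{
    t->ticks++;
    t->state &= ~(unsigned)ST_DELAYED;
    return is_runnable(t);
}

/* Synchronization object signaled: lifts the blocked condition only. */
bool on_sync_signal(struct model_thread *t)
{
    t->state &= ~(unsigned)ST_BLOCKED;
    return is_runnable(t);
}
```

Starting a thread in `ST_BLOCKED` and calling `on_timer_tick` 8000 times leaves it not runnable while `ticks` reaches 8000, which is the wasted per-second timer work for one blocked 8000 Hz task in this model. A single `on_sync_signal` then makes it runnable.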
* Re: [Xenomai-help] pit
From: Steven Seeger @ 2008-02-12 13:20 UTC
To: rpm; +Cc: xenomai

> RTAI (at least the version you used) has a single per-task internal
> timer, which is not really a timer, but rather a "resume_time" field
> that the RTAI core tests directly to know whether it should wake up a
> delayed task.
>
> Therefore, if your task used to call rt_task_make_periodic() on RTAI,
> then just blocked on a semaphore with no timeout value, this task
> was dequeued from the timed task list, and for that reason, no oneshot
> timer ticks had to be programmed to wake it up anymore. The drawback is
> that you have no timer object independent from the task itself.
> Everything has to be related to this single "resume_time" field, on a
> per-task basis. This is why the RTAI core has to save and restore this
> value when nesting some timed operations, for instance.
>
> Xenomai has independent timers, which also means that if you call
> rt_task_set_periodic() on a task, it will arm an internal per-task timer
> (thread->ptimer) which will tick independently, regardless of what your
> task is currently doing. So you will have timer ticks fired for that
> task even if it is blocked on some synchronization object with no
> timeout. In that case, the tick handler will attempt to resume the task,
> but since the DELAYED+BLOCKED wait states are conjunctive, it won't be
> able to.
>
> I'd suggest that you choose whether your task has to follow a periodic
> timeline or not, i.e. whether it should call rt_task_wait_period() to
> wait for the next timeslot, or block on some synchronization object to
> resume its processing for the current period. Using both is one too many.

Hi Philippe. Thanks for your explanation. I have made a change to the
sound driver to remove the task periodic timer before waiting on a
synchronization object. The effect is that irq0 doesn't do as much work
unless the tasks are running.

In the case of my high-load tasks, one runs at a variable period (motor
ramp-up/ramp-down control) and the other runs every 2 ms to take a
couple of A/D measurements. The problem is that these two tasks running
together take up too many resources. If the measurement task runs every
3 ms, then it works fine. ROOT only has about 20% of the CPU left to it.

I will point out that I noticed worse results when having the variable
(faster) period task signal a cond for the other one to run (which is at
a lower priority!) than I did having them both set periodic. As I stated
in a previous email, I'm starting to suspect the latencies in syscalls
as the source of my problem.

By the way, I enabled periodic timing again using
rt_timer_set_mode(125000), and I notice better performance in terms of
the irq0 handler under load. I think maybe this is due to the number of
threads running and, as Jan suggested, the simultaneous start date.

Steven
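
The figures in this message relate to each other by simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes that the 125000 passed to rt_timer_set_mode() is a tick length in nanoseconds, which is consistent with the 8000 Hz rate quoted earlier in the thread. With a periodic timebase, every interval is rounded to whole ticks.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Tick length in nanoseconds for a timebase running at rate_hz. */
uint64_t tick_ns_for_rate(uint64_t rate_hz)
{
    return NS_PER_SEC / rate_hz;
}

/* Number of whole ticks needed to cover interval_ns, rounded up. */
uint64_t ns_to_ticks(uint64_t interval_ns, uint64_t tick_ns)
{
    return (interval_ns + tick_ns - 1) / tick_ns;
}
```

For example, `tick_ns_for_rate(8000)` is 125000, the 2 ms measurement period is `ns_to_ticks(2000000, 125000)` = 16 ticks, and the 3 ms variant is 24 ticks. Any period that is not a multiple of 125 us gets stretched to the next tick, which also forces timers onto common slots.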