* [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 19:59 UTC
To: xenomai

We are still running into issues where irq0 is using a lot of CPU time. The same threads on an RTAI system on the same hardware used about 13% of the CPU but are using closer to 60% on Xenomai. I know there is some overhead with userspace calls but the irq0 handler alone accounts for 20% of it. Are there any options that can speed things up?

We've tried both one shot and periodic modes. I confirmed that the ISA i/o timing is 1.3usec per outb as expected.

Steven
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-23 22:54 UTC
To: Steven Seeger; +Cc: xenomai

On Mon, 2009-03-23 at 15:59 -0400, Steven Seeger wrote:
> We are still running into issues where irq0 is using a lot of CPU
> time. The same threads on an RTAI system on the same hardware used
> about 13% of the CPU but are using closer to 60% on Xenomai.

What are you comparing, I mean, exactly? All kernel RTAI vs all userland Xenomai?

The timer handler is charged for the callbacks it runs, so it really boils down to what code is attached to Xenomai timers, aside from the built-in scheduler tick.

When you measure that load, what does /proc/xenomai/timerstat say?

> I know
> there is some overhead with userspace calls but the irq0 handler alone
> accounts for 20% of it. Are there any options that can speed things up?

Yeah, but you won't like it: buy a Geode that has SEP support for syscalls and a working TSC, then switch on --enable-x86-sep. Ok, granted, that is _not_ funny.

What would be interesting is to get the value reported for the timer interrupt when the standard latency test runs at the same frequency as your application does (use the -p option).

> We've tried both one shot and periodic modes. I confirmed that the ISA
> i/o timing is 1.3usec per outb as expected.
>
> Steven
>
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core

--
Philippe.
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 23:03 UTC
To: Philippe Gerum; +Cc: xenomai

> What are you comparing, I mean, exactly?
> All kernel RTAI vs all userland Xenomai?

Yes.

> The timer handler is charged for the callbacks it runs, so it really
> boils down to what code is attached to Xenomai timers, aside from the
> built-in scheduler tick.

In this case we have only a single RTDM timer that fires every 125 us and does nothing (as a test). It will be easy to remove this and compare how much CPU the irq0 handler uses without it. I know it'll be at least 14 or 15.

> When you measure that load, what does /proc/xenomai/timerstat say?
>
>> I know
>> there is some overhead with userspace calls but the irq0 handler
>> alone accounts for 20% of it. Are there any options that can speed
>> things up?
>
> Yeah, but you won't like it: buy a Geode that has SEP support for
> syscalls and a working TSC, then switch on --enable-x86-sep. Ok,
> granted, that is _not_ funny.

We have a new Geode that has SEP and yes, things are faster. Just how much overhead does syscall create? Is there no better option other than SEP? If we could have kernel threads work without corrupting userland FPU contexts then we could use our two higher priority drivers in a kernel module to save overhead.

Is TSC really going to make that much of a difference? It seems that xenomai uses PIT anyway. We can build with TSC if we disable suspend on halt and it works. If we do this the usage stays the same. It may drop a couple tenths of a percent.

> What would be interesting is to get the value reported for the timer
> interrupt when the standard latency test runs at the same frequency
> as your application does (use -p option).

So you mean look at cat /proc/xenomai/stat while running latency test -p?

Steven
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-23 23:18 UTC
To: Steven Seeger; +Cc: xenomai

On Mon, 2009-03-23 at 19:03 -0400, Steven Seeger wrote:
> > What are you comparing, I mean, exactly?
> > All kernel RTAI vs all userland Xenomai?
>
> Yes.

Ok, so we will agree that the 20%/60% ratios can't be compared, in fact.

> > The timer handler is charged for the callbacks it runs, so it really
> > boils down to what code is attached to Xenomai timers, aside from the
> > built-in scheduler tick.
>
> In this case we have only a single RTDM timer that fires every 125 us
> and does nothing (as a test.) It will be easy to remove this and
> compare the amount of usage irq0 handler uses without it. I know it'll
> be at least 14 or 15.

Let's check this anyway.

The fact that the GX still has to use a crappy 8253 PIT for timing and must emulate the TSC using one of the PIT channels is not helping at all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb), each time a timestamp is read via the rdtsc emulation code. That is costly.

> > When you measure that load, what does /proc/xenomai/timerstat say?
> >
> > > I know
> > > there is some overhead with userspace calls but the irq0 handler
> > > alone accounts for 20% of it. Are there any options that can speed
> > > things up?
> >
> > Yeah, but you won't like it: buy a Geode that has SEP support for
> > syscalls and a working TSC, then switch on --enable-x86-sep. Ok,
> > granted, that is _not_ funny.
>
> We have a new Geode that has SEP and yes, things are faster. Just how
> much overhead does syscall create?

It switches to supervisor mode using an interrupt (0x80); that logic is really costly compared to the SEP entry. I'd say ~800ns-1us vs 200ns on average for your target.

> Is there no better option other
> than SEP? If we could have kernel threads work without corrupting
> userland FPU contexts then we could use our two higher priority
> drivers in a kernel module to save overhead.

Btw, did you fix your driver code regarding the unprotected usage of FPU in pure Linux kernel context?

> Is TSC really going to make that much of a difference? It seems that
> xenomai uses PIT anyway.

Eh, no. TSC is always preferred when available.

> We can build with TSC if we disable suspend
> on halt and it works. If we do this the usage stays the same. It may
> drop a couple tenths of a percent.

Frankly, those figures are really surprising. rdtsc() is about 100-200ns, running rthal_get_8254_tsc() is a lot, lot more.

> > What would be interesting is to get the value reported for the timer
> > interrupt when the standard latency test runs at the same frequency
> > as your application does (use -p option).
>
> So you mean look at cat /proc/xenomai/stat while running latency test -p?

No, when _your_ test runs.

> Steven

--
Philippe.
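To put rough numbers on the cost comparison in the message above, here is a minimal sketch of the arithmetic. It assumes that an inb on this ISA bus costs the same 1.3 us that Steven measured for an outb (the thread gives no separate inb figure), and it takes the 100-200 ns rdtsc() estimate as stated. The function and constant names are made up for illustration and are not Xenomai APIs.

```python
# Back-of-the-envelope cost of one emulated TSC read through the PIT.
# Assumption: inb costs the same as the measured outb (1.3 us on this ISA bus).

OUTB_US = 1.3          # measured by Steven earlier in the thread
INB_US = 1.3           # assumed equal to outb; not measured in the thread
RDTSC_US = (0.1, 0.2)  # estimate quoted for a native rdtsc(), in microseconds


def emulated_tsc_read_us(outb_us: float = OUTB_US, inb_us: float = INB_US) -> float:
    """Cost of one timestamp read: 1 x time_of(outb) + 2 x time_of(inb)."""
    return 1 * outb_us + 2 * inb_us


def slowdown_range(emulated_us: float, native_us=RDTSC_US):
    """Return (min, max) ratio of the emulated cost to the native rdtsc cost."""
    fast, slow = native_us
    return emulated_us / slow, emulated_us / fast


if __name__ == "__main__":
    cost = emulated_tsc_read_us()
    low, high = slowdown_range(cost)
    print(f"emulated read: {cost:.2f} us, between {low:.1f}x and {high:.1f}x a native rdtsc")
```

Under these assumptions one emulated read comes out near 3.9 us, more than an order of magnitude above a native rdtsc, which is why the number of timestamp reads per tick matters on this target.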
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 23:32 UTC
To: xenomai

> Ok, so we will agree that the 20%/60% ratios can't be compared, in
> fact.

Do you mean that this is not a fair comparison or that I should not be this slow compared to RTAI?

> The fact that the GX still has to use a crappy 8253 PIT for timing and
> must emulate the TSC using one of the PIT channels is not helping at
> all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb),
> each time a timestamp is read via the rdtsc emulation code. That is
> costly.

Do you agree that if I build with TSC on and disable suspend on halt (or use idle=poll) that xenomai will use rdtsc?

> It switches to supervisor mode using an interrupt (0x80); that logic
> is really costly compared to the SEP entry. I'd say ~800ns-1us vs
> 200ns on average for your target.

This is bad, but since our fastest userspace period is 500us it is not a dealbreaker. Just rt_task_wait_next_period() and one mutex lock/unlock is too much for it.

> Btw, did you fix your driver code regarding the unprotected usage of
> FPU in pure Linux kernel context?

Yes in fact the new driver does not use floats at all. It's purely integer math.

> Eh, no. TSC is always preferred when available.

I was looking at rthal_timer_program_shot().

> Frankly, those figures are really surprising. rdtsc() is about
> 100-200ns, running rthal_get_8254_tsc() is a lot, lot more.

I asked above if what we did would really use the TSC or not. What do you think?

> No, when _your_ test runs.

So we should run latency -p and then our test and look at the output?

Thanks,
Steven
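For scale, the entry cost quoted above can be turned into a share of the 500 us period. This is a rough sketch only: it assumes one syscall each for the wait, the lock, and the unlock (three per period), and it counts nothing but the mode-switch cost, not the work done inside each call. The function name is hypothetical.

```python
# Share of a periodic loop spent purely on syscall entry.
# Assumption: 3 syscalls per period (wait + mutex lock + mutex unlock).

def syscall_entry_share(n_syscalls: int, entry_us: float, period_us: float) -> float:
    """Fraction of each period consumed by syscall entry overhead alone."""
    return n_syscalls * entry_us / period_us


if __name__ == "__main__":
    period_us = 500.0
    for label, entry_us in (("int 0x80, upper estimate", 1.0), ("SEP", 0.2)):
        share = syscall_entry_share(3, entry_us, period_us)
        print(f"{label}: {share * 100:.2f}% of a {period_us:.0f} us period")
```

On these figures the entry overhead by itself stays well under 1% of the period, so it cannot account for most of the observed load.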
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-24 8:59 UTC
To: Steven Seeger; +Cc: xenomai

On Mon, 2009-03-23 at 19:32 -0400, Steven Seeger wrote:
> > Ok, so we will agree that the 20%/60% ratios can't be compared, in
> > fact.
>
> Do you mean that this is not a fair comparison or that I should not be
> this slow compared to RTAI?

I mean that you were comparing apples to oranges. If you really want to compare them in order to figure out if a significant loss of performance happened, then run your application in an RTAI/LXRT context in userland.

> > The fact that the GX still has to use a crappy 8253 PIT for timing and
> > must emulate the TSC using one of the PIT channels is not helping at
> > all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb),
> > each time a timestamp is read via the rdtsc emulation code. That is
> > costly.
>
> Do you agree that if I build with TSC on and disable suspend on halt
> (or use idle=poll) that xenomai will use rdtsc?

Xenomai will use rdtsc as soon as the kernel wants to use it. And the kernel will do that as soon as the CPU model you picked in your setup does exhibit TSC support. This is not a matter of Xenomai choosing to ignore TSC support when available to the kernel, this never happens. I seem to remember that your target has a bad TSC and loses time, unless idle=poll is given; at the same time, we don't handle the SCx200 hires timer that is Geode-specific, so there is likely no fallback option to this issue but using idle=poll.

> > It switches to supervisor mode using an interrupt (0x80); that logic
> > is really costly compared to the SEP entry. I'd say ~800ns-1us vs
> > 200ns on average for your target.
>
> This is bad, but since our fastest userspace period is 500us it is not
> a dealbreaker. Just rt_task_wait_next_period() and one mutex lock/
> unlock is too much for it.

2.4.x will issue 3 syscalls there, 2.5.x only 1 most of the time. If you really want to understand what is going on in your system, you should definitely enable the I-pipe tracer, and have a look at the processing that takes place. In any case, 3 syscalls over a 2 kHz loop are no big deal on sane hw; the problem I see is that your target is accumulating a lot of issues: buggy TSC, no SEP, sluggish ISA bus, no local APIC, brain-damaged C3 state. It's a bit like that hw wants to prevent you from using it in real-time mode, I mean. Again, the best way to know what is going on is to get a trace snapshot from the I-pipe tracer. You would get detailed timing information for kernel space activity, on a per-routine basis.

> > Btw, did you fix your driver code regarding the unprotected usage of
> > FPU in pure Linux kernel context?
>
> Yes in fact the new driver does not use floats at all. It's purely
> integer math.
>
> > Eh, no. TSC is always preferred when available.
>
> I was looking at rthal_timer_program_shot().

This is used to program the next aperiodic shot and this should not happen more than once per sample. OTOH, getting the CPU time via the TSC emulation occurs a few times per sample.

> > Frankly, those figures are really surprising. rdtsc() is about
> > 100-200ns, running rthal_get_8254_tsc() is a lot, lot more.
>
> I asked above if what we did would really use the TSC or not. What do
> you think?

Do you have CONFIG_X86_TSC enabled in your kernel config? If so, then you do use TSC with Xenomai as well.

> > No, when _your_ test runs.
>
> So we should run latency -p and then our test and look at the output?

Run latency -p 500 in the same load conditions as your app, and while this is running:

- dump /proc/xenomai/timerstat; we will find out what timers are outstanding.
- dump /proc/xenomai/stat a few times; we will find out the typical CPU consumption of the timer tick.

Then, do the same with your application, and send the outputs.

> Thanks,
> Steven

--
Philippe.
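The measurement procedure described above could be scripted along these lines. This is only a sketch: the two /proc paths come from the message, while the helper names, the sample count, and the delay are arbitrary choices made for illustration. It assumes that latency -p 500 (or the application under test) has been started separately and is still running.

```python
import time
from pathlib import Path

TIMERSTAT = "/proc/xenomai/timerstat"  # shows which timers are outstanding
STAT = "/proc/xenomai/stat"            # shows CPU consumption, timer tick included


def collect(timerstat_path: str = TIMERSTAT, stat_path: str = STAT,
            samples: int = 3, delay_s: float = 1.0) -> dict:
    """Read timerstat once and stat `samples` times, `delay_s` seconds apart."""
    result = {"timerstat": Path(timerstat_path).read_text(), "stat": []}
    for i in range(samples):
        result["stat"].append(Path(stat_path).read_text())
        if i < samples - 1:
            time.sleep(delay_s)
    return result


def report(snapshot: dict) -> str:
    """Format a snapshot as plain text suitable for pasting into a reply."""
    parts = ["=== timerstat ===", snapshot["timerstat"].rstrip()]
    for n, text in enumerate(snapshot["stat"], start=1):
        parts += [f"=== stat, sample {n} ===", text.rstrip()]
    return "\n".join(parts)


# On the target, while the test load is running:
#     print(report(collect()))
```

Running it once under the latency test and once under the application gives two reports that can be compared side by side.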
* [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-26 17:25 UTC
To: xenomai

Using TSC really dropped us down. I don't know why the timekeeper says tsc is unstable. We ran our system for an 8 minute cycle and timed it with a stopwatch, and it was accurate to the second.

On our test irq0 usage dropped from 19% to 13%. Thanks for the help, guys.

Steven
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-26 17:28 UTC
To: Steven Seeger; +Cc: xenomai

I forgot to mention. In order to keep tsc as the clockdev I disabled the code in the kernel that removes it as the clocksource.

Steven

On Mar 26, 2009, at 1:25 PM, Steven Seeger wrote:
> Using TSC really dropped us down. I don't know why the timekeeper
> says tsc is unstable. We ran our system for an 8 minute cycle and
> timed it with a stopwatch, and it was accurate to the second.
>
> On our test irq0 usage dropped from 19% to 13%. Thanks for the help,
> guys.
>
> Steven
* Re: [Xenomai-core] irq0 usage
From: Gilles Chanteperdrix @ 2009-03-26 17:31 UTC
To: Steven Seeger; +Cc: xenomai

Steven Seeger wrote:
> I forgot to mention. In order to keep tsc as the clockdev I disabled
> the code in the kernel that removes it as the clocksource.

Even with idle=poll or nohlt, the kernel disables the tsc as clocksource?

Also note that Xenomai's use of the tsc does not depend on whether Linux uses it as clocksource.

--
Gilles.
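The two settings that came up in this thread, the idle-related boot options and CONFIG_X86_TSC, can be checked with a few lines of text parsing. The sketch below is an illustration only: both helpers are hypothetical, they work on text passed in by the caller, and feeding them /proc/cmdline and the .config used to build the kernel is an assumption about where that text is found on the target.

```python
def idle_options(cmdline: str) -> list:
    """Return the idle-related boot options present (idle=poll, nohlt)."""
    wanted = {"idle=poll", "nohlt"}
    return [tok for tok in cmdline.split() if tok in wanted]


def config_enabled(config_text: str, symbol: str = "CONFIG_X86_TSC") -> bool:
    """True if `symbol` is set to y in the given kernel .config text."""
    for line in config_text.splitlines():
        if line.strip() == f"{symbol}=y":
            return True
    return False


if __name__ == "__main__":
    # Sample inputs, made up for the demonstration.
    print(idle_options("root=/dev/hda1 ro idle=poll"))
    print(config_enabled("# CONFIG_X86_TSC is not set\n"))
```

On a running system the first helper would typically be given the contents of /proc/cmdline.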