* [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 19:59 UTC (permalink / raw)
To: xenomai
We are still running into issues where irq0 is using a lot of CPU
time. The same threads on an RTAI system on the same hardware used
about 13% of the CPU but are using closer to 60% on Xenomai. I know
there is some overhead with userspace calls, but the irq0 handler alone
accounts for 20% of it. Are there any options that can speed things up?
We've tried both one-shot and periodic modes. I confirmed that the ISA
I/O timing is 1.3 usec per outb, as expected.
Steven
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-23 22:54 UTC (permalink / raw)
To: Steven Seeger; +Cc: xenomai
On Mon, 2009-03-23 at 15:59 -0400, Steven Seeger wrote:
> We are still running into issues where irq0 is using a lot of CPU
> time. The same threads on an RTAI system on the same hardware used
> about 13% of the CPU but are using closer to 60% on Xenomai.
What exactly are you comparing?
All kernel RTAI vs all userland Xenomai?
The timer handler is charged for the callbacks it runs, so it really
boils down to what code is attached to Xenomai timers, aside from the
built-in scheduler tick.
When you measure that load, what does /proc/xenomai/timerstat say?
> I know
> there is some overhead with userspace calls but the irq0 handler alone
> accounts for 20% of it. Are there any options that can speed things up?
>
Yeah, but you won't like it: buy a Geode that has SEP support for
syscalls and a working TSC, then switch on --enable-x86-sep. Ok,
granted, that is _not_ funny.
What would be interesting is to get the value reported for the timer
interrupt when the standard latency test runs at the same frequency as
your application does (use the -p option).
> We've tried both one shot and periodic modes. I confirmed that the ISA
> i/o timing is 1.3usec per outb as expected.
>
> Steven
>
>
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core
--
Philippe.
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 23:03 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
> What are you comparing, I mean, exactly?
> All kernel RTAI vs all userland Xenomai?
Yes.
>
>
> The timer handler is charged for the callbacks it runs, so it really
> boils down to what code is attached to Xenomai timers, aside of the
> built-in scheduler tick.
In this case we have only a single RTDM timer that fires every 125 us
and does nothing (as a test). It will be easy to remove it and compare
the irq0 handler's CPU usage without it. I know it'll be at least 14 or
15%.
>
>
> When you measure that load, what does /proc/xenomai/timerstat say?
>
>> I know
>> there is some overhead with userspace calls but the irq0 handler alone
>> accounts for 20% of it. Are there any options that can speed things up?
>>
>
> Yeah, but you won't like it: buy a Geode that has SEP support for
> syscalls and a working TSC, then switch on --enable-x86-sep. Ok,
> granted, that is _not_ funny.
We have a new Geode that has SEP and, yes, things are faster. Just how
much overhead does the syscall path create? Is there no option other
than SEP? If we could have kernel threads work without corrupting
userland FPU contexts, then we could put our two higher-priority
drivers in a kernel module to save overhead.
Is TSC really going to make that much of a difference? It seems that
Xenomai uses the PIT anyway. We can build with TSC if we disable suspend
on halt, and it works, but the usage stays about the same; it may drop
a couple tenths of a percent.
> What would be interesting is to get the value reported for the timer
> interrupt when the standard latency test runs at the same frequency than
> your application does (use -p option).
So you mean look at /proc/xenomai/stat while running latency -p?
Steven
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-23 23:18 UTC (permalink / raw)
To: Steven Seeger; +Cc: xenomai
On Mon, 2009-03-23 at 19:03 -0400, Steven Seeger wrote:
> > What are you comparing, I mean, exactly?
> > All kernel RTAI vs all userland Xenomai?
>
> Yes.
>
OK, so we agree that the 20%/60% figures can't really be compared.
> >
> >
> > The timer handler is charged for the callbacks it runs, so it really
> > boils down to what code is attached to Xenomai timers, aside of the
> > built-in scheduler tick.
>
> In this case we have only a single RTDM timer that fires every 125 us
> and does nothing (as a test.) It will be easy to remove this and
> compare the amount of usage irq0 handler uses without it. I know it'll
> be at least 14 or 15.
Let's check this anyway.
The fact that the GX still has to use a crappy 8253 PIT for timing and
must emulate the TSC using one of the PIT channels is not helping at
all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb), each
time a timestamp is read via the rdtsc emulation code. That is costly.
>
> >
> >
> > When you measure that load, what does /proc/xenomai/timerstat say?
> >
> >> I know
> >> there is some overhead with userspace calls but the irq0 handler alone
> >> accounts for 20% of it. Are there any options that can speed things up?
> >>
> >
> > Yeah, but you won't like it: buy a Geode that has SEP support for
> > syscalls and a working TSC, then switch on --enable-x86-sep. Ok,
> > granted, that is _not_ funny.
>
> We have a new Geode that has SEP and yes, things are faster. Just how
> much overhead does syscall create?
The int 0x80 path switches to supervisor mode through a software
interrupt, which is really costly compared to the SEP (sysenter) entry.
I'd say ~800 ns-1 us vs. ~200 ns on average for your target.
> Is there no better option other
> than SEP? If we could have kernel threads work without corrupting
> userland FPU contexts then we could use our two higher priority
> drivers in a kernel module to save overhead.
By the way, did you fix your driver code regarding its unprotected use
of the FPU in plain Linux kernel context?
>
> Is TSC really going to make that much of a difference? It seems that
> xenomai uses PIT anyway.
Eh, no. TSC is always preferred when available.
> We can build with TSC if we disable suspend
> on halt and it works. If we do this the usage stays the same. It may
> drop a couple tenths of a percent.
Frankly, those figures are really surprising: rdtsc() costs about
100-200 ns, whereas running rthal_get_8254_tsc() costs a lot, lot more.
>
> > What would be interesting is to get the value reported for the timer
> > interrupt when the standard latency test runs at the same frequency than
> > your application does (use -p option).
>
> So you mean look at cat /proc/stat/xenomai while running latency test -p?
>
No, when _your_ test runs.
> Steven
>
--
Philippe.
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-23 23:32 UTC (permalink / raw)
To: xenomai
> Ok, so we will agree that the 20%/60% ratios can't be compared, in fact.
Do you mean that this is not a fair comparison or that I should not be
this slow compared to RTAI?
> The fact that the GX still has to use a crappy 8253 PIT for timing and
> must emulate the TSC using one of the PIT channels is not helping at
> all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb), each
> time a timestamp is read via the rdtsc emulation code. That is costly.
Do you agree that if I build with TSC support on and disable suspend on
halt (or use idle=poll), Xenomai will use rdtsc?
> It switches to supervisor mode using an interrupt (0x80); that logic is
> really costly compared to the SEP entry. I'd say ~800ns-1us vs 200ns on
> average for your target.
This is bad, but since our fastest userspace period is 500 us, it is not
a dealbreaker. Just rt_task_wait_next_period() and one mutex lock/unlock
is too much for it.
> Btw, did you fix your driver code regarding the unprotected usage of FPU
> in pure Linux kernel context?
Yes; in fact, the new driver does not use floats at all. It's purely
integer math.
> Eh, no. TSC is always preferred when available.
I was looking at rthal_timer_program_shot().
> Frankly, those figures are really surprising. rdtsc() is about
> 100-200ns, running rthal_get_8254_tsc() is a lot, lot more.
I asked above whether what we did would really use the TSC. What do
you think?
> No, when _your_ test runs.
So we should run latency -p and then our test and look at the output?
Thanks,
Steven
* Re: [Xenomai-core] irq0 usage
From: Philippe Gerum @ 2009-03-24 8:59 UTC (permalink / raw)
To: Steven Seeger; +Cc: xenomai
On Mon, 2009-03-23 at 19:32 -0400, Steven Seeger wrote:
> > Ok, so we will agree that the 20%/60% ratios can't be compared, in fact.
>
> Do you mean that this is not a fair comparison or that I should not be
> this slow compared to RTAI?
>
I mean that you were comparing apples to oranges. If you really want to
compare them in order to figure out if a significant loss of performance
happened, then run your application in an RTAI/LXRT context in userland.
> > The fact that the GX still has to use a crappy 8253 PIT for timing and
> > must emulate the TSC using one of the PIT channels is not helping at
> > all. Emulating the TSC costs 1 x time_of(outb) + 2 x time_of(inb), each
> > time a timestamp is read via the rdtsc emulation code. That is costly.
>
> Do you agree that if I build with TSC on and disable suspend on halt
> (or use idle=poll) that xenomai will use rdtsc?
Xenomai will use rdtsc as soon as the kernel wants to use it, and the
kernel will do so as soon as the CPU model you picked in your setup
exhibits TSC support. Xenomai never chooses to ignore TSC support when
it is available to the kernel.
I seem to remember that your target has a bad TSC that loses time unless
idle=poll is given; at the same time, we don't handle the Geode-specific
SCx200 high-resolution timer, so there is likely no workaround for this
issue other than using idle=poll.
>
> > It switches to supervisor mode using an interrupt (0x80); that logic is
> > really costly compared to the SEP entry. I'd say ~800ns-1us vs 200ns on
> > average for your target.
>
> This is bad, but since our fastest userspace period is 500us it is not
> a dealbreaker. Just rt_task_wait_next_period() and one mutex lock/unlock
> is too much for it.
>
>
Xenomai 2.4.x will issue 3 syscalls there; 2.5.x only 1 most of the time.
If you really want to understand what is going on in your system, you
should definitely enable the I-pipe tracer and look at the processing
that takes place.
In any case, 3 syscalls per cycle in a 2 kHz loop are no big deal on
sane hardware; the problem I see is that your target accumulates a lot
of issues: buggy TSC, no SEP, sluggish ISA bus, no local APIC,
brain-damaged C3 state. It's almost as if that hardware wanted to
prevent you from using it in real-time mode.
Again, the best way to know what is going on is to get a trace snapshot
from the I-pipe tracer. You would get detailed timing information for
kernel space activity, on a per-routine basis.
> > Btw, did you fix your driver code regarding the unprotected usage of FPU
> > in pure Linux kernel context?
>
> Yes in fact the new driver does not use floats at all. It's purely
> integer math.
>
> > Eh, no. TSC is always preferred when available.
>
> I was looking at rthal_timer_program_shot().
>
This is used to program the next aperiodic shot, which should not
happen more than once per sample. On the other hand, reading the CPU
time via the TSC emulation happens a few times per sample.
> > Frankly, those figures are really surprising. rdtsc() is about
> > 100-200ns, running rthal_get_8254_tsc() is a lot, lot more.
>
> I asked above if what we did would really use the TSC or not. What do
> you think?
>
Do you have CONFIG_X86_TSC enabled in your kernel config? If so, then
you do use TSC with Xenomai as well.
> > No, when _your_ test runs.
>
> So we should run latency -p and then our test and look at the output?
>
Run latency -p 500 under the same load conditions as your app, and while
this is running:
- dump /proc/xenomai/timerstat; we will find out what timers are
outstanding.
- dump /proc/xenomai/stat a few times; we will find out the typical CPU
consumption of the timer tick.
Then, do the same with your application, and send the outputs.
> Thanks,
> Steven
>
--
Philippe.
* [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-26 17:25 UTC (permalink / raw)
To: xenomai
Using the TSC really brought our usage down. I don't know why the
timekeeping code says the TSC is unstable: we ran our system for an
8-minute cycle, timed it with a stopwatch, and it was accurate to the
second.
In our test, irq0 usage dropped from 19% to 13%. Thanks for the help,
guys.
Steven
* Re: [Xenomai-core] irq0 usage
From: Steven Seeger @ 2009-03-26 17:28 UTC (permalink / raw)
To: Steven Seeger; +Cc: xenomai
I forgot to mention: in order to keep the TSC as the clocksource, I
disabled the kernel code that removes it as the clocksource.
Steven
On Mar 26, 2009, at 1:25 PM, Steven Seeger wrote:
> Using TSC really dropped us down. I don't know why the timekeeper
> says tsc is unstable. We ran our system for an 8 minute cycle and
> timed it with a stopwatch, and it was accurate to the second.
>
> On our test irq0 usage dropped from 19% to 13%. Thanks for the help,
> guys.
>
> Steven
>
>
* Re: [Xenomai-core] irq0 usage
From: Gilles Chanteperdrix @ 2009-03-26 17:31 UTC (permalink / raw)
To: Steven Seeger; +Cc: xenomai
Steven Seeger wrote:
> I forgot to mention. In order to keep tsc as the clockdev I disabled
> the code in the kernel that removes it as the clocksource.
Does the kernel disable the TSC as clocksource even with idle=poll or
nohlt?
Also note that Xenomai does not care whether Linux uses the TSC as its
clocksource; it uses the TSC regardless.
--
Gilles.