* [Xenomai-help] Xenomai degrades syscall time on PPC405
@ 2007-08-20 12:11 Dirk Eibach
2007-08-23 20:45 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Dirk Eibach @ 2007-08-20 12:11 UTC (permalink / raw)
To: Xenomai help
Hello,
I am trying to understand some performance problems I ran into on my
xenomai system.
The testcode I attached performs a syscall(gettimeofday). Measuring time on
a vanilla Kernel gives the following output:
ticks for ppc_getccounter: 116, for gettimeofday: 1479
The same kernel with xenomai patched in gives this:
ticks for ppc_getccounter: 116, for gettimeofday: 4278
Is this the expected behaviour? Why is the time for a system call nearly 3
times as much?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
static void
ppc_getcounter(unsigned long long *v)
{
register unsigned long tbu, tb, tbu2;
loop:
asm volatile ("mftbu %0" : "=r" (tbu) );
asm volatile ("mftb %0" : "=r" (tb) );
asm volatile ("mftbu %0" : "=r" (tbu2));
if (__builtin_expect(tbu != tbu2, 0)) goto loop;
/* The slightly peculiar way of writing the next lines is
compiled better by GCC than any other way I tried. */
((long*)(v))[0] = tbu;
((long*)(v))[1] = tb;
}
int main(int argc, char* argv[])
{
while(1) {
struct timeval tv;
unsigned long long count0, count1, count2;
ppc_getcounter(&count0);
ppc_getcounter(&count1);
gettimeofday(&tv, NULL);
ppc_getcounter(&count2);
printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n",
count1 - count0, count2- count1);
sleep(1);
}
}
Cheers
--
Dirk Eibach
Entwicklung
Guntermann & Drunck GmbH Systementwicklung
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405
2007-08-20 12:11 [Xenomai-help] Xenomai degrades syscall time on PPC405 Dirk Eibach
@ 2007-08-23 20:45 ` Philippe Gerum
2007-08-24 6:33 ` Dirk Eibach
0 siblings, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2007-08-23 20:45 UTC (permalink / raw)
To: Dirk Eibach; +Cc: Xenomai help
On Mon, 2007-08-20 at 14:11 +0200, Dirk Eibach wrote:
> Hello,
>
> I am trying to understand some performance problems I ran into on my
> xenomai system.
>
> The testcode I attached performs a syscall(gettimeofday). Measuring time on
> a vanilla Kernel gives the following output:
>
> ticks for ppc_getccounter: 116, for gettimeofday: 1479
>
> The same kernel with xenomai patched in gives this:
>
> ticks for ppc_getccounter: 116, for gettimeofday: 4278
>
> Is this the expected behaviour? Why is the time for a system call nearly 3
> times as much?
>
When reporting figures, please report all of them. Here is your test ran
on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output:
bash-3.00# ./ppc405_rt
ticks for ppc_getccounter: 122, for gettimeofday: 7337
ticks for ppc_getccounter: 208, for gettimeofday: 2816
ticks for ppc_getccounter: 210, for gettimeofday: 2512
ticks for ppc_getccounter: 174, for gettimeofday: 2622
ticks for ppc_getccounter: 174, for gettimeofday: 2320
ticks for ppc_getccounter: 208, for gettimeofday: 2634
ticks for ppc_getccounter: 174, for gettimeofday: 2344
ticks for ppc_getccounter: 208, for gettimeofday: 2660
ticks for ppc_getccounter: 174, for gettimeofday: 2320
ticks for ppc_getccounter: 208, for gettimeofday: 2816
ticks for ppc_getccounter: 208, for gettimeofday: 2734
ticks for ppc_getccounter: 208, for gettimeofday: 2802
...
Only the first shot exhibits a pathological latency, and switching on
CONFIG_XENO_OPT_DEBUG_NUCLEUS explains why:
# dmesg
...
Looking up port of RPC 100003/2 on 192.168.0.7
Looking up port of RPC 100005/1 on 192.168.0.7
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 128k init
Xenomai: Switching ppc405_rt to secondary mode after exception #1025 from user-space at 0xfe54f40 (pid 248)
This is confirmed by the task stats:
# cat /proc/xenomai/stat
CPU PID MSW CSW PF STAT %CPU NAME
0 0 0 2 0 00400080 99.9 ROOT
0 248 1 1 1* 00300380 0.0 ppc405_rt
0 0 0 48031 0 00000000 0.1 IRQ32: [timer]
(*) 1 page fault caught in primary mode, causing a Xenomai mode switch.
The test program hits a minor fault, due to a virtual memory management
artefact, which in turn forces a Xenomai switch for your task from
primary to secondary mode, in order to process the fault gracefully
(basically mapping a missing page table entry). 7 us is the time needed
to yield the control of the task to the regular kernel, then handle the
fault in Linux context, then eventually perform the syscall. This is a
known issue, and a solution may be available at some point, but not yet.
This mailing list contains other reports of the same issue, and detailed
explanations.
Regarding the other figures in the 2-3 us range, it is absolutely true
that Xenomai adds some overhead to the system call path which may get
noticeable on low-end boards. This is due to the pipelining of system
calls to the Xenomai domain: there is more code to execute than the
regular Linux has to run for the same task. This value may improve too
over time, since we do have a few opportunities for optimization here.
Last point: your measure do not take into account the possibility for
the syscall path to be preempted by a real-time interrupt, or even by a
plain Linux one after the task has switched to secondary mode, so you
should always read such data with a grain of salt. In any case, you
should never rely on a single sample.
> #include <stdio.h>
> #include <sys/time.h>
> #include <time.h>
>
> static void
> ppc_getcounter(unsigned long long *v)
> {
> register unsigned long tbu, tb, tbu2;
>
> loop:
> asm volatile ("mftbu %0" : "=r" (tbu) );
> asm volatile ("mftb %0" : "=r" (tb) );
> asm volatile ("mftbu %0" : "=r" (tbu2));
> if (__builtin_expect(tbu != tbu2, 0)) goto loop;
>
> /* The slightly peculiar way of writing the next lines is
> compiled better by GCC than any other way I tried. */
> ((long*)(v))[0] = tbu;
> ((long*)(v))[1] = tb;
> }
>
> int main(int argc, char* argv[])
> {
> while(1) {
> struct timeval tv;
> unsigned long long count0, count1, count2;
> ppc_getcounter(&count0);
> ppc_getcounter(&count1);
> gettimeofday(&tv, NULL);
> ppc_getcounter(&count2);
> printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n",
> count1 - count0, count2- count1);
> sleep(1);
> }
> }
>
>
> Cheers
--
Philippe.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405
2007-08-23 20:45 ` Philippe Gerum
@ 2007-08-24 6:33 ` Dirk Eibach
2007-08-24 8:45 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Dirk Eibach @ 2007-08-24 6:33 UTC (permalink / raw)
To: rpm; +Cc: Xenomai help
Hello Philippe,
tank you for your detailed answer.
>> Hello,
>>
>> I am trying to understand some performance problems I ran into on my
>> xenomai system.
>>
>> The testcode I attached performs a syscall(gettimeofday). Measuring time on
>> a vanilla Kernel gives the following output:
>>
>> ticks for ppc_getccounter: 116, for gettimeofday: 1479
>>
>> The same kernel with xenomai patched in gives this:
>>
>> ticks for ppc_getccounter: 116, for gettimeofday: 4278
>>
>> Is this the expected behaviour? Why is the time for a system call nearly 3
>> times as much?
>>
>
> When reporting figures, please report all of them. Here is your test ran
> on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output:
>
> bash-3.00# ./ppc405_rt
> ticks for ppc_getccounter: 122, for gettimeofday: 7337
> ticks for ppc_getccounter: 208, for gettimeofday: 2816
> ticks for ppc_getccounter: 210, for gettimeofday: 2512
> ticks for ppc_getccounter: 174, for gettimeofday: 2622
> ticks for ppc_getccounter: 174, for gettimeofday: 2320
> ticks for ppc_getccounter: 208, for gettimeofday: 2634
> ticks for ppc_getccounter: 174, for gettimeofday: 2344
> ticks for ppc_getccounter: 208, for gettimeofday: 2660
> ticks for ppc_getccounter: 174, for gettimeofday: 2320
> ticks for ppc_getccounter: 208, for gettimeofday: 2816
> ticks for ppc_getccounter: 208, for gettimeofday: 2734
> ticks for ppc_getccounter: 208, for gettimeofday: 2802
> ...
I certainly tried to pick a representative sample. Here is the full series
with xenomai patched in:
# ./syscall-test
ticks for ppc_getccounter: 92, for gettimeofday: 18375
ticks for ppc_getccounter: 116, for gettimeofday: 4250
ticks for ppc_getccounter: 116, for gettimeofday: 4416
ticks for ppc_getccounter: 116, for gettimeofday: 4372
ticks for ppc_getccounter: 116, for gettimeofday: 4486
ticks for ppc_getccounter: 116, for gettimeofday: 4376
ticks for ppc_getccounter: 116, for gettimeofday: 4446
ticks for ppc_getccounter: 116, for gettimeofday: 4434
ticks for ppc_getccounter: 116, for gettimeofday: 4368
ticks for ppc_getccounter: 116, for gettimeofday: 4418
ticks for ppc_getccounter: 116, for gettimeofday: 4404
ticks for ppc_getccounter: 116, for gettimeofday: 4384
...
And here we go without:
# ./syscall-test
ticks for ppc_getccounter: 92, for gettimeofday: 6761
ticks for ppc_getccounter: 116, for gettimeofday: 1768
ticks for ppc_getccounter: 116, for gettimeofday: 1594
ticks for ppc_getccounter: 116, for gettimeofday: 1560
ticks for ppc_getccounter: 116, for gettimeofday: 1600
ticks for ppc_getccounter: 116, for gettimeofday: 1574
ticks for ppc_getccounter: 116, for gettimeofday: 1618
ticks for ppc_getccounter: 116, for gettimeofday: 1560
ticks for ppc_getccounter: 116, for gettimeofday: 1600
ticks for ppc_getccounter: 116, for gettimeofday: 1540
ticks for ppc_getccounter: 116, for gettimeofday: 1618
ticks for ppc_getccounter: 116, for gettimeofday: 1566
ticks for ppc_getccounter: 116, for gettimeofday: 1608
...
>
> Only the first shot exhibits a pathological latency, and switching on
> CONFIG_XENO_OPT_DEBUG_NUCLEUS explains why:
>
> # dmesg
>
> ...
> Looking up port of RPC 100003/2 on 192.168.0.7
> Looking up port of RPC 100005/1 on 192.168.0.7
> VFS: Mounted root (nfs filesystem).
> Freeing unused kernel memory: 128k init
> Xenomai: Switching ppc405_rt to secondary mode after exception #1025 from user-space at 0xfe54f40 (pid 248)
>
> This is confirmed by the task stats:
>
> # cat /proc/xenomai/stat
>
> CPU PID MSW CSW PF STAT %CPU NAME
> 0 0 0 2 0 00400080 99.9 ROOT
> 0 248 1 1 1* 00300380 0.0 ppc405_rt
> 0 0 0 48031 0 00000000 0.1 IRQ32: [timer]
>
> (*) 1 page fault caught in primary mode, causing a Xenomai mode switch.
>
> The test program hits a minor fault, due to a virtual memory management
> artefact, which in turn forces a Xenomai switch for your task from
> primary to secondary mode, in order to process the fault gracefully
> (basically mapping a missing page table entry). 7 us is the time needed
> to yield the control of the task to the regular kernel, then handle the
> fault in Linux context, then eventually perform the syscall. This is a
> known issue, and a solution may be available at some point, but not yet.
>
> This mailing list contains other reports of the same issue, and detailed
> explanations.
>
> Regarding the other figures in the 2-3 us range, it is absolutely true
> that Xenomai adds some overhead to the system call path which may get
> noticeable on low-end boards. This is due to the pipelining of system
> calls to the Xenomai domain: there is more code to execute than the
> regular Linux has to run for the same task. This value may improve too
> over time, since we do have a few opportunities for optimization here.
But when I compare my values above I still get an overhead of 10us which is
at least strange and not acceptable for my application. I am asking myself
if it will be possible to get this fixed with xenomai or if I have to go a
completely different way.
> Last point: your measure do not take into account the possibility for
> the syscall path to be preempted by a real-time interrupt, or even by a
> plain Linux one after the task has switched to secondary mode, so you
> should always read such data with a grain of salt. In any case, you
> should never rely on a single sample.
>
>> #include <stdio.h>
>> #include <sys/time.h>
>> #include <time.h>
>>
>> static void
>> ppc_getcounter(unsigned long long *v)
>> {
>> register unsigned long tbu, tb, tbu2;
>>
>> loop:
>> asm volatile ("mftbu %0" : "=r" (tbu) );
>> asm volatile ("mftb %0" : "=r" (tb) );
>> asm volatile ("mftbu %0" : "=r" (tbu2));
>> if (__builtin_expect(tbu != tbu2, 0)) goto loop;
>>
>> /* The slightly peculiar way of writing the next lines is
>> compiled better by GCC than any other way I tried. */
>> ((long*)(v))[0] = tbu;
>> ((long*)(v))[1] = tb;
>> }
>>
>> int main(int argc, char* argv[])
>> {
>> while(1) {
>> struct timeval tv;
>> unsigned long long count0, count1, count2;
>> ppc_getcounter(&count0);
>> ppc_getcounter(&count1);
>> gettimeofday(&tv, NULL);
>> ppc_getcounter(&count2);
>> printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n",
>> count1 - count0, count2- count1);
>> sleep(1);
>> }
>> }
>>
>>
>> Cheers
--
Dirk Eibach
Entwicklung
Guntermann & Drunck GmbH Systementwicklung
Telefon: +49 2739 8901-100
Fax: +49 2739 8901-120
EMail: mailto:eibach@domain.hid
-------------------------------------------------------------------------------
Guntermann & Drunck GmbH Systementwicklung
Dortmunder Str. 4a
D-57234 Wilnsdorf - Germany
Tel: +49 (0) 27 39 / 89 01 - 100 Fax: +49 (0) 27 39 / 89 01 - 120
E-Mail: mailto:sales@domain.hid Web: www.gdsys.de
-------------------------------------------------------------------------------
Geschäftsführer: Udo Guntermann - Martin Drunck - Reiner Ruelmann - Klaus Tocke
HRB 2884, Amtsgericht Siegen
USt.-Id.-Nr. DE 126575222 - Steuer-Nr. 342 / 5835 / 1041
-------------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405
2007-08-24 6:33 ` Dirk Eibach
@ 2007-08-24 8:45 ` Philippe Gerum
0 siblings, 0 replies; 4+ messages in thread
From: Philippe Gerum @ 2007-08-24 8:45 UTC (permalink / raw)
To: Dirk Eibach; +Cc: Xenomai help
On Fri, 2007-08-24 at 08:33 +0200, Dirk Eibach wrote:
> Hello Philippe,
>
> tank you for your detailed answer.
>
> >> Hello,
> >>
> >> I am trying to understand some performance problems I ran into on my
> >> xenomai system.
> >>
> >> The testcode I attached performs a syscall(gettimeofday). Measuring time on
> >> a vanilla Kernel gives the following output:
> >>
> >> ticks for ppc_getccounter: 116, for gettimeofday: 1479
> >>
> >> The same kernel with xenomai patched in gives this:
> >>
> >> ticks for ppc_getccounter: 116, for gettimeofday: 4278
> >>
> >> Is this the expected behaviour? Why is the time for a system call nearly 3
> >> times as much?
> >>
> >
> > When reporting figures, please report all of them. Here is your test ran
> > on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output:
> >
> > bash-3.00# ./ppc405_rt
> > ticks for ppc_getccounter: 122, for gettimeofday: 7337
> > ticks for ppc_getccounter: 208, for gettimeofday: 2816
> > ticks for ppc_getccounter: 210, for gettimeofday: 2512
> > ticks for ppc_getccounter: 174, for gettimeofday: 2622
> > ticks for ppc_getccounter: 174, for gettimeofday: 2320
> > ticks for ppc_getccounter: 208, for gettimeofday: 2634
> > ticks for ppc_getccounter: 174, for gettimeofday: 2344
> > ticks for ppc_getccounter: 208, for gettimeofday: 2660
> > ticks for ppc_getccounter: 174, for gettimeofday: 2320
> > ticks for ppc_getccounter: 208, for gettimeofday: 2816
> > ticks for ppc_getccounter: 208, for gettimeofday: 2734
> > ticks for ppc_getccounter: 208, for gettimeofday: 2802
> > ...
>
> I certainly tried to pick a representative sample. Here is the full series
> with xenomai patched in:
> # ./syscall-test
> ticks for ppc_getccounter: 92, for gettimeofday: 18375
> ticks for ppc_getccounter: 116, for gettimeofday: 4250
> ticks for ppc_getccounter: 116, for gettimeofday: 4416
> ticks for ppc_getccounter: 116, for gettimeofday: 4372
> ticks for ppc_getccounter: 116, for gettimeofday: 4486
> ticks for ppc_getccounter: 116, for gettimeofday: 4376
> ticks for ppc_getccounter: 116, for gettimeofday: 4446
> ticks for ppc_getccounter: 116, for gettimeofday: 4434
> ticks for ppc_getccounter: 116, for gettimeofday: 4368
> ticks for ppc_getccounter: 116, for gettimeofday: 4418
> ticks for ppc_getccounter: 116, for gettimeofday: 4404
> ticks for ppc_getccounter: 116, for gettimeofday: 4384
> ...
>
>
> And here we go without:
> # ./syscall-test
> ticks for ppc_getccounter: 92, for gettimeofday: 6761
> ticks for ppc_getccounter: 116, for gettimeofday: 1768
> ticks for ppc_getccounter: 116, for gettimeofday: 1594
I have comparable values on the 405GPr in the vanilla case:
ticks for ppc_getccounter: 127, for gettimeofday: 6851
ticks for ppc_getccounter: 177, for gettimeofday: 1733
ticks for ppc_getccounter: 153, for gettimeofday: 1455
ticks for ppc_getccounter: 153, for gettimeofday: 1401
but still much lower values than you have in the Xenomai case.
The answer is either:
- in your kernel configuration,
- in the I-pipe patch,
- in the Xenomai version you are currently using
Btw, where does the I-pipe patch 2.6.18 1.5-00 you seem to be using come
from? Where did you pick it?
> ticks for ppc_getccounter: 116, for gettimeofday: 1560
> ticks for ppc_getccounter: 116, for gettimeofday: 1600
> ticks for ppc_getccounter: 116, for gettimeofday: 1574
> ticks for ppc_getccounter: 116, for gettimeofday: 1618
> ticks for ppc_getccounter: 116, for gettimeofday: 1560
> ticks for ppc_getccounter: 116, for gettimeofday: 1600
> ticks for ppc_getccounter: 116, for gettimeofday: 1540
> ticks for ppc_getccounter: 116, for gettimeofday: 1618
> ticks for ppc_getccounter: 116, for gettimeofday: 1566
> ticks for ppc_getccounter: 116, for gettimeofday: 1608
> ...
>
--
Philippe.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-08-24 8:45 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-20 12:11 [Xenomai-help] Xenomai degrades syscall time on PPC405 Dirk Eibach
2007-08-23 20:45 ` Philippe Gerum
2007-08-24 6:33 ` Dirk Eibach
2007-08-24 8:45 ` Philippe Gerum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.