* [Xenomai-help] Xenomai degrades syscall time on PPC405
@ 2007-08-20 12:11 Dirk Eibach
2007-08-23 20:45 ` Philippe Gerum
0 siblings, 1 reply; 4+ messages in thread
From: Dirk Eibach @ 2007-08-20 12:11 UTC (permalink / raw)
To: Xenomai help
Hello,
I am trying to understand some performance problems I ran into on my
xenomai system.
The testcode I attached performs a syscall(gettimeofday). Measuring time on
a vanilla Kernel gives the following output:
ticks for ppc_getccounter: 116, for gettimeofday: 1479
The same kernel with xenomai patched in gives this:
ticks for ppc_getccounter: 116, for gettimeofday: 4278
Is this the expected behaviour? Why is the time for a system call nearly 3
times as much?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
static void
ppc_getcounter(unsigned long long *v)
{
register unsigned long tbu, tb, tbu2;
loop:
asm volatile ("mftbu %0" : "=r" (tbu) );
asm volatile ("mftb %0" : "=r" (tb) );
asm volatile ("mftbu %0" : "=r" (tbu2));
if (__builtin_expect(tbu != tbu2, 0)) goto loop;
/* The slightly peculiar way of writing the next lines is
compiled better by GCC than any other way I tried. */
((long*)(v))[0] = tbu;
((long*)(v))[1] = tb;
}
int main(int argc, char* argv[])
{
while(1) {
struct timeval tv;
unsigned long long count0, count1, count2;
ppc_getcounter(&count0);
ppc_getcounter(&count1);
gettimeofday(&tv, NULL);
ppc_getcounter(&count2);
printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n",
count1 - count0, count2- count1);
sleep(1);
}
}
Cheers
--
Dirk Eibach
Entwicklung
Guntermann & Drunck GmbH Systementwicklung
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405 2007-08-20 12:11 [Xenomai-help] Xenomai degrades syscall time on PPC405 Dirk Eibach @ 2007-08-23 20:45 ` Philippe Gerum 2007-08-24 6:33 ` Dirk Eibach 0 siblings, 1 reply; 4+ messages in thread From: Philippe Gerum @ 2007-08-23 20:45 UTC (permalink / raw) To: Dirk Eibach; +Cc: Xenomai help On Mon, 2007-08-20 at 14:11 +0200, Dirk Eibach wrote: > Hello, > > I am trying to understand some performance problems I ran into on my > xenomai system. > > The testcode I attached performs a syscall(gettimeofday). Measuring time on > a vanilla Kernel gives the following output: > > ticks for ppc_getccounter: 116, for gettimeofday: 1479 > > The same kernel with xenomai patched in gives this: > > ticks for ppc_getccounter: 116, for gettimeofday: 4278 > > Is this the expected behaviour? Why is the time for a system call nearly 3 > times as much? > When reporting figures, please report all of them. Here is your test ran on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output: bash-3.00# ./ppc405_rt ticks for ppc_getccounter: 122, for gettimeofday: 7337 ticks for ppc_getccounter: 208, for gettimeofday: 2816 ticks for ppc_getccounter: 210, for gettimeofday: 2512 ticks for ppc_getccounter: 174, for gettimeofday: 2622 ticks for ppc_getccounter: 174, for gettimeofday: 2320 ticks for ppc_getccounter: 208, for gettimeofday: 2634 ticks for ppc_getccounter: 174, for gettimeofday: 2344 ticks for ppc_getccounter: 208, for gettimeofday: 2660 ticks for ppc_getccounter: 174, for gettimeofday: 2320 ticks for ppc_getccounter: 208, for gettimeofday: 2816 ticks for ppc_getccounter: 208, for gettimeofday: 2734 ticks for ppc_getccounter: 208, for gettimeofday: 2802 ... Only the first shot exhibits a pathological latency, and switching on CONFIG_XENO_OPT_DEBUG_NUCLEUS explains why: # dmesg ... Looking up port of RPC 100003/2 on 192.168.0.7 Looking up port of RPC 100005/1 on 192.168.0.7 VFS: Mounted root (nfs filesystem). Freeing unused kernel memory: 128k init Xenomai: Switching ppc405_rt to secondary mode after exception #1025 from user-space at 0xfe54f40 (pid 248) This is confirmed by the task stats: # cat /proc/xenomai/stat CPU PID MSW CSW PF STAT %CPU NAME 0 0 0 2 0 00400080 99.9 ROOT 0 248 1 1 1* 00300380 0.0 ppc405_rt 0 0 0 48031 0 00000000 0.1 IRQ32: [timer] (*) 1 page fault caught in primary mode, causing a Xenomai mode switch. The test program hits a minor fault, due to a virtual memory management artefact, which in turn forces a Xenomai switch for your task from primary to secondary mode, in order to process the fault gracefully (basically mapping a missing page table entry). 7 us is the time needed to yield the control of the task to the regular kernel, then handle the fault in Linux context, then eventually perform the syscall. This is a known issue, and a solution may be available at some point, but not yet. This mailing list contains other reports of the same issue, and detailed explanations. Regarding the other figures in the 2-3 us range, it is absolutely true that Xenomai adds some overhead to the system call path which may get noticeable on low-end boards. This is due to the pipelining of system calls to the Xenomai domain: there is more code to execute than the regular Linux has to run for the same task. This value may improve too over time, since we do have a few opportunities for optimization here. Last point: your measure do not take into account the possibility for the syscall path to be preempted by a real-time interrupt, or even by a plain Linux one after the task has switched to secondary mode, so you should always read such data with a grain of salt. In any case, you should never rely on a single sample. > #include <stdio.h> > #include <sys/time.h> > #include <time.h> > > static void > ppc_getcounter(unsigned long long *v) > { > register unsigned long tbu, tb, tbu2; > > loop: > asm volatile ("mftbu %0" : "=r" (tbu) ); > asm volatile ("mftb %0" : "=r" (tb) ); > asm volatile ("mftbu %0" : "=r" (tbu2)); > if (__builtin_expect(tbu != tbu2, 0)) goto loop; > > /* The slightly peculiar way of writing the next lines is > compiled better by GCC than any other way I tried. */ > ((long*)(v))[0] = tbu; > ((long*)(v))[1] = tb; > } > > int main(int argc, char* argv[]) > { > while(1) { > struct timeval tv; > unsigned long long count0, count1, count2; > ppc_getcounter(&count0); > ppc_getcounter(&count1); > gettimeofday(&tv, NULL); > ppc_getcounter(&count2); > printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n", > count1 - count0, count2- count1); > sleep(1); > } > } > > > Cheers -- Philippe. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405 2007-08-23 20:45 ` Philippe Gerum @ 2007-08-24 6:33 ` Dirk Eibach 2007-08-24 8:45 ` Philippe Gerum 0 siblings, 1 reply; 4+ messages in thread From: Dirk Eibach @ 2007-08-24 6:33 UTC (permalink / raw) To: rpm; +Cc: Xenomai help Hello Philippe, tank you for your detailed answer. >> Hello, >> >> I am trying to understand some performance problems I ran into on my >> xenomai system. >> >> The testcode I attached performs a syscall(gettimeofday). Measuring time on >> a vanilla Kernel gives the following output: >> >> ticks for ppc_getccounter: 116, for gettimeofday: 1479 >> >> The same kernel with xenomai patched in gives this: >> >> ticks for ppc_getccounter: 116, for gettimeofday: 4278 >> >> Is this the expected behaviour? Why is the time for a system call nearly 3 >> times as much? >> > > When reporting figures, please report all of them. Here is your test ran > on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output: > > bash-3.00# ./ppc405_rt > ticks for ppc_getccounter: 122, for gettimeofday: 7337 > ticks for ppc_getccounter: 208, for gettimeofday: 2816 > ticks for ppc_getccounter: 210, for gettimeofday: 2512 > ticks for ppc_getccounter: 174, for gettimeofday: 2622 > ticks for ppc_getccounter: 174, for gettimeofday: 2320 > ticks for ppc_getccounter: 208, for gettimeofday: 2634 > ticks for ppc_getccounter: 174, for gettimeofday: 2344 > ticks for ppc_getccounter: 208, for gettimeofday: 2660 > ticks for ppc_getccounter: 174, for gettimeofday: 2320 > ticks for ppc_getccounter: 208, for gettimeofday: 2816 > ticks for ppc_getccounter: 208, for gettimeofday: 2734 > ticks for ppc_getccounter: 208, for gettimeofday: 2802 > ... I certainly tried to pick a representative sample. Here is the full series with xenomai patched in: # ./syscall-test ticks for ppc_getccounter: 92, for gettimeofday: 18375 ticks for ppc_getccounter: 116, for gettimeofday: 4250 ticks for ppc_getccounter: 116, for gettimeofday: 4416 ticks for ppc_getccounter: 116, for gettimeofday: 4372 ticks for ppc_getccounter: 116, for gettimeofday: 4486 ticks for ppc_getccounter: 116, for gettimeofday: 4376 ticks for ppc_getccounter: 116, for gettimeofday: 4446 ticks for ppc_getccounter: 116, for gettimeofday: 4434 ticks for ppc_getccounter: 116, for gettimeofday: 4368 ticks for ppc_getccounter: 116, for gettimeofday: 4418 ticks for ppc_getccounter: 116, for gettimeofday: 4404 ticks for ppc_getccounter: 116, for gettimeofday: 4384 ... And here we go without: # ./syscall-test ticks for ppc_getccounter: 92, for gettimeofday: 6761 ticks for ppc_getccounter: 116, for gettimeofday: 1768 ticks for ppc_getccounter: 116, for gettimeofday: 1594 ticks for ppc_getccounter: 116, for gettimeofday: 1560 ticks for ppc_getccounter: 116, for gettimeofday: 1600 ticks for ppc_getccounter: 116, for gettimeofday: 1574 ticks for ppc_getccounter: 116, for gettimeofday: 1618 ticks for ppc_getccounter: 116, for gettimeofday: 1560 ticks for ppc_getccounter: 116, for gettimeofday: 1600 ticks for ppc_getccounter: 116, for gettimeofday: 1540 ticks for ppc_getccounter: 116, for gettimeofday: 1618 ticks for ppc_getccounter: 116, for gettimeofday: 1566 ticks for ppc_getccounter: 116, for gettimeofday: 1608 ... > > Only the first shot exhibits a pathological latency, and switching on > CONFIG_XENO_OPT_DEBUG_NUCLEUS explains why: > > # dmesg > > ... > Looking up port of RPC 100003/2 on 192.168.0.7 > Looking up port of RPC 100005/1 on 192.168.0.7 > VFS: Mounted root (nfs filesystem). > Freeing unused kernel memory: 128k init > Xenomai: Switching ppc405_rt to secondary mode after exception #1025 from user-space at 0xfe54f40 (pid 248) > > This is confirmed by the task stats: > > # cat /proc/xenomai/stat > > CPU PID MSW CSW PF STAT %CPU NAME > 0 0 0 2 0 00400080 99.9 ROOT > 0 248 1 1 1* 00300380 0.0 ppc405_rt > 0 0 0 48031 0 00000000 0.1 IRQ32: [timer] > > (*) 1 page fault caught in primary mode, causing a Xenomai mode switch. > > The test program hits a minor fault, due to a virtual memory management > artefact, which in turn forces a Xenomai switch for your task from > primary to secondary mode, in order to process the fault gracefully > (basically mapping a missing page table entry). 7 us is the time needed > to yield the control of the task to the regular kernel, then handle the > fault in Linux context, then eventually perform the syscall. This is a > known issue, and a solution may be available at some point, but not yet. > > This mailing list contains other reports of the same issue, and detailed > explanations. > > Regarding the other figures in the 2-3 us range, it is absolutely true > that Xenomai adds some overhead to the system call path which may get > noticeable on low-end boards. This is due to the pipelining of system > calls to the Xenomai domain: there is more code to execute than the > regular Linux has to run for the same task. This value may improve too > over time, since we do have a few opportunities for optimization here. But when I compare my values above I still get an overhead of 10us which is at least strange and not acceptable for my application. I am asking myself if it will be possible to get this fixed with xenomai or if I have to go a completely different way. > Last point: your measure do not take into account the possibility for > the syscall path to be preempted by a real-time interrupt, or even by a > plain Linux one after the task has switched to secondary mode, so you > should always read such data with a grain of salt. In any case, you > should never rely on a single sample. > >> #include <stdio.h> >> #include <sys/time.h> >> #include <time.h> >> >> static void >> ppc_getcounter(unsigned long long *v) >> { >> register unsigned long tbu, tb, tbu2; >> >> loop: >> asm volatile ("mftbu %0" : "=r" (tbu) ); >> asm volatile ("mftb %0" : "=r" (tb) ); >> asm volatile ("mftbu %0" : "=r" (tbu2)); >> if (__builtin_expect(tbu != tbu2, 0)) goto loop; >> >> /* The slightly peculiar way of writing the next lines is >> compiled better by GCC than any other way I tried. */ >> ((long*)(v))[0] = tbu; >> ((long*)(v))[1] = tb; >> } >> >> int main(int argc, char* argv[]) >> { >> while(1) { >> struct timeval tv; >> unsigned long long count0, count1, count2; >> ppc_getcounter(&count0); >> ppc_getcounter(&count1); >> gettimeofday(&tv, NULL); >> ppc_getcounter(&count2); >> printf("ticks for ppc_getccounter: %lld, for gettimeofday: %lld\n", >> count1 - count0, count2- count1); >> sleep(1); >> } >> } >> >> >> Cheers -- Dirk Eibach Entwicklung Guntermann & Drunck GmbH Systementwicklung Telefon: +49 2739 8901-100 Fax: +49 2739 8901-120 EMail: mailto:eibach@domain.hid ------------------------------------------------------------------------------- Guntermann & Drunck GmbH Systementwicklung Dortmunder Str. 4a D-57234 Wilnsdorf - Germany Tel: +49 (0) 27 39 / 89 01 - 100 Fax: +49 (0) 27 39 / 89 01 - 120 E-Mail: mailto:sales@domain.hid Web: www.gdsys.de ------------------------------------------------------------------------------- Geschäftsführer: Udo Guntermann - Martin Drunck - Reiner Ruelmann - Klaus Tocke HRB 2884, Amtsgericht Siegen USt.-Id.-Nr. DE 126575222 - Steuer-Nr. 342 / 5835 / 1041 ------------------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Xenomai-help] Xenomai degrades syscall time on PPC405 2007-08-24 6:33 ` Dirk Eibach @ 2007-08-24 8:45 ` Philippe Gerum 0 siblings, 0 replies; 4+ messages in thread From: Philippe Gerum @ 2007-08-24 8:45 UTC (permalink / raw) To: Dirk Eibach; +Cc: Xenomai help On Fri, 2007-08-24 at 08:33 +0200, Dirk Eibach wrote: > Hello Philippe, > > tank you for your detailed answer. > > >> Hello, > >> > >> I am trying to understand some performance problems I ran into on my > >> xenomai system. > >> > >> The testcode I attached performs a syscall(gettimeofday). Measuring time on > >> a vanilla Kernel gives the following output: > >> > >> ticks for ppc_getccounter: 116, for gettimeofday: 1479 > >> > >> The same kernel with xenomai patched in gives this: > >> > >> ticks for ppc_getccounter: 116, for gettimeofday: 4278 > >> > >> Is this the expected behaviour? Why is the time for a system call nearly 3 > >> times as much? > >> > > > > When reporting figures, please report all of them. Here is your test ran > > on a 405GPr, linux-2.6.14, I-pipe 1.5-03, with full output: > > > > bash-3.00# ./ppc405_rt > > ticks for ppc_getccounter: 122, for gettimeofday: 7337 > > ticks for ppc_getccounter: 208, for gettimeofday: 2816 > > ticks for ppc_getccounter: 210, for gettimeofday: 2512 > > ticks for ppc_getccounter: 174, for gettimeofday: 2622 > > ticks for ppc_getccounter: 174, for gettimeofday: 2320 > > ticks for ppc_getccounter: 208, for gettimeofday: 2634 > > ticks for ppc_getccounter: 174, for gettimeofday: 2344 > > ticks for ppc_getccounter: 208, for gettimeofday: 2660 > > ticks for ppc_getccounter: 174, for gettimeofday: 2320 > > ticks for ppc_getccounter: 208, for gettimeofday: 2816 > > ticks for ppc_getccounter: 208, for gettimeofday: 2734 > > ticks for ppc_getccounter: 208, for gettimeofday: 2802 > > ... > > I certainly tried to pick a representative sample. Here is the full series > with xenomai patched in: > # ./syscall-test > ticks for ppc_getccounter: 92, for gettimeofday: 18375 > ticks for ppc_getccounter: 116, for gettimeofday: 4250 > ticks for ppc_getccounter: 116, for gettimeofday: 4416 > ticks for ppc_getccounter: 116, for gettimeofday: 4372 > ticks for ppc_getccounter: 116, for gettimeofday: 4486 > ticks for ppc_getccounter: 116, for gettimeofday: 4376 > ticks for ppc_getccounter: 116, for gettimeofday: 4446 > ticks for ppc_getccounter: 116, for gettimeofday: 4434 > ticks for ppc_getccounter: 116, for gettimeofday: 4368 > ticks for ppc_getccounter: 116, for gettimeofday: 4418 > ticks for ppc_getccounter: 116, for gettimeofday: 4404 > ticks for ppc_getccounter: 116, for gettimeofday: 4384 > ... > > > And here we go without: > # ./syscall-test > ticks for ppc_getccounter: 92, for gettimeofday: 6761 > ticks for ppc_getccounter: 116, for gettimeofday: 1768 > ticks for ppc_getccounter: 116, for gettimeofday: 1594 I have comparable values on the 405GPr in the vanilla case: ticks for ppc_getccounter: 127, for gettimeofday: 6851 ticks for ppc_getccounter: 177, for gettimeofday: 1733 ticks for ppc_getccounter: 153, for gettimeofday: 1455 ticks for ppc_getccounter: 153, for gettimeofday: 1401 but still much lower values than you have in the Xenomai case. The answer is either: - in your kernel configuration, - in the I-pipe patch, - in the Xenomai version you are currently using Btw, where does the I-pipe patch 2.6.18 1.5-00 you seem to be using come from? Where did you pick it? > ticks for ppc_getccounter: 116, for gettimeofday: 1560 > ticks for ppc_getccounter: 116, for gettimeofday: 1600 > ticks for ppc_getccounter: 116, for gettimeofday: 1574 > ticks for ppc_getccounter: 116, for gettimeofday: 1618 > ticks for ppc_getccounter: 116, for gettimeofday: 1560 > ticks for ppc_getccounter: 116, for gettimeofday: 1600 > ticks for ppc_getccounter: 116, for gettimeofday: 1540 > ticks for ppc_getccounter: 116, for gettimeofday: 1618 > ticks for ppc_getccounter: 116, for gettimeofday: 1566 > ticks for ppc_getccounter: 116, for gettimeofday: 1608 > ... > -- Philippe. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-08-24 8:45 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-08-20 12:11 [Xenomai-help] Xenomai degrades syscall time on PPC405 Dirk Eibach 2007-08-23 20:45 ` Philippe Gerum 2007-08-24 6:33 ` Dirk Eibach 2007-08-24 8:45 ` Philippe Gerum
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.