* [Adeos-main] latency results for ppc and x86
       [not found] <45CD730A.6000405@domain.hid>
@ 2007-02-20  7:21 ` poornima r
  2007-02-21  7:13   ` Wolfgang Grandegger
  0 siblings, 1 reply; 15+ messages in thread
From: poornima r @ 2007-02-20  7:21 UTC
  To: Wolfgang Grandegger; +Cc: adeos-main

Hello,

These were the scheduling latency and interrupt latency test results
on ppc and x86 with the IPIPE tracer option disabled.

1. Please comment on these results (whether valid) and
2. Is there any method to optimize these results?

1) PPC:-
(MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache)

User mode:-
root@domain.hid# ./latency -t0
== Sampling period: 1000000 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
RTD|     171.000|     171.000|     171.000|       0|     167.000|     176.000

Kernel mode:-
root@domain.hid# ./latency -t1
== Sampling period: 1000000 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...
RTT|  00:00:00  (in-kernel periodic task, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|     123.000|     123.000|     123.000|       0|     123.000|     123.000
RTD|     125.000|     125.000|     125.000|       0|     123.000|     125.000
RTD|     128.333|     128.333|     128.333|       0|     123.000|     128.333
RTD|     127.000|     127.000|     127.000|       0|     123.000|     128.333

Interrupt mode:-
root@domain.hid# ./latency -t2
== Sampling period: 1000000 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...
RTT|  00:00:01  (in-kernel timer handler, 1000000 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|      45.334|      45.334|      45.334|       0|      45.334|      45.334
RTD|      45.000|      45.000|      45.000|       0|      45.000|      45.334
RTD|      46.000|      46.000|      46.000|       0|      45.000|      46.000
RTD|      47.334|      47.334|      47.334|       0|      45.000|      47.334
RTD|      46.334|      46.334|      46.334|       0|      45.000|      47.334

2) X86:-
(Pentium4, 3.06GHz, 1024 KB cache size)

User mode:-
Sampling period: 100 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...
RTT|  00:00:01  (periodic user-mode task, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|       3.807|      12.825|      21.565|       0|       3.807|      21.565
RTD|       3.796|      12.792|      21.483|       0|       3.796|      21.565
RTD|       3.770|      12.799|      21.501|       0|       3.770|      21.565
RTD|       3.578|      12.806|      20.890|       0|       3.578|      21.565
RTD|       3.755|      12.809|      21.486|       0|       3.578|

Kernel mode:-
Sampling period: 100 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...
RTT|  00:00:01  (in-kernel periodic task, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|       2.381|       3.451|      19.620|       0|       2.381|      19.620
RTD|       2.332|       3.480|      19.930|       0|       2.332|      19.930
RTD|       2.382|       3.649|      19.609|       0|       2.332|      19.930
RTD|       2.323|       2.786|      14.351|       0|       2.323|      19.930
RTD|       2.375|       2.532|       5.519|       0|       2.323|      19.930
RTD|       2.332|       3.971|      19.617|       0|       2.323|      19.930

Interrupt mode:-
Sampling period: 100 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...
RTT|  00:00:01  (in-kernel timer handler, 100 us period, priority 99)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD|      -1.563|       7.553|      15.736|       0|      -1.563|      15.736
RTD|      -1.579|       7.558|      15.804|       0|      -1.579|      15.804
RTD|      -1.584|       7.529|      16.167|       0|      -1.584|      16.167
RTD|      -1.548|       7.553|      16.186|       0|      -1.584|      16.186
RTD|      -1.585|       7.556|      16.275|       0|      -1.585|      16.275

Thanks,
Poornima

--- Wolfgang Grandegger <wg@domain.hid> wrote:

> Hello,
>
> poornima r wrote:
> > Hello,
> >
> > Sorry for not replying all these days...
> > (Was not in station, may be too personal!!!!!)
> >
> > About software emulation error:
> >
> >>> 4)Output of /proc/xenomai/faults after the illegal instruction:-
> >>> root@domain.hid# cat /proc/xenomai/faults
> >>> TRAP         CPU0
> >>>   0:            0    (Data or instruction access)
> >>>   1:            0    (Alignment)
> >>>   2:            0    (Altivec unavailable)
> >>>   3:            0    (Program check exception)
> >>>   4:            0    (Machine check exception)
> >>>   5:            0    (Unknown)
> >>>   6:            0    (Instruction breakpoint)
> >>>   7:            0    (Run mode exception)
> >>>   8:            0    (Single-step exception)
> >>>   9:            0    (Non-recoverable exception)
> >>>  10:            1    (Software emulation)
> >>>  11:            0    (Debug)
> >>>  12:            0    (SPE)
> >>>  13:            0    (Altivec assist)
> >> Hm, I see a software emulation exception which is also the reason for
> >> the illegal instructions. What toolchain do you use? The toolchain
> >> should support software FP emulation.
> >
> > 1)I am using an open source tool chain with software
> > floating point emulation support.
> > (#ppc_8xx-gcc --v
> > /lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
> > --with-numa-policy=no --with-float=soft)
> >
> > 2)And the kernel is built with code to emulate a floating-point
> > unit, which will allow programs that use floating-point
> > instructions to run
> >
> > Kernel configuration
> > ----CONFIG_MATH_EMULATION:y
>
> If you build with "--with-float=soft" there is no need for math
> emulation in the kernel.
> Likely, there is something wrong with your
> tool-chain. Could you please try a known-to-work tool-chain like the
> ELDK v4.x from http://www.denx.de.
>
> Wolfgang.
>
> > Thanks,
> > Poornima
> >
> > --- Wolfgang Grandegger <wg@domain.hid> wrote:
> >
> >> poornima r wrote:
> >>> Hi,
> >>>
> >>> 1)I am using open source kernel from Kernel.org,
> >>> but what is meant by vanilla kernel from Kernel.org?
> >>
> >> It's the kernel from kernel.org. This means that the Linux kernel 2.6.18
> >> is running fine on your MPC860 platform as is?
> >> Thanks for the info.
> >>
> >>> 2)With sampling period of 500usec the system simply
> >>> hangs without printing any results (./latency -p500)
> >>
> >> OK.
> >>
> >>> 3)cyclictest with -t1 option (without IPIPE-tracer)
> >>> root@domain.hid# ./cyclictest -t1
> >>> 2.04 0.50 0.17 8/27 174
> >>>
> >>> T: 0 ( 0) P:99 I: 1000 C: 0 Min: 1000000 Act: 0 Avg: 0 Max:-1000000
> >>> Illegal instruction
> >>>
> >>> 4)Output of /proc/xenomai/faults after the illegal instruction:-
> >>> root@domain.hid# cat /proc/xenomai/faults
> >>> TRAP         CPU0
> >>>   0:            0    (Data or instruction access)
> >>>   1:            0    (Alignment)
> >>>   2:            0    (Altivec unavailable)
> >>>   3:            0    (Program check exception)
> >>>   4:            0    (Machine check exception)
> >>>   5:            0    (Unknown)
> >>>   6:            0    (Instruction breakpoint)
> >>>   7:            0    (Run mode exception)
> >>>   8:            0    (Single-step exception)
> >>>   9:            0    (Non-recoverable exception)
> >>>  10:            1    (Software emulation)
> >>>  11:            0    (Debug)
> >>>  12:            0    (SPE)
> >>>  13:            0    (Altivec assist)
> >> Hm, I see a software emulation exception which is also the reason for
> >> the illegal instructions. What toolchain do you use? The toolchain
> >> should support software FP emulation.
> >>
> >>> 5)Running switchtest:-
> >>> root@domain.hid# ./switchtest -n
> >>> --The system hangs without printing any results
> >>>
> >>> Thanks,
> >>> Poornima
> >>>
> >>> --- Wolfgang Grandegger <wg@domain.hid> wrote:
> >>>
> >>>> poornima r wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> Linux version:linux-2.6.18
> >>>>> Xenomai: xenomai-2.3.0 (Stable version)
> >>>>> adeos patch: adeos-ipipe-2.6.18-ppc-1.5-01.patch
> >>>>
> >>>> OK, I'm curious, did you use the vanilla kernel from kernel.org?
> >>>> More comments below.
> >>>>
> >>>>> The tests were run as follows:
> >>>>> 1)The sampling period in the code for latency and
> >>>>> switchbench was changed to 1000000000ns (to remove
> >>>>> overrun error)
> >>>>> 2)switchtest was run with -n5 option
> >>>>> 3)cyclictest was run with -t5 option (5 threads
> >>>>> were created.)
> >>>>> 4)cyclictest was terminated with Illegal instruction
> >>>>> (after creating 5 threads) with IPIPE tracer enabled.
> >>>>
> >>>>> These were the results without I-PIPE Tracer option:
> >>>>> (All the tests were run without any load)
> >>>>> 1)LATENCY TEST:-
> >>>>> User mode:-
> >>>>> /mnt/out_xen/bin# ./latency -t0
> >>>>> == Sampling period: 1000000 us
> >>>>> == Test mode: periodic user-mode task
> >>>>> == All results in microseconds
> >>>>> warming up...
> >>>>> RTT|  00:00:01  (periodic user-mode task, 1000000 us period, priority 99)
> >>>>> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
> >>>>> RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
> >>>>> RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
> >>>>> RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
> === message truncated ===
* Re: [Adeos-main] latency results for ppc and x86
  2007-02-20  7:21 ` [Adeos-main] latency results for ppc and x86 poornima r
@ 2007-02-21  7:13   ` Wolfgang Grandegger
  2007-02-21  9:33     ` poornima r
  2007-03-14 12:51     ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r
  0 siblings, 2 replies; 15+ messages in thread
From: Wolfgang Grandegger @ 2007-02-21  7:13 UTC
  To: poornima r; +Cc: adeos-main

Hello,

poornima r wrote:
> Hello,
>
> These were the scheduling latency and interrupt
> latency test results on ppc and x86 with IPIPE tracer
> option disabled.
>
> 1.Please comment on these results (whether valid) and

Your results are OK. These are actually the figures I remember from my
own tests in the past.

> 2.Is there any method to optimize these results.

Not that I know of. There are a few ideas on how to reduce latencies
further, like cache locking or TLB pinning.

> 1)PPC:-
> (MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache)
> [...]

On the MPC860, the latencies are mainly due to code execution time, as
this processor is very slow.

> 2)X86:-
> (Pentium4, 3.06GHz, 1024 KB cache size)
> [...]

Latencies are mainly due to cache refills on the P4. Have you already
put load onto your system? If not, worst case latencies will be even
longer.

Wolfgang.

> Thanks,
> Poornima
> [...]
> _______________________________________________
> Adeos-main mailing list
> Adeos-main@domain.hid
> https://mail.gna.org/listinfo/adeos-main
* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21  7:13   ` Wolfgang Grandegger
@ 2007-02-21  9:33     ` poornima r
  2007-02-21  9:33       ` Nicholas Mc Guire
  2007-03-14 12:51     ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r
  1 sibling, 1 reply; 15+ messages in thread
From: poornima r @ 2007-02-21  9:33 UTC
  To: Wolfgang Grandegger; +Cc: adeos-main

Hello,

Thanks for the reply.
These tests were actually run without loading the system.

Thanks,
Poornima

Wolfgang Grandegger <wg@domain.hid> wrote:
> Latencies are mainly due to cache refills on the P4. Have you already
> put load onto your system? If not, worst case latencies will be even
> longer.
* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21  9:33     ` poornima r
@ 2007-02-21  9:33       ` Nicholas Mc Guire
  2007-02-21 10:49         ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21  9:33 UTC
  To: poornima r; +Cc: adeos-main, Wolfgang Grandegger

> Latencies are mainly due to cache refills on the P4. Have you already
> put load onto your system? If not, worst case latencies will be even
> longer.

One possibility we found in RTLinux/GPL to reduce latency is to free up
TLBs by flushing a few of the TLB hot spots. Basically, these flushpoints
are something like:

   __asm__ __volatile__("invlpg %0": :"m" (*(char*)__builtin_return_address(0)));

put at places where we know we don't need those lines any more (i.e.
after switching tasks or the like). By inserting only a few such
flushpoints in hot code on the kernel side, we found a clear reduction of
the worst case jitter and interrupt response times.

Aside from caches, BTB exhaustion in high load situations is also a
problem that has not been addressed much in the realtime variants. With
the P6 families having a botched BTB prediction unit, one can use some
"strange" constructions to reduce branch penalties, i.e.:

   if(!condition){slow_path();}
   else{fast_path();}

is more predictable than

   if(condition){fast_path();}
   else{slow_path();}

as in the first case the branch prediction is static, thus the worst case
is that you are jumping over a few bytes of object code when the
condition is not met. In the second case the default, if the BTB does not
yet know this branch, is to guess not-taken and thus load the jump target
of the slow path with the overhead of TLB/cache penalties.

Regarding the PPC numbers, the surprising thing for me is that the same
archs are doing MUCH better with old RTAI/RTLinux versions, i.e. a 2.4.4
kernel on a 50MHz MPC860 shows a worst case of 57us. So I do question
what is going wrong here in the 2.6.X branches of hard-realtime Linux. My
suspicion is that there is too much work being done on fast, hot CPUs and
the low end is being neglected, which is bad as the numbers you post here
for ADEOS are numbers reachable with the mainstream preemptive kernel by
now as well (of course not on the low-end systems though).

hofrat
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 9:33 ` Nicholas Mc Guire @ 2007-02-21 10:49 ` Jan Kiszka 2007-02-21 10:26 ` Nicholas Mc Guire 0 siblings, 1 reply; 15+ messages in thread From: Jan Kiszka @ 2007-02-21 10:49 UTC (permalink / raw) To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger [-- Attachment #1: Type: text/plain, Size: 3198 bytes --] Nicholas Mc Guire wrote: > >> Latencies are mainly due to cache refills on the P4. Have you already >> put load onto your system? If not, worst case latencies will be even >> longer. > > > one posibility we found in RTLinux/GPL to reduce latency is to free up > TLBs by flushing a few of the TLB hot spots, basically these flushpoints > are something like: > > __asm__ __volatile__("invlpg %0": :"m" > (*(char*)__builtin_return_address(0))); > > put at places where we know we don't need thos lines any more (i.e. > after switching tasks or the like). By inserting only a few such > flushpoints in > hot code on the kernel side we found a clear reduction of the worst case > jitter and interrupt response times. Interesting. Are these flushpoints present in latest kernel patches of RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :) > > Aside from caches, BTB exhaustion in high load situations is also a > problem that has not been addressed much in the realtime variants - with > the P6 families having a botched BTB prediction unit, one can use some > "strange" constructions to reduce branch penalties - i.e.: > > if(!condition){slow_path();} > else{fast_path();} > > if more predictalbe than > > if(codition){fast_path();} > else{slow_path();} I think this is also what likely()/unlikely() teaches to the the compiler on x86 (where there is no branch prediction predicate for the instructions), isn't it? > > as in the first case the branch prediction is static, thus the worst case > is that you are jumping over a few bytes of object code when the condition > is not met. 
in the second case the default if the BTB does not yet know > this branch is to guess not-taken and thus load the jump target of the > slow patch with the overhead of TLB/Cache penalties. > > Regarding the PPC numbers, the surprising thing for me is that the same > archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4 > kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question > what is going wrong here in the 2.6.X branches of hard-realtime Linux - You forget that old stuff was kernel-only, lacking a lot of Linux integration features. Recent I-pipe-based real-time via Xenomai normally includes support for user-space RT (you can switch it off, but hardly anyone does). So its not a useful comparison given that new real-time projects almost always want full-featured user space these days. For a fairer comparison, one should consider a simple I-pipe domain that contains the real-time "application". > my suspicion is that there is too much work being done on fast-hot CPUs > and the low-end is being neglected - which is bad as the numbers you > post here for ADEOS are numbers reachable with mainstream preemptive > kernel by now as well (off course not on the low end systems though). That's scenario-dependent. Simple setups like a plain timed task can reach the dimension of I-pipe-based Xenomai, but more complex scenarios suffer from the exploding complexity in mainstream Linux, even with -rt. Just think of "simple" mutexes realised via futexes. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Adeos-main] latency results for ppc and x86
  2007-02-21 10:49   ` Jan Kiszka
@ 2007-02-21 10:26     ` Nicholas Mc Guire
  2007-02-21 12:29       ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Mc Guire @ 2007-02-21 10:26 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>> Latencies are mainly due to cache refills on the P4. Have you already
>>> put load onto your system? If not, worst case latencies will be even
>>> longer.
>>
>> one possibility we found in RTLinux/GPL to reduce latency is to free up
>> TLBs by flushing a few of the TLB hot spots, basically these flushpoints
>> are something like:
>>
>>   __asm__ __volatile__("invlpg %0": :"m"
>>   (*(char*)__builtin_return_address(0)));
>>
>> put at places where we know we don't need those lines any more (i.e.
>> after switching tasks or the like). By inserting only a few such
>> flushpoints in hot code on the kernel side we found a clear reduction
>> of the worst case jitter and interrupt response times.
>
> Interesting. Are these flushpoints present in latest kernel patches of
> RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :)

yup - basically if you look at the latest patches (2.4.33-rtl3.2) you
will find them in the kernel code, or in the rtlinux core code
(rtl_core.c and rtl_sched.c). The concept is of course not restricted
to 2.4.X kernels. Note though that some archs (notably MIPS)
have a problem with __builtin_return_address.

>> Aside from caches, BTB exhaustion in high load situations is also a
>> problem that has not been addressed much in the realtime variants - with
>> the P6 families having a botched BTB prediction unit, one can use some
>> "strange" constructions to reduce branch penalties - i.e.:
>>
>>   if(!condition){slow_path();}
>>   else{fast_path();}
>>
>> is more predictable than
>>
>>   if(condition){fast_path();}
>>   else{slow_path();}
>
> I think this is also what likely()/unlikely() teaches to the
> compiler on x86 (where there is no branch prediction predicate for the
> instructions), isn't it?

no, not really - likely/unlikely give hints during compilation to relocate
the unlikely part to a distant location (some label at the end of the
file...) but that does not change the problem at runtime with respect to
the worst case. The BTB uses a hysteresis of one miss/hit to adjust the
guess on P6 systems, with the default (if the address is not present in
the BTB) of not taken - thus if you reorder so that the "not taken" case
is the fast path you will always have the fast path preloaded in
the pipeline.

  if(likely(condition))
      fast_path();
  else
      slow_path();

will be fast on average but the worst case is that the address is not
in the BTB, so the slow_path() target is loaded by default.

There is a paper on this (a bit messy) published at RTLWS7 (Lille) 2005
if you are interested in the details.

>> as in the first case the branch prediction is static, thus the worst case
>> is that you are jumping over a few bytes of object code when the condition
>> is not met. In the second case the default, if the BTB does not yet know
>> this branch, is to guess not-taken and thus load the jump target of the
>> slow path with the overhead of TLB/cache penalties.
>>
>> Regarding the PPC numbers, the surprising thing for me is that the same
>> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. a 2.4.4
>> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question
>> what is going wrong here in the 2.6.X branches of hard-realtime Linux -
>
> You forget that old stuff was kernel-only, lacking a lot of Linux
> integration features. Recent I-pipe-based real-time via Xenomai normally
> includes support for user-space RT (you can switch it off, but hardly
> anyone does). So it's not a useful comparison given that new real-time
> projects almost always want full-featured user space these days. For a
> fairer comparison, one should consider a simple I-pipe domain that
> contains the real-time "application".

note that the numbers posted here WERE kernel numbers!

I know that people want to move to user-space - but what is the advantage
over RT-preempt then if you use the dynamic tick patch (scheduled to go
mainline in 2.6.21 BTW)?

>> my suspicion is that there is too much work being done on fast-hot CPUs
>> and the low-end is being neglected - which is bad as the numbers you
>> post here for ADEOS are numbers reachable with the mainstream preemptive
>> kernel by now as well (of course not on the low end systems though).
>
> That's scenario-dependent. Simple setups like a plain timed task can
> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
> suffer from the exploding complexity in mainstream Linux, even with -rt.
> Just think of "simple" mutexes realised via futexes.

do you have some code samples with numbers? I would be very interested in
a demo that shows this problem - I was not able to really find a smoking
gun with RT-preempt and dynamic ticks (2.6.17.2).

hofrat
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3B5hnU7rXZKfY2oRAmrGAJwN6SK3pGLMBcxSa2MT9HGQv0q4+wCfZVuq
Yxaynkg4Bitl0uMlFug6Yak=
=5xzd
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 15+ messages in thread
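For illustration, the two source forms argued about in this exchange can be written out side by side. This is only a minimal sketch, assuming GCC or Clang (for __builtin_expect); fast_path(), slow_path() and the two dispatch_* wrappers are hypothetical names, not code from RTLinux/GPL or Adeos. Both variants compute the same result. Only the generated branch layout may differ, and the compiler is free to reorder either one, so any claim about worst-case behaviour would have to be checked against the disassembly for the target CPU.

```c
/* likely()/unlikely() as defined in the Linux kernel are thin wrappers
 * around __builtin_expect (GCC/Clang). They are compile-time layout
 * hints only and do not touch the BTB at run time. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-ins for the two code paths. */
static int fast_path(void) { return 1; }
static int slow_path(void) { return 2; }

/* Variant A: the hint-based form, fast path as the conditional body. */
int dispatch_hinted(int condition)
{
    if (likely(condition))
        return fast_path();
    else
        return slow_path();
}

/* Variant B: source reordered so the slow path is the conditional body
 * and the fast path is what remains when the test fails. */
int dispatch_reordered(int condition)
{
    if (!condition)
        return slow_path();
    else
        return fast_path();
}
```

Compiling both with the same flags and comparing the output of objdump -d is one way to see whether the source ordering survives optimisation on a given compiler.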
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 10:26 ` Nicholas Mc Guire @ 2007-02-21 12:29 ` Jan Kiszka 2007-02-21 12:14 ` Nicholas Mc Guire 0 siblings, 1 reply; 15+ messages in thread From: Jan Kiszka @ 2007-02-21 12:29 UTC (permalink / raw) To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger [-- Attachment #1: Type: text/plain, Size: 6592 bytes --] Nicholas Mc Guire wrote: >>>> Latencies are mainly due to cache refills on the P4. Have you already >>>> put load onto your system? If not, worst case latencies will be even >>>> longer. >>> >>> >>> one posibility we found in RTLinux/GPL to reduce latency is to free up >>> TLBs by flushing a few of the TLB hot spots, basically these flushpoints >>> are something like: >>> >>> __asm__ __volatile__("invlpg %0": :"m" >>> (*(char*)__builtin_return_address(0))); >>> >>> put at places where we know we don't need thos lines any more (i.e. >>> after switching tasks or the like). By inserting only a few such >>> flushpoints in >>> hot code on the kernel side we found a clear reduction of the worst case >>> jitter and interrupt response times. > >> Interesting. Are these flushpoints present in latest kernel patches of >> RTLinux/GPL? Sounds like a nice thing to play with on a rainy day. :) > > > yup - basically if you look at the latest patches (2.4.33-rtl3.2) you > will find them in the kernel code. Or in the rtlinux core code > (rtl_core.c and rtl_sched.c). The concept is off course not restricted > to 2.4.X kernels note thought that some archs (notably MIPS) > have a problem with __builtin_return_address. OK, thanks. 
> > >>> >>> Aside from caches, BTB exhaustion in high load situations is also a >>> problem that has not been addressed much in the realtime variants - with >>> the P6 families having a botched BTB prediction unit, one can use some >>> "strange" constructions to reduce branch penalties - i.e.: >>> >>> if(!condition){slow_path();} >>> else{fast_path();} >>> >>> if more predictalbe than >>> >>> if(codition){fast_path();} >>> else{slow_path();} > >> I think this is also what likely()/unlikely() teaches to the the >> compiler on x86 (where there is no branch prediction predicate for the >> instructions), isn't it? > > > no not really - likely/unlikely give hints during compilation to relocate > the unlikey part to a distant location (some lable at the end of the > file...) but that does not change the rpoblem at runtime with respect to > the worst case. The BTB uses a hysteresis of one miss/hit to adjust the > guess on P6 systems with the default (if the address is not present in > the BTB) of not taken - thus if you reorder for the "not taken" case > being the fast patch you will always have the fast path preloaded in > the pipeline. > > if(likley(condition)){ > fast_patch(); > else > slow_path(); > > will be fast on average but the worst case is that the address is not > in the BTB so the slow_patch() tag is loaded by default. Ah, got the idea. How much arch/processor-type-dependent is this optimisation? It would surely makes no sense to optimise for arch X in generic code. > > There is a paper on this (a bit messy) published at RTLWS7 (Lile) 2005 > if you are interested in the details. > >>> >>> as in the first case the branch prediction is static, thus the worst >>> case >>> is that you are jumping over a few bytes of object code when the >>> condition >>> is not met. in the second case the default if the BTB does not yet know >>> this branch is to guess not-taken and thus load the jump target of the >>> slow patch with the overhead of TLB/Cache penalties. 
>>> >>> Regarding the PPC numbers, the surprising thing for me is that the same >>> archs are doing MUCH better with old RTAI/RTLinux versions, i.e. 2.4.4 >>> kernel on a 50MHz MPC860 shows a worst case of 57us - so I do question >>> what is going wrong here in the 2.6.X branches of hard-realtime Linux - > >> You forget that old stuff was kernel-only, lacking a lot of Linux >> integration features. Recent I-pipe-based real-time via Xenomai normally >> includes support for user-space RT (you can switch it off, but hardly >> anyone does). So its not a useful comparison given that new real-time >> projects almost always want full-featured user space these days. For a >> fairer comparison, one should consider a simple I-pipe domain that >> contains the real-time "application". > > > note that the numbers posted here WERE kernel numbers ! But with user space support enabled. There are no separate code paths for kernel and user space threads, basic infrastructure is shared here for good reasons. > I know that people want to move to user-space - but what is the advantage > over RT-preempt then if you use the dynamic tick patch (scheduled to go > mainline in 2.6.21 BTW) ? So far, determinism (both /wrt mainline and latest -rt). BTW, kernel space real time is specifically no longer recommendable for commercial projects that have to worry about the (likely non-GPL) license of their application code. And then there are those countless technical advantages that speed up the development process of user space apps. > >>> my suspicion is that there is too much work being done on fast-hot CPUs >>> and the low-end is being neglected - which is bad as the numbers you >>> post here for ADEOS are numbers reachable with mainstream preemptive >>> kernel by now as well (off course not on the low end systems though). > >> That's scenario-dependent. 
>> Simple setups like a plain timed task can
>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios
>> suffer from the exploding complexity in mainstream Linux, even with -rt.
>> Just think of "simple" mutexes realised via futexes.
>
> do you have some code samples with numbers? I would be very interested in
> a demo that shows this problem - I was not able to really find a smoking
> gun with RT-preempt and dynamic ticks (2.6.17.2).

I can't help with demo code, but I can name a few conceptual issues:

 o Futexes may need to allocate memory when suspending on a contended
   lock (refill_pi_state_cache)
 o Futexes depend on mmap_sem
 o Preemptible RCU read-sides can either lead to OOM or require
   intrusive read-side priority boosting (see Paul McKenney's LWN
   article)
 o Excessive lock nesting depths in critical code paths make it hard to
   predict worst-case behaviour (or to verify that measurements actually
   already triggered them)
 o Any nanosleep&friends-using Linux process can schedule hrtimers at
   arbitrary dates, requiring a pretty close look at the
   (worst-case) timer usage pattern of the _whole_ system, not only the
   SCHED_FIFO/RR part

That's what I can tell off the top of my head. But one would have to
analyse the code more thoroughly I guess.

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread
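To make the futex items in the list above a little more concrete: from user space, the kernel's PI-futex code is normally reached through a priority-inheritance pthread mutex. The following is a minimal sketch, assuming Linux with glibc/NPTL; pi_mutex_init() and pi_mutex_roundtrip() are hypothetical helper names. It only sets up such a mutex and takes it uncontended, so it does not by itself exercise the contended path where the allocation and mmap_sem concerns would apply.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or an errno-style error code. */
int pi_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err != 0)
        return err;

    /* PTHREAD_PRIO_INHERIT selects the PI protocol for this mutex. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Lock and immediately unlock the mutex from a single thread.
 * With no other thread competing, this is the uncontended case. */
int pi_mutex_roundtrip(pthread_mutex_t *m)
{
    int err = pthread_mutex_lock(m);

    if (err != 0)
        return err;
    return pthread_mutex_unlock(m);
}
```

A demo of the kind asked for in the thread would need at least two threads of different SCHED_FIFO priorities fighting over such a mutex while latency is measured. That part is deliberately left out here.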
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 12:29 ` Jan Kiszka @ 2007-02-21 12:14 ` Nicholas Mc Guire 2007-02-21 13:51 ` Jan Kiszka 0 siblings, 1 reply; 15+ messages in thread From: Nicholas Mc Guire @ 2007-02-21 12:14 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> the unlikey part to a distant location (some lable at the end of the >> file...) but that does not change the rpoblem at runtime with respect to >> the worst case. The BTB uses a hysteresis of one miss/hit to adjust the >> guess on P6 systems with the default (if the address is not present in >> the BTB) of not taken - thus if you reorder for the "not taken" case >> being the fast patch you will always have the fast path preloaded in >> the pipeline. >> >> if(likley(condition)){ >> fast_patch(); >> else >> slow_path(); >> >> will be fast on average but the worst case is that the address is not >> in the BTB so the slow_patch() tag is loaded by default. > > Ah, got the idea. How much arch/processor-type-dependent is this > optimisation? It would surely makes no sense to optimise for arch X in > generic code. > thats the problem it is very x86 centric - p6 and AMD Duron/K7. >> >>> You forget that old stuff was kernel-only, lacking a lot of Linux >>> integration features. Recent I-pipe-based real-time via Xenomai normally >>> includes support for user-space RT (you can switch it off, but hardly >>> anyone does). So its not a useful comparison given that new real-time >>> projects almost always want full-featured user space these days. For a >>> fairer comparison, one should consider a simple I-pipe domain that >>> contains the real-time "application". >> >> >> note that the numbers posted here WERE kernel numbers ! > > But with user space support enabled. There are no separate code paths > for kernel and user space threads, basic infrastructure is shared here > for good reasons. 
> >> I know that people want to move to user-space - but what is the advantage >> over RT-preempt then if you use the dynamic tick patch (scheduled to go >> mainline in 2.6.21 BTW) ? > > So far, determinism (both /wrt mainline and latest -rt). > > BTW, kernel space real time is specifically no longer recommendable for > commercial projects that have to worry about the (likely non-GPL) > license of their application code. And then there are those countless > technical advantages that speed up the development process of user space > apps. > well I don't see that advantage at this point - determinism seems to be in the same range as you get on ADEOS based systems. That there is a move towards user-space is clear. >> >>>> my suspicion is that there is too much work being done on fast-hot CPUs >>>> and the low-end is being neglected - which is bad as the numbers you >>>> post here for ADEOS are numbers reachable with mainstream preemptive >>>> kernel by now as well (off course not on the low end systems though). >> >>> That's scenario-dependent. Simple setups like a plain timed task can >>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios >>> suffer from the exploding complexity in mainstream Linux, even with -rt. >>> Just think of "simple" mutexes realised via futexes. >> >> >> do you have some code samples with numbers ? I would be very interested in >> a demo that shows this problem - I was not able to really find a smoking >> gun with RT-preempt and dynamic ticks (2.6.17.2). 
>
> I can't help with demo code, but I can name a few conceptual issues:
>
>  o Futexes may need to allocate memory when suspending on a contended
>    lock (refill_pi_state_cache)
>  o Futexes depend on mmap_sem

ok - that's a nice one

>  o Preemptible RCU read-sides can either lead to OOM or require
>    intrusive read-side priority boosting (see Paul McKenney's LWN
>    article)
>  o Excessive lock nesting depths in critical code paths make it hard to
>    predict worst-case behaviour (or to verify that measurements actually
>    already triggered them)

well that's true for ADEOS/RTAI/RTLinux as well - we are also only
black-box testing the RT-kernel - there currently is absolutely NO
proof for worst-case timing in any of the flavours of RT-Linux.

>  o Any nanosleep&friends-using Linux process can schedule hrtimers at
>    arbitrary dates, requiring a pretty close look at the
>    (worst-case) timer usage pattern of the _whole_ system, not only the
>    SCHED_FIFO/RR part

true - but resource overload hits all flavours - and the split of
timers and timeouts in 2.6.18++ does reduce the risk clearly.

>
> That's what I can tell off the top of my head. But one would have to
> analyse the code more thoroughly I guess.
>

thanks for the input - at Embedded World Thomas Gleixner
demonstrated a simple control system that could sustain sub 10us
scheduling jitter under load based on the latest rt-preempt + a bit
of tuning I guess (actually don't know). The essence for me is that with
the work in 2.6.X I don't see the big performance jump provided by the
hard-RT variants around - especially with respect to guaranteed worst
case (and not only "black-box" results).

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFF3De1nU7rXZKfY2oRAnjkAJ9jsT6PAhwlY6Wu8a3wddTjHbcWZQCgn6cZ
8Ve6WL2E+QuENP9ezT0I3HU=
=hSbF
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 12:14 ` Nicholas Mc Guire @ 2007-02-21 13:51 ` Jan Kiszka 2007-02-21 14:52 ` Wolfgang Grandegger 0 siblings, 1 reply; 15+ messages in thread From: Jan Kiszka @ 2007-02-21 13:51 UTC (permalink / raw) To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger [-- Attachment #1: Type: text/plain, Size: 4524 bytes --] Nicholas Mc Guire wrote: >>> I know that people want to move to user-space - but what is the >>> advantage >>> over RT-preempt then if you use the dynamic tick patch (scheduled to go >>> mainline in 2.6.21 BTW) ? > >> So far, determinism (both /wrt mainline and latest -rt). > >> BTW, kernel space real time is specifically no longer recommendable for >> commercial projects that have to worry about the (likely non-GPL) >> license of their application code. And then there are those countless >> technical advantages that speed up the development process of user space >> apps. > > > well I don't see that advantage at this point - determinism seems to be > in the same range as you get on ADEOS based systems. That there is a > move towards user-space is clear. Yeah, it /seems/... > >>> >>>>> my suspicion is that there is too much work being done on fast-hot >>>>> CPUs >>>>> and the low-end is being neglected - which is bad as the numbers you >>>>> post here for ADEOS are numbers reachable with mainstream preemptive >>>>> kernel by now as well (off course not on the low end systems though). >>> >>>> That's scenario-dependent. Simple setups like a plain timed task can >>>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios >>>> suffer from the exploding complexity in mainstream Linux, even with >>>> -rt. >>>> Just think of "simple" mutexes realised via futexes. >>> >>> >>> do you have some code samples with numbers ? I would be very >>> interested in >>> a demo that shows this problem - I was not able to really find a smoking >>> gun with RT-preempt and dynamic ticks (2.6.17.2). 
> >> I can't help with demo code, but I can name a few conceptual issues: > >> o Futexes may require to allocate memory when suspending on a contented >> lock (refill_pi_state_cache) >> o Futexes depend on mmap_sem > > ok - thats a nice one > >> o Preemptible RCU read-sides can either lead to OOM or require >> intrusive read-side priority boosting (see Paul McKenney's LWN >> article) >> o Excessive lock nesting depths in critical code paths makes it hard to >> predict worst-case behaviour (or to verify that measurements actually >> already triggered them) > > well thats true for ADEOS/RTAI/RTLinux as well - we are also only > black-box testing the RT-kernel - there currently is absolutley NO > prof for worst-case timing in any of the flavours of RT-Linux. Nope, it isn't. There are neither sleeping not spinning lock nesting depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions, AFAIK) - ok, except for one spot in a driver we have scheduled for re-design already. > >> o Any nanosleep&friends-using Linux process can schedule hrtimers at >> arbitrary dates, requiring to have a pretty close look at the >> (worst-case) timer usage pattern of the _whole_ system, not only the >> SCHED_FIFO/RR part > > true - but resource overload hits all flavours - and the splitt of > timers and timeouts in 2.6.18++ does reduce the risk clearly. Compared to making all Linux timers hrtimers? Yes, for sure. But that would be an insane idea anyway, just considering all the network-related timers. > >> That's what I can tell from the heart. But one would have to analyse the >> code more thoroughly I guess. > > thanks for the imput - at the embedded world Thomas Gleixner > demonstrated a simple control system that could sustain sub 10us > scheduling jitter under load based on the latest rt-preempt + a bit > of tuning I guess (actually don't know). 
Without knowing the test (Wolfgang, did you see it?), I would guess the
setup as follows: dual-core GHz Pentium, isolated core for the timed
task, no peripheral interaction, no synchronisation means, likely even
no further syscalls except for the sleep service. Surely progress over
plain Linux, but that one's only useful for very specific scenarios.

No one claims -rt is not useful or too limited. Each approach has its
preferred application domain. Knowing the strengths and weaknesses of
both is required here, as is giving the user the choice (like
Xenomai 3 will).

> The essence for me is that with
> the work in 2.6.X I don't see the big performance jump provided by the
> hard-RT variants around - especially with respect to guaranteed worst
> case (and not only "black-box" results).

Could it be a bit too enthusiastic to base such an assessment on a
corner-case demonstration?

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread
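The "plain timed task" scenario guessed at above (a periodic loop whose only syscall is the sleep service) can be sketched in a few lines of plain POSIX C. This is an assumption-laden illustration, not the code of the Embedded World demo nor of the Xenomai latency tool: measure_wakeup_latency() and struct lat_stats are hypothetical names, and the sketch leaves out what a real measurement would add, such as SCHED_FIFO priority, mlockall() and CPU isolation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Wake-up latency summary, all values in nanoseconds. */
struct lat_stats {
    int64_t min_ns;
    int64_t avg_ns;
    int64_t max_ns;
};

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Sleep to `loops` consecutive absolute deadlines, `period_ns` apart
 * (period must be below one second), on CLOCK_MONOTONIC and record how
 * late each wake-up was. Returns 0 on success or an errno-style code. */
int measure_wakeup_latency(int loops, long period_ns, struct lat_stats *out)
{
    struct timespec next, now;
    int64_t sum = 0, min = INT64_MAX, max = 0;

    if (loops <= 0 || period_ns <= 0 || period_ns >= NSEC_PER_SEC || !out)
        return EINVAL;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return errno;

    for (int i = 0; i < loops; i++) {
        int err;
        int64_t lat;

        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }

        /* Absolute sleep avoids accumulating drift; retry on signals. */
        do {
            err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (err == EINTR);
        if (err != 0)
            return err;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return errno;

        /* Latency = actual wake-up time minus programmed deadline. */
        lat = ts_to_ns(&now) - ts_to_ns(&next);
        if (lat < min)
            min = lat;
        if (lat > max)
            max = lat;
        sum += lat;
    }

    out->min_ns = min;
    out->avg_ns = sum / loops;
    out->max_ns = max;
    return 0;
}
```

Calling measure_wakeup_latency(10000, 100000, &stats) would approximate the 100 us sampling period used for the x86 runs earlier in the thread. On an unloaded desktop kernel without real-time priority the maximum is expected to be far worse than the figures discussed here.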
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 13:51 ` Jan Kiszka @ 2007-02-21 14:52 ` Wolfgang Grandegger 2007-02-21 15:10 ` Nicholas Mc Guire 0 siblings, 1 reply; 15+ messages in thread From: Wolfgang Grandegger @ 2007-02-21 14:52 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire Jan Kiszka wrote: > Nicholas Mc Guire wrote: >>>> I know that people want to move to user-space - but what is the >>>> advantage >>>> over RT-preempt then if you use the dynamic tick patch (scheduled to go >>>> mainline in 2.6.21 BTW) ? >>> So far, determinism (both /wrt mainline and latest -rt). >>> BTW, kernel space real time is specifically no longer recommendable for >>> commercial projects that have to worry about the (likely non-GPL) >>> license of their application code. And then there are those countless >>> technical advantages that speed up the development process of user space >>> apps. >> >> well I don't see that advantage at this point - determinism seems to be >> in the same range as you get on ADEOS based systems. That there is a >> move towards user-space is clear. > > Yeah, it /seems/... > >>>>>> my suspicion is that there is too much work being done on fast-hot >>>>>> CPUs >>>>>> and the low-end is being neglected - which is bad as the numbers you >>>>>> post here for ADEOS are numbers reachable with mainstream preemptive >>>>>> kernel by now as well (off course not on the low end systems though). >>>>> That's scenario-dependent. Simple setups like a plain timed task can >>>>> reach the dimension of I-pipe-based Xenomai, but more complex scenarios >>>>> suffer from the exploding complexity in mainstream Linux, even with >>>>> -rt. >>>>> Just think of "simple" mutexes realised via futexes. >>>> >>>> do you have some code samples with numbers ? I would be very >>>> interested in >>>> a demo that shows this problem - I was not able to really find a smoking >>>> gun with RT-preempt and dynamic ticks (2.6.17.2). 
>>> I can't help with demo code, but I can name a few conceptual issues: >>> o Futexes may require to allocate memory when suspending on a contented >>> lock (refill_pi_state_cache) >>> o Futexes depend on mmap_sem >> ok - thats a nice one >> >>> o Preemptible RCU read-sides can either lead to OOM or require >>> intrusive read-side priority boosting (see Paul McKenney's LWN >>> article) >>> o Excessive lock nesting depths in critical code paths makes it hard to >>> predict worst-case behaviour (or to verify that measurements actually >>> already triggered them) >> well thats true for ADEOS/RTAI/RTLinux as well - we are also only >> black-box testing the RT-kernel - there currently is absolutley NO >> prof for worst-case timing in any of the flavours of RT-Linux. > > Nope, it isn't. There are neither sleeping not spinning lock nesting > depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions, > AFAIK) - ok, except for one spot in a driver we have scheduled for > re-design already. > >>> o Any nanosleep&friends-using Linux process can schedule hrtimers at >>> arbitrary dates, requiring to have a pretty close look at the >>> (worst-case) timer usage pattern of the _whole_ system, not only the >>> SCHED_FIFO/RR part >> true - but resource overload hits all flavours - and the splitt of >> timers and timeouts in 2.6.18++ does reduce the risk clearly. > > Compared to making all Linux timers hrtimers? Yes, for sure. But that > would be an insane idea anyway, just considering all the network-related > timers. > >>> That's what I can tell from the heart. But one would have to analyse the >>> code more thoroughly I guess. >> thanks for the imput - at the embedded world Thomas Gleixner >> demonstrated a simple control system that could sustain sub 10us >> scheduling jitter under load based on the latest rt-preempt + a bit >> of tuning I guess (actually don't know). 
>
> Without knowing the test (Wolfgang, did you see it?), I would guess the
> setup as follows: dual-core GHz Pentium, isolated core for the timed
> task, no peripheral interaction, no synchronisation means, likely even
> no further syscalls except for the sleep service. Surely progress over
> plain Linux, but that one's only useful for very specific scenarios.

No, I have not seen it. But I believe that with careful hardware
selection it's possible to achieve that. On high-end systems the latency
is dominated by hardware. On low-end systems code size matters. So far I
have not seen any serious comparison for low-end Linux systems, and -rt
does not work yet on PowerPC (the high-res support is still missing).

> No one claims -rt is not useful or too limited. Each approach has its
> preferred application domain. Knowing the strengths and weaknesses of
> both is required here, as is giving the user the choice (like
> Xenomai 3 will).
>
>> The essence for me is that with
>> the work in 2.6.X I don't see the big performance jump provided by the
>> hard-RT variants around - especially with respect to guaranteed worst
>> case (and not only "black-box" results).
>
> Could it be a bit too enthusiastic to base such an assessment on a
> corner-case demonstration?
>
> Jan
>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 14:52 ` Wolfgang Grandegger @ 2007-02-21 15:10 ` Nicholas Mc Guire 2007-02-21 18:27 ` Jan Kiszka 0 siblings, 1 reply; 15+ messages in thread From: Nicholas Mc Guire @ 2007-02-21 15:10 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: Nicholas Mc Guire, adeos-main -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >>>>> >>>>> do you have some code samples with numbers ? I would be very >>>>> interested in >>>>> a demo that shows this problem - I was not able to really find a >>>>> smoking >>>>> gun with RT-preempt and dynamic ticks (2.6.17.2). >>>> I can't help with demo code, but I can name a few conceptual issues: >>>> o Futexes may require to allocate memory when suspending on a >>>> contented >>>> lock (refill_pi_state_cache) >>>> o Futexes depend on mmap_sem >>> ok - thats a nice one >>> >>>> o Preemptible RCU read-sides can either lead to OOM or require >>>> intrusive read-side priority boosting (see Paul McKenney's LWN >>>> article) >>>> o Excessive lock nesting depths in critical code paths makes it hard >>>> to >>>> predict worst-case behaviour (or to verify that measurements >>>> actually >>>> already triggered them) >>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only >>> black-box testing the RT-kernel - there currently is absolutley NO >>> prof for worst-case timing in any of the flavours of RT-Linux. >> >> Nope, it isn't. There are neither sleeping not spinning lock nesting >> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions, >> AFAIK) - ok, except for one spot in a driver we have scheduled for >> re-design already. that might be so - never the less there is no formal-proof that the worst case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are based on black-box testing. 
In fact one problem is that there are not even code-coverage tools (or I just did not find them) that can provide coverage data for ADEOS - thus how can one guarantee worst-case ? >> >>>> o Any nanosleep&friends-using Linux process can schedule hrtimers at >>>> arbitrary dates, requiring to have a pretty close look at the >>>> (worst-case) timer usage pattern of the _whole_ system, not only the >>>> SCHED_FIFO/RR part >>> true - but resource overload hits all flavours - and the splitt of >>> timers and timeouts in 2.6.18++ does reduce the risk clearly. >> >> Compared to making all Linux timers hrtimers? Yes, for sure. But that >> would be an insane idea anyway, just considering all the network-related >> timers. well they were all on one timer wheel not too long ago - and yes - it was insane ;) >> >>>> That's what I can tell from the heart. But one would have to analyse >>>> the >>>> code more thoroughly I guess. >>> thanks for the imput - at the embedded world Thomas Gleixner >>> demonstrated a simple control system that could sustain sub 10us >>> scheduling jitter under load based on the latest rt-preempt + a bit >>> of tuning I guess (actually don't know). >> >> Without knowing the test (Wolfgang, did you see it?), I would guess the >> setup as follows: dual-core GHz Pentium, isolated core for the timed >> task, no peripheral interaction, no synchronisation means, likely even >> no further syscalls except for the sleep service. Surely a progress over >> plain Linux, but that one's only useful for very specific scenarios. > > No, I have not seen it. But I believe, that with careful hardware selection > it's possible to achieve that. On high-end systems the latency is dominate by > hardware. On low-end systems code size matters. So far I have not seen any > serious comparison for low-end Linux systems and -rt does not work yet on > PowerPC (the high-res support is still missing). 
I did some on low-end X86 (ELAN SC520, 133 MHz). The results are not going to make many people happy at this point (2.6.14-rt9 was my last test), but I still have to run benchmarks with dynamic tick and some of the tglx patches on low-end X86. The fact that, again, all archs except X86 are lagging behind is of course a key issue at this point. > >> No one claims -rt is not useful or too limited. Each approach has its >> preferred application domain. Knowing strength and weaknesses of both is >> required here. And providing the user the choice (like Xenomai 3 will). >> >>> The essence for me is that with >>> the work in 2.6.X I don't see the big performance jump provided by teh >>> hard-RT variants around - especially with respect to guaranteed worst >>> case (and not only "black-box" results). >> >> Could it be a bit too enthusiastic to base such an assessment on a >> corner-case demonstration? It's not a corner-case demonstration. I've been doing benchmarks on RT-preempt for quite some time now; there is still an advantage if you run simple comparisons (jitter measurements), but it is clearly going down. The problem I have with RT-preempt being at 50us and ADEOS at 15us is simply that the sector that does need the numbers RT-preempt will most likely never reach is generally interested in guaranteed times, and that's where it becomes tough to argue for any of the hard-realtime extensions at this point. That is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl; I'm just saying that the difference is no longer 2 to 3 orders of magnitude, which it was in 2.2.X/2.4.X, when arguing the use was simple.
Don't get me wrong, I'm not trying to argue away ADEOS/RTAI, or I would have given up RTLinux/GPL quite some time ago - but I believe that if these low-jitter/latency systems want to keep their acceptance in industry, a key issue will be to improve the tools for verification/validation. Just take this discussion - it started out with: <snip> > RTD| -1.585| 7.556| 16.275| 0| > -1.585| 16.275 Latencies are mainly due to cache refills on the P4. Have you already put load onto your system? If not, worst case latencies will be even longer. <snip> THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case now? What is the cause of the worst case? And can I really demonstrate with strong evidence that the worst case on this system is actually XXXX microseconds under arbitrary load and will not be higher in some strange corner cases? hofrat -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFF3GD9nU7rXZKfY2oRAptQAJ4iwYoaJtfTds9am4Gwxl6xSNqR1gCfcAMC 7p9PWIJ8a6mOErrMFGQ4MbI= =QTjQ -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
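The RTD row quoted in the message above can also be read mechanically, which helps when comparing long runs. What follows is only a minimal Python sketch: the column order is taken from the RTH header printed by the latency tool earlier in this thread (lat min, lat avg, lat max, overrun, lat best, lat worst), while the names `RtdSample`, `parse_rtd_line` and `worst_observed` are made up for illustration and are not part of Xenomai.

```python
from typing import Iterable, NamedTuple, Optional


class RtdSample(NamedTuple):
    """One 'RTD|' row of the latency tool output, in microseconds."""
    lat_min: float
    lat_avg: float
    lat_max: float
    overrun: int
    lat_best: float
    lat_worst: float


def parse_rtd_line(line: str) -> Optional[RtdSample]:
    """Parse one 'RTD|' row; return None for headers or truncated rows."""
    if not line.startswith("RTD|"):
        return None
    fields = [f.strip() for f in line.split("|")[1:]]
    # A complete row has exactly six non-empty columns.
    if len(fields) != 6 or not all(fields):
        return None
    return RtdSample(
        float(fields[0]), float(fields[1]), float(fields[2]),
        int(fields[3]), float(fields[4]), float(fields[5]),
    )


def worst_observed(lines: Iterable[str]) -> Optional[float]:
    """Largest 'lat max' over all complete RTD rows, or None if there are none."""
    samples = [s for s in map(parse_rtd_line, lines) if s is not None]
    return max((s.lat_max for s in samples), default=None)
```

Such a helper only reports the worst value that was observed during the run; it says nothing about the worst value that is possible, which is exactly the point under discussion.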
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 15:10 ` Nicholas Mc Guire @ 2007-02-21 18:27 ` Jan Kiszka 2007-02-21 19:07 ` Nicholas Mc Guire 0 siblings, 1 reply; 15+ messages in thread From: Jan Kiszka @ 2007-02-21 18:27 UTC (permalink / raw) To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger [-- Attachment #1: Type: text/plain, Size: 5363 bytes --] Nicholas Mc Guire wrote: >>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only >>>> black-box testing the RT-kernel - there currently is absolutley NO >>>> prof for worst-case timing in any of the flavours of RT-Linux. >>> >>> Nope, it isn't. There are neither sleeping not spinning lock nesting >>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions, >>> AFAIK) - ok, except for one spot in a driver we have scheduled for >>> re-design already. > > that might be so - never the less there is no formal-proof that the worst > case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are > based on black-box testing. In fact one problem is that there are not even > code-coverage tools (or I just did not find them) that can provide > coverage data for ADEOS - thus how can one guarantee worst-case ? The fact that tool support is "improvable" doesn't mean that such an analysis is impossible. You may over-estimate, but you can derive numbers for a given system (consisting of real-time core + RT applications) based on a combined offline system analysis and runtime measurements. But hardly anyone is doing this "for fun". >>>> The essence for me is that with >>>> the work in 2.6.X I don't see the big performance jump provided by teh >>>> hard-RT variants around - especially with respect to guaranteed worst >>>> case (and not only "black-box" results). >>> >>> Could it be a bit too enthusiastic to base such an assessment on a >>> corner-case demonstration? 
> > its not a corener case demonstration, Ive been doing benchmarks on rt > preempt now for quite some time, there is still an advantage if you run > simple comparisons (jitter measurements) - but it is clearly going down, > The problem I have with RT-preempt being 50us and ADEOS is 15us is > simply that the sector that does need those numbers that RT-preempt will > most likely > never reach is generally interested in guaranteed times, and thats where > it becomes tough to argue any of the hard-realtime extensions at this > point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl > Im just saying that the numbers are no longer 2/3 orders of > magnitude,which they were in 2.2.X/2.4.X and where arguing the use was > simple. Granted, arguing becomes more hairy when you have to pull out low-level system details like I posted (and not discussing individual issues of certain patches). There are scenarios where I would recommend -rt as well, but so far only few where RT extensions are fitting too. > > Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would > have given up RTLinux/GPL quite some time ago - but I belive if these > low-jitter/latency systems want to keep there acceptance in industry a > key issue will be to improve the tools for verification/validation - Ack, and I'm sure they will emerge over the time. I don't expect this to happen just because someone enjoys it (adding features is always funnier), but because users will at some point really need them. It's a process that will derive from the steadily growing professional user base in both industry and academia. > just take this discussion - it started out with: > > <snip> >> RTD| -1.585| 7.556| 16.275| 0| >> -1.585| 16.275 > > Latencies are mainly due to cache refills on the P4. Have you already > put load onto your system? If not, worst case latencies will be even > longer. 
As pointed out earlier in this thread, those numbers don't tell much without appropriate load and a significant runtime. We are maintaining documentation on this in Xenomai, but it may be too tricky to find. And as always, such a test only represents one simple snapshot. At the very least, you have to redo this on the target hardware with all peripheral devices in use. > > <snip> > > THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case > now ? what is the cause of the worst case ? and can I really demonstrate > by strong evidence that the worst case on this system is actually XXXX > microseconds under arbitrary load and will not be higher in some strange > corner cases ? Leaving the completely formal proof aside (that's something even microkernels still cannot provide), you may go to the drawing board, develop a model of your _specific_ system, derive worst-case constellations, and trace the real system for those events (probably also stimulating them) while measuring latencies. Then add some safety margin ;), and you have worst-case numbers of a far higher quality than you get by just experimenting with benchmarks. This process can become complex (i.e. costly), but it is doable. The point about co-scheduling approaches here is that they already come with a simpler base model (for the RT part), and they allow you to "tune" your system to simplify this model even further - without giving up an integrated non-RT execution environment and its optimisations. We will see the effect better on upcoming multi-core systems (not claiming that Xenomai is already in /the/ perfect shape for them). However, if you have suggestions on how to improve the current tool situation, /me and likely others are all ears. And such improvements do not have to be I-pipe/Xenomai-specific... Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
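The last step of the procedure described in the message above (measure latencies per derived worst-case constellation, then add a safety margin) could be sketched roughly as follows. This is an illustration under assumptions, not a method from the thread: the function name `worst_case_estimate`, the scenario names and the 20% default margin are all hypothetical, and choosing a defensible margin is the hard part that the sketch does not solve.

```python
from typing import Dict, Tuple


def worst_case_estimate(
    max_latency_by_scenario: Dict[str, float],
    safety_margin: float = 0.2,  # assumed 20% margin, purely illustrative
) -> Tuple[str, float, float]:
    """Return (dominating scenario, observed maximum, maximum plus margin).

    Each dictionary entry is the largest latency in microseconds measured
    while one modelled worst-case constellation was being stimulated.
    """
    if not max_latency_by_scenario:
        raise ValueError("need at least one measured scenario")
    if safety_margin < 0:
        raise ValueError("safety margin must not be negative")
    scenario, observed = max(
        max_latency_by_scenario.items(), key=lambda item: item[1]
    )
    return scenario, observed, observed * (1.0 + safety_margin)


if __name__ == "__main__":
    # Hypothetical measurements; the scenario names and values are made up.
    measured = {"idle": 16.0, "disk_io_load": 31.0, "network_flood": 42.0}
    print(worst_case_estimate(measured))
```

The result is only as good as the model behind the scenarios: a constellation that was never derived and never stimulated does not show up in the maximum.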
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 18:27 ` Jan Kiszka @ 2007-02-21 19:07 ` Nicholas Mc Guire 2007-02-21 21:05 ` Jan Kiszka 0 siblings, 1 reply; 15+ messages in thread From: Nicholas Mc Guire @ 2007-02-21 19:07 UTC (permalink / raw) To: Jan Kiszka; +Cc: adeos-main, Nicholas Mc Guire, Wolfgang Grandegger -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Nicholas Mc Guire wrote: >>>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only >>>>> black-box testing the RT-kernel - there currently is absolutley NO >>>>> prof for worst-case timing in any of the flavours of RT-Linux. >>>> >>>> Nope, it isn't. There are neither sleeping not spinning lock nesting >>>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT extensions, >>>> AFAIK) - ok, except for one spot in a driver we have scheduled for >>>> re-design already. >> >> that might be so - never the less there is no formal-proof that the worst >> case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are >> based on black-box testing. In fact one problem is that there are not even >> code-coverage tools (or I just did not find them) that can provide >> coverage data for ADEOS - thus how can one guarantee worst-case ? > > The fact that tool support is "improvable" doesn't mean that such an > analysis is impossible. You may over-estimate, but you can derive > numbers for a given system (consisting of real-time core + RT > applications) based on a combined offline system analysis and runtime > measurements. But hardly anyone is doing this "for fun". > With the current status I don't think an off-line analysis is reasonable. I don't think a model of ADEOS is reasonably doable, at least not a modelling that would lead to any usable results - I might be wrong - do you know of any such successful approaches? All testing is inherently limited; from black-box testing you simply don't get any guarantees.
<snip> >> its not a corener case demonstration, Ive been doing benchmarks on rt >> preempt now for quite some time, there is still an advantage if you run >> simple comparisons (jitter measurements) - but it is clearly going down, >> The problem I have with RT-preempt being 50us and ADEOS is 15us is >> simply that the sector that does need those numbers that RT-preempt will >> most likely >> never reach is generally interested in guaranteed times, and thats where >> it becomes tough to argue any of the hard-realtime extensions at this >> point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl >> Im just saying that the numbers are no longer 2/3 orders of >> magnitude,which they were in 2.2.X/2.4.X and where arguing the use was >> simple. > > Granted, arguing becomes more hairy when you have to pull out low-level > system details like I posted (and not discussing individual issues of > certain patches). There are scenarios where I would recommend -rt as > well, but so far only few where RT extensions are fitting too. > >> >> Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would >> have given up RTLinux/GPL quite some time ago - but I belive if these >> low-jitter/latency systems want to keep there acceptance in industry a >> key issue will be to improve the tools for verification/validation - > > Ack, and I'm sure they will emerge over the time. I don't expect this to > happen just because someone enjoys it (adding features is always > funnier), but because users will at some point really need them. It's a > process that will derive from the steadily growing professional user > base in both industry and academia. let see - I hope you are right - I'm just starting into a FMEA/HAZOP for XtratuM "for fun" ;) >> >> <snip> >> >> THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case >> now ? what is the cause of the worst case ? 
and can I really demonstrate >> by strong evidence that the worst case on this system is actually XXXX >> microseconds under arbitrary load and will not be higher in some strange >> corner cases ? > > Leaving the completely formal proof aside (that's something even > microkernels still cannot provide), you may go to the drawing board, > develop a model of your _specific_ system, derive worst-case > constellations, and trace the real system for those events (probably > also stimulating them) while measuring latencies. Then add some safety > margin ;), and you have worst-case numbers of a far higher quality then > by just experimenting with benchmarks. This process can become complex > (ie. costly), but it is doable. > > The point about co-scheduling approaches is here, that they already come > with a simpler base model (for the RT part), and they allow to "tune" > your system to simplify this model even further - without giving up an > integrated non-RT execution environment and its optimisations. We will > see the effect better on upcoming multi-core systems (not claiming that > Xenomai is already in /the/ perfect shape for them). > > > However, if you have suggestions on how to improve the current tool > situation, /me and likely others are all ears. And such improvements do > not have to be I-pipe/Xenomai-specific... > Well, one thing I'm looking into for RTLinux is to extend things like kernel GCOV and KFI/KFT to RTLinux, as this allows much better assessment. I guess those extensions would be equally worthwhile for ADEOS/I-pipe/Xenomai. refs: KFT: www.celinuxforum.org/CelfPubWiki/PatchArchive (last one for 2.6.12) GCOV-Kernel: part of LTP now (last one is for linux-2.6.16-gcov.patch.gz) hofrat -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (GNU/Linux) iD8DBQFF3Jh1nU7rXZKfY2oRAmZAAJ9ZneSKj4sRCx0h2CBlhCXvkkDVWQCfSxkb RpGdirhoa91vElKgqrZ4Cpg= =OSmY -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Adeos-main] latency results for ppc and x86 2007-02-21 19:07 ` Nicholas Mc Guire @ 2007-02-21 21:05 ` Jan Kiszka 0 siblings, 0 replies; 15+ messages in thread From: Jan Kiszka @ 2007-02-21 21:05 UTC (permalink / raw) To: Nicholas Mc Guire; +Cc: adeos-main, Wolfgang Grandegger [-- Attachment #1: Type: text/plain, Size: 7125 bytes --] Nicholas Mc Guire wrote: >> Nicholas Mc Guire wrote: >>>>>> well thats true for ADEOS/RTAI/RTLinux as well - we are also only >>>>>> black-box testing the RT-kernel - there currently is absolutley NO >>>>>> prof for worst-case timing in any of the flavours of RT-Linux. >>>>> >>>>> Nope, it isn't. There are neither sleeping not spinning lock nesting >>>>> depths of that kind in Xenomai or Adeos/I-pipe (or older RT >>>>> extensions, >>>>> AFAIK) - ok, except for one spot in a driver we have scheduled for >>>>> re-design already. >>> >>> that might be so - never the less there is no formal-proof that the >>> worst >>> case of ADEOS/I-pipe is X-microseconds, the latency/jitter numbers are >>> based on black-box testing. In fact one problem is that there are not >>> even >>> code-coverage tools (or I just did not find them) that can provide >>> coverage data for ADEOS - thus how can one guarantee worst-case ? > >> The fact that tool support is "improvable" doesn't mean that such an >> analysis is impossible. You may over-estimate, but you can derive >> numbers for a given system (consisting of real-time core + RT >> applications) based on a combined offline system analysis and runtime >> measurements. But hardly anyone is doing this "for fun". > > > with the current status I don't think a off-line analysis is resonable > I don't think a model of ADEOS is resonably duable, alteast not a > modleing that would lead to any usable results - I might be wrong - do > you know of any such successfull approaches ? All testing is really > inherently limited, from black-box testing you simply don't get any > guarantees. 
We are no longer black-box testing - thanks to our "KFT". I'm trying to advertise this model heavily to users, but it still requires a bit too much system knowledge. Still, modelling a system of I-pipe + Xenomai remains an open challenge AFAIK. > > <snip> >>> its not a corener case demonstration, Ive been doing benchmarks on rt >>> preempt now for quite some time, there is still an advantage if you run >>> simple comparisons (jitter measurements) - but it is clearly going down, >>> The problem I have with RT-preempt being 50us and ADEOS is 15us is >>> simply that the sector that does need those numbers that RT-preempt will >>> most likely >>> never reach is generally interested in guaranteed times, and thats where >>> it becomes tough to argue any of the hard-realtime extensions at this >>> point - that is not saying RT-preempt can replace ADEOS/RTAI/RTLinux-gpl >>> Im just saying that the numbers are no longer 2/3 orders of >>> magnitude,which they were in 2.2.X/2.4.X and where arguing the use was >>> simple. > >> Granted, arguing becomes more hairy when you have to pull out low-level >> system details like I posted (and not discussing individual issues of >> certain patches). There are scenarios where I would recommend -rt as >> well, but so far only few where RT extensions are fitting too. > >>> >>> Don't get me wrong Im not trying to argue away ADEOS/RTAI or I would >>> have given up RTLinux/GPL quite some time ago - but I belive if these >>> low-jitter/latency systems want to keep there acceptance in industry a >>> key issue will be to improve the tools for verification/validation - > >> Ack, and I'm sure they will emerge over the time. I don't expect this to >> happen just because someone enjoys it (adding features is always >> funnier), but because users will at some point really need them. It's a >> process that will derive from the steadily growing professional user >> base in both industry and academia. 
> > let see - I hope you are right - I'm just starting into a FMEA/HAZOP for > XtratuM "for fun" ;) Will be interesting to hear/read about practical experiences. > >>> >>> <snip> >>> >>> THAT is a problem in arguing for ADEOS/I-pipe - WHAT is the worst case >>> now ? what is the cause of the worst case ? and can I really demonstrate >>> by strong evidence that the worst case on this system is actually XXXX >>> microseconds under arbitrary load and will not be higher in some strange >>> corner cases ? > >> Leaving the completely formal proof aside (that's something even >> microkernels still cannot provide), you may go to the drawing board, >> develop a model of your _specific_ system, derive worst-case >> constellations, and trace the real system for those events (probably >> also stimulating them) while measuring latencies. Then add some safety >> margin ;), and you have worst-case numbers of a far higher quality then >> by just experimenting with benchmarks. This process can become complex >> (ie. costly), but it is doable. > >> The point about co-scheduling approaches is here, that they already come >> with a simpler base model (for the RT part), and they allow to "tune" >> your system to simplify this model even further - without giving up an >> integrated non-RT execution environment and its optimisations. We will >> see the effect better on upcoming multi-core systems (not claiming that >> Xenomai is already in /the/ perfect shape for them). > > >> However, if you have suggestions on how to improve the current tool >> situation, /me and likely others are all ears. And such improvements do >> not have to be I-pipe/Xenomai-specific... > > well one thing Im looking into for RTLinux is to extend things like > kernel GCOV into RTLinux and KFI/KFT to RTLinux as this allows much > better assessment. I guess that those extensions would equally be worth > while > for ADESO/I-pipe/Xenomai. 
> > refs: > > KFT www.celinuxforum.org/CelfPubWiki/PatchArchive last one for 2.6.12 > GCOV-Kernel part of LTP now (last one is for linux-2.6.16-gcov.patch.gz > [Quick glance at GCOV patch] Hmm, the thrilling thing is typically locking, but I don't see a single spinlock, just some semaphores that cannot be called from arbitrary contexts anyway. Hmm. Have you already played with it on some kernel? Regarding KFT: we have such a thing already. Partly derived from Ingo Molnar's work, but with less impact during freeze, the function tracer has been in I-pipe for more than a year. It's heavily used (at least by the core team) for application and kernel debugging, and for latency spotting of course. It's available for most I-pipe archs, even for the latest x86_64-WiP. The funny thing is that even RTAI could make use of it - if they only realised that it's in their patches. Next to come (yeah, long announced) is LTTng support, i.e. patch and front-end extensions for Xenomai. There is a working version lying around somewhere in Canada; I just need to kick the guy again who did that work for his thesis so that he rolls out a release and we can start discussing the patch integration. Good to be reminded... So there is definitely not nothing - but surely still enough to do :). If you see some potential in cooperating on front-ends (given that you still seem to head for your own kernel-patch path), let us know. I guess there should be common ground. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Adeos-main] test results for switchtest and cyclictest on x86 2007-02-21 7:13 ` Wolfgang Grandegger 2007-02-21 9:33 ` poornima r @ 2007-03-14 12:51 ` poornima r 1 sibling, 0 replies; 15+ messages in thread From: poornima r @ 2007-03-14 12:51 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: adeos-main [-- Attachment #1: Type: text/plain, Size: 12298 bytes --] Hello, There were the results for cyclictest and switchbench test results run on X86 (without load) System specifications:- Xenomai: xenomai-2.3.0 Linux kernel: linux-2.6.19.3 CPU speed : 2.79GHz Cache: 512KB a) Switchtest:- Results: ./switchtest -T 50 -n rtk rtup rtus (-T:limit of test duration i.e 50 seconds -n:disables any use of FPU instructions. threadspec:rtk i.e kernel-space realtime thread ,rtup i.e for a user-space real-time thread running in primary mode,rtus i.e user-space real-time thread running in secondary mode) RTT| 00:00:01 RTH|ctx switches|-------total RTD| 1000| 1000 RTD| 1004| 2004 RTD| 1002| 3006 RTD| 1006| 4012 RTD| 1004| 5016 RTD| 1002| 6018 RTD| 1006| 7024 RTD| 998| 8022 RTD| 1006| 9028 RTD| 1004| 10032 RTD| 516| 10548 RTD| 504| 11052 RTD| 504| 11556 RTD| 504| 12060 RTD| 504| 12564 RTD| 504| 13068 RTD| 504| 13572 RTD| 504| 14076 RTD| 504| 14580 RTD| 504| 15084 RTD| 504| 15588 RTT| 00:00:22 RTH|ctx switches|-------total RTD| 510| 16098 RTD| 846| 16944 RTD| 996| 17940 RTD| 1002| 18942 RTD| 996| 19938 RTD| 1006| 20944 RTD| 1004| 21948 RTD| 1000| 22948 RTD| 1004| 23952 RTD| 1002| 24954 RTD| 1006| 25960 RTD| 1004| 26964 RTD| 1002| 27966 RTD| 1006| 28972 RTD| 1004| 29976 RTD| 1002| 30978 RTD| 1006| 31984 RTD| 998| 32982 RTD| 1006| 33988 RTD| 1004| 34992 RTD| 1002| 35994 RTT| 00:00:43 RTH|ctx switches|-------total RTD| 1002| 36996 RTD| 1002| 37998 RTD| 1006| 39004 RTD| 1004| 40008 RTD| 1002| 41010 RTD| 1002| 42012 RTD| 1002| 43014 RTD| 790| 43804 Total no of context switches between kernel-space realtime thread ,user-space real-time thread running in primary mode and user-space 
real-time thread running in secondary mode is 43804. b)cyclictest:- Results:- ./cyclictest -l 100 0.59 0.13 0.06 1/99 25688 T: 0 (25688) P:99 I: 1000 C: 100 Min: 7 Act: 7 Avg: 9 Max: 24 1) Please comment on these results. 2) What are we benchmarking from the above cyclictest result? Thanks and Regards, Poornima Wolfgang Grandegger <wg@domain.hid> wrote: Hello, poornima r wrote: > Hello, > > These were the scheduling latency and interrupt > latency test results on ppc and x86 with IPIPE tracer > option disabled. > > 1.Please comment on these results (whether valid) and Your results are OK. These are actually the figures I remember from my own tests in the past. > 2.Is there any method to optimize these results. No that I know of. There are a few ideas how to reduce latencies further like cache locking or TLB pinning. > 1)PPC:- > (MPC-860 at 48 MHz, 4 kB I-Cache and 4 kB D-Cache) > > User mode:- > root@domain.hid# ./latency -t0 > == Sampling period: 1000000 us > == Test mode: periodic user-mode task > == All results in microseconds > warming up... > RTT| 00:00:01 (periodic user-mode task, 1000000 us > period, priority 99) > RTH|-----lat min|-----lat > avg|-----latmax|-overrun|----lat best|---lat worst > RTD| 167.000| 167.000| 167.000| 0| > 167.000| 167.000 > RTD| 176.000| 176.000| 176.000| 0| > 167.000| 176.000 > RTD| 168.000| 168.000| 168.000| 0| > 167.000| 176.000 > RTD| 171.000| 171.000| 171.000| 0| > 167.000| 176.000 > > Kernel mode:- > root@domain.hid# ./latency -t1 > == Sampling period: 1000000 us > == Test mode: in-kernel periodic task > == All results in microseconds > warming up... 
> RTT| 00:00:00 (in-kernel periodic task, 1000000 us > period, priority 99) > RTH|-----lat min|-----lat > avg|-----latmax|-overrun|----lat best|---lat worst > RTD| 123.000| 123.000| 123.000| 0| > 123.000| 123.000 > RTD| 125.000| 125.000| 125.000| 0| > 123.000| 125.000 > RTD| 128.333| 128.333| 128.333| 0| > 123.000| 128.333 > RTD| 127.000| 127.000| 127.000| 0| > 123.000| 128.333 > > Interrupt mode:- > root@domain.hid# ./latency -t2 > == Sampling period: 1000000 us > == Test mode: in-kernel timer handler > == All results in microseconds > warming up... > RTT| 00:00:01 (in-kernel timer handler, 1000000 us > period, priority 99) > RTH|-----lat min|-----lat > avg|-----latmax|-overrun|----lat best|---lat worst > RTD| 45.334| 45.334| 45.334| 0| > 45.334| 45.334 > RTD| 45.000| 45.000| 45.000| 0| > 45.000| 45.334 > RTD| 46.000| 46.000| 46.000| 0| > 45.000| 46.000 > RTD| 47.334| 47.334| 47.334| 0| > 45.000| 47.334 > RTD| 46.334| 46.334| 46.334| 0| > 45.000| 47.334 On the MPC860, the latencies are mainly due code execution time as this processor is very slow. > 2)X86:- > (Pentium4, 3.06GHz, 1024 KB cache size) > User mode:- > Sampling period: 100 us > == Test mode: in-kernel periodic task > == All results in microseconds > warming up... > > RTT| 00:00:01 (periodic user-mode task, 100 us > period, priority 99) > RTH|-----lat min|-----lat avg|-----lat > max|-overrun|----lat best|---lat worst > RTD| 3.807| 12.825| 21.565| 0| > 3.807| 21.565 > RTD| 3.796| 12.792| 21.483| 0| > 3.796| 21.565 > RTD| 3.770| 12.799| 21.501| 0| > 3.770| 21.565 > RTD| 3.578| 12.806| 20.890| 0| > 3.578| 21.565 > RTD| 3.755| 12.809| 21.486| 0| > 3.578| > > kernel mode:- > Sampling period: 100 us > == Test mode: in-kernel periodic task > == All results in microseconds > warming up... 
> > RTT| 00:00:01 (in-kernel periodic task, 100 us > period, priority 99) > RTH|-----lat min|-----lat avg|-----lat > max|-overrun|----lat best|---lat worst > RTD| 2.381| 3.451| 19.620| 0| > 2.381| 19.620 > RTD| 2.332| 3.480| 19.930| 0| > 2.332| 19.930 > RTD| 2.382| 3.649| 19.609| 0| > 2.332| 19.930 > RTD| 2.323| 2.786| 14.351| 0| > 2.323| 19.930 > RTD| 2.375| 2.532| 5.519| 0| > 2.323| 19.930 > RTD| 2.332| 3.971| 19.617| 0| > 2.323| 19.930 > > Interrupt mode:- > Sampling period: 100 us > == Test mode: in-kernel timer handler > == All results in microseconds > warming up... > > RTT| 00:00:01 (in-kernel timer handler, 100 us > period, priority 99) > RTH|-----lat min|-----lat avg|-----lat > max|-overrun|----lat best|---lat worst > RTD| -1.563| 7.553| 15.736| 0| > -1.563| 15.736 > RTD| -1.579| 7.558| 15.804| 0| > -1.579| 15.804 > RTD| -1.584| 7.529| 16.167| 0| > -1.584| 16.167 > RTD| -1.548| 7.553| 16.186| 0| > -1.584| 16.186 > RTD| -1.585| 7.556| 16.275| 0| > -1.585| 16.275 Latencies are mainly due to cache refills on the P4. Have you already put load onto your system? If not, worst case latencies will be even longer. Wolfgang. > Thanks, > Poornima > > > --- Wolfgang Grandegger wrote: > >> Hello, >> >> poornima r wrote: >>> Hello, >>> >>> Srry for not replying all these days... >>> (Was not in in station, may be too personal!!!!!) 
>>>
>>> About software emulation error:
>>>
>>>>> 4)Output of /proc/xenomai/faults after the illegal instruction:-
>>>>> root@domain.hid# cat /proc/xenomai/faults
>>>>> TRAP  CPU0
>>>>>   0:     0  (Data or instruction access)
>>>>>   1:     0  (Alignment)
>>>>>   2:     0  (Altivec unavailable)
>>>>>   3:     0  (Program check exception)
>>>>>   4:     0  (Machine check exception)
>>>>>   5:     0  (Unknown)
>>>>>   6:     0  (Instruction breakpoint)
>>>>>   7:     0  (Run mode exception)
>>>>>   8:     0  (Single-step exception)
>>>>>   9:     0  (Non-recoverable exception)
>>>>>  10:     1  (Software emulation)
>>>>>  11:     0  (Debug)
>>>>>  12:     0  (SPE)
>>>>>  13:     0  (Altivec assist)
>>>> Hm, I see a software emulation exception which is also the reason for
>>>> the illegal instructions. What toolchain do you use? The toolchain
>>>> should support software FP emulation.
>>> 1)I am using an open source toolchain with software
>>> floating point emulation support.
>>> (#ppc_8xx-gcc --v
>>> > /lib/gcc/powerpc/3.4.3/../../../../target_powerpc/usr/include/c++/3.4.3
>>> --with-numa-policy=no --with-float=soft)
>>>
>>> 2)And the kernel is included with code to emulate a floating-point
>>> unit, which will allow programs that use floating-point
>>> instructions to run
>>>
>>> Kernel configuration
>>> ----CONFIG_MATH_EMULATION:y
>> If you build with "--with-float=soft" there is no need for math
>> emulation in the kernel. Likely, there is something wrong with your
>> tool-chain. Could you please try a known-to-work tool-chain like the
>> ELDK v4.x from http://www.denx.de.
>>
>> Wolfgang.
>>
>>> Thanks,
>>> Poornima
>>>
>>> --- Wolfgang Grandegger wrote:
>>>
>>>> poornima r wrote:
>>>>> Hi,
>>>>>
>>>>> 1)I am using open source kernel from Kernel.org,
>>>>> but what is meant by vanilla kernel from Kernel.org?
>>>>
>>>> It's the kernel from kernel.org. This means that the Linux kernel 2.6.18
>>>> is running fine on your MPC860 platform as is?
>>>> Thanks for the info.
>>>>
>>>>> 2)With sampling period of 500usec the system simply
>>>>> hangs without printing any results (./latency -p500)
>>>>
>>>> OK.
>>>>
>>>>> 3)cyclictest with -t1 option (without IPIPE-tracer)
>>>>> root@domain.hid# ./cyclictest -t1
>>>>> 2.04 0.50 0.17 8/27 174
>>>>>
>>>>> T: 0 ( 0) P:99 I: 1000 C: 0 Min: 1000000 Act: 0 Avg: 0 Max:-1000000
>>>>> Illegal instruction
>>>>>
>>>>> 4)Output of /proc/xenomai/faults after the illegal instruction:-
>>>>> root@domain.hid# cat /proc/xenomai/faults
>>>>> TRAP  CPU0
>>>>>   0:     0  (Data or instruction access)
>>>>>   1:     0  (Alignment)
>>>>>   2:     0  (Altivec unavailable)
>>>>>   3:     0  (Program check exception)
>>>>>   4:     0  (Machine check exception)
>>>>>   5:     0  (Unknown)
>>>>>   6:     0  (Instruction breakpoint)
>>>>>   7:     0  (Run mode exception)
>>>>>   8:     0  (Single-step exception)
>>>>>   9:     0  (Non-recoverable exception)
>>>>>  10:     1  (Software emulation)
>>>>>  11:     0  (Debug)
>>>>>  12:     0  (SPE)
>>>>>  13:     0  (Altivec assist)
>>>> Hm, I see a software emulation exception which is also the reason for
>>>> the illegal instructions. What toolchain do you use? The toolchain
>>>> should support software FP emulation.
>>>>
>>>>> 5)Running switchtest:-
>>>>> root@domain.hid# ./switchtest -n
>>>>> --The system hangs without printing any results
>>>>>
>>>>> Thanks,
>>>>> Poornima
>>>>>
>>>>>
>>>>> --- Wolfgang Grandegger wrote:
>>>>>> poornima r wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> Linux version:linux-2.6.18
>>>>>>> Xenomai: xenomai-2.3.0 (Stable version)
>>>>>>> adeos patch: adeos-ipipe-2.6.18-ppc-1.5-01.patch
>>>>>> OK, I'm curious, did you use the vanilla kernel from kernel.org?
>>>>>> More comments below.
>>>>>>
>>>>>>> The tests were run as follows:
>>>>>>> 1)The sampling period in the code for latency and
>>>>>>> switchbench was changed to 1000000000ns(to remove
>>>>>>> overrun error)
>>>>>>> 2)switchtest was run with -n5 option
>>>>>>> 3)cyclictest was run with -t5 option(5 threads
>>>>>>> were created.)
>>>>>>> 4)cyclictest was terminated with Illegal instruction
>>>>>>> (after creating 5 threads) with IPIPE tracer enabled.
>>>>>>
>>>>>>> These were the results without I-PIPE Tracer option:
>>>>>>> (All the tests were run without any load)
>>>>>>> 1)LATENCY TEST:-
>>>>>>> User mode:-
>>>>>>> /mnt/out_xen/bin# ./latency -t0
>>>>>>> == Sampling period: 1000000 us
>>>>>>> == Test mode: periodic user-mode task
>>>>>>> == All results in microseconds
>>>>>>> warming up...
>>>>>>> RTT| 00:00:01 (periodic user-mode task, 1000000 us period, priority 99)
>>>>>>> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
>>>>>>> RTD|     167.000|     167.000|     167.000|       0|     167.000|     167.000
>>>>>>> RTD|     176.000|     176.000|     176.000|       0|     167.000|     176.000
>>>>>>> RTD|     168.000|     168.000|     168.000|       0|     167.000|     176.000
> === message truncated ===
>
> _______________________________________________
> Adeos-main mailing list
> Adeos-main@domain.hid
> https://mail.gna.org/listinfo/adeos-main
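
As a rough cross-check of the two remarks above (code execution time on the MPC860, cache refills on the P4), the worst-case latencies quoted in this thread can be converted into CPU clock cycles. The following is only a sketch under stated assumptions: the helper names `latency_to_cycles` and `summarize` are hypothetical, the clock rates are the 48 MHz and 3.06 GHz given in the original post, and the latency values are the "lat worst" figures copied from the quoted tables. It is not part of the Xenomai test suite.

```python
# Sketch: express the worst-case latencies from this thread in CPU cycles.
# Function names are illustrative; the numbers are copied from the
# quoted "lat worst" columns and the CPU clocks stated in the post.

def latency_to_cycles(latency_us: float, clock_mhz: float) -> float:
    """Convert a latency in microseconds to CPU clock cycles.

    1 MHz is one cycle per microsecond, so cycles = us * MHz.
    """
    return latency_us * clock_mhz


# CPU label -> (clock in MHz, {test mode: worst-case latency in us})
WORST_CASE_US = {
    "MPC860 @ 48 MHz": (48.0, {"user": 176.000, "kernel": 128.333, "irq": 47.334}),
    "Pentium 4 @ 3.06 GHz": (3060.0, {"user": 21.565, "kernel": 19.930, "irq": 16.275}),
}


def summarize(table):
    """Return (cpu, mode, latency_us, cycles) rows for every entry."""
    rows = []
    for cpu, (clock_mhz, modes) in table.items():
        for mode, latency_us in modes.items():
            cycles = round(latency_to_cycles(latency_us, clock_mhz))
            rows.append((cpu, mode, latency_us, cycles))
    return rows


if __name__ == "__main__":
    for cpu, mode, latency_us, cycles in summarize(WORST_CASE_US):
        print(f"{cpu:22s} {mode:7s} {latency_us:9.3f} us  ~{cycles} cycles")
```

Read this way, the MPC860 figures correspond to a few thousand clock cycles per test mode, while the P4 figures correspond to tens of thousands of cycles. That is at least consistent with the explanation that the P4 spends most of that time stalled on memory rather than executing instructions, although the cycle counts alone do not prove it.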
end of thread, other threads:[~2007-03-14 12:51 UTC | newest]
Thread overview: 15+ messages
[not found] <45CD730A.6000405@domain.hid>
2007-02-20 7:21 ` [Adeos-main] latency results for ppc and x86 poornima r
2007-02-21 7:13 ` Wolfgang Grandegger
2007-02-21 9:33 ` poornima r
2007-02-21 9:33 ` Nicholas Mc Guire
2007-02-21 10:49 ` Jan Kiszka
2007-02-21 10:26 ` Nicholas Mc Guire
2007-02-21 12:29 ` Jan Kiszka
2007-02-21 12:14 ` Nicholas Mc Guire
2007-02-21 13:51 ` Jan Kiszka
2007-02-21 14:52 ` Wolfgang Grandegger
2007-02-21 15:10 ` Nicholas Mc Guire
2007-02-21 18:27 ` Jan Kiszka
2007-02-21 19:07 ` Nicholas Mc Guire
2007-02-21 21:05 ` Jan Kiszka
2007-03-14 12:51 ` [Adeos-main] test results for switchtest and cyclictest on x86 poornima r