* [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Wolfgang Grandegger @ 2005-10-18  9:30 UTC
  To: xenomai

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

Hello,

attached you will find the results of Xenomai latency measurements on
various embedded PowerPC boards using MPC 8xx and AMCC 4xx processors,
from low end to high end, covering a worst-case latency range from 25 to
225 us. It also includes a comparison with RTAI 3.0r5 on the slowest CPU.
Here are some remarks and comments:

- On low-end processors, code size matters a lot and it's difficult to
  beat RTAI/RTHAL.

- Apart from CPU power, big caches and a fast memory interface
  improve latencies.

- L2 cache improves latencies a lot (compare Ocotea with Yosemite).

- I'm a bit puzzled about the results of the "cruncher" test. Could
  someone explain the output, please?

- Stability already seems quite good. At least I have not observed any
  crash yet :-).

The PowerPC port of Xenomai is already in good shape. That's great!

Wolfgang.




[-- Attachment #2: xenomai-latencies-ppc-summary.log --]
[-- Type: text/plain, Size: 2754 bytes --]

Latency tests with Xenomai on various PowerPC boards
----------------------------------------------------

Board   : Processor  CPU-Clk Bus-Clk I-Cache D-Cache Memory Remarks

TQM860L : MPC 860     50 MHz  50 MHz    4 KB    4 KB  16 MB
TQM866M : MPC 866    133 MHz  66 MHz   16 KB    8 KB 128 MB

Walnut  : AMCC 405GP 200 MHz 100 MHz   16 KB    8 KB  32 MB
Yosemite: AMCC 440EP 533 MHz 133 MHz   32 KB   32 KB 256 MB DDR-RAM, FPU
Ocotea  : AMCC 440GX 533 MHz 152 MHz   32 KB   32 KB 256 MB DDR-RAM, L2 256 KB


Linux  : DENX linux-2.6.14-rc3-g4c234921
iPipe  : 1.0-00
Xenomai: SVN 2005-10-15


CRUNCHER without load:

         | Ideal computation time
TQM860L  |   368 us ???
TQM866L  | 10008 us 
Walnut   | 10150 us
Yosemite |  9911 us
Ocotea   |  9479 us 


SWITCH without load:

         |     lat min|     lat avg|     lat max|        lost
TQM860L  |      103360|      107840|      209280|           0
TQM866L  |       25745|       31880|       51369|           5
Walnut   |       24620|       25965|       32280|           1
Yosemite |        5626|        5655|       17403|           0
Ocotea   |        5158|        5169|       10038|           0


KLATENCY with load (latencies in ns):

         |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
TQM860L  |       50560|       98976|      199040|       0|    00:09:45
TQM866L  |       13835|       28571|       74348|       0|    00:11:44
Walnut   |       16195|       25062|       45755|       0|    00:10:09
Yosemite |        3106|        9697|       36832|       0|    00:09:55
Ocotea   |        3575|        7438|       24474|       0|    00:10:50


LATENCY with load (latencies in ns):

         |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
TQM860L  |       60480|      120960|      224320|       0|    00:09:46
TQM866L  |       15759|       34286|       78799|       0|    00:11:14
Walnut   |       21070|       31650|       64500|       0|    00:09:58
Yosemite |        3808|       12163|       47898|       0|    00:10:00
Ocotea   |        3575|        7438|       24474|       0|    00:10:50


KLATENCY comparison Xenomai 2.0 vs. RTAI/RTHAL 3.0r5 on TQM860L:
---------------------------------------------------------------

KLATENCY with load (latencies in ns):

            |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
Xenomai 2.0 |       50560|       98976|      199040|       0|    00:09:45
RTAI 3.0r5  |       23120|       31838|       70520|       ?|    00:12:26



Note: load has been put onto the system by running in a telnet session
      "ping -f <remote-host-ip>" and "while ls; do ls; done".

Note: all tests have been run with CONFIG_XENO_HW_TIMER_LATENCY="1" and
      CONFIG_XENO_HW_SCHED_LATENCY="1" to get correct latency values.
      RTAI figures have been corrected manually.
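
As a cross-check of the summary above, here is a minimal Python sketch that converts the worst-case ("lat max") figures of the KLATENCY and LATENCY tables to microseconds. It assumes the raw values are nanoseconds, which is consistent with the 25 to 225 us range quoted in the message. The dictionary layout and function names are illustrative only and are not part of the Xenomai test suite.

```python
# Worst-case latencies ("lat max") copied from the tables above.
# Assumption: the raw values are nanoseconds.
KLATENCY_MAX_NS = {
    "TQM860L": 199040,
    "TQM866L": 74348,
    "Walnut": 45755,
    "Yosemite": 36832,
    "Ocotea": 24474,
}
LATENCY_MAX_NS = {
    "TQM860L": 224320,
    "TQM866L": 78799,
    "Walnut": 64500,
    "Yosemite": 47898,
    "Ocotea": 24474,
}


def to_us(ns):
    """Convert nanoseconds to microseconds."""
    return ns / 1000.0


def worst_case_range_us(*tables):
    """Return (smallest, largest) worst-case latency in us across all tables."""
    values = [to_us(v) for table in tables for v in table.values()]
    return min(values), max(values)


if __name__ == "__main__":
    low, high = worst_case_range_us(KLATENCY_MAX_NS, LATENCY_MAX_NS)
    print(f"worst-case range: {low:.1f} .. {high:.1f} us")
    # Yosemite and Ocotea share the same CPU clock; Ocotea adds an L2 cache.
    ratio = KLATENCY_MAX_NS["Yosemite"] / KLATENCY_MAX_NS["Ocotea"]
    print(f"KLATENCY lat max, Yosemite/Ocotea: {ratio:.2f}x")
```

The Yosemite/Ocotea ratio is only a rough indicator of the L2 cache effect, since the two boards also differ in bus clock.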



[-- Attachment #3: xenomai-latencies-ppc.tgz --]
[-- Type: application/x-gzip, Size: 3038 bytes --]


* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Philippe Gerum @ 2005-10-18 11:23 UTC
  To: Wolfgang Grandegger; +Cc: xenomai

Wolfgang Grandegger wrote:
> Hello,
> 
> attached you will find the results of Xenomai latency measurements on
> various embedded PowerPC boards using MPC 8xx and AMCC 4xx processors,
> from low end to high end, covering a worst-case latency range from 25 to
> 225 us. It also includes a comparison with RTAI 3.0r5 on the slowest CPU.
> Here are some remarks and comments:
> 
> - On low-end processors, code size matters a lot and it's difficult to
>   beat RTAI/RTHAL.
>

Beat, no; get closer, yes, probably. The good news is that, looking at the 
figures, we do have room for improvement! :o>

Btw, the nucleus can be configured so that the user-space threading engine is 
compiled out (i.e. CONFIG_XENO_OPT_PERVASIVE from the nucleus menu), which would 
be the corresponding profile to compare with klatency (i.e. sched_up). Disabling 
this option reduces the code size for the nucleus from:

    text	   data	    bss	    dec	    hex	filename
   66740	    792	   6540	  74072	  12158	nucleus/xeno_nucleus.ko

to:

   text	   data	    bss	    dec	    hex	filename
   52596	    576	   3956	  57128	   df28	nucleus/xeno_nucleus.ko

Still a bit fat though.

> - Apart from CPU power, big caches and a fast memory interface
>   improve latencies.
> 
> - L2 cache improves latencies a lot (compare Ocotea with Yosemite).
> 
> - I'm a bit puzzled about the results of the "cruncher" test. Could
>   someone explain the output, please?
> 

This test is reminiscent of the HYADES project (ia64 port of RTAI/fusion), where 
we wanted to illustrate the level of execution determinism one could achieve 
using the interrupt shield technique on large ia64 SMP systems. To this end, we 
measured the jitter in execution time of a calibrated float-crunching loop, with 
and without interrupt load. This test is likely going to disappear at some point 
in time, because it's not that informative in Xeno's context.
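
To make the idea concrete, here is a minimal Python sketch of measuring the jitter in execution time of a fixed floating-point loop. It is not the actual cruncher test: the loop body, the run counts, and the choice of the fastest run as the "ideal" computation time are all assumptions made for illustration.

```python
import time


def crunch(iterations):
    """A fixed amount of floating-point work (stand-in for a calibrated loop)."""
    acc = 0.0
    for i in range(1, iterations + 1):
        acc += (i * 1.000001) / (i + 0.5)
    return acc


def measure_jitter(runs=50, iterations=20000):
    """Time `runs` executions of the loop; return (min, max, jitter) in ns."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        crunch(iterations)
        samples.append(time.perf_counter_ns() - start)
    best, worst = min(samples), max(samples)
    return best, worst, worst - best


if __name__ == "__main__":
    best, worst, jitter = measure_jitter()
    # Assumption: the fastest run is taken as the "ideal" computation time.
    print(f"ideal (min) computation time: {best / 1000:.0f} us")
    print(f"worst computation time:       {worst / 1000:.0f} us")
    print(f"jitter:                       {jitter / 1000:.0f} us")
```

Running it once on an idle system and once under interrupt load gives the kind of with/without comparison described above, although an ordinary user-space Python process is of course far noisier than a real-time task.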

> - Stability seems already quite good. At least I did not observe any
>   crash yet :-).
> 

That's cool. I see no other way to properly improve performance than first 
having something which can run on various platforms without them randomly 
jumping out of the window, or us relying on plain Voodoo stuff to explain why 
those setups would work or not.

> The PowerPC port of Xenomai is already in good shape. That's great!
> 

Thanks. This is likely because I do feel better since I became aware that 
there's life beyond x86. :o)



-- 

Philippe.



* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Philippe Gerum @ 2005-10-18 11:44 UTC
  To: Philippe Gerum; +Cc: xenomai

Philippe Gerum wrote:
> Btw, the nucleus can be configured so that the user-space threading 
> engine is compiled out (i.e. CONFIG_XENO_OPT_PERVASIVE from the nucleus 
> menu), which would be the corresponding profile to compare with klatency 
> (i.e. sched_up). Disabling this option reduces the code size for the 
> nucleus from:
> 
>    text       data        bss        dec        hex    filename
>   66740        792       6540      74072      12158    nucleus/xeno_nucleus.ko
> 
> to:
> 
>    text       data        bss        dec        hex    filename
>   52596        576       3956      57128       df28    nucleus/xeno_nucleus.ko
> 

Disabling the periodic timer support, which is unused for the klatency test, 
brings this down to:

    text	   data	    bss	    dec	    hex	filename
   51040	    544	   3956	  55540	   d8f4	nucleus/xeno_nucleus.ko
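
For reference, the cumulative savings can be computed from the three `size` outputs quoted in this thread. The following is a small Python sketch; the labels and function names are illustrative, and the numbers are simply the text, data and bss columns copied from above.

```python
# (label, text, data, bss) in bytes, copied from the `size` outputs above.
NUCLEUS_SIZES = [
    ("default", 66740, 792, 6540),
    ("without CONFIG_XENO_OPT_PERVASIVE", 52596, 576, 3956),
    ("also without periodic timer support", 51040, 544, 3956),
]


def total(text, data, bss):
    """The 'dec' column of `size` is text + data + bss."""
    return text + data + bss


def savings(sizes):
    """Yield (label, total_bytes, percent_saved_vs_first_entry)."""
    baseline = total(*sizes[0][1:])
    for label, text, data, bss in sizes:
        dec = total(text, data, bss)
        yield label, dec, 100.0 * (baseline - dec) / baseline


if __name__ == "__main__":
    for label, dec, pct in savings(NUCLEUS_SIZES):
        print(f"{label:40s} {dec:6d} bytes  ({pct:4.1f}% smaller)")
```

The totals reproduce the "dec" column (74072, 57128 and 55540 bytes), so the two options together trim roughly a quarter of the module.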



-- 

Philippe.



* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Wolfgang Grandegger @ 2005-10-18 12:12 UTC
  To: Philippe Gerum; +Cc: xenomai

On 10/18/2005 01:44 PM Philippe Gerum wrote:
> Disabling the periodic timer support which is unused for the klatency test 
> brings this down to:
> 
>     text	   data	    bss	    dec	    hex	filename
>    51040	    544	   3956	  55540	   d8f4	nucleus/xeno_nucleus.ko

OK, here are the new figures; the row marked (*) was measured with

 CONFIG_XENO_OPT_PERVASIVE is not set
 CONFIG_XENO_HW_PERIODIC_TIMER is not set:

           |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
RTAI 3.0r5 |       23120|       31838|       70520|       ?|    00:12:26
Xenomai    |       50560|       98976|      199040|       0|    00:09:45
Xenomai (*)|       44160|       96215|      200640|       0|    00:09:53

The min latency decreases as expected.
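
To quantify the change, here is a short Python sketch that computes the relative difference between the two Xenomai rows and the remaining gap to RTAI. The values are copied from the table above and assumed to be nanoseconds; the variable and function names are illustrative.

```python
# (lat min, lat avg, lat max) copied from the table above, assumed to be ns.
RTAI = (23120, 31838, 70520)
XENOMAI = (50560, 98976, 199040)
XENOMAI_TRIMMED = (44160, 96215, 200640)  # the "(*)" configuration


def percent_change(before, after):
    """Signed change from `before` to `after` in percent (negative = lower)."""
    return [100.0 * (a - b) / b for b, a in zip(before, after)]


def ratio(values, reference):
    """How many times larger each value is than the reference."""
    return [v / r for v, r in zip(values, reference)]


if __name__ == "__main__":
    labels = ("min", "avg", "max")
    for name, pct in zip(labels, percent_change(XENOMAI, XENOMAI_TRIMMED)):
        print(f"lat {name}: {pct:+.1f}% after trimming the nucleus")
    for name, r in zip(labels, ratio(XENOMAI_TRIMMED, RTAI)):
        print(f"lat {name}: {r:.2f}x the RTAI figure")
```

With these numbers the minimum latency drops by about 12.7%, while the maximum is essentially unchanged (marginally higher), so the smaller nucleus mostly helps the best case.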





* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Philippe Gerum @ 2005-10-18 12:21 UTC
  To: Wolfgang Grandegger; +Cc: xenomai

Wolfgang Grandegger wrote:
> OK, here are the new figures with (*)
> 
>  CONFIG_XENO_OPT_PERVASIVE is not set
>  CONFIG_XENO_HW_PERIODIC_TIMER is not set:
> 
>            |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
> RTAI 3.0r5 |       23120|       31838|       70520|       ?|    00:12:26
> Xenomai    |       50560|       98976|      199040|       0|    00:09:45
> Xenomai (*)|       44160|       96215|      200640|       0|    00:09:53
> 
> The min latency decreases as expected.
> 

That looks significant. I now wonder what the impact is of 2.6 trashing the 
caches during the sleep periods, compared to 2.4. To answer that, we will need 
Xeno running over 2.4. OK, it's planned.



-- 

Philippe.



* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
From: Philippe Gerum @ 2005-10-18 18:14 UTC
  To: Wolfgang Grandegger; +Cc: xenomai

Wolfgang Grandegger wrote:
> On 10/18/2005 01:44 PM Philippe Gerum wrote:
> 
>>Philippe Gerum wrote:
>>
>>>Wolfgang Grandegger wrote:
>>>
>>>
>>>>Hello,
>>>>
>>>>attached you will find the results of Xenomai latency measurements on
>>>>various embedded PowerPC boards using MPC 8xx and AMCC 4xx processors,
>>>>from low to high end covering a worst case latency range from 25 to 225
>>>>us. It also includes a comparison with RTAI 3.0r5 on the slowest CPU.
>>>>Here are some remarks and comments:
>>>>
>>>>- On low-end processors, code size matters a lot and it's difficult to
>>>>  beat RTAI/RTHAL.
>>>>
>>>
>>>Beat no, get closer, yes, probably. The good news is that looking at the 
>>>figures, we do have a margin of improvement! :o>
>>>
>>>Btw, the nucleus can be configured so that the user-space threading 
>>>engine is compiled out (i.e. CONFIG_XENO_OPT_PERVASIVE from the nucleus 
>>>menu), which would be the corresponding profile to compare with klatency 
>>>(i.e. sched_up). Disabling this option reduces the code size for the 
>>>nucleus from:
>>>
>>>   text       data        bss        dec        hex    filename
>>>  66740        792       6540      74072      12158    nucleus/xeno_nucleus.ko
>>>
>>>to:
>>>
>>>   text       data        bss        dec        hex    filename
>>>  52596        576       3956      57128       df28    nucleus/xeno_nucleus.ko
>>>
>>
>>Disabling the periodic timer support which is unused for the klatency test 
>>brings this down to:
>>
>>    text	   data	    bss	    dec	    hex	filename
>>   51040	    544	   3956	  55540	   d8f4	nucleus/xeno_nucleus.ko
> 
> 
> OK, here are the new figures with (*)
> 
>  CONFIG_XENO_OPT_PERVASIVE is not set
>  CONFIG_XENO_HW_PERIODIC_TIMER is not set:
> 
>            |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
> RTAI 3.0r5 |       23120|       31838|       70520|       ?|    00:12:26
> Xenomai    |       50560|       98976|      199040|       0|    00:09:45
> Xenomai (*)|       44160|       96215|      200640|       0|    00:09:53
> 
> The min latency decreases as expected.
> 
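
As a rough cross-check of the figures quoted above, the following Python sketch sums the `size` sections for the three nucleus builds and reports the change relative to the default configuration, along with the change in minimum KLATENCY on the TQM860L. It is not part of the original measurements; the dictionary keys and helper names are made up for illustration, and the numbers are copied from the quoted tables.

```python
# Section sizes in bytes, copied from the quoted `size` output for
# nucleus/xeno_nucleus.ko. The configuration labels are illustrative only.
SIZES = {
    "default": {"text": 66740, "data": 792, "bss": 6540},
    "PERVASIVE off": {"text": 52596, "data": 576, "bss": 3956},
    "PERVASIVE and PERIODIC_TIMER off": {"text": 51040, "data": 544, "bss": 3956},
}

# Minimum KLATENCY on the TQM860L, before and after disabling both options.
LAT_MIN_DEFAULT = 50560
LAT_MIN_REDUCED = 44160


def total(sections):
    """Return text + data + bss, i.e. the 'dec' column of `size`."""
    return sum(sections.values())


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means smaller)."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    base = total(SIZES["default"])
    for name, sections in SIZES.items():
        size = total(sections)
        print(f"{name:34s} {size:6d} bytes ({percent_change(base, size):+.1f}%)")
    change = percent_change(LAT_MIN_DEFAULT, LAT_MIN_REDUCED)
    print(f"lat min change: {change:+.1f}%")
```

With these inputs the totals match the quoted `dec` column (74072, 57128, 55540), which is a reduction of about 25% overall, while the minimum latency drops by roughly 12.7%.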

I just discovered that -00 did not include some recent changes I had in my tree, 
aimed at preventing high latencies during fork pressure. I've committed -01 which
does include them. When time allows, I'd be interested to know if this has some 
impact on the Ocotea figures. TIA,

-- 

Philippe.



* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
  2005-10-18 18:14       ` Philippe Gerum
@ 2005-10-19  8:35         ` Wolfgang Grandegger
  2005-10-19 10:56           ` Philippe Gerum
  0 siblings, 1 reply; 8+ messages in thread
From: Wolfgang Grandegger @ 2005-10-19  8:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On 10/18/2005 08:14 PM Philippe Gerum wrote:
> Wolfgang Grandegger wrote:
>> On 10/18/2005 01:44 PM Philippe Gerum wrote:
>> 
>>>Philippe Gerum wrote:
>>>
>>>>Wolfgang Grandegger wrote:
>>>>
>>>>
>>>>>Hello,
>>>>>
>>>>>attached you will find the results of Xenomai latency measurements on
>>>>>various embedded PowerPC boards using MPC 8xx and AMCC 4xx processors,
>>>>>from low to high end covering a worst case latency range from 25 to 225
>>>>>us. It also includes a comparison with RTAI 3.0r5 on the slowest CPU.
>>>>>Here are some remarks and comments:
>>>>>
>>>>>- On low-end processors, code size matters a lot and it's difficult to
>>>>>  beat RTAI/RTHAL.
>>>>>
>>>>
>>>>Beat no, get closer, yes, probably. The good news is that looking at the 
>>>>figures, we do have a margin of improvement! :o>
>>>>
>>>>Btw, the nucleus can be configured so that the user-space threading 
>>>>engine is compiled out (i.e. CONFIG_XENO_OPT_PERVASIVE from the nucleus 
>>>>menu), which would be the corresponding profile to compare with klatency 
>>>>(i.e. sched_up). Disabling this option reduces the code size for the 
>>>>nucleus from:
>>>>
>>>>   text       data        bss        dec        hex    filename
>>>>  66740        792       6540      74072      12158    nucleus/xeno_nucleus.ko
>>>>
>>>>to:
>>>>
>>>>   text       data        bss        dec        hex    filename
>>>>  52596        576       3956      57128       df28    nucleus/xeno_nucleus.ko
>>>>
>>>
>>>Disabling the periodic timer support which is unused for the klatency test 
>>>brings this down to:
>>>
>>>    text	   data	    bss	    dec	    hex	filename
>>>   51040	    544	   3956	  55540	   d8f4	nucleus/xeno_nucleus.ko
>> 
>> 
>> OK, here are the new figures with (*)
>> 
>>  CONFIG_XENO_OPT_PERVASIVE is not set
>>  CONFIG_XENO_HW_PERIODIC_TIMER is not set:
>> 
>>            |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
>> RTAI 3.0r5 |       23120|       31838|       70520|       ?|    00:12:26
>> Xenomai    |       50560|       98976|      199040|       0|    00:09:45
>> Xenomai (*)|       44160|       96215|      200640|       0|    00:09:53
>> 
>> The min latency decreases as expected.
>> 
> 
> I just discovered that -00 did not include some recent changes I had in my tree, 
> aimed at preventing high latencies during fork pressure. I've committed -01 which
> does include them. When time allows, I'd be interested to know if this has some 
> impact on the Ocotea figures. TIA,

bash-2.05b# cat /proc/ipipe/version
1.0-01

SWITCH without load:

== Sampling period: 100 us
RTH|     lat min|     lat avg|     lat max|        lost
RTD|        5158|        5169|       10038|           0   iPipe 1.0-00
RTD|        5145|        5154|       10166|           0   iPipe 1.0-01

KLATENCY with load:

RTH|-----lat min|-----lat avg|-----lat max|-overrun|---test-time iPipe
RTS|        2953|        5974|       19147|       0|    00:12:05 1.0-00
RTS|        3035|        8705|       20705|       0|    00:09:54 1.0-01

LATENCY with load:

== Sampling period: 100 us
RTH|-----lat min|-----lat avg|-----lat max|-overrun|---test-time iPipe
RTS|        3575|        7438|       24474|       0|    00:10:50 1.0-00
RTS|        3553|       10125|       23970|       0|    00:09:41 1.0-01

It has no significant impact, I think.

Wolfgang.



* Re: [Xenomai-core] Xenomai latency tests on various PowerPC boards
  2005-10-19  8:35         ` Wolfgang Grandegger
@ 2005-10-19 10:56           ` Philippe Gerum
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Gerum @ 2005-10-19 10:56 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: xenomai

Wolfgang Grandegger wrote:
> On 10/18/2005 08:14 PM Philippe Gerum wrote:
> 
>>Wolfgang Grandegger wrote:
>>
>>>On 10/18/2005 01:44 PM Philippe Gerum wrote:
>>>
>>>
>>>>Philippe Gerum wrote:
>>>>
>>>>
>>>>>Wolfgang Grandegger wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Hello,
>>>>>>
>>>>>>attached you will find the results of Xenomai latency measurements on
>>>>>>various embedded PowerPC boards using MPC 8xx and AMCC 4xx processors,
>>>>>>from low to high end covering a worst case latency range from 25 to 225
>>>>>>us. It also includes a comparison with RTAI 3.0r5 on the slowest CPU.
>>>>>>Here are some remarks and comments:
>>>>>>
>>>>>>- On low-end processors, code size matters a lot and it's difficult to
>>>>>> beat RTAI/RTHAL.
>>>>>>
>>>>>
>>>>>Beat no, get closer, yes, probably. The good news is that looking at the 
>>>>>figures, we do have a margin of improvement! :o>
>>>>>
>>>>>Btw, the nucleus can be configured so that the user-space threading 
>>>>>engine is compiled out (i.e. CONFIG_XENO_OPT_PERVASIVE from the nucleus 
>>>>>menu), which would be the corresponding profile to compare with klatency 
>>>>>(i.e. sched_up). Disabling this option reduces the code size for the 
>>>>>nucleus from:
>>>>>
>>>>>   text       data        bss        dec        hex    filename
>>>>>  66740        792       6540      74072      12158    nucleus/xeno_nucleus.ko
>>>>>
>>>>>to:
>>>>>
>>>>>   text       data        bss        dec        hex    filename
>>>>>  52596        576       3956      57128       df28    nucleus/xeno_nucleus.ko
>>>>>
>>>>
>>>>Disabling the periodic timer support which is unused for the klatency test 
>>>>brings this down to:
>>>>
>>>>   text	   data	    bss	    dec	    hex	filename
>>>>  51040	    544	   3956	  55540	   d8f4	nucleus/xeno_nucleus.ko
>>>
>>>
>>>OK, here are the new figures with (*)
>>>
>>> CONFIG_XENO_OPT_PERVASIVE is not set
>>> CONFIG_XENO_HW_PERIODIC_TIMER is not set:
>>>
>>>           |-----lat min|-----lat avg|-----lat max|-overrun|---test-time
>>>RTAI 3.0r5 |       23120|       31838|       70520|       ?|    00:12:26
>>>Xenomai    |       50560|       98976|      199040|       0|    00:09:45
>>>Xenomai (*)|       44160|       96215|      200640|       0|    00:09:53
>>>
>>>The min latency decreases as expected.
>>>
>>
>>I just discovered that -00 did not include some recent changes I had in my tree, 
>>aimed at preventing high latencies during fork pressure. I've committed -01 which
>>does include them. When time allows, I'd be interested to know if this has some 
>>impact on the Ocotea figures. TIA,
> 
> 
> bash-2.05b# cat /proc/ipipe/version
> 1.0-01
> 
> SWITCH without load:
> 
> == Sampling period: 100 us
> RTH|     lat min|     lat avg|     lat max|        lost
> RTD|        5158|        5169|       10038|           0   iPipe 1.0-00
> RTD|        5145|        5154|       10166|           0   iPipe 1.0-01
> 
> KLATENCY with load:
> 
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|---test-time iPipe
> RTS|        2953|        5974|       19147|       0|    00:12:05 1.0-00
> RTS|        3035|        8705|       20705|       0|    00:09:54 1.0-01
> 
> LATENCY with load:
> 
> == Sampling period: 100 us
> RTH|-----lat min|-----lat avg|-----lat max|-overrun|---test-time iPipe
> RTS|        3575|        7438|       24474|       0|    00:10:50 1.0-00
> RTS|        3553|       10125|       23970|       0|    00:09:41 1.0-01

Mmm, average even looks worse for both latency tests.

> It has no significant impact, I think.
> 

Ok, thanks. The same fix is worth 10 us on high-end x86 boxen, so I wondered if 
the same could apply to ppc as well.
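
To put numbers on the iPipe 1.0-00 versus 1.0-01 comparison quoted above, here is a small Python sketch that computes the absolute and relative change for each column of the three Ocotea tables. It is only an illustration: the `RESULTS` layout and the `deltas` helper are hypothetical names, and the values are copied from the tables (presumably nanoseconds, given the 25 to 225 us worst-case range stated at the start of the thread).

```python
# (lat min, lat avg, lat max) per iPipe version, copied from the Ocotea
# tables quoted above. Units are assumed to be nanoseconds.
RESULTS = {
    "SWITCH without load": {
        "1.0-00": (5158, 5169, 10038),
        "1.0-01": (5145, 5154, 10166),
    },
    "KLATENCY with load": {
        "1.0-00": (2953, 5974, 19147),
        "1.0-01": (3035, 8705, 20705),
    },
    "LATENCY with load": {
        "1.0-00": (3575, 7438, 24474),
        "1.0-01": (3553, 10125, 23970),
    },
}

COLUMNS = ("min", "avg", "max")


def deltas(test, old="1.0-00", new="1.0-01"):
    """Return {column: (absolute change, percent change)} for one test."""
    before = RESULTS[test][old]
    after = RESULTS[test][new]
    return {
        col: (a - b, 100.0 * (a - b) / b)
        for col, b, a in zip(COLUMNS, before, after)
    }


if __name__ == "__main__":
    for test in RESULTS:
        print(test)
        for col, (diff, pct) in deltas(test).items():
            print(f"  lat {col}: {diff:+6d} ({pct:+.1f}%)")
```

On these figures the average rises by roughly 46% for KLATENCY and 36% for LATENCY, while lat max moves by less than 10% in either direction, which fits the observation that the fix does not help the worst case on this board.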

-- 

Philippe.


