* RE: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
@ 2006-04-27 11:50 ROSSIER Daniel
2006-04-27 12:02 ` Jan Kiszka
0 siblings, 1 reply; 5+ messages in thread
From: ROSSIER Daniel @ 2006-04-27 11:50 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai
> -----Message d'origine-----
> De : Philippe Gerum [mailto:rpm@xenomai.org]
> Envoyé : mercredi, 26. avril 2006 10:41
> À : ROSSIER Daniel
> Cc : xenomai@xenomai.org
> Objet : Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
>
> ROSSIER Daniel wrote:
> > Hi all,
> >
> >
> >
> > As promised, you can find the latency results (latency -t0/-t1/-t2) as
> > well as the
> >
> > stats from the switch utility for the performance of our Xenomai port
> > onto the i.MX21 board.
> >
> >
> >
> > These are fesh results J and we didn't have time to analyze them yet.
> >
> >
> >
> > Thanks for any feedback...
>
> The tests have not been run long enough under load to get a reliable
> measure of the real worst-case figures, but still, the data sets seem
> consistent.
Ok; we will then make further test.
>
> - the test run of latency -t2 (in-kernel timer handler) shows equivalent
> worst-case figures than the -t1 form (in-kernel thread), which means
> that most of the latency hit is taken at the Adeos level, i.e. in-kernel
> scheduling adds little in the picture. Room for improvement is primarily
> hiding somewhere in the Adeos layer, I think.
>
ok; we still have to investigate all the call paths at the Adeos layer before the timer reprogramming.
> - comparing the min latency observed in the -t1 and -t2 forms, it looks
> like the inherent cost of traversing the rescheduling path would be
> close to ~10 us.
>
> - comparing the min latency observed in the -t0 and -t1 forms, there is
> another 10+ us consumed in switching mm contexts, and paying the
> involved cache penalties. The way to measure the level of perturbation
> Linux adds by switching its own tasks is to write a simple kernel module
> embodying a Xenomai thread that keeps the CPU busy while the performance
> test is running at a higher priority.
>
> I'd say that the most efficient way to reduce those latencies would
> require to first identify the source of the 40+ us spot observed with
> the -t2 form on an idle system. For that, I'm convinced that porting the
> I-pipe tracer to ARM would be the best option, since this tool would be
> of great help there.
>
Thanks for the hint; we will spend some time on the tracer in the coming days. We keep you informed.
> This port basically requires 1) to code the mcount() routine supporting
> gcc's -pg option, 2) to solve early boot issues so that mcount() does
> not attempt to trace anything while the memory environment has not been
> fully set up. The rest is pretty generic.
>
> --
>
> Philippe.
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
2006-04-27 11:50 [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS ROSSIER Daniel
@ 2006-04-27 12:02 ` Jan Kiszka
2006-04-27 12:13 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2006-04-27 12:02 UTC (permalink / raw)
To: ROSSIER Daniel; +Cc: xenomai
[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]
ROSSIER Daniel wrote:
>>
>> I'd say that the most efficient way to reduce those latencies would
>> require to first identify the source of the 40+ us spot observed with
>> the -t2 form on an idle system. For that, I'm convinced that porting the
>> I-pipe tracer to ARM would be the best option, since this tool would be
>> of great help there.
>>
> Thanks for the hint; we will spend some time on the tracer in the coming days. We keep you informed.
>
Cool, tracing also for ARM!
>> This port basically requires 1) to code the mcount() routine supporting
>> gcc's -pg option, 2) to solve early boot issues so that mcount() does
>> not attempt to trace anything while the memory environment has not been
>> fully set up. The rest is pretty generic.
>>
Regarding a mcount() implementation and other details, the original
tracer effort by Ingo Molnar may give useful hints (at least it did for me):
http://people.redhat.com/mingo/latency-tracing-patches/
I remember the ARM part not being that simple as x86. I think this was
also due to lacking stack unwinding support on that arch.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
2006-04-27 12:02 ` Jan Kiszka
@ 2006-04-27 12:13 ` Philippe Gerum
0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2006-04-27 12:13 UTC (permalink / raw)
To: Jan Kiszka; +Cc: ROSSIER Daniel, xenomai
Jan Kiszka wrote:
> ROSSIER Daniel wrote:
>
>>>I'd say that the most efficient way to reduce those latencies would
>>>require to first identify the source of the 40+ us spot observed with
>>>the -t2 form on an idle system. For that, I'm convinced that porting the
>>>I-pipe tracer to ARM would be the best option, since this tool would be
>>>of great help there.
>>>
>>
>>Thanks for the hint; we will spend some time on the tracer in the coming days. We keep you informed.
>>
>
>
> Cool, tracing also for ARM!
>
>
>>>This port basically requires 1) to code the mcount() routine supporting
>>>gcc's -pg option, 2) to solve early boot issues so that mcount() does
>>>not attempt to trace anything while the memory environment has not been
>>>fully set up. The rest is pretty generic.
>>>
>
>
> Regarding a mcount() implementation and other details, the original
> tracer effort by Ingo Molnar may give useful hints (at least it did for me):
>
> http://people.redhat.com/mingo/latency-tracing-patches/
>
> I remember the ARM part not being that simple as x86. I think this was
> also due to lacking stack unwinding support on that arch.
>
There's a said to be working replacement for __builtin_return_address
provided by mingo's patch.
http://people.redhat.com/mingo/latency-tracing-patches/patches/latency-tracing.patch
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
@ 2006-04-26 8:08 ROSSIER Daniel
2006-04-26 8:40 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: ROSSIER Daniel @ 2006-04-26 8:08 UTC (permalink / raw)
To: xenomai
[-- Attachment #1.1: Type: text/plain, Size: 335 bytes --]
Hi all,
As promised, you can find the latency results (latency -t0/-t1/-t2) as
well as the
stats from the switch utility for the performance of our Xenomai port
onto the i.MX21 board.
These are fesh results :-) and we didn't have time to analyze them yet.
Thanks for any feedback...
Daniel
[-- Attachment #1.2: Type: text/html, Size: 3247 bytes --]
[-- Attachment #2: latency_log_iMX21 --]
[-- Type: application/octet-stream, Size: 13822 bytes --]
Options du programme
# ./latency --help
./latency: illegal option -- -
usage: latency [options]
[-h] # print histograms of min, avg, max latencies
[-s] # print statistics of min, avg, max latencies
[-H <histogram-size>] # default = 200, increase if your last bucket is full
[-B <bucket-size>] # default = 1000ns, decrease for more resolution
[-p <period_us>] # sampling period
[-l <data-lines per header>] # default=21, 0 to supress headers
[-T <test_duration_seconds>] # default=0, so ^C to end
[-q] # supresses RTD, RTH lines if -T is used
[-D <benchmark_device_no>] # number of benchmark device, default=0
[-t <test_mode>] # 0=user task (default), 1=kernel task, 2=timer IRQ
[-f] # freeze trace for each new max latency
#
--------------------------------------------------------------------------------
Measure 1
--------------------------------------------------------------------------------
# ./latency -t0 -h -s -T10
Results
-------
== Sampling period: 100 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT| 00:00:01 (periodic user-mode task, 100 us period)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD| 19.921| 28.337| 57.160| 0| 19.921| 57.160
RTD| 21.616| 28.822| 59.279| 0| 19.921| 59.279
RTD| 21.858| 28.701| 66.424| 0| 19.921| 66.424
RTD| 21.374| 28.761| 59.158| 0| 19.921| 66.424
RTD| 21.071| 28.761| 57.402| 0| 19.921| 66.424
RTD| 21.798| 28.701| 65.516| 0| 19.921| 66.424
RTD| 20.890| 28.761| 59.824| 0| 19.921| 66.424
RTD| 20.587| 28.701| 61.156| 0| 19.921| 66.424
RTD| 21.132| 28.761| 63.094| 0| 19.921| 66.424
---|--param|----range-|--samples
HSD| min| 19 - 20 | 1
HSD| min| 20 - 21 | 2
HSD| min| 21 - 22 | 6
---|--param|----range-|--samples
HSD| avg| 19 - 20 | 1
HSD| avg| 20 - 21 | 4
HSD| avg| 21 - 22 | 10
HSD| avg| 22 - 23 | 8
HSD| avg| 23 - 24 | 9
HSD| avg| 24 - 25 | 23
HSD| avg| 25 - 26 | 1133
HSD| avg| 26 - 27 | 12533
HSD| avg| 27 - 28 | 36388
HSD| avg| 28 - 29 | 29348
HSD| avg| 29 - 30 | 12254
HSD| avg| 30 - 31 | 2243
HSD| avg| 31 - 32 | 634
HSD| avg| 32 - 33 | 366
HSD| avg| 33 - 34 | 288
HSD| avg| 34 - 35 | 276
HSD| avg| 35 - 36 | 245
HSD| avg| 36 - 37 | 275
HSD| avg| 37 - 38 | 237
HSD| avg| 38 - 39 | 249
HSD| avg| 39 - 40 | 243
HSD| avg| 40 - 41 | 317
HSD| avg| 41 - 42 | 253
HSD| avg| 42 - 43 | 260
HSD| avg| 43 - 44 | 251
HSD| avg| 44 - 45 | 272
HSD| avg| 45 - 46 | 257
HSD| avg| 46 - 47 | 307
HSD| avg| 47 - 48 | 247
HSD| avg| 48 - 49 | 242
HSD| avg| 49 - 50 | 187
HSD| avg| 50 - 51 | 155
HSD| avg| 51 - 52 | 109
HSD| avg| 52 - 53 | 97
HSD| avg| 53 - 54 | 55
HSD| avg| 54 - 55 | 39
HSD| avg| 55 - 56 | 24
HSD| avg| 56 - 57 | 15
HSD| avg| 57 - 58 | 9
HSD| avg| 58 - 59 | 6
HSD| avg| 59 - 60 | 8
HSD| avg| 60 - 61 | 1
HSD| avg| 61 - 62 | 1
HSD| avg| 63 - 64 | 1
HSD| avg| 65 - 66 | 1
HSD| avg| 66 - 67 | 1
HSD| avg| 78 - 79 | 1
---|--param|----range-|--samples
HSD| max| 57 - 58 | 2
HSD| max| 59 - 60 | 3
HSD| max| 61 - 62 | 1
HSD| max| 63 - 64 | 1
HSD| max| 65 - 66 | 1
HSD| max| 66 - 67 | 1
HSH|--param|--samples-|--average--|---stddev--
HSS| min| 9| 20.556| 0.726
HSS| avg| 99883| 28.241| 3.553
HSS| max| 9| 60.667| 3.317
---|------------|------------|------------|--------|-------------------------
RTS| 19.921| 28.701| 66.424| 0| 00:00:10/00:00:10
--------------------------------------------------------------------------------
Measure 2
--------------------------------------------------------------------------------
# ./latency -t1 -h -s -T10
Results
-------
== Sampling period: 100 us
== Test mode: in-kernel periodic task
== All results in microseconds
warming up...
RTT| 00:00:01 (in-kernel periodic task, 100 us period)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD| 9.386| 12.080| 34.545| 0| 9.386| 34.545
RTD| 9.355| 12.402| 50.046| 0| 9.355| 50.046
RTD| 9.416| 12.368| 37.330| 0| 9.355| 50.046
RTD| 9.295| 12.412| 36.846| 0| 9.295| 50.046
RTD| 9.446| 12.360| 42.416| 0| 9.295| 50.046
RTD| 9.325| 12.397| 36.906| 0| 9.295| 50.046
RTD| 9.355| 12.384| 39.903| 0| 9.295| 50.046
RTD| 9.325| 12.332| 41.326| 0| 9.295| 50.046
RTD| 9.355| 12.319| 40.115| 0| 9.295| 50.046
---|--param|----range-|--samples
HSD| min| 9 - 10 | 9
---|--param|----range-|--samples
HSD| avg| 9 - 10 | 496
HSD| avg| 10 - 11 | 14221
HSD| avg| 11 - 12 | 41433
HSD| avg| 12 - 13 | 29745
HSD| avg| 13 - 14 | 8497
HSD| avg| 14 - 15 | 1306
HSD| avg| 15 - 16 | 353
HSD| avg| 16 - 17 | 341
HSD| avg| 17 - 18 | 452
HSD| avg| 18 - 19 | 457
HSD| avg| 19 - 20 | 363
HSD| avg| 20 - 21 | 235
HSD| avg| 21 - 22 | 158
HSD| avg| 22 - 23 | 120
HSD| avg| 23 - 24 | 132
HSD| avg| 24 - 25 | 154
HSD| avg| 25 - 26 | 163
HSD| avg| 26 - 27 | 201
HSD| avg| 27 - 28 | 172
HSD| avg| 28 - 29 | 229
HSD| avg| 29 - 30 | 225
HSD| avg| 30 - 31 | 178
HSD| avg| 31 - 32 | 117
HSD| avg| 32 - 33 | 83
HSD| avg| 33 - 34 | 59
HSD| avg| 34 - 35 | 37
HSD| avg| 35 - 36 | 19
HSD| avg| 36 - 37 | 14
HSD| avg| 37 - 38 | 2
HSD| avg| 38 - 39 | 3
HSD| avg| 39 - 40 | 2
HSD| avg| 40 - 41 | 1
HSD| avg| 41 - 42 | 2
HSD| avg| 42 - 43 | 1
HSD| avg| 49 - 50 | 1
HSD| avg| 50 - 51 | 1
---|--param|----range-|--samples
HSD| max| 34 - 35 | 1
HSD| max| 36 - 37 | 2
HSD| max| 37 - 38 | 1
HSD| max| 39 - 40 | 1
HSD| max| 40 - 41 | 1
HSD| max| 41 - 42 | 1
HSD| max| 42 - 43 | 1
HSD| max| 50 - 51 | 1
HSH|--param|--samples-|--average--|---stddev--
HSS| min| 9| 9.000| 0.000
HSS| avg| 99973| 11.828| 2.610
HSS| max| 9| 39.444| 4.746
---|------------|------------|------------|--------|-------------------------
RTS| 9.295| 12.339| 50.046| 0| 00:00:10/00:00:10
--------------------------------------------------------------------------------
Measure 3
--------------------------------------------------------------------------------
# ./latency -t2 -h -s -T10
Results
-------
== Sampling period: 100 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...
RTT| 00:00:01 (in-kernel timer handler, 100 us period)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD| 0.242| 0.483| 12.594| 0| 0.242| 12.594
RTD| 0.212| 0.606| 42.839| 0| 0.212| 42.839
RTD| 0.181| 0.542| 38.177| 0| 0.181| 42.839
RTD| 0.242| 0.575| 44.414| 0| 0.181| 44.414
RTD| 0.212| 0.558| 17.347| 0| 0.181| 44.414
RTD| 0.151| 0.566| 24.795| 0| 0.151| 44.414
RTD| 0.212| 0.560| 23.009| 0| 0.151| 44.414
RTD| 0.212| 0.572| 17.983| 0| 0.151| 44.414
RTD| 0.212| 0.562| 14.592| 0| 0.151| 44.414
---|--param|----range-|--samples
HSD| min| 0 - 1 | 9
---|--param|----range-|--samples
HSD| avg| 0 - 1 | 95677
HSD| avg| 1 - 2 | 1108
HSD| avg| 2 - 3 | 1060
HSD| avg| 3 - 4 | 650
HSD| avg| 4 - 5 | 281
HSD| avg| 5 - 6 | 173
HSD| avg| 6 - 7 | 161
HSD| avg| 7 - 8 | 177
HSD| avg| 8 - 9 | 190
HSD| avg| 9 - 10 | 142
HSD| avg| 10 - 11 | 94
HSD| avg| 11 - 12 | 71
HSD| avg| 12 - 13 | 44
HSD| avg| 13 - 14 | 16
HSD| avg| 14 - 15 | 6
HSD| avg| 15 - 16 | 6
HSD| avg| 16 - 17 | 4
HSD| avg| 17 - 18 | 5
HSD| avg| 18 - 19 | 1
HSD| avg| 19 - 20 | 2
HSD| avg| 23 - 24 | 1
HSD| avg| 24 - 25 | 1
HSD| avg| 29 - 30 | 1
HSD| avg| 38 - 39 | 1
HSD| avg| 42 - 43 | 1
HSD| avg| 44 - 45 | 1
---|--param|----range-|--samples
HSD| max| 12 - 13 | 1
HSD| max| 14 - 15 | 1
HSD| max| 17 - 18 | 2
HSD| max| 23 - 24 | 1
HSD| max| 24 - 25 | 1
HSD| max| 38 - 39 | 1
HSD| max| 42 - 43 | 1
HSD| max| 44 - 45 | 1
HSH|--param|--samples-|--average--|---stddev--
HSS| min| 9| 0.000| 0.000
HSS| avg| 99874| 0.152| 0.980
HSS| max| 9| 25.667| 12.440
---|------------|------------|------------|--------|-------------------------
RTS| 0.151| 0.558| 44.414| 0| 00:00:10/00:00:10
--------------------------------------------------------------------------------
Mesure 4
--------------------------------------------------------------------------------
# ./latency -t2
Results
-------
== Sampling period: 100 us
== Test mode: in-kernel timer handler
== All results in microseconds
warming up...
RTT| 00:00:01 (in-kernel timer handler, 100 us period)
RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat worst
RTD| 0.152| 0.456| 15.804| 0| 0.152| 15.804
RTD| 0.212| 0.535| 22.616| 0| 0.152| 22.616
RTD| 0.121| 0.526| 21.799| 0| 0.121| 22.616
RTD| 0.061| 0.524| 25.825| 0| 0.061| 25.825
RTD| 0.212| 0.530| 44.445| 0| 0.061| 44.445
RTD| 0.182| 0.507| 13.110| 0| 0.061| 44.445
RTD| 0.212| 0.536| 37.845| 0| 0.061| 44.445
RTD| 0.091| 0.528| 30.942| 0| 0.061| 44.445
RTD| 0.121| 0.518| 14.109| 0| 0.061| 44.445
RTD| 0.091| 0.521| 13.261| 0| 0.061| 44.445
RTD| 0.182| 0.523| 48.381| 0| 0.061| 48.381
RTD| 0.121| 0.511| 16.016| 0| 0.061| 48.381
RTD| 0.182| 0.531| 46.261| 0| 0.061| 48.381
RTD| 0.182| 0.533| 20.951| 0| 0.061| 48.381
RTD| 0.182| 0.524| 17.833| 0| 0.061| 48.381
RTD| 0.152| 0.511| 23.736| 0| 0.061| 48.381
RTD| 0.182| 0.531| 15.834| 0| 0.061| 48.381
RTD| 0.182| 0.527| 39.601| 0| 0.061| 48.381
---|------------|------------|------------|--------|-------------------------
RTS| 0.061| 0.520| 48.381| 0| 00:00:18/00:00:18
--------------------------------------------------------------------------------
Measure 5
--------------------------------------------------------------------------------
# ./switch -h
Results
-------
== Sampling period: 100 us
== Do not interrupt this program
RTH| lat min| lat avg| lat max| lost
RTD| 23433| 28882| 59703| 186
23 - 24 us: 22
24 - 25 us: 347
25 - 26 us: 2626
26 - 27 us: 8848
27 - 28 us: 19500
28 - 29 us: 23290
29 - 30 us: 21493
30 - 31 us: 12357
31 - 32 us: 6532
32 - 33 us: 2653
33 - 34 us: 1140
34 - 35 us: 479
35 - 36 us: 134
36 - 37 us: 58
37 - 38 us: 27
38 - 39 us: 6
39 - 40 us: 4
42 - 43 us: 3
43 - 44 us: 3
44 - 45 us: 8
45 - 46 us: 13
46 - 47 us: 35
47 - 48 us: 45
48 - 49 us: 39
49 - 50 us: 37
50 - 51 us: 45
51 - 52 us: 23
52 - 53 us: 17
53 - 54 us: 18
54 - 55 us: 4
55 - 56 us: 2
56 - 57 us: 1
59 - 60 us: 1
--------------------------------------------------------------------------------
Measure 6
--------------------------------------------------------------------------------
# ./switch
Results
-------
== Sampling period: 100 us
== Do not interrupt this program
RTH| lat min| lat avg| lat max| lost
RTD| 22706| 28035| 58795| 2
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS
2006-04-26 8:08 ROSSIER Daniel
@ 2006-04-26 8:40 ` Philippe Gerum
0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2006-04-26 8:40 UTC (permalink / raw)
To: ROSSIER Daniel; +Cc: xenomai
ROSSIER Daniel wrote:
> Hi all,
>
>
>
> As promised, you can find the latency results (latency –t0/-t1/-t2) as
> well as the
>
> stats from the switch utility for the performance of our Xenomai port
> onto the i.MX21 board.
>
>
>
> These are fesh results J and we didn't have time to analyze them yet.
>
>
>
> Thanks for any feedback…
The tests have not been run long enough under load to get a reliable
measure of the real worst-case figures, but still, the data sets seem
consistent.
- the test run of latency -t2 (in-kernel timer handler) shows equivalent
worst-case figures than the -t1 form (in-kernel thread), which means
that most of the latency hit is taken at the Adeos level, i.e. in-kernel
scheduling adds little in the picture. Room for improvement is primarily
hiding somewhere in the Adeos layer, I think.
- comparing the min latency observed in the -t1 and -t2 forms, it looks
like the inherent cost of traversing the rescheduling path would be
close to ~10 us.
- comparing the min latency observed in the -t0 and -t1 forms, there is
another 10+ us consumed in switching mm contexts, and paying the
involved cache penalties. The way to measure the level of perturbation
Linux adds by switching its own tasks is to write a simple kernel module
embodying a Xenomai thread that keeps the CPU busy while the performance
test is running at a higher priority.
I'd say that the most efficient way to reduce those latencies would
require to first identify the source of the 40+ us spot observed with
the -t2 form on an idle system. For that, I'm convinced that porting the
I-pipe tracer to ARM would be the best option, since this tool would be
of great help there.
This port basically requires 1) to code the mcount() routine supporting
gcc's -pg option, 2) to solve early boot issues so that mcount() does
not attempt to trace anything while the memory environment has not been
fully set up. The rest is pretty generic.
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-04-27 12:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-27 11:50 [Xenomai-core] Latencies for the Freescale i.MX21/CSB535FS ROSSIER Daniel
2006-04-27 12:02 ` Jan Kiszka
2006-04-27 12:13 ` Philippe Gerum
-- strict thread matches above, loose matches on Subject: below --
2006-04-26 8:08 ROSSIER Daniel
2006-04-26 8:40 ` Philippe Gerum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.