From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <47B050A4.4060409@domain.hid>
Date: Mon, 11 Feb 2008 14:41:56 +0100
From: Jan Kiszka <jan.kiszka@domain.hid>
MIME-Version: 1.0
References: <20080123065221.GB6573@domain.hid>	<2ff1a98a0801230204s15e4eefaifdd2c946c44549df@domain.hid>	<2ff1a98a0801230515i77f8c22bk866c4cd592a3a9b8@domain.hid>	<20080124094150.GA7503@domain.hid>
	<18351.24380.551907.397687@domain.hid>
In-Reply-To: <18351.24380.551907.397687@domain.hid>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai-core] [Xenomai-help] AT91SAM9260 latency
List-Id: "Xenomai life and development \(bug reports, patches,
	discussions\)" <xenomai.xenomai.org>
List-Unsubscribe: <https://mail.gna.org/listinfo/xenomai-core>,
	<mailto:xenomai-core-request@domain.hid>
List-Archive: </public/xenomai-core>
List-Post: <mailto:xenomai@xenomai.org>
List-Help: <mailto:xenomai-core-request@domain.hid>
List-Subscribe: <https://mail.gna.org/listinfo/xenomai-core>,
	<mailto:xenomai-core-request@domain.hid>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: jagarcia@domain.hid, xenomai@xenomai.org

Gilles Chanteperdrix wrote:
> Juan Antonio Garcia Redondo wrote:
>  > On 23/01/08 14:15, Gilles Chanteperdrix wrote:
>  > > On Jan 23, 2008 11:04 AM, Gilles Chanteperdrix
>  > > <gilles.chanteperdrix@xenomai.org> wrote:
>  > > > On Jan 23, 2008 7:52 AM, Juan Antonio Garcia Redondo
>  > > >
>  > > > <juan-antonio.garcia@domain.hid> wrote:
>  > > > > I see everything OK except for the first samples of cyclictests. Any comments ?
>  > > >
>  > > > The load you apply does not load the cache, which is a source of
>  > > > jitter. You should run the cache calibrator (I do not find the cache
>  > > > calibrator URL, but it is somewhere in Xenomai distribution or wiki).
>  > > 
>  > > It is in the TROUBLESHOOTING guide, question "How do I adequately stress test".
>  > > 
>  > > -- 
>  > >                                                Gilles Chanteperdrix
>  > 
>  > Thanks Gilles, I've done more tests using the cache calibrator from
>  > http://www.cwi.nl/~manegold/Calibrator. The latency numbers are very
>  > similar althought I've found an strange behaviour related to telnet
>  > sessions.
>  > 
>  > Environment:
>  > 	o Tests running from console over atmel serial port.
>  > 	o A telnet session over on-chip ethernet. 
>  >         o System without load.
>  > 
>  > ./latency -p 500 -t0
>  > == All results in microseconds
>  > warming up...
>  > RTT|  00:00:01  (periodic user-mode task, 500 us period, priority 99)
>  > RTH|-RTH----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat
>  > worst
>  > RTD|      49.613|      52.190|      62.822|       0|      49.613| 62.822
>  > RTD|      42.203|      52.512|      66.365|       0|      42.203| 66.365
>  > 
>  > 
>  > Now If hit a key on the telnet session :
>  > 
>  > RTD|      36.726|      57.989|     109.536|       0|      31.572| 109.536  <-------- Here I've hit the key.
>  > RTD|      36.404|      51.868|      69.587|       0|      31.572| 109.536
>  > RTD|      35.760|      51.868|      73.775|       0|      31.572| 109.536
>  > 
>  > Now, I launch an script which executes four instances of cache
>  > calibrator.
>  > 
>  > RTD|      45.103|      57.667|      75.708|       0|      32.538| 122.422
>  > RTD|      45.425|      57.023|      76.030|       0|      32.538| 122.422
>  > RTD|      46.069|      57.023|      75.708|       0|      32.538| 122.422
>  > 
>  > Now, I can hit a key on the telnet session without effects over latency
>  > numbers:
>  > 
>  > RTD|      44.136|      57.989|      75.386|       0|      27.384| 128.221
>  > RTD|      46.713|      57.345|      76.353|       0|      27.384| 128.221
>  > RTD|      44.780|      57.345|      76.675|       0|      27.384| 128.221
>  > RTD|      43.492|      56.701|      76.997|       0|      27.384| 128.221
>  > 
>  > Now I stop the calibrator process and launch 'ping -f -s2048 192.168.2.82' from an external
>  > machine.
>  > 
>  > RTD|      40.270|      68.621|      90.850|       0|      27.384| 128.221
>  > RTD|      36.082|      68.621|      88.273|       0|      27.384| 128.221
>  > RTD|      40.592|      67.976|      91.494|       0|      27.384| 128.221
>  > RTD|      41.237|      68.298|      89.239|       0|      27.384| 128.221
>  > 
>  > 
>  > Now If hit a key on the telnet session :
>  > 
>  > RTD|      42.203|      67.976|      88.273|       0|      27.384| 128.221
>  > RTD|      32.216|      93.427|     128.543|       0|      27.384| 128.543 <---------- Here I've hit the key.
>  > RTD|      42.203|      68.298|      87.628|       0|      27.384| 128.543
>  > 
>  > And again the calibrator execution results on eliminate the strange
>  > behaviour whith the telnet session.
>  > 
>  > Any clues ?
> 
> Here is an update, follow-up on xenomai-core. I was finally able to
> reproduce this behaviour: I run latency in the background and hit the
> "Enter" key on my serial console, and get high latency figures.
> 
> I enabled the tracer, set xenomai latency to 300us and managed to get a
> trace (220us latency). However, I do not understand what is going wrong
> from reading the trace, so I post it here in case someone sees something.
> 
> Ah, and I added an ipipe_trace_special in ipipe_grab_irq to log the
> number of the received irq. 1 is serial interrupt 18 (0x12) is timer
> interrupt.
> 
> Inline, so that Jan can comment it.

Thanks, but TB is too "smart" - it cuts off everything that is marked as
footer ("--"). :-/

> I-pipe frozen back-tracing service on 2.6.20/ipipe-1.8-04
> ------------------------------------------------------------
> CPU: 0, Freeze: 450692973 cycles, Trace Points: 1000 (+10)
> Calibrated minimum trace-point overhead: 1.000 us

That is interesting. I tells us that we might subtract 1 us
_per_tracepoint_ from the given latencies due to the inherent tracer
overhead. We have about 50 entries in the critical path, so 50 us
compared to 220 us that were measured - roughly 170 us real latency.

What is the clock resolution btw? 500 ns?

So here is the interesting block, starting with the last larger IRQs-on
window.

> :   + func                -447+   2.500  xnshadow_relax+0x14 (hisyscall_event+0x210)
> :|  + begin   0x80000000  -445+   3.000  xnshadow_relax+0xd4 (hisyscall_event+0x210)
> :|  # func                -442+   5.000  schedule_linux_call+0x10 (xnshadow_relax+0x114)
> :|  # func                -437+   4.000  rthal_apc_schedule+0x10 (schedule_linux_call+0x1e8)
> :|  # func                -433+   5.000  __ipipe_schedule_irq+0x10 (rthal_apc_schedule+0xac)
> :|  # func                -428+   4.500  __ipipe_set_irq_pending+0x10 (__ipipe_schedule_irq+0xa4)
> :|  # func                -423+   3.500  rpi_push+0x14 (xnshadow_relax+0x11c)
> :|  # func                -420+   5.500  xnpod_suspend_thread+0x14 (xnshadow_relax+0x148)
> :|  # func                -414+   4.000  xnpod_schedule+0x14 (xnpod_suspend_thread+0x60c)
> :|  # [  752] -<?>-    0  -410+   7.000  xnpod_schedule+0xc8 (xnpod_suspend_thread+0x60c)
> :|  # func                -403!  56.000  xnheap_finalize_free_inner+0x10 (xnpod_schedule+0x82c)
> :|  # [    0] -<?>-   -1  -347!  20.000  xnpod_schedule+0xb14 (xnintr_clock_handler+0xa0)
> :|   +func                -327+   3.000  __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124)
> :|   +func                -324+   7.500  __ipipe_sync_stage+0x14 (__ipipe_walk_pipeline+0xa8)
> :|   #end     0x80000000  -317+   7.000  __ipipe_sync_stage+0x250 (__ipipe_walk_pipeline+0xa8)

OK, the clock starts ticking...

> :|   #func                -310+   2.500  __ipipe_grab_irq+0x10 (__irq_svc+0x28)
> :|   #begin   0xffffffff  -307+   7.000  __ipipe_grab_irq+0x20 (__irq_svc+0x28)
> :|   #(0x2a)  0x00000012  -300+   5.000  __ipipe_grab_irq+0x2c (__irq_svc+0x28)
> :|   #func                -295+   4.000  __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104)
> :|   #func                -291+   2.500  __ipipe_ack_timerirq+0x10 (__ipipe_handle_irq+0x74)
> :|   #func                -289+   3.000  __ipipe_ack_level_irq+0x10 (__ipipe_ack_timerirq+0x30)
> :|   #func                -286+   2.000  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c)
> :|   #func                -284+   2.000  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c)

(Without looking at the arm code: Is this double invocation of
at91_aic_mask_irq correct and required?)

> :|   #func                -282+   4.000  __ipipe_mach_acktimer+0x10 (__ipipe_ack_timerirq+0x40)
> :|   #func                -278+   2.000  __ipipe_end_level_irq+0x10 (__ipipe_ack_timerirq+0x50)
> :|   #func                -276+   2.500  at91_aic_unmask_irq+0x10 (__ipipe_end_level_irq+0x28)
> :|   #func                -273+   3.500  __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x80)
> :|  #*func                -270+   3.500  xnintr_clock_handler+0x10 (__ipipe_dispatch_wired+0xe4)
> :|  #*func                -266+   9.500  xntimer_tick_aperiodic+0x14 (xnintr_clock_handler+0x34)
> :|  #*func                -257+   3.500  xnthread_periodic_handler+0x10 (xntimer_tick_aperiodic+0x354)
> :|  #*func                -253+   4.000  xnpod_resume_thread+0x14 (xnthread_periodic_handler+0x34)
> :|  #*[  753] -<?>-   99  -249!  15.000  xnpod_resume_thread+0x84 (xnthread_periodic_handler+0x34)

Hmm, comparably costly, this simple resume. Hope it's not the
instrumentation (ipipe_trace_pid?) itself.

> :|  #*func                -234+   6.500  xntimer_next_local_shot+0x10 (xntimer_tick_aperiodic+0x7c0)
> :|  #*func                -228+   4.000  __ipipe_mach_set_dec+0x10 (xntimer_next_local_shot+0xbc)
> :|  #*func                -224+   3.500  xnpod_schedule+0x14 (xnintr_clock_handler+0xa0)
> :|  #*[    0] -<?>-   -1  -220!  59.500  xnpod_schedule+0xc8 (xnintr_clock_handler+0xa0)

OK, this is the cache flushing thing, I guess. Expected.

> :|  #*[  753] -<?>-   99  -161!  17.500  xnpod_schedule+0xb14 (xnpod_suspend_thread+0x60c)

Again this peak. Strange.

> :|  #*func                -143+   4.000  xntimer_get_overruns+0x14 (xnpod_wait_thread_period+0xe0)
> :|  #*func                -139+   4.000  __ipipe_restore_pipeline_head+0x10 (xnpod_wait_thread_period+0x114)
> :|  +*end     0x80000000  -135+   7.000  __ipipe_restore_pipeline_head+0x11c (xnpod_wait_thread_period+0x114)

So we woke up the thread and are on the way back to userland. But then
the serial IRQ hits us...

> :|  +*func                -128+   3.000  __ipipe_grab_irq+0x10 (__irq_svc+0x28)
> :|  +*begin   0xffffffff  -125+   7.000  __ipipe_grab_irq+0x20 (__irq_svc+0x28)
> :|  +*(0x2a)  0x00000001  -118+   4.000  __ipipe_grab_irq+0x2c (__irq_svc+0x28)
> :|  +*func                -114+   5.000  __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104)
> :|  +*func                -109+   4.500  __ipipe_set_irq_pending+0x10 (__ipipe_handle_irq+0xdc)
> :|  +*func                -105+   2.000  __ipipe_ack_irq+0x10 (__ipipe_handle_irq+0xfc)
> :|  +*func                -103+   3.000  __ipipe_ack_level_irq+0x10 (__ipipe_ack_irq+0x28)
> :|  +*func                -100+   2.500  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c)
> :|  +*func                 -97+   2.500  at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c)
> :|  +*func                 -95+   3.000  __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124)

Serial done, the above 30 us (minus tracer overhead) are unavoidable.

> :|  +*end     0xffffffff   -92!  10.500  __ipipe_grab_irq+0x110 (__irq_svc+0x28)
> :|  +*begin   0x80000001   -81+   4.000  __ipipe_dispatch_event+0x108 (__ipipe_syscall_root+0x84)

This is the syscall epilogue related to wait_thread_period.

> :|  +*end     0x80000001   -77!  39.500  __ipipe_dispatch_event+0x1cc (__ipipe_syscall_root+0x84)
> :   +*func                 -38+   2.500  __ipipe_syscall_root+0x10 (vector_swi+0x68)

After doing some latency calculation in user land, we are back to freeze
the trace now.

> :   +*func                 -35+   2.000  __ipipe_dispatch_event+0x14 (__ipipe_syscall_root+0x84)
> :|  +*begin   0x80000001   -33+   4.000  __ipipe_dispatch_event+0x40 (__ipipe_syscall_root+0x84)
> :|  +*end     0x80000001   -29+   2.000  __ipipe_dispatch_event+0xc8 (__ipipe_syscall_root+0x84)
> :   +*func                 -27+   6.500  hisyscall_event+0x14 (__ipipe_dispatch_event+0xe0)
> :   +*func                 -21+   2.500  xnshadow_sys_trace+0x10 (hisyscall_event+0x150)
> :   +*func                 -18+   2.000  ipipe_trace_frozen_reset+0x10 (xnshadow_sys_trace+0x98)
> :   +*func                 -16+   2.000  __ipipe_global_path_lock+0x10 (ipipe_trace_frozen_reset+0x14)
> :   +*func                 -14+   2.000  __ipipe_spin_lock_irqsave+0x10 (__ipipe_global_path_lock+0x18)
> :|  +*begin   0x80000001   -12+   8.000  __ipipe_spin_lock_irqsave+0x34 (__ipipe_global_path_lock+0x18)
> :|  #*func                  -4+   2.500  __ipipe_spin_unlock_irqcomplete+0x10 (__ipipe_global_path_unlock+0x6c)
> :|  +*end     0x80000001    -2+   2.000  __ipipe_spin_unlock_irqcomplete+0x4c (__ipipe_global_path_unlock+0x6c)
> <   +*freeze  0x00035ea1     0    2.500  xnshadow_sys_trace+0xa4 (hisyscall_event+0x150)

So, all in all, nothing fishy, at least here. Any IRQ (actually even
multiple!) can hit us between wakeup and latency calculation, and this
happened here due to sending some characters over the serial console.
Plus we see the cache flushing overhead when switching processes, also
normal.

I would just check those small peaks around ipipe_trace_pid, if they are
just tracing related) (e.g. by disabling those instrumentations in the
nucleus).

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux