From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <47B050A4.4060409@domain.hid> Date: Mon, 11 Feb 2008 14:41:56 +0100 From: Jan Kiszka MIME-Version: 1.0 References: <20080123065221.GB6573@domain.hid> <2ff1a98a0801230204s15e4eefaifdd2c946c44549df@domain.hid> <2ff1a98a0801230515i77f8c22bk866c4cd592a3a9b8@domain.hid> <20080124094150.GA7503@domain.hid> <18351.24380.551907.397687@domain.hid> In-Reply-To: <18351.24380.551907.397687@domain.hid> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] [Xenomai-help] AT91SAM9260 latency List-Id: "Xenomai life and development \(bug reports, patches, discussions\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gilles Chanteperdrix Cc: jagarcia@domain.hid, xenomai@xenomai.org Gilles Chanteperdrix wrote: > Juan Antonio Garcia Redondo wrote: > > On 23/01/08 14:15, Gilles Chanteperdrix wrote: > > > On Jan 23, 2008 11:04 AM, Gilles Chanteperdrix > > > wrote: > > > > On Jan 23, 2008 7:52 AM, Juan Antonio Garcia Redondo > > > > > > > > wrote: > > > > > I see everything OK except for the first samples of cyclictests. Any comments ? > > > > > > > > The load you apply does not load the cache, which is a source of > > > > jitter. You should run the cache calibrator (I do not find the cache > > > > calibrator URL, but it is somewhere in Xenomai distribution or wiki). > > > > > > It is in the TROUBLESHOOTING guide, question "How do I adequately stress test". > > > > > > -- > > > Gilles Chanteperdrix > > > > Thanks Gilles, I've done more tests using the cache calibrator from > > http://www.cwi.nl/~manegold/Calibrator. The latency numbers are very > > similar althought I've found an strange behaviour related to telnet > > sessions. > > > > Environment: > > o Tests running from console over atmel serial port. > > o A telnet session over on-chip ethernet. > > o System without load. > > > > ./latency -p 500 -t0 > > == All results in microseconds > > warming up... > > RTT| 00:00:01 (periodic user-mode task, 500 us period, priority 99) > > RTH|-RTH----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat > > worst > > RTD| 49.613| 52.190| 62.822| 0| 49.613| 62.822 > > RTD| 42.203| 52.512| 66.365| 0| 42.203| 66.365 > > > > > > Now If hit a key on the telnet session : > > > > RTD| 36.726| 57.989| 109.536| 0| 31.572| 109.536 <-------- Here I've hit the key. > > RTD| 36.404| 51.868| 69.587| 0| 31.572| 109.536 > > RTD| 35.760| 51.868| 73.775| 0| 31.572| 109.536 > > > > Now, I launch an script which executes four instances of cache > > calibrator. > > > > RTD| 45.103| 57.667| 75.708| 0| 32.538| 122.422 > > RTD| 45.425| 57.023| 76.030| 0| 32.538| 122.422 > > RTD| 46.069| 57.023| 75.708| 0| 32.538| 122.422 > > > > Now, I can hit a key on the telnet session without effects over latency > > numbers: > > > > RTD| 44.136| 57.989| 75.386| 0| 27.384| 128.221 > > RTD| 46.713| 57.345| 76.353| 0| 27.384| 128.221 > > RTD| 44.780| 57.345| 76.675| 0| 27.384| 128.221 > > RTD| 43.492| 56.701| 76.997| 0| 27.384| 128.221 > > > > Now I stop the calibrator process and launch 'ping -f -s2048 192.168.2.82' from an external > > machine. > > > > RTD| 40.270| 68.621| 90.850| 0| 27.384| 128.221 > > RTD| 36.082| 68.621| 88.273| 0| 27.384| 128.221 > > RTD| 40.592| 67.976| 91.494| 0| 27.384| 128.221 > > RTD| 41.237| 68.298| 89.239| 0| 27.384| 128.221 > > > > > > Now If hit a key on the telnet session : > > > > RTD| 42.203| 67.976| 88.273| 0| 27.384| 128.221 > > RTD| 32.216| 93.427| 128.543| 0| 27.384| 128.543 <---------- Here I've hit the key. > > RTD| 42.203| 68.298| 87.628| 0| 27.384| 128.543 > > > > And again the calibrator execution results on eliminate the strange > > behaviour whith the telnet session. > > > > Any clues ? > > Here is an update, follow-up on xenomai-core. I was finally able to > reproduce this behaviour: I run latency in the background and hit the > "Enter" key on my serial console, and get high latency figures. > > I enabled the tracer, set xenomai latency to 300us and managed to get a > trace (220us latency). However, I do not understand what is going wrong > from reading the trace, so I post it here in case someone sees something. > > Ah, and I added an ipipe_trace_special in ipipe_grab_irq to log the > number of the received irq. 1 is serial interrupt 18 (0x12) is timer > interrupt. > > Inline, so that Jan can comment it. Thanks, but TB is too "smart" - it cuts off everything that is marked as footer ("--"). :-/ > I-pipe frozen back-tracing service on 2.6.20/ipipe-1.8-04 > ------------------------------------------------------------ > CPU: 0, Freeze: 450692973 cycles, Trace Points: 1000 (+10) > Calibrated minimum trace-point overhead: 1.000 us That is interesting. I tells us that we might subtract 1 us _per_tracepoint_ from the given latencies due to the inherent tracer overhead. We have about 50 entries in the critical path, so 50 us compared to 220 us that were measured - roughly 170 us real latency. What is the clock resolution btw? 500 ns? So here is the interesting block, starting with the last larger IRQs-on window. > : + func -447+ 2.500 xnshadow_relax+0x14 (hisyscall_event+0x210) > :| + begin 0x80000000 -445+ 3.000 xnshadow_relax+0xd4 (hisyscall_event+0x210) > :| # func -442+ 5.000 schedule_linux_call+0x10 (xnshadow_relax+0x114) > :| # func -437+ 4.000 rthal_apc_schedule+0x10 (schedule_linux_call+0x1e8) > :| # func -433+ 5.000 __ipipe_schedule_irq+0x10 (rthal_apc_schedule+0xac) > :| # func -428+ 4.500 __ipipe_set_irq_pending+0x10 (__ipipe_schedule_irq+0xa4) > :| # func -423+ 3.500 rpi_push+0x14 (xnshadow_relax+0x11c) > :| # func -420+ 5.500 xnpod_suspend_thread+0x14 (xnshadow_relax+0x148) > :| # func -414+ 4.000 xnpod_schedule+0x14 (xnpod_suspend_thread+0x60c) > :| # [ 752] -- 0 -410+ 7.000 xnpod_schedule+0xc8 (xnpod_suspend_thread+0x60c) > :| # func -403! 56.000 xnheap_finalize_free_inner+0x10 (xnpod_schedule+0x82c) > :| # [ 0] -- -1 -347! 20.000 xnpod_schedule+0xb14 (xnintr_clock_handler+0xa0) > :| +func -327+ 3.000 __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124) > :| +func -324+ 7.500 __ipipe_sync_stage+0x14 (__ipipe_walk_pipeline+0xa8) > :| #end 0x80000000 -317+ 7.000 __ipipe_sync_stage+0x250 (__ipipe_walk_pipeline+0xa8) OK, the clock starts ticking... > :| #func -310+ 2.500 __ipipe_grab_irq+0x10 (__irq_svc+0x28) > :| #begin 0xffffffff -307+ 7.000 __ipipe_grab_irq+0x20 (__irq_svc+0x28) > :| #(0x2a) 0x00000012 -300+ 5.000 __ipipe_grab_irq+0x2c (__irq_svc+0x28) > :| #func -295+ 4.000 __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104) > :| #func -291+ 2.500 __ipipe_ack_timerirq+0x10 (__ipipe_handle_irq+0x74) > :| #func -289+ 3.000 __ipipe_ack_level_irq+0x10 (__ipipe_ack_timerirq+0x30) > :| #func -286+ 2.000 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c) > :| #func -284+ 2.000 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c) (Without looking at the arm code: Is this double invocation of at91_aic_mask_irq correct and required?) > :| #func -282+ 4.000 __ipipe_mach_acktimer+0x10 (__ipipe_ack_timerirq+0x40) > :| #func -278+ 2.000 __ipipe_end_level_irq+0x10 (__ipipe_ack_timerirq+0x50) > :| #func -276+ 2.500 at91_aic_unmask_irq+0x10 (__ipipe_end_level_irq+0x28) > :| #func -273+ 3.500 __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x80) > :| #*func -270+ 3.500 xnintr_clock_handler+0x10 (__ipipe_dispatch_wired+0xe4) > :| #*func -266+ 9.500 xntimer_tick_aperiodic+0x14 (xnintr_clock_handler+0x34) > :| #*func -257+ 3.500 xnthread_periodic_handler+0x10 (xntimer_tick_aperiodic+0x354) > :| #*func -253+ 4.000 xnpod_resume_thread+0x14 (xnthread_periodic_handler+0x34) > :| #*[ 753] -- 99 -249! 15.000 xnpod_resume_thread+0x84 (xnthread_periodic_handler+0x34) Hmm, comparably costly, this simple resume. Hope it's not the instrumentation (ipipe_trace_pid?) itself. > :| #*func -234+ 6.500 xntimer_next_local_shot+0x10 (xntimer_tick_aperiodic+0x7c0) > :| #*func -228+ 4.000 __ipipe_mach_set_dec+0x10 (xntimer_next_local_shot+0xbc) > :| #*func -224+ 3.500 xnpod_schedule+0x14 (xnintr_clock_handler+0xa0) > :| #*[ 0] -- -1 -220! 59.500 xnpod_schedule+0xc8 (xnintr_clock_handler+0xa0) OK, this is the cache flushing thing, I guess. Expected. > :| #*[ 753] -- 99 -161! 17.500 xnpod_schedule+0xb14 (xnpod_suspend_thread+0x60c) Again this peak. Strange. > :| #*func -143+ 4.000 xntimer_get_overruns+0x14 (xnpod_wait_thread_period+0xe0) > :| #*func -139+ 4.000 __ipipe_restore_pipeline_head+0x10 (xnpod_wait_thread_period+0x114) > :| +*end 0x80000000 -135+ 7.000 __ipipe_restore_pipeline_head+0x11c (xnpod_wait_thread_period+0x114) So we woke up the thread and are on the way back to userland. But then the serial IRQ hits us... > :| +*func -128+ 3.000 __ipipe_grab_irq+0x10 (__irq_svc+0x28) > :| +*begin 0xffffffff -125+ 7.000 __ipipe_grab_irq+0x20 (__irq_svc+0x28) > :| +*(0x2a) 0x00000001 -118+ 4.000 __ipipe_grab_irq+0x2c (__irq_svc+0x28) > :| +*func -114+ 5.000 __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104) > :| +*func -109+ 4.500 __ipipe_set_irq_pending+0x10 (__ipipe_handle_irq+0xdc) > :| +*func -105+ 2.000 __ipipe_ack_irq+0x10 (__ipipe_handle_irq+0xfc) > :| +*func -103+ 3.000 __ipipe_ack_level_irq+0x10 (__ipipe_ack_irq+0x28) > :| +*func -100+ 2.500 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c) > :| +*func -97+ 2.500 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c) > :| +*func -95+ 3.000 __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124) Serial done, the above 30 us (minus tracer overhead) are unavoidable. > :| +*end 0xffffffff -92! 10.500 __ipipe_grab_irq+0x110 (__irq_svc+0x28) > :| +*begin 0x80000001 -81+ 4.000 __ipipe_dispatch_event+0x108 (__ipipe_syscall_root+0x84) This is the syscall epilogue related to wait_thread_period. > :| +*end 0x80000001 -77! 39.500 __ipipe_dispatch_event+0x1cc (__ipipe_syscall_root+0x84) > : +*func -38+ 2.500 __ipipe_syscall_root+0x10 (vector_swi+0x68) After doing some latency calculation in user land, we are back to freeze the trace now. > : +*func -35+ 2.000 __ipipe_dispatch_event+0x14 (__ipipe_syscall_root+0x84) > :| +*begin 0x80000001 -33+ 4.000 __ipipe_dispatch_event+0x40 (__ipipe_syscall_root+0x84) > :| +*end 0x80000001 -29+ 2.000 __ipipe_dispatch_event+0xc8 (__ipipe_syscall_root+0x84) > : +*func -27+ 6.500 hisyscall_event+0x14 (__ipipe_dispatch_event+0xe0) > : +*func -21+ 2.500 xnshadow_sys_trace+0x10 (hisyscall_event+0x150) > : +*func -18+ 2.000 ipipe_trace_frozen_reset+0x10 (xnshadow_sys_trace+0x98) > : +*func -16+ 2.000 __ipipe_global_path_lock+0x10 (ipipe_trace_frozen_reset+0x14) > : +*func -14+ 2.000 __ipipe_spin_lock_irqsave+0x10 (__ipipe_global_path_lock+0x18) > :| +*begin 0x80000001 -12+ 8.000 __ipipe_spin_lock_irqsave+0x34 (__ipipe_global_path_lock+0x18) > :| #*func -4+ 2.500 __ipipe_spin_unlock_irqcomplete+0x10 (__ipipe_global_path_unlock+0x6c) > :| +*end 0x80000001 -2+ 2.000 __ipipe_spin_unlock_irqcomplete+0x4c (__ipipe_global_path_unlock+0x6c) > < +*freeze 0x00035ea1 0 2.500 xnshadow_sys_trace+0xa4 (hisyscall_event+0x150) So, all in all, nothing fishy, at least here. Any IRQ (actually even multiple!) can hit us between wakeup and latency calculation, and this happened here due to sending some characters over the serial console. Plus we see the cache flushing overhead when switching processes, also normal. I would just check those small peaks around ipipe_trace_pid, if they are just tracing related) (e.g. by disabling those instrumentations in the nucleus). Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux