From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <47B081C8.60409@domain.hid> Date: Mon, 11 Feb 2008 18:11:36 +0100 From: Jan Kiszka MIME-Version: 1.0 References: <20080123065221.GB6573@domain.hid> <2ff1a98a0801230204s15e4eefaifdd2c946c44549df@domain.hid> <2ff1a98a0801230515i77f8c22bk866c4cd592a3a9b8@domain.hid> <20080124094150.GA7503@domain.hid> <18351.24380.551907.397687@domain.hid> <47B050A4.4060409@domain.hid> <2ff1a98a0802110605l416bc82bpdd4a7f2e6e92a342@domain.hid> In-Reply-To: <2ff1a98a0802110605l416bc82bpdd4a7f2e6e92a342@domain.hid> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] [Xenomai-help] AT91SAM9260 latency List-Id: "Xenomai life and development \(bug reports, patches, discussions\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Gilles Chanteperdrix Cc: jagarcia@domain.hid, xenomai@xenomai.org Gilles Chanteperdrix wrote: > On Mon, Feb 11, 2008 at 2:41 PM, Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >> > Juan Antonio Garcia Redondo wrote: >> > > On 23/01/08 14:15, Gilles Chanteperdrix wrote: >> > > > On Jan 23, 2008 11:04 AM, Gilles Chanteperdrix >> > > > wrote: >> > > > > On Jan 23, 2008 7:52 AM, Juan Antonio Garcia Redondo >> > > > > >> > > > > wrote: >> > > > > > I see everything OK except for the first samples of cyclictests. Any comments ? >> > > > > >> > > > > The load you apply does not load the cache, which is a source of >> > > > > jitter. You should run the cache calibrator (I do not find the cache >> > > > > calibrator URL, but it is somewhere in Xenomai distribution or wiki). >> > > > >> > > > It is in the TROUBLESHOOTING guide, question "How do I adequately stress test". >> > > > >> > > > -- >> > > > Gilles Chanteperdrix >> > > >> > > Thanks Gilles, I've done more tests using the cache calibrator from >> > > http://www.cwi.nl/~manegold/Calibrator. The latency numbers are very >> > > similar althought I've found an strange behaviour related to telnet >> > > sessions. >> > > >> > > Environment: >> > > o Tests running from console over atmel serial port. >> > > o A telnet session over on-chip ethernet. >> > > o System without load. >> > > >> > > ./latency -p 500 -t0 >> > > == All results in microseconds >> > > warming up... >> > > RTT| 00:00:01 (periodic user-mode task, 500 us period, priority 99) >> > > RTH|-RTH----lat min|-----lat avg|-----lat max|-overrun|----lat best|---lat >> > > worst >> > > RTD| 49.613| 52.190| 62.822| 0| 49.613| 62.822 >> > > RTD| 42.203| 52.512| 66.365| 0| 42.203| 66.365 >> > > >> > > >> > > Now If hit a key on the telnet session : >> > > >> > > RTD| 36.726| 57.989| 109.536| 0| 31.572| 109.536 <-------- Here I've hit the key. >> > > RTD| 36.404| 51.868| 69.587| 0| 31.572| 109.536 >> > > RTD| 35.760| 51.868| 73.775| 0| 31.572| 109.536 >> > > >> > > Now, I launch an script which executes four instances of cache >> > > calibrator. >> > > >> > > RTD| 45.103| 57.667| 75.708| 0| 32.538| 122.422 >> > > RTD| 45.425| 57.023| 76.030| 0| 32.538| 122.422 >> > > RTD| 46.069| 57.023| 75.708| 0| 32.538| 122.422 >> > > >> > > Now, I can hit a key on the telnet session without effects over latency >> > > numbers: >> > > >> > > RTD| 44.136| 57.989| 75.386| 0| 27.384| 128.221 >> > > RTD| 46.713| 57.345| 76.353| 0| 27.384| 128.221 >> > > RTD| 44.780| 57.345| 76.675| 0| 27.384| 128.221 >> > > RTD| 43.492| 56.701| 76.997| 0| 27.384| 128.221 >> > > >> > > Now I stop the calibrator process and launch 'ping -f -s2048 192.168.2.82' from an external >> > > machine. >> > > >> > > RTD| 40.270| 68.621| 90.850| 0| 27.384| 128.221 >> > > RTD| 36.082| 68.621| 88.273| 0| 27.384| 128.221 >> > > RTD| 40.592| 67.976| 91.494| 0| 27.384| 128.221 >> > > RTD| 41.237| 68.298| 89.239| 0| 27.384| 128.221 >> > > >> > > >> > > Now If hit a key on the telnet session : >> > > >> > > RTD| 42.203| 67.976| 88.273| 0| 27.384| 128.221 >> > > RTD| 32.216| 93.427| 128.543| 0| 27.384| 128.543 <---------- Here I've hit the key. >> > > RTD| 42.203| 68.298| 87.628| 0| 27.384| 128.543 >> > > >> > > And again the calibrator execution results on eliminate the strange >> > > behaviour whith the telnet session. >> > > >> > > Any clues ? >> > >> > Here is an update, follow-up on xenomai-core. I was finally able to >> > reproduce this behaviour: I run latency in the background and hit the >> > "Enter" key on my serial console, and get high latency figures. >> > >> > I enabled the tracer, set xenomai latency to 300us and managed to get a >> > trace (220us latency). However, I do not understand what is going wrong >> > from reading the trace, so I post it here in case someone sees something. >> > >> > Ah, and I added an ipipe_trace_special in ipipe_grab_irq to log the >> > number of the received irq. 1 is serial interrupt 18 (0x12) is timer >> > interrupt. >> > >> > Inline, so that Jan can comment it. >> >> Thanks, but TB is too "smart" - it cuts off everything that is marked as >> footer ("--"). :-/ >> >> >> > I-pipe frozen back-tracing service on 2.6.20/ipipe-1.8-04 >> > ------------------------------------------------------------ >> > CPU: 0, Freeze: 450692973 cycles, Trace Points: 1000 (+10) >> > Calibrated minimum trace-point overhead: 1.000 us >> >> That is interesting. I tells us that we might subtract 1 us >> _per_tracepoint_ from the given latencies due to the inherent tracer >> overhead. We have about 50 entries in the critical path, so 50 us >> compared to 220 us that were measured - roughly 170 us real latency. >> >> What is the clock resolution btw? 500 ns? > > It should be around 1us. A bit less. I think the frequency is 92 MHz / > 64. Anyway, I checked once and the I-pipe rounding is really > approximative, it is equivalent to rounding frequency to 2 MHz. > >> So here is the interesting block, starting with the last larger IRQs-on >> window. >> >> >> > : + func -447+ 2.500 xnshadow_relax+0x14 (hisyscall_event+0x210) >> > :| + begin 0x80000000 -445+ 3.000 xnshadow_relax+0xd4 (hisyscall_event+0x210) >> > :| # func -442+ 5.000 schedule_linux_call+0x10 (xnshadow_relax+0x114) >> > :| # func -437+ 4.000 rthal_apc_schedule+0x10 (schedule_linux_call+0x1e8) >> > :| # func -433+ 5.000 __ipipe_schedule_irq+0x10 (rthal_apc_schedule+0xac) >> > :| # func -428+ 4.500 __ipipe_set_irq_pending+0x10 (__ipipe_schedule_irq+0xa4) >> > :| # func -423+ 3.500 rpi_push+0x14 (xnshadow_relax+0x11c) >> > :| # func -420+ 5.500 xnpod_suspend_thread+0x14 (xnshadow_relax+0x148) >> > :| # func -414+ 4.000 xnpod_schedule+0x14 (xnpod_suspend_thread+0x60c) >> > :| # [ 752] -- 0 -410+ 7.000 xnpod_schedule+0xc8 (xnpod_suspend_thread+0x60c) >> > :| # func -403! 56.000 xnheap_finalize_free_inner+0x10 (xnpod_schedule+0x82c) >> > :| # [ 0] -- -1 -347! 20.000 xnpod_schedule+0xb14 (xnintr_clock_handler+0xa0) >> > :| +func -327+ 3.000 __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124) >> > :| +func -324+ 7.500 __ipipe_sync_stage+0x14 (__ipipe_walk_pipeline+0xa8) >> > :| #end 0x80000000 -317+ 7.000 __ipipe_sync_stage+0x250 (__ipipe_walk_pipeline+0xa8) >> >> OK, the clock starts ticking... >> >> >> > :| #func -310+ 2.500 __ipipe_grab_irq+0x10 (__irq_svc+0x28) >> > :| #begin 0xffffffff -307+ 7.000 __ipipe_grab_irq+0x20 (__irq_svc+0x28) >> > :| #(0x2a) 0x00000012 -300+ 5.000 __ipipe_grab_irq+0x2c (__irq_svc+0x28) >> > :| #func -295+ 4.000 __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104) >> > :| #func -291+ 2.500 __ipipe_ack_timerirq+0x10 (__ipipe_handle_irq+0x74) >> > :| #func -289+ 3.000 __ipipe_ack_level_irq+0x10 (__ipipe_ack_timerirq+0x30) >> > :| #func -286+ 2.000 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c) >> > :| #func -284+ 2.000 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c) >> >> (Without looking at the arm code: Is this double invocation of >> at91_aic_mask_irq correct and required?) > > No, and I do not see where it is coming from. > >> >> > :| #func -282+ 4.000 __ipipe_mach_acktimer+0x10 (__ipipe_ack_timerirq+0x40) >> > :| #func -278+ 2.000 __ipipe_end_level_irq+0x10 (__ipipe_ack_timerirq+0x50) >> > :| #func -276+ 2.500 at91_aic_unmask_irq+0x10 (__ipipe_end_level_irq+0x28) >> > :| #func -273+ 3.500 __ipipe_dispatch_wired+0x14 (__ipipe_handle_irq+0x80) >> > :| #*func -270+ 3.500 xnintr_clock_handler+0x10 (__ipipe_dispatch_wired+0xe4) >> > :| #*func -266+ 9.500 xntimer_tick_aperiodic+0x14 (xnintr_clock_handler+0x34) >> > :| #*func -257+ 3.500 xnthread_periodic_handler+0x10 (xntimer_tick_aperiodic+0x354) >> > :| #*func -253+ 4.000 xnpod_resume_thread+0x14 (xnthread_periodic_handler+0x34) >> > :| #*[ 753] -- 99 -249! 15.000 xnpod_resume_thread+0x84 (xnthread_periodic_handler+0x34) >> >> Hmm, comparably costly, this simple resume. Hope it's not the >> instrumentation (ipipe_trace_pid?) itself. >> >> >> > :| #*func -234+ 6.500 xntimer_next_local_shot+0x10 (xntimer_tick_aperiodic+0x7c0) >> > :| #*func -228+ 4.000 __ipipe_mach_set_dec+0x10 (xntimer_next_local_shot+0xbc) >> > :| #*func -224+ 3.500 xnpod_schedule+0x14 (xnintr_clock_handler+0xa0) >> > :| #*[ 0] -- -1 -220! 59.500 xnpod_schedule+0xc8 (xnintr_clock_handler+0xa0) >> >> OK, this is the cache flushing thing, I guess. Expected. > > Yes, I took care of disabling the (still recent) irqs unmasking in > xnpod_schedule. So, we see the cache flushing with irqs disabled. > >> >> > :| #*[ 753] -- 99 -161! 17.500 xnpod_schedule+0xb14 (xnpod_suspend_thread+0x60c) >> >> Again this peak. Strange. >> >> >> > :| #*func -143+ 4.000 xntimer_get_overruns+0x14 (xnpod_wait_thread_period+0xe0) >> > :| #*func -139+ 4.000 __ipipe_restore_pipeline_head+0x10 (xnpod_wait_thread_period+0x114) >> > :| +*end 0x80000000 -135+ 7.000 __ipipe_restore_pipeline_head+0x11c (xnpod_wait_thread_period+0x114) >> >> So we woke up the thread and are on the way back to userland. But then >> the serial IRQ hits us... >> >> >> > :| +*func -128+ 3.000 __ipipe_grab_irq+0x10 (__irq_svc+0x28) >> > :| +*begin 0xffffffff -125+ 7.000 __ipipe_grab_irq+0x20 (__irq_svc+0x28) >> > :| +*(0x2a) 0x00000001 -118+ 4.000 __ipipe_grab_irq+0x2c (__irq_svc+0x28) >> > :| +*func -114+ 5.000 __ipipe_handle_irq+0x10 (__ipipe_grab_irq+0x104) >> > :| +*func -109+ 4.500 __ipipe_set_irq_pending+0x10 (__ipipe_handle_irq+0xdc) >> > :| +*func -105+ 2.000 __ipipe_ack_irq+0x10 (__ipipe_handle_irq+0xfc) >> > :| +*func -103+ 3.000 __ipipe_ack_level_irq+0x10 (__ipipe_ack_irq+0x28) >> > :| +*func -100+ 2.500 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x3c) >> > :| +*func -97+ 2.500 at91_aic_mask_irq+0x10 (__ipipe_ack_level_irq+0x4c) >> > :| +*func -95+ 3.000 __ipipe_walk_pipeline+0x10 (__ipipe_handle_irq+0x124) >> >> Serial done, the above 30 us (minus tracer overhead) are unavoidable. >> >> >> > :| +*end 0xffffffff -92! 10.500 __ipipe_grab_irq+0x110 (__irq_svc+0x28) >> > :| +*begin 0x80000001 -81+ 4.000 __ipipe_dispatch_event+0x108 (__ipipe_syscall_root+0x84) >> >> This is the syscall epilogue related to wait_thread_period. > > Is it normal to see the hw irqs disabled here ? They were likely only disabled at the instrumentation point, but were enabled shortly afterwards - just to be disabled again. > >> >> > :| +*end 0x80000001 -77! 39.500 __ipipe_dispatch_event+0x1cc (__ipipe_syscall_root+0x84) >> > : +*func -38+ 2.500 __ipipe_syscall_root+0x10 (vector_swi+0x68) >> >> After doing some latency calculation in user land, we are back to freeze >> the trace now. >> >> >> > : +*func -35+ 2.000 __ipipe_dispatch_event+0x14 (__ipipe_syscall_root+0x84) >> > :| +*begin 0x80000001 -33+ 4.000 __ipipe_dispatch_event+0x40 (__ipipe_syscall_root+0x84) >> > :| +*end 0x80000001 -29+ 2.000 __ipipe_dispatch_event+0xc8 (__ipipe_syscall_root+0x84) >> > : +*func -27+ 6.500 hisyscall_event+0x14 (__ipipe_dispatch_event+0xe0) >> > : +*func -21+ 2.500 xnshadow_sys_trace+0x10 (hisyscall_event+0x150) >> > : +*func -18+ 2.000 ipipe_trace_frozen_reset+0x10 (xnshadow_sys_trace+0x98) >> > : +*func -16+ 2.000 __ipipe_global_path_lock+0x10 (ipipe_trace_frozen_reset+0x14) >> > : +*func -14+ 2.000 __ipipe_spin_lock_irqsave+0x10 (__ipipe_global_path_lock+0x18) >> > :| +*begin 0x80000001 -12+ 8.000 __ipipe_spin_lock_irqsave+0x34 (__ipipe_global_path_lock+0x18) >> > :| #*func -4+ 2.500 __ipipe_spin_unlock_irqcomplete+0x10 (__ipipe_global_path_unlock+0x6c) >> > :| +*end 0x80000001 -2+ 2.000 __ipipe_spin_unlock_irqcomplete+0x4c (__ipipe_global_path_unlock+0x6c) >> > < +*freeze 0x00035ea1 0 2.500 xnshadow_sys_trace+0xa4 (hisyscall_event+0x150) >> >> So, all in all, nothing fishy, at least here. Any IRQ (actually even >> multiple!) can hit us between wakeup and latency calculation, and this >> happened here due to sending some characters over the serial console. >> Plus we see the cache flushing overhead when switching processes, also >> normal. > > The problem is that the nucleus latency is calibrated to 300 us (which > is very high), and that we get a 220us latency, which means that it > took 500us going from the timer interrupt to the xntrace syscall. There is a shadow relax procedure running before the timer IRQ fires, and that takes another context switch. So the latency sum is: - unrelated context switch - timer IRQ - switch to woken up RT process - serial IRQ Almost the theoretical worst case. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux