From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <44CB6EB3.5070707@domain.hid> Date: Sat, 29 Jul 2006 16:20:35 +0200 From: Jan Kiszka MIME-Version: 1.0 Subject: Re: [Xenomai-help] Beginner's question / testsuite / latency References: <442248c90607201417m24729b7cs23a8b82b719ff1cc@domain.hid> <200607221152.34298.bidsonux@domain.hid> <44C25DB0.50601@domain.hid> <200607282317.34990.bidsonux@domain.hid> In-Reply-To: <200607282317.34990.bidsonux@domain.hid> Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigEF6510CE6EF1EB8DA25644A2" Sender: jan.kiszka@domain.hid List-Id: Help regarding installation and common use of Xenomai List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Julien Heyman Cc: xenomai@xenomai.org This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigEF6510CE6EF1EB8DA25644A2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Julien Heyman wrote: > Hi, >=20 > here's a quick follow-up on my issue, thanks for your time ! >=20 > Julien Heyman >=20 >=20 > On Saturday 22 July 2006 19:17, Jan Kiszka wrote: >> Julien Heyman wrote: >>> Hi, >>> >>> On Thursday 20 July 2006 23:58, Jan Kiszka wrote: >>>> Julien Heyman wrote: >>>>> Hi, >>>>> >>>>> I am currently trying to evaluate Xenomai, and my current setup is = : >>>>> VIA C3 533Mhz processor, Kubuntu 6.06 Linux distribution. >>>>> I am using Xenomai-2.1.0 over a 2.6.17.4 kernel. >>>> General advice: especially when starting new, try to pick the latest= >>>> version, at least the latest from the preferred series (here 2.1.2).= But >>>> 2.2 is even better. :) >>> OK. Actually started with Xenomai-2.1.0 then switched to 2.1.1 to fix= a >>> kernel build error. But anyway, I will start with a fresh install of = 2.2 >>> as soon as I have figured out the other points :) >>> >>>>> When I run the latency part of the testsuite (in a console under KD= E), >>>>> I get >>>>> results that I cannot understand, so I probably did something wrong= >>>>> (execution trace included below). >>>>> I get reasonable values during the first seconds, then all of a sud= den >>>>> latencies begin to rise, continuously, to very large values. >>>>> - I did check that DMA transfer is activated on my HD. >>>>> - I did select "Enable SMI workaround" + "Globally disable SMI" in = the >>>>> Xenomai >>>>> options while configuring the kernel. >>>>> - I have disabled power management at BIOS level and disabled ACPI >>>>> support and >>>>> CPU frequency scaling during kernel configuration. >>>>> - I checked that I don't have anything called "legacy USB" in my BI= OS. >>>>> I do have an "OnChip USB" option enable in the BIOS though. >>>>> >>>>> Any advice would be appreciated ! >>>> Maybe it's related to some other weird on-chip hardware. At work we = run >>>> Xenomai only on a head-less VIA C3 box, i.e. without X. No problems = so >>>> far. I would suggest to try stopping X and run the test from the tex= t >>>> mode. >>> This is interesting : indeed when I switch to console mode (Ctrl+Atl+= F1) >>> and run the test, the latency values stay right on track. >>> If I let the test run, switch to X, and switch back to the original >>> console, the values have gone wild in the meantime. >>> So there seems to be a strong link with X. >>> What does that say about potential causes of my issue ? >> Don't know. Might be a weird hardware design (wrt bus latencies) - or = do >> you=20 >> use any binary-only driver for X?=20 >=20 >=20 > I did not install any specific driver, I probably use the standard Trid= ent=20 > driver that came with the Kubuntu distribution. >=20 >=20 >>>> A further tool to analyse such effects in details is the I-pipe trac= er. >>>> It's an additional patch you have to apply to your kernel (see >>>> http://download.gna.org/adeos/patches/v2.6/i386/tracer). Enable this= >>>> I-pipe option, rebuild your kernel, and start the latency test with = -f. >>>> The test will then capture on every new worst-case delay a backtrace= to >>>> /proc/ipipe/trace/frozen. You may want to play with the number of >>>> back-trace points or the verbose mode (see /proc/ipipe/trace/*) even= >>>> after the capturing. >>> I had already used an adeos patch during my initial kernel-patching >>> (adeos-ipipe-2.6.17-i386-1.3-07.patch) >> The tracer is an additional patch on top of I-pipe, see link above. >> >>> I tried running the latency test using /usr/xenomai/bin/xeno-load lat= ency >>> -f but I get this error : >>> >>> =3D=3D Sampling period: 100 us >>> =3D=3D Test mode: periodic user-mode task >>> =3D=3D All results in microseconds >>> latency: failed to open benchmark device, code -19 >>> (modprobe xeno_timerbench?) >>> >>> What am I doing wrong ? >> For < 2.2: check if CONFIG_XENO_DRIVERS_TIMERBENCH is y or m. If it's = a >> module, follow the suggestion latency printed. >> >> In 2.2 the required tracing interface was moved to the nucleus, no nee= d >> for these steps then. >=20 >=20 > Ok, I installed Xenomai 2.2 from scratch over a freshly extracted kerne= l=20 > source tree, used the tracer patch ipipe-tracer-2.6.17-i386-1.3-08.patc= h,=20 > rebuilt the kernel and relaunch the test : I still get increasing laten= cies. >=20 > I attach a copy of the latency test run, and the associated traces in=20 > files "max" and "frozen" from /proc/ipipe/trace >=20 > I have no clue how to interpret theses traces, is there a documentation= /user's=20 > guide available for the ipipe tracer ? Nope, I skipped this "boring" part so far. But I will provide some explanations below. >=20 > Regarding Gilles' comments on this topic :=20 >=20 >> Losing too many ticks! >> TSC cannot be used as a timesource. =C2=20 >> Possible reasons for this are: >> =C2 You're running with Speedstep, >> =C2 You don't have DMA enabled for your hard disk (see hdparm), >> =C2 Incorrect TSC synchronization on an SMP system (see dmesg). >> Falling back to a sane timesource now. >=20 >> Do you see such message in kernel logs ? >=20 > No, I don't get any of these. >=20 >> If this is the issue you are having, then you may try passing idle=3Dp= oll >> on kernel boot line. Or recompile the kernel selecting a processor tha= t >> has no tsc (for example 486), xenomai will then use the 8254 counter t= o >> emulate the tsc. >=20 > I tried to rebuild with CPU=3D486, but it gives the same result. >=20 >> Jan >> >> ----------------------------------------------------------------------= -- >> >> bids@domain.hid$ sudo /usr/xenomai/bin/xen= o-load latency -f >> Password: >> * >> * >> * Type ^C to stop this application. >> * >> * >> =3D=3D Sampling period: 100 us >> =3D=3D Test mode: periodic user-mode task >> =3D=3D All results in microseconds >> warming up... >> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99) >> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---la= t worst >> RTD| 21.428| 23.956| 38.249| 0| 21.428| = 38.249 >> RTD| 22.419| 24.491| 31.760| 0| 21.428| = 38.249 >> RTD| 21.833| 23.731| 52.247| 0| 21.428| = 52.247 >> RTD| 21.776| 23.733| 45.060| 0| 21.428| = 52.247 >> RTD| 21.467| 24.434| 40.813| 0| 21.428| = 52.247 >> RTD| 21.613| 24.684| 40.517| 0| 21.428| = 52.247 >> RTD| 21.585| 25.503| 51.990| 0| 21.428| = 52.247 >> RTD| 21.212| 25.497| 44.028| 0| 21.212| = 52.247 >> RTD| 21.587| 25.535| 119.516| 1| 21.212| = 119.516 >> RTD| 21.604| 26.543| 225.722| 3| 21.212| = 225.722 >> RTD| 21.958| 26.575| 48.329| 3| 21.212| = 225.722 >> RTD| 22.247| 25.216| 57.580| 3| 21.212| = 225.722 >> RTD| 22.253| 25.017| 53.521| 3| 21.212| = 225.722 >> RTD| 22.376| 24.824| 54.095| 3| 21.212| = 225.722 >> RTD| 22.256| 24.655| 55.816| 3| 21.212| = 225.722 >> RTD| 22.324| 24.682| 48.350| 3| 21.212| = 225.722 >> RTD| 21.454| 24.519| 342.492| 9| 21.212| = 342.492 >> RTD| 21.578| 23.617| 49.107| 9| 21.212| = 342.492 >> RTD| 21.621| 23.611| 38.444| 9| 21.212| = 342.492 >> RTD| 21.625| 23.958| 192.672| 10| 21.212| = 342.492 >> RTD| 21.656| 23.675| 34.820| 10| 21.212| = 342.492 >> RTT| 00:00:22 (periodic user-mode task, 100 us period, priority 99) >> RTH|-----lat min|-----lat avg|-----lat max|-overrun|----lat best|---la= t worst >> RTD| 21.688| 23.626| 36.496| 10| 21.212| = 342.492 >> RTD| 21.527| 23.720| 49.283| 10| 21.212| = 342.492 >> RTD| 21.643| 23.711| 43.028| 10| 21.212| = 342.492 >> ---|------------|------------|------------|--------|------------------= ------- >> RTS| 21.212| 24.562| 342.492| 10| 00:00:24/00:00= :24 >> >> >> cat frozen :=20 This is the trace the latency application triggered on the maximum delay (342.492 us). You should extend the back-trace period to cover at least the observed latency (say up to -400 us). Try "echo 500 > /proc/ipipe/trace/back_trace_points" for this. You can furthermore switch on more verbosity with echo 1 > /proc/ipipe/trace/verbose". >> >> Freeze: 4770308881064 cycles, Trace Points: 30 (+10) >> >> +--------------- Hard IRQs ('|': locked) >> | +- Delay flag ('+': > 1 us, '!': > 10 us) >> | | >> Type Time Function (Parent) >> :|func -35 xnpod_schedule (xnintr_irq_handler) >> :|[ 3096] -34+ xnpod_schedule (xnintr_irq_handler) This traces the process (pid) being switched out. >> :|func -32+ __switch_to (xnpod_schedule) >> :|[21768] -30+ xnpod_schedule (xnpod_suspend_thread) This pid gets switched in, the latency test (more precisely the sampling thread). >> :|func -28 __ipipe_restore_pipeline_head (xnpod_wait_thread_per= iod) >> :|end -28+ __ipipe_restore_pipeline_head (xnpod_wait_thread_per= iod) >> :|begin -26 __ipipe_dispatch_event (__ipipe_syscall_root) >> :|end -25+ __ipipe_dispatch_event (__ipipe_syscall_root) >> : func -22 __ipipe_syscall_root (system_call) This is roughly the point when the latency test takes its timestamp to calculate the delay. This syscall here is then issued to convert TSCs into nanoseconds. >> : func -21 __ipipe_dispatch_event (__ipipe_syscall_root) >> :|begin -21 __ipipe_dispatch_event (__ipipe_syscall_root) >> :|end -20 __ipipe_dispatch_event (__ipipe_syscall_root) >> : func -19+ hisyscall_event (__ipipe_dispatch_event) >> : func -18 __rt_timer_tsc2ns (hisyscall_event) >> : func -17 __copy_from_user_ll (__rt_timer_tsc2ns) >> : func -16+ rt_timer_tsc2ns (__rt_timer_tsc2ns) >> : func -15 __copy_to_user_ll (__rt_timer_tsc2ns) >> :|begin -14+ __ipipe_dispatch_event (__ipipe_syscall_root) >> :|end -13+ __ipipe_dispatch_event (__ipipe_syscall_root) >> : func -11 __ipipe_syscall_root (system_call) >> : func -11 __ipipe_dispatch_event (__ipipe_syscall_root) >> :|begin -10+ __ipipe_dispatch_event (__ipipe_syscall_root) >> :|end -9 __ipipe_dispatch_event (__ipipe_syscall_root) >> : func -8+ hisyscall_event (__ipipe_dispatch_event) >> : func -7+ xnshadow_sys_trace (hisyscall_event) >> : func -5 ipipe_trace_frozen_reset (xnshadow_sys_trace) >> : func -4 __ipipe_global_path_lock (ipipe_trace_frozen_reset) >> :|begin -3+ __ipipe_global_path_lock (ipipe_trace_frozen_reset) >> :|end 0 __ipipe_global_path_unlock (ipipe_trace_frozen_reset= ) >> < freeze 0 xnshadow_sys_trace (hisyscall_event) And here we freeze after detecting a new worst-case (also issued as a syscall). >> |begin 1 __ipipe_dispatch_event (__ipipe_syscall_root) >> |end 1 __ipipe_dispatch_event (__ipipe_syscall_root) >> func 3 __ipipe_syscall_root (system_call) >> func 3 __ipipe_dispatch_event (__ipipe_syscall_root) >> |begin 4 __ipipe_dispatch_event (__ipipe_syscall_root) >> |end 5 __ipipe_dispatch_event (__ipipe_syscall_root) >> func 5 hisyscall_event (__ipipe_dispatch_event) >> func 6 __rt_task_wait_period (hisyscall_event) >> func 7 rt_task_wait_period (__rt_task_wait_period) >> func 8 xnpod_wait_thread_period (rt_task_wait_period) >> >> >> cat max :=20 This trace captured the maximum time the system spent in IRQs-off context= =2E >> Begin: 943575374801 cycles, Trace Points: 218 (-10/+1), Length: 376 us= And that period took 376 us, a pathological result. It's a bit more than what latency observed, but that's normal for a short test cycle. >> >> +--------------- Hard IRQs ('|': locked) >> | +- Delay flag ('+': > 1 us, '!': > 10 us) >> | | >> Type Time Function (Parent) >> |func -19650 xnpod_schedule (xnpod_suspend_thread) >> |[21559] -19649 xnpod_schedule (xnpod_suspend_thread) >> |func -19647 __switch_to (xnpod_schedule) >> |[ 3096] -19646 xnpod_schedule (xnintr_irq_handler) >> |func -19644 __ipipe_walk_pipeline (__ipipe_handle_irq) >> |end -19643 common_interrupt () >> |func -19643 __ipipe_stall_root (resume_userspace) >> func -19642 __ipipe_unstall_iret_root (restore_raw) >> |begin -19641 __ipipe_unstall_iret_root (restore_raw) >> |end -19640 __ipipe_unstall_iret_root (restore_raw) >>> |begin 0+ common_interrupt (<080c4fce>) Here we start our critical region. The interesting thing is that there are almost 20 ms(!) between the last kernel activity and the interrupt entry. Likely a sign for hardware-related IRQ delays (unless some privileged user-space program does dirty dancing with cli, i.e. hard IRQ blocking). >> :|func 1 __ipipe_handle_irq (common_interrupt) >> :|func 1 __ipipe_ack_common_irq (__ipipe_handle_irq) >> :|func 2 ipipe_test_and_stall_pipeline_from (__ipipe_ack_comm= on_irq) >> :|func 3+ mask_and_ack_8259A (__ipipe_ack_common_irq) >> :|func 5 __ipipe_dispatch_wired (__ipipe_handle_irq) >> :|func 6 xnintr_clock_handler (__ipipe_dispatch_wired) >> :|func 6 xnintr_irq_handler (xnintr_clock_handler) >> :|func 7 xnpod_announce_tick (xnintr_irq_handler) >> :|func 8+ xntimer_do_tick_aperiodic (xnpod_announce_tick) >> :|func 9 xnthread_periodic_handler (xntimer_do_tick_aperiodic= ) >> :|func 10 xnpod_resume_thread (xnthread_periodic_handler) >> :|[21559] 11+ xnpod_resume_thread (xnthread_periodic_handler) >> :|func 13+ xnthread_periodic_handler (xntimer_do_tick_aperiodic= ) =2E.. >> :|func 363+ xnthread_periodic_handler (xntimer_do_tick_aperiodic= ) That are a lot of overruns. Haven't counted, but it should be one xnthread_periodic_handler per missed 100 us period (20000 / 100 =3D 200!)= =2E [BTW, I think we should handle even this failure scenario without looping. And there is another loop in xnpod_wait_thread_period.] >> :|func 369+ xnpod_schedule (xnintr_irq_handler) >> :|[ 3096] 370+ xnpod_schedule (xnintr_irq_handler) >> :|func 372+ __switch_to (xnpod_schedule) >> :|[21559] 374+ xnpod_schedule (xnpod_suspend_thread) >> :|func 375 __ipipe_restore_pipeline_head (xnpod_wait_thread_per= iod) >> <|end 376 __ipipe_restore_pipeline_head (xnpod_wait_thread_per= iod) >> |begin 377 __ipipe_dispatch_event (__ipipe_syscall_root) >> No final judgement yet, but it looks to me like your hardware doesn't want to play with us. To exclude the possibility that the timer events get programmed wrongly as TSCs are skewed up, you could also try the irqbench test case. All you need is a nullmodem cable and a second PC with RS232 port, see doc/txt/irqbench.txt. And that test can also take back-traces. Jan --------------enigEF6510CE6EF1EB8DA25644A2 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFEy26zniDOoMHTA+kRAn3gAJ0Wa/CWm4NxpchMObM1hhDGzK3PSACfdZD0 J7hrxUpfdolqMSyczKUFsCI= =jai/ -----END PGP SIGNATURE----- --------------enigEF6510CE6EF1EB8DA25644A2--