Message-ID: <448EBE8C.60900@domain.hid>
Date: Tue, 13 Jun 2006 15:33:00 +0200
From: Jan Kiszka
Subject: Re: [Xenomai-core] ns vs. tsc as internal timer base
References: <448E98A3.6080707@domain.hid> <448E9E8B.70809@domain.hid> <448EA7F7.5000802@domain.hid> <448EB038.8070802@domain.hid>
In-Reply-To: <448EB038.8070802@domain.hid>
List-Id: "Xenomai life and development (bug reports, patches, discussions)"
To: Philippe Gerum
Cc: xenomai-core

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> from i386/kernel/timers/timer_tsc.c. And indeed, I had x 20 performance
>>> improvements in some cases.
>>
>> Oops, that sounds like a bit too extreme optimisations. Is the original
>> version varying that much? I didn't observe this.
>>
>> Here is my current version, BTW:
>>
>> long tsc_scale;
>> unsigned int tsc_shift = 31;
>>
>> static inline long long fast_tsc_to_ns(long long ts)
>> {
>>     long long ret;
>>
>>     __asm__ (
>>         /* HI = HIWORD(ts) * tsc_scale */
>>         "mov %%eax,%%ebx\n\t"
>>         "mov %%edx,%%eax\n\t"
>>         "imull %2\n\t"
>>         "mov %%eax,%%esi\n\t"
>>         "mov %%edx,%%edi\n\t"
>>
>>         /* LO = LOWORD(ts) * tsc_scale */
>>         "mov %%ebx,%%eax\n\t"
>>         "mull %2\n\t"
>>
>>         /* ret = (HI << 32) + LO */
>>         "add %%esi,%%edx\n\t"
>>         "adc $0,%%edi\n\t"
>>
>>         /* ret = ret >> tsc_shift */
>>         "shrd %%cl,%%edx,%%eax\n\t"
>>         "shrd %%cl,%%edi,%%edx\n\t"
>>         : "=A"(ret)
>>         : "A" (ts), "m" (tsc_scale), "c" (tsc_shift)
>>         : "ebx", "esi", "edi");
>>
>>     return ret;
>> }
>>
>> void init_tsc(unsigned long cpu_freq)
>> {
>>     unsigned long long scale;
>>
>>     while (1) {
>>         scale = do_div(1000000000LL << tsc_shift, cpu_freq);
>>         if (scale <= 0x7FFFFFFF)
>>             break;
>>         tsc_shift--;
>>     }
>>     tsc_scale = scale;
>> }
>>
>> This version will use 31 (GHz cpu_freq) to 26 (~32 MHz) shifts, i.e. a
>> bit more than the Linux kernel's 22 bits.
>>
>
> Here is likely why we have different levels of accuracy and performance,
> firstly my version is bluntly based on the khz freq, secondly it
> calculates the other way around, i.e. ns2tsc, so that tsc are kept in
> the inner code, but more efficiently converted from ns counts passed to
> the outer interface:
>
> static unsigned long ns2cyc_scale;
> #define NS2CYC_SCALE_FACTOR 10 /* 2^10, carefully chosen */

Linux only uses 10 bits for scheduling time calculation, which is
tick-based (low-res) anyway. The tsc clock_source uses 22 bits. The
latter overflows after an hour or so, because it drops all bits above
64 after the multiplication, which is only insignificantly faster than
optimised code that keeps them anyway.
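To put rough numbers on those overflow horizons, here is a minimal sketch.
The helper name overflow_horizon_seconds is made up for illustration, and
it assumes the plain scheme ns = (cycles * scale) >> shift with
scale = (10^9 << shift) / cpu_freq and a product truncated to 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of whole seconds a
 * free-running cycle counter can run before cycles * scale no longer
 * fits into 64 bits, for a given shift and counter frequency in Hz.
 */
static uint64_t overflow_horizon_seconds(unsigned int shift,
                                         uint64_t cpu_freq_hz)
{
    uint64_t scale, max_cycles;

    /* (10^9 << 31) still fits into 64 bits, larger shifts may not. */
    assert(cpu_freq_hz != 0 && shift <= 31);

    scale = (UINT64_C(1000000000) << shift) / cpu_freq_hz;
    assert(scale != 0);

    /* Largest cycle count whose product with scale fits into 64 bits. */
    max_cycles = UINT64_MAX / scale;

    return max_cycles / cpu_freq_hz;
}
```

For a 1 GHz counter this gives about 4398 s (roughly 73 minutes) with 22
bits and only about 8 s with 31 bits, so a 31-bit scale is only usable
when the wider intermediate product is kept.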
>
> static inline void set_ns2cyc_scale(unsigned long cpu_khz)
> {
>     ns2cyc_scale = (cpu_khz << NS2CYC_SCALE_FACTOR) / 1000000;
> }
>
> static inline unsigned long long ns_2_cycles(unsigned long long ns)
> {
>     return ns * ns2cyc_scale >> NS2CYC_SCALE_FACTOR;
> }
>
>>>
>>> TSC are not the whole nucleus time base, but only the timer management
>>> one. The motivation to use TSCs in nucleus/timer.c was to pick a unit
>>> which would not require any conversion beyond the initial one in
>>> xntimer_start.
>>
>>
>> That helps strictly periodic application timers, not aperiodic ones like
>> timeouts.
>>
>
> It depends, periodic timers usually exhibit larger delays, so the gain
> is more significant with oneshot timings incurring smaller delays, hence
> a higher number of calculations.
>
>>
>>>> Any pitfalls down the road (except introducing regressions)?
>>>
>>> Well, pitfalls expected from changing the core idea of time of the timer
>>> management code... :o>
>>>
>>
>> You mean turning
>>
>> rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,RTHAL_CPU_FREQ));
>>
>>
>> into
>>
>> rthal_timer_program_shot(rthal_imuldiv(delay,RTHAL_TIMER_FREQ,1000000000));
>>
>>
>
> Not really, it was a general remark about changing a code that might
> have some assumptions on using TSCs. Additionally, only x86 needs to
> rescale TSC values to the timer frequency, other archs use the same unit
> on both sides, and such unit might even have nothing to do with any CPU
> accounting (e.g. blackfin uses a free running timer, ppc uses the
> internal timebase, etc).

Ok, an interesting aspect I already assumed but didn't check in detail
yet. That makes dealing with TSCs interesting again on != x86. In
contrast, on x86, there is the aspect of frequency scaling that Anders
brought up and which would speak in favour of nanos.

>
> This said, it should not have that many assumptions, and in any case,
> they should be confined to nucleus/timers.c.
> I think we should give this
> kind of optimization a try.
>

Yep, it just needs some more brain cycles on how to do this precisely.

Jan
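PS: For comparing results across archs, a portable C sketch of the
multiply-and-shift conversion quoted above might look like the following.
It is only an assumption of how this could be written, not what the
nucleus does: it uses unsigned types instead of the signed ones in the
asm version, plain 64-bit division instead of do_div, and the names
tsc_to_ns and init_tsc_scale are made up here. It has not been
benchmarked against the asm.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t tsc_scale;
static unsigned int tsc_shift = 31;

/*
 * ns = (ts * tsc_scale) >> tsc_shift, computed from two 32x32->64
 * partial products so that the bits above 64 of the intermediate
 * result are not lost. Valid for tsc_shift in the range 0..32.
 */
static inline uint64_t tsc_to_ns(uint64_t ts)
{
    uint64_t lo = (ts & 0xFFFFFFFFu) * (uint64_t)tsc_scale;
    uint64_t hi = (ts >> 32) * (uint64_t)tsc_scale; /* weight 2^32 */

    /* (hi * 2^32 + lo) >> shift == (hi << (32 - shift)) + (lo >> shift) */
    return (hi << (32 - tsc_shift)) + (lo >> tsc_shift);
}

/*
 * Pick the largest shift (starting at 31) for which the scale factor
 * still fits into 31 bits, as in the quoted init_tsc().
 */
static void init_tsc_scale(uint32_t cpu_freq_hz)
{
    uint64_t scale;

    assert(cpu_freq_hz != 0);

    tsc_shift = 31;
    for (;;) {
        scale = (UINT64_C(1000000000) << tsc_shift) / cpu_freq_hz;
        if (scale <= 0x7FFFFFFF)
            break;
        tsc_shift--;
    }
    tsc_scale = (uint32_t)scale;
}
```

With init_tsc_scale(1000000000) this settles on a shift of 30 and maps
cycles to nanoseconds one to one. For frequencies that do not divide
evenly, the truncated scale makes the result err slightly on the low
side.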