From: Gilles Chanteperdrix
To: Jan Kiszka
Subject: Re: [Xenomai-core] ns vs. tsc as internal timer base
Date: Tue, 13 Jun 2006 13:59:06 +0200

Jan Kiszka wrote:
 > Hi,
 >
 > between some football half-times of the last days ;), I played a bit
 > with a hand-optimised xnarch_tsc_to_ns() for x86. Using scaled math, I
 > achieved between 3 (P-I 133 MHz) and 4 times (P-M 1.3 GHz) faster
 > conversions than with the current variant. While this optimisation only
 > saves a few tens of nanoseconds on high-end processors, slow processors
 > can gain several hundred nanoseconds per conversion (my P-133: -600 ns).

Some time ago, I also experimented with avoiding divisions. I came up
with a solution that precomputes the fraction m/d once, using a real
division, so that imuldiv and ullimd then only need additions,
multiplications and shifts. I thought there would be no loss in
accuracy but, well, the last bit is sometimes wrong: frac is rounded
down, so the result can come out one unit too small.

Anyway, here is the code if you want to benchmark it. div96by32 and
u64(to|from)u32 are defined in asm-i386/hal.h or asm-generic/hal.h:

typedef struct {
	unsigned long long frac;	/* Fractional part. */
	unsigned long integ;		/* Integer part. */
} u32frac_t;

/* m/d == integ + frac / 2^64 */
void precalc(u32frac_t *const f, const unsigned long m, const unsigned long d)
{
	/* m / d is already 0 when m < d; testing "m > d" would wrongly
	   yield 0 for m == d, since m % d (hence frac) is 0 then too. */
	f->integ = m / d;
	/* frac = floor((m % d) * 2^64 / d): the only real division. */
	f->frac = div96by32(u64fromu32(m % d, 0), 0, d, NULL);
}

static inline unsigned long nodiv_imuldiv(unsigned long op, u32frac_t f)
{
	/* A 32-bit op only needs the top 32 bits of frac. */
	const unsigned long tmp = ullmul(op, f.frac >> 32) >> 32;

	if (f.integ)
		return tmp + op * f.integ;
	return tmp;
}

/* (h:l) += s */
#define add64and32(h, l, s) do {		\
	__asm__ ("addl %2, %1\n\t"		\
		 "adcl $0, %0"			\
		 : "+r"(h), "+r"(l)		\
		 : "r"(s));			\
} while (0)

/* (l0:l1:l2) += (s0:s1), l0 and s0 being the most significant words. */
#define add96and64(l0, l1, l2, s0, s1) do {	\
	__asm__ ("addl %4, %2\n\t"		\
		 "adcl %3, %1\n\t"		\
		 "adcl $0, %0"			\
		 : "+r"(l0), "+r"(l1), "+r"(l2)	\
		 : "r"(s0), "r"(s1));		\
} while (0)

static inline unsigned long long
mul64by64_high(const unsigned long long op, const unsigned long long m)
{
	/* Compute the high 64 bits of a 64 bits x 64 bits multiplication. */
	unsigned long long t1, t2, t3;
	u_long oph, opl, mh, ml, t0, t1h, t1l, t2h, t2l, t3h, t3l;

	u64tou32(op, oph, opl);
	u64tou32(m, mh, ml);

	/* Low product: only its high word can carry upwards. */
	t0 = ullmul(opl, ml) >> 32;

	/* First cross product, plus that carry (cannot overflow). */
	t1 = ullmul(oph, ml);
	u64tou32(t1, t1h, t1l);
	add64and32(t1h, t1l, t0);

	/* Second cross product. */
	t2 = ullmul(opl, mh);
	u64tou32(t2, t2h, t2l);

	/* High product, plus the high word of the second cross product. */
	t3 = ullmul(oph, mh);
	u64tou32(t3, t3h, t3l);
	add64and32(t3h, t3l, t2h);

	/* Add the first cross product; the carry out of t2l + t1l
	   propagates into t3. */
	add96and64(t3h, t3l, t2l, t1h, t1l);

	return u64fromu32(t3h, t3l);
}

static inline unsigned long long
nodiv_ullimd(const unsigned long long op, const u32frac_t f)
{
	const unsigned long long tmp = mul64by64_high(op, f.frac);

	if (f.integ)
		return tmp + op * f.integ;
	return tmp;
}

--
Gilles Chanteperdrix.
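To see where the occasional wrong last bit comes from, here is a portable sketch of precalc() and nodiv_imuldiv() in plain C99, assuming a compiler that provides <stdint.h>. The names frac32_t, precalc32(), imuldiv_nodiv(), imuldiv_exact() and imuldiv_error() are hypothetical and are not part of the Xenomai HAL. Because frac is rounded down, op * frac / 2^32 can fall short of op * (m % d) / d by less than one unit, so the result is either exact or exactly one too small (as long as the exact result fits in 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable analogue of u32frac_t: m/d == integ + frac / 2^32. */
typedef struct {
	uint32_t frac;	/* fractional part of m/d, scaled by 2^32, rounded down */
	uint32_t integ;	/* integer part of m/d */
} frac32_t;

/* The only real division, done once per (m, d) pair. */
static void precalc32(frac32_t *f, uint32_t m, uint32_t d)
{
	f->integ = m / d;	/* 0 when m < d, 1 when m == d */
	f->frac = (uint32_t)(((uint64_t)(m % d) << 32) / d);
}

/* op * m / d using multiplications, a shift and an addition only. */
static uint32_t imuldiv_nodiv(uint32_t op, frac32_t f)
{
	return (uint32_t)(((uint64_t)op * f.frac) >> 32) + op * f.integ;
}

/* Exact reference, using a real 64-bit division. */
static uint32_t imuldiv_exact(uint32_t op, uint32_t m, uint32_t d)
{
	return (uint32_t)((uint64_t)op * m / d);
}

/* Returns exact - approximate, asserting that it is 0 or 1. */
static uint32_t imuldiv_error(uint32_t op, uint32_t m, uint32_t d)
{
	frac32_t f;
	uint32_t e;

	precalc32(&f, m, d);
	e = imuldiv_exact(op, m, d) - imuldiv_nodiv(op, f);
	assert(e <= 1);
	return e;
}
```

For instance, with m = 1 and d = 3, frac is floor(2^32 / 3), so 3 * frac = 2^32 - 1, whose high word is 0: imuldiv_nodiv() returns 0 where the exact answer is 1.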
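The four partial products in mul64by64_high() can be cross-checked on a development host without inline assembly. The sketch below is a hypothetical plain-C99 rewrite (mul64_high_limbs() is not a Xenomai function) that handles the carries with ordinary 64-bit additions instead of addl/adcl. Its reference mul64_high_ref() assumes a GCC or Clang target providing unsigned __int128, which in practice means 64-bit hosts, so it is meant for testing rather than for i386 kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of op * m, built from four 32x32->64 partial products
   in the same order as mul64by64_high(). */
static uint64_t mul64_high_limbs(uint64_t op, uint64_t m)
{
	uint64_t oph = op >> 32, opl = (uint32_t)op;
	uint64_t mh = m >> 32, ml = (uint32_t)m;

	/* Low product: only its high word can carry upwards. */
	uint64_t t0 = (opl * ml) >> 32;
	/* (2^32-1)^2 + (2^32-1) < 2^64, so neither sum below overflows. */
	uint64_t t1 = oph * ml + t0;
	uint64_t t2 = opl * mh;
	uint64_t t3 = oph * mh + (t2 >> 32);

	/* Middle column (weight 2^32): low words of both cross products. */
	uint64_t mid = (t1 & 0xffffffffu) + (t2 & 0xffffffffu);

	return t3 + (t1 >> 32) + (mid >> 32);
}

/* Reference result, assuming the unsigned __int128 extension. */
static uint64_t mul64_high_ref(uint64_t op, uint64_t m)
{
	return (uint64_t)(((unsigned __int128)op * m) >> 64);
}
```

The middle-column sum is what add96and64() computes with its carry chain; doing it in a 64-bit variable makes the carry explicit as mid >> 32.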
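Applied to the thread's subject, nodiv_ullimd() would convert a TSC value to nanoseconds with m = 10^9 and d = the CPU frequency in Hz. The sketch below is hypothetical (frac64_t, precalc64(), ullimd_nodiv() and ullimd_exact() are illustrative names, and the 1.3 GHz frequency is just Jan's P-M figure used as an example). For brevity it uses the GCC/Clang unsigned __int128 extension both for the one-time division and for the multiply-high that mul64by64_high() performs on i386.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit analogue of u32frac_t: m/d == integ + frac / 2^64. */
typedef struct {
	uint64_t frac;	/* fractional part of m/d, scaled by 2^64, rounded down */
	uint32_t integ;	/* integer part of m/d */
} frac64_t;

/* One real division per ratio, e.g. once at startup for ns per tick. */
static void precalc64(frac64_t *f, uint32_t m, uint32_t d)
{
	f->integ = m / d;
	f->frac = (uint64_t)(((unsigned __int128)(m % d) << 64) / d);
}

/* op * m / d with a multiply-high, a multiply and an add; no division. */
static uint64_t ullimd_nodiv(uint64_t op, frac64_t f)
{
	uint64_t high = (uint64_t)(((unsigned __int128)op * f.frac) >> 64);

	return high + op * f.integ;
}

/* Exact reference, using a real division. */
static uint64_t ullimd_exact(uint64_t op, uint32_t m, uint32_t d)
{
	return (uint64_t)((unsigned __int128)op * m / d);
}
```

With a 1.3 GHz clock, one second's worth of ticks (1300000000) converts to 999999999 ns instead of 1000000000: 10/13 has no exact binary representation, so frac is rounded down and the error shows up in the last unit, as described in the message.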