* [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
@ 2007-06-04 23:25 Jan Kiszka
2007-06-05 8:31 ` Jan Kiszka
0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2007-06-04 23:25 UTC (permalink / raw)
To: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 4737 bytes --]
An updated and extended version of my patch stack for I-pipe and Xenomai
is now available at
http://www.rts.uni-hannover.de/rtaddon/patches
There is once again some new stuff from my would-be-nice-in-2.4 list
contained, namely the timerstat /proc output and a preview on my current
rtdm_timer draft. Here is the overview of the content:
/ipipe-kernel
-------------
ipipe-janitorial.patch
[unchanged]
Removes useless hunks from the I-pipe patch, specifically over i386.
disable-context-check-v3.patch
disable-context-check-v2-i386.patch
[Fixed broken !CONFIG_IPIPE_DEBUG_CONTEXT build]
Infrastructure for temporarily or permanently disabling the context
checker. Applies this on ipipe_trace_panic_freeze() and NMI.
refactor-ipipe_walk_pipeline.patch
[broken-out ipipe_processor_id removal]
Remove cpuid from __ipipe_walk_pipeline parameters, remove fastcall.
cleanup-processor_id-i386.patch
[broken-out ipipe_processor_id removal]
Drop legacy code related to i386 ipipe_processor_id.
instrument-smp_processor_id.patch
instrument-smp_processor_id-i386.patch
[broken-out ipipe_processor_id removal]
Catch smp_processor_id invocations over non-root domains on archs
that retrieve the CPU number from the kernel stack. Archs without
this problem need to define IPIPE_STACK_INVARIANT_CPUID (only
support for i386 so far).
hard-irq-disable-on-suspend-resume.patch
[unchanged]
Old patch of mine to enable software-suspend over I-pipe.
add-ipipe_preempt_disable.patch
[unchanged]
Introduces ipipe_preempt_disable as an I-pipe-safe alternative to
preempt_disable. Required for kernel markers that come with LTTng.
prepare-lttng.patch
ltt-ipipe-v2.patch
[updated to use ipipe_processor_id() where now required]
LTTng preparation and I-pipe adoption patches. See README.lttng for
more details.
/xenomai
--------
refactor-queue-init.patch
Refactor DECLARE_XNQUEUE to DEFINE_XNQUEUE. Break out
XNQUEUE_INITIALIZER.
cleanup-proc-stuff.patch
Remove xnskentry::proc, track proc registration via xnskentry::name.
Clean up redundant typecasts. Remove unneeded code in *_seq_next().
destroy-thread-timers.patch
Unconditionally destroy xnthread timers on thread deletion. Besides
the consistency aspect, timerstats.patch will require clean
destruction.
fast-tsc-to-ns-v2.patch
[Rebased, improved rounding of least significant digit]
Integration of my scaled-math-based xnarch_tsc_to_ns service for
i386 at least.
[RFC] timerstat.patch
Dump currently or previously active timers per timebase under
/proc/xenomai/timerstat. Output looks like this:
# cat /proc/xenomai/timerstat/master
CPU  SCHEDULED  FIRED  TIMEOUT    INTERVAL    HANDLER      NAME
0    5959       5958   1          4000000     NULL         [host tick]
0    25         24     659464312  1000000000  xnpod_watch  [watchdog]
0    368        367    5042333    10000000    xnthread_pe  sampling-831
The idea is to have an overview of timer activity *on the target*,
just like we already have for threads. This can help to quickly
find out
- how many timers there are on a system
- how they are programmed
- how often they are scheduled -- and actually fired
- who may have installed them
[RFC] xntimer-monotonic.patch
Use a new flag, XNTIMER_MONOTONIC, to control if absolute timeouts
shall skip wallclock_offset correction on start, making them
independent of clock adjustment. So far only used by RTDM timers,
the POSIX skin /might/ be able to exploit it as well.
[PREVIEW] rtdm-timers.patch
Add rtdm_timer_* services, turn timerbench into the first user. This
patch also introduces monotonic timers to rtdm_task_* and adds the
new clock service rtdm_clock_read_monotonic. The whole thing about
monotonic clocks for drivers is due to my concerns that once we
start tuning the master timebase according to external sources, we
/might/ be happy to provide non-adjustable clock and timers for
device drivers that need strictly continuous timing.
librtutils.patch
[refreshed]
Contains rt_print services so far. Still open naming question.
[RFC] rtsystrace-v2.patch
[refreshed]
Proposal to add rt_print-based Xenomai syscall tracing. Looking for
a less code-invasive approach.
[PREVIEW] lttng.patch
[rebased]
Very rough patch to make LTTng work with Xenomai.
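A minimal sketch of the idea behind xntimer-monotonic.patch, assuming the wallclock is modelled as "monotonic time plus an offset". The names sketch_timebase, sketch_abs_expiry and SKETCH_TIMER_MONOTONIC are hypothetical and only modelled on XNTIMER_MONOTONIC and wallclock_offset as described above; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag, modelled on the XNTIMER_MONOTONIC idea. */
#define SKETCH_TIMER_MONOTONIC 0x1u

/* Hypothetical timebase: wallclock date = monotonic date + wallclock_offset. */
struct sketch_timebase {
    int64_t wallclock_offset;
};

/*
 * Translate an absolute timeout into the monotonic date used to arm a timer.
 * Wallclock-based timeouts are corrected by the current offset, so they
 * follow clock adjustments. Monotonic timeouts skip the correction.
 */
static int64_t sketch_abs_expiry(const struct sketch_timebase *tb,
                                 int64_t abs_date, unsigned flags)
{
    assert(tb != 0);
    if (flags & SKETCH_TIMER_MONOTONIC)
        return abs_date;                    /* independent of adjustments */
    return abs_date - tb->wallclock_offset; /* moves with the wallclock */
}
```

With this model, changing wallclock_offset between two starts shifts the expiry of a wallclock timeout but leaves a monotonic one untouched, which is the property a driver needing strictly continuous timing would rely on.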
As usual: Testers and reviews are welcome, feedback is appreciated.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-04 23:25 [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers Jan Kiszka
@ 2007-06-05 8:31 ` Jan Kiszka
2007-06-05 22:28 ` Gilles Chanteperdrix
2007-06-06 12:49 ` Jan Kiszka
0 siblings, 2 replies; 20+ messages in thread
From: Jan Kiszka @ 2007-06-05 8:31 UTC (permalink / raw)
To: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 384 bytes --]
Jan Kiszka wrote:
...
> fast-tsc-to-ns-v2.patch
>
> [Rebased, improved rounding of least significant digit]
Rounding in the fast path for the sake of the last digit was silly.
Instead, I'm now addressing the ugly interval printing via
xnarch_precise_tsc_to_ns when converting the timer interval back into
nanos. -v3 incorporating this has just been uploaded.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
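A minimal sketch of why a scaled-math conversion can print an "ugly" interval, assuming a 2.4 GHz TSC and a shift of 31. The sketch_* names and both constants are assumptions made for illustration; they are not the xnarch_* services from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TSC_HZ 2400000000ULL /* assumed TSC frequency: 2.4 GHz */
#define SKETCH_SHIFT  31            /* assumed scaling shift */

/*
 * Fast path: one multiplication and one shift. The multiplier is rounded
 * down, so the result can come out 1 ns below the exact value.
 */
static uint64_t sketch_fast_tsc_to_ns(uint64_t tsc)
{
    const uint64_t mult = (1000000000ULL << SKETCH_SHIFT) / SKETCH_TSC_HZ;

    /* Plain 64-bit product: only valid for the short intervals used here. */
    assert(tsc < (1ULL << 33));
    return (tsc * mult) >> SKETCH_SHIFT;
}

/* Slow but exact path: multiply first, then divide by the frequency. */
static uint64_t sketch_precise_tsc_to_ns(uint64_t tsc)
{
    assert(tsc < (1ULL << 33));
    return tsc * 1000000000ULL / SKETCH_TSC_HZ;
}
```

Under these assumptions a 4 ms interval (9600000 ticks) converts to 3999999 ns on the fast path and to 4000000 ns on the precise path, which is harmless for timing but looks odd in a /proc listing.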
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-05 8:31 ` Jan Kiszka
@ 2007-06-05 22:28 ` Gilles Chanteperdrix
2007-06-06 10:30 ` Jan Kiszka
2007-06-06 12:49 ` Jan Kiszka
1 sibling, 1 reply; 20+ messages in thread
From: Gilles Chanteperdrix @ 2007-06-05 22:28 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> Jan Kiszka wrote:
> ...
> > fast-tsc-to-ns-v2.patch
> >
> > [Rebased, improved rounding of least significant digit]
>
> Rounding in the fast path for the sake of the last digit was silly.
> Instead, I'm now addressing the ugly interval printing via
> xnarch_precise_tsc_to_ns when converting the timer interval back into
> nanos. -v3 incorporating this has just been uploaded.

Hi,

I had a look at the fast-tsc-to-ns implementation, here is how I would
rewrite it:

static inline void xnarch_init_llmulshft(const unsigned m_in,
                                         const unsigned d_in,
                                         unsigned *m_out,
                                         unsigned *s_out)
{
    unsigned long long mult;

    *s_out = 31;
    while (1) {
        mult = ((unsigned long long)m_in) << *s_out;
        do_div(mult, d_in);
        if (mult <= INT_MAX)
            break;
        (*s_out)--;
    }
    *m_out = (unsigned)mult;
}

/* Non x86. */
#define __rthal_u96shift(h, m, l, s) ({ \
    unsigned _l = (l); \
    unsigned _m = (m); \
    unsigned _s = (s); \
    _l >>= _s; \
    _m >>= s; \
    _l |= (_m << (32 - s)); \
    _m |= ((h) << (32 - s)); \
    __rthal_u64fromu32(_m, _l); \
})

/* x86 */
#define __rthal_u96shift(h, m, l, s) ({ \
    unsigned _l = (l); \
    unsigned _m = (m); \
    unsigned _s = (s); \
    asm ("shrdl\t%%cl,%1,%0" \
         : "+r,?m"(_l) \
         : "r,r"(_m), "c,c"(_s)); \
    asm ("shrdl\t%%cl,%1,%0" \
         : "+r,?m"(_m) \
         : "r,r"(h), "c,c"(_s)); \
    __rthal_u64fromu32(_m, _l); \
})

static inline long long rthal_llmi(int i, int j)
{
    /* Signed fast 32x32->64 multiplication */
    return (long long) i * j;
}

static inline long long gilles_llmulshft(const long long op,
                                         const unsigned m,
                                         const unsigned s)
{
    unsigned oph, opl, tlh, tll, thh, thl;
    unsigned long long th, tl;

    __rthal_u64tou32(op, oph, opl);
    tl = rthal_ullmul(opl, m);
    __rthal_u64tou32(tl, tlh, tll);
    th = rthal_llmi(oph, m);
    th += tlh;
    __rthal_u64tou32(th, thh, thl);

    return __rthal_u96shift(thh, thl, tll, s);
}

--
Gilles Chanteperdrix.
^ permalink raw reply [flat|nested] 20+ messages in thread
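The multiply-and-shift scheme in the message above can be sketched in portable C without the __rthal_* helpers. This is an illustration under stated assumptions, not the posted code: the sketch_* names are hypothetical, plain 64-bit division replaces do_div(), and right-shifting a negative value is assumed to be an arithmetic shift (as with gcc). The 96-bit shift merges each word with its upper neighbour before that neighbour is shifted, which is the order the shrdl-based x86 macro uses.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Find mult and shift so that x * m_in / d_in ~= (x * mult) >> shift. */
static void sketch_init_mulshft(uint32_t m_in, uint32_t d_in,
                                uint32_t *m_out, uint32_t *s_out)
{
    uint32_t shift = 31;
    uint64_t mult;

    assert(d_in != 0);
    /* Lower the shift until the multiplier fits into 31 bits. */
    while ((mult = ((uint64_t)m_in << shift) / d_in) > INT_MAX) {
        assert(shift > 0);
        shift--;
    }
    *m_out = (uint32_t)mult;
    *s_out = shift;
}

/* (op * m) >> s through a 96-bit intermediate; needs m <= INT_MAX, s < 32. */
static int64_t sketch_llmulshft(int64_t op, uint32_t m, uint32_t s)
{
    uint32_t opl = (uint32_t)op;        /* low word, unsigned */
    int32_t oph = (int32_t)(op >> 32);  /* high word, carries the sign */
    uint64_t tl = (uint64_t)opl * m;    /* low partial product */
    int64_t th = (int64_t)oph * m + (int64_t)(tl >> 32); /* upper 64 bits */
    uint32_t tll = (uint32_t)tl;        /* lowest 32 of the 96 bits */

    assert(m <= INT_MAX && s < 32);
    /* Merge the bits crossing the word boundary, then drop the low s bits. */
    return (int64_t)(((uint64_t)th << (32 - s)) | (tll >> s));
}
```

For the factor 2/3 the initializer settles on mult = 1431655765 and shift = 31. Scaling 3000 then gives 1999 rather than 2000, and scaling -1 gives -1, because the final shift rounds toward minus infinity.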
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-05 22:28 ` Gilles Chanteperdrix
@ 2007-06-06 10:30 ` Jan Kiszka
2007-06-06 12:47 ` Gilles Chanteperdrix
0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2007-06-06 10:30 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 4001 bytes --]
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
> > Jan Kiszka wrote:
> > ...
> > > fast-tsc-to-ns-v2.patch
> > >
> > > [Rebased, improved rounding of least significant digit]
> >
> > Rounding in the fast path for the sake of the last digit was silly.
> > Instead, I'm now addressing the ugly interval printing via
> > xnarch_precise_tsc_to_ns when converting the timer interval back into
> > nanos. -v3 incorporating this has just been uploaded.
>
> Hi,
>
> I had a look at the fast-tsc-to-ns implementation, here is how I would
> rewrite it:
>
> static inline void xnarch_init_llmulshft(const unsigned m_in,
>                                          const unsigned d_in,
>                                          unsigned *m_out,
>                                          unsigned *s_out)
> {
>     unsigned long long mult;
>
>     *s_out = 31;
>     while (1) {
>         mult = ((unsigned long long)m_in) << *s_out;
>         do_div(mult, d_in);
>         if (mult <= INT_MAX)
>             break;
>         (*s_out)--;
>     }
>     *m_out = (unsigned)mult;
> }
>
> /* Non x86. */
> #define __rthal_u96shift(h, m, l, s) ({ \
>     unsigned _l = (l); \
>     unsigned _m = (m); \
>     unsigned _s = (s); \
>     _l >>= _s; \
>     _m >>= s; \
>     _l |= (_m << (32 - s)); \
>     _m |= ((h) << (32 - s)); \
>     __rthal_u64fromu32(_m, _l); \
> })
>
> /* x86 */
> #define __rthal_u96shift(h, m, l, s) ({ \
>     unsigned _l = (l); \
>     unsigned _m = (m); \
>     unsigned _s = (s); \
>     asm ("shrdl\t%%cl,%1,%0" \
>          : "+r,?m"(_l) \
>          : "r,r"(_m), "c,c"(_s)); \
>     asm ("shrdl\t%%cl,%1,%0" \
>          : "+r,?m"(_m) \
>          : "r,r"(h), "c,c"(_s)); \
>     __rthal_u64fromu32(_m, _l); \
> })
>
> static inline long long rthal_llmi(int i, int j)
> {
>     /* Signed fast 32x32->64 multiplication */
>     return (long long) i * j;
> }
>
> static inline long long gilles_llmulshft(const long long op,
>                                          const unsigned m,
>                                          const unsigned s)
> {
>     unsigned oph, opl, tlh, tll, thh, thl;
>     unsigned long long th, tl;
>
>     __rthal_u64tou32(op, oph, opl);
>     tl = rthal_ullmul(opl, m);
>     __rthal_u64tou32(tl, tlh, tll);
>     th = rthal_llmi(oph, m);
>     th += tlh;
>     __rthal_u64tou32(th, thh, thl);
>
>     return __rthal_u96shift(thh, thl, tll, s);
> }
>
>

Thanks for your suggestion.

While your generic version produces comparable code, the x86 variant is
about twice as large as the full-assembly version. And code size
translates into I-cache occupation, which may have latency costs.

[gcc 4.1, i386]
-O2 -mregparm=3 -fomit-frame-pointer:
    63: 08048490   119 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
    68: 08048510   121 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
    77: 08048450    57 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
    78: 080483c0   135 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft

-Os -mregparm=3 -fomit-frame-pointer:
    63: 0804843b    93 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
    68: 08048498    97 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
    77: 08048410    43 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
    78: 080483b4    92 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft

-O2:
    63: 08048480   120 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
    68: 08048500   105 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
    77: 08048440    60 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
    78: 080483c0   117 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft

-Os:
    63: 08048438   104 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
    68: 080484a0    83 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
    77: 0804840b    45 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
    78: 080483b4    87 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft

I'm not arguing we should turn each and every Xenomai arch code into
pure assembly. But in this case it already happened, it's less scattered
source code-wise, and it is compacter object-wise. So I would prefer to
keep it as is.

Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 10:30 ` Jan Kiszka
@ 2007-06-06 12:47 ` Gilles Chanteperdrix
2007-06-06 12:59 ` Jan Kiszka
0 siblings, 1 reply; 20+ messages in thread
From: Gilles Chanteperdrix @ 2007-06-06 12:47 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>
>>Jan Kiszka wrote:
>> > Jan Kiszka wrote:
>> > ...
>> > > fast-tsc-to-ns-v2.patch
>> > >
>> > > [Rebased, improved rounding of least significant digit]
>> >
>> > Rounding in the fast path for the sake of the last digit was silly.
>> > Instead, I'm now addressing the ugly interval printing via
>> > xnarch_precise_tsc_to_ns when converting the timer interval back into
>> > nanos. -v3 incorporating this has just been uploaded.
>>
>>Hi,
>>
>>I had a look at the fast-tsc-to-ns implementation, here is how I would
>>rewrite it:
>>
>>static inline void xnarch_init_llmulshft(const unsigned m_in,
>>                                         const unsigned d_in,
>>                                         unsigned *m_out,
>>                                         unsigned *s_out)
>>{
>>    unsigned long long mult;
>>
>>    *s_out = 31;
>>    while (1) {
>>        mult = ((unsigned long long)m_in) << *s_out;
>>        do_div(mult, d_in);
>>        if (mult <= INT_MAX)
>>            break;
>>        (*s_out)--;
>>    }
>>    *m_out = (unsigned)mult;
>>}
>>
>>/* Non x86. */
>>#define __rthal_u96shift(h, m, l, s) ({ \
>>    unsigned _l = (l); \
>>    unsigned _m = (m); \
>>    unsigned _s = (s); \
>>    _l >>= _s; \
>>    _m >>= s; \
>>    _l |= (_m << (32 - s)); \
>>    _m |= ((h) << (32 - s)); \
>>    __rthal_u64fromu32(_m, _l); \
>>})
>>
>>/* x86 */
>>#define __rthal_u96shift(h, m, l, s) ({ \
>>    unsigned _l = (l); \
>>    unsigned _m = (m); \
>>    unsigned _s = (s); \
>>    asm ("shrdl\t%%cl,%1,%0" \
>>         : "+r,?m"(_l) \
>>         : "r,r"(_m), "c,c"(_s)); \
>>    asm ("shrdl\t%%cl,%1,%0" \
>>         : "+r,?m"(_m) \
>>         : "r,r"(h), "c,c"(_s)); \
>>    __rthal_u64fromu32(_m, _l); \
>>})
>>
>>static inline long long rthal_llmi(int i, int j)
>>{
>>    /* Signed fast 32x32->64 multiplication */
>>    return (long long) i * j;
>>}
>>
>>static inline long long gilles_llmulshft(const long long op,
>>                                         const unsigned m,
>>                                         const unsigned s)
>>{
>>    unsigned oph, opl, tlh, tll, thh, thl;
>>    unsigned long long th, tl;
>>
>>    __rthal_u64tou32(op, oph, opl);
>>    tl = rthal_ullmul(opl, m);
>>    __rthal_u64tou32(tl, tlh, tll);
>>    th = rthal_llmi(oph, m);
>>    th += tlh;
>>    __rthal_u64tou32(th, thh, thl);
>>
>>    return __rthal_u96shift(thh, thl, tll, s);
>>}
>>
>>
>
>
> Thanks for your suggestion.
>
> While your generic version produces comparable code, the x86 variant is
> about twice as large as the full-assembly version. And code size
> translates into I-cache occupation, which may have latency costs.
>
> [gcc 4.1, i386]
> -O2 -mregparm=3 -fomit-frame-pointer:
>     63: 08048490   119 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>     68: 08048510   121 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>     77: 08048450    57 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>     78: 080483c0   135 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>
> -Os -mregparm=3 -fomit-frame-pointer:
>     63: 0804843b    93 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>     68: 08048498    97 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>     77: 08048410    43 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>     78: 080483b4    92 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>
> -O2:
>     63: 08048480   120 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>     68: 08048500   105 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>     77: 08048440    60 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>     78: 080483c0   117 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>
> -Os:
>     63: 08048438   104 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>     68: 080484a0    83 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>     77: 0804840b    45 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>     78: 080483b4    87 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>
> I'm not arguing we should turn each and every Xenomai arch code into
> pure assembly. But in this case it already happened, it's less scattered
> source code-wise, and it is compacter object-wise. So I would prefer to
> keep it as is.

I would say the advantage of having a C version outperform the
advantages of the full assembly version. C is really easier to
understand and debug.

The differences between the two versions are some register moves, which
cost almost nothing, especially since each operation in the assembly
version depends on the result of the previous operation, which means
lots of pipeline stall, the register moves will just feed the pipeline.
I do not think they really matter. Look at the assembly produced for
gilles_llmulshft on ARM, a low end architecture where each instruction
really costs:
gilles_llmulshft:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        stmfd   sp!, {r4, r5, r6, r7}
        umull   r6, r7, r0, r2
        mov     r4, r7
        mov     r5, #0
        smlal   r4, r5, r2, r1
        rsb     ip, r3, #32
        mov     r2, r4, lsr r3
        orr     r1, r2, r5, asl ip
        mov     r2, r2, asl ip
        orr     r0, r2, r6, lsr r3
        @ lr needed for prologue
        ldmfd   sp!, {r4, r5, r6, r7}
        mov     pc, lr

pretty minimal, no ?

The full assembly version has another big drawback, it is a big block
that the optimizer can not split, whereas in a C version, the optimizer
can decide to interleave the surrounding code. So a C version will
inline better.

There is one thing I do not like with llmulshft (any implementation), it
is the rounding policy towards minus infinity. llmulshft(-1, 2/3)
returns -1 whereas llimd would return 0.

--
Gilles Chanteperdrix
^ permalink raw reply [flat|nested] 20+ messages in thread
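The rounding difference raised at the end of the message above can be reproduced in a few lines. This is a sketch with hypothetical sketch_* helpers and a fixed multiplier/shift pair approximating 2/3; it assumes that right-shifting a negative value is an arithmetic shift (as with gcc). The third helper shows one possible way to get truncation toward zero at the cost of a branch; nobody in the thread proposed it, it is only there to make the trade-off concrete.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scaling pair for the factor 2/3: floor(2 * 2^31 / 3), shift 31. */
#define SKETCH_MULT  1431655765LL
#define SKETCH_SHIFT 31

/* Scaled math: the arithmetic right shift rounds toward minus infinity. */
static int64_t sketch_mulshft_floor(int64_t op)
{
    /* Keep the product inside 64 bits for this simplified sketch. */
    assert(op > -(1LL << 31) && op < (1LL << 31));
    return (op * SKETCH_MULT) >> SKETCH_SHIFT;
}

/* Multiply-and-divide: C integer division truncates toward zero. */
static int64_t sketch_muldiv_trunc(int64_t op)
{
    assert(op > -(1LL << 31) && op < (1LL << 31));
    return op * 2 / 3;
}

/* Hypothetical correction: bias negative products before shifting. */
static int64_t sketch_mulshft_trunc(int64_t op)
{
    int64_t prod;

    assert(op > -(1LL << 31) && op < (1LL << 31));
    prod = op * SKETCH_MULT;
    if (prod < 0)
        prod += (1LL << SKETCH_SHIFT) - 1;
    return prod >> SKETCH_SHIFT;
}
```

For an operand of -1 the shift-based helper returns -1 while the division-based one returns 0, matching the llmulshft versus llimd behaviour described in the message. The biased variant also returns 0, but adds a compare and an addition to the fast path.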
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 12:47 ` Gilles Chanteperdrix
@ 2007-06-06 12:59 ` Jan Kiszka
2007-06-06 13:21 ` Gilles Chanteperdrix
0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2007-06-06 12:59 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 6741 bytes --]
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>
>>> Jan Kiszka wrote:
>>>> Jan Kiszka wrote:
>>>> ...
>>>>> fast-tsc-to-ns-v2.patch
>>>>>
>>>>> [Rebased, improved rounding of least significant digit]
>>>> Rounding in the fast path for the sake of the last digit was silly.
>>>> Instead, I'm now addressing the ugly interval printing via
>>>> xnarch_precise_tsc_to_ns when converting the timer interval back into
>>>> nanos. -v3 incorporating this has just been uploaded.
>>> Hi,
>>>
>>> I had a look at the fast-tsc-to-ns implementation, here is how I would
>>> rewrite it:
>>>
>>> static inline void xnarch_init_llmulshft(const unsigned m_in,
>>>                                          const unsigned d_in,
>>>                                          unsigned *m_out,
>>>                                          unsigned *s_out)
>>> {
>>>     unsigned long long mult;
>>>
>>>     *s_out = 31;
>>>     while (1) {
>>>         mult = ((unsigned long long)m_in) << *s_out;
>>>         do_div(mult, d_in);
>>>         if (mult <= INT_MAX)
>>>             break;
>>>         (*s_out)--;
>>>     }
>>>     *m_out = (unsigned)mult;
>>> }
>>>
>>> /* Non x86. */
>>> #define __rthal_u96shift(h, m, l, s) ({ \
>>>     unsigned _l = (l); \
>>>     unsigned _m = (m); \
>>>     unsigned _s = (s); \
>>>     _l >>= _s; \
>>>     _m >>= s; \
>>>     _l |= (_m << (32 - s)); \
>>>     _m |= ((h) << (32 - s)); \
>>>     __rthal_u64fromu32(_m, _l); \
>>> })
>>>
>>> /* x86 */
>>> #define __rthal_u96shift(h, m, l, s) ({ \
>>>     unsigned _l = (l); \
>>>     unsigned _m = (m); \
>>>     unsigned _s = (s); \
>>>     asm ("shrdl\t%%cl,%1,%0" \
>>>          : "+r,?m"(_l) \
>>>          : "r,r"(_m), "c,c"(_s)); \
>>>     asm ("shrdl\t%%cl,%1,%0" \
>>>          : "+r,?m"(_m) \
>>>          : "r,r"(h), "c,c"(_s)); \
>>>     __rthal_u64fromu32(_m, _l); \
>>> })
>>>
>>> static inline long long rthal_llmi(int i, int j)
>>> {
>>>     /* Signed fast 32x32->64 multiplication */
>>>     return (long long) i * j;
>>> }
>>>
>>> static inline long long gilles_llmulshft(const long long op,
>>>                                          const unsigned m,
>>>                                          const unsigned s)
>>> {
>>>     unsigned oph, opl, tlh, tll, thh, thl;
>>>     unsigned long long th, tl;
>>>
>>>     __rthal_u64tou32(op, oph, opl);
>>>     tl = rthal_ullmul(opl, m);
>>>     __rthal_u64tou32(tl, tlh, tll);
>>>     th = rthal_llmi(oph, m);
>>>     th += tlh;
>>>     __rthal_u64tou32(th, thh, thl);
>>>
>>>     return __rthal_u96shift(thh, thl, tll, s);
>>> }
>>>
>>>
>>
>> Thanks for your suggestion.
>>
>> While your generic version produces comparable code, the x86 variant is
>> about twice as large as the full-assembly version. And code size
>> translates into I-cache occupation, which may have latency costs.
>>
>> [gcc 4.1, i386]
>> -O2 -mregparm=3 -fomit-frame-pointer:
>>     63: 08048490   119 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>     68: 08048510   121 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>     77: 08048450    57 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>     78: 080483c0   135 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>
>> -Os -mregparm=3 -fomit-frame-pointer:
>>     63: 0804843b    93 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>     68: 08048498    97 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>     77: 08048410    43 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>     78: 080483b4    92 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>
>> -O2:
>>     63: 08048480   120 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>     68: 08048500   105 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>     77: 08048440    60 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>     78: 080483c0   117 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>
>> -Os:
>>     63: 08048438   104 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>     68: 080484a0    83 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>     77: 0804840b    45 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>     78: 080483b4    87 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>
>> I'm not arguing we should turn each and every Xenomai arch code into
>> pure assembly. But in this case it already happened, it's less scattered
>> source code-wise, and it is compacter object-wise. So I would prefer to
>> keep it as is.
>
> I would say the advantage of having a C version outperform the
> advantages of the full assembly version. C is really easier to
> understand and debug.

Personally, I prefer the clear (and commented) assembly over the nested
macros and inlines.

>
> The differences between the two versions are some register moves, which
> cost almost nothing, especially since each operation in the assembly

Cycle-wise, you are right. But what bites us more in the worst case are
memory accesses, specifically when they are not cached. Code size
matters more according to my experience.

> version depends on the result of the previous operation, which means
> lots of pipeline stall, the register moves will just feed the pipeline.
> I do not think they really matter. Look at the assembly produced for
> gilles_llmulshft on ARM, a low end architecture where each instruction
> really costs:
> gilles_llmulshft:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         stmfd   sp!, {r4, r5, r6, r7}
>         umull   r6, r7, r0, r2
>         mov     r4, r7
>         mov     r5, #0
>         smlal   r4, r5, r2, r1
>         rsb     ip, r3, #32
>         mov     r2, r4, lsr r3
>         orr     r1, r2, r5, asl ip
>         mov     r2, r2, asl ip
>         orr     r0, r2, r6, lsr r3
>         @ lr needed for prologue
>         ldmfd   sp!, {r4, r5, r6, r7}
>         mov     pc, lr
>
> pretty minimal, no ?

OK, your version can perfectly go into the ARM arch. But i386 is
different: less registers, thus easily a lot of variable shuffling...

>
> The full assembly version has another big drawback, it is a big block
> that the optimizer can not split, whereas in a C version, the optimizer
> can decide to interleave the surrounding code. So a C version will
> inline better.

We are not inlining that service anymore, at least not for its primary
usage tsc-to-ns. Inlining costs object size, thus increases the latency
(although it saves us a few cycles).

>
> There is one thing I do not like with llmulshft (any implementation), it
> is the rounding policy towards minus infinity. llmulshft(-1, 2/3)
> returns -1 whereas llimd would return 0.

See other postings: rounding of the last digit doesn't matter with
scaled math, it's already inaccurate by nature. That's also why we have
it only one-way.

Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 12:59 ` Jan Kiszka
@ 2007-06-06 13:21 ` Gilles Chanteperdrix
2007-06-06 13:31 ` Jan Kiszka
0 siblings, 1 reply; 20+ messages in thread
From: Gilles Chanteperdrix @ 2007-06-06 13:21 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>
>>Jan Kiszka wrote:
>>
>>>Gilles Chanteperdrix wrote:
>>>
>>>
>>>>Jan Kiszka wrote:
>>>>
>>>>>Jan Kiszka wrote:
>>>>>...
>>>>>
>>>>>>fast-tsc-to-ns-v2.patch
>>>>>>
>>>>>> [Rebased, improved rounding of least significant digit]
>>>>>
>>>>>Rounding in the fast path for the sake of the last digit was silly.
>>>>>Instead, I'm now addressing the ugly interval printing via
>>>>>xnarch_precise_tsc_to_ns when converting the timer interval back into
>>>>>nanos. -v3 incorporating this has just been uploaded.
>>>>
>>>>Hi,
>>>>
>>>>I had a look at the fast-tsc-to-ns implementation, here is how I would
>>>>rewrite it:
>>>>
>>>>static inline void xnarch_init_llmulshft(const unsigned m_in,
>>>>                                         const unsigned d_in,
>>>>                                         unsigned *m_out,
>>>>                                         unsigned *s_out)
>>>>{
>>>>    unsigned long long mult;
>>>>
>>>>    *s_out = 31;
>>>>    while (1) {
>>>>        mult = ((unsigned long long)m_in) << *s_out;
>>>>        do_div(mult, d_in);
>>>>        if (mult <= INT_MAX)
>>>>            break;
>>>>        (*s_out)--;
>>>>    }
>>>>    *m_out = (unsigned)mult;
>>>>}
>>>>
>>>>/* Non x86. */
>>>>#define __rthal_u96shift(h, m, l, s) ({ \
>>>>    unsigned _l = (l); \
>>>>    unsigned _m = (m); \
>>>>    unsigned _s = (s); \
>>>>    _l >>= _s; \
>>>>    _m >>= s; \
>>>>    _l |= (_m << (32 - s)); \
>>>>    _m |= ((h) << (32 - s)); \
>>>>    __rthal_u64fromu32(_m, _l); \
>>>>})
>>>>
>>>>/* x86 */
>>>>#define __rthal_u96shift(h, m, l, s) ({ \
>>>>    unsigned _l = (l); \
>>>>    unsigned _m = (m); \
>>>>    unsigned _s = (s); \
>>>>    asm ("shrdl\t%%cl,%1,%0" \
>>>>         : "+r,?m"(_l) \
>>>>         : "r,r"(_m), "c,c"(_s)); \
>>>>    asm ("shrdl\t%%cl,%1,%0" \
>>>>         : "+r,?m"(_m) \
>>>>         : "r,r"(h), "c,c"(_s)); \
>>>>    __rthal_u64fromu32(_m, _l); \
>>>>})
>>>>
>>>>static inline long long rthal_llmi(int i, int j)
>>>>{
>>>>    /* Signed fast 32x32->64 multiplication */
>>>>    return (long long) i * j;
>>>>}
>>>>
>>>>static inline long long gilles_llmulshft(const long long op,
>>>>                                         const unsigned m,
>>>>                                         const unsigned s)
>>>>{
>>>>    unsigned oph, opl, tlh, tll, thh, thl;
>>>>    unsigned long long th, tl;
>>>>
>>>>    __rthal_u64tou32(op, oph, opl);
>>>>    tl = rthal_ullmul(opl, m);
>>>>    __rthal_u64tou32(tl, tlh, tll);
>>>>    th = rthal_llmi(oph, m);
>>>>    th += tlh;
>>>>    __rthal_u64tou32(th, thh, thl);
>>>>
>>>>    return __rthal_u96shift(thh, thl, tll, s);
>>>>}
>>>>
>>>>
>>>
>>>Thanks for your suggestion.
>>>
>>>While your generic version produces comparable code, the x86 variant is
>>>about twice as large as the full-assembly version. And code size
>>>translates into I-cache occupation, which may have latency costs.
>>>
>>>[gcc 4.1, i386]
>>>-O2 -mregparm=3 -fomit-frame-pointer:
>>>    63: 08048490   119 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>    68: 08048510   121 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>    77: 08048450    57 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>    78: 080483c0   135 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>
>>>-Os -mregparm=3 -fomit-frame-pointer:
>>>    63: 0804843b    93 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>    68: 08048498    97 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>    77: 08048410    43 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>    78: 080483b4    92 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>
>>>-O2:
>>>    63: 08048480   120 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>    68: 08048500   105 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>    77: 08048440    60 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>    78: 080483c0   117 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>
>>>-Os:
>>>    63: 08048438   104 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>    68: 080484a0    83 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>    77: 0804840b    45 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>    78: 080483b4    87 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>
>>>I'm not arguing we should turn each and every Xenomai arch code into
>>>pure assembly. But in this case it already happened, it's less scattered
>>>source code-wise, and it is compacter object-wise. So I would prefer to
>>>keep it as is.
>>
>>I would say the advantage of having a C version outperform the
>>advantages of the full assembly version. C is really easier to
>>understand and debug.
>
>
> Personally, I prefer the clear (and commented) assembly over the nested
> macros and inlines.

Not when the macro and inline bear names that are easy to understand. If
you do not find the names easy to understand, then change them (I do not
like rthal_llmul either, but I could not find a name). To make the
assembly fully understandable, you would need to comment every
statement. And now, run the assembly code in gdb, and try and print the
value of a 64 bits intermediate result: you can't.

>
>
>>The differences between the two versions are some register moves, which
>>cost almost nothing, especially since each operation in the assembly
>
>
> Cycle-wise, you are right. But what bites us more in the worst case are
> memory accesses, specifically when they are not cached. Code size
> matters more according to my experience.
>
>
>>version depends on the result of the previous operation, which means
>>lots of pipeline stall, the register moves will just feed the pipeline.
>>I do not think they really matter. Look at the assembly produced for
>>gilles_llmulshft on ARM, a low end architecture where each instruction
>>really costs:
>>gilles_llmulshft:
>>        @ args = 0, pretend = 0, frame = 0
>>        @ frame_needed = 0, uses_anonymous_args = 0
>>        @ link register save eliminated.
>>        stmfd   sp!, {r4, r5, r6, r7}
>>        umull   r6, r7, r0, r2
>>        mov     r4, r7
>>        mov     r5, #0
>>        smlal   r4, r5, r2, r1
>>        rsb     ip, r3, #32
>>        mov     r2, r4, lsr r3
>>        orr     r1, r2, r5, asl ip
>>        mov     r2, r2, asl ip
>>        orr     r0, r2, r6, lsr r3
>>        @ lr needed for prologue
>>        ldmfd   sp!, {r4, r5, r6, r7}
>>        mov     pc, lr
>>
>>pretty minimal, no ?
>
>
> OK, your version can perfectly go into the ARM arch. But i386 is
> different: less registers, thus easily a lot of variable shuffling...

variable shuffling which does not really matter, that is my point,
otherwise the x86 family would not be as fast as it is.

>
>
>>The full assembly version has another big drawback, it is a big block
>>that the optimizer can not split, whereas in a C version, the optimizer
>>can decide to interleave the surrounding code. So a C version will
>>inline better.
>
>
> We are not inlining that service anymore, at least not for its primary
> usage tsc-to-ns. Inlining costs object size, thus increases the latency
> (although it saves us a few cycles).

it *is* inlined, in tsc_to/from_ns. Another question that I forgot in my
previous mails: why not using llmulshft for the two services ?

>
>
>>There is one thing I do not like with llmulshft (any implementation), it
>>is the rounding policy towards minus infinity. llmulshft(-1, 2/3)
>>returns -1 whereas llimd would return 0.
>
>
> See other postings: rounding of the last digit doesn't matter with
> scaled math, it's already inaccurate by nature. That's also why we have
> it only one-way.

When returning -1 instead of 0, it is not the last digit that is wrong,
but the first (and only) one.

--
Gilles Chanteperdrix
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 13:21 ` Gilles Chanteperdrix
@ 2007-06-06 13:31 ` Jan Kiszka
2007-06-06 18:23 ` Gilles Chanteperdrix
0 siblings, 1 reply; 20+ messages in thread
From: Jan Kiszka @ 2007-06-06 13:31 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai-core
[-- Attachment #1: Type: text/plain, Size: 8471 bytes --]
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>
>>> Jan Kiszka wrote:
>>>
>>>> Gilles Chanteperdrix wrote:
>>>>
>>>>
>>>>> Jan Kiszka wrote:
>>>>>
>>>>>> Jan Kiszka wrote:
>>>>>> ...
>>>>>>
>>>>>>> fast-tsc-to-ns-v2.patch
>>>>>>>
>>>>>>> [Rebased, improved rounding of least significant digit]
>>>>>> Rounding in the fast path for the sake of the last digit was silly.
>>>>>> Instead, I'm now addressing the ugly interval printing via
>>>>>> xnarch_precise_tsc_to_ns when converting the timer interval back into
>>>>>> nanos. -v3 incorporating this has just been uploaded.
>>>>> Hi,
>>>>>
>>>>> I had a look at the fast-tsc-to-ns implementation, here is how I would
>>>>> rewrite it:
>>>>>
>>>>> static inline void xnarch_init_llmulshft(const unsigned m_in,
>>>>>                                          const unsigned d_in,
>>>>>                                          unsigned *m_out,
>>>>>                                          unsigned *s_out)
>>>>> {
>>>>>     unsigned long long mult;
>>>>>
>>>>>     *s_out = 31;
>>>>>     while (1) {
>>>>>         mult = ((unsigned long long)m_in) << *s_out;
>>>>>         do_div(mult, d_in);
>>>>>         if (mult <= INT_MAX)
>>>>>             break;
>>>>>         (*s_out)--;
>>>>>     }
>>>>>     *m_out = (unsigned)mult;
>>>>> }
>>>>>
>>>>> /* Non x86. */
>>>>> #define __rthal_u96shift(h, m, l, s) ({ \
>>>>>     unsigned _l = (l); \
>>>>>     unsigned _m = (m); \
>>>>>     unsigned _s = (s); \
>>>>>     _l >>= _s; \
>>>>>     _m >>= s; \
>>>>>     _l |= (_m << (32 - s)); \
>>>>>     _m |= ((h) << (32 - s)); \
>>>>>     __rthal_u64fromu32(_m, _l); \
>>>>> })
>>>>>
>>>>> /* x86 */
>>>>> #define __rthal_u96shift(h, m, l, s) ({ \
>>>>>     unsigned _l = (l); \
>>>>>     unsigned _m = (m); \
>>>>>     unsigned _s = (s); \
>>>>>     asm ("shrdl\t%%cl,%1,%0" \
>>>>>          : "+r,?m"(_l) \
>>>>>          : "r,r"(_m), "c,c"(_s)); \
>>>>>     asm ("shrdl\t%%cl,%1,%0" \
>>>>>          : "+r,?m"(_m) \
>>>>>          : "r,r"(h), "c,c"(_s)); \
>>>>>     __rthal_u64fromu32(_m, _l); \
>>>>> })
>>>>>
>>>>> static inline long long rthal_llmi(int i, int j)
>>>>> {
>>>>>     /* Signed fast 32x32->64 multiplication */
>>>>>     return (long long) i * j;
>>>>> }
>>>>>
>>>>> static inline long long gilles_llmulshft(const long long op,
>>>>>                                          const unsigned m,
>>>>>                                          const unsigned s)
>>>>> {
>>>>>     unsigned oph, opl, tlh, tll, thh, thl;
>>>>>     unsigned long long th, tl;
>>>>>
>>>>>     __rthal_u64tou32(op, oph, opl);
>>>>>     tl = rthal_ullmul(opl, m);
>>>>>     __rthal_u64tou32(tl, tlh, tll);
>>>>>     th = rthal_llmi(oph, m);
>>>>>     th += tlh;
>>>>>     __rthal_u64tou32(th, thh, thl);
>>>>>
>>>>>     return __rthal_u96shift(thh, thl, tll, s);
>>>>> }
>>>>>
>>>>>
>>>> Thanks for your suggestion.
>>>>
>>>> While your generic version produces comparable code, the x86 variant is
>>>> about twice as large as the full-assembly version. And code size
>>>> translates into I-cache occupation, which may have latency costs.
>>>>
>>>> [gcc 4.1, i386]
>>>> -O2 -mregparm=3 -fomit-frame-pointer:
>>>>     63: 08048490   119 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>>     68: 08048510   121 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>>     77: 08048450    57 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>>     78: 080483c0   135 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>>
>>>> -Os -mregparm=3 -fomit-frame-pointer:
>>>>     63: 0804843b    93 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>>     68: 08048498    97 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>>     77: 08048410    43 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>>     78: 080483b4    92 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>>
>>>> -O2:
>>>>     63: 08048480   120 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>>     68: 08048500   105 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>>     77: 08048440    60 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>>     78: 080483c0   117 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>>
>>>> -Os:
>>>>     63: 08048438   104 FUNC GLOBAL DEFAULT 13 gilles_llmulshft
>>>>     68: 080484a0    83 FUNC GLOBAL DEFAULT 13 gilles_llmulshft_x86
>>>>     77: 0804840b    45 FUNC GLOBAL DEFAULT 13 rthal_llmulshft
>>>>     78: 080483b4    87 FUNC GLOBAL DEFAULT 13 __rthal_generic_llmulshft
>>>>
>>>> I'm not arguing we should turn each and every Xenomai arch code into
>>>> pure assembly. But in this case it already happened, it's less scattered
>>>> source code-wise, and it is compacter object-wise. So I would prefer to
>>>> keep it as is.
>>> I would say the advantage of having a C version outperform the
>>> advantages of the full assembly version. C is really easier to
>>> understand and debug.
>>
>> Personally, I prefer the clear (and commented) assembly over the nested
>> macros and inlines.
>
> Not when the macro and inline bear names that are easy to understand. If
> you do not find the names easy to understand, then change them (I do not
> like rthal_llmul either, but I could not find a name). To make the
> assembly fully understandable, you would need to comment every
> statement.
And now, run the assembly code in gdb, and try and print the > value of a 64 bits intermediate result: you can't. No question, this is a matter of taste. > >> >>> The differences between the two versions are some register moves, which >>> cost almost nothing, especially since each operation in the assembly >> >> Cycle-wise, you are right. But what bites us more in the worst case are >> memory accesses, specifically when they are not cached. Code size >> matters more according to my experience. >> >> >>> version depends on the result of the previous operation, which means >>> lots of pipeline stall, the register moves will just feed the pipeline. >>> I do not think they really matter. Look at the assembly produced for >>> gilles_llmulshft on ARM, a low end architecture where each instruction >>> really costs: >>> gilles_llmulshft: >>> @ args = 0, pretend = 0, frame = 0 >>> @ frame_needed = 0, uses_anonymous_args = 0 >>> @ link register save eliminated. >>> stmfd sp!, {r4, r5, r6, r7} >>> umull r6, r7, r0, r2 >>> mov r4, r7 >>> mov r5, #0 >>> smlal r4, r5, r2, r1 >>> rsb ip, r3, #32 >>> mov r2, r4, lsr r3 >>> orr r1, r2, r5, asl ip >>> mov r2, r2, asl ip >>> orr r0, r2, r6, lsr r3 >>> @ lr needed for prologue >>> ldmfd sp!, {r4, r5, r6, r7} >>> mov pc, lr >>> >>> pretty minimal, no ? >> >> OK, your version can perfectly go into the ARM arch. But i386 is >> different: less registers, thus easily a lot of variable shuffling... > > variable shuffling which does not really matter, that is my point, > otherwise the x86 family would not be as fast as it is. Think of the *code size*... > >> >>> The full assembly version has another big drawback, it is a big block >>> that the optimizer can not split, whereas in a C version, the optimizer >>> can decide to interleave the surrounding code. So a C version will >>> inline better. >> >> We are not inlining that service anymore, at least not for its primary >> usage tsc-to-ns. 
Inlining costs object size, thus increases the latency >> (although it saves us a few cycles). > > it *is* inlined, in tsc_to/from_ns. Another question that I forgot in my xnarch_tsc_to_ns uninlines this service, and I don't see other, larger users so far. > previous mails: why not using llmulshft for the two services ? See below, see my original post on all the conversion approaches: scaled math is inaccurate, doing it both ways may cause noticeable errors when dealing with calculated vs. measured time stamps over, granted, fairly long periods. > >> >>> There is one thing I do not like with llmulshft (any implementation), it >>> is the rounding policy towards minus infinity. llmulshft(-1, 2/3) >>> returns -1 whereas llimd would return 0. >> >> See other postings: rounding of the last digit doesn't matter with >> scaled math, it's already inaccurate by nature. That's also why we have >> it only one-way. > > When returning -1 instead of 0, it is not the last digit that is wrong, > but the first (and only) one. So this is about -1 nanoseconds vs. 0 nanoseconds. Well, does this error matter in real life? :-> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
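The rounding difference debated in this message can be reproduced in a few lines of C. This is only a minimal sketch, not the Xenomai code: scaled_mul_floor and muldiv_trunc are hypothetical names, the ratio 2/3 is approximated as 1431655765 / 2^31, and an arithmetic right shift of negative values is assumed (GCC and Clang behave this way; the C standard leaves it implementation-defined).

```c
#include <stdint.h>

/* Scaled math: (op * m) >> s.
 * With an arithmetic right shift, negative results are rounded towards
 * minus infinity. Assumes op * m fits into 64 bits (small operands only). */
static inline int64_t scaled_mul_floor(int64_t op, uint32_t m, unsigned s)
{
    return (op * (int64_t)m) >> s;
}

/* Multiply-then-divide: C integer division truncates towards zero.
 * Assumes d != 0 and that op * m fits into 64 bits. */
static inline int64_t muldiv_trunc(int64_t op, int64_t m, int64_t d)
{
    return (op * m) / d;
}
```

Under these assumptions, scaled_mul_floor(-1, 1431655765, 31) gives -1, while muldiv_trunc(-1, 2, 3) gives 0, which is the -1 vs. 0 case discussed above.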
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 13:31 ` Jan Kiszka
@ 2007-06-06 18:23 ` Gilles Chanteperdrix
2007-06-06 18:46 ` Jan Kiszka
0 siblings, 1 reply; 20+ messages in thread
From: Gilles Chanteperdrix @ 2007-06-06 18:23 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
> > Not when the macro and inline bear names that are easy to understand. If
> > you do not find the names easy to understand, then change them (I do not
> > like rthal_llmul either, but I could not find a name). To make the
> > assembly fully understandable, you would need to comment every
> > statement. And now, run the assembly code in gdb, and try and print the
> > value of a 64 bits intermediate result: you can't.
>
> No question, this is a matter of taste.

No, really, being able to debug the code inside gdb appears to me as
something more than a "matter of taste", I thought that as the person
who made Xenomai run with kgdb you would have agreed with me.

Now, about the way I wrote arithmetic code, there are reasons behind my
choices. There are some repetitive patterns in this arithmetic code and
I wanted to factor them out. The first pattern is the conversion between
32 bits and 64 bits, we have to do this in a way that is understood by
the compiler on a particular platform, hence the definition of
rthal_u64from/tou32 which is different for each platform. x86
understands shifts and mask (or cast), but gcc for power pc or arm
prefers the union trick.

There is also only one way to cause gcc to use the 32x32->64 fast
multiplication it is exactly what does rthal_ullmul. If you want to do
the same thing, but write it differently, you invariably cause gcc to
use a full 64 bits multiplication.

So, when in rthal_generic_llmulshft, I read:

	long long hi = (ll >> BITS_PER_LONG) * m;
	unsigned long long lo = ((long)ll) * m;

I think this is all wrong:
- on a 64 bits machine, BITS_PER_LONG is 64 and ll is 64 bits, so ll >>
  BITS_PER_LONG is 0

- for the first multiplication, the compiler will not detect the
  "fastmult" condition, and will use a full 64 bits multiplication. In
  order to get it to generate the minimal multiplication, you should
  have used:

	long long hi = (long long)(int)(ll >> 32) * (int) m

  I find:

	static inline long long rthal_llmi(const int i, const int j)
	{
		/* Fast 32x32->64 multiplication */
		return (long long) i * j;
	}

	/* (...) */
	__rthal_u64tou32(op, oph, opl);
	hi = rthal_llmi(oph, m);

  easier to write and maintain, understand once you know what
  rthal_llmi does, and generates better code with regard to the
  32bits/64bits conversion.

- for the second multiplication, since the two arguments are 32 bits, the
  compiler will use a 32 bits multiplication, and since you (wrongly)
  cast the first argument to long, it will use a signed multiplication,
  whereas we would want it to use an unsigned multiplication, as the
  assembly routine correctly does.

  Here again, using:

	lo = rthal_ullmul(opl, m);

  would have been less error-prone.

So, Ok, I will try to do something for x86 (either reduce the numbers of
registers used by the C code, or reduce the assembly to the bare
minimum). But, please, pick my generic implementation of llmulshft, it
was carefully written.

--
Gilles Chanteperdrix.

^ permalink raw reply [flat|nested] 20+ messages in thread
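The split computation argued for in this message (a signed 32x32->64 multiply for the high half, an unsigned one for the low half, then a shift of the 96-bit result) can be sketched in portable C as follows. This is an illustration under stated assumptions, not the code from the patch: llmulshft_split is a hypothetical name, m is assumed not to exceed INT_MAX (as the setup routine quoted earlier in the thread ensures), s is assumed to lie in 0..31, and right shifts of negative values are assumed to be arithmetic.

```c
#include <stdint.h>

/* (op * m) >> s for a signed 64-bit op, built from two 32x32->64 products.
 * Assumptions: m <= INT32_MAX, 0 <= s <= 31, arithmetic right shift. */
static inline int64_t llmulshft_split(int64_t op, uint32_t m, unsigned s)
{
    uint32_t op_lo = (uint32_t)op;          /* low half, treated as unsigned */
    int32_t  op_hi = (int32_t)(op >> 32);   /* high half, carries the sign */

    uint64_t lo = (uint64_t)op_lo * m;      /* unsigned 32x32->64 product */
    int64_t  hi = (int64_t)op_hi * (int32_t)m   /* signed 32x32->64 product */
                  + (int64_t)(lo >> 32);        /* carry out of the low product */

    /* The 96-bit product is hi:(low 32 bits of lo). Shifting it right by s
     * keeps hi << (32 - s) plus the top bits of the low word. The left
     * shift is done on an unsigned value to avoid undefined behaviour. */
    return (int64_t)(((uint64_t)hi << (32 - s)) | ((uint32_t)lo >> s));
}
```

For example, llmulshft_split(-7, 3, 2) evaluates to -6, the floor of -21/4, which matches the rounding towards minus infinity mentioned elsewhere in the thread.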
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-06 18:23 ` Gilles Chanteperdrix @ 2007-06-06 18:46 ` Jan Kiszka 2007-06-07 12:52 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-06 18:46 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 5201 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Not when the macro and inline bear names that are easy to understand. If > > > you do not find the names easy to understand, then change them (I do not > > > like rthal_llmul either, but I could not find a name). To make the > > > assembly fully understandable, you would need to comment every > > > statement. And now, run the assembly code in gdb, and try and print the > > > value of a 64 bits intermediate result: you can't. > > > > No question, this is a matter of taste. > > No, really, being able to debug the code inside gdb appears to me as > something more than a "matter of taste", I thought that as the person > who made Xenomai run with kgdb you would have agreed with me. Do we optimise hot path for debuggability? I really don't expect such a well-defined small function being the target of a debugging session. Moreover, you typically debug such micro services with stepi anyway, watching registers, not variables (which are often undefined due to gcc optimisations). > Now, about the way I wrote arithmetic code, their are reasons behind my > choices. There are some repetitive patterns in this arithmetic code and > I wanted to facter them out. The first pattern is the conversion between > 32 bits and 64 bits, we have to do this in a way that is understood by > the compiler on a particular platform, hence the definition of > rthal_u64from/tou32 which is different for each platform. x86 > understands shifts and mask (or cast), but gcc for power pc or arm > prefers the union trick. 
>
> There is also only one way to cause gcc to use the 32x32->64 fast
> multiplication it is exactly what does rthal_ullmul. If you want to do
> the same thing, but write it differently, you invariably cause gcc to
> use a full 64 bits multiplication.
>
> So, when in rthal_generic_llmulshft, I read:
>
>	long long hi = (ll >> BITS_PER_LONG) * m;
>	unsigned long long lo = ((long)ll) * m;
>
> I think this is all wrong:
> - on a 64 bits machine, BITS_PER_LONG is 64 and ll is 64 bits, so ll >>
>   BITS_PER_LONG is 0

Yes, utterly wrong, noticed this as well. We must set 32 bits in stone.
And doing things with true 64 bit requires 128-bit math for the setup, I
guess that's not worth the trouble.

>
> - for the first multiplication, the compiler will not detect the
>   "fastmult" condition, and will use a full 64 bits multiplication. In
>   order to get it to generate the minimal multiplication, you should
>   have used:
>
>	long long hi = (long long)(int)(ll >> 32) * (int) m
>
>   I find:
>	static inline long long rthal_llmi(const int i, const int j)
>	{
>		/* Fast 32x32->64 multiplication */
>		return (long long) i * j;
>	}
>
>	/* (...) */
>	__rthal_u64tou32(op, oph, opl);
>	hi = rthal_llmi(oph, m);
>
>   easier to write and maintain, understand once you know what
>   rthal_llmi does, and generates better code with regard to the
>   32bits/64bits conversion.
>
> - for the second multiplication, since the two arguments are 32 bits, the
>   compiler will use a 32 bits multiplication, and since you (wrongly)
>   cast the first argument to long, it will use a signed multiplication,
>   whereas we would want it to use an unsigned multiplication, as the
>   assembly routine correctly does.
>
>   Here again, using:
>
>	lo = rthal_ullmul(opl, m);
>   would have been less error-prone.
>
> So, Ok, I will try to do something for x86 (either reduce the numbers of
> registers used by the C code, or reduce the assembly to the bare
> minimum).
> But, please, pick my generic implementation of llmulshft, it
> was carefully written.

Yes, it is the better choice for 32 bit archs (my previous tests didn't
truly reflect the usage in Xenomai, redoing them made my generic
version fall behind yours). Will include it.

But your generic code produces worse binaries on 64 bit. Anyway, given
the potential of 64-bit instructions, we would better do this
differently there, e.g. like this for x64:

#define rthal_llmulshft(ll, m, s)				\
({								\
	long long __ret;					\
								\
	__asm__ (						\
		/* HI:LO = ll * m */				\
		"imull %[__m]\n\t"				\
								\
		/* ret = HI:LO >> s */				\
		"shrd %%cl,%%rdx,%%rax\n\t"			\
		: "=a" (__ret)					\
		: "a" (ll), [__m] "m" (m), "c" (s));		\
	__ret;							\
})

This version actually makes inlining xnarch_tsc_to_ns on that arch
interesting again...

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
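The 64-bit idea proposed in this message (one wide multiply followed by a double-width shift) can also be expressed in C, which may be easier to check than the assembly draft. The sketch below assumes a compiler that provides the __int128 extension (GCC and Clang on 64-bit targets) and an arithmetic right shift; llmulshft_wide is a hypothetical name, not the macro from the patch. One caveat on the quoted draft, as far as the x86-64 instruction set goes: a 64-bit one-operand multiply is normally spelled imulq rather than imull, and it overwrites rdx, so the draft should be read as illustrative.

```c
#include <stdint.h>

/* (op * m) >> s using a single 64x64->128 multiplication.
 * Assumptions: __int128 is available, 0 <= s <= 63, arithmetic right
 * shift, and the shifted result fits into 64 bits. */
static inline int64_t llmulshft_wide(int64_t op, uint32_t m, unsigned s)
{
    __int128 product = (__int128)op * m;    /* full-width signed product */
    return (int64_t)(product >> s);         /* keep the low 64 bits */
}
```

Because the intermediate is 128 bits wide, a call such as llmulshft_wide((int64_t)1 << 62, 4, 2) returns 1 << 62 even though the product itself does not fit into 64 bits.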
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-06 18:46 ` Jan Kiszka @ 2007-06-07 12:52 ` Jan Kiszka 2007-06-07 13:02 ` Gilles Chanteperdrix 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-07 12:52 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 761 bytes --] Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> So, Ok, I will try to do something for x86 (either reduce the numbers of >> registers used by the C code, or reduce the assembly to the bare >> minimum). But, please, pick my generic implementation of llmulshft, it >> was carefully written. > > Yes, it is the better choice for 32 bit archs (my previous tests didn't > reflect the usage in Xenomai truely, redoing them made my generic > version fall behind yours). Will include it. Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM (testing welcome!). At this chance: My series now also includes rthal_llimd for x86_64, another two-liner. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-07 12:52 ` Jan Kiszka @ 2007-06-07 13:02 ` Gilles Chanteperdrix 2007-06-07 14:06 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gilles Chanteperdrix @ 2007-06-07 13:02 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Jan Kiszka wrote: > >>Gilles Chanteperdrix wrote: >> >>>So, Ok, I will try to do something for x86 (either reduce the numbers of >>>registers used by the C code, or reduce the assembly to the bare >>>minimum). But, please, pick my generic implementation of llmulshft, it >>>was carefully written. >> >>Yes, it is the better choice for 32 bit archs (my previous tests didn't >>reflect the usage in Xenomai truely, redoing them made my generic >>version fall behind yours). Will include it. > > > Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, > fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM > (testing welcome!). > > At this chance: My series now also includes rthal_llimd for x86_64, > another two-liner. v6 is not in the download area. -- Gilles Chanteperdrix ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-07 13:02 ` Gilles Chanteperdrix @ 2007-06-07 14:06 ` Jan Kiszka 2007-06-07 14:24 ` Gilles Chanteperdrix 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-07 14:06 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 930 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Jan Kiszka wrote: >> >>> Gilles Chanteperdrix wrote: >>> >>>> So, Ok, I will try to do something for x86 (either reduce the numbers of >>>> registers used by the C code, or reduce the assembly to the bare >>>> minimum). But, please, pick my generic implementation of llmulshft, it >>>> was carefully written. >>> Yes, it is the better choice for 32 bit archs (my previous tests didn't >>> reflect the usage in Xenomai truely, redoing them made my generic >>> version fall behind yours). Will include it. >> >> Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, >> fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM >> (testing welcome!). >> >> At this chance: My series now also includes rthal_llimd for x86_64, >> another two-liner. > > v6 is not in the download area. > Mpf, forgot to press "update". Done. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-07 14:06 ` Jan Kiszka @ 2007-06-07 14:24 ` Gilles Chanteperdrix 2007-06-07 14:40 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Gilles Chanteperdrix @ 2007-06-07 14:24 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > >>Jan Kiszka wrote: >> >>>Jan Kiszka wrote: >>> >>> >>>>Gilles Chanteperdrix wrote: >>>> >>>> >>>>>So, Ok, I will try to do something for x86 (either reduce the numbers of >>>>>registers used by the C code, or reduce the assembly to the bare >>>>>minimum). But, please, pick my generic implementation of llmulshft, it >>>>>was carefully written. >>>> >>>>Yes, it is the better choice for 32 bit archs (my previous tests didn't >>>>reflect the usage in Xenomai truely, redoing them made my generic >>>>version fall behind yours). Will include it. >>> >>>Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, >>>fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM >>>(testing welcome!). >>> >>>At this chance: My series now also includes rthal_llimd for x86_64, >>>another two-liner. >> >>v6 is not in the download area. >> > > > Mpf, forgot to press "update". Done. Ok, I agree with the fast-tsc-to-ns patch: I could not get gcc to generate code with less moves on x86 (which is, for me, if it was still needed, yet another proof that these register moves are harmless). However, I do not agree with the x86_64 llimd: it will not work if m is greater than 2G, that is why we implement llimd in terms of ullimd on other architectures. -- Gilles Chanteperdrix ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-07 14:24 ` Gilles Chanteperdrix @ 2007-06-07 14:40 ` Jan Kiszka 2007-06-07 14:54 ` Gilles Chanteperdrix 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-07 14:40 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 1840 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >> >>> Jan Kiszka wrote: >>> >>>> Jan Kiszka wrote: >>>> >>>> >>>>> Gilles Chanteperdrix wrote: >>>>> >>>>> >>>>>> So, Ok, I will try to do something for x86 (either reduce the numbers of >>>>>> registers used by the C code, or reduce the assembly to the bare >>>>>> minimum). But, please, pick my generic implementation of llmulshft, it >>>>>> was carefully written. >>>>> Yes, it is the better choice for 32 bit archs (my previous tests didn't >>>>> reflect the usage in Xenomai truely, redoing them made my generic >>>>> version fall behind yours). Will include it. >>>> Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, >>>> fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM >>>> (testing welcome!). >>>> >>>> At this chance: My series now also includes rthal_llimd for x86_64, >>>> another two-liner. >>> v6 is not in the download area. >>> >> >> Mpf, forgot to press "update". Done. > > Ok, I agree with the fast-tsc-to-ns patch: I could not get gcc to > generate code with less moves on x86 (which is, for me, if it was still > needed, yet another proof that these register moves are harmless). No question -- from the average performance POV. > > However, I do not agree with the x86_64 llimd: it will not work if m is > greater than 2G, that is why we implement llimd in terms of ullimd on > other architectures. > Please help me, I don't see it yet: m is 32 bit and gets extended to 64 bit without considering any sign (as it should be). 
Then we multiply 64x64 bit signed, but we know for sure that the second multiplier is always positive. Same for division. Basic tests ((-1*1000000000)/2 vs. (-1*3000000000)/2) confirmed this on the target. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-07 14:40 ` Jan Kiszka @ 2007-06-07 14:54 ` Gilles Chanteperdrix 0 siblings, 0 replies; 20+ messages in thread From: Gilles Chanteperdrix @ 2007-06-07 14:54 UTC (permalink / raw) To: Jan Kiszka; +Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > >>Jan Kiszka wrote: >> >>>Gilles Chanteperdrix wrote: >>> >>> >>>>Jan Kiszka wrote: >>>> >>>> >>>>>Jan Kiszka wrote: >>>>> >>>>> >>>>> >>>>>>Gilles Chanteperdrix wrote: >>>>>> >>>>>> >>>>>> >>>>>>>So, Ok, I will try to do something for x86 (either reduce the numbers of >>>>>>>registers used by the C code, or reduce the assembly to the bare >>>>>>>minimum). But, please, pick my generic implementation of llmulshft, it >>>>>>>was carefully written. >>>>>> >>>>>>Yes, it is the better choice for 32 bit archs (my previous tests didn't >>>>>>reflect the usage in Xenomai truely, redoing them made my generic >>>>>>version fall behind yours). Will include it. >>>>> >>>>>Done, see -v6. Then I added that two-liner for x86_64 rthal_llmulshft, >>>>>fixed the BITS_PER_LONG bug, and enabled generic-based support for ARM >>>>>(testing welcome!). >>>>> >>>>>At this chance: My series now also includes rthal_llimd for x86_64, >>>>>another two-liner. >>>> >>>>v6 is not in the download area. >>>> >>> >>>Mpf, forgot to press "update". Done. >> >>Ok, I agree with the fast-tsc-to-ns patch: I could not get gcc to >>generate code with less moves on x86 (which is, for me, if it was still >>needed, yet another proof that these register moves are harmless). > > > No question -- from the average performance POV. > > >>However, I do not agree with the x86_64 llimd: it will not work if m is >>greater than 2G, that is why we implement llimd in terms of ullimd on >>other architectures. >> > > > Please help me, I don't see it yet: > > m is 32 bit and gets extended to 64 bit without considering any sign (as > it should be). 
Then we multiply 64x64 bit signed, but we know for sure > that the second multiplier is always positive. Same for division. Basic > tests ((-1*1000000000)/2 vs. (-1*3000000000)/2) confirmed this on the > target. No, you are right. It works. -- Gilles Chanteperdrix ^ permalink raw reply [flat|nested] 20+ messages in thread
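The point settled here (an unsigned 32-bit m above 2G is harmless as long as it is zero-extended before the signed 64-bit multiply) can be checked with a small C sketch. The helper names are hypothetical and this is not the rthal_llimd implementation; __int128 is assumed to be available (GCC/Clang on 64-bit targets), d is assumed to be non-zero, and the conversion of an out-of-range value to int32_t is assumed to wrap as it does with GCC.

```c
#include <stdint.h>

/* (op * m) / d with m and d zero-extended to 64 bits, so that both stay
 * positive. The division truncates towards zero. Assumes d != 0. */
static inline int64_t llimd_sketch(int64_t op, uint32_t m, uint32_t d)
{
    __int128 product = (__int128)op * (int64_t)m;
    return (int64_t)(product / (int64_t)d);
}

/* For contrast only: what happens if m is (wrongly) sign-extended.
 * 3000000000u reinterpreted as a 32-bit signed value is -1294967296. */
static inline int64_t llimd_sign_extended(int64_t op, uint32_t m, uint32_t d)
{
    int64_t m_bad = (int32_t)m;
    __int128 product = (__int128)op * m_bad;
    return (int64_t)(product / (int64_t)d);
}
```

With the two test values from the message, llimd_sketch(-1, 1000000000u, 2) gives -500000000 and llimd_sketch(-1, 3000000000u, 2) gives -1500000000, whereas the sign-extending variant returns a positive number for the second case.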
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-05 8:31 ` Jan Kiszka 2007-06-05 22:28 ` Gilles Chanteperdrix @ 2007-06-06 12:49 ` Jan Kiszka 2007-06-06 13:29 ` Gilles Chanteperdrix 1 sibling, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-06 12:49 UTC (permalink / raw) To: xenomai-core [-- Attachment #1: Type: text/plain, Size: 776 bytes --] Jan Kiszka wrote: > Jan Kiszka wrote: > ... >> fast-tsc-to-ns-v2.patch >> >> [Rebased, improved rounding of least significant digit] > > Rounding in the fast path for the sake of the last digit was silly. > Instead, I'm now addressing the ugly interval printing via > xnarch_precise_tsc_to_ns when converting the timer interval back into > nanos. -v3 incorporating this has just been uploaded. > After noticing yesterday that even unpatched Xenomai sometimes converts inaccurately when showing small timer intervals under /proc, I just got an idea how to address this beautification issue even better: -v4 now rounds up in the slow, precise tsc-to-ns path, see http://www.rts.uni-hannover.de/rtaddon/patches/xenomai/fast-tsc-to-ns-v4.patch Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers
2007-06-06 12:49 ` Jan Kiszka
@ 2007-06-06 13:29 ` Gilles Chanteperdrix
2007-06-06 13:36 ` Jan Kiszka
0 siblings, 1 reply; 20+ messages in thread
From: Gilles Chanteperdrix @ 2007-06-06 13:29 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai-core
Jan Kiszka wrote:
> Jan Kiszka wrote:
>
>>Jan Kiszka wrote:
>>...
>>
>>>fast-tsc-to-ns-v2.patch
>>>
>>> [Rebased, improved rounding of least significant digit]
>>
>>Rounding in the fast path for the sake of the last digit was silly.
>>Instead, I'm now addressing the ugly interval printing via
>>xnarch_precise_tsc_to_ns when converting the timer interval back into
>>nanos. -v3 incorporating this has just been uploaded.
>>
>
>
> After noticing yesterday that even unpatched Xenomai sometimes converts
> inaccurately when showing small timer intervals under /proc, I just got
> an idea how to address this beautification issue even better: -v4 now
> rounds up in the slow, precise tsc-to-ns path, see
>
> http://www.rts.uni-hannover.de/rtaddon/patches/xenomai/fast-tsc-to-ns-v4.patch

I am the one who decided on the rounding behaviour of llimd. The RTAI
version had the same rounding policy as the one you propose, and I made
this choice for the following reasons:
- rounding towards 0 is the policy used by the C language, so doing this
  for llimd made it consistent with what one expects from C code;
- values computed by llimd are used to program timers, and we prefer the
  timer to be programmed for a too short value than for a too long value.

--
Gilles Chanteperdrix

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-06 13:29 ` Gilles Chanteperdrix @ 2007-06-06 13:36 ` Jan Kiszka 2007-06-06 15:08 ` Jan Kiszka 0 siblings, 1 reply; 20+ messages in thread From: Jan Kiszka @ 2007-06-06 13:36 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 1655 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Jan Kiszka wrote: >> >>> Jan Kiszka wrote: >>> ... >>> >>>> fast-tsc-to-ns-v2.patch >>>> >>>> [Rebased, improved rounding of least significant digit] >>> Rounding in the fast path for the sake of the last digit was silly. >>> Instead, I'm now addressing the ugly interval printing via >>> xnarch_precise_tsc_to_ns when converting the timer interval back into >>> nanos. -v3 incorporating this has just been uploaded. >>> >> >> After noticing yesterday that even unpatched Xenomai sometimes converts >> inaccurately when showing small timer intervals under /proc, I just got >> an idea how to address this beautification issue even better: -v4 now >> rounds up in the slow, precise tsc-to-ns path, see >> >> http://www.rts.uni-hannover.de/rtaddon/patches/xenomai/fast-tsc-to-ns-v4.patch > > I am the one who decided of the rounding behaviour of llimd, RTAI > version had the same rounding policy as the one you propose, and I made > it for the following reasons: > - rouding towards 0 is the policy used by the C language, so doing this > for llimd made it consistent with what one expects from C code; > - values computed by llimd are used to program timers, and we prefer the > timer to be programmed for a too short value than for a too long value. That's OK, I agree. In my patch for i386, this rounding is only relevant for display purposes. It's just to help me finding the expected period T of my task in /proc instead of T-1 sometimes. Beautification. All we need for other archs is xnarch_tsc_to_ns according to the old scheme. Will rework this. 
Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Xenomai-core] [PATCH-STACK] Updates, timerstats, rtdm-timers 2007-06-06 13:36 ` Jan Kiszka @ 2007-06-06 15:08 ` Jan Kiszka 0 siblings, 0 replies; 20+ messages in thread From: Jan Kiszka @ 2007-06-06 15:08 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai-core [-- Attachment #1: Type: text/plain, Size: 2110 bytes --] Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Jan Kiszka wrote: >>> >>>> Jan Kiszka wrote: >>>> ... >>>> >>>>> fast-tsc-to-ns-v2.patch >>>>> >>>>> [Rebased, improved rounding of least significant digit] >>>> Rounding in the fast path for the sake of the last digit was silly. >>>> Instead, I'm now addressing the ugly interval printing via >>>> xnarch_precise_tsc_to_ns when converting the timer interval back into >>>> nanos. -v3 incorporating this has just been uploaded. >>>> >>> After noticing yesterday that even unpatched Xenomai sometimes converts >>> inaccurately when showing small timer intervals under /proc, I just got >>> an idea how to address this beautification issue even better: -v4 now >>> rounds up in the slow, precise tsc-to-ns path, see >>> >>> http://www.rts.uni-hannover.de/rtaddon/patches/xenomai/fast-tsc-to-ns-v4.patch >> I am the one who decided of the rounding behaviour of llimd, RTAI >> version had the same rounding policy as the one you propose, and I made >> it for the following reasons: >> - rouding towards 0 is the policy used by the C language, so doing this >> for llimd made it consistent with what one expects from C code; >> - values computed by llimd are used to program timers, and we prefer the >> timer to be programmed for a too short value than for a too long value. > > That's OK, I agree. In my patch for i386, this rounding is only relevant > for display purposes. It's just to help me finding the expected period T > of my task in /proc instead of T-1 sometimes. Beautification. All we > need for other archs is xnarch_tsc_to_ns according to the old scheme. > Will rework this. 

Done, -v5 is online. This is not yet incorporating any change to the
generic llmulshft.

I haven't thought about nor tried our code on a 64 bit arch yet. Did you
check this already? I wonder, e.g., if it makes sense to exploit 128 bit
with 64 bit shifts there or stick with 96/32 bit accuracy and related
conversion errors. Depending on this, the generic version might have to
be reconsidered.

Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
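The accuracy question raised here depends on how the multiplier and shift are chosen. A user-space sketch of the setup routine quoted earlier in the thread (xnarch_init_llmulshft) is given below, so the resulting values can be inspected. It is an adaptation under assumptions: init_mulshft is a hypothetical name, plain 64-bit division replaces the kernel's do_div, d_in is assumed non-zero, and the ratio m_in/d_in is assumed to be below 2^31 so that the loop terminates before the shift reaches zero.

```c
#include <limits.h>
#include <stdint.h>

/* Find the largest shift s <= 31 such that (m_in << s) / d_in still fits
 * into a positive 32-bit integer, and return that quotient as multiplier.
 * Assumptions: d_in != 0 and m_in / d_in < 2^31. */
static inline void init_mulshft(uint32_t m_in, uint32_t d_in,
                                uint32_t *m_out, unsigned *s_out)
{
    unsigned s = 31;
    uint64_t mult;

    for (;;) {
        mult = ((uint64_t)m_in << s) / d_in;
        if (mult <= INT_MAX)
            break;
        s--;    /* multiplier too large: give up one bit of precision */
    }
    *m_out = (uint32_t)mult;
    *s_out = s;
}
```

For a ratio of 2/3 this yields m = 1431655765 with s = 31; for 3/2 the shift drops to 30 with m = 1610612736. The fewer bits of shift remain, the coarser the approximation of the ratio becomes.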