From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <47F3B348.1090102@domain.hid>
Date: Wed, 02 Apr 2008 18:24:40 +0200
From: Sebastian Smolorz
MIME-Version: 1.0
References: <20080402012645.506e53ef.Cornelius.Koepp@domain.hid>
 <47F34C0D.6090809@domain.hid> <47F37579.7080601@domain.hid>
 <47F37BF8.6000401@domain.hid> <47F3AD14.4090306@domain.hid>
 <2ff1a98a0804020905v7019574ai927f213ab6603e41@domain.hid>
In-Reply-To: <2ff1a98a0804020905v7019574ai927f213ab6603e41@domain.hid>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)
List-Id: "Xenomai life and development \(bug reports, patches, discussions\)"
To: Gilles Chanteperdrix
Cc: Jan Kiszka, xenomai-core, Cornelius Köpp

Gilles Chanteperdrix wrote:
> On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz wrote:
>> Jan Kiszka wrote:
>> > Sebastian Smolorz wrote:
>> >> Jan Kiszka wrote:
>> >>> Cornelius Köpp wrote:
>> >>>> I talked with Sebastian Smolorz about this and he built his own
>> >>>> independent kernel-config to check. He got the same drifting effect
>> >>>> with Xenomai 2.4.2 and Xenomai 2.4.3 running latency over several
>> >>>> hours. His kernel-config is attached as
>> >>>> 'config-2.6.24-xenomai-2.4.3__ssm'.
>> >>>>
>> >>>> Our kernel-configs are both based on a config used with Xenomai 2.3.4
>> >>>> and Linux 2.6.20.15 without any drifting effects.
>> >>> 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
>> >>> not a PIC vs. APIC thing, but rather a rounding problem of larger TSC
>> >>> values (that naturally show up when the system runs for a longer time).
>> >> This hint seems to point in the right direction.
>> >> I tried out a
>> >> modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
>> >> implementation in include/asm-generic/bits/pod.h was used. The drifting
>> >> bug disappeared. So there seems to be a buggy x86-specific
>> >> implementation of this routine.
>> >
>> > Hmm, maybe even a conceptual issue: the multiply-shift-based
>> > xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
>> > xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
>> > may lose some bits, maybe too many bits...
>> >
>> > It looks like this bites us in the kernel latency tests (-t2 should
>> > suffer as well). Those recalculate their timeouts each round based on
>> > absolute nanoseconds. In contrast, the periodic user mode task of -t0
>> > uses a periodic timer that is forwarded via a tsc-based interval.
>> >
>> > You (or Cornelius) could try to analyse the calculation path of the
>> > involved timeouts, specifically to understand why the scheduled timeout
>> > of the underlying task timer (which is tsc-based) tends to diverge from
>> > the calculated one (ns-based).
>>
>> So here comes the explanation. The error is inside the function
>> rthal_llmulshft(). It returns wrong values which are too small - the
>> higher the given TSC value, the bigger the error. The function
>> rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
>> rtdm_clock_read_monotonic() is called every time the latency kernel
>> thread runs [1], the values reported by latency become smaller over time.
>>
>> In contrast, the latency task in user space uses the conversion
>> from TSC to ns only once, when calling rt_timer_inquire [2].
>> timer_info.date is too small, timer_info.tsc is right. So all calculated
>> deltas in [3] are shifted to a smaller value. This value is constant
>> during the runtime of latency in user space because no more conversion
>> from TSC to ns occurs.
>
> latency does conversions from tsc to ns, but it converts time
> differences, so the error is small relative to the results.

Of course. I wasn't precise with my last statement. It should be: No
more conversions from *absolute* TSC values to ns occur.

-- 
Sebastian
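
To illustrate the point about absolute values versus differences, here is
a minimal sketch in C. It is not the Xenomai code and it does not
reproduce the actual fault in rthal_llmulshft(); it only shows the
truncation error inherent in any multiply-shift conversion, compared with
a multiply-divide reference. All names (tsc_to_ns_muldiv,
tsc_to_ns_mulshift, mulshift_scale, conversion_error_ns), the fixed shift
of 32 and any clock frequency passed in are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL
#define SHIFT 32 /* fixed shift assumed for this sketch */

/* Reference conversion (multiply-divide), exact up to integer truncation.
 * Splitting tsc into whole seconds and a remainder avoids 64-bit overflow. */
static uint64_t tsc_to_ns_muldiv(uint64_t tsc, uint64_t hz)
{
    assert(hz > 0 && hz <= 4000000000ULL); /* keeps rem * NS_PER_SEC in 64 bits */
    uint64_t secs = tsc / hz;
    uint64_t rem = tsc % hz;
    return secs * NS_PER_SEC + rem * NS_PER_SEC / hz;
}

/* Precomputed factor: floor(NS_PER_SEC * 2^SHIFT / hz).
 * The floor() is where the multiply-shift variant loses precision. */
static uint32_t mulshift_scale(uint64_t hz)
{
    assert(hz > NS_PER_SEC && hz <= 4000000000ULL); /* result fits in 32 bits */
    return (uint32_t)((NS_PER_SEC << SHIFT) / hz);
}

/* Multiply-shift conversion: (tsc * scale) >> 32, computed without a
 * 128-bit type by splitting tsc into 32-bit halves (valid for SHIFT == 32). */
static uint64_t tsc_to_ns_mulshift(uint64_t tsc, uint32_t scale)
{
    uint64_t hi = tsc >> 32;
    uint64_t lo = tsc & 0xffffffffULL;
    return hi * scale + ((lo * scale) >> SHIFT);
}

/* How many ns the multiply-shift result falls short of the reference. */
static int64_t conversion_error_ns(uint64_t tsc, uint64_t hz)
{
    uint64_t exact = tsc_to_ns_muldiv(tsc, hz);
    uint64_t approx = tsc_to_ns_mulshift(tsc, mulshift_scale(hz));
    return (int64_t)(exact - approx);
}
```

With a hypothetical 1.8 GHz TSC, conversion_error_ns() stays at about one
nanosecond for a tick count worth one millisecond, but reaches a few
hundred nanoseconds for a tick count worth one hour, and keeps growing
with uptime. That matches the distinction above: converting differences
is harmless, converting absolute TSC values is where an error
accumulates.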