From: Jan Kiszka <jan.kiszka@domain.hid>
To: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: xenomai-core <xenomai@xenomai.org>,
"Cornelius Köpp" <Cornelius.Koepp@domain.hid>
Subject: Re: [Xenomai-core] latencys drifting into negative (Xenomai 2.4.2/2.4.3)
Date: Thu, 03 Apr 2008 14:50:39 +0200
Message-ID: <47F4D29F.4020500@domain.hid>
In-Reply-To: <2ff1a98a0804030527r5d41efafg4aa6a464373711cc@domain.hid>
Gilles Chanteperdrix wrote:
> On Thu, Apr 3, 2008 at 2:17 PM, Jan Kiszka <jan.kiszka@domain.hid> wrote:
>> Sebastian Smolorz wrote:
>>
>>> Gilles Chanteperdrix wrote:
>>>
>>>> On Wed, Apr 2, 2008 at 5:58 PM, Sebastian Smolorz
>>>> <smolorz@domain.hid> wrote:
>>>>
>>>>> Jan Kiszka wrote:
>>>>> > Sebastian Smolorz wrote:
>>>>> >> Jan Kiszka wrote:
>>>>> >>> Cornelius Köpp wrote:
>>>>> >>>> I talked with Sebastian Smolorz about this and he built his own
>>>>> >>>> independent kernel config to check. He got the same drifting effect
>>>>> >>>> with Xenomai 2.4.2 and Xenomai 2.4.3 when running latency over several
>>>>> >>>> hours. His kernel config is attached as
>>>>> >>>> 'config-2.6.24-xenomai-2.4.3__ssm'.
>>>>> >>>>
>>>>> >>>> Our kernel configs are both based on a config used with Xenomai 2.3.4
>>>>> >>>> and Linux 2.6.20.15 without any drifting effects.
>>>>> >>> 2.3.x did not incorporate the new TSC-to-ns conversion. Maybe it is
>>>>> >>> not a PIC vs. APIC thing, but rather a rounding problem with larger TSC
>>>>> >>> values (which naturally show up when the system runs for a longer time).
>>>>> >> This hint seems to point in the right direction. I tried out a
>>>>> >> modified pod_32.h (xnarch_tsc_to_ns() commented out) so that the old
>>>>> >> implementation in include/asm-generic/bits/pod.h was used. The drifting
>>>>> >> bug disappeared. So there seems to be a buggy x86-specific
>>>>> >> implementation of this routine.
>>>>> >
>>>>> > Hmm, maybe even a conceptual issue: the multiply-shift-based
>>>>> > xnarch_tsc_to_ns is not as precise as the still multiply-divide-based
>>>>> > xnarch_ns_to_tsc. So when converting from tsc over ns back to tsc, we
>>>>> > may lose some bits, maybe too many bits...
>>>>> >
>>>>> > It looks like this bites us in the kernel latency tests (-t2 should
>>>>> > suffer as well). Those recalculate their timeouts each round based on
>>>>> > absolute nanoseconds. In contrast, the periodic user mode task of -t0
>>>>> > uses a periodic timer that is forwarded via a tsc-based interval.
>>>>> >
>>>>> > You (or Cornelius) could try to analyse the calculation path of the
>>>>> > involved timeouts, specifically to understand why the scheduled timeout
>>>>> > of the underlying task timer (which is tsc-based) tends to diverge from
>>>>> > the calculated one (ns-based).
>>>>>
>>>>> So here comes the explanation. The error is inside the function
>>>>> rthal_llmulshft(). It returns wrong values which are too small: the
>>>>> higher the given TSC value, the bigger the error. The function
>>>>> rtdm_clock_read_monotonic() calls rthal_llmulshft(). As
>>>>> rtdm_clock_read_monotonic() is called every time the latency kernel
>>>>> thread runs [1], the values reported by latency become smaller over time.
>>>>> In contrast, the latency task in user space uses the conversion
>>>>> from TSC to ns only once, when calling rt_timer_inquire [2].
>>>>> timer_info.date is too small, timer_info.tsc is right. So all calculated
>>>>> deltas in [3] are shifted to a smaller value. This value is constant
>>>>> during the runtime of latency in user space because no more conversion
>>>>> from TSC to ns occurs.
>>>>>
>>>> latency does conversions from tsc to ns, but it converts time
>>>> differences, so the error is small relative to the results.
>>>>
>>> Of course. I wasn't precise in my last statement. It should be: no more
>>> conversions from *absolute* TSC values to ns occur.
>>>
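
To illustrate the absolute-vs-delta point discussed above, here is a rough
user-space sketch. It is not the Xenomai code: the function names are made
up, the tiny shift of 16 is chosen only to make the effect visible, and
unsigned __int128 is a GCC/Clang extension. It shows one way a
multiply-shift conversion can err low by an amount proportional to the input,
so absolute TSC values drift while short deltas stay nearly exact.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Exact reference: tsc * 1e9 / freq with a 128-bit intermediate
 * (unsigned __int128 is a compiler extension, assumed available). */
static uint64_t tsc_to_ns_exact(uint64_t tsc, uint64_t freq_hz)
{
    return (uint64_t)((unsigned __int128)tsc * NSEC_PER_SEC / freq_hz);
}

/* Hypothetical multiply-shift variant: ns = (tsc * mul) >> shift, with
 * mul = floor(1e9 * 2^shift / freq). Truncating mul makes every result
 * slightly too small, by an amount that scales with tsc. */
static uint64_t tsc_to_ns_mulshift(uint64_t tsc, uint64_t freq_hz,
                                   unsigned shift)
{
    uint64_t mul =
        (uint64_t)(((unsigned __int128)NSEC_PER_SEC << shift) / freq_hz);

    return (uint64_t)(((unsigned __int128)tsc * mul) >> shift);
}
```

With an assumed 3 GHz clock and shift 16, one second of TSC ticks comes out
about 15 us short, and a thousand seconds about 15 ms short, whereas the
error on a delta of a few thousand ticks is below one nanosecond.
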
>> This patch may do the trick: it uses the inverted tsc-to-ns function
>> instead of the frequency-based one. Be warned, it is totally untested inside
>> Xenomai, I just ran it in a user space test program. But it may give an
>> idea.
>>
>> Gilles, not sure if this is related to my quickly hacked test, but with
>> RTHAL_CPU_FREQ = 800MHz and TSC = 0x7000000000000000 (or larger) I get an
>> arithmetic exception with the rthal_llimd-based conversion to nanoseconds.
>> Is there an input range we may have to exclude for rthal_llimd?
>
> rthal_llimd does a multiplication first, then a division. The
> multiplication cannot overflow, but the result of the division may
> not fit in 64 bits, in which case you get an exception on x86. This
> happens only with m > d.

OK, for tsc-to-ns this only bites us after a few hundred years of uptime,
or when we have settable TSC counters (does Linux tweak them beyond
aligning them on SMP?).

But there is also a risk the other way around: ns-to-tsc with a
frequency > 1 GHz will fall apart (kernel oops!) when the user provides a
large timeout in nanoseconds that we then try to convert to TSC ticks. Not
good. Wrong values are one thing, but oopses are even worse.

Any idea how to fix this?
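
One possible direction, as an untested user-space sketch only: clamp the
result instead of letting the division overflow. The helper name
ns_to_tsc_saturating is hypothetical, and the 128-bit arithmetic (a
GCC/Clang extension) merely stands in for whatever the real conversion
would use. In the kernel one would more likely precompute the largest safe
nanosecond value once and compare against it before converting.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper, not part of Xenomai: convert nanoseconds to TSC
 * ticks and saturate at INT64_MAX rather than overflowing.
 * cpu_freq_hz is the assumed TSC rate in Hz. */
static int64_t ns_to_tsc_saturating(uint64_t ns, uint64_t cpu_freq_hz)
{
    /* The 128-bit product and quotient cannot overflow for 64-bit inputs. */
    unsigned __int128 ticks =
        (unsigned __int128)ns * cpu_freq_hz / NSEC_PER_SEC;

    if (ticks > (unsigned __int128)INT64_MAX)
        return INT64_MAX;   /* treat as "practically infinite" timeout */

    return (int64_t)ticks;
}
```

Whether saturating is acceptable, or whether such timeouts should rather be
rejected with an error, is a separate design question.
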
Jan