From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
To: Xenomai <Xenomai@xenomai.org>
Subject: Re: [Xenomai] RFC: slow tsc optimization
Date: Sat, 08 Sep 2012 21:30:21 +0200
Message-ID: <504B9CCD.9030701@xenomai.org>
In-Reply-To: <5049DA96.90200@xenomai.org>
On 09/07/2012 01:29 PM, Gilles Chanteperdrix wrote:
> On 09/06/2012 09:24 AM, Gilles Chanteperdrix wrote:
>
>>
>> Hi,
>>
>> The last few days, I have been working on getting the "rdtsc"
>> instruction replaced with a call to a tsc emulation function dynamically
>> at run time. It turned out to be easy with the Linux "alternative"
>> mechanism, since it implements replacements based on CPU capabilities,
>> and the TSC is such a capability. This modification makes it possible to
>> compile a kernel with Xenomai that will run on any x86_32 platform.
>>
>> Now, when running kernels without tsc using the PIT based tsc emulation,
>> I found out something pretty obvious: this PIT based tsc emulation is
>> slow, it takes 4us every time we call it. And the nucleus reads the tsc
>> a number of times between the timer interrupt and the wake up of
>> the latency user-space task:
>> - at the very beginning of the timer interrupt
>> - after the execution of the latency thread timer
>> - in the timer programming function, to compute the timer delay
>> - in the middle of the context switch, if the "statistics collection"
>> feature is enabled
>> - in the xnpod_wait_thread_periodic function, after the context switch,
>> in order to compute the number of timer overruns.
>>
>> That is five reads at 4us each, so 20us, and the thread is not yet
>> running in user-space.
>>
>> So, I have been thinking about reducing the number of calls to the PIT.
>> Unfortunately, keeping the last tsc value around and reusing it is a bit
>> heavy, and implies modifications which are completely useless for the
>> non PIT case (which should be the vast majority). On the other hand, the
>> tsc emulation code has to keep the last read value anyway, since it is
>> required to convert clocksources with less than 64 bits to a 64-bit
>> value. So, I propose the following approach:
>>
>> The I-pipe core will provide two tsc reading functions:
>> ipipe_read_tsc which reads the counter
>> ipipe_read_tsc_fast which will read the tsc if the cpu has a tsc, or
>> return the last value read if the tsc is emulated.
>
>
> An update on this work. In fact the "read_tsc_fast" should be the most
> common operation, and really reading the PIT counter should not. And
> reading the slow tsc should be done at some critical points so that the
> fast tsc is reasonably accurate. So, in fact I implemented:
> ipipe_read_tsc, which returns the last tsc value, and is replaced with
> rdtsc when available
> ipipe_read_slow_tsc, which reads the emulated tsc, and is also replaced
> with rdtsc when available,
> ipipe_touch_tsc, which reads the emulated tsc, but is replaced with a
> nop when rdtsc is available.
>
> Now, the real remaining problem is where to use
> ipipe_touch_tsc/ipipe_read_slow_tsc, to have a reasonable accuracy, but
> not read the hardware tsc too often.
>
> I implemented the following approach: the tsc is read at every entry
> point of the nucleus, that is: interrupts, xenomai syscalls, events for
> xenomai tasks. We also need to reread the tsc before programming the
> next shot, in order to avoid programming delays that are too long (with
> a restart of xntimer_tick_aperiodic if we find out that the delay is too
> short, instead of going through another irq). All in all, these are
> fairly lightweight modifications, and the latency test seems reasonable,
> even on a kernel with statistics collection enabled. I suspect the
> statistics are a bit off, but at least they are there.
>
> Since we read the tsc twice per interrupt, and reading it takes 4us, the
> minimum latency is around 8us. I thought about including the tsc latency
> (twice) into the nktimerlat latency, but this results in negative
> latencies, and anyway, we should leave the choice to the user to do that
> with /proc/xenomai/latency if they want.
>
> Now the remaining issues are:
> - kernel-space code. We can trap insmod/rmmod in losyscall, but if an
> RTDM driver ioctl method takes a long time to execute, or when a
> kernel-space thread runs long tasks before calling xenomai services, it
> may use old clock data
> - the time of a syscall is always at least 4us. That is a bit stupid
> when, for instance, you want to lock a mutex: you read the tsc, lock
> the mutex, then return to user space. Working around this seems
> complicated. We could for instance add a "NOTSC" syscall flag to
> indicate that the tsc should not be read before a syscall callback, but
> correctly modifying the syscall tables to add this flag to the proper
> syscalls is probably not so easy. For instance, when statistics
> collection is enabled, we want to read the tsc before locking the mutex,
> since if there is a context switch, we will need the value for updating
> the statistics.
>
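To make the three accessors quoted above concrete, here is a minimal
userspace sketch of the emulated (no rdtsc) case. It is not the I-pipe
patch: the 16-bit up-counting fake_hw_counter, read_hw_counter() and
hw_reads are hypothetical stand-ins (the real PIT counts down and is read
through port I/O), and the run-time replacement done by the "alternative"
mechanism is not modelled at all. It only shows how the last value, which
is needed anyway to extend a narrow counter to 64 bits, can double as the
cached fast value.

```c
#include <stdint.h>

/* Hypothetical stand-in for a narrow free-running hardware counter. */
static uint16_t fake_hw_counter;
static unsigned hw_reads;            /* number of "expensive" accesses */

static uint16_t read_hw_counter(void)
{
    hw_reads++;
    return fake_hw_counter;
}

/* Last extended value; also serves as the cached "fast" tsc. */
static uint64_t last_tsc;

/* Slow path: read the hardware and extend the result to 64 bits.
 * Only correct if called at least once per wrap of the 16-bit counter. */
static uint64_t ipipe_read_slow_tsc(void)
{
    uint16_t now = read_hw_counter();
    /* Modulo-2^16 difference since the previous read. */
    uint16_t delta = (uint16_t)(now - (uint16_t)last_tsc);

    last_tsc += delta;
    return last_tsc;
}

/* Refresh the cached value at a nucleus entry point; result unused. */
static void ipipe_touch_tsc(void)
{
    (void)ipipe_read_slow_tsc();
}

/* Fast path: return the cached value without touching the hardware. */
static uint64_t ipipe_read_tsc(void)
{
    return last_tsc;
}
```

In this model only ipipe_touch_tsc and ipipe_read_slow_tsc increment
hw_reads, so the cost of a code path is 4us times the number of such
calls: 20us for the five reads listed in the first mail, 8us for the two
reads per interrupt described above.
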
Some benchmarks on Atom. In the second try, "pit, one read", we do not
re-read the emulated tsc before programming the timer: we avoid losing
4us, at the expense of the precision of the timer tick.
http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom2.png
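
The trade-off between the two variants can be illustrated with a toy
model. Everything here is an assumption made for illustration: time is
counted in microseconds, each emulated read costs READ_COST_US, and
touch_tsc(), read_tsc(), elapse() and program_next_shot() are
hypothetical names, not the nucleus functions.

```c
#include <stdint.h>
#include <stdbool.h>

enum { READ_COST_US = 4 };           /* cost of one emulated tsc read */

static uint64_t hw_time_us;          /* simulated true time */
static uint64_t cached_us;           /* value seen by the fast accessor */

/* Slow read: costs READ_COST_US and refreshes the cached value. */
static void touch_tsc(void)
{
    hw_time_us += READ_COST_US;
    cached_us = hw_time_us;
}

/* Fast read: returns the possibly stale cached value. */
static uint64_t read_tsc(void)
{
    return cached_us;
}

/* Time spent in the handler without refreshing the cache. */
static void elapse(uint64_t us)
{
    hw_time_us += us;
}

/* Compute the delay to program for the next shot.
 * Returns false if the deadline has already passed, in which case the
 * caller would rerun the tick handler instead of arming the timer. */
static bool program_next_shot(uint64_t deadline_us, bool reread,
                              uint64_t *delay_us)
{
    if (reread)
        touch_tsc();                 /* pay 4us for an accurate "now" */

    uint64_t now = read_tsc();
    if (deadline_us <= now)
        return false;

    *delay_us = deadline_us - now;
    return true;
}
```

With one read at interrupt entry followed by 10us of handler work, the
one-read variant computes the delay from a "now" that is 10us old, so in
this model the programmed delay is 10us too long; the re-reading variant
spends 4us more in the handler but programs the exact remaining delay.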
--
Gilles.