All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kiszka <jan.kiszka@domain.hid>
To: xenomai-core <xenomai@xenomai.org>
Subject: [Xenomai-core] Timer optimisations, continued
Date: Tue, 25 Jul 2006 20:26:50 +0200	[thread overview]
Message-ID: <44C6626A.6000304@domain.hid> (raw)
In-Reply-To: <17550.61982.685449.470866@domain.hid>

[-- Attachment #1: Type: text/plain, Size: 18489 bytes --]

Hi all,

to continue the discussion about improving the timer subsystem,
specifically with respect to unit conversion overhead, I'm posting here
a (fairly long) report of my findings and consideration.

First of all I did some benchmarking of the various optimised conversion
routines that popped up. I stressed them on the different x86-platforms.
The numbers are for 1000 iterations (loop overhead compensated), used
compiler was gcc-4.1. Just to recall the actors:

xnarch_tsc_to_ns - original accurate 64-bit division for converting TSC
                   ticks in nanoseconds (and vice versa)
fast_tsc_to_ns   - my scaled-math-based assembler variant, suffering
                   from some inaccuracy for large intervals, still
                   requires normal 64-bit muldiv for the ns-to-TSC
                   return path
ns_2_cycles      - Philippe's similar version, a bit more inaccurate
nodiv_ullimd     - Gilles' 64-bit conversion routine, only sometimes
                   varying in the last bit from the original result
nodiv_imuldiv    - Gilles' 32-bit div-less conversion for small
                   intervals (haven't checked, but I assume it's as
                   accurate as the 64-bit variant in the limited domain)

And here are the results (ugly test code available on request):

VIA C2, 600 MHz:
xnarch_tsc_to_ns:  160680 cycles /  267800 ns
fast_tsc_to_ns:    119842 cycles /  199736 ns
ns_2_cycles:        69376 cycles /  115626 ns
nodiv_ullimd:      179042 cycles /  298403 ns
nodiv_imuldiv:      41336 cycles /   68893 ns

P-III, 1GHz:
xnarch_tsc_to_ns:  108475 cycles /  107935 ns
fast_tsc_to_ns:     24127 cycles /   24006 ns
ns_2_cycles:        21338 cycles /   21231 ns
nodiv_ullimd:       67974 cycles /   67635 ns
nodiv_imuldiv:      13269 cycles /   13202 ns

P-MMX, 266 MHz:
xnarch_tsc_to_ns:  131886 cycles /  495812 ns
fast_tsc_to_ns:     47697 cycles /  179312 ns
ns_2_cycles:        43627 cycles /  164011 ns
nodiv_ullimd:      141915 cycles /  533515 ns
nodiv_imuldiv:      44761 cycles /  168274 ns

P-M, 1,3GHz:
xnarch_tsc_to_ns:  113219 cycles /   87091 ns
fast_tsc_to_ns:     26718 cycles /   20552 ns
ns_2_cycles:        15024 cycles /   11556 ns
nodiv_ullimd:       49620 cycles /   38169 ns
nodiv_imuldiv:      17036 cycles /   13104 ns

Opteron 275 (32-bit mode), 1,8 GHz:
xnarch_tsc_to_ns:  112507 cycles /   62503 ns
fast_tsc_to_ns:     21857 cycles /   12142 ns
ns_2_cycles:        12545 cycles /    6969 ns
nodiv_ullimd:       41175 cycles /   22875 ns
nodiv_imuldiv:       7261 cycles /    4033 ns

For sure, working with only 32-bit is the fastest variant on all
platforms. Other variants do not always perform well or have limited
accuracy. Unfortunately, 32-bit conversions cannot be applied on all
scenarios, we will see this below.


After hacking my fast_tsc_to_ns, my original plan was to switch the
internal timer base completely to nanoseconds in the hope to reduce the
number of conversions in the timer hot-paths. Luckily I decided to
analyse the typical scenarios first before starting the develop any
patch. I consider the following 5 scenarios for heavy timer usage. Both
TSC and nanoseconds as time base are analysed, also a potential
timer_start() variant that accepts absolute timeout values. The pseudo
code /should/ be self-explaining. If not do not hesitate to ask.


1. Periodic Timers
==================

Start once, run continuously
=> hot-path is the timer IRQ


1.1 TSC-based
-------------

task_set_periodic(start, interval)              [rarely]
        delay = start - get_time()
                get_time(): tsc -> ns           [64-bit]
        timer_start(delay, interval)
                delay: ns -> tsc                [32-bit candidate]
                date = get_tsc() + delay
                interval: ns -> tsc             [32-bit candidate]
                program_timer(date)
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
task_set_periodic(start, interval)              [rarely]
        timer_start_abs(start, interval)
                date: ns -> tsc                 [64-bit]
                interval: ns -> tsc             [32-bit candidate]
                program_timer(date)
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        date <= get_tsc()?
        date = get_tsc() + interval
        program_timer(date)
                delay = date - get_tsc()
                set_hw_timer(delay)


1.2 ns-based
------------

task_set_periodic(start, interval)              [rarely]
        delay = start - get_time()
                get_time(): tsc -> ns           [64-bit]
        timer_start(delay, interval)
                date = get_time() + delay
                        get_time(): tsc -> ns   [64-bit]
                program_timer(date)
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
task_set_periodic(start, interval)              [rarely]
        timer_start_abs(start, interval)
                date = start
                program_timer(date)
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        now = get_time()
                get_time: tsc -> ns             [64-bit]
        date <= now()?
        date = now + interval
        program_timer(date)
                date: ns -> tsc                 [64-bit]
                delay = date - get_tsc()
                set_hw_timer(delay)


1.3 Summary of (only!) the hot-path
-----------------------------------

              | total       | tsc->ns | ns->tsc | possible
              | conversions |         |         | 32-bit ns->tsc
--------------+-------------+---------+---------+----------------
TSC-based     | 0           | 0       | 0       | 0
TSC-based+ABS | 0           | 0       | 0       | 0
ns-based      | 2           | 1       | 1       | 0
ns-based+ABS  | 2           | 1       | 1       | 0



2. Relative Timers
==================
(explicit relative delays)

Started often, typically time out
=> hot-path is timer_start() and the timer IRQ


2.1 TSC-based
-------------

task_sleep(delay)                               [hot-path]
        timer_start(delay, 0)
                delay: ns -> tsc                [32-bit candidate]
                date = get_tsc() + delay
                program_timer(date)
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        date <= get_tsc()?
        (programming of succeeding timer intentionally not included)


2.2 ns-based
-------------

task_sleep(delay)                               [hot-path]
        timer_start(delay, 0)
                date = get_time() + delay
                        get_time: tsc -> ns     [64-bit]
                program_timer(date)
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        now = get_time()
                get_time: tsc -> ns             [64-bit]
        date <= now?
        (programming of succeeding timer intentionally not included)


2.3 Summary of the hot-path
---------------------------

              | total       | tsc->ns | ns->tsc | possible
              | conversions |         |         | 32-bit ns->tsc
--------------+-------------+---------+---------+----------------
TSC-based     | 1           | 0       | 1       | 1
ns-based      | 3           | 2       | 1       | 0



3. Relative Timeouts
====================
(IPC mechanisms, device operations, etc.)

Started often, typically do not fire, often comparably large timeout
values that do not make it down to program_timer() before cancellation
=> hot-path is timer_start() and timer_stop()


3.1 TSC-based
-------------

mutex_lock(..., delay)                          [hot-path]
        timer_start(delay, 0)
                delay: ns -> tsc                [32-bit candidate]
                date = get_tsc() + delay
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_mutex()
        timer_stop()
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [rarely]
        date <= get_tsc()?


3.2 ns-based
------------

mutex_lock(..., delay)                          [hot-path]
        timer_start(delay, 0)
                date = get_time() + delay
                        get_time: tsc -> ns     [64-bit]
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_mutex()
        timer_stop()
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [rarely]
        now = get_time()
                get_time: tsc -> ns             [64-bit]
        date <= now?


3.3 Summary of the hot-path
---------------------------

              | total       | tsc->ns | ns->tsc | possible
              | conversions |         |         | 32-bit ns->tsc
--------------+-------------+---------+---------+----------------
TSC-based     | 1           | 0       | 1       | 1
ns-based      | 1           | 1       | 0       | 0



4. Absolute Timers
==================
(e.g. TDMA slot timing in RTnet)

Started often, include time-stamp acquisition and conversion, typically
fire => hot-path is get_time(), timer_start(), and the timer IRQ


4.1 TSC-based
-------------

date = get_time() + delay                       [hot-path]
        get_time: tsc -> ns                     [64-bit]

task_sleep_until(date)                          [hot-path]
        delay = date - get_time()
                get_time: tsc -> ns             [64-bit]
        timer_start(delay, 0)
                delay: ns -> tsc                [32-bit candidate]
                date = get_tsc() + delay
                program_timer(date)
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
task_sleep_until(date)                          [hot-path]
        timer_start_abs(date, 0)
                date: ns -> tsc                 [64-bit]
                program_timer(date)
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        test date <= get_tsc()?


4.2 ns-based
------------

date = get_time() + delay                       [hot-path]
        get_time: tsc -> ns                     [64-bit]

task_sleep_until(date)                          [hot-path]
        delay = date - get_time()
                get_time: tsc -> ns             [64-bit]
        timer_start(delay, 0)
                date = get_time() + delay
                        get_time(): tsc -> ns   [64-bit]
                program_timer(date)
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
task_sleep_until(date)                          [hot-path]
        timer_start_abs(date, 0)
                program_timer(date)
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [hot-path]
        now = get_time()
                get_time: tsc -> ns             [64-bit]
        date <= now()?


4.3 Summary of the hot-path
---------------------------

              | total       | tsc->ns | ns->tsc | possible
              | conversions |         |         | 32-bit ns->tsc
--------------+-------------+---------+---------+----------------
TSC-based     | 3           | 2       | 1       | 1
TSC-based+ABS | 2           | 1       | 1       | 0
ns-based      | 5           | 4       | 1       | 0
ns-based+ABS  | 3           | 2       | 1       | 0



5. Absolute Timeouts
====================
(e.g. POSIX IPC mechanisms)

Started often, typically do not fire, include time-stamp acquisition,
often comparably large timeout values
=> hot-path is get_time(), timer_start(), and timer_stop()


5.1 TSC-based
-------------

date = get_time() + delay                       [hot-path]
        get_time: tsc -> ns                     [64-bit]

sem_timeddown(..., date)                        [hot-path]
        delay = date - get_time()
                get_time: tsc -> ns             [64-bit]
        timer_start(delay, 0)
                delay: ns -> tsc                [32-bit candidate]
                date = get_tsc() + delay
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_sem()
        timer_stop()
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
sem_timeddown(..., date)                        [hot-path]
        timer_start_abs(date, 0)
                date: ns -> tsc                 [64-bit]
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_sem()
        timer_stop()
                program_timer(date)             [rarely]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [rarely]
        date <= get_tsc()?


5.2 ns-based
------------

date = get_time() + delay                       [hot-path]
        get_time: tsc -> ns                     [64-bit]

sem_timeddown(..., date)                        [hot-path]
        delay = date - get_time()
                get_time: tsc -> ns             [64-bit]
        timer_start(delay, 0)
                date = get_time() + delay
                        get_time(): tsc -> ns   [64-bit]
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_sem()
        timer_stop()
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
-or-
sem_timeddown(..., date)                        [hot-path]
        timer_start_abs(date, 0)
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)
        block_on_sem()
        timer_stop()
                program_timer(date)             [rarely]
                        date: ns -> tsc         [64-bit]
                        delay = date - get_tsc()
                        set_hw_timer(delay)

timer_irq()                                     [rarely]
        now = get_time()
                get_time: tsc -> ns             [64-bit]
        date <= now()?


5.3 Summary of the hot-path
---------------------------

              | total       | tsc->ns | ns->tsc | possible
              | conversions |         |         | 32-bit ns->tsc
--------------+-------------+---------+---------+----------------
TSC-based     | 3           | 2       | 1       | 1
TSC-based+ABS | 2           | 1       | 1       | 0
ns-based      | 3           | 3       | 0       | 0
ns-based+ABS  | 1           | 1       | 0       | 0

[Please don't take every detail above for granted. Some bugs may sleep
even there. Too many conversions...]


To summarise these lengthy results:

 o ns-based xntimers are nice on first sight, but not on second. Most
   use-cases (except 5) require less conversions when we keep the
   abstraction as it is.

 o Performance should be improvable by combining fast_tsc_to_ns for full
   64-bit conversions with nodiv_imuldiv for short relative ns-to-tsc.
   It should be ok to loose some accuracy wrt to long periods given that
   TSC are AFAIK not very accurate themselves. Nevertheless, to keep
   precision on 64-bit ns-to-tsc reverse conversions, those should
   remain implemented as they are:
   "if (ns <= ULONG_MAX) nodiv_imuldiv else xnarch_ns_to_tsc"

 o A further improvement should be achievable for scenarios 4 and 5 by
   introducing absolute xntimers (more precisely: a flag to
   differentiate between the mode on xntimer_start). I have an outdated
   patch for this in my repos, needs re-basing.

To verify that we actually improve something with each of the changes
above, some kind of fine-grained test suite will be required. The
timerbench could be extended to support all 5 scenarios. But does
someone have any quick idea how to evaluate the overall performances
best? The new per-task statistics code is not accurate enough as it
accounts IRQs mostly to the preempted task, not the preempting one. Mm,
execution time of some long-running number-crunching Linux task in the
background?

Looking forward to feedback!

Jan


PS: Finally, after stabilising the xntimers again, we will see a nice
rtdm_timer API as well. But those patches need even more re-basing then...


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

  parent reply	other threads:[~2006-07-25 18:26 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-13 10:51 [Xenomai-core] ns vs. tsc as internal timer base Jan Kiszka
2006-06-13 11:16 ` Philippe Gerum
2006-06-13 11:56   ` Jan Kiszka
2006-06-13 12:31     ` Philippe Gerum
2006-06-13 13:07       ` Gilles Chanteperdrix
2006-06-13 13:28         ` Philippe Gerum
2006-06-13 13:34           ` Gilles Chanteperdrix
2006-06-13 13:45             ` Philippe Gerum
2006-06-13 13:33       ` Jan Kiszka
2006-06-13 13:51         ` Philippe Gerum
2006-06-13 16:19       ` Jan Kiszka
2006-06-13 16:29         ` Gilles Chanteperdrix
2006-06-13 17:04         ` Philippe Gerum
2006-06-13 17:13           ` Gilles Chanteperdrix
2006-06-13 17:58             ` Philippe Gerum
2006-06-14  9:25               ` Jim Cromie
2006-06-14 12:29                 ` Philippe Gerum
2006-06-14 13:07                   ` Jan Kiszka
2006-06-14 16:04                     ` Jan Kiszka
2006-07-25 18:26             ` Jan Kiszka [this message]
2006-07-27  8:53               ` [Xenomai-core] Timer optimisations, continued Philippe Gerum
2006-07-27 12:42                 ` Gilles Chanteperdrix
2006-07-27 13:19                   ` Philippe Gerum
2006-07-27 13:54                     ` Jan Kiszka
2006-07-27 14:10                       ` Philippe Gerum
2006-06-13 11:59 ` [Xenomai-core] ns vs. tsc as internal timer base Gilles Chanteperdrix
2006-06-13 12:00 ` Anders Blomdell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44C6626A.6000304@domain.hid \
    --to=jan.kiszka@domain.hid \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.