From: George Anzinger <george@mvista.com>
To: tglx@linutronix.de
Cc: john stultz <johnstul@us.ibm.com>,
Nish Aravamudan <nish.aravamudan@gmail.com>,
Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
dwalker@mvista.com,
"'high-res-timers-discourse@lists.sourceforge.net'"
<high-res-timers-discourse@lists.sourceforge.net>
Subject: Re: 2.6.13-rt6, ktimer subsystem
Date: Thu, 15 Sep 2005 15:35:31 -0700 [thread overview]
Message-ID: <4329F733.2090604@mvista.com> (raw)
In-Reply-To: <1126751140.6509.474.camel@tglx.tec.linutronix.de>
Thomas Gleixner wrote:
> On Wed, 2005-09-14 at 12:38 -0700, George Anzinger wrote:
>
>
>>Well, having spent a bit of time looking at the code it appears that a
>>lot of the ideas we looked at and discarded (see
>>high-res-timers-discourse@lists.sourceforge.net) are in this. Shame it
>>was all done with out reference or comment to that list, anyone on it or
>>even the lkml.
>
>
> Well, I'm considering to wear sackcloth and ashes. But this seems like
> the pot calling the kettle back as I don't remember a single relevant
> reference/comment from you on the UTIME/KURT mailing list after your
> UTIME->HRT fork.
Hm...All the work has been on HRT mailing list. In the early days the
UTIME/KURT folks were on the list, still may be for all I know.
Still, as much as sackcloth and ashes might become you or me, I would
rather try to learn how to be more open about what we are doing so that
what results is the best possible for the linux community.
>
>
>>A few of the top issues:
>>
>>time in nanoseconds 64-bits, requires a divide to do much of anything
>>with it. Divides are slow and should be avoided if possible.
>
>
> The divides are rare and definitely not in the hot paths. I'm sure that
> they can be replaced by some intelligent scaled math algorithm if it
> turns out to be necessary. The hot path instructions are simple
> add/sub/cmp which are less/equal expensive on a 32bit machine to an
> operation on struct timespec or an jiffies/arch_cycles pair. The non
> nsec based implementation gives a burden to 64bit machines and is
> provable wrong in the aspect of summing rounding errors of interval
> timers.
With out arguing with your "provable wrong", I will admit that the
64-bit LOOKS good and easy. At the same time I think we can abstract
all the required actions on the "nstime" thing and set it up so we can
use either a timespec or a s64 depending on the machine (or some such).
I have such a patch nearly ready...
>
>
~
>
>>The rbtree is a high overhead tree. I suspect performance problems
>>here.
>
>
> 1. rbtree is available out of the box
True.
>
> 2. rbtree is proven to be efficient - at least there are a couple of
> performance relevant users relying on it e.g. mm, ext3
>
> 3. I did insertion/removal tests with 10k entries (<2us on a 1GHz box in
> user space). This is way below the experienced and reproducible trouble
> of recascading. The penalty is completely on the thread which owns the
> timer for non signal related functions. For signal related functions
> only the removal on expiry is a penalty for the complete system (in the
> softirq)
Well, I never was a cascade fan.
In the early HRT patches the whole timer list was replaced with a hashed
list. It was O(N/M) on insertion where we could easily choose M (for a
while it was even a config option). Removal was just an unlink, same as
the cascade list.
To be clear on my take on this, as I understand it the rblist is
something like O(N*somelog 2). What is left out here is the fixed
overhead of F which is there even if N = 1. So we have something like
(F+O(f(N)) for a list. For the most part we don't look at F, but as
list complexity grows, it gets larger thus pushing out the cross over
point to a higher "N" when comparing two lists. I considered the rbtree
when doing the secondary list for HRT and concluded that "N" was small
enough that a simple O(N/2) list would do just fine as it would only
contain timers set to expire in the next jiffie.
I don't think an O(N/2) list is right for all timers (as in the
ktimers), but I still wonder if there isn't something better than and
rblist. Note, "wonder". I don't have such a list structure in hand.
>
> The cascading is a penalty for the complete system all the time.
>
> Performance is a straw man argument here. You know very well that > 90%
> of the timers are inaccurate "timeout" timers related to I/O,
> networking, devices. Most of those never expire (the positive feedback
> removes the timer before expiry) and those timers have no constraint to
> be accurate, except for the fact that they have to detect an
> device/network problem at some time. In this case it is completely
> irrelevant whether the timeout occurs n msecs earlier or later.
I agree, but it not accuracy that I am arguing, but cpu cycles. Those
we use in the kernel are not available for the user.
>
>
>>If it is the right answer here, then why not use it for normal
>>timers? A list of timers is a rather unique thing and, I think,
>>deserves a management structure that accounts for the fact that the
>>elements in the tree are perishable.
>
>
> The first goal was to separate out the timers from the timeout API and I
> believe that this separation is necessary.
>
> The implementation of ktimers is not at all restricted to the timers we
> addressed for now, it can also be utilized for the timeout API without
> much effort.
>
> The performance improvement is enormous despite the alleged 64bit math
> overhead.
>
> Testcase on a 600MHz CeleronM, 512MB RAM:
>
> 16 cyclic SCHED_FIFO tasks using clock_nanosleep(ABSTIME)
> base interval = 1000 us, offset per task = 50 us
> tracing each latency value to disk
>
> 16 cyclic SCHED_OTHER tasks using clock_nanosleep(ABSTIME)
> base interval = 1000 us, offset per task = 500 us
> tracing each latency value to disk
>
> while true; do hackbench 50; done
> cat /dev/hda | nc atlas 4711
>
> This injects a (insane) load avg of >600 permantely
>
> 2.6.13-rt4 (cascade based implementation)
> 16 cyclic SCHED_FIFO tasks using clock_nanosleep(ABSTIME)
>
> task loops min max average sigma prio
> 0 999999 5 869 19 20 80
> 1 999999 9 883 18 22 79
> 2 999999 9 927 19 23 78
> 3 999999 5 908 21 28 77
> 4 999999 0 1056 22 33 76
> 5 999999 0 973 23 33 75
> 6 999999 0 926 23 33 74
> 7 999999 1 893 24 33 73
> 8 999999 2 942 23 34 72
> 9 999999 1 868 24 34 71
> 10 999999 0 912 23 34 70
> 11 999999 0 911 28 46 69
> 12 999999 0 912 28 46 68
> 13 999999 9 967 28 46 67
> 14 999999 9 954 28 46 66
> 15 999999 9 946 28 46 65
>
>
> 2.6.13-rt11 (ktimers based implementation)
> 16 cyclic SCHED_FIFO tasks using clock_nanosleep(ABSTIME)
>
> task loops min max average sigma prio
> 0 999999 9 76 20 4 80
> 1 999999 8 84 20 4 79
> 2 999999 8 98 21 4 78
> 3 999999 9 121 20 4 77
> 4 999999 8 124 20 4 76
> 5 999999 9 140 20 4 75
> 6 999999 10 103 22 5 74
> 7 999999 9 99 21 5 73
> 8 999999 8 95 21 5 72
> 9 999999 9 148 21 5 71
> 10 999999 9 141 22 6 70
> 11 999999 9 143 22 5 69
> 12 999999 8 129 20 4 68
> 13 999999 9 149 21 5 67
> 14 999999 9 135 21 5 66
> 15 999999 8 230 22 5 65
I confess I don't understand the above numbers. What are min and max
and in what units? Are you saying the large max numbers are caused by
the cascade?
>
>
>
>>It appears that the "monotonic_clock" is being used to drive ktimers.
>>The "monotonic_clock" was NEVER meant to poke outside of the kernel. It
>>is a raw kernel clock that is only required to be monotonic with nothing
>>said about accuracy. It should NOT be confused with CLOCK_MONOTONIC
>>which is directly tied to xtime and therefor is ntp corrected.
>
>
> I'm well aware of that and this is a completely different playground.
>
> The ktimers implementation is _independent_ of the clock sources. The
> current clock source implementation may suck, but thats not a problem of
> ktimers at all. Its a problem of arch/XXX/timeYY and kernel/time.c. I
> did not spend too much time on that as John Stultz is working on the
> timesource and timeofday stuff and I really hope that this gets
> accepted.
>
> The relation ship of clock sources and their accuracy is also a
> worthwhile field of discussion.
>
>
>>These are only the concerns I have from having a rather quick look at
>>the code. I am sure that there are other issues...
>
>
> It would be hubristic to deny that :)
>
> OTH,
>
> - The posix timer tests run all successful, except the broken 2timertest
> which fails on any other HRT kernel too and the sleep to long for real
> timers when the clock is set backwards, which is easily solvable
> (working on that).
Your mileage seems to differ from mine. Here is what I get from ./do_test:
The following tests failed:
clock_nanosleeptest
abs_timer_test
4-1
clock_settimetest
clock_gettimetest2
2timer_test
Then, on the second run, it crashed in an attempt to get the monotonic
clock (a divide error). System is a dual PIII, 800Mhz. This from the
rt11 patch.
>
> - The performance improvements in combination with simpler code is an
> argument of itself
--
George Anzinger george@mvista.com
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/
next prev parent reply other threads:[~2005-09-15 22:36 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-09-13 10:00 2.6.13-rt6, ktimer subsystem Ingo Molnar
2005-09-13 19:59 ` Lee Revell
2005-09-13 20:06 ` Lee Revell
2005-09-13 20:10 ` Ingo Molnar
2005-09-13 20:36 ` Lee Revell
2005-09-15 7:55 ` Ingo Molnar
2005-09-15 11:37 ` Ingo Molnar
2005-09-14 15:56 ` Darren Hart
2005-09-14 22:09 ` Darren Hart
2005-09-14 19:38 ` George Anzinger
2005-09-15 2:25 ` Thomas Gleixner
2005-09-15 22:35 ` George Anzinger [this message]
2005-09-15 22:53 ` Thomas Gleixner
2005-09-15 23:10 ` George Anzinger
2005-09-15 23:09 ` Daniel Walker
2005-09-16 0:08 ` George Anzinger
2005-09-15 9:20 ` Ingo Molnar
2005-09-15 23:04 ` George Anzinger
2005-09-15 23:20 ` Thomas Gleixner
2005-09-15 9:43 ` Roman Zippel
2005-09-26 7:02 ` 2.6.14-rc2-rt2 Ingo Molnar
2005-09-27 6:13 ` 2.6.14-rc2-rt2 Eran Mann
2005-09-27 10:33 ` 2.6.14-rc2-rt2 Ingo Molnar
2005-09-27 16:59 ` 2.6.14-rc2-rt2 Fernando Lopez-Lezcano
2005-09-27 22:15 ` 2.6.14-rc2-rt2 Thomas Gleixner
2005-09-27 23:11 ` 2.6.14-rc2-rt2 Fernando Lopez-Lezcano
2005-09-27 23:10 ` 2.6.14-rc2-rt2 Daniel Walker
2005-09-28 3:04 ` 2.6.14-rc2-rt2 Fernando Lopez-Lezcano
2005-09-28 9:48 ` 2.6.14-rc2-rt2 Ingo Molnar
2005-09-28 16:34 ` 2.6.14-rc2-rt2 Fernando Lopez-Lezcano
2005-09-29 9:07 ` 2.6.14-rc2-rt2 Eran Mann
2005-09-28 9:10 ` 2.6.14-rc2-rt2 Peter Zijlstra
2005-09-29 16:45 ` 2.6.14-rc2-rt2 Badari Pulavarty
2005-09-30 10:58 ` 2.6.14-rc2-rt2 Ingo Molnar
2005-10-02 15:18 ` 2.6.14-rc3-rt1 Ingo Molnar
2005-10-02 15:42 ` 2.6.14-rc3-rt1 Mark Knecht
2005-10-02 19:25 ` 2.6.14-rc3-rt1 Mark Knecht
2005-10-06 17:13 ` 2.6.14-rc3-rt1 Steven Rostedt
2005-10-07 11:09 ` [patch] pcmcia-shutdown-fix.patch Ingo Molnar
2005-10-07 19:17 ` Russell King
2005-10-07 19:46 ` Steven Rostedt
2005-10-10 15:13 ` Steven Rostedt
2005-10-10 15:37 ` Steven Rostedt
2005-10-02 20:51 ` 2.6.14-rc3-rt1 Felix Oxley
2005-10-02 21:55 ` 2.6.14-rc3-rt1 Felix Oxley
2005-10-03 6:33 ` 2.6.14-rc3-rt1 Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4329F733.2090604@mvista.com \
--to=george@mvista.com \
--cc=dwalker@mvista.com \
--cc=high-res-timers-discourse@lists.sourceforge.net \
--cc=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=nish.aravamudan@gmail.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox