From: James Bruce <bruce@andrew.cmu.edu>
To: Roman Zippel <zippel@linux-m68k.org>
Cc: David Lang <david.lang@digitalinsight.com>,
Kyle Moffett <mrmacman_g4@mac.com>,
Steven Rostedt <rostedt@goodmis.org>,
johnstul@us.ibm.com, george@mvista.com, mingo@elte.hu,
akpm@osdl.org, linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
ray-gmail@madrabbit.org, Russell King <rmk+lkml@arm.linux.org.uk>
Subject: Re: [patch 00/43] ktimer reworked
Date: Wed, 07 Dec 2005 04:35:49 -0500 [thread overview]
Message-ID: <4396ACF5.3050204@andrew.cmu.edu> (raw)
In-Reply-To: <Pine.LNX.4.61.0512021124360.1609@scrub.home>
Hi,
Roman Zippel wrote:
> On Thu, 1 Dec 2005, David Lang wrote:
>>In addition, once you remove the bulk of these uses from the picture (by
>>makeing them use a new timer type that's optimized for their useage pattern,
>>the 'unlikly to expire' case) the remainder of the timer users easily fall
>>into the catagory where the timer is expected to expire, so that code can
>>accept a performance hit for removing events prior to them going off that
>>would not be acceptable in a general case version.
>
> Guys, before you continue spreading nonsense, please read carefully Ingos
> description of the timer wheel at http://lwn.net/Articles/156329/ .
> Let me also refine the statement I made in this mail: the _focus_ on
> delivery is complete nonsense.
Must you start every email with inflammatory rhetoric? If you want to
know why you find it difficult to get people to see things your way, the
key is in the above paragraph. In everyday life you don't insult a
person on the street and then ask them for directions.
Yes, the expiry/non-expiry distinction is an approximation and perhaps
an oversimplification. However, after insulting others for that, you
continue with your own oversimplification of the algorithms involved.
Following your words, I could say "Roman, before you continue spreading
nonsense, please go back and read your algorithms textbook". The
reality though is that both of you are approximately correct, and
neither post deserves to be called nonsense.
> The delivery is really not the important part, what is important is the
> _lifetime_ of the timer. As Ingo said we try to delay as much work as
> possible into the future, so that all the work needed for short-lived
> timer is basically:
>
> list_add() + list_del()
>
> This is a constant operation and whether at the end is a callback is
> unimportant from the perspective of the timer system.
Timeout-style timers imply a short lifetime, independent of their
maximum expiry time. Regular timers expected to expire can have their
lifetime predicted accurately by looking at their expiry time. An
interface which gives a hint as to the type of timer allows us to
predict the lifetime. Please tell me how this tight relation is nonsense?
You are right in that the lifetime is what is important, but the whole
point of the ktimer distinction is that by knowing if something is a
timer vs a timeout, *we can more accurately predict the lifetime*.
> When the timer spends more time in the timer wheel, it has to be moved
> into different slots over time, but this not a really expensive operation
> either, so e.g. all the work needed with a single cascading step is:
>
> list_add() + list_del() + list_add() + list_del()
>
> This is still quite cheap and with a single cascading step we cover 2^14
> jiffies (2^10 for small configurations), which is quite a lot of time and
> whether in that time the timer is delivered or not doesn't change above
> cost. Another important thing to realize is that this cost is independent
> of the amount of timers, the per timer cost depends only on the timeout
> value.
On the other hand, there is a huge difference between *amortized*
constant time, and constant time. The cascade falls into the former
category, and affects latency a great deal. It's cheap *per timer*, but
by batching so much work to be done at once, it's not cheap to execute
the *cascade operation*. If you are putting important, latency
sensitive timers in the same data structure as non-latency-sensitive
timeouts, it is going to hurt accuracy and timeliness. For timeouts we
don't care much, since they rarely cascade; Timers which expire *will*
go through all the cascades based on their expiry time, and if there are
many of them, worst-case latency will suffer. In a mathematical sense
then, it's not O(1) and to call it so is incorrect. It's "amortized
O(log(l))", where "l" is the lifetime of the timer, and the minimum
resolution is constant.
> So let's look at the new timer which uses a rbtree. Its per timer cost
> doesn't depend on the expiry value, but on the size of the tree instead.
> All you have to do with the timer is:
>
> tree_insert() + tree_remove()
>
> This is not a constant operation, with O(log(n)) it grows quite slowly,
> but in any case it's more expensive than a simple list_add/list_del, this
> means you have to do a number of list operations before it becomes more
> expensive than a single tree operation. The nonconstant cost also means
> the more timer start using the rbtree, the relatively cheaper it becomes
> to use the timer wheel again.
Thank you for a very good argument about why timeouts shouldn't use
rbtree, and should continue to use the timer wheel. Nobody disagrees on
this however, and adding ktimers will not force any existing users to
change to the new interface.
> The break-even point may now be different on various machines, but I think
> it's safe to assume that two list add/del is at least as cheap and usually
> cheaper then a tree add/del. This means timers which run for less than
> 2^14 jiffies are better off using the timer wheel, unless they require the
> higher resolution of the new timer system.
Again, putting timeouts on the timer wheel is ideal, since we know they
tend to have short lifetimes. Same goes for low-resolution timers which
only need jiffy accuracy. However, jiffy accuracy doesn't cut it for a
lot of applications. It is when we add high accuracy that the timer
wheel falls down, and requires a different approach. So let's call "dt"
the desired resolution of the timer in seconds. Then the timer wheel
becomes "amortized O(log(l/dt)) = O(log(l) + log(1/dt))". When you
start talking about resolutions where dt=25usec, then the timer wheel
all the sudden becomes worse than a balanced tree, which is always O(n),
independent of resolution.
And that's the whole *point* about how we got here. Let the low
resolution, low lifetime timeouts stay on the timer wheel, and make a
new approach that specializes in handling longer lifetime, higher
resolution timers. That's ktimers in a nutshell. You seem to be
arguing for it rather than against it.
> Moving timers away from the timer wheel will also not help with the
> problem cases of the timer wheel. If you have a million network timer, a
> cascading step for thousands of timer takes time, but it doesn't change
> the cost per timer, we just have to do the work that we were too lazy to
> do before. In this case it would be better to look into solutions which
> avoid generating millions of timer in first place.
Putting timers on an rbtree most definitely helps with the worst-case
latency of the timer wheel. That is an issue that some of us care very
deeply about.
You've brought up the fact that networking shouldn't use lots of timers
several times in the overall discussion. If you know how to do this,
I'm sure you can start sending patches to netdev and show them all how
stupid they've been all along. However, more likely you'll just find
out that just maybe the networking people really *have* thought about
the problem, and the solution they came up with is actually a pretty
good one.
At any rate, while you fix up all those "timer-abusing" subsystems
throughout the kernel, can we just try to improve the timer system in
the meantime?
> So can we please stop this likely/unlikely expiry nonsense? It's great if
> you want to tell aunt Tillie about kernel hacking, but it's terrible
> advice to kernel programmers. When it comes to choosing a timer
> implementation, the delivery is completely and utterly unimportant.
Expected expiry is a simple predictor of expected lifetime. If we knew
the lifetime, we could use that, but expiry is one hint that is easier
for the developer to provide. Really, we want to know "E[l]/dt" (E[] is
notation for expected value), but that's unrealistic to estimate. What
ktimers says is: if it's a timeout (E[l] is low and dt is high), use the
timer wheel, and if its a timer (E[l] is high and dt is low), use an
rbtree. In what way is that not a reasonable approach?
Jim Bruce
next prev parent reply other threads:[~2005-12-07 9:36 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-11-30 23:56 [patch 00/43] ktimer reworked Thomas Gleixner
2005-12-01 0:41 ` Andrew Morton
2005-12-01 2:19 ` Ingo Molnar
2005-12-01 3:32 ` Roman Zippel
2005-12-01 3:57 ` Kyle Moffett
2005-12-01 15:40 ` Roman Zippel
2005-12-01 16:22 ` Ray Lee
2005-12-01 16:51 ` Russell King
2005-12-01 17:44 ` Roman Zippel
2005-12-01 19:08 ` Steven Rostedt
2005-12-01 21:11 ` Roman Zippel
2005-12-01 22:03 ` Steven Rostedt
2005-12-02 0:29 ` Roman Zippel
2005-12-02 0:41 ` Kyle Moffett
2005-12-02 0:58 ` john stultz
2005-12-02 1:01 ` Roman Zippel
2005-12-02 1:09 ` Kyle Moffett
2005-12-02 1:24 ` Roman Zippel
2005-12-02 1:47 ` David Lang
2005-12-02 14:43 ` Roman Zippel
2005-12-02 15:41 ` Kyle Moffett
2005-12-07 9:35 ` James Bruce [this message]
2005-12-07 12:34 ` Roman Zippel
2005-12-07 14:15 ` Kyle Moffett
2005-12-07 15:03 ` Roman Zippel
2005-12-07 14:17 ` Steven Rostedt
2005-12-08 15:43 ` James Bruce
2005-12-02 2:51 ` Steven Rostedt
2005-12-04 1:28 ` Andrew James Wade
2005-12-05 19:40 ` Roman Zippel
2005-12-06 2:46 ` Andrew James Wade
2005-12-01 20:24 ` Andrew Morton
2005-12-01 21:19 ` Ingo Molnar
2005-12-01 21:51 ` Andrew Morton
2005-12-01 22:13 ` Kyle Moffett
2005-12-01 22:15 ` Christoph Hellwig
2005-12-02 0:02 ` Thomas Gleixner
2005-12-02 0:36 ` Kyle Moffett
2005-12-02 1:06 ` Andrew Morton
2005-12-02 14:42 ` John Stoffel
2005-12-02 2:21 ` Steven Rostedt
2005-12-02 0:46 ` Roman Zippel
2005-12-01 16:52 ` Roman Zippel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4396ACF5.3050204@andrew.cmu.edu \
--to=bruce@andrew.cmu.edu \
--cc=akpm@osdl.org \
--cc=david.lang@digitalinsight.com \
--cc=george@mvista.com \
--cc=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mrmacman_g4@mac.com \
--cc=ray-gmail@madrabbit.org \
--cc=rmk+lkml@arm.linux.org.uk \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=zippel@linux-m68k.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox