From: Andi Kleen <ak@suse.de>
To: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Vladimir B. Savkin" <master@sectorb.msk.ru>,
Jesper Dangaard Brouer <hawk@diku.dk>,
Harry Edmon <harry@atmos.washington.edu>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Date: Mon, 18 Sep 2006 17:54:37 +0200
Message-ID: <200609181754.37623.ak@suse.de>
In-Reply-To: <20060918153822.GA805@ms2.inr.ac.ru>
On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote:
> Hello!
>
> > For netdev: I'm more and more thinking we should just avoid the problem
> > completely and switch to "true end2end" timestamps. This means don't
> > time stamp when a packet is received, but only when it is delivered
> > to a socket.
>
> This will work.
>
> From viewpoint of existing uses of timestamp by packet socket
> this time is not worse. The only danger is violation of causality
> (when forwarded packet or reply packet gets timestamp earlier than
> original packet).
Hmm, not sure how that could happen. Also, would it be a real problem
even if it could?
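The "true end2end" timestamping proposed above can be sketched in userspace C as follows. The packet and socket structures and the functions rx_packet() and deliver_to_socket() are hypothetical stand-ins, not the kernel's sk_buff code: the receive path leaves the stamp empty, and the clock is read only when a packet reaches a socket that asked for timestamps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical packet descriptor: tstamp_ns == 0 means "not stamped yet". */
struct pkt {
    uint64_t tstamp_ns;
};

/* Hypothetical socket: wants_tstamp stands in for an option like SO_TIMESTAMP. */
struct sock {
    bool wants_tstamp;
};

/* Counts clock reads so the saving is visible. */
static unsigned long clock_reads;

static uint64_t read_clock_ns(void)
{
    struct timespec ts;

    clock_reads++;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Receive path: no clock read at all, the stamp is left empty. */
static void rx_packet(struct pkt *p)
{
    assert(p != NULL);
    p->tstamp_ns = 0;
}

/* Delivery path: stamp lazily, and only for sockets that asked for it. */
static void deliver_to_socket(struct pkt *p, const struct sock *sk)
{
    assert(p != NULL && sk != NULL);
    if (sk->wants_tstamp && p->tstamp_ns == 0)
        p->tstamp_ns = read_clock_ns();
}
```

With one ordinary socket and one timestamping socket (say, tcpdump's), only the packet delivered to the latter costs a clock read. The cost of each read is unchanged, which is exactly the point made below: a too slow clock source remains a too slow clock source.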
> > handler runs. Then the problem above would completely disappear.
>
> Well, not completely. Too slow clock source remains too slow clock source.
> If it is so slow that it results in "performance degradation", it just
> should not be used at all; even such a pariah as tcpdump wants to be fast.
>
> Actually, I have a question. Why is the subject
> "Network performance degradation from 2.6.11.12 to 2.6.16.20"?
> I do not see beginning of the thread and cannot guess
> why clock source degraded. :-)
It's a long and sad story.
Old kernels didn't disable the TSC on those boxes (multi-core K8) and assumed
the TSCs were synchronized for timing purposes.
This mostly worked at first as long as you didn't use cpufreq,
but over a longer uptime the TSCs would drift against each other and the
time read on different CPUs would diverge more and more.
On older K8 steppings this drift happened much more slowly (more
aggressive power saving in HLT in newer steppings made it worse; that is why
idle=poll helps) and could often be ignored. But technically it was still a
bug there, because it could break timing after long uptimes.
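A small simulation of why drifting TSCs break a global clock. It assumes a hypothetical nominal rate of 2 GHz used for every CPU, and models the ticks one CPU loses (for example in HLT) as counting a hypothetical 1 ppm slower; sim_tsc() and tsc_to_ns() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nominal TSC rate assumed for every CPU: 2 GHz = 2 ticks/ns. */
#define NOMINAL_TICKS_PER_NS 2u

/* Simulated TSC value of one CPU after real_ns nanoseconds of real time.
 * ppm_slow models ticks lost to power saving as a parts-per-million
 * slower count; 0 means the CPU counts exactly at the nominal rate. */
static uint64_t sim_tsc(uint64_t real_ns, uint64_t ppm_slow)
{
    assert(ppm_slow < 1000000);
    uint64_t ideal = real_ns * NOMINAL_TICKS_PER_NS;
    return ideal - ideal / 1000000 * ppm_slow;
}

/* Naive global conversion: assumes all TSCs are synchronized and tick
 * at the nominal rate, as the old kernels did. */
static uint64_t tsc_to_ns(uint64_t tsc)
{
    return tsc / NOMINAL_TICKS_PER_NS;
}
```

Under these assumptions, after one day of uptime a read on the drifting CPU taken 1 ms after a read on the other CPU comes out about 85 ms earlier, so time visibly runs backwards when a task moves between CPUs, and the gap keeps growing with uptime.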
New multi-socket K8 boxes are generally
totally unusable with the TSC because they use cpufreq and the TSCs can run
at completely different frequencies, which obviously doesn't give very
good timing information if you assume the TSC is globally synchronized.
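A worked version of the frequency problem, assuming the kernel converts TSC ticks to time with one global calibrated rate f_0 while a given CPU's TSC actually ticks at f under cpufreq. The factor-of-two example is illustrative, not a measured value:

```latex
\[
\Delta t_{\mathrm{measured}} = \frac{\Delta\mathrm{TSC}}{f_0}
  = \frac{f}{f_0}\,\Delta t_{\mathrm{real}}
\]
\[
f = \tfrac{1}{2} f_0,\quad \Delta t_{\mathrm{real}} = 10\ \mathrm{ms}
  \;\Longrightarrow\; \Delta t_{\mathrm{measured}} = 5\ \mathrm{ms}
\]
\[
\text{two CPUs at } f_1, f_2 \text{ for real time } T:\quad
t^{(1)}_{\mathrm{measured}} - t^{(2)}_{\mathrm{measured}} = \frac{f_1 - f_2}{f_0}\,T
\]
```

With f_1 = f_0 and f_2 = f_0/2, the two CPUs disagree by 0.5 s after only 1 s of real time, which is why such boxes cannot use the TSC as a global clock at all.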
That is why later kernels default to TSC off. The original plan
was to use the HPET then, which is slower than the TSC, but still not that bad.
But while most modern systems have an HPET timer somewhere in the chipset,
nearly all BIOS vendors "forgot" to describe it in the BIOS because Windows
didn't use it, so Linux can't find it.
Then it has to fall back to the ACPI PM timer (pmtmr), which is really, really slow.
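The fallback order described above (TSC, then HPET, then the ACPI PM timer) can be sketched as a simple preference chain. The struct platform fields, the enum, and pick_clock_source() are hypothetical; this is an illustration of the reasoning in the mail, not the kernel's actual selection code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum clock_src { CS_TSC, CS_HPET, CS_PMTMR, CS_NONE };

/* Hypothetical summary of what boot code found on a machine. */
struct platform {
    bool tsc_synchronized; /* false on multi-core / multi-socket K8 */
    bool hpet_described;   /* false if the BIOS "forgot" to describe the HPET */
    bool pmtmr_present;    /* ACPI PM timer */
};

/* Return the cheapest usable time source, in the order given in the mail. */
static enum clock_src pick_clock_source(const struct platform *p)
{
    assert(p != NULL);
    if (p->tsc_synchronized)
        return CS_TSC;   /* a single rdtsc instruction: cheap */
    if (p->hpet_described)
        return CS_HPET;  /* memory-mapped counter: slower, but not that bad */
    if (p->pmtmr_present)
        return CS_PMTMR; /* typically an I/O port read: very slow */
    return CS_NONE;      /* hypothetical: nothing usable found */
}
```

On a multi-core K8 whose BIOS does not describe the HPET, the chain falls all the way through to the PM timer, which is the situation behind the slowdown reported in this thread.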
The overhead of the pmtmr is so large that you can clearly see it in
the network benchmark.
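A rough way to see why a slow clock source shows up directly in a network benchmark: if each received packet needs n clock reads of cost t_read, then at R packets per second the fraction of one CPU spent just reading the clock is approximately as below. The numbers in the second line are hypothetical, chosen only to show the arithmetic; they are not measurements from this thread.

```latex
\[
\text{CPU fraction spent reading the clock} \approx R \cdot n \cdot t_{\mathrm{read}}
\]
\[
R = 10^{5}\ \mathrm{packets/s},\quad n = 1,\quad t_{\mathrm{read}} = 1\ \mu\mathrm{s}
  \;\Longrightarrow\; 10^{5} \cdot 1 \cdot 10^{-6} = 0.1 = 10\%
\]
```

The same product shows what socket-delivery timestamping can and cannot do: it lowers the effective n for traffic nobody timestamps, but leaves t_read itself untouched.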
The real long-term fix is to change the timer subsystem to keep all TSC
state per CPU; then it'll work on the K8s too. Unfortunately it's a moderately
hard problem to keep the result fully monotonic. But people are working
on it.
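A minimal sketch of the per-CPU idea, assuming each CPU carries its own calibration (base values and current rate, which cpufreq could update) and that a single shared "largest time returned so far" is used to keep results monotonic. All names are hypothetical, and the clamp is just one simple way to enforce monotonicity, not the design the timer rework adopted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NCPUS 2 /* hypothetical machine size */

/* Hypothetical per-CPU calibration: each CPU converts its own TSC. */
struct cpu_clock {
    uint64_t tsc_base;     /* TSC value at the last calibration point */
    uint64_t ns_base;      /* global time (ns) at that point */
    uint64_t ticks_per_us; /* current rate of this CPU's TSC */
};

static struct cpu_clock cpu_clocks[NCPUS];
static _Atomic uint64_t last_ns; /* largest time handed out so far */

/* Convert a TSC reading taken on `cpu` using only that CPU's state. */
static uint64_t cpu_tsc_to_ns(int cpu, uint64_t tsc)
{
    assert(cpu >= 0 && cpu < NCPUS);
    const struct cpu_clock *c = &cpu_clocks[cpu];
    assert(tsc >= c->tsc_base);
    return c->ns_base + (tsc - c->tsc_base) * 1000 / c->ticks_per_us;
}

/* Per-CPU estimates can still disagree a little, so never return less
 * than a value already returned: a lock-free atomic maximum. */
static uint64_t monotonic_ns(int cpu, uint64_t tsc)
{
    uint64_t now = cpu_tsc_to_ns(cpu, tsc);
    uint64_t prev = atomic_load(&last_ns);

    while (now > prev) {
        if (atomic_compare_exchange_weak(&last_ns, &prev, now))
            return now;
        /* On failure prev now holds the current last_ns; re-check. */
    }
    return prev;
}
```

The clamp hides backward steps, but a CPU whose estimate lags will see time stand still until it catches up, and every read writes a shared cache line. Those trade-offs are part of why a fully monotonic per-CPU design is harder than it looks.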
-Andi
Thread overview: 58+ messages
[not found] <4492D5D3.4000303@atmos.washington.edu>
2006-06-17 22:35 ` Network performance degradation from 2.6.11.12 to 2.6.16.20 Andrew Morton
2006-06-17 23:23 ` Harry Edmon
2006-06-17 23:56 ` Andrew Morton
2006-06-18 3:16 ` Stephen Hemminger
2006-06-18 23:23 ` Harry Edmon
2006-06-19 13:54 ` Harry Edmon
2006-06-20 2:11 ` Herbert Xu
2006-06-19 14:47 ` Jesper Dangaard Brouer
2006-06-19 15:24 ` Andi Kleen
2006-06-19 17:34 ` Chris Friesen
2006-06-19 20:39 ` Andi Kleen
2006-06-19 18:24 ` Jesper Dangaard Brouer
2006-06-25 21:51 ` Harry Edmon
2006-06-26 4:20 ` Bill Fink
2006-06-25 22:22 ` Willy Tarreau
2006-06-26 5:23 ` Andi Kleen
2006-07-04 11:41 ` Jesper Dangaard Brouer
2006-07-04 11:54 ` Andi Kleen
2006-07-10 10:55 ` Jesper Dangaard Brouer
2006-09-16 12:08 ` Vladimir B. Savkin
2006-09-18 8:35 ` Andi Kleen
2006-09-18 9:03 ` Vladimir B. Savkin
2006-09-18 9:58 ` Andi Kleen
2006-09-18 10:29 ` Vladimir B. Savkin
2006-09-18 11:27 ` Andi Kleen
2006-09-18 15:38 ` Alexey Kuznetsov
2006-09-18 15:54 ` Andi Kleen [this message]
2006-09-18 16:28 ` Alexey Kuznetsov
2006-09-18 16:50 ` Andi Kleen
2006-09-18 21:03 ` Alexey Kuznetsov
2006-09-18 21:22 ` David Miller
2006-09-18 21:46 ` Alexey Kuznetsov
2006-09-19 5:55 ` Andi Kleen
2006-09-19 20:31 ` Thomas Graf
2006-09-19 20:43 ` Andi Kleen
2006-09-19 5:52 ` Andi Kleen
2006-09-18 21:18 ` Vladimir B. Savkin
2006-09-18 22:00 ` Alexey Kuznetsov
2006-09-18 21:57 ` David Lang
2006-09-19 19:40 ` David Miller
2006-09-19 19:44 ` Stephen Hemminger
2006-09-18 22:03 ` Vladimir B. Savkin
2006-09-19 19:41 ` David Miller
2006-09-19 19:47 ` David Miller
2006-09-22 15:35 ` Alexey Kuznetsov
2006-09-22 15:43 ` Andi Kleen
2006-09-22 16:51 ` Rick Jones
2007-03-06 13:25 ` Packet timestamps (was: Re: Network performance degradation from 2.6.11.12 to 2.6.16.20) Vladimir B. Savkin
2007-03-06 14:38 ` Eric Dumazet
2007-03-06 14:43 ` Vladimir B. Savkin
2007-03-06 15:16 ` Eric Dumazet
2007-03-06 18:15 ` Vladimir B. Savkin
2006-09-18 21:08 ` Network performance degradation from 2.6.11.12 to 2.6.16.20 Vladimir B. Savkin
2006-09-18 14:09 ` David Miller
2006-09-18 14:29 ` Andi Kleen
2006-09-18 15:19 ` Alan Cox
2006-09-18 15:19 ` Andi Kleen
2006-06-19 16:40 ` Harry Edmon