From: Brian Gerst <bgerst@didntduck.org>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Kevin Pedretti <ktpedre@sandia.gov>, linux-kernel@vger.kernel.org
Subject: Re: [Bug 350] New: i386 context switch very slow compared to 2.4 due to wrmsr (performance)
Date: Tue, 18 Mar 2003 13:30:54 -0500
Message-ID: <3E7765DE.10609@didntduck.org>
In-Reply-To: <Pine.LNX.4.44.0303180809190.11381-100000@home.transmeta.com>

Linus Torvalds wrote:
> On Tue, 18 Mar 2003, Kevin Pedretti wrote:
>
>> I wasn't aware of what you state below but it makes sense. What I
>> haven't been able to figure out, and nobody seems to know, is why the
>> rodata section of an executable is placed in the text section and is not
>> page aligned. This seems to be a mixing of code and data on the same
>> page. Maybe it doesn't matter since it is read only?
>
>
> It's a bad idea to share even read-only data, but the impact of read-only
> data is much less than read-write. In particular, you should avoid sharing
> _any_ code and data in the same physical L1 cache-line, since that will be
> a big problem for any CPU with exclusion between the I$ and D$.
>
> HOWEVER, modern x86 CPU's tend to have the I$ be part of the cache
> coherency protocol, so instead of having exclusion they allow sharing as
> long as the D$ isn't actually dirty. In that case it's fine to share
> read-only data and code, although the cache utilization goes down if you
> do a lot of it.
>
> Anyway, as long as they are in separate cache-lines, you should be ok even
> on something with cache exclusion.
>
> When it comes to actually _writing_ to the data, at least on the P4 you
> don't want to have read-write data anywhere _near_ the I$ (somebody
> reported half-page granularity). This is true on crusoe too, btw (at a
> 128-byte granularity).
>
> Anyway, I think gcc should make sure that even the ro-data section is at
> least cacheline-aligned so that it stays away from cachelines used for I$.
> That makes sense even on CPU's that don't have exclusion, since it
> actually gives slightly better L1 cache utilization.
>
> You can run this (stupid) test-program to try. On my P4 I get
>
> empty overhead=320 cycles
> load overhead=0 cycles
> I$ load overhead=0 cycles
> I$ load overhead=0 cycles
> I$ store overhead=264 cycles
>
> and on my PIII I get
>
> empty overhead=74 cycles
> load overhead=8 cycles
> I$ load overhead=8 cycles
> I$ load overhead=8 cycles
> I$ store overhead=103 cycles
>
> and (just for fun) on an old crusoe I get
>
> empty overhead=67 cycles
> load overhead=-9 cycles
> I$ load overhead=-14 cycles
> I$ load overhead=-14 cycles
> I$ store overhead=12 cycles
>
> where that "negative overhead" just shows that we do some strange things to
> scheduling, and the loop actually ends up faster if it has a load in it
> than without the load..
>
> But you can see that storing to code is a really bad idea. Especially on a
> P4, where the overhead for a store was 264 cycles! (You can also see the
> cost of doing just the empty synchronization and rdtsc - 320 cycles for a
> rdtsc and two locked memory accesses on a P4).
>
> I don't have access to an old Pentium - I think that was the one that had
> the strict exclusion between the L1 I$ and D$, and then you should see the
> I$ load overhead go up.
>
> Linus

Here's a few more data points:

vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 451.037

empty overhead=105 cycles
load overhead=-2 cycles
I$ load overhead=30 cycles
I$ load overhead=90 cycles
I$ store overhead=95 cycles

vendor_id : GenuineIntel
cpu family : 6
model : 3
model name : Pentium II (Klamath)
stepping : 3
cpu MHz : 265.913

empty overhead=73 cycles
load overhead=10 cycles
I$ load overhead=10 cycles
I$ load overhead=10 cycles
I$ store overhead=2 cycles

vendor_id : AuthenticAMD
cpu family : 6
model : 6
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 1409.946

empty overhead=11 cycles
load overhead=5 cycles
I$ load overhead=5 cycles
I$ load overhead=5 cycles
I$ store overhead=826 cycles

The Athlon XP shows really bad behavior when you store to the text area.
--
Brian Gerst