From: Nick Craig-Wood <ncw1@axis.demon.co.uk>
To: Linus Torvalds <torvalds@osdl.org>
Cc: William Lee Irwin III <wli@holomorphy.com>,
linux-kernel@vger.kernel.org, Rohit Seth <rohit.seth@intel.com>
Subject: Re: 2.6.0 Huge pages not working as expected
Date: Sat, 27 Dec 2003 09:01:42 +0000 [thread overview]
Message-ID: <20031227090142.GA8777@axis.demon.co.uk> (raw)
In-Reply-To: <Pine.LNX.4.58.0312261226560.14874@home.osdl.org>
On Fri, Dec 26, 2003 at 12:33:58PM -0800, Linus Torvalds wrote:
> On Fri, 26 Dec 2003, Nick Craig-Wood wrote:
> >
> > The results are just about the same - a slight slowdown for
> > hugepages...
>
> I don't think you are really testing the TLB - you are testing the data
> cache.
>
> And the thing is, using huge pages will mean that the pages are 1:1
> mapped, and thus get "perfectly" cache-coloured, while the anonymous mmap
> will give you random placement.
Mmmm, yes.
> And what you are seeing is likely the fact that random placement is
> guaranteed to not have any worst-case behaviour. While perfect
> cache-coloring very much _does_ have worst-case schenarios, and you're
> likely triggering one of them.
>
> In particular, using a pure power-of-two stride means that you are
> limiting your cache to a certain subset of the full result with the
> perfect coloring.
>
> This, btw, is why I don't like page coloring: it does give nicely
> reproducible results, but it does not necessarily improve performance.
> Random placement has a lot of advantages, one of which is a lot smoother
> performance degradation - which I personally think is a good thing.
>
> Try your program with non-power-of-two, and non-page-aligned strides. I
> suspect the results will change (but I suspect that the TLB wins will
> still be pretty much in the noise compared to the actual data cache
> effects).
Yes you are right and I should have thought have that as I know that
FFTs often have a bit of padding on each row to make them a non power
of two to avoid this effect!
Here are the results again with a some non power of two strides run on
a P4. Apart from the variable results the hugetlb ones are always
less than the small page ones.
Memory from /dev/zero
Testing memory at 0x42400000
span = 1, time = 12.103 ms, total = -973807672
span = 2, time = 21.051 ms, total = -973807672
span = 3, time = 28.391 ms, total = -973807672
span = 5, time = 44.004 ms, total = -973807672
span = 7, time = 60.622 ms, total = -973807672
span = 11, time = 96.537 ms, total = -973807672
span = 13, time = 116.335 ms, total = -973807672
span = 17, time = 153.163 ms, total = -973807672
span = 33, time = 276.764 ms, total = -973807672
span = 77, time = 282.419 ms, total = -973807672
span = 119, time = 287.168 ms, total = -973807672
span = 221, time = 298.292 ms, total = -973807672
span = 561, time = 343.215 ms, total = -973807672
span = 963, time = 418.078 ms, total = -973807672
span = 1309, time = 446.026 ms, total = -973807672
span = 2023, time = 253.098 ms, total = -973807672
span = 4335, time = 68.616 ms, total = -973807672
Memory from hugetlbfs
Testing memory at 0x41400000
span = 1, time = 12.059 ms, total = -973807672
span = 2, time = 20.745 ms, total = -973807672
span = 3, time = 28.324 ms, total = -973807672
span = 5, time = 43.683 ms, total = -973807672
span = 7, time = 60.228 ms, total = -973807672
span = 11, time = 95.680 ms, total = -973807672
span = 13, time = 115.695 ms, total = -973807672
span = 17, time = 152.603 ms, total = -973807672
span = 33, time = 275.821 ms, total = -973807672
span = 77, time = 280.759 ms, total = -973807672
span = 119, time = 285.515 ms, total = -973807672
span = 221, time = 295.163 ms, total = -973807672
span = 561, time = 335.941 ms, total = -973807672
span = 963, time = 411.387 ms, total = -973807672
span = 1309, time = 433.168 ms, total = -973807672
span = 2023, time = 119.780 ms, total = -973807672
span = 4335, time = 32.085 ms, total = -973807672
Isn't modern memory management fun ;-)
--
Nick Craig-Wood
ncw1@axis.demon.co.uk
next prev parent reply other threads:[~2003-12-27 9:01 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-12-26 10:54 2.6.0 Huge pages not working as expected Nick Craig-Wood
2003-12-26 11:56 ` William Lee Irwin III
2003-12-26 20:10 ` Nick Craig-Wood
2003-12-26 20:15 ` William Lee Irwin III
2003-12-26 20:33 ` Linus Torvalds
2003-12-27 3:36 ` Andrea Arcangeli
2003-12-27 4:01 ` Linus Torvalds
2003-12-27 9:28 ` David S. Miller
2003-12-27 15:58 ` Andrea Arcangeli
2003-12-27 9:01 ` Nick Craig-Wood [this message]
2004-01-06 14:24 ` Kurt Garloff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20031227090142.GA8777@axis.demon.co.uk \
--to=ncw1@axis.demon.co.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=rohit.seth@intel.com \
--cc=torvalds@osdl.org \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox