* Re: MAP_SHARED bizarrely slow
2004-10-27 8:06 ` Andrew Morton
@ 2004-10-27 8:20 ` David Gibson
2004-10-27 8:30 ` James Cloos
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2004-10-27 8:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: James Cloos, linux-kernel
On Wed, Oct 27, 2004 at 01:06:59AM -0700, Andrew Morton wrote:
> James Cloos <cloos@jhcloos.com> wrote:
> >
> > >>>>> "David" == David Gibson <david@gibson.dropbear.id.au> writes:
> >
> > David> http://www.ozlabs.org/people/dgibson/maptest.tar.gz
> >
> > David> On a number of machines I've tested - both ppc64 and x86 - the
> > David> SHARED version is consistently and significantly (50-100%)
> > David> slower than the PRIVATE version.
> >
> > Just gave it a test on my laptop and server. Both are p3. The
> > laptop is under heavier mem pressure; the server has just under
> > a gig with most free/cache/buff. Laptop is still running 2.6.7
> > whereas the server is bk as of 2004-10-24.
> >
> > Both took about 11 seconds for the private and around 30 seconds
> > for the shared tests.
> >
>
> I get the exact opposite, on a P4:
>
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 10.81s user 0.05s system 100% cpu 10.855 total
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 11.04s user 0.05s system 100% cpu 11.086 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.91s user 0.02s system 100% cpu 26.903 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.89s user 0.02s system 100% cpu 26.894 total
> vmm:/home/akpm/maptest> uname -a
> Linux vmm 2.6.10-rc1-mm1 #14 SMP Tue Oct 26 23:23:23 PDT 2004 i686 i686 i386 GNU/Linux
How very odd. I've now understood what was happening (see other
post), but I'm not sure what could reverse the situation. Can you
download the test tarball again? I've put up an updated version which
pretouches the pages and gives some extra info. Running it both with
and without pretouch would be interesting (toggle the #if 0/1 in
matmul.h to change).
> It's all user time so I can think of no reason apart from physical page
> allocation order causing additional TLB reloads in one case. One is using
> anonymous pages and the other is using shmem-backed pages, although I can't
> think why that would make a difference.
>
>
> Let's back out the no-buddy-bitmap patches:
>
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 12.01s user 0.06s system 99% cpu 12.087 total
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 12.56s user 0.05s system 100% cpu 12.607 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.74s user 0.03s system 99% cpu 26.776 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.66s user 0.02s system 100% cpu 26.674 total
>
> much the same.
>
> Backing out "[PATCH] tweak the buddy allocator for better I/O merging" from
> June 24 makes no difference.
>
--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: MAP_SHARED bizarrely slow
2004-10-27 8:06 ` Andrew Morton
2004-10-27 8:20 ` David Gibson
@ 2004-10-27 8:30 ` James Cloos
2004-10-27 20:54 ` Bill Davidsen
2004-10-28 5:54 ` dean gaudet
3 siblings, 0 replies; 10+ messages in thread
From: James Cloos @ 2004-10-27 8:30 UTC (permalink / raw)
To: linux-kernel; +Cc: david, Andrew Morton
>>>>> "Andrew" == Andrew Morton <akpm@osdl.org> writes:
JimC> Both took about 11 seconds for the private and around 30 seconds
JimC> for the shared tests.
Andrew> I get the exact opposite, on a P4:
Interesting. I gave it a try on a couple of my UMLs. One is on a P4
(possibly a Xeon; not sure) and the other is on an Athlon. The P4 did
shared about twice as fast as private, and the Athlon did it 50% faster.
(The P4 runs UML kernel 2.4.27; the Athlon 2.6.6; no idea what the hosts run.)
-JimC
* Re: MAP_SHARED bizarrely slow
2004-10-27 8:06 ` Andrew Morton
2004-10-27 8:20 ` David Gibson
2004-10-27 8:30 ` James Cloos
@ 2004-10-27 20:54 ` Bill Davidsen
2004-10-28 1:16 ` David Gibson
2004-10-28 5:54 ` dean gaudet
3 siblings, 1 reply; 10+ messages in thread
From: Bill Davidsen @ 2004-10-27 20:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: James Cloos, linux-kernel, david
Andrew Morton wrote:
> James Cloos <cloos@jhcloos.com> wrote:
>
>>>>>>>"David" == David Gibson <david@gibson.dropbear.id.au> writes:
>>
>>David> http://www.ozlabs.org/people/dgibson/maptest.tar.gz
>>
>>David> On a number of machines I've tested - both ppc64 and x86 - the
>>David> SHARED version is consistently and significantly (50-100%)
>>David> slower than the PRIVATE version.
>>
>>Just gave it a test on my laptop and server. Both are p3. The
>>laptop is under heavier mem pressure; the server has just under
>>a gig with most free/cache/buff. Laptop is still running 2.6.7
>>whereas the server is bk as of 2004-10-24.
>>
>>Both took about 11 seconds for the private and around 30 seconds
>>for the shared tests.
>>
>
>
> I get the exact opposite, on a P4:
>
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 10.81s user 0.05s system 100% cpu 10.855 total
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 11.04s user 0.05s system 100% cpu 11.086 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.91s user 0.02s system 100% cpu 26.903 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.89s user 0.02s system 100% cpu 26.894 total
> vmm:/home/akpm/maptest> uname -a
> Linux vmm 2.6.10-rc1-mm1 #14 SMP Tue Oct 26 23:23:23 PDT 2004 i686 i686 i386 GNU/Linux
>
> It's all user time so I can think of no reason apart from physical page
> allocation order causing additional TLB reloads in one case. One is using
> anonymous pages and the other is using shmem-backed pages, although I can't
> think why that would make a difference.
I think the cause was covered in another post; I'm surprised that the
page overhead is reported as user time. It would have been a good hint
if the big jump were in system time.
Yes, I know some kernel time is charged to the user; I'm just not sure
diddling the page tables should be, since it might mask the effect of VM
changes, etc.
That's a comment, not a suggestion.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
* Re: MAP_SHARED bizarrely slow
2004-10-27 20:54 ` Bill Davidsen
@ 2004-10-28 1:16 ` David Gibson
0 siblings, 0 replies; 10+ messages in thread
From: David Gibson @ 2004-10-28 1:16 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Andrew Morton, James Cloos, linux-kernel
On Wed, Oct 27, 2004 at 04:54:42PM -0400, Bill Davidsen wrote:
> Andrew Morton wrote:
> >James Cloos <cloos@jhcloos.com> wrote:
> >
> >>>>>>>"David" == David Gibson <david@gibson.dropbear.id.au> writes:
> >>
> >>David> http://www.ozlabs.org/people/dgibson/maptest.tar.gz
> >>
> >>David> On a number of machines I've tested - both ppc64 and x86 - the
> >>David> SHARED version is consistently and significantly (50-100%)
> >>David> slower than the PRIVATE version.
> >>
> >>Just gave it a test on my laptop and server. Both are p3. The
> >>laptop is under heavier mem pressure; the server has just under
> >>a gig with most free/cache/buff. Laptop is still running 2.6.7
> >>whereas the server is bk as of 2004-10-24.
> >>
> >>Both took about 11 seconds for the private and around 30 seconds
> >>for the shared tests.
> >>
> >
> >
> >I get the exact opposite, on a P4:
> >
> >vmm:/home/akpm/maptest> time ./mm-sharemmap
> >./mm-sharemmap 10.81s user 0.05s system 100% cpu 10.855 total
> >vmm:/home/akpm/maptest> time ./mm-sharemmap
> >./mm-sharemmap 11.04s user 0.05s system 100% cpu 11.086 total
> >vmm:/home/akpm/maptest> time ./mm-privmmap
> >./mm-privmmap 26.91s user 0.02s system 100% cpu 26.903 total
> >vmm:/home/akpm/maptest> time ./mm-privmmap
> >./mm-privmmap 26.89s user 0.02s system 100% cpu 26.894 total
> >vmm:/home/akpm/maptest> uname -a
> >Linux vmm 2.6.10-rc1-mm1 #14 SMP Tue Oct 26 23:23:23 PDT 2004 i686 i686
> >i386 GNU/Linux
> >
> >It's all user time so I can think of no reason apart from physical page
> >allocation order causing additional TLB reloads in one case. One is using
> >anonymous pages and the other is using shmem-backed pages, although I can't
> >think why that would make a difference.
>
> I think the cause was covered in another post; I'm surprised that the
> page overhead is reported as user time. It would have been a good hint
> if the big jump were in system time.
The cause isn't page overhead. The problem is that the SHARED version
actually uses a whole lot more real memory, so cache performance is
much worse. So the time really is in userland.
--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
* Re: MAP_SHARED bizarrely slow
2004-10-27 8:06 ` Andrew Morton
` (2 preceding siblings ...)
2004-10-27 20:54 ` Bill Davidsen
@ 2004-10-28 5:54 ` dean gaudet
3 siblings, 0 replies; 10+ messages in thread
From: dean gaudet @ 2004-10-28 5:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: James Cloos, linux-kernel, david
On Wed, 27 Oct 2004, Andrew Morton wrote:
> I get the exact opposite, on a P4:
>
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 10.81s user 0.05s system 100% cpu 10.855 total
> vmm:/home/akpm/maptest> time ./mm-sharemmap
> ./mm-sharemmap 11.04s user 0.05s system 100% cpu 11.086 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.91s user 0.02s system 100% cpu 26.903 total
> vmm:/home/akpm/maptest> time ./mm-privmmap
> ./mm-privmmap 26.89s user 0.02s system 100% cpu 26.894 total
> vmm:/home/akpm/maptest> uname -a
> Linux vmm 2.6.10-rc1-mm1 #14 SMP Tue Oct 26 23:23:23 PDT 2004 i686 i686 i386 GNU/Linux
>
> It's all user time so I can think of no reason apart from physical page
> allocation order causing additional TLB reloads in one case. One is using
> anonymous pages and the other is using shmem-backed pages, although I can't
> think why that would make a difference.
You're experiencing the wonder of the L1 data cache on the P4. Based on
its behaviour, I'm pretty sure that early in the pipeline it uses the
virtual address to match a virtual tag and proceeds with that data as if
it's correct; not until the TLB lookup and physical tag check many cycles
later does it realise it's done something wrong and kill / flush the
pipeline.
When you set up a virtual alias, like you have with the shared zero page,
it becomes very confused.
In fact, if you do something as simple as a 4-element pointer chase in
which the cache lines for elements 0 and 2 alias, and the cache lines for
elements 1 and 3 alias, you can watch some P4s take up to 3000 cycles per
reference.
-dean