public inbox for linux-kernel@vger.kernel.org
* 2.6.11 vs 2.6.10 slowdown on i686
@ 2005-03-17 12:16 Ian Pratt
  2005-03-17 12:37 ` Nick Piggin
  2005-03-17 18:36 ` Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Ian Pratt @ 2005-03-17 12:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: garloff, ak


Folks, 

When we upgraded arch xen/x86 to kernel 2.6.11, we noticed a slowdown
on a number of micro-benchmarks. In order to investigate, I built
native (non Xen) i686 uniprocessor kernels for 2.6.10 and 2.6.11 with
the same configuration and ran lmbench-3.0-a3 on them. The test
machine was a 2.4GHz Xeon box, gcc 3.3.3 (FC3 default) was used to
compile the kernels, NOHIGHMEM=y (2-level only).

On the i686 fork and exec benchmarks I found that there's been a
significant slowdown between 2.6.10 and 2.6.11. Some of the other
numbers are a bit ugly too (see attached).

fork: 166 -> 235  (40% slowdown)
exec: 857 -> 1003 (17% slowdown)

I'm guessing this is down to the 4 level pagetables. This is rather a
surprise as I thought the compiler would optimise most of these
changes away. Apparently not. 

Anyhow, this explains the arch Xen results we were seeing.

Results appended, median of 6 runs.

Best,
Ian


Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh  
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
commando-  Linux 2.6.10 2400 0.49 0.57 2.06 3.06 19.6 0.89 2.70 166. 857. 2972
commando-  Linux 2.6.11 2400 0.49 0.60 2.12 3.35 20.8 0.92 2.73 235. 1003 3168

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
commando-  Linux 2.6.10 7.5800 4.3300 8.1900 5.1100   33.1 8.37000    41.9
commando-  Linux 2.6.11 7.9200 8.3200 8.3200 5.8300   26.6 9.46000    40.4

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
commando-  Linux 2.6.10 7.750  19.4 21.3  37.2  45.5  42.5  53.2  76.
commando-  Linux 2.6.11 7.920  20.3 23.6  40.2  50.1  46.5  57.6  87.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
commando-  Linux 2.6.10   39.3   16.2   92.7   35.2   122.0 1.200 2.14310  18.3
commando-  Linux 2.6.11   40.8   16.8   99.5   36.7   163.0 1.075 2.27760  18.8

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
commando-  Linux 2.6.10 313. 440. 222. 1551.7 1528.5  549.1  566.8 1550 784.8
commando-  Linux 2.6.11 554. 450. 224. 1564.8 1548.3  549.9  574.6 1528 760.5
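
The headline percentages can be recomputed from the fork and exec columns of the first table. The following is a minimal sketch; the function name `slowdown_pct` is made up for illustration, and the only inputs are the four latencies reported above. By this arithmetic the fork regression comes out nearer 42% than the rounded 40% quoted in the message.

```python
def slowdown_pct(before_us, after_us):
    """Relative increase in latency, in percent (positive means slower)."""
    return (after_us - before_us) / before_us * 100.0


# (2.6.10, 2.6.11) latencies in microseconds, taken from the table above.
results = {"fork": (166.0, 235.0), "exec": (857.0, 1003.0)}

for name, (old, new) in results.items():
    print(f"{name}: {old:.0f} -> {new:.0f} us ({slowdown_pct(old, new):.1f}% slower)")
```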



* Re: 2.6.11 vs 2.6.10 slowdown on i686
  2005-03-17 12:16 2.6.11 vs 2.6.10 slowdown on i686 Ian Pratt
@ 2005-03-17 12:37 ` Nick Piggin
  2005-03-17 20:23   ` Ian Pratt
  2005-03-18  8:25   ` Kurt Garloff
  2005-03-17 18:36 ` Andi Kleen
  1 sibling, 2 replies; 6+ messages in thread
From: Nick Piggin @ 2005-03-17 12:37 UTC (permalink / raw)
  To: Ian Pratt; +Cc: linux-kernel, garloff, ak

Ian Pratt wrote:
> Folks, 
> 
> When we upgraded arch xen/x86 to kernel 2.6.11, we noticed a slowdown
> on a number of micro-benchmarks. In order to investigate, I built
> native (non Xen) i686 uniprocessor kernels for 2.6.10 and 2.6.11 with
> the same configuration and ran lmbench-3.0-a3 on them. The test
> machine was a 2.4GHz Xeon box, gcc 3.3.3 (FC3 default) was used to
> compile the kernels, NOHIGHMEM=y (2-level only).
> 
> On the i686 fork and exec benchmarks I found that there's been a
> significant slowdown between 2.6.10 and 2.6.11. Some of the other
> numbers a bit ugly too (see attached).
> 
> fork: 166 -> 235  (40% slowdown)
> exec: 857 -> 1003 (17% slowdown)
> 
> I'm guessing this is down to the 4 level pagetables. This is rather a
> surprise as I thought the compiler would optimise most of these
> changes away. Apparently not. 
> 

There are some changes in the current -bk tree (which are a
bit in-flux at the moment) which introduce some optimisations.

They should bring 2-level performance close to par with 2.6.10.
If not, complain again :)

Thanks,
Nick



* Re: 2.6.11 vs 2.6.10 slowdown on i686
  2005-03-17 12:16 2.6.11 vs 2.6.10 slowdown on i686 Ian Pratt
  2005-03-17 12:37 ` Nick Piggin
@ 2005-03-17 18:36 ` Andi Kleen
  1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2005-03-17 18:36 UTC (permalink / raw)
  To: Ian Pratt; +Cc: linux-kernel, garloff, ak

On Thu, Mar 17, 2005 at 12:16:40PM +0000, Ian Pratt wrote:
> 
> Folks, 
> 
> When we upgraded arch xen/x86 to kernel 2.6.11, we noticed a slowdown
> on a number of micro-benchmarks. In order to investigate, I built
> native (non Xen) i686 uniprocessor kernels for 2.6.10 and 2.6.11 with
> the same configuration and ran lmbench-3.0-a3 on them. The test
> machine was a 2.4GHz Xeon box, gcc 3.3.3 (FC3 default) was used to
> compile the kernels, NOHIGHMEM=y (2-level only).

Hmm, it is known that x86-64 performance is down because it touches
a lot more memory now on fork/exit. I have some optimizations planned to fix
that; in fact, it should be faster in the end.

i386 slowdowns are unexpected though.

I remember I tested i386 briefly with lmbench with my original 4level
patch, and there weren't any significant slowdowns. However, the patch
that eventually went into mainline was very different, and in particular
clear_page_range(), which is very critical, looks completely different
now and does more work than before. Perhaps the slowdown happens in this
area.

diffprofile of before and after would be interesting.

-Andi
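
The comparison suggested above boils down to subtracting per-function sample counts from two profiling runs and ranking the differences. A minimal sketch of that idea follows. It is not the actual diffprofile script: the dictionary input format is an assumption, and the function names and counts in the usage example are invented placeholders, not measurements from either kernel.

```python
def diff_profiles(before, after):
    """Return (symbol, delta) pairs, largest absolute change first.

    `before` and `after` map a function name to its sample count
    (an assumed input format, not the output of any particular profiler).
    """
    symbols = set(before) | set(after)
    deltas = {sym: after.get(sym, 0) - before.get(sym, 0) for sym in symbols}
    # Sort by magnitude of change, then by name for a stable order.
    return sorted(deltas.items(), key=lambda item: (-abs(item[1]), item[0]))


# Invented placeholder counts, for illustration only.
profile_old = {"func_a": 100, "func_b": 50}
profile_new = {"func_a": 180, "func_b": 45, "func_c": 10}

for symbol, delta in diff_profiles(profile_old, profile_new):
    print(f"{delta:+6d}  {symbol}")
```

Functions that gained the most samples between the two runs sort to the top, which is where a regression in something like clear_page_range() would be expected to show up.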


* Re: 2.6.11 vs 2.6.10 slowdown on i686
  2005-03-17 12:37 ` Nick Piggin
@ 2005-03-17 20:23   ` Ian Pratt
  2005-03-18  8:25   ` Kurt Garloff
  1 sibling, 0 replies; 6+ messages in thread
From: Ian Pratt @ 2005-03-17 20:23 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ian Pratt, linux-kernel, garloff, ak, Ian.Pratt


> There are some changes in the current -bk tree (which are a
> bit in-flux at the moment) which introduce some optimisations.
> 
> They should bring 2-level performance close to par with 2.6.10.
> If not, complain again :)

The good news is that with a BK snapshot from today
[md5key=4238cb8e36_Z5Cgys8rTovspboIJpw] performance is rather
improved relative to 2.6.11:

 fork: 166 -> 187   -13%
 exec: 857 -> 909   -6%

Rather better than -40%, but still not brilliant.

Any more improvements in the pipeline?

Ian



Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh  
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
commando-  Linux 2.6.10 2400 0.49 0.57 2.06 3.06 19.6 0.89 2.70 166. 857. 2972
commando-  Linux 2.6.12 2400 0.49 0.60 2.37 3.43 20.9 0.91 2.64 187. 909. 3076

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
commando-  Linux 2.6.10 7.5800 4.3300 8.1900 5.1100   33.1 8.37000    41.9
commando-  Linux 2.6.12 7.7400 7.9200 8.3700 5.1600   27.0 9.32000    36.5

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
commando-  Linux 2.6.10 7.750  19.4 21.3  37.2  45.5  42.5  53.2  76.
commando-  Linux 2.6.12 7.740  18.2 23.1  37.4  45.6  42.6  54.9  80.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
commando-  Linux 2.6.10   39.3   16.2   92.7   35.2   122.0 1.200 2.14310  18.3
commando-  Linux 2.6.12   38.7   16.4   94.1   35.1   148.0 1.029 2.25100  18.0

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
commando-  Linux 2.6.10 313. 440. 222. 1551.7 1528.5  549.1  566.8 1550 784.8
commando-  Linux 2.6.12 556. 477. 224. 1540.3 1551.4  566.5  566.6 1551 786.2
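
To see how much of the 2.6.11 regression the BK snapshot (labelled 2.6.12 in the tables) wins back, the three fork and exec figures from this thread can be combined as below. This is only a sketch; `recovered_fraction` is a made-up helper name, and the inputs are the latencies reported in the two sets of tables.

```python
def recovered_fraction(baseline, regressed, current):
    """Share of the baseline->regressed slowdown that `current` has undone."""
    return (regressed - current) / (regressed - baseline)


# (2.6.10, 2.6.11, BK snapshot) latencies in microseconds from the tables.
latencies = {"fork": (166.0, 235.0, 187.0), "exec": (857.0, 1003.0, 909.0)}

for name, (base, bad, now) in latencies.items():
    still_slower = (now - base) / base * 100.0
    regained = recovered_fraction(base, bad, now) * 100.0
    print(f"{name}: {regained:.0f}% of the regression recovered, "
          f"still {still_slower:.1f}% slower than 2.6.10")
```

On these numbers roughly two thirds of the lost time is recovered for both benchmarks, which matches the "better, but still not brilliant" reading above.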




* Re: 2.6.11 vs 2.6.10 slowdown on i686
  2005-03-17 12:37 ` Nick Piggin
  2005-03-17 20:23   ` Ian Pratt
@ 2005-03-18  8:25   ` Kurt Garloff
  2005-03-18  8:46     ` Nick Piggin
  1 sibling, 1 reply; 6+ messages in thread
From: Kurt Garloff @ 2005-03-18  8:25 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Linux kernel list, Ian Pratt, Andi Kleen


Hi Nick,

On Thu, Mar 17, 2005 at 11:37:24PM +1100, Nick Piggin wrote:
> Ian Pratt wrote:
> >fork: 166 -> 235  (40% slowdown)
> >exec: 857 -> 1003 (17% slowdown)
> >
> >I'm guessing this is down to the 4 level pagetables. This is rather a
> >surprise as I thought the compiler would optimise most of these
> >changes away. Apparently not. 
> 
> There are some changes in the current -bk tree (which are a
> bit in-flux at the moment) which introduce some optimisations.
> 
> They should bring 2-level performance close to par with 2.6.10.
> If not, complain again :)

Is there a clean patchset that we should look at to test?

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.



* Re: 2.6.11 vs 2.6.10 slowdown on i686
  2005-03-18  8:25   ` Kurt Garloff
@ 2005-03-18  8:46     ` Nick Piggin
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2005-03-18  8:46 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Linux kernel list, Ian Pratt, Andi Kleen

Kurt Garloff wrote:
> Hi Nick,
> 

Hi Kurt!

> On Thu, Mar 17, 2005 at 11:37:24PM +1100, Nick Piggin wrote:
> 
>>Ian Pratt wrote:
>>
>>>fork: 166 -> 235  (40% slowdown)
>>>exec: 857 -> 1003 (17% slowdown)
>>>
>>>I'm guessing this is down to the 4 level pagetables. This is rather a
>>>surprise as I thought the compiler would optimise most of these
>>>changes away. Apparently not. 
>>
>>There are some changes in the current -bk tree (which are a
>>bit in-flux at the moment) which introduce some optimisations.
>>
>>They should bring 2-level performance close to par with 2.6.10.
>>If not, complain again :)
> 
> 
> Is there a clean patchset that we should look at to test?
> 

Probably the best thing would be to wait and see what happens
with the ptwalk patches. There is a fix in there for ia64 now,
but I think that may be a temporary one.

Andi is probably keeping an eye on that, but if not then I
could put a patchset together when things finalise in 2.6.

From the profiles I have seen, the ptwalk patches bring page
table walking performance pretty well back to 2.6.10 levels,
however the "aggressive page table freeing" (clear_page_range)
changes that went in at the same time as the 4level stuff
seem to be what is slowing down exit() and unmapping performance.

Not by a huge amount, mind you, and it is not completely wasted
performance, because it provides better page table freeing.
But it is enough to be annoying! I haven't had much time to look
at it lately, but I hope to get onto it soon.

Nick
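
As background to the 2-level case discussed in this thread, the way a generic 4-level walk collapses to two real lookups can be pictured with index arithmetic. The sketch below assumes the usual non-PAE i386 layout (10-bit directory index, 10-bit table index, 12-bit page offset) and models the two folded middle levels as single-entry tables that share the top-level shift. The shifts, table sizes and the helper name `walk_indices` are assumptions for illustration, not code from the kernel or from this thread.

```python
# (level name, address shift, entries per table); assumed non-PAE i386 layout.
LEVELS = [
    ("pgd", 22, 1024),  # real top-level directory
    ("pud", 22, 1),     # folded: one entry, so the index is always 0
    ("pmd", 22, 1),     # folded: one entry, so the index is always 0
    ("pte", 12, 1024),  # real page table
]


def walk_indices(vaddr):
    """Index used at each level when translating a 32-bit virtual address."""
    return {name: (vaddr >> shift) & (entries - 1)
            for name, shift, entries in LEVELS}


print(walk_indices(0xC0123456))
```

In this model the pud and pmd steps always select entry 0, so they carry no information. The question raised in the thread is whether the generic code around those steps still costs something at run time.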


