* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-02 15:37 Joakim Tjernlund
2010-03-02 15:37 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood

This set of patches tries to optimize the TLB code on 8xx even more.
If they work, it should be a noticeable performance boost.

I would be very happy if you could test them for me.

 - v2: Since Scott has done some testing of these patches,
       I am resending them with my SOB. Scott, can you "bless"
       these patches too?

Joakim Tjernlund (4):
  8xx: Optimze TLB Miss handlers
  8xx: Avoid testing for kernel space in ITLB Miss.
  8xx: Don't touch ACCESSED when no SWAP.
  8xx: Use SPRG2 and DAR registers to stash r11 and cr.

 arch/powerpc/kernel/head_8xx.S |   70 +++++++++++++++++++++++++++-------------
 1 files changed, 47 insertions(+), 23 deletions(-)

^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] 8xx: Optimze TLB Miss handlers
2010-03-02 15:37 [PATCH 0/4] 8xx: Optimize TLB Miss code Joakim Tjernlund
@ 2010-03-02 15:37 ` Joakim Tjernlund
2010-03-02 15:37 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood

This removes a couple of insns from the TLB Miss handlers
without changing functionality.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/kernel/head_8xx.S |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3ef743f..ecc4a02 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,17 +343,14 @@ InstructionTLBMiss:
 	cmpwi	cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
 	bne-	cr0, 2f
 
-	/* Clear PP lsb, 0x400 */
-	rlwinm	r10, r10, 0, 22, 20
-
 	/* The Linux PTE won't go exactly into the MMU TLB.
-	 * Software indicator bits 22 and 28 must be clear.
+	 * Software indicator bits 21 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
 	 * set.  All other Linux PTE bits control the behavior
 	 * of the MMU.
 	 */
 	li	r11, 0x00f0
-	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
+	rlwimi	r10, r11, 0, 0x07f8	/* Set 24-27, clear 21-23,28 */
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
 
@@ -444,9 +441,7 @@ DataStoreTLBMiss:
 
 	/* Honour kernel RO, User NA */
 	/* 0x200 == Extended encoding, bit 22 */
-	/* r11 = (r10 & _PAGE_USER) >> 2 */
-	rlwinm	r11, r10, 32-2, 0x200
-	or	r10, r11, r10
+	rlwimi	r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
 	/* r11 = (r10 & _PAGE_RW) >> 1 */
 	rlwinm	r11, r10, 32-1, 0x200
 	or	r10, r11, r10
-- 
1.6.4.4

^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss.
2010-03-02 15:37 ` [PATCH 1/4] 8xx: Optimze TLB Miss handlers Joakim Tjernlund
@ 2010-03-02 15:37 ` Joakim Tjernlund
2010-03-02 15:37 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood

Only modules will cause ITLB Misses as we always pin
the first 8MB of kernel memory.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/kernel/head_8xx.S |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index ecc4a02..84ca1d9 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -318,12 +318,16 @@ InstructionTLBMiss:
 	/* If we are faulting a kernel address, we have to use the
 	 * kernel page tables.
 	 */
+#ifdef CONFIG_MODULES
+	/* Only modules will cause ITLB Misses as we always
+	 * pin the first 8MB of kernel memory */
 	andi.	r11, r10, 0x0800	/* Address >= 0x80000000 */
 	beq	3f
 	lis	r11, swapper_pg_dir@h
 	ori	r11, r11, swapper_pg_dir@l
 	rlwimi	r10, r11, 0, 2, 19
 3:
+#endif
 	lwz	r11, 0(r10)	/* Get the level 1 entry */
 	rlwinm.	r10, r11,0,0,19	/* Extract page descriptor page address */
 	beq	2f		/* If zero, don't try to find a pte */
-- 
1.6.4.4

^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP.
2010-03-02 15:37 ` [PATCH 2/4] 8xx: Avoid testing for kernel space in ITLB Miss Joakim Tjernlund
@ 2010-03-02 15:37 ` Joakim Tjernlund
2010-03-02 15:37 ` [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr Joakim Tjernlund
0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood

Only the swap function cares about the ACCESSED bit in
the pte. Do not waste cycles updating ACCESSED when swap
is not compiled into the kernel.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/kernel/head_8xx.S |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 84ca1d9..6478a96 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -343,10 +343,11 @@ InstructionTLBMiss:
 	mfspr	r11, SPRN_MD_TWC	/* ....and get the pte address */
 	lwz	r10, 0(r11)	/* Get the pte */
 
+#ifdef CONFIG_SWAP
 	andi.	r11, r10, _PAGE_ACCESSED | _PAGE_PRESENT
 	cmpwi	cr0, r11, _PAGE_ACCESSED | _PAGE_PRESENT
 	bne-	cr0, 2f
-
+#endif
 	/* The Linux PTE won't go exactly into the MMU TLB.
 	 * Software indicator bits 21 and 28 must be clear.
 	 * Software indicator bits 24, 25, 26, and 27 must be
@@ -439,10 +440,11 @@ DataStoreTLBMiss:
 	 * r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
 	 * r10 = (r10 & ~PRESENT) | r11;
 	 */
+#ifdef CONFIG_SWAP
 	rlwinm	r11, r10, 32-5, _PAGE_PRESENT
 	and	r11, r11, r10
 	rlwimi	r10, r11, 0, _PAGE_PRESENT
-
+#endif
 	/* Honour kernel RO, User NA */
 	/* 0x200 == Extended encoding, bit 22 */
 	rlwimi	r10, r10, 32-2, 0x200 /* Copy USER to bit 22, 0x200 */
-- 
1.6.4.4

^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] 8xx: Use SPRG2 and DAR registers to stash r11 and cr.
2010-03-02 15:37 ` [PATCH 3/4] 8xx: Don't touch ACCESSED when no SWAP Joakim Tjernlund
@ 2010-03-02 15:37 ` Joakim Tjernlund
0 siblings, 0 replies; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-02 15:37 UTC (permalink / raw)
To: linuxppc-dev, Scott Wood

This avoids storing these registers in memory.
CPU6 errata will still use the old way.
Remove some G2 leftover accesses from 2.4

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/kernel/head_8xx.S |   49 +++++++++++++++++++++++++++++----------
 1 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 6478a96..1f1a04b 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -71,9 +71,6 @@ _ENTRY(_start);
  * in the first level table, but that would require many changes to the
  * Linux page directory/table functions that I don't want to do right now.
  *
- * I used to use SPRG2 for a temporary register in the TLB handler, but it
- * has since been put to other uses.  I now use a hack to save a register
- * and the CCR at memory location 0.....Someday I'll fix this.....
  *	-- Dan
  */
 	.globl	__start
@@ -302,8 +299,13 @@ InstructionTLBMiss:
 	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	SPRN_M_TW, r10	/* Save a couple of working registers */
 	mfcr	r10
+#ifdef CONFIG_8xx_CPU6
 	stw	r10, 0(r0)
 	stw	r11, 4(r0)
+#else
+	mtspr	SPRN_DAR, r10
+	mtspr	SPRN_SPRG2, r11
+#endif
 	mfspr	r10, SPRN_SRR0	/* Get effective address of fault */
 #ifdef CONFIG_8xx_CPU15
 	addi	r11, r10, 0x1000
@@ -359,13 +361,19 @@ InstructionTLBMiss:
 	DO_8xx_CPU6(0x2d80, r3)
 	mtspr	SPRN_MI_RPN, r10	/* Update TLB entry */
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	rfi
 2:
 	mfspr	r11, SPRN_SRR1
@@ -375,13 +383,20 @@ InstructionTLBMiss:
 	rlwinm	r11, r11, 0, 0xffff
 	mtspr	SPRN_SRR1, r11
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	li	r11, 0x00f0
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	b	InstructionAccess
 
 	. = 0x1200
@@ -392,8 +407,13 @@ DataStoreTLBMiss:
 	DO_8xx_CPU6(0x3f80, r3)
 	mtspr	SPRN_M_TW, r10	/* Save a couple of working registers */
 	mfcr	r10
+#ifdef CONFIG_8xx_CPU6
 	stw	r10, 0(r0)
 	stw	r11, 4(r0)
+#else
+	mtspr	SPRN_DAR, r10
+	mtspr	SPRN_SPRG2, r11
+#endif
 	mfspr	r10, SPRN_M_TWB	/* Get level 1 table entry address */
 
 	/* If we are faulting a kernel address, we have to use the
@@ -461,18 +481,24 @@ DataStoreTLBMiss:
 	 * of the MMU.
 	 */
 2:	li	r11, 0x00f0
-	mtspr	SPRN_DAR,r11	/* Tag DAR */
 	rlwimi	r10, r11, 0, 24, 28	/* Set 24-27, clear 28 */
 	DO_8xx_CPU6(0x3d80, r3)
 	mtspr	SPRN_MD_RPN, r10	/* Update TLB entry */
 
-	mfspr	r10, SPRN_M_TW	/* Restore registers */
+	/* Restore registers */
+#ifndef CONFIG_8xx_CPU6
+	mfspr	r10, SPRN_DAR
+	mtcr	r10
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
+	mfspr	r11, SPRN_SPRG2
+#else
+	mtspr	SPRN_DAR, r11	/* Tag DAR */
 	lwz	r11, 0(r0)
 	mtcr	r11
 	lwz	r11, 4(r0)
-#ifdef CONFIG_8xx_CPU6
 	lwz	r3, 8(r0)
 #endif
+	mfspr	r10, SPRN_M_TW
 	rfi
 
 /* This is an instruction TLB error on the MPC8xx.  This could be due
@@ -684,9 +710,6 @@ start_here:
 	tophys(r4,r2)
 	addi	r4,r4,THREAD	/* init task's THREAD */
 	mtspr	SPRN_SPRG_THREAD,r4
-	li	r3,0
-	/* XXX What is that for ? SPRG2 appears otherwise unused on 8xx */
-	mtspr	SPRN_SPRG2,r3	/* 0 => r1 has kernel sp */
 
 	/* stack */
 	lis	r1,init_thread_union@ha
-- 
1.6.4.4

^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-03-03 8:02 Heiko Schocher
2010-03-03 8:48 ` Joakim Tjernlund
0 siblings, 1 reply; 22+ messages in thread
From: Heiko Schocher @ 2010-03-03 8:02 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk
Hello Joakim,
I tried your 4 patches on a MPC855M based system:
-bash-3.2# cat /proc/cpuinfo
processor : 0
cpu : 8xx
clock : 66.000000MHz
revision : 0.0 (pvr 0050 0000)
bogomips : 8.25
timebase : 4125000
platform : TQM8xx
model : TQM8xx
Memory : 32 MB
-bash-3.2# cat /proc/version
Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
-bash-3.2#
First I looked at the boot time:
Booting Linux:
                                                   2.6.33   2.6.33-tuned
... until "Freeing unused kernel memory" message
    (= enter user space)                            ~4s      ~4s
... until "login:" message
    (= full multi-user mode)                        56s      56s
Then I ran a performance test with lmbench, see:
http://sourceforge.net/projects/lmbench
Here are the results:
(The first 4 rows are the results for the kernel without your patches,
the next 4 rows are the results for the kernel with your patches)
make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
L M B E N C H 3 . 0 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Basic system parameters
------------------------------------------------------------------------------
Host OS Description Mhz tlb cache mem scal
pages line par load
bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host OS intgr intgr intgr intgr intgr
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15.7 18.0 1.5600 124.2 203.1
tqm8xx Linux 2.6.33- 15.7 17.4 1.5800 121.1 202.8
tqm8xx Linux 2.6.33- 15.2 17.9 1.6200 124.2 202.7
tqm8xx Linux 2.6.33- 15.2 17.9 1.6000 125.0 204.0
tqm8xx Linux 2.6.33- 15.7 18.1 1.5600 124.7 204.4
tqm8xx Linux 2.6.33- 15.7 18.1 1.5800 124.2 202.8
tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.2
tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.0
Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS int64 int64 int64 int64 int64
bit add mul div mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx Linux 2.6.33- 15. 13.3 1952.2 1838.2
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1837.8
tqm8xx Linux 2.6.33- 15. 13.2 1886.7 1907.8
tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1838.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.0 1902.2
tqm8xx Linux 2.6.33- 15. 13.3 1887.4 1901.5
tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1893.0
tqm8xx Linux 2.6.33- 15. 13.3 1950.0 1900.4
Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host OS float float float float
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0
tqm8xx Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0
tqm8xx Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0
tqm8xx Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0
tqm8xx Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0
tqm8xx Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0
tqm8xx Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0
tqm8xx Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0
Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host OS double double double double
add mul div bogo
--------- ------------- ------ ------ ------ ------
tqm8xx Linux 2.6.33- 1562.4 2782.8 3730.7 12.6K
tqm8xx Linux 2.6.33- 1556.1 2781.5 3724.3 12.6K
tqm8xx Linux 2.6.33- 1513.9 2801.0 3726.4 12.8K
tqm8xx Linux 2.6.33- 1556.1 2780.9 3611.4 12.6K
tqm8xx Linux 2.6.33- 1570.5 2772.6 3742.1 12.6K
tqm8xx Linux 2.6.33- 1560.1 2703.0 3611.4 12.7K
tqm8xx Linux 2.6.33- 1560.4 2779.5 3760.7 12.7K
tqm8xx Linux 2.6.33- 1559.8 2773.0 3742.1 12.6K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3
tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7
tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1
tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0
tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0
tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8
tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9
tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4
*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749
tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754
tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772
tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742
tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706
tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702
tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711
tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696
*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host OS UDP RPC/ TCP RPC/ TCP
UDP TCP conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
tqm8xx Linux 2.6.33-
File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
UNIX reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:02 [PATCH 0/4] 8xx: Optimize TLB Miss code Heiko Schocher
@ 2010-03-03 8:48 ` Joakim Tjernlund
2010-03-03 8:59 ` Joakim Tjernlund
2010-03-03 10:10 ` Heiko Schocher
0 siblings, 2 replies; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 8:48 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
>
> Hello Joakim,
>
> I tried your 4 patches on a MPC855M based system:

Thanks a lot for testing this for me!

>
> -bash-3.2# cat /proc/cpuinfo
> processor : 0
> cpu : 8xx
> clock : 66.000000MHz
> revision : 0.0 (pvr 0050 0000)
> bogomips : 8.25
> timebase : 4125000
> platform : TQM8xx
> model : TQM8xx
> Memory : 32 MB
> -bash-3.2# cat /proc/version
> Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> -bash-3.2#
>
> First I looked for the Boottime:
>
> Booting Linux:
>
> 2.6.33 2.6.33tunned
> ... until "Freeing unused kernel memory" message (= enter user space) ~4s ~4s
> ... until "login:" message (= full multi-user mode) 56s 56s
>
> and I did a Performance test with lmbench, see:
> http://sourceforge.net/projects/lmbench
>
> Here the results:
> (The first 4 rows are the results for the kernel without your patches,
> the next 4 rows are the results for the kernel with your patches)
>
> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

>
> L M B E N C H 3 . 0 S U M M A R Y
> ------------------------------------
> (Alpha software, do not distribute)
>
> Basic system parameters
> ------------------------------------------------------------------------------
> Host OS Description Mhz tlb cache mem scal
> pages line par load
> bytes
> --------- ------------- ----------------------- ---- ----- ----- ------ ----
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1
> tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1
>
> Processor, Processes - times in microseconds - smaller is better
> ------------------------------------------------------------------------------
> Host OS Mhz null null open slct sig sig fork exec sh
> call I/O stat clos TCP inst hndl proc proc proc
> --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
> tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K
> tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K
> tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K
> tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K
> tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K
> tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K
> [SNIP integer/float test, these are not relevant]
>
> Context switching - times in microseconds - smaller is better
> -------------------------------------------------------------------------
> Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
> ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
> --------- ------------- ------ ------ ------ ------ ------ ------- -------
> tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3
> tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7
> tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1
> tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0
> tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0
> tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8
> tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9
> tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4
>
> *Local* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
> ctxsw UNIX UDP TCP conn
> --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
> tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749
> tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754
> tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772
> tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742
> tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706
> tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702
> tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711
> tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696
>
> *Remote* Communication latencies in microseconds - smaller is better
> ---------------------------------------------------------------------
> Host OS UDP RPC/ TCP RPC/ TCP
> UDP TCP conn
> --------- ------------- ----- ----- ----- ----- ----
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
> tqm8xx Linux 2.6.33-
>
> File & VM system latencies in microseconds - smaller is better
> -------------------------------------------------------------------------------
> Host OS 0K File 10K File Mmap Prot Page 100fd
> Create Delete Create Delete Latency Fault Fault selct
> --------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
> tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2
> tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4
> tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1
> tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0
> tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1
> tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9
> tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9
> tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3
>
> *Local* Communication bandwidths in MB/s - bigger is better
> -----------------------------------------------------------------------------
> Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem
> UNIX reread reread (libc) (hand) read write
> --------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
> tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0
> tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1
> tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1
>
> Memory latencies in nanoseconds - smaller is better
> (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> --------- ------------- --- ---- ---- -------- -------- -------
> tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7
> tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6
> tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache?
> tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache?
> make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'
>
> bye
> Heiko
>
> --
> DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
>

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:48 ` Joakim Tjernlund
@ 2010-03-03 8:59 ` Joakim Tjernlund
2010-03-03 10:10 ` Heiko Schocher
1 sibling, 0 replies; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 8:59 UTC (permalink / raw)
Cc: Scott Wood, linuxppc-dev, hs, Wolfgang Denk

>
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> >
> > Hello Joakim,
> >
> > I tried your 4 patches on a MPC855M based system:
>
> Thanks a lot for testing this for me!
>
> >
> > -bash-3.2# cat /proc/cpuinfo
> > processor : 0
> > cpu : 8xx
> > clock : 66.000000MHz
> > revision : 0.0 (pvr 0050 0000)
> > bogomips : 8.25
> > timebase : 4125000
> > platform : TQM8xx
> > model : TQM8xx
> > Memory : 32 MB
> > -bash-3.2# cat /proc/version
> > Linux version 2.6.33-rc6-01500-gbddcb41-dirty (hs@xpert.denx.de) (gcc version
> > 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> > -bash-3.2#
> >
> > First I looked for the Boottime:
> >
> > Booting Linux:
> >
> > 2.6.33 2.6.33tunned
> > ... until "Freeing unused kernel memory" message (= enter user space) ~4s ~4s
> > ... until "login:" message (= full multi-user mode) 56s 56s
> >
> > and I did a Performance test with lmbench, see:
> > http://sourceforge.net/projects/lmbench
> >
> > Here the results:
> > (The first 4 rows are the results for the kernel without your patches,
> > the next 4 rows are the results for the kernel with your patches)
> >
> > make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
>
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Forgot to ask for PIN_TLB too

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
2010-03-03 8:48 ` Joakim Tjernlund
2010-03-03 8:59 ` Joakim Tjernlund
@ 2010-03-03 10:10 ` Heiko Schocher
2010-03-03 10:38 ` Joakim Tjernlund
1 sibling, 1 reply; 22+ messages in thread
From: Heiko Schocher @ 2010-03-03 10:10 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
[...]
>> Here the results:
>> (The first 4 rows are the results for the kernel without your patches,
>> the next 4 rows are the results for the kernel with your patches)
>>
>> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
>
> I see both ups and downs in this test, don't quite understand why.
> What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Sorry, forgot to say, where to find the sources. You can find them
here:

http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-03 10:10 ` Heiko Schocher
@ 2010-03-03 10:38 ` Joakim Tjernlund
  2010-03-04 10:30 ` Heiko Schocher
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-03 10:38 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/03 11:10:10:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/03 09:02:47:
> [...]
> >> Here the results:
> >> (The first 4 rows are the results for the kernel without your patches,
> >> the next 4 rows are the results for the kernel with your patches)
> >>
> >> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'
> >
> > I see both ups and downs in this test, don't quite understand why.
> > What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?
>
> Sorry, forgot to say, where to find the sources. You can find them
> here:
>
> http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

OK, so you have SWAP=no, MODULES=yes, CPU6=no, CPU15=no.
PIN_TLB isn't listed in your defconfig, so I assume it is no?
MODULES=yes nullifies one optimization.

I don't understand the bad numbers for Prot Fault:

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 5917.2 3968.3  31.2K 4329.0  4147.0  18.8    34.1 135.2
tqm8xx    Linux 2.6.33- 5714.3 3937.0  32.3K 6060.6  4210.0  14.2    34.5 131.4
tqm8xx    Linux 2.6.33- 5747.1 4000.0  31.2K 4329.0  4114.0 7.692    34.0 133.1
tqm8xx    Linux 2.6.33- 5747.1 4081.6  30.3K 4273.5  4100.0  18.2    34.2 135.0
tqm8xx    Linux 2.6.33- 5714.3 3952.6  31.2K 4273.5  4130.0  33.5    35.1 136.1
tqm8xx    Linux 2.6.33- 5714.3 3906.2  31.2K 6060.6  4105.0  25.7    35.5 135.9
tqm8xx    Linux 2.6.33- 5681.8 3921.6  32.3K 4255.3  4144.0  23.5    35.0 134.9
tqm8xx    Linux 2.6.33- 5649.7 3937.0  30.3K 4237.3  4116.0  21.6    35.3 135.3

Could you try reverting the patch
  8xx: Don't touch ACCESSED when no SWAP.
and see if that makes a difference?

Turning on pinned TLBs (you must turn on ADVANCED_OPTIONS first) could be an
improvement, regardless of my patches.

 Jocke

^ permalink raw reply [flat|nested] 22+ messages in thread
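One way to judge whether the Prot Fault difference exceeds the run-to-run noise is to compare the mean and range of each group of four runs. The following is only a minimal sketch and was not posted in the thread: the helper name `summarize` is made up for illustration, and the values are copied from the Prot Fault column above (first four rows unpatched, next four patched).

```python
from statistics import mean

# Prot Fault latencies in microseconds, copied from the table above.
unpatched = [18.8, 14.2, 7.692, 18.2]   # rows 1-4: kernel without the patches
patched = [33.5, 25.7, 23.5, 21.6]      # rows 5-8: kernel with the patches


def summarize(values):
    """Return (mean, min, max) for one group of benchmark runs."""
    if not values:
        raise ValueError("need at least one measurement")
    return mean(values), min(values), max(values)


if __name__ == "__main__":
    for name, runs in (("unpatched", unpatched), ("patched", patched)):
        avg, low, high = summarize(runs)
        print(f"{name:10s} mean={avg:6.2f} us  range={low:.3f}..{high:.3f} us")
```

With these eight samples the two ranges do not overlap (the slowest unpatched run is 18.8 us, the fastest patched run is 21.6 us), although four runs per group is a small sample and the spread inside each group is large.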
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code. 2010-03-03 10:38 ` Joakim Tjernlund @ 2010-03-04 10:30 ` Heiko Schocher 2010-03-04 12:16 ` Wolfgang Denk 0 siblings, 1 reply; 22+ messages in thread From: Heiko Schocher @ 2010-03-04 10:30 UTC (permalink / raw) To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk Hello Joakim, Joakim Tjernlund wrote: > Could you try reverting patch: > 8xx: Don't touch ACCESSED when no SWAP. > and see if that makes a difference? [...] > Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement, > regardless of my patches. here the results: run version 1-4 2.6.33-rc6 without your patches 5-8 2.6.33-rc6 with all your patches 9-12 2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP) 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y > Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement, > regardless of my patches. make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results' L M B E N C H 3 . 
0 S U M M A R Y ------------------------------------ (Alpha software, do not distribute) Basic system parameters ------------------------------------------------------------------------------ Host OS Description Mhz tlb cache mem scal pages line par load bytes --------- ------------- ----------------------- ---- ----- ----- ------ ---- tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0100 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 32 16 1.0100 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.1700 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0100 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 28 16 1.0400 1 tqm8xx Linux 2.6.33- powerpc-linux-gnu 66 7 16 1.0400 1 Processor, Processes - times in microseconds - smaller is better ------------------------------------------------------------------------------ Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- tqm8xx Linux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K tqm8xx Linux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K tqm8xx Linux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K tqm8xx Linux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K tqm8xx Linux 2.6.33- 66 3.06 8.90 130. 1343 263. 
21.3 84.7 7080 29.K 88.K tqm8xx Linux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K tqm8xx Linux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K tqm8xx Linux 2.6.33- 66 3.06 8.83 128. 1355 269. 20.7 89.2 6927 29.K 87.K tqm8xx Linux 2.6.33- 66 3.05 8.84 127. 1344 271. 21.6 90.5 6868 29.K 88.K tqm8xx Linux 2.6.33- 66 3.06 8.84 131. 1376 260. 21.4 88.1 7119 29.K 87.K tqm8xx Linux 2.6.33- 66 3.05 8.90 122. 1342 272. 21.4 88.6 6847 29.K 88.K tqm8xx Linux 2.6.33- 66 3.19 9.10 122. 1205 265. 20.9 90.3 6358 27.K 83.K tqm8xx Linux 2.6.33- 66 3.28 9.10 124. 1208 270. 20.9 95.2 6217 27.K 82.K tqm8xx Linux 2.6.33- 66 3.19 8.98 125. 1210 270. 21.1 87.9 6364 27.K 83.K tqm8xx Linux 2.6.33- 66 3.19 8.86 124. 1237 262. 21.3 90.7 6311 27.K 84.K Basic integer operations - times in nanoseconds - smaller is better ------------------------------------------------------------------- Host OS intgr intgr intgr intgr intgr bit add mul div mod --------- ------------- ------ ------ ------ ------ ------ tqm8xx Linux 2.6.33- 15.7 18.0 1.5600 124.2 203.1 tqm8xx Linux 2.6.33- 15.7 17.4 1.5800 121.1 202.8 tqm8xx Linux 2.6.33- 15.2 17.9 1.6200 124.2 202.7 tqm8xx Linux 2.6.33- 15.2 17.9 1.6000 125.0 204.0 tqm8xx Linux 2.6.33- 15.7 18.1 1.5600 124.7 204.4 tqm8xx Linux 2.6.33- 15.7 18.1 1.5800 124.2 202.8 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.2 tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.0 tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 124.5 202.6 tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 121.0 196.5 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 121.0 202.5 tqm8xx Linux 2.6.33- 15.7 18.1 1.5500 125.1 196.4 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 202.1 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 203.4 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 196.4 tqm8xx Linux 2.6.33- 15.7 17.9 1.5500 124.2 196.5 Basic uint64 operations - times in nanoseconds - smaller is better ------------------------------------------------------------------ Host OS int64 int64 int64 int64 int64 bit add 
mul div mod --------- ------------- ------ ------ ------ ------ ------ tqm8xx Linux 2.6.33- 15. 13.3 1952.2 1838.2 tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1837.8 tqm8xx Linux 2.6.33- 15. 13.2 1886.7 1907.8 tqm8xx Linux 2.6.33- 15. 13.2 1951.5 1838.2 tqm8xx Linux 2.6.33- 15. 13.3 1887.0 1902.2 tqm8xx Linux 2.6.33- 15. 13.3 1887.4 1901.5 tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1893.0 tqm8xx Linux 2.6.33- 15. 13.3 1950.0 1900.4 tqm8xx Linux 2.6.33- 15. 13.3 1955.2 1906.7 tqm8xx Linux 2.6.33- 15. 13.2 1943.7 1900.7 tqm8xx Linux 2.6.33- 15. 13.3 1958.2 1910.4 tqm8xx Linux 2.6.33- 15. 13.3 1886.7 1900.7 tqm8xx Linux 2.6.33- 15. 13.3 1943.7 1837.4 tqm8xx Linux 2.6.33- 15. 13.2 1944.1 1837.4 tqm8xx Linux 2.6.33- 15. 13.2 1944.4 1906.1 tqm8xx Linux 2.6.33- 15. 13.2 1957.8 1894.8 Basic float operations - times in nanoseconds - smaller is better ----------------------------------------------------------------- Host OS float float float float add mul div bogo --------- ------------- ------ ------ ------ ------ tqm8xx Linux 2.6.33- 1008.9 1629.2 5527.0 9895.0 tqm8xx Linux 2.6.33- 1008.9 1628.9 5495.0 9892.0 tqm8xx Linux 2.6.33- 1007.8 1622.0 5499.0 9886.0 tqm8xx Linux 2.6.33- 1016.5 1628.6 5319.0 9940.0 tqm8xx Linux 2.6.33- 1008.0 1628.3 5497.0 9879.0 tqm8xx Linux 2.6.33- 1007.6 1577.4 5495.0 9881.0 tqm8xx Linux 2.6.33- 1014.8 1627.1 5493.0 9889.0 tqm8xx Linux 2.6.33- 1004.6 1627.7 5487.0 9881.0 tqm8xx Linux 2.6.33- 1003.8 1627.1 5490.0 9875.0 tqm8xx Linux 2.6.33- 977.2 1628.0 5318.0 9924.0 tqm8xx Linux 2.6.33- 1007.4 1627.7 5490.0 9882.0 tqm8xx Linux 2.6.33- 1004.7 1628.0 5495.0 9891.0 tqm8xx Linux 2.6.33- 1011.6 1630.1 5484.0 9855.0 tqm8xx Linux 2.6.33- 977.0 1621.4 5469.0 9856.0 tqm8xx Linux 2.6.33- 1011.4 1621.4 5471.0 9856.0 tqm8xx Linux 2.6.33- 1004.9 1577.1 5470.0 9866.0 Basic double operations - times in nanoseconds - smaller is better ------------------------------------------------------------------ Host OS double double double double add mul div bogo --------- 
------------- ------ ------ ------ ------ tqm8xx Linux 2.6.33- 1562.4 2782.8 3730.7 12.6K tqm8xx Linux 2.6.33- 1556.1 2781.5 3724.3 12.6K tqm8xx Linux 2.6.33- 1513.9 2801.0 3726.4 12.8K tqm8xx Linux 2.6.33- 1556.1 2780.9 3611.4 12.6K tqm8xx Linux 2.6.33- 1570.5 2772.6 3742.1 12.6K tqm8xx Linux 2.6.33- 1560.1 2703.0 3611.4 12.7K tqm8xx Linux 2.6.33- 1560.4 2779.5 3760.7 12.7K tqm8xx Linux 2.6.33- 1559.8 2773.0 3742.1 12.6K tqm8xx Linux 2.6.33- 1564.7 2699.0 3722.1 12.6K tqm8xx Linux 2.6.33- 1560.7 2790.0 3725.7 12.7K tqm8xx Linux 2.6.33- 1565.0 2780.0 3749.3 12.7K tqm8xx Linux 2.6.33- 1560.4 2700.0 3767.1 12.8K tqm8xx Linux 2.6.33- 1555.5 2772.1 3747.9 12.6K tqm8xx Linux 2.6.33- 1513.5 2772.5 3725.7 12.6K tqm8xx Linux 2.6.33- 1557.0 2772.5 3725.7 12.7K tqm8xx Linux 2.6.33- 1514.1 2773.5 3719.3 12.7K Context switching - times in microseconds - smaller is better ------------------------------------------------------------------------- Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw --------- ------------- ------ ------ ------ ------ ------ ------- ------- tqm8xx Linux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3 tqm8xx Linux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7 tqm8xx Linux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1 tqm8xx Linux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0 tqm8xx Linux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0 tqm8xx Linux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8 tqm8xx Linux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9 tqm8xx Linux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4 tqm8xx Linux 2.6.33- 96.9 112.4 95.4 138.3 165.1 152.2 196.4 tqm8xx Linux 2.6.33- 100.6 115.8 109.3 138.5 173.3 150.9 199.2 tqm8xx Linux 2.6.33- 102.2 114.3 109.4 140.9 175.5 153.2 202.0 tqm8xx Linux 2.6.33- 99.1 114.5 106.5 138.2 174.7 151.7 199.9 tqm8xx Linux 2.6.33- 69.5 80.5 88.9 119.6 147.3 130.4 178.7 tqm8xx Linux 2.6.33- 85.8 97.6 79.1 122.3 154.1 132.6 180.1 tqm8xx Linux 
2.6.33- 89.4 93.8 125.7 120.8 178.4 129.5 206.1 tqm8xx Linux 2.6.33- 88.1 101.8 91.2 121.4 162.8 131.6 191.4 *Local* Communication latencies in microseconds - smaller is better --------------------------------------------------------------------- Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP ctxsw UNIX UDP TCP conn --------- ------------- ----- ----- ---- ----- ----- ----- ----- ---- tqm8xx Linux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749 tqm8xx Linux 2.6.33- 95.8 334.0 595. 725.0 1051. 2754 tqm8xx Linux 2.6.33- 95.8 330.9 574. 720.1 1047. 2772 tqm8xx Linux 2.6.33- 92.9 338.8 574. 714.3 1046. 2742 tqm8xx Linux 2.6.33- 90.8 322.1 576. 734.9 1012. 2706 tqm8xx Linux 2.6.33- 100.1 326.0 565. 719.5 1027. 2702 tqm8xx Linux 2.6.33- 98.7 322.8 571. 713.8 1028. 2711 tqm8xx Linux 2.6.33- 92.0 328.1 549. 714.1 1022. 2696 tqm8xx Linux 2.6.33- 96.9 327.0 573. 722.3 1036. 2721 tqm8xx Linux 2.6.33- 100.6 330.4 561. 723.8 1024. 2726 tqm8xx Linux 2.6.33- 102.2 331.4 590. 728.6 1040. 2753 tqm8xx Linux 2.6.33- 99.1 330.1 585. 723.5 1023. 2750 tqm8xx Linux 2.6.33- 69.5 265.9 447. 632.6 909.0 2431 tqm8xx Linux 2.6.33- 85.8 267.0 492. 650.6 909.4 2455 tqm8xx Linux 2.6.33- 89.4 295.6 493. 643.0 908.8 2453 tqm8xx Linux 2.6.33- 88.1 301.0 494. 
645.1 907.9 2451 *Remote* Communication latencies in microseconds - smaller is better --------------------------------------------------------------------- Host OS UDP RPC/ TCP RPC/ TCP UDP TCP conn --------- ------------- ----- ----- ----- ----- ---- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- tqm8xx Linux 2.6.33- File & VM system latencies in microseconds - smaller is better ------------------------------------------------------------------------------- Host OS 0K File 10K File Mmap Prot Page 100fd Create Delete Create Delete Latency Fault Fault selct --------- ------------- ------ ------ ------ ------ ------- ----- ------- ----- tqm8xx Linux 2.6.33- 5917.2 3968.3 31.2K 4329.0 4147.0 18.8 34.1 135.2 tqm8xx Linux 2.6.33- 5714.3 3937.0 32.3K 6060.6 4210.0 14.2 34.5 131.4 tqm8xx Linux 2.6.33- 5747.1 4000.0 31.2K 4329.0 4114.0 7.692 34.0 133.1 tqm8xx Linux 2.6.33- 5747.1 4081.6 30.3K 4273.5 4100.0 18.2 34.2 135.0 tqm8xx Linux 2.6.33- 5714.3 3952.6 31.2K 4273.5 4130.0 33.5 35.1 136.1 tqm8xx Linux 2.6.33- 5714.3 3906.2 31.2K 6060.6 4105.0 25.7 35.5 135.9 tqm8xx Linux 2.6.33- 5681.8 3921.6 32.3K 4255.3 4144.0 23.5 35.0 134.9 tqm8xx Linux 2.6.33- 5649.7 3937.0 30.3K 4237.3 4116.0 21.6 35.3 135.3 tqm8xx Linux 2.6.33- 5747.1 3921.6 32.3K 4329.0 4107.0 17.7 35.6 131.2 tqm8xx Linux 2.6.33- 5952.4 3937.0 31.2K 4273.5 4119.0 25.4 35.8 136.4 tqm8xx Linux 2.6.33- 5848.0 3937.0 32.3K 4484.3 4223.0 14.3 35.4 135.1 tqm8xx Linux 2.6.33- 6172.8 3984.1 35.7K 4291.8 4210.0 14.4 36.0 135.0 tqm8xx Linux 2.6.33- 5291.0 3610.1 31.2K 4065.0 3836.0 1.389 30.0 135.7 tqm8xx Linux 2.6.33- 5524.9 3649.6 29.4K 3906.2 3867.0 14.9 29.8 137.7 tqm8xx Linux 2.6.33- 5319.1 3649.6 29.4K 4048.6 3873.0 13.3 30.3 135.9 tqm8xx Linux 2.6.33- 
5347.6 3623.2 32.3K 3921.6 3894.0 13.3 30.4 135.8 *Local* Communication bandwidths in MB/s - bigger is better ----------------------------------------------------------------------------- Host OS Pipe AF TCP File Mmap Bcopy Bcopy Mem Mem UNIX reread reread (libc) (hand) read write --------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- ----- tqm8xx Linux 2.6.33- 14.8 15.6 10.1 21.0 55.5 32.3 34.5 55.6 53.0 tqm8xx Linux 2.6.33- 14.8 15.6 10.7 21.0 55.5 32.3 34.5 55.6 53.0 tqm8xx Linux 2.6.33- 14.8 15.7 12.7 21.0 55.5 32.3 34.5 55.6 53.0 tqm8xx Linux 2.6.33- 14.8 15.6 13.9 21.0 55.5 32.3 34.5 55.6 53.0 tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.7 32.4 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.8 15.8 12.9 21.0 55.7 32.5 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.8 15.8 13.0 21.0 55.7 32.5 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.8 15.7 14.0 21.0 55.6 32.4 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.7 15.7 12.8 21.0 55.6 32.4 34.6 55.7 53.1 tqm8xx Linux 2.6.33- 14.6 15.7 12.8 21.0 55.6 32.4 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 14.8 15.7 12.8 21.0 55.6 32.4 34.6 55.8 53.1 tqm8xx Linux 2.6.33- 15.0 16.0 13.2 21.3 55.8 32.5 34.7 55.9 53.2 tqm8xx Linux 2.6.33- 15.0 16.0 13.4 21.3 55.8 32.5 34.7 55.8 53.2 tqm8xx Linux 2.6.33- 15.0 16.0 13.9 21.3 55.8 32.5 34.7 55.9 53.2 tqm8xx Linux 2.6.33- 15.0 16.0 13.2 21.2 55.8 32.5 34.6 55.9 53.2 Memory latencies in nanoseconds - smaller is better (WARNING - may not be correct, check graphs) ------------------------------------------------------------------------------ Host OS Mhz L1 $ L2 $ Main mem Rand mem Guesses --------- ------------- --- ---- ---- -------- -------- ------- tqm8xx Linux 2.6.33- 66 31.8 141.0 184.0 1165.7 tqm8xx Linux 2.6.33- 66 31.8 141.2 184.2 1165.3 tqm8xx Linux 2.6.33- 66 31.8 141.3 184.3 1165.6 tqm8xx Linux 2.6.33- 66 31.8 141.3 184.2 1166.2 tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1100.5 No L2 cache? 
tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1102.5 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.7 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 141.0 171.8 1101.6 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.1 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 141.1 173.4 1149.0 No L2 cache? tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.7 No L2 cache? tqm8xx Linux 2.6.33- 66 31.7 141.1 173.4 1148.2 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 171.1 171.7 1099.8 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 171.1 171.6 1100.5 No L2 cache? tqm8xx Linux 2.6.33- 66 31.7 171.0 171.7 1101.0 No L2 cache? tqm8xx Linux 2.6.33- 66 31.8 171.0 171.6 1101.3 No L2 cache? make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results' bye Heiko -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany ^ permalink raw reply [flat|nested] 22+ messages in thread
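Since the sixteen rows of every table belong to four kernel configurations of four runs each, it can help to collapse a column into one mean per configuration before comparing. The sketch below is an illustration only, under the assumption that rows appear in run order 1-16 as listed at the top of this message; the names `group_means`, `CONFIGS` and `CTXSW_2P_0K` are hypothetical, and the data is the 2p/0K context switch column from the results above.

```python
from statistics import mean

# Run groups as described at the top of the message (four runs each).
CONFIGS = [
    "without patches",
    "all patches",
    "patches 1, 2 and 4",
    "all patches + CONFIG_PIN_TLB=y",
]

# 2p/0K context switch times in microseconds, rows 1-16 of the table.
CTXSW_2P_0K = [
    92.6, 95.8, 95.8, 92.9,
    90.8, 100.1, 98.7, 92.0,
    96.9, 100.6, 102.2, 99.1,
    69.5, 85.8, 89.4, 88.1,
]


def group_means(rows, size=4):
    """Average consecutive groups of `size` rows; rows must divide evenly."""
    if size <= 0 or len(rows) % size != 0:
        raise ValueError("row count must be a multiple of the group size")
    return [mean(rows[i:i + size]) for i in range(0, len(rows), size)]


if __name__ == "__main__":
    for label, avg in zip(CONFIGS, group_means(CTXSW_2P_0K)):
        print(f"{label:32s} {avg:6.1f} us")
```

The same helper can be pointed at any other column of the summary; only the list of sixteen values changes.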
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 10:30 ` Heiko Schocher
@ 2010-03-04 12:16 ` Wolfgang Denk
  2010-03-04 13:06 ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Wolfgang Denk @ 2010-03-04 12:16 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev

Dear Heiko,

thanks for running the tests.

In message <4B8F8BB4.6070201@denx.de> you wrote:
>
> here the results:
>
> run     version
>
> 1-4     2.6.33-rc6 without your patches
> 5-8     2.6.33-rc6 with all your patches
> 9-12    2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y

So CONFIG_PIN_TLB improves the performance as expected, while the other
patches don't show any measurable improvement - or am I reading the
results incorrectly?

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
And now remains That we find out the cause of this effect, Or rather
say, the cause of this defect... -- Hamlet, Act II, Scene 2

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 12:16 ` Wolfgang Denk
@ 2010-03-04 13:06 ` Joakim Tjernlund
  2010-03-04 16:30 ` Heiko Schocher
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-04 13:06 UTC (permalink / raw)
To: Wolfgang Denk; +Cc: Scott Wood, linuxppc-dev, hs

Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> From: Wolfgang Denk <wd@denx.de>
> To: hs@denx.de
> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> <scottwood@freescale.com>
> Date: 2010/03/04 13:17
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Dear Heiko,
>
> thanks for running the tests.
>
> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >
> > here the results:
> >
> > run     version
> >
> > 1-4     2.6.33-rc6 without your patches
> > 5-8     2.6.33-rc6 with all your patches
> > 9-12    2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> > 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>
> So CONFIG_PIN_TLB improves the performance as expected, while the other
> patches don't show any measurable improvement - or am I reading the
> results incorrectly?

Close but not quite. What stands out most is:

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
--------- -------------   ---   ----   ----    --------    --------    -------
tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2

tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?

tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?

tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?

Besides the numbers, note how the first group doesn't have a Guesses entry.
Is there something odd with the results for the first group?

Also, since you are using MODULES, patch 2 is nullified.
Patch 1 is very minor and should not show, I think.
This leaves patches 3 & 4.
There appears to be something funny with patch 3, "Don't touch ACCESSED when
no SWAP", as it yields bad numbers for Prot Fault, so perhaps I am missing
something that needs ACCESSED even if NO_SWAP. Perhaps someone who knows MM
in Linux knows?
Are there any messages in the kernel log (dmesg)?

 Jocke

^ permalink raw reply [flat|nested] 22+ messages in thread
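To put a number on "what stands out", the Main mem column can be expressed as a percentage change against the unpatched baseline. This is a rough sketch added for illustration, not something from the thread: the function name `percent_change` and the dictionary keys are assumptions, and the values are the Main mem latencies (ns) from the table above, grouped by run configuration.

```python
from statistics import mean

# Main mem latency in nanoseconds, four runs per kernel configuration.
MAIN_MEM_NS = {
    "unpatched": [184.0, 184.2, 184.3, 184.2],
    "all patches": [171.8, 171.8, 171.8, 171.8],
    "patches 1, 2 and 4": [173.4, 173.4, 173.4, 173.4],
    "all patches + PIN_TLB": [171.7, 171.6, 171.7, 171.6],
}


def percent_change(baseline, candidate):
    """Relative change of the candidate mean versus the baseline mean, in %.

    Negative values mean the candidate is faster (lower latency).
    """
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100.0 * (mean(candidate) - base) / base


if __name__ == "__main__":
    baseline = MAIN_MEM_NS["unpatched"]
    for label, runs in MAIN_MEM_NS.items():
        if label != "unpatched":
            print(f"{label:24s} {percent_change(baseline, runs):+.2f} %")
```

Unlike the Prot Fault column, the runs inside each group here are nearly identical, so the group means are a fair summary.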
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 13:06 ` Joakim Tjernlund
@ 2010-03-04 16:30 ` Heiko Schocher
  2010-03-05 10:40 ` Joakim Tjernlund
  2010-03-07 16:03 ` Joakim Tjernlund
  0 siblings, 2 replies; 22+ messages in thread
From: Heiko Schocher @ 2010-03-04 16:30 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
>> From: Wolfgang Denk <wd@denx.de>
>> To: hs@denx.de
>> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
>> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
>> <scottwood@freescale.com>
>> Date: 2010/03/04 13:17
>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>
>> Dear Heiko,
>>
>> thanks for running the tests.
>>
>> In message <4B8F8BB4.6070201@denx.de> you wrote:
>>> here the results:
>>>
>>> run     version
>>>
>>> 1-4     2.6.33-rc6 without your patches
>>> 5-8     2.6.33-rc6 with all your patches
>>> 9-12    2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
>>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>> So CONFIG_PIN_TLB improves the performance as expected, while the other
>> patches don't show any measurable improvement - or am I reading the
>> results incorrectly?
>
> Close but not quite. What stands out most is:
>
> Memory latencies in nanoseconds - smaller is better
>     (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> --------- -------------   ---   ----   ----    --------    --------    -------
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
> tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
> tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
>
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
>
> tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?
>
> tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
> tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?
>
>
> Besides the numbers, note how the first group doesn't have a Guesses entry.
> Is there something odd with the results for the first group?

Hmm.. just to be safe, I ran this test again, but it also shows no entry in
"Guesses" ... Hardware, Linux source, rootFS, lmbench sources, all the
same ...

> Also, since you are using MODULES, patch 2 is nullified.
> Patch 1 is very minor and should not show, I think.
> This leaves patches 3 & 4.
> There appears to be something funny with patch 3, "Don't touch ACCESSED when
> no SWAP", as it yields bad numbers for Prot Fault, so perhaps I am missing
> something that needs ACCESSED even if NO_SWAP. Perhaps someone who knows MM
> in Linux knows?
> Are there any messages in the kernel log (dmesg)?

I couldn't find anything in the dmesg output ... but if you
want this output, I can send it to you.

bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 16:30 ` Heiko Schocher
@ 2010-03-05 10:40 ` Joakim Tjernlund
  2010-03-08 7:46 ` Heiko Schocher
  2010-03-07 16:03 ` Joakim Tjernlund
  1 sibling, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-05 10:40 UTC (permalink / raw)
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run     version
> >>>
> >>> 1-4     2.6.33-rc6 without your patches
> >>> 5-8     2.6.33-rc6 with all your patches
> >>> 9-12    2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> >>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?
> >
> > Close but not quite. What stands out most is:
> >
> > Memory latencies in nanoseconds - smaller is better
> >     (WARNING - may not be correct, check graphs)
> > ------------------------------------------------------------------------------
> > Host                 OS   Mhz   L1 $   L2 $    Main mem    Rand mem    Guesses
> > --------- -------------   ---   ----   ----    --------    --------    -------
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       184.0      1165.7
> > tqm8xx    Linux 2.6.33-    66   31.8  141.2       184.2      1165.3
> > tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.3      1165.6
> > tqm8xx    Linux 2.6.33-    66   31.8  141.3       184.2      1166.2
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1100.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1102.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.7    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.0       171.8      1101.6    No L2 cache?
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.1    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  141.1       173.4      1149.0    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.7    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  141.1       173.4      1148.2    No L2 cache?
> >
> > tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.7      1099.8    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  171.1       171.6      1100.5    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.7  171.0       171.7      1101.0    No L2 cache?
> > tqm8xx    Linux 2.6.33-    66   31.8  171.0       171.6      1101.3    No L2 cache?
> >
> >
> > Besides the numbers, note how the first group doesn't have a Guesses entry.
> > Is there something odd with the results for the first group?
>
> Hmm.. just to be safe, I made this test again, but it shows also no entry in
> "Guesses" ... Hardware, Linux Source, rootFS, lmbench sources, all the
> same ...

OK

>
> > Also, since you are using MODULES, patch 2 is nullified.
> > Patch 1 is very minor and should not show I think.
> > This leaves patches 3 & 4.
> > There appears to be something funny with patch 3, Don't touch ACCESSED when no SWAP, as
> > it yields bad numbers for Prot Fault so perhaps I am missing something that needs ACCESSED
> > even if NO_SWAP. Perhaps someone that knows MM in Linux knows?
> > Is there any messages in the kernel log(dmesg)?
>
> I couldn't find something in the output with dmesg ... but if you
> want this output, I can send it to you.

No, if you can't find anything in there, I won't either.

What would be interesting is to skip patch 3, turn off MODULES, add PIN_TLB,
and compare that against your unpatched .33, also with MODULES off and
PIN_TLB on.

 Jocke

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-05 10:40 ` Joakim Tjernlund
@ 2010-03-08  7:46 ` Heiko Schocher
  2010-03-08  8:44 ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Heiko Schocher @ 2010-03-08 7:46 UTC
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
[...]
> What would be interesting is to skip patch 3 and turn off
> MODULES add PIN_TLB and compare that against your unpatched .33 but
> with MODULES off and PIN_TLB on

run     version

1-4     Linux2.6.33-rc without module support and PIN_TLB=on
5-8     Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4

L M B E N C H  3 . 0   S U M M A R Y
------------------------------------
(Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0300    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66     7    16 1.0400    1
tqm8xx    Linux 2.6.33-       powerpc-linux-gnu   66    28    16 1.0100    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos  TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
tqm8xx    Linux 2.6.33-   66 2.97 8.91 127. 1238 270. 22.3 92.1 6386 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.05 8.99 129. 1208 261. 22.3 85.3 6418 27.K 83.K
tqm8xx    Linux 2.6.33-   66 3.05 8.81 128. 1205 270. 22.3 87.3 6342 27.K 82.K
tqm8xx    Linux 2.6.33-   66 3.05 8.82 132. 1215 270. 23.1 86.7 6357 27.K 82.K
tqm8xx    Linux 2.6.33-   66 3.28 9.29 128. 1257 260. 23.9 83.7 6511 28.K 84.K
tqm8xx    Linux 2.6.33-   66 3.34 9.35 126. 1264 271. 23.1 86.6 6437 27.K 84.K
tqm8xx    Linux 2.6.33-   66 3.19 8.97 130. 1212 271. 23.1 95.3 6480 27.K 84.K
tqm8xx    Linux 2.6.33-   66 3.28 8.76 127. 1229 269. 22.9 90.9 6293 27.K 82.K

Basic integer operations - times in nanoseconds - smaller is better
-------------------------------------------------------------------
Host                 OS  intgr  intgr  intgr  intgr  intgr
                           bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2500  124.1  202.4
tqm8xx    Linux 2.6.33-   15.6   18.0 1.1900  124.1  196.4
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2400  124.9  202.5
tqm8xx    Linux 2.6.33-   15.2   17.9 1.2400  124.2  196.8
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  203.6
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  124.2  202.1
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5700  125.0  202.2
tqm8xx    Linux 2.6.33-   15.7   17.9 1.5500  121.1  196.4

Basic uint64 operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS  int64  int64  int64  int64  int64
                           bit    add    mul    div    mod
--------- ------------- ------ ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 15. 12.9 1944.1 1895.2
tqm8xx    Linux 2.6.33- 15. 12.9 1886.3 1894.4
tqm8xx    Linux 2.6.33- 15. 12.9 1944.1 1895.2
tqm8xx    Linux 2.6.33- 15. 12.9 1886.3 1894.8
tqm8xx    Linux 2.6.33- 15. 13.2 1944.1 1894.4
tqm8xx    Linux 2.6.33- 15. 13.2 1944.8 1896.3
tqm8xx    Linux 2.6.33- 15. 13.2 1945.2 1837.4
tqm8xx    Linux 2.6.33- 15. 13.2 1957.8 1907.4

Basic float operations - times in nanoseconds - smaller is better
-----------------------------------------------------------------
Host                 OS  float  float  float  float
                           add    mul    div   bogo
--------- ------------- ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 1011.0 1620.2 5467.0 9868.0
tqm8xx    Linux 2.6.33- 1004.5 1630.1 5468.0 9852.0
tqm8xx    Linux 2.6.33- 1012.2 1620.5 5472.0 9855.0
tqm8xx    Linux 2.6.33- 1011.0 1620.2 5469.0 9866.0
tqm8xx    Linux 2.6.33- 1004.8 1617.3 5503.0 9856.0
tqm8xx    Linux 2.6.33- 1004.9 1577.1 5469.0 9859.0
tqm8xx    Linux 2.6.33- 1011.4 1618.5 5470.0 9859.0
tqm8xx    Linux 2.6.33- 1004.9 1620.5 5471.0 9904.0

Basic double operations - times in nanoseconds - smaller is better
------------------------------------------------------------------
Host                 OS double double double double
                           add    mul    div   bogo
--------- ------------- ------ ------ ------ ------
tqm8xx    Linux 2.6.33- 1555.5 2789.5 3725.7  12.8K
tqm8xx    Linux 2.6.33- 1513.2 2772.0 3720.0  12.7K
tqm8xx    Linux 2.6.33- 1555.8 2772.1 3730.0  12.7K
tqm8xx    Linux 2.6.33- 1555.5 2699.0 3725.0  12.7K
tqm8xx    Linux 2.6.33- 1513.8 2699.5 3610.7  12.7K
tqm8xx    Linux 2.6.33- 1566.7 2771.6 3750.0  12.7K
tqm8xx    Linux 2.6.33- 1556.7 2789.2 3612.1  12.6K
tqm8xx    Linux 2.6.33- 1556.7 2698.5 3749.3  12.6K

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
tqm8xx    Linux 2.6.33-   64.4   74.9  130.2  111.1  180.4   123.2   211.1
tqm8xx    Linux 2.6.33-   67.4   81.0  125.0  117.0  183.7   127.7   208.4
tqm8xx    Linux 2.6.33-   67.5   80.5   92.7  115.3  156.9   128.0   183.8
tqm8xx    Linux 2.6.33-   67.0   80.2   90.5  114.6  159.4   126.8   185.8
tqm8xx    Linux 2.6.33-   82.0   87.8   88.0  116.1  149.3   125.5   182.2
tqm8xx    Linux 2.6.33-   81.7   98.5   97.6  123.8  158.1   135.3   188.0
tqm8xx    Linux 2.6.33-   67.9   87.7   90.7  114.9  151.1   127.3   177.9
tqm8xx    Linux 2.6.33-   67.5   80.3   84.6  113.6  145.7   124.8   170.9

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe  AF    UDP  RPC/   TCP  RPC/  TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33- 64.4 254.3 455. 648.0 941.8 2505
tqm8xx    Linux 2.6.33- 67.4 261.2 456. 645.8 909.1 2439
tqm8xx    Linux 2.6.33- 67.5 264.8 459. 638.5 932.0 2447
tqm8xx    Linux 2.6.33- 67.0 262.4 454. 643.9 909.9 2442
tqm8xx    Linux 2.6.33- 82.0 302.1 500. 651.4 937.2 2504
tqm8xx    Linux 2.6.33- 81.7 300.2 510. 643.2 909.7 2490
tqm8xx    Linux 2.6.33- 67.9 266.7 498. 645.5 923.4 2442
tqm8xx    Linux 2.6.33- 67.5 260.8 444. 640.3 917.7 2440

*Remote* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS   UDP  RPC/   TCP  RPC/  TCP
                                UDP         TCP conn
--------- ------------- ----- ----- ----- ----- ----
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-
tqm8xx    Linux 2.6.33-

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS    0K File      10K File      Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault   Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
tqm8xx    Linux 2.6.33- 6097.6 3731.3  30.3K 4000.0  4026.0  20.5    31.9 131.9
tqm8xx    Linux 2.6.33- 5747.1 3623.2  32.3K 3952.6  4030.0  16.6    31.0 132.7
tqm8xx    Linux 2.6.33- 5405.4 3610.1  32.3K 3921.6  4004.0  15.5    30.0 131.9
tqm8xx    Linux 2.6.33- 5681.8 3891.1  35.7K 4219.4  3966.0 6.038    30.4 128.7
tqm8xx    Linux 2.6.33-  12.7K 3649.6  34.5K 7092.2  4066.0 3.604    31.4 133.6
tqm8xx    Linux 2.6.33- 5405.4 4032.3  38.5K 5494.5  4036.0  18.1    31.0 128.6
tqm8xx    Linux 2.6.33- 5405.4 3610.1  37.0K 7142.9  4078.0  15.4    31.0 133.2
tqm8xx    Linux 2.6.33- 5714.3 3623.2  30.3K 7194.2  4054.0  12.7    29.9 133.0

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                 OS Pipe  AF   TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
tqm8xx    Linux 2.6.33- 14.9 16.1 13.0   21.4   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.9 16.2 12.9   21.3   55.5   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.8 16.0 13.0   21.4   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 15.0 16.2 13.8   21.3   55.6   32.4   34.5 55.7  53.0
tqm8xx    Linux 2.6.33- 14.9 16.0 13.4   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.1 16.2 13.6   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.0 16.2 12.9   21.3   55.7   32.5   34.6 55.8  53.2
tqm8xx    Linux 2.6.33- 15.1 16.2 13.1   21.5   55.7   32.5   34.7 55.8  53.2

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
------------------------------------------------------------------------------
Host                 OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
--------- ------------- --- ---- ---- -------- -------- -------
tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.0 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1164.8 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.2 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.7 183.2   183.8   1163.7 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.8 172.4   173.2   1147.3 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1148.3 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.1   1146.9 No L2 cache?
tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1147.3 No L2 cache?

make[1]: Leaving directory `/home/hs/lmbench-3.0-a9/results'

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  7:46 ` Heiko Schocher
@ 2010-03-08  8:44 ` Joakim Tjernlund
  2010-03-08  9:06 ` Heiko Schocher
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-08 8:44 UTC
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> [...]
> > What would be interesting is to skip patch 3 and turn off
> > MODULES add PIN_TLB and compare that against your unpatched .33 but
> > with MODULES off and PIN_TLB on
>
> run     version
>
> 1-4     Linux2.6.33-rc without module support and PIN_TLB=on
> 5-8     Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>
> L M B E N C H  3 . 0   S U M M A R Y
> ------------------------------------
> (Alpha software, do not distribute)

Hmm, these results vary a lot. The only stable result I can see is:

> Memory latencies in nanoseconds - smaller is better
>     (WARNING - may not be correct, check graphs)
> ------------------------------------------------------------------------------
> Host                 OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> --------- ------------- --- ---- ---- -------- -------- -------
> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.0 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1164.8 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.2 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.7 183.2   183.8   1163.7 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.8 172.4   173.2   1147.3 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1148.3 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.1   1146.9 No L2 cache?
> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1147.3 No L2 cache?

I don't see why the other results vary so much. Are you using NFS, or is there
much network traffic?

 Jocke
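One way to put numbers on the "stable vs. noisy" observation is to compare the run-to-run spread of a column with the shift between the two kernels. The sketch below is not from the thread; it is a minimal illustration in Python that copies the main-memory latency and 2p/0K context-switch columns from Heiko's tables. The helper names `rel_spread`, `rel_change` and `summarize` are made up for this example, and (max - min) / mean is only a crude noise indicator chosen here for simplicity.

```python
from statistics import mean

# Values copied from the lmbench tables posted in this thread.
# "baseline": runs 1-4 (2.6.33-rc, no module support, PIN_TLB=on)
# "patched":  runs 5-8 (same configuration plus patches 1, 2 and 4)
main_mem_ns = {
    "baseline": [184.0, 184.0, 184.0, 183.8],
    "patched": [173.2, 173.2, 173.1, 173.2],
}
ctxsw_2p0k_us = {
    "baseline": [64.4, 67.4, 67.5, 67.0],
    "patched": [82.0, 81.7, 67.9, 67.5],
}


def rel_spread(samples):
    """(max - min) / mean: a crude run-to-run noise indicator (hypothetical helper)."""
    return (max(samples) - min(samples)) / mean(samples)


def rel_change(baseline, patched):
    """Relative change of the mean; negative means the patched runs are faster."""
    return (mean(patched) - mean(baseline)) / mean(baseline)


def summarize(name, runs):
    """Collect spread and mean shift for one benchmark column."""
    return {
        "name": name,
        "baseline_spread": rel_spread(runs["baseline"]),
        "patched_spread": rel_spread(runs["patched"]),
        "change": rel_change(runs["baseline"], runs["patched"]),
    }


if __name__ == "__main__":
    columns = (("main mem (ns)", main_mem_ns), ("2p/0K ctxsw (us)", ctxsw_2p0k_us))
    for name, runs in columns:
        s = summarize(name, runs)
        print(
            f"{s['name']:<18}"
            f" spread {s['baseline_spread']:.2%} / {s['patched_spread']:.2%}"
            f"  mean change {s['change']:+.2%}"
        )
```

For the main-memory column the spread within each group is far below the shift between the groups (the patched mean is roughly 6% lower), so that difference stands out from the noise. For the 2p/0K context-switch column the spread inside the patched group alone exceeds 10%, so a change of similar size cannot be attributed to the patches from these eight runs.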
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  8:44 ` Joakim Tjernlund
@ 2010-03-08  9:06 ` Heiko Schocher
  2010-03-08 10:42 ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Heiko Schocher @ 2010-03-08 9:06 UTC
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Hello Joakim,

Joakim Tjernlund wrote:
> Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
>> Hello Joakim,
>>
>> Joakim Tjernlund wrote:
>> [...]
>>> What would be interesting is to skip patch 3 and turn off
>>> MODULES add PIN_TLB and compare that against your unpatched .33 but
>>> with MODULES off and PIN_TLB on
>> run     version
>>
>> 1-4     Linux2.6.33-rc without module support and PIN_TLB=on
>> 5-8     Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>>
>> L M B E N C H  3 . 0   S U M M A R Y
>> ------------------------------------
>> (Alpha software, do not distribute)
>
> Hmm, these results vary a lot. The only stable result I can see is:
>
>> Memory latencies in nanoseconds - smaller is better
>>     (WARNING - may not be correct, check graphs)
>> ------------------------------------------------------------------------------
>> Host                 OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
>> --------- ------------- --- ---- ---- -------- -------- -------
>> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.0 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1164.8 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.2 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.7 183.2   183.8   1163.7 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.8 172.4   173.2   1147.3 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1148.3 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.1   1146.9 No L2 cache?
>> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1147.3 No L2 cache?
>
> I don't see why the other results vary so much. Are you using NFS, or is there
> much network traffic?

I use NFS.

bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08  9:06 ` Heiko Schocher
@ 2010-03-08 10:42 ` Joakim Tjernlund
  2010-03-09  6:30 ` Wolfgang Denk
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-08 10:42 UTC
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/08 10:06:39:
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Heiko Schocher <hs@denx.de> wrote on 2010/03/08 08:46:29:
> >> Hello Joakim,
> >>
> >> Joakim Tjernlund wrote:
> >> [...]
> >>> What would be interesting is to skip patch 3 and turn off
> >>> MODULES add PIN_TLB and compare that against your unpatched .33 but
> >>> with MODULES off and PIN_TLB on
> >> run     version
> >>
> >> 1-4     Linux2.6.33-rc without module support and PIN_TLB=on
> >> 5-8     Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
> >>
> >> L M B E N C H  3 . 0   S U M M A R Y
> >> ------------------------------------
> >> (Alpha software, do not distribute)
> >
> > Hmm, these results vary a lot. The only stable result I can see is:
> >
> >> Memory latencies in nanoseconds - smaller is better
> >>     (WARNING - may not be correct, check graphs)
> >> ------------------------------------------------------------------------------
> >> Host                 OS Mhz L1 $ L2 $ Main mem Rand mem Guesses
> >> --------- ------------- --- ---- ---- -------- -------- -------
> >> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.0 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1164.8 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.7 183.2   184.0   1163.2 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.7 183.2   183.8   1163.7 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.8 172.4   173.2   1147.3 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1148.3 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.1   1146.9 No L2 cache?
> >> tqm8xx    Linux 2.6.33-  66 31.8 172.5   173.2   1147.3 No L2 cache?
> >
> > I don't see why the other results vary so much. Are you using NFS, or is there
> > much network traffic?
>
> I use NFS.

Then I think it is possible that NFS gets in the way of stable measurements.
Does anyone have experience with running lmbench on NFS?

 Jocke
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-08 10:42 ` Joakim Tjernlund
@ 2010-03-09  6:30 ` Wolfgang Denk
  0 siblings, 0 replies; 22+ messages in thread
From: Wolfgang Denk @ 2010-03-09 6:30 UTC
To: Joakim Tjernlund; +Cc: Scott Wood, hs, linuxppc-dev

Dear Joakim Tjernlund,

In message <OF1413A940.58E7B20E-ONC12576E0.003A9000-C12576E0.003ACFB7@transmode.se> you wrote:
>
> > I use NFS.
>
> Then I think it is possible that NFS gets in the way of stable measurements.
> Does anyone have experience with running lmbench on NFS?

NFS may have some influence here, but I doubt it is the primary cause
of these variations. The network where Heiko is running these tests is
mostly idle, so it should provide fairly constant conditions. Of
course, the use of the network on the MPC8xx itself will add to the
variation, but again I would not expect such big differences.

Heiko - there is a 10 GB disk attached to the "tqm8xx" system; I think
there should be a usable root file system on it, but I cannot remember
the actual state. Maybe we can use that. Please contact me on jabber
this afternoon!

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Living on Earth may be expensive, but it includes an annual free trip
around the Sun.
* Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
  2010-03-04 16:30 ` Heiko Schocher
  2010-03-05 10:40 ` Joakim Tjernlund
@ 2010-03-07 16:03 ` Joakim Tjernlund
  1 sibling, 0 replies; 22+ messages in thread
From: Joakim Tjernlund @ 2010-03-07 16:03 UTC
To: hs; +Cc: Scott Wood, linuxppc-dev, Wolfgang Denk

Heiko Schocher <hs@denx.de> wrote on 2010/03/04 17:30:07:

> From: Heiko Schocher <hs@denx.de>
> To: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Cc: Wolfgang Denk <wd@denx.de>, Klaus-Jürgen <heydeck@kieback-peter.de>,
> linuxppc-dev@ozlabs.org, Scott Wood <scottwood@freescale.com>
> Date: 2010/03/04 17:30
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> > Wolfgang Denk <wd@denx.de> wrote on 2010/03/04 13:16:56:
> >> From: Wolfgang Denk <wd@denx.de>
> >> To: hs@denx.de
> >> Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>, Klaus-Jürgen
> >> <heydeck@kieback-peter.de>, linuxppc-dev@ozlabs.org, Scott Wood
> >> <scottwood@freescale.com>
> >> Date: 2010/03/04 13:17
> >> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
> >>
> >> Dear Heiko,
> >>
> >> thanks for running the tests.
> >>
> >> In message <4B8F8BB4.6070201@denx.de> you wrote:
> >>> here the results:
> >>>
> >>> run     version
> >>>
> >>> 1-4     2.6.33-rc6 without your patches
> >>> 5-8     2.6.33-rc6 with all your patches
> >>> 9-12    2.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP)
> >>> 13-16   2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
> >> So CONFIG_PIN_TLB improves the performance as expected, while the other
> >> patches don't show any measurable improvement - or am I reading the
> >> results incorrectly?

BTW, I have implemented all of the newer 2.6 TLB/MMU fixes (including the
dcbX fixup) for 2.4 as well. If there is any interest, I can polish them
and submit them for 2.4. I do need an external tester for that, though.

 Jocke
* [PATCH 0/4] 8xx: Optimize TLB Miss code.
@ 2010-02-26  8:29 Joakim Tjernlund
  0 siblings, 0 replies; 22+ messages in thread
From: Joakim Tjernlund @ 2010-02-26 8:29 UTC
To: linuxppc-dev

This set of patches tries to optimize the TLB code on 8xx even more.
If they work, it should be a noticeable performance boost.

I would be very happy if you could test them for me.

Joakim Tjernlund (4):
  8xx: Optimze TLB Miss handlers
  8xx: Avoid testing for kernel space in ITLB Miss.
  8xx: Don't touch ACCESSED when no SWAP.
  8xx: Use SPRG2 and DAR registers to stash r11 and cr.

 arch/powerpc/kernel/head_8xx.S |   70 +++++++++++++++++++++++++++-------------
 1 files changed, 47 insertions(+), 23 deletions(-)