* v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
@ 2005-04-21 18:32 Marcelo Tosatti
  2005-04-21 18:50 ` [26-devel] " Marcelo Tosatti
  2005-04-24 20:59 ` Wolfgang Denk
  0 siblings, 2 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2005-04-21 18:32 UTC
  To: 26-devel, linux-ppc-embedded

Hi everyone,

I found out that the previous TLB counter numbers were wrong: two
of the values were switched!

The CPU is a 48MHz MPC855T with 32 TLB entries and 128MB of RAM.

Now I've got valid results. With an idle machine, these are the results
of a /proc/tlbmiss capture session with a 1-second interval. Note that
"idle" actually means about 4-5 processes (AcsWeb, cy_pmd, cy_alarm, cy_wdt
and the kernel's keventd) running and switching over, while the CPU is
about 96-97% idle.

As you can see, the rate at which TLB misses happen in v2.6 is
significantly higher, for both the I-TLB and the D-TLB, even on an almost
idle machine.

The v2.6 kernel has grown in size, and with it its TLB footprint, which
I am starting to believe is the major cause of this issue. If that is
the case, other platforms will suffer as well.

As one example, the number of page addresses which the "sys_read()"
system call needs to fetch into the I-cache in order to do its work
(the call tree) is about twice what it is in v2.4.

Pantelis Antoniou reports that the 64-TLB-entry versions of the MPC8xx
processors do not suffer such a significant performance slowdown.

One caveat in reading these numbers is that v2.6 counts twice for
page-fault misses which result in PTE creation (DataTLBMiss->DataTLBError),
but I hope to change that for better precision. In this specific
case I guess it should not be significant, given that no processes are
being created; mostly already-mapped (periodic) routines are running.

I hope that capturing the TLB miss difference between v2.4 and v2.6
on a simple CPU-intensive benchmark, such as the "dd" one I've been using
before, and multiplying that by the translation cache miss penalty
(20-23 clocks on a miss versus 1 clock on a hit), should give us a good
estimate of the real cost of these misses.

And I wonder: have no other arches noticed this?

Comments are appreciated.

Capture session of /proc/tlbmiss with a 1-second interval:
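
(The capture itself is nothing more than a trivial shell loop of this
sort -- a sketch, the '*' marker matching the separators below:)

-- 

#!/bin/sh
# Dump the four counters once per second; '*' separates the samples.
while true; do
        cat /proc/tlbmiss
        echo '*'
        sleep 1
done

-- 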


v2.6:					v2.4:
I-TLB userspace misses: 2577            I-TLB userspace misses: 2192
I-TLB kernel misses: 1557               I-TLB kernel misses: 1328
D-TLB userspace misses: 7173            D-TLB userspace misses: 6801
D-TLB kernel misses: 4442               D-TLB kernel misses: 4260
*                                       *
I-TLB userspace misses: 5324            I-TLB userspace misses: 4557
I-TLB kernel misses: 3277               I-TLB kernel misses: 2821
D-TLB userspace misses: 14399           D-TLB userspace misses: 13816
D-TLB kernel misses: 9069               D-TLB kernel misses: 8734
*                                       *
I-TLB userspace misses: 8078            I-TLB userspace misses: 7003
I-TLB kernel misses: 4960               I-TLB kernel misses: 4360
D-TLB userspace misses: 22038           D-TLB userspace misses: 20952
D-TLB kernel misses: 13929              D-TLB kernel misses: 13299
*                                       *
I-TLB userspace misses: 10791           I-TLB userspace misses: 9404
I-TLB kernel misses: 6643               I-TLB kernel misses: 5874
D-TLB userspace misses: 29350           D-TLB userspace misses: 27963
D-TLB kernel misses: 18555              D-TLB kernel misses: 17768
*                                       *
I-TLB userspace misses: 13531           I-TLB userspace misses: 11801
I-TLB kernel misses: 8311               I-TLB kernel misses: 7390
D-TLB userspace misses: 36750           D-TLB userspace misses: 35123
D-TLB kernel misses: 23271              D-TLB kernel misses: 22416
*                                       *
I-TLB userspace misses: 16434           I-TLB userspace misses: 14229
I-TLB kernel misses: 10172              I-TLB kernel misses: 8925
D-TLB userspace misses: 51096           D-TLB userspace misses: 42241
D-TLB kernel misses: 34982              D-TLB kernel misses: 26995
*                                       *
I-TLB userspace misses: 19183           I-TLB userspace misses: 16646
I-TLB kernel misses: 11890              I-TLB kernel misses: 10445
D-TLB userspace misses: 58557           D-TLB userspace misses: 49291
D-TLB kernel misses: 39726              D-TLB kernel misses: 31479
*                                       *
I-TLB userspace misses: 21973           I-TLB userspace misses: 19125
I-TLB kernel misses: 13596              I-TLB kernel misses: 12011
D-TLB userspace misses: 65933           D-TLB userspace misses: 56376
D-TLB kernel misses: 44401              D-TLB kernel misses: 36025
*                                       *
I-TLB userspace misses: 24644           I-TLB userspace misses: 21509
I-TLB kernel misses: 15231              I-TLB kernel misses: 13526
D-TLB userspace misses: 73345           D-TLB userspace misses: 63431
D-TLB kernel misses: 49083              D-TLB kernel misses: 40567
*                                       *
I-TLB userspace misses: 27451           I-TLB userspace misses: 23894
I-TLB kernel misses: 16974              I-TLB kernel misses: 15031
D-TLB userspace misses: 80652           D-TLB userspace misses: 70467
D-TLB kernel misses: 53739              D-TLB kernel misses: 45089


* Re: [26-devel] v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-21 18:32 v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses Marcelo Tosatti
@ 2005-04-21 18:50 ` Marcelo Tosatti
  2005-04-22  6:18   ` Pantelis Antoniou
  2005-04-24 20:59 ` Wolfgang Denk
  1 sibling, 1 reply; 8+ messages in thread
From: Marcelo Tosatti @ 2005-04-21 18:50 UTC
  To: 26-devel, linux-ppc-embedded

[-- Attachment #1: Type: text/plain, Size: 175 bytes --]

On Thu, Apr 21, 2005 at 03:32:39PM -0300, Marcelo Tosatti wrote:
> Capture session of /proc/tlbmiss with 1 second interval:

Forgot to attach the /proc/tlbmiss patch; here it is.

[-- Attachment #2: tlbmiss-count-2.4.patch --]
[-- Type: text/plain, Size: 4835 bytes --]

--- linux-216.orig/arch/ppc/kernel/head_8xx.S	2005-01-19 10:37:12.000000000 -0200
+++ linux-216/arch/ppc/kernel/head_8xx.S	2005-03-04 18:56:38.351004576 -0300
@@ -331,10 +331,21 @@
 	 * kernel page tables.
 	 */
 	andi.	r21, r20, 0x0800	/* Address >= 0x80000000 */
-	beq	3f
+	beq	4f
 	lis	r21, swapper_pg_dir@h
 	ori	r21, r21, swapper_pg_dir@l
 	rlwimi	r20, r21, 0, 2, 19
+
+        lis     r3,(itlbkernel_miss-KERNELBASE)@ha
+        lwz     r11,(itlbkernel_miss-KERNELBASE)@l(r3)
+        addi    r11,r11,1
+        stw     r11,(itlbkernel_miss-KERNELBASE)@l(r3)
+        b       3f              /* kernel miss counted, skip user counter */
+4:
+	lis     r3,(itlbuser_miss-KERNELBASE)@ha
+        lwz     r11,(itlbuser_miss-KERNELBASE)@l(r3)
+        addi    r11,r11,1
+        stw     r11,(itlbuser_miss-KERNELBASE)@l(r3)
 3:
 	lwz	r21, 0(r20)	/* Get the level 1 entry */
 	rlwinm.	r20, r21,0,0,19	/* Extract page descriptor page address */
@@ -414,10 +425,23 @@
 	 * kernel page tables.
 	 */
 	andi.	r21, r20, 0x0800
-	beq	3f
+	beq	4f
 	lis	r21, swapper_pg_dir@h
 	ori	r21, r21, swapper_pg_dir@l
 	rlwimi r20, r21, 0, 2, 19
+
+        lis     r3,(dtlbkernel_miss-KERNELBASE)@ha
+        lwz     r11,(dtlbkernel_miss-KERNELBASE)@l(r3)
+        addi    r11,r11,1
+        stw     r11,(dtlbkernel_miss-KERNELBASE)@l(r3)
+        b       3f              /* kernel miss counted, skip user counter */
+
+4:
+        lis     r3,(dtlbuser_miss-KERNELBASE)@ha
+        lwz     r11,(dtlbuser_miss-KERNELBASE)@l(r3)
+        addi    r11,r11,1
+        stw     r11,(dtlbuser_miss-KERNELBASE)@l(r3)
+
 3:
 	lwz	r21, 0(r20)	/* Get the level 1 entry */
 	rlwinm.	r20, r21,0,0,19	/* Extract page descriptor page address */
@@ -989,3 +1013,14 @@
 	.space	16
 #endif
 
+_GLOBAL(itlbuser_miss)
+        .long 0
+
+_GLOBAL(itlbkernel_miss)
+        .long 0
+
+_GLOBAL(dtlbuser_miss)
+        .long 0
+
+_GLOBAL(dtlbkernel_miss)
+        .long 0
--- linux-216.orig/fs/proc/proc_misc.c	2005-01-19 10:37:12.000000000 -0200
+++ linux-216/fs/proc/proc_misc.c	2005-03-04 18:57:37.241051928 -0300
@@ -621,6 +621,12 @@
 		if (entry)
 			entry->proc_fops = &ppc_htab_operations;
 	}
+        {
+        extern struct file_operations ppc_tlbmiss_operations;
+        entry = create_proc_entry("tlbmiss", S_IRUGO|S_IWUSR, NULL);
+        if (entry)
+                entry->proc_fops = &ppc_tlbmiss_operations;
+        }
 #endif
 	entry = create_proc_read_entry("slabinfo", S_IWUSR | S_IRUGO, NULL,
 				       slabinfo_read_proc, NULL);
--- linux-216.orig/arch/ppc/kernel/ppc_htab.c	2005-01-19 10:37:12.000000000 -0200
+++ linux-216/arch/ppc/kernel/ppc_htab.c	2005-03-04 19:04:05.276061640 -0300
@@ -21,6 +21,7 @@
 #include <linux/sysctl.h>
 #include <linux/ctype.h>
 #include <linux/threads.h>
+#include <linux/seq_file.h>
 
 #include <asm/uaccess.h>
 #include <asm/bitops.h>
@@ -32,6 +33,51 @@
 #include <asm/cputable.h>
 #include <asm/system.h>
 
+#if 1
+
+extern unsigned long itlbuser_miss, itlbkernel_miss;
+extern unsigned long dtlbuser_miss, dtlbkernel_miss;
+
+static ssize_t ppc_tlbmiss_write(struct file *file, const char * buffer,
+                                size_t count, loff_t *ppos);
+static int ppc_tlbmiss_show(struct seq_file *m, void *v);
+static int ppc_tlbmiss_open(struct inode *inode, struct file *file);
+
+struct file_operations ppc_tlbmiss_operations = {
+        .open    = ppc_tlbmiss_open,
+        .read    = seq_read,
+        .llseek  = seq_lseek,
+        .write   = ppc_tlbmiss_write,
+        .release = single_release,
+};
+
+static int ppc_tlbmiss_open(struct inode *inode, struct file *file)
+{
+        /* seq_file with a single show function */
+        return single_open(file, ppc_tlbmiss_show, NULL);
+}
+static int ppc_tlbmiss_show(struct seq_file *m, void *v)
+{
+        seq_printf(m, "I-TLB userspace misses: %lu\n"
+                      "I-TLB kernel misses: %lu\n"
+                      "D-TLB userspace misses: %lu\n"
+                      "D-TLB kernel misses: %lu\n",
+                        itlbuser_miss, itlbkernel_miss,
+                        dtlbuser_miss, dtlbkernel_miss);
+        return 0;
+}
+
+static ssize_t ppc_tlbmiss_write(struct file *file, const char * buffer,
+                                size_t count, loff_t *ppos)
+{
+        /* any write resets all four counters */
+        itlbuser_miss = itlbkernel_miss = 0;
+        dtlbuser_miss = dtlbkernel_miss = 0;
+        return count;
+}
+#endif
+
+
 static ssize_t ppc_htab_read(struct file * file, char * buf,
 			     size_t count, loff_t *ppos);
 static ssize_t ppc_htab_write(struct file * file, const char * buffer,
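
For the record, usage is simple: read the four counters with
"cat /proc/tlbmiss"; any write resets them, e.g.:

-- 

# The write handler zeroes all four counters regardless of the data.
echo 0 > /proc/tlbmiss
cat /proc/tlbmiss

-- 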


* Re: [26-devel] v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-21 18:50 ` [26-devel] " Marcelo Tosatti
@ 2005-04-22  6:18   ` Pantelis Antoniou
  2005-04-22 15:39     ` Marcelo Tosatti
  0 siblings, 1 reply; 8+ messages in thread
From: Pantelis Antoniou @ 2005-04-22  6:18 UTC
  To: Marcelo Tosatti; +Cc: 26-devel, linux-ppc-embedded

Marcelo Tosatti wrote:
> On Thu, Apr 21, 2005 at 03:32:39PM -0300, Marcelo Tosatti wrote:
> 
>>Capture session of /proc/tlbmiss with 1 second interval:
> 
> 
> Forgot to attach /proc/tlbmiss patch, here it is.
> 
> 
[snip]

> 
>  

Thanks Marcelo.

I'll try to run this on my 870 board & mail the results.

Regards

Pantelis


* Re: [26-devel] v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-22  6:18   ` Pantelis Antoniou
@ 2005-04-22 15:39     ` Marcelo Tosatti
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2005-04-22 15:39 UTC
  To: Pantelis Antoniou; +Cc: 26-devel, linux-ppc-embedded




On Fri, Apr 22, 2005 at 09:18:17AM +0300, Pantelis Antoniou wrote:
> Marcelo Tosatti wrote:
> >On Thu, Apr 21, 2005 at 03:32:39PM -0300, Marcelo Tosatti wrote:
> >
> >>Capture session of /proc/tlbmiss with 1 second interval:
> >
> >
> >Forgot to attach /proc/tlbmiss patch, here it is.
> >
> >
> [snip]
> 
> >
> > 
> 
> Thanks Marcelo.
> 
> I'll try to run this on my 870 board & mail the results.

Hi, 

Here is more data about the v2.6 performance slowdown on MPC8xx.

Thanks Benjamin for the TLB miss counter idea! 

These are the results of the following test script, which zeroes the TLB
counters, copies 15MB of data from memory to memory using "dd" (3840
4kB blocks from /dev/zero to a file in the page cache), and reads the
counters again.

-- 

#!/bin/bash
echo 0 > /proc/tlbmiss
time dd if=/dev/zero of=file bs=4k count=3840
cat /proc/tlbmiss

-- 

The results:

v2.6: 				v2.4: 				delta
[root@CAS root]# sh script     	[root@CAS root]# sh script     
real    0m4.241s                         real    0m3.440s
user    0m0.140s                         user    0m0.090s
sys     0m3.820s                         sys     0m3.330s

I-TLB userspace misses: 142369  I-TLB userspace misses: 2179    ITLB u: 139190
I-TLB kernel misses: 118288    	I-TLB kernel misses: 1369	ITLB k: 116319
D-TLB userspace misses: 222916 	D-TLB userspace misses: 180249	DTLB u: 38667
D-TLB kernel misses: 207773    	D-TLB kernel misses: 167236	DTLB k: 38273

The sum of all TLB miss counter deltas between v2.4 and v2.6 is:

139190 + 116319 + 38667 + 38273  = 332449 

Multiplying that by 23 cycles, the average cost of reloading a page
translation from memory on a miss:

332449 * 23 = 7646327 cycles.

That is about 16% of 48000000, the total number of cycles this CPU
executes in one second. It's very likely that there is also a significant
indirect effect of this TLB miss increase, beyond the cycles wasted
bringing the page tables in from memory: exception execution time and
context switching.

Checking "time" output, we can see 1s of slowdown:  

[root@CAS root]# time dd if=/dev/zero of=file bs=4k count=3840 

v2.4:				v2.6:				diff
real    0m3.366s		real    0m4.360s		0.994s
user    0m0.080s		user    0m0.111s	        0.031s
sys     0m3.260s		sys     0m4.218s 		0.958s

Mostly caused by increased kernel execution time.

This proves that the slowdown is, in great part, due to increased
translation cache thrashing.

Now, what is the best way to bring the performance back to v2.4 levels? 

For this "dd" test, which is dominated by "sys_read/sys_write", I thought 
of trying to bring the hotpath functions into the same pages, thus
decreasing the number of page translations required for such tasks.
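
Roughly along these lines (an untested sketch; the section name and the
helper shown are only an illustration, not real kernel code):

-- 

#include <linux/types.h>

/* Collect the hot functions into one linker section so the read/write
 * path spans as few instruction pages (ITLB entries) as possible. */
#define __hotpath __attribute__((__section__(".text.hotpath")))

/* Hypothetical example: tag a hot helper on the read path. */
ssize_t __hotpath hot_read_helper(char *buf, size_t count)
{
        (void)buf;              /* ... the actual hot-path work ... */
        return count;
}

-- 

plus a matching *(.text.hotpath) rule placed before *(.text) in
arch/ppc/vmlinux.lds, so the tagged functions end up packed on
adjacent pages.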

Comments are appreciated.


* Re: v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-24 20:59 ` Wolfgang Denk
@ 2005-04-24 17:25   ` Marcelo Tosatti
  2005-04-24 22:51     ` Wolfgang Denk
  0 siblings, 1 reply; 8+ messages in thread
From: Marcelo Tosatti @ 2005-04-24 17:25 UTC
  To: Wolfgang Denk; +Cc: 26-devel, linux-ppc-embedded

On Sun, Apr 24, 2005 at 10:59:40PM +0200, Wolfgang Denk wrote:
> Dear Marcelo,

Hi Wolfgang! 

> thanks for starting this discussion, and for providing a patch for 8xx. 

Thanks so much for spending the time to do this; there are some very
interesting results in there.

> However, I think we should not only look at the TLB handling problems
> on the 8xx processors. This is probably just a part of  the  problem.
> In  general  the 2.6 performance on (small) embedded systems is much,
> much worse than what we see with a 2.4 kernel. 
> 
> I put some results (2.4.25 vs. 2.6.11.7 on a MPC860 and on a MPC8240)
> at http://www.denx.de/twiki/bin/view/Know/Linux24vs26
> 
> Here is the summary:
> 
> Using the 2.6 kernel on embedded  systems  entails  the  following
> disadvantages:
> * Slow to build: 2.6 takes 30...40% longer to compile
> * Big memory footprint in flash: the 2.6 compressed kernel image is
>   30...40% bigger 
>
> * Big memory footprint in RAM: the 2.6 kernel needs 30...40% more
>   RAM; the available RAM size for applications is 700kB smaller

I've shrunk the v2.6 kernel build to a size significantly smaller than our
v2.4 build, and performance did not improve at all.

From that, I could conclude that the performance problem, in this case,
was not related to decreased available free memory. From then on I started
looking in the TLB direction.

But yes, in general, the v2.6 image is bigger and its memory consumption
is higher than v2.4's.

One important project in this area is linux-tiny, which allows one to 
disable unwanted features.

> * Slow to boot: 2.6 takes 5...15% longer to boot into multi-user mode

Others have mentioned, and I agree, that sysfs is likely to be the major
cause of the boot-time slowdown. Have you tried disabling sysfs?
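
(With linux-tiny that should amount to something like the following in
.config; I am quoting the option name from memory:)

-- 

# CONFIG_SYSFS is not set

-- 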

> * Slow to run: context switches up to 96% slower, local communication
>   latencies up to 80% slower, file system latencies up to 76% slower,
>   local communication bandwidth less than 50% in some cases. 

I've noticed the v2.6 scheduler context switching _more often_ than v2.4...

Question: such huge regressions are seen on MPC8xx only, and the MPC82xx
slowdown is not so bad, correct?

> It's a disappointing result, indeed.

Yes we are in bad shape :( 


* Re: v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-21 18:32 v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses Marcelo Tosatti
  2005-04-21 18:50 ` [26-devel] " Marcelo Tosatti
@ 2005-04-24 20:59 ` Wolfgang Denk
  2005-04-24 17:25   ` Marcelo Tosatti
  1 sibling, 1 reply; 8+ messages in thread
From: Wolfgang Denk @ 2005-04-24 20:59 UTC
  To: Marcelo Tosatti; +Cc: 26-devel, linux-ppc-embedded

Dear Marcelo,

thanks for starting this discussion, and for providing a patch for 8xx.

However, I think we should not only look at the TLB handling problems
on the 8xx processors. This is probably just a part of  the  problem.
In  general  the 2.6 performance on (small) embedded systems is much,
much worse than what we see with a 2.4 kernel.

I put some results (2.4.25 vs. 2.6.11.7 on a MPC860 and on a MPC8240)
at http://www.denx.de/twiki/bin/view/Know/Linux24vs26

Here is the summary:

Using the 2.6 kernel on embedded  systems  entails  the  following
disadvantages:
* Slow to build: 2.6 takes 30...40% longer to compile
* Big memory footprint in flash: the 2.6 compressed kernel image is
  30...40% bigger
* Big memory footprint in RAM: the 2.6 kernel needs 30...40% more
  RAM; the available RAM size for applications is 700kB smaller
* Slow to boot: 2.6 takes 5...15% longer to boot into multi-user mode
* Slow to run: context switches up to 96% slower, local communication
  latencies up to 80% slower, file system latencies up to 76% slower,
  local communication bandwidth less than 50% in some cases.

It's a disappointing result, indeed.

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Another megabytes the dust.


* Re: v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-24 17:25   ` Marcelo Tosatti
@ 2005-04-24 22:51     ` Wolfgang Denk
  2005-04-25 11:44       ` Pantelis Antoniou
  0 siblings, 1 reply; 8+ messages in thread
From: Wolfgang Denk @ 2005-04-24 22:51 UTC
  To: Marcelo Tosatti; +Cc: linux-ppc-embedded

Dear Marcelo,

in message <20050424172518.GB22786@logos.cnet> you wrote:
> 
> Others have mentioned, and I agree, that sysfs is likely to be the major
> cause of the boot-time slowdown. Have you tried disabling sysfs?

No, not yet.

> Question: such huge regressions are seen on MPC8xx only, and the MPC82xx
> slowdown is not so bad, correct?

No. You can find both the LMBENCH summary and the raw data at
http://www.denx.de/twiki/pub/Know/Linux24vs26/lmbench_results and
http://www.denx.de/twiki/pub/Know/Linux24vs26/lmbench_results_raw.tar.gz
respectively.

In most cases the MPC8240 is as bad as the MPC860; only for local
communication bandwidth is there a visible dependency on the processor:
pipes are faster on the 8240 but much slower (49%) on the 860, while
UNIX sockets are 11% slower on the 8240 yet about the same speed as
with 2.4 on the 860, etc.

Here is the context switching part:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
sp8240     Linux 2.4.25         200.4   69.0  217.1   78.1   219.9    78.5
sp8240     Linux 2.4.25         207.5   63.4  229.9   93.7   230.8    81.1
sp8240     Linux 2.4.25         207.4   72.4  230.6   89.5   233.9    86.3
sp8240    Linux 2.6.11. 8.9400  254.1  143.0  261.3  161.3   259.4   160.6
sp8240    Linux 2.6.11. 8.5100  234.4  127.4  256.4  161.4   251.3   149.0
sp8240    Linux 2.6.11. 8.5400  211.8  128.0  240.2  157.7   243.7   153.9
tqm8xx     Linux 2.4.25   29.4   64.7          78.4           81.6        
tqm8xx     Linux 2.4.25   32.9   56.5          75.8           80.0        
tqm8xx     Linux 2.4.25   29.9   66.7          76.6           80.8        
tqm8xx    Linux 2.6.11.   44.7   90.3         132.1          131.3        
tqm8xx    Linux 2.6.11.   48.8  117.1         132.7          136.6        
tqm8xx    Linux 2.6.11.   47.6   90.7         126.7          133.1        

and the local comm latencies:

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
sp8240     Linux 2.4.25        46.5 120. 522.5 1362. 842.1 1817. 2828
sp8240     Linux 2.4.25        47.1 135. 504.6 1330. 880.2 1838. 2774
sp8240     Linux 2.4.25        47.1 134. 535.2 1369. 855.4 1810. 2929
sp8240    Linux 2.6.11. 8.940  89.4 251. 683.0 1506. 1020. 2021. 3507
sp8240    Linux 2.6.11. 8.510  89.5 251. 701.7 1500. 1075. 2032. 3492
sp8240    Linux 2.6.11. 8.540  88.2 263. 703.1 1550. 1110. 2076. 3588
tqm8xx     Linux 2.4.25  29.4 145.3 309. 682.3 1427. 1000. 1896. 2992
tqm8xx     Linux 2.4.25  32.9 144.3 338. 675.9 1434. 1002. 1933. 2990
tqm8xx     Linux 2.4.25  29.9 150.5 352. 679.4 1429. 1006. 1931. 2983
tqm8xx    Linux 2.6.11.  44.7 238.8 522. 940.4 1629. 1265. 2125. 3792
tqm8xx    Linux 2.6.11.  48.8 255.2 531.             1255.       3750
tqm8xx    Linux 2.6.11.  47.6 258.6 550.             1252.       3783

Actually the 8240 is worse than the 860 in some of the tests...

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Man did not weave the web of life; he  is  merely  a  strand  in  it.
Whatever he does to the web, he does to himself.     - Seattle [1854]


* Re: v2.6 performance slowdown on MPC8xx: Measuring TLB cache misses
  2005-04-24 22:51     ` Wolfgang Denk
@ 2005-04-25 11:44       ` Pantelis Antoniou
  0 siblings, 0 replies; 8+ messages in thread
From: Pantelis Antoniou @ 2005-04-25 11:44 UTC
  To: Wolfgang Denk; +Cc: linux-ppc-embedded

Wolfgang Denk wrote:
> Dear Marcelo,
> 

[depressing data snipped]

> 
> Wolfgang Denk
> 

Well it's a mess alright.

Unfortunately we cannot declare that we'll stay on 2.4 forever.

Several subsystems (*cough* MTD *cough*) do not support the latest
hardware, or their developers have moved on to 2.6 full time, refusing
to bother with 2.4 anymore.

Can we make an effort to pinpoint the performance
bottlenecks and re-implement the affected areas sanely?

The -tiny patchset is a start, but frankly I don't think
code/data footprint is the problem.

IMHO it's not just the small embedded systems that are
affected; it's just that on them the effects are more obvious.

So what do you all think?

Regards

Pantelis

