linuxppc-dev.lists.ozlabs.org archive mirror
* Kernel 2.6 on MPC8xx performance trouble...
@ 2005-10-28  6:57 David Jander
  2005-10-28  9:36 ` Roger Larsson
  2005-10-28 15:30 ` Marcelo Tosatti
  0 siblings, 2 replies; 11+ messages in thread
From: David Jander @ 2005-10-28  6:57 UTC (permalink / raw)
  To: linux-ppc-embedded


Hi all,

Many people have said it before: 2.6 has a performance penalty, especially for
embedded systems.
But now that I have 2.6 running on our 100MHz MPC852T based board, I was 
shocked to see the result:
The most simple task doing nothing but a closed loop of integer math runs at 
_half_ the speed compared to kernel 2.4.25!!!!!

Here are the conditions for the test:
- Bogomips are the same, so the CPU definitely runs at the same clock-rate 
(and not half) as with "2.4".
- Enabling and disabling preemption doesn't have any impact (as expected for 
such kinds of tasks).
- Setting HZ to 100 or 1000 also has only about 3% impact on speed.
- The binary of the test program is the same in both cases (no re-compile with 
other optimizations by accident).
- The hardware is the same (exact same board).
- Certain hardware drivers that have not been ported to "2.6" yet were present
in "2.4" but (obviously) not in "2.6"; none of them could have had a
_positive_ impact on performance.
- Kernel versions are 2.4.25 (denx-devel) and 2.6.14-rc5 (denx-git 20051027).

Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on kernel-2.4. 
Here is what "time" has to say about it:

real    0m3.119s
user    0m3.080s
sys     0m0.040s

The test loop is pretty brain-dead, but that doesn't matter right now.
This is it:
	[....]
	gettimeofday(&tv0,NULL);
        for(t=0L; t<10000000L; t++)
        {
                a+=b; /* Do something */
        }
        gettimeofday(&tv,NULL);
	[...]
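The excerpt above leaves out the declarations and the reporting code. A minimal self-contained sketch of the same loop follows, assuming `a` and `b` are plain `int` variables with hypothetical start values 0 and 1 (the later reply only says they are integers with fixed start values). The wrapper name `time_integer_loop()` is also hypothetical. The variables are declared `volatile` so the loop survives even if optimization is enabled.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical wrapper around the loop quoted above. Runs `iterations`
 * additions and returns the elapsed wall-clock time in seconds, measured
 * with gettimeofday(). If `result` is non-NULL, the final value of `a`
 * is stored there. The start values a = 0, b = 1 are assumptions. */
double time_integer_loop(long iterations, int *result)
{
    struct timeval tv0, tv;
    volatile int a = 0;   /* volatile: the loop is kept even at -O2/-O3 */
    volatile int b = 1;
    long t;

    gettimeofday(&tv0, NULL);
    for (t = 0L; t < iterations; t++) {
        a += b;           /* Do something */
    }
    gettimeofday(&tv, NULL);

    if (result != NULL)
        *result = a;

    /* Difference of the two timestamps, in seconds. */
    return (double)(tv.tv_sec - tv0.tv_sec)
         + (double)(tv.tv_usec - tv0.tv_usec) / 1e6;
}
```

Calling `time_integer_loop(10000000L, NULL)` from `main()` corresponds to the 10,000,000-iteration run described here.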

Any ideas?
Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
(cache/TLB, mm, etc.), or is 2.6 supposed to perform THAT badly?

Greetings,

-- 
David Jander

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28  6:57 Kernel 2.6 on MPC8xx performance trouble David Jander
@ 2005-10-28  9:36 ` Roger Larsson
  2005-10-28 10:57   ` David Jander
  2005-10-28 15:30 ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: Roger Larsson @ 2005-10-28  9:36 UTC (permalink / raw)
  To: linuxppc-embedded

On Friday 28 October 2005 08.57, David Jander wrote:
> Hi all,
>
> Many people have said it before: 2.6 has a performance penalty, especially
> for embedded systems.
> But now that I have 2.6 running on our 100MHz MPC852T based board, I was
> shocked to see the result:
> The most simple task doing nothing but a closed loop of integer math runs
> at _half_ the speed compared to kernel 2.4.25!!!!!
>
> Here are the conditions for the test:
> - Bogomips are the same, so the CPU definitely runs at the same clock-rate
> (and not half) as with "2.4".
> - Enabling and disabling preemption doesn't have any impact (as expected
> for such kinds of tasks).
> - Setting HZ to 100 or 1000 also has only about 3% impact on speed.
> - The binary of the test program is the same in both cases (no re-compile
> with other optimizations by accident).
> - The hardware is the same (exact same board).
> - Certain hardware drivers that have not been ported to "2.6" yet were
> present in "2.4" but (obviously) not in "2.6"; none of them could have had
> a _positive_ impact on performance.
> - Kernel versions are 2.4.25 (denx-devel) and 2.6.14-rc5 (denx-git
> 20051027).
>
> Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on
> kernel-2.4. Here is what "time" has to say about it:
>
> real    0m3.119s
> user    0m3.080s
> sys     0m0.040s
>
> The test loop is pretty brain-dead, but that doesn't matter right now.
> This is it:
> 	[....]
> 	gettimeofday(&tv0,NULL);
>         for(t=0L; t<10000000L; t++)
>         {
>                 a+=b; /* Do something */
>         }
>         gettimeofday(&tv,NULL);
> 	[...]
>
> Any ideas?
> Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
> (cache/TLB, mm, etc.), or is 2.6 supposed to perform THAT badly?

Have you verified the system-measured time against a wall clock (wristwatch)?
The system time could be wrong on one of the systems.

What are 'a' and 'b'? The only other explanation I can see is that your
"Do something" is more memory-heavy than you think: array calculations?
(Cache problems should probably give a worse result, but you could check that
the cache configuration registers are the same.)

/RogerL


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28  9:36 ` Roger Larsson
@ 2005-10-28 10:57   ` David Jander
  2005-10-28 18:44     ` Roger Larsson
  2005-10-28 20:37     ` Wolfgang Denk
  0 siblings, 2 replies; 11+ messages in thread
From: David Jander @ 2005-10-28 10:57 UTC (permalink / raw)
  To: linuxppc-embedded

On Friday 28 October 2005 11:36, Roger Larsson wrote:
> >[...]
> > Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on
> > kernel-2.4. Here is what "time" has to say about it:
> >
> > real    0m3.119s
> > user    0m3.080s
> > sys     0m0.040s
> >
> > The test loop is pretty brain-dead, but that doesn't matter right now.
> > This is it:
> > 	[....]
> > 	gettimeofday(&tv0,NULL);
> >         for(t=0L; t<10000000L; t++)
> >         {
> >                 a+=b; /* Do something */
> >         }
> >         gettimeofday(&tv,NULL);
> > 	[...]
> >
> > Any ideas?
> > Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
> > (cache/TLB, mm, etc.), or is 2.6 supposed to perform THAT badly?
>
> Have you verified the system-measured time against a wall clock (wristwatch)?
> The system time could be wrong on one of the systems.

Yes. The wall clock and the gettimeofday() clock agree on both systems.

> What are 'a' and 'b'? The only other explanation I can see is that your
> "Do something" is more memory-heavy than you think: array calculations?
> (Cache problems should probably give a worse result, but you could check
> that the cache configuration registers are the same.)

They are just integers with fixed start values. They are used in the loop, so
it's not an empty loop, and hopefully the compiler won't optimize it away so
easily (the binary is, of course, built without any optimization flags).
Please don't tell me it's a lousy benchmark, because I already know that!
Lousy as it is, I shouldn't get _those_ results IMHO.

I have downloaded nbench (hopefully a more serious benchmark for raw computing
power), and the results are as follows (I deliberately excluded the tests that
don't make sense here, i.e. those that use FP):

Kernel 2.4.25:

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          30.438  :       0.78  :       0.26
STRING SORT         :          1.5842  :       0.71  :       0.11
BITFIELD            :      7.9506e+06  :       1.36  :       0.28
FP EMULATION        :           3.258  :       1.56  :       0.36
IDEA                :          108.89  :       1.67  :       0.49

Kernel 2.6.14-rc5:

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          21.042  :       0.54  :       0.18
STRING SORT         :         0.88215  :       0.39  :       0.06
BITFIELD            :      6.0979e+06  :       1.05  :       0.22
FP EMULATION        :          1.6453  :       0.79  :       0.18
IDEA                :          110.25  :       1.69  :       0.50


Now, the strange thing is that IDEA is the only test where 2.6 is slightly
faster (the results are consistent over repeated runs). The compiler options
are "-s -static -Wall -O3 -msoft-float", so FP emulation in the kernel is
never activated.
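To make the comparison concrete, the sketch below divides each 2.6 iterations/sec figure by the corresponding 2.4 figure, using the numbers from the two tables above. The array and function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* nbench iterations/sec, copied from the two tables above. */
static const char *const nbench_test[] = {
    "NUMERIC SORT", "STRING SORT", "BITFIELD", "FP EMULATION", "IDEA"
};
static const double nbench_k24[] = { 30.438, 1.5842, 7.9506e+06, 3.258, 108.89 };
static const double nbench_k26[] = { 21.042, 0.88215, 6.0979e+06, 1.6453, 110.25 };

#define NBENCH_COUNT (sizeof nbench_k24 / sizeof nbench_k24[0])

/* Speed under 2.6.14-rc5 relative to 2.4.25 for test i:
 * 1.0 = same speed, 0.5 = half speed. */
double nbench_speed_ratio(size_t i)
{
    return nbench_k26[i] / nbench_k24[i];
}

/* Print one line per test: name and 2.6/2.4 ratio. */
void print_nbench_ratios(void)
{
    size_t i;
    for (i = 0; i < NBENCH_COUNT; i++)
        printf("%-14s %.3f\n", nbench_test[i], nbench_speed_ratio(i));
}
```

Computed this way, NUMERIC SORT runs at about 0.69, STRING SORT at about 0.56, BITFIELD at about 0.77 and FP EMULATION at about 0.50 of the 2.4 speed, while IDEA comes out at about 1.01.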

I think I will need to look closer at how the "IDEA"-test works, but first 
I'll have to run the nbench sources through "indent", because they are pretty 
unreadable as is ;-)

I have also downloaded oprofile and am studying its manual right now; hopefully
it will give me a clue. If anybody already happens to know the answer, please
shout!

Greetings,

-- 
David Jander


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28  6:57 Kernel 2.6 on MPC8xx performance trouble David Jander
  2005-10-28  9:36 ` Roger Larsson
@ 2005-10-28 15:30 ` Marcelo Tosatti
  2005-10-31  8:21   ` David Jander
  1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2005-10-28 15:30 UTC (permalink / raw)
  To: David Jander; +Cc: linux-ppc-embedded

On Fri, Oct 28, 2005 at 08:57:44AM +0200, David Jander wrote:
> 
> Hi all,
> 
> Many people have said it before: 2.6 has a performance penalty, especially for
> embedded systems.
> But now that I have 2.6 running on our 100MHz MPC852T based board, I was 
> shocked to see the result:
> The most simple task doing nothing but a closed loop of integer math runs at 
> _half_ the speed compared to kernel 2.4.25!!!!!

David,

Do you have CONFIG_PIN_TLB enabled?

If you do, please patch this in:

http://www.kernel.org/git/?p=linux/kernel/git/marcelo/8xx-fixes;a=commitdiff;h=a41ba028534c45280170c05c23609ea84f34b38a

And select DEBUG_PIN_TLBIE.


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28 10:57   ` David Jander
@ 2005-10-28 18:44     ` Roger Larsson
  2005-10-28 20:37     ` Wolfgang Denk
  1 sibling, 0 replies; 11+ messages in thread
From: Roger Larsson @ 2005-10-28 18:44 UTC (permalink / raw)
  To: David Jander; +Cc: linuxppc-embedded

On Friday 28 October 2005 12.57, David Jander wrote:
> They are just integers with fixed start values. They are used in the loop,
> so it's not an empty loop, and hopefully the compiler won't optimize it away
> so easily (the binary is, of course, built without any optimization flags).
> Please don't tell me it's a lousy benchmark, because I already know that!
> Lousy as it is, I shouldn't get _those_ results IMHO.
>
> I have downloaded nbench (hopefully a more serious benchmark for raw
> computing power), and the results are as follows (I deliberately excluded
> the tests that don't make sense here, i.e. those that use FP):
>
> Kernel 2.4.25:
>
> TEST                : Iterations/sec.  : Old Index   : New Index
>                     :                  : Pentium 90* : AMD K6/233*
> --------------------:------------------:-------------:------------
> NUMERIC SORT        :          30.438  :       0.78  :       0.26
> STRING SORT         :          1.5842  :       0.71  :       0.11
> BITFIELD            :      7.9506e+06  :       1.36  :       0.28
> FP EMULATION        :           3.258  :       1.56  :       0.36
> IDEA                :          108.89  :       1.67  :       0.49
>
> Kernel 2.6.14-rc5:
>
> TEST                : Iterations/sec.  : Old Index   : New Index
>                     :                  : Pentium 90* : AMD K6/233*
> --------------------:------------------:-------------:------------
> NUMERIC SORT        :          21.042  :       0.54  :       0.18
> STRING SORT         :         0.88215  :       0.39  :       0.06
> BITFIELD            :      6.0979e+06  :       1.05  :       0.22
> FP EMULATION        :          1.6453  :       0.79  :       0.18
> IDEA                :          110.25  :       1.69  :       0.50
>
>

What about the Pentium 90 and AMD K6 columns? Are those values actual measured
results? Measured by you? If not, why do THEY differ between the kernel versions?
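One way to test this question against the posted numbers: if each index column is the measured iterations/sec divided by a fixed per-test reference score (an assumption about how nbench derives its indexes; the thread does not say), then iterations/index should give the same reference value under both kernels, up to the two-decimal rounding of the printed index. The sketch below checks that for the Pentium 90 column, using the figures quoted above; the names are hypothetical.

```c
/* Figures copied from the quoted tables: iterations/sec and the
 * "Old Index (Pentium 90)" column, for 2.4.25 and 2.6.14-rc5. */
static const double iters_24[] = { 30.438, 1.5842, 7.9506e+06, 3.258, 108.89 };
static const double index_24[] = { 0.78, 0.71, 1.36, 1.56, 1.67 };
static const double iters_26[] = { 21.042, 0.88215, 6.0979e+06, 1.6453, 110.25 };
static const double index_26[] = { 0.54, 0.39, 1.05, 0.79, 1.69 };

/* Relative difference between the reference scores implied by the 2.4
 * and 2.6 rows of test i (0 = NUMERIC SORT ... 4 = IDEA). A small value
 * means the index just scales with the measured rate, under the
 * assumption stated above. */
double reference_mismatch(int i)
{
    double r24 = iters_24[i] / index_24[i];  /* implied reference, 2.4 */
    double r26 = iters_26[i] / index_26[i];  /* implied reference, 2.6 */
    double d = r24 - r26;

    if (d < 0.0)
        d = -d;
    return d / r24;
}
```

For all five rows the mismatch stays below 2% (the largest, about 1.4%, is STRING SORT, whose 2.6 index of 0.39 carries the most relative rounding), which is consistent with the index columns being derived from the iterations/sec column rather than measured separately.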

Is this an MPC8xx problem? Can it be reproduced on x86?

/RogerL


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28 10:57   ` David Jander
  2005-10-28 18:44     ` Roger Larsson
@ 2005-10-28 20:37     ` Wolfgang Denk
  2005-10-31  9:31       ` David Jander
  1 sibling, 1 reply; 11+ messages in thread
From: Wolfgang Denk @ 2005-10-28 20:37 UTC (permalink / raw)
  To: David Jander; +Cc: linuxppc-embedded

In message <200510281257.22650.david.jander@protonic.nl> you wrote:
>
> > What are 'a' and 'b'? The only other explanation I can see is that your
> > "Do something" is more memory-heavy than you think: array calculations?
> > (Cache problems should probably give a worse result, but you could check
> > that the cache configuration registers are the same.)
> 
> They are just integers with fixed start values. They are used in the loop, so
> it's not an empty loop, and hopefully the compiler won't optimize it away so
> easily (the binary is, of course, built without any optimization flags).
> Please don't tell me it's a lousy benchmark, because I already know that!
> Lousy as it is, I shouldn't get _those_ results IMHO.

Indeed, you should not get such results. If you compare with the
lmbench results of our 2.4/2.6 comparison, you will notice that we
did NOT see such behaviour. There was a performance degradation for
pure integer tests, due to increased system overhead, but far from a
factor of 2.
See http://www.denx.de/wiki/pub/Know/Linux24vs26/lmbench_results

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
A fanatic is a person who can't change his mind and won't change  the
subject.                                          - Winston Churchill


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28 15:30 ` Marcelo Tosatti
@ 2005-10-31  8:21   ` David Jander
  2005-10-31 12:58     ` Mark Chambers
  0 siblings, 1 reply; 11+ messages in thread
From: David Jander @ 2005-10-31  8:21 UTC (permalink / raw)
  To: linuxppc-embedded

On Friday 28 October 2005 17:30, Marcelo Tosatti wrote:
>[...]
> David,
>
> Do you have CONFIG_PIN_TLB enabled?
>
> If you do, please patch this in:
>
> http://www.kernel.org/git/?p=linux/kernel/git/marcelo/8xx-fixes;a=commitdiff;h=a41ba028534c45280170c05c23609ea84f34b38a
>
> And select DEBUG_PIN_TLBIE.

Ok, done that... no change.
I don't get any of those debug messages, so I guess that was not the problem.

I have made another test in the meantime, trying to check whether the cache is
working. The test is pretty simple: see how fast I can fill a block of memory
whose size doubles at each step. One should expect a step-like decrease in
speed once the block exceeds the size of the data cache (4kbyte).
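The fill test itself is not shown in the thread. Below is a minimal sketch of the idea under stated assumptions: a `size`-byte block is written word by word, repeatedly, until a fixed total has been written, and throughput is reported in Mbyte/s taken as 1048576 bytes/s. The function name and the choice of word stores are hypothetical, not David's actual code.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical reconstruction of the fill test: store to a `size`-byte
 * block over and over until at least `total` bytes have been written,
 * then return the throughput in Mbyte/s (1 Mbyte = 1048576 bytes).
 * Returns a negative value on error or if the run was too short to time. */
double fill_rate_mbytes(size_t size, size_t total)
{
    struct timeval tv0, tv;
    size_t words = size / sizeof(unsigned long);
    unsigned long *mem;
    volatile unsigned long *p;  /* volatile: every store really happens */
    size_t done = 0, i;
    double secs;

    if (words == 0 || (mem = malloc(words * sizeof *mem)) == NULL)
        return -1.0;
    p = mem;

    gettimeofday(&tv0, NULL);
    while (done < total) {
        for (i = 0; i < words; i++)
            p[i] = (unsigned long)done;
        done += words * sizeof *mem;
    }
    gettimeofday(&tv, NULL);
    free(mem);

    secs = (double)(tv.tv_sec - tv0.tv_sec)
         + (double)(tv.tv_usec - tv0.tv_usec) / 1e6;
    if (secs <= 0.0)
        return -1.0;
    return (double)done / secs / 1048576.0;
}
```

Calling it for 512, 1024, ..., 65536 bytes with the same `total` for each size gives one line per size, in the format of the tables below.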

The results are very suspicious:

kernel-2.4:

Memsize    512 : 39.342773 Mbyte/s
Memsize   1024 : 41.871094 Mbyte/s
Memsize   2048 : 43.212891 Mbyte/s
Memsize   4096 : 40.117188 Mbyte/s
Memsize   8192 : 28.148438 Mbyte/s
Memsize  16384 : 28.484375 Mbyte/s
Memsize  32768 : 28.656250 Mbyte/s
Memsize  65536 : 28.687500 Mbyte/s

This looks quite healthy: above 4kbyte we see a clear drop in performance, so 
we just learned that our data-cache is most probably 4kbyte in size, wow!

Kernel-2.6:

Memsize    512 : 21.033691 Mbyte/s
Memsize   1024 : 22.047852 Mbyte/s
Memsize   2048 : 22.601562 Mbyte/s
Memsize   4096 : 22.882812 Mbyte/s
Memsize   8192 : 23.007812 Mbyte/s
Memsize  16384 : 23.093750 Mbyte/s
Memsize  32768 : 23.125000 Mbyte/s
Memsize  65536 : 23.125000 Mbyte/s

No, sir, no cache detected!
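The "step" being looked for can be stated mechanically: find the first size at which throughput falls by more than some fraction relative to the previous size. A small sketch, with a hypothetical 20% threshold and hypothetical names, applied to the two tables above: the 2.4 figures show the step after 4096 bytes, while the 2.6 figures show none.

```c
#include <stddef.h>

/* Return the largest size before the first drop in throughput of more
 * than `threshold` (e.g. 0.20 = 20%) from one size to the next, or 0 if
 * no such drop occurs. */
size_t cache_knee(const size_t *sizes, const double *mbytes_s, size_t n,
                  double threshold)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (mbytes_s[i] < mbytes_s[i - 1] * (1.0 - threshold))
            return sizes[i - 1];
    }
    return 0;
}

/* Figures copied from the two tables above (Mbyte/s). */
static const size_t memsizes[] = { 512, 1024, 2048, 4096,
                                   8192, 16384, 32768, 65536 };
static const double rate_24[] = { 39.342773, 41.871094, 43.212891, 40.117188,
                                  28.148438, 28.484375, 28.656250, 28.687500 };
static const double rate_26[] = { 21.033691, 22.047852, 22.601562, 22.882812,
                                  23.007812, 23.093750, 23.125000, 23.125000 };
```

With a 0.20 threshold, `cache_knee(memsizes, rate_24, 8, 0.20)` returns 4096 and `cache_knee(memsizes, rate_26, 8, 0.20)` returns 0.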

Where do I have to look now?

Greetings,

-- 
David Jander
Protonic Holland.


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-28 20:37     ` Wolfgang Denk
@ 2005-10-31  9:31       ` David Jander
  0 siblings, 0 replies; 11+ messages in thread
From: David Jander @ 2005-10-31  9:31 UTC (permalink / raw)
  To: linuxppc-embedded

On Friday 28 October 2005 22:37, Wolfgang Denk wrote:
>[...]
> > They are just integers with fixed start values. They are used in the
> > loop, so it's not an empty loop, and hopefully the compiler won't optimize
> > it away so easily (the binary is, of course, built without any
> > optimization flags). Please don't tell me it's a lousy benchmark, because
> > I already know that! Lousy as it is, I shouldn't get _those_ results IMHO.
>
> Indeed, you should not get such results. If you compare with the
> lmbench results of our 2.4/2.6 comparison, you will notice that we
> did NOT see such behaviour. There was a performance degradation for
> pure integer tests, due to increased system overhead, but far from a
> factor of 2.
> See http://www.denx.de/wiki/pub/Know/Linux24vs26/lmbench_results

I have seen them, and my conclusion is: your kernel was working OK, while
mine, a newer one, is broken.
As you can see in the other e-mail I just posted (replying to Marcelo), at
least the CPU cache seems to be disabled. Might this have something to do
with processor model (mis)identification?
I had to apply the "ppc_sys: do not BUG if system ID is unknown" patch from
Marcelo Tosatti a few days back in order to be able to boot in the first
place. I had a look at ppc_sys system identification for 8xx, and it looked a
little nonsensical to me, since all 8xx parts report the same ID. Maybe the
intention was to set the ID "by hand" in board support and setup.
The problem is: there is still no real board-support infrastructure for
mpc8xx, like there is for mpc82xx, for example. What are the plans for 8xx?
Should I try to emulate what others have done for some PQ2 platforms, i.e.
create an arch/ppc/platforms/myplatform.c file and implement board_init()?

Greetings,

-- 
David Jander
Protonic Holland.


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-31  8:21   ` David Jander
@ 2005-10-31 12:58     ` Mark Chambers
  2005-10-31 13:08       ` David Jander
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Chambers @ 2005-10-31 12:58 UTC (permalink / raw)
  To: David Jander, linuxppc-embedded

>
> I have made another test in the meantime, trying to check if cache is
> working.
> <snip>
> No, sir, no cache detected!
>
> Where do I have to look now?
>

Could the cache be in writethrough mode? (Instead of writeback)

Mark Chambers


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-31 12:58     ` Mark Chambers
@ 2005-10-31 13:08       ` David Jander
  2005-10-31 15:29         ` David Jander
  0 siblings, 1 reply; 11+ messages in thread
From: David Jander @ 2005-10-31 13:08 UTC (permalink / raw)
  To: linuxppc-embedded

On Monday 31 October 2005 13:58, Mark Chambers wrote:
>[...]
> > No, sir, no cache detected!
> >
> > Where do I have to look now?
>
> Could the cache be in writethrough mode? (Instead of writeback)

Good point. I just checked the same test, but reading instead of writing. Same 
result, no cache detected.
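A read-side counterpart of the fill test can be sketched the same way (again a hypothetical reconstruction; the name `read_rate_mbytes()` and the Mbyte = 1048576 bytes convention are assumptions). In principle a writethrough policy only changes how stores are handled, so a read test that also shows no step suggests that loads are not being cached either.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Hypothetical read-side counterpart of the fill test: load every word of
 * a zeroed `size`-byte block over and over until at least `total` bytes
 * have been read, then return Mbyte/s (1 Mbyte = 1048576 bytes).
 * Returns a negative value on error or if the run was too short to time. */
double read_rate_mbytes(size_t size, size_t total)
{
    struct timeval tv0, tv;
    size_t words = size / sizeof(unsigned long);
    unsigned long *mem;
    const volatile unsigned long *p;  /* volatile: every load really happens */
    unsigned long sum = 0;
    size_t done = 0, i;
    double secs;

    if (words == 0 || (mem = calloc(words, sizeof *mem)) == NULL)
        return -1.0;
    p = mem;

    gettimeofday(&tv0, NULL);
    while (done < total) {
        for (i = 0; i < words; i++)
            sum += p[i];
        done += words * sizeof *mem;
    }
    gettimeofday(&tv, NULL);
    free(mem);

    secs = (double)(tv.tv_sec - tv0.tv_sec)
         + (double)(tv.tv_usec - tv0.tv_usec) / 1e6;
    /* The block is zero-filled by calloc(), so the sum must still be 0. */
    if (secs <= 0.0 || sum != 0)
        return -1.0;
    return (double)done / secs / 1048576.0;
}
```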

I know how to change cache policy in u-boot, but does the kernel change that? 
If so, where?

Greetings,

-- 
David Jander


* Re: Kernel 2.6 on MPC8xx performance trouble...
  2005-10-31 13:08       ` David Jander
@ 2005-10-31 15:29         ` David Jander
  0 siblings, 0 replies; 11+ messages in thread
From: David Jander @ 2005-10-31 15:29 UTC (permalink / raw)
  To: linuxppc-embedded

On Monday 31 October 2005 14:08, David Jander wrote:
> On Monday 31 October 2005 13:58, Mark Chambers wrote:
> >[...]
> >
> > > No, sir, no cache detected!
> > >
> > > Where do I have to look now?
> >
> > Could the cache be in writethrough mode? (Instead of writeback)
>
> Good point. I just checked the same test, but reading instead of writing.
> Same result, no cache detected.
>
> I know how to change cache policy in u-boot, but does the kernel change
> that? If so, where?

Sorry for the stupid question. I had not noticed that it was right before my
eyes; it seems I just got lost in the new menu structure of Kconfig.
And yes, the cache was in writethrough mode!!
It seems as though writethrough mode disables the cache entirely, because
after turning on CONFIG_8xx_COPYBACK, I suddenly get 2.4-like speed
measurements.
With CONFIG_8xx_COPYBACK undefined, both reading and writing give results as
if there were simply no cache. Shouldn't writethrough mode at least speed up
memory-read access?
Sorry to everyone for wasting their time with this stupid mistake of mine.
Thanks for all the help anyway.

Greetings,

-- 
David Jander


end of thread

Thread overview: 11+ messages
2005-10-28  6:57 Kernel 2.6 on MPC8xx performance trouble David Jander
2005-10-28  9:36 ` Roger Larsson
2005-10-28 10:57   ` David Jander
2005-10-28 18:44     ` Roger Larsson
2005-10-28 20:37     ` Wolfgang Denk
2005-10-31  9:31       ` David Jander
2005-10-28 15:30 ` Marcelo Tosatti
2005-10-31  8:21   ` David Jander
2005-10-31 12:58     ` Mark Chambers
2005-10-31 13:08       ` David Jander
2005-10-31 15:29         ` David Jander
