* Kernel 2.6 on MPC8xx performance trouble...
@ 2005-10-28 6:57 David Jander
2005-10-28 9:36 ` Roger Larsson
2005-10-28 15:30 ` Marcelo Tosatti
0 siblings, 2 replies; 11+ messages in thread
From: David Jander @ 2005-10-28 6:57 UTC
To: linux-ppc-embedded
Hi all,
Many people have said it before: 2.6 has a performance penalty, especially for
embedded systems.
But now that I have 2.6 running on our 100MHz MPC852T based board, I was
shocked to see the result:
The most simple task doing nothing but a closed loop of integer math runs at
_half_ the speed compared to kernel 2.4.25!!!!!
Here are the conditions for the test:
- Bogomips are the same, so the CPU definitely runs at the same clock-rate
(and not half) as with "2.4".
- Enabling and disabling preemption doesn't have any impact (as expected for
such kinds of tasks).
- Setting HZ to 100 or 1000 also has only about 3% impact on speed.
- The binary of the test program is the same in both cases (no re-compile with
other optimizations by accident).
- The hardware is the same (exact same board).
- Certain hardware drivers that are not ported to "2.6" yet were present in
"2.4" but (obviously) not in "2.6", but none of them could have a _positive_
impact on performance.
- Kernel versions are 2.4.25 (denx-devel) and 2.6.14-rc5 (denx-git 20051027).
Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on kernel-2.4.
Here is what "time" has to say about it:
real 0m3.119s
user 0m3.080s
sys 0m0.040s
The test loop is pretty brain-dead, but that doesn't matter right now.
This is it:
[....]
gettimeofday(&tv0,NULL);
for(t=0L; t<10000000L; t++)
{
a+=b; /* Do something */
}
gettimeofday(&tv,NULL);
[...]
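For anyone who wants to reproduce the measurement, a minimal self-contained sketch of the loop above could look like the following. The post does not give the types or start values of a and b, so the unsigned long parameters are assumptions, as are the names timed_add_loop and elapsed_us. The volatile accumulator is an addition so that the loop also survives -O2/-O3; the original was built without optimization flags.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed time between two gettimeofday() samples, in microseconds. */
static long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}

/*
 * Add b onto a `iterations` times and report how long that took.
 * Unsigned arithmetic is used so that wrap-around is well defined.
 * Returns the final accumulator value; the elapsed time in
 * microseconds is stored in *usec_out.
 */
static unsigned long timed_add_loop(long iterations, unsigned long a,
                                    unsigned long b, long *usec_out)
{
    struct timeval tv0, tv;
    volatile unsigned long acc = a; /* volatile: keep the loop alive */
    long t;

    assert(usec_out != NULL);

    gettimeofday(&tv0, NULL);
    for (t = 0L; t < iterations; t++) {
        acc += b; /* Do something */
    }
    gettimeofday(&tv, NULL);

    *usec_out = elapsed_us(&tv0, &tv);
    return acc;
}
```

Calling timed_add_loop(10000000L, 0UL, 1UL, &usec) from main() and printing usec corresponds to the 10,000,000-iteration run timed in the post.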
Any ideas?
Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
(cache/tlb, mm, etc) or is 2.6 supposed to perform THAT bad??
Greetings,
--
David Jander
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 6:57 Kernel 2.6 on MPC8xx performance trouble David Jander
@ 2005-10-28 9:36 ` Roger Larsson
2005-10-28 10:57 ` David Jander
2005-10-28 15:30 ` Marcelo Tosatti
1 sibling, 1 reply; 11+ messages in thread
From: Roger Larsson @ 2005-10-28 9:36 UTC
To: linuxppc-embedded
On Friday 28 October 2005 08.57, David Jander wrote:
> Hi all,
>
> Many people have said it before: 2.6 has a performance penalty, especially
> for embedded systems.
> But now that I have 2.6 running on our 100MHz MPC852T based board, I was
> shocked to see the result:
> The most simple task doing nothing but a closed loop of integer math runs
> at _half_ the speed compared to kernel 2.4.25!!!!!
>
> Here are the conditions for the test:
> - Bogomips are the same, so the CPU definitely runs at the same clock-rate
> (and not half) as with "2.4".
> - Enabling and disabling preemption doesn't have any impact (as expected
> for such kinds of tasks).
> - Setting HZ to 100 or 1000 also has only about 3% impact on speed.
> - The binary of the test program is the same in both cases (no re-compile
> with other optimizations by accident).
> - The hardware is the same (exact same board).
> - Certain hardware drivers that are not ported to "2.6" yet were present in
> "2.4" but (obviously) not in "2.6", but none of them could have a _positive_
> impact on performance.
> - Kernel versions are 2.4.25 (denx-devel) and 2.6.14-rc5 (denx-git
> 20051027).
>
> Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on
> kernel-2.4. Here is what "time" has to say about it:
>
> real 0m3.119s
> user 0m3.080s
> sys 0m0.040s
>
> The test loop is pretty brain-dead, but that doesn't matter right now.
> This is it:
> [....]
> gettimeofday(&tv0,NULL);
> for(t=0L; t<10000000L; t++)
> {
> a+=b; /* Do something */
> }
> gettimeofday(&tv,NULL);
> [...]
>
> Any ideas?
> Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
> (cache/tlb, mm, etc) or is 2.6 supposed to perform THAT bad??
Have you verified the system measured time with wall clock (wrist watch)?
System time could be wrong on one of the systems.
What is 'a' and 'b'? The only other explanation I can see is that your
"Do something" is more memory heavy than you think - array calculations?
(Cache problems should probably give a worse result, but you could check that
those config registers are the same).
/RogerL
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 9:36 ` Roger Larsson
@ 2005-10-28 10:57 ` David Jander
2005-10-28 18:44 ` Roger Larsson
2005-10-28 20:37 ` Wolfgang Denk
0 siblings, 2 replies; 11+ messages in thread
From: David Jander @ 2005-10-28 10:57 UTC
To: linuxppc-embedded
On Friday 28 October 2005 11:36, Roger Larsson wrote:
> >[...]
> > Result: The test takes 3 seconds on kernel-2.6 and 1.5 seconds on
> > kernel-2.4. Here is what "time" has to say about it:
> >
> > real 0m3.119s
> > user 0m3.080s
> > sys 0m0.040s
> >
> > The test loop is pretty brain-dead, but that doesn't matter right now.
> > This is it:
> > [....]
> > gettimeofday(&tv0,NULL);
> > for(t=0L; t<10000000L; t++)
> > {
> > a+=b; /* Do something */
> > }
> > gettimeofday(&tv,NULL);
> > [...]
> >
> > Any ideas?
> > Am I misconfiguring something? Is 2.6 support for mpc8xx still broken
> > (cache/tlb, mm, etc) or is 2.6 supposed to perform THAT bad??
>
> Have you verified the system measured time with wall clock (wrist watch)?
> System time could be wrong on one of the systems.
Yes. Wall-clock==gettimeofday-clock on both systems.
> What is 'a' and 'b'? The only other explanation I can see is that your
> "Do something" is more memory heavy than you think - array calculations?
> (Cache problems should probably give a worse result, but you could check
> that those config registers are the same).
They are just integers with fixed start values. These are in the loop, so it's
not an empty loop and hopefully the compiler won't out-optimize it so easily
(that is of course without specifying any optimization flags). Please don't
tell me it's a lousy benchmark, because I already know that! Be it as lousy
as it is, I shouldn't get _those_ results IMHO.
I have downloaded nbench (hopefully a more serious benchmark for raw computing
power), and the results are as follows (I deliberately excluded tests that
don't make sense (ie. use FP)):
Kernel 2.4.25:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          30.438  :       0.78  :       0.26
STRING SORT         :          1.5842  :       0.71  :       0.11
BITFIELD            :      7.9506e+06  :       1.36  :       0.28
FP EMULATION        :           3.258  :       1.56  :       0.36
IDEA                :          108.89  :       1.67  :       0.49
Kernel 2.6.14-rc5:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          21.042  :       0.54  :       0.18
STRING SORT         :         0.88215  :       0.39  :       0.06
BITFIELD            :      6.0979e+06  :       1.05  :       0.22
FP EMULATION        :          1.6453  :       0.79  :       0.18
IDEA                :          110.25  :       1.69  :       0.50
Now, the strange thing is, IDEA is the only test where 2.6 is slightly faster
(results are consistent over repeated runs). Compiler options are: "-s
-static -Wall -O3 -msoft-float", FP-emulation in the kernel is never
activated.
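To put numbers on the slowdown, the per-test ratio of the 2.6 result to the 2.4 result can be computed directly from the iterations/sec columns above. The sketch below does only that arithmetic; the struct and function names are made up for illustration. The ratios come out between roughly 0.5 and 0.77 for the first four tests, and slightly above 1 for IDEA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One nbench line: iterations/sec under each kernel, as posted above. */
struct nbench_row {
    const char *test;
    double k24; /* kernel 2.4.25 */
    double k26; /* kernel 2.6.14-rc5 */
};

static const struct nbench_row rows[] = {
    { "NUMERIC SORT", 30.438,     21.042     },
    { "STRING SORT",  1.5842,     0.88215    },
    { "BITFIELD",     7.9506e+06, 6.0979e+06 },
    { "FP EMULATION", 3.258,      1.6453     },
    { "IDEA",         108.89,     110.25     },
};

#define NROWS (sizeof rows / sizeof rows[0])

/* Speed of 2.6 relative to 2.4 for row i; below 1.0 means 2.6 is slower. */
static double speed_ratio(size_t i)
{
    assert(i < NROWS);
    return rows[i].k26 / rows[i].k24;
}

/* Print one "test name, ratio" line per nbench test. */
static void print_ratios(void)
{
    size_t i;

    for (i = 0; i < NROWS; i++)
        printf("%-13s 2.6/2.4 = %.2f\n", rows[i].test, speed_ratio(i));
}
```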
I think I will need to look closer at how the "IDEA"-test works, but first
I'll have to run the nbench sources through "indent", because they are pretty
unreadable as is ;-)
I also downloaded oprofile, and am studying its manual right now. Hopefully
using this might get me a clue. If anybody already happens to know the
answer, please shout!
Greetings,
--
David Jander
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 6:57 Kernel 2.6 on MPC8xx performance trouble David Jander
2005-10-28 9:36 ` Roger Larsson
@ 2005-10-28 15:30 ` Marcelo Tosatti
2005-10-31 8:21 ` David Jander
1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2005-10-28 15:30 UTC
To: David Jander; +Cc: linux-ppc-embedded
On Fri, Oct 28, 2005 at 08:57:44AM +0200, David Jander wrote:
>
> Hi all,
>
> Many people have said it before: 2.6 has a performance penalty, especially for
> embedded systems.
> But now that I have 2.6 running on our 100MHz MPC852T based board, I was
> shocked to see the result:
> The most simple task doing nothing but a closed loop of integer math runs at
> _half_ the speed compared to kernel 2.4.25!!!!!
David,
Do you have CONFIG_PIN_TLB enabled?
If you do, please patch this in:
http://www.kernel.org/git/?p=linux/kernel/git/marcelo/8xx-fixes;a=commitdiff;h=a41ba028534c45280170c05c23609ea84f34b38a
And select DEBUG_PIN_TLBIE.
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 10:57 ` David Jander
@ 2005-10-28 18:44 ` Roger Larsson
2005-10-28 20:37 ` Wolfgang Denk
1 sibling, 0 replies; 11+ messages in thread
From: Roger Larsson @ 2005-10-28 18:44 UTC
To: David Jander; +Cc: linuxppc-embedded
On Friday 28 October 2005 12.57, David Jander wrote:
> They are just integers with fixed start values. These are in the loop, so
> it's not an empty loop and hopefully the compiler won't out-optimize it so
> easily (that is of course without specifying any optimization flags).
> Please don't tell me it's a lousy benchmark, because I already know that!
> Be it as lousy as it is, I shouldn't get _those_ results IMHO.
>
> I have downloaded nbench (hopefully a more serious benchmark for raw
> computing power), and the results are as follows (I deliberately excluded
> tests that don't make sense (ie. use FP)):
>
> Kernel 2.4.25:
>
> TEST                : Iterations/sec.  : Old Index   : New Index
>                     :                  : Pentium 90* : AMD K6/233*
> --------------------:------------------:-------------:------------
> NUMERIC SORT        :          30.438  :       0.78  :       0.26
> STRING SORT         :          1.5842  :       0.71  :       0.11
> BITFIELD            :      7.9506e+06  :       1.36  :       0.28
> FP EMULATION        :           3.258  :       1.56  :       0.36
> IDEA                :          108.89  :       1.67  :       0.49
>
> Kernel 2.6.14-rc5:
>
> TEST                : Iterations/sec.  : Old Index   : New Index
>                     :                  : Pentium 90* : AMD K6/233*
> --------------------:------------------:-------------:------------
> NUMERIC SORT        :          21.042  :       0.54  :       0.18
> STRING SORT         :         0.88215  :       0.39  :       0.06
> BITFIELD            :      6.0979e+06  :       1.05  :       0.22
> FP EMULATION        :          1.6453  :       0.79  :       0.18
> IDEA                :          110.25  :       1.69  :       0.50
>
>
What about the Pentium 90 and AMD K6? Are those values actual measured
results? By you? If not why do THEY differ between the kernel versions?
Is this a MPC8xx problem - can it be verified on a x86?
/RogerL
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 10:57 ` David Jander
2005-10-28 18:44 ` Roger Larsson
@ 2005-10-28 20:37 ` Wolfgang Denk
2005-10-31 9:31 ` David Jander
1 sibling, 1 reply; 11+ messages in thread
From: Wolfgang Denk @ 2005-10-28 20:37 UTC
To: David Jander; +Cc: linuxppc-embedded
In message <200510281257.22650.david.jander@protonic.nl> you wrote:
>
> > What is 'a' and 'b'? The only other explanation I can see is that your
> > "Do something" is more memory heavy than you think - array calculations?
> > (Cache problems should probably give a worse result, but you could check
> > that those config registers are the same).
>
> They are just integers with fixed start values. These are in the loop, so it's
> not an empty loop and hopefully the compiler won't out-optimize it so easily
> (that is of course without specifying any optimization flags). Please don't
> tell me it's a lousy benchmark, because I already know that! Be it as lousy
> as it is, I shouldn't get _those_ results IMHO.
Indeed, you should not get such results. If you compare with the
lmbench results of our 2.4/2.6 comparison, you will notice that we
did NOT see such behaviour. There was a performance degradation for
pure integer tests, due to increased system overhead, but far from a
factor of 2.
See http://www.denx.de/wiki/pub/Know/Linux24vs26/lmbench_results
Best regards,
Wolfgang Denk
--
Software Engineering: Embedded and Realtime Systems, Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
A fanatic is a person who can't change his mind and won't change the
subject. - Winston Churchill
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 15:30 ` Marcelo Tosatti
@ 2005-10-31 8:21 ` David Jander
2005-10-31 12:58 ` Mark Chambers
0 siblings, 1 reply; 11+ messages in thread
From: David Jander @ 2005-10-31 8:21 UTC
To: linuxppc-embedded
On Friday 28 October 2005 17:30, Marcelo Tosatti wrote:
>[...]
> David,
>
> Do you have CONFIG_PIN_TLB enabled?
>
> If you do, please patch this in:
>
> http://www.kernel.org/git/?p=linux/kernel/git/marcelo/8xx-fixes;a=commitdiff;h=a41ba028534c45280170c05c23609ea84f34b38a
>
> And select DEBUG_PIN_TLBIE.
Ok, done that... no change.
I don't get any of those debug messages, so I guess that was not the problem.
I have made another test in the meantime, trying to check if cache is working.
The test is pretty simple: see how fast I can fill a block of memory whose
size is increasing by a power of two. You should expect to see a step-like
decrease in speed when surpassing the size of the data cache (4kbyte).
The results are very suspicious:
kernel-2.4:
Memsize 512 : 39.342773 Mbyte/s
Memsize 1024 : 41.871094 Mbyte/s
Memsize 2048 : 43.212891 Mbyte/s
Memsize 4096 : 40.117188 Mbyte/s
Memsize 8192 : 28.148438 Mbyte/s
Memsize 16384 : 28.484375 Mbyte/s
Memsize 32768 : 28.656250 Mbyte/s
Memsize 65536 : 28.687500 Mbyte/s
This looks quite healthy: above 4kbyte we see a clear drop in performance, so
we just learned that our data-cache is most probably 4kbyte in size, wow!
Kernel-2.6:
Memsize 512 : 21.033691 Mbyte/s
Memsize 1024 : 22.047852 Mbyte/s
Memsize 2048 : 22.601562 Mbyte/s
Memsize 4096 : 22.882812 Mbyte/s
Memsize 8192 : 23.007812 Mbyte/s
Memsize 16384 : 23.093750 Mbyte/s
Memsize 32768 : 23.125000 Mbyte/s
Memsize 65536 : 23.125000 Mbyte/s
No, sir, no cache detected !
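The source of the fill test is not included in the post. A rough sketch of such a test is given below, under several assumptions: byte-wise stores through a volatile pointer, 1 Mbyte taken as 1048576 bytes, a constant 16 Mbyte written per row, and invented function names. On a working 4 kbyte data cache one would expect the reported rate to drop once the block no longer fits, as in the 2.4 numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/*
 * Fill a block of `size` bytes `passes` times and return the write rate
 * in Mbyte/s. Returns a negative value if the buffer cannot be allocated
 * or the elapsed time is too short to measure.
 */
static double fill_rate_mbyte_per_s(size_t size, unsigned long passes)
{
    struct timeval tv0, tv;
    volatile unsigned char *buf;
    unsigned long p;
    size_t i;
    double seconds;

    assert(size > 0 && passes > 0);
    buf = malloc(size);
    if (buf == NULL)
        return -1.0;

    gettimeofday(&tv0, NULL);
    for (p = 0; p < passes; p++)
        for (i = 0; i < size; i++)
            buf[i] = (unsigned char)p; /* volatile store, not optimized away */
    gettimeofday(&tv, NULL);

    free((void *)buf);
    seconds = (double)(tv.tv_sec - tv0.tv_sec)
            + (double)(tv.tv_usec - tv0.tv_usec) / 1e6;
    if (seconds <= 0.0)
        return -1.0;
    return ((double)size * (double)passes) / (1024.0 * 1024.0) / seconds;
}

/* Print one line per block size, doubling from 512 bytes to 64 kbyte. */
static void print_fill_table(void)
{
    size_t size;

    for (size = 512; size <= 65536; size *= 2) {
        /* Keep the total amount written constant at 16 Mbyte per row. */
        unsigned long passes = (unsigned long)((16UL * 1024 * 1024) / size);

        printf("Memsize %6lu : %f Mbyte/s\n",
               (unsigned long)size, fill_rate_mbyte_per_s(size, passes));
    }
}
```

Byte stores will give lower absolute rates than word stores, so only the shape of the curve is comparable with the tables above, not the figures themselves.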
Where do I have to look now?
Greetings,
--
David Jander
Protonic Holland.
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-28 20:37 ` Wolfgang Denk
@ 2005-10-31 9:31 ` David Jander
0 siblings, 0 replies; 11+ messages in thread
From: David Jander @ 2005-10-31 9:31 UTC
To: linuxppc-embedded
On Friday 28 October 2005 22:37, Wolfgang Denk wrote:
>[...]
> > They are just integers with fixed start values. These are in the loop, so
> > it's not an empty loop and hopefully the compiler won't out-optimize it
> > so easily (that is of course without specifying any optimization flags).
> > Please don't tell me it's a lousy benchmark, because I already know that!
> > Be it as lousy as it is, I shouldn't get _those_ results IMHO.
>
> Indeed, you should not get such results. If you compare with the
> lmbench results of our 2.4/2.6 comparison, you will notice that we
> did NOT see such behaviour. There was a performance degradation for
> pure integer tests, due to increased system overhead, but far from a
> factor of 2.
> See http://www.denx.de/wiki/pub/Know/Linux24vs26/lmbench_results
I have seen them, and my conclusion is: Your kernel was working ok, while
mine, a newer one, is broken.
As you can see in the other e-mail I just posted (replying to Marcelo), at
least the CPU cache seems to be disabled. Might this have something to do
with processor model (mis-) identification?
I had to apply the "ppc_sys: do not BUG if system ID is unknown" patch from
Marcelo Tosatti a few days back in order to be able to boot in the first
place. I had a look at ppc_sys system identification for 8xx and it looked a
little bit nonsensical to me, since all 8xx report the same ID. Maybe the
intention was to set the ID "by hand" in board support and setup.
Problem is: there is still no real board-support infrastructure for mpc8xx,
like there is for mpc82xx for example. What are the plans for 8xx? Should I
try to emulate what others have done for some PQ2 platforms, i.e. create an
arch/ppc/platforms/myplatform.c file and implement board_init()?
Greetings,
--
David Jander
Protonic Holland.
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-31 8:21 ` David Jander
@ 2005-10-31 12:58 ` Mark Chambers
2005-10-31 13:08 ` David Jander
0 siblings, 1 reply; 11+ messages in thread
From: Mark Chambers @ 2005-10-31 12:58 UTC
To: David Jander, linuxppc-embedded
>
> I have made another test in the meantime, trying to check if cache is
> working.
> <snip>
> No, sir, no cache detected !
>
> Where do I have to look now?
>
Could the cache be in writethrough mode? (Instead of writeback)
Mark Chambers
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-31 12:58 ` Mark Chambers
@ 2005-10-31 13:08 ` David Jander
2005-10-31 15:29 ` David Jander
0 siblings, 1 reply; 11+ messages in thread
From: David Jander @ 2005-10-31 13:08 UTC
To: linuxppc-embedded
On Monday 31 October 2005 13:58, Mark Chambers wrote:
>[...]
> > No, sir, no cache detected !
> >
> > Where do I have to look now?
>
> Could the cache be in writethrough mode? (Instead of writeback)
Good point. I just checked the same test, but reading instead of writing. Same
result, no cache detected.
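The read variant is not shown in the post either. A possible sketch, assuming byte-wise loads through a volatile pointer and an invented function name, is below. The checksum is returned so that the loads have a visible result and the loop cannot be dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/*
 * Read a block of `size` bytes `passes` times and return the read rate
 * in Mbyte/s (1 Mbyte = 1048576 bytes, an assumption). The byte sum is
 * stored in *checksum_out. Returns a negative value if the buffer cannot
 * be allocated or the elapsed time is too short to measure.
 */
static double read_rate_mbyte_per_s(size_t size, unsigned long passes,
                                    unsigned long *checksum_out)
{
    struct timeval tv0, tv;
    unsigned char *buf;
    const volatile unsigned char *p;
    unsigned long sum = 0, n;
    size_t i;
    double seconds;

    assert(size > 0 && passes > 0 && checksum_out != NULL);
    buf = malloc(size);
    if (buf == NULL)
        return -1.0;
    memset(buf, 1, size); /* known contents; also touches the pages first */
    p = buf;

    gettimeofday(&tv0, NULL);
    for (n = 0; n < passes; n++)
        for (i = 0; i < size; i++)
            sum += p[i]; /* volatile load on every iteration */
    gettimeofday(&tv, NULL);

    free(buf);
    *checksum_out = sum;
    seconds = (double)(tv.tv_sec - tv0.tv_sec)
            + (double)(tv.tv_usec - tv0.tv_usec) / 1e6;
    if (seconds <= 0.0)
        return -1.0;
    return ((double)size * (double)passes) / (1024.0 * 1024.0) / seconds;
}
```

Running it over the same block sizes as the write test (512 bytes up to 64 kbyte) gives the read-side counterpart of the tables above.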
I know how to change cache policy in u-boot, but does the kernel change that?
If so, where?
Greetings,
--
David Jander
* Re: Kernel 2.6 on MPC8xx performance trouble...
2005-10-31 13:08 ` David Jander
@ 2005-10-31 15:29 ` David Jander
0 siblings, 0 replies; 11+ messages in thread
From: David Jander @ 2005-10-31 15:29 UTC
To: linuxppc-embedded
On Monday 31 October 2005 14:08, David Jander wrote:
> On Monday 31 October 2005 13:58, Mark Chambers wrote:
> >[...]
> >
> > > No, sir, no cache detected !
> > >
> > > Where do I have to look now?
> >
> > Could the cache be in writethrough mode? (Instead of writeback)
>
> Good point. I just checked the same test, but reading instead of writing.
> Same result, no cache detected.
>
> I know how to change cache policy in u-boot, but does the kernel change
> that? If so, where?
Sorry for the stupid question. I did not remember it was there right before my
eyes. It seems I just got lost in the new menu structure of the Kconfig.
And yes, cache was in writethrough mode!!
It seems as though writethrough mode disables the cache entirely, because after
enabling CONFIG_8xx_COPYBACK, I suddenly get 2.4-type speed measurements.
With CONFIG_8xx_COPYBACK undefined, both reading and writing give results as
if there was simply no cache. Shouldn't it at least speed up memory-read
access?
Sorry to everyone for wasting their time with this stupid mistake of mine.
Thanks for all the help anyway.
Greetings,
--
David Jander