* Athlon fast_copy_page revisited
From: Jimmie Mayfield @ 2001-05-30 18:08 UTC (permalink / raw)
To: linux-kernel
Hi. A few weeks ago there was a discussion about the Athlon-optimized
fast_copy_page routine and how its prefetching might be causing problems on VIA
motherboards. Unfortunately, Alan's proposed fixes (not prefetching during the final 320 bytes)
don't seem to help, at least on my iWill KT-133A system with kernels as recent as 2.4.5-ac1.
Since Alan's code appears to rule out prefetching past the end of the page being copied,
something else must be the culprit. Arjan posted a link to a user-space program
that benchmarks various *_clear_page and *_copy_page schemes, and I spent yesterday evening
playing with it.
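Editor's note: the prefetch fix described above can be pictured with a portable sketch. This is not the kernel's fast_copy_page: the MMX/3DNow! asm is replaced by memcpy() of 64-byte chunks and GCC/Clang's __builtin_prefetch(), and the names sketch_copy_page and PAGE_BYTES are hypothetical. The 320-byte prefetch distance comes from the thread; the 64-byte chunk size is an assumption. The main loop prefetches 320 bytes ahead only while that address is still inside the 4096-byte source page, and the final 320 bytes are copied with no prefetch at all.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES     4096 /* i386 page size */
#define CHUNK          64   /* bytes per iteration (assumed: one cache line) */
#define PREFETCH_AHEAD 320  /* prefetch distance discussed in the thread */

/* Hypothetical sketch: copies one page and returns how many prefetch
 * hints were issued, so the loop split can be checked. */
static size_t sketch_copy_page(void *to, const void *from)
{
    unsigned char *dst = to;
    const unsigned char *src = from;
    size_t i, prefetches = 0;

    /* Main loop: every prefetch target stays inside the source page. */
    for (i = 0; i < (PAGE_BYTES - PREFETCH_AHEAD) / CHUNK; i++) {
        __builtin_prefetch(src + i * CHUNK + PREFETCH_AHEAD, 0, 0);
        prefetches++;
        memcpy(dst + i * CHUNK, src + i * CHUNK, CHUNK);
    }

    /* Tail: the final 320 bytes are copied without prefetching. */
    for (; i < PAGE_BYTES / CHUNK; i++)
        memcpy(dst + i * CHUNK, src + i * CHUNK, CHUNK);

    return prefetches;
}
```

With these constants the main loop runs (4096 - 320) / 64 = 59 times, and its last prefetch targets offset 58 * 64 + 320 = 4032, still inside the page; the remaining 5 iterations (320 bytes) issue no prefetch.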
On a whim, I added some NOP instructions to the even_faster copy_page routine. Imagine
my surprise when the NOP-modified routine showed the highest throughput of all the
*_copy_page routines, consistently beating the other optimized routines, generally by
2-5% and sometimes by 10% or more. On my two Athlon machines (both Socket-A Thunderbirds),
I got the best scores when I grouped the MOVQ and MOVNTQ operations into sets of 4.
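Editor's note: one reading of "sets of 4" (an interpretation, not confirmed by the thread) is four MOVQ loads followed by four MOVNTQ stores per 32 bytes. The portable sketch below shows only that grouping: the MMX registers become uint64_t locals, the MOVNTQ streaming stores become ordinary stores (portable C has no non-temporal store), and NOP_PAD is a hypothetical stand-in for the padding. One NOP per 32 bytes is an arbitrary choice; the actual NOP counts and placement of the 8/10/12-NOP variants are in jrm_athlon.c and are not reproduced here.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096

/* Hypothetical padding (GCC/Clang inline asm); not the jrm_athlon.c layout. */
#define NOP_PAD() __asm__ __volatile__("nop")

static void sketch_grouped_copy(void *to, const void *from)
{
    unsigned char *dst = to;
    const unsigned char *src = from;

    for (size_t off = 0; off < PAGE_BYTES; off += 32) {
        uint64_t r0, r1, r2, r3; /* stand-ins for %mm0..%mm3 */

        /* Group of four 8-byte loads (MOVQ in the real routine). */
        memcpy(&r0, src + off, 8);
        memcpy(&r1, src + off + 8, 8);
        memcpy(&r2, src + off + 16, 8);
        memcpy(&r3, src + off + 24, 8);

        /* Group of four 8-byte stores (MOVNTQ in the real routine). */
        memcpy(dst + off, &r0, 8);
        memcpy(dst + off + 8, &r1, 8);
        memcpy(dst + off + 16, &r2, 8);
        memcpy(dst + off + 24, &r3, 8);

        NOP_PAD(); /* padding between groups */
    }
}
```

A C compiler is free to merge or reorder these plain loads and stores, so timing this version says nothing about the asm routine; it only documents the intended instruction grouping.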
It's interesting to note that I don't run into any problems running any of the *_copy_page
schemes in user-space but if I try in kernel space, I get the notorious crash inside
fast_copy_page. (If there was some sort of fundamental hardware problem associated with
prefetch or streaming, wouldn't it also show up in user-space?) Note: I've yet to try the
NOP-modified routines in kernel-space.
I've tried this benchmark on 4 Athlon/Duron machines now (2 Socket-A Thunderbirds, 1
Socket-A Duron and 1 Slot-A Athlon). Each of the Socket-A machines improved with the
NOP-modified routines, while the Slot-A machine performed better with the original
3DNow! routine.
Arjan's original code is at: http://www.fenrus.demon.nl/athlon.c
My modifications are at: http://sackheads.org/~mayfield/jrm_athlon.c
Example test runs:

copy_page() tests
copy_page function 'warm up run' took 21350 cycles per page
copy_page function '2.4 non MMX' took 27706 cycles per page
copy_page function '2.4 MMX fallback' took 28600 cycles per page
copy_page function '2.4 MMX version' took 21370 cycles per page
copy_page function 'faster_copy' took 13119 cycles per page
copy_page function 'even_faster' took 14767 cycles per page
copy_page function 'jrm_copy_page_8nop' took 12774 cycles per page
copy_page function 'jrm_copy_page_10nop' took 12746 cycles per page
copy_page function 'jrm_copy_page_12nop' took 12740 cycles per page

copy_page() tests
copy_page function 'warm up run' took 22499 cycles per page
copy_page function '2.4 non MMX' took 27769 cycles per page
copy_page function '2.4 MMX fallback' took 27696 cycles per page
copy_page function '2.4 MMX version' took 22666 cycles per page
copy_page function 'faster_copy' took 13058 cycles per page
copy_page function 'even_faster' took 13169 cycles per page
copy_page function 'jrm_copy_page_8nop' took 12691 cycles per page
copy_page function 'jrm_copy_page_10nop' took 12750 cycles per page
copy_page function 'jrm_copy_page_12nop' took 14786 cycles per page

The values obviously fluctuate depending on system activity, but the jrm_* routines were
faster in 13 out of 15 trials.
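Editor's note: to relate the "2-5%, sometimes 10% or more" figures to the two runs above, here is a small helper (the name pct_fewer_cycles is hypothetical) that expresses a routine's cycles per page as a percentage saving over a baseline. These are single runs, so the percentages carry the same run-to-run noise.

```c
/* Percentage of cycles saved by `candidate` relative to `baseline`
 * (negative if the candidate was slower). */
static double pct_fewer_cycles(double baseline, double candidate)
{
    return 100.0 * (baseline - candidate) / baseline;
}
```

For the first run, the best jrm_* result (12740) saves about 2.9% over faster_copy (13119) and about 13.7% over even_faster (14767). For the second run, jrm_copy_page_8nop (12691) saves about 2.8% over faster_copy (13058) and about 3.6% over even_faster (13169), while jrm_copy_page_12nop (14786) comes out about 13.2% slower than faster_copy.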
- Jimmie
--
Jimmie Mayfield
http://www.sackheads.org/mayfield email: mayfield+kernel@sackheads.org
My mail provider does not welcome UCE -- http://www.sackheads.org/uce
* Re: Athlon fast_copy_page revisited
From: Alan Cox @ 2001-05-30 18:33 UTC (permalink / raw)
To: Jimmie Mayfield; +Cc: linux-kernel
> schemes in user-space but if I try in kernel space, I get the notorious crash inside
> fast_copy_page. (If there was some sort of fundamental hardware problem associated with
> prefetch or streaming, wouldn't it also show up in user-space?) Note: I've yet to try the
That has been one of the great puzzles. There are patterns that are very
different in kernel space, notably physically linear memory and code running
from a 4MB TLB mapping.
Alan
* Re: Athlon fast_copy_page revisited
From: Brian Gerst @ 2001-05-30 19:06 UTC (permalink / raw)
To: Alan Cox; +Cc: Jimmie Mayfield, linux-kernel
Alan Cox wrote:
>
> > schemes in user-space but if I try in kernel space, I get the notorious crash inside
> > fast_copy_page. (If there was some sort of fundamental hardware problem associated with
> > prefetch or streaming, wouldn't it also show up in user-space?) Note: I've yet to try the
>
> That has been one of the great puzzles. There are patterns that are very
> different in kernel space, notably physically linear memory and code running
> from a 4MB TLB mapping.
Have you tried hacking the kernel to use only 4K pages (i.e. filter
out the PSE capability bit)? A shot in the dark, but probably worth
trying.
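Editor's note: the suggestion boils down to clearing one feature bit before the kernel decides whether to map memory with 4MB pages. PSE (Page Size Extension) is reported in bit 3 of EDX from CPUID leaf 1. The sketch below only models that bit manipulation; the helper names are hypothetical, and where to apply the mask in a 2.4 kernel (for example, on the boot CPU's capability word before the kernel page tables are built) is an assumption the thread does not spell out.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 3: PSE (4MB pages available). */
#define CPUID1_EDX_PSE (UINT32_C(1) << 3)

/* Hypothetical helper: pretend the CPU lacks PSE, so only 4K pages are used. */
static uint32_t filter_pse(uint32_t edx_caps)
{
    return edx_caps & ~CPUID1_EDX_PSE;
}

/* Hypothetical helper: would the kernel consider 4MB pages usable? */
static int has_pse(uint32_t edx_caps)
{
    return (edx_caps & CPUID1_EDX_PSE) != 0;
}
```

Masking a copy of the flags, rather than the hardware report itself, leaves every other capability bit untouched.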
--
Brian Gerst
* Re: Athlon fast_copy_page revisited
From: Gordon Sadler @ 2001-06-02 17:38 UTC (permalink / raw)
To: linux-kernel
[This message has also been posted.]
On Wed, 30 May 2001 18:09:35 GMT, Jimmie Mayfield
<mayfield+kernel@sackheads.org> wrote:
<SNIP explanation>
> Arjan's original code is at: http://www.fenrus.demon.nl/athlon.c
> My modifications are at: http://sackheads.org/~mayfield/jrm_athlon.c
<SNIP example test runs>
I have a Socket-A Duron 800 on an Epox 8KTA3.
Has anyone noticed fluctuations in these tests? For example:
jrm_athlon1:
...
copy_page function 'faster_copy' took 9869 cycles per page
copy_page function 'even_faster' took 9822 cycles per page
...
jrm_athlon2:
...
copy_page function 'faster_copy' took 9939 cycles per page
copy_page function 'even_faster' took 17728 cycles per page
...
jrm_athlon3:
...
copy_page function 'faster_copy' took 17711 cycles per page
copy_page function 'even_faster' took 9843 cycles per page
...
I see these with gcc 2.95.4 (Debian unstable) and with a local build of
gcc-3.0 from last night's CVS.
It's almost as though some stall and/or caching effect is skewing the results.
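Editor's note: one common way to damp outliers like these is to repeat each measurement and keep the minimum, on the reasoning that stalls, interrupts and cache effects can only add cycles. This is a swapped-in technique, not a description of how athlon.c aggregates its runs, and the helper name is hypothetical.

```c
#include <stddef.h>

/* Smallest cycle count among n repeated runs (n must be > 0). */
static unsigned long min_cycles(const unsigned long *runs, size_t n)
{
    unsigned long best = runs[0];
    for (size_t i = 1; i < n; i++)
        if (runs[i] < best)
            best = runs[i];
    return best;
}
```

Applied to the three runs above, it gives 9869 cycles for faster_copy and 9822 for even_faster, discarding the roughly 17700-cycle outliers.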
--
Gordon Sadler