From: Michael Thonke <iogl64nx@gmail.com>
To: Andi Kleen <ak@muc.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>, linux-kernel@vger.kernel.org
Subject: Re: [RFC] x86-64: Use SSE for copy_page and clear_page
Date: Mon, 30 May 2005 22:05:28 +0200
Message-ID: <429B7208.6070804@gmail.com>
In-Reply-To: <20050530193823.GD25794@muc.de>
Andi Kleen wrote:
>>The SSE clear page function is almost twice as fast as the kernel's
>>current clear_page, while the copy_page implementation is roughly a
>>third faster. This is likely due to the fact that SSE instructions
>>can keep the 256 bit wide L2 cache bus at a higher utilisation than
>>64 bit movs are able to. Comments?
>>
>>
>
>Any use of write combining is wrong here because it forces
>the destination out of cache, which causes performance issues later on.
>Believe me we went through this years ago.
>
>If you can code up a better function for P4 that does not use
>write combining, I would be happy to add it. I never tuned the functions
>for P4.
>
>One simple experiment would be to just test if P4 likes the
>simple rep ; movsq / rep ; stosq loops and enable them.
>
>
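For reference, the experiment suggested above could be sketched roughly as
follows. This is only an illustration, not the kernel's code: the function
names and PAGE_BYTES are made up here, a 4 KiB page is assumed, and the
inline asm is GCC-style for x86-64 with a plain memset/memcpy fallback
elsewhere. These string instructions store through the cache in the ordinary
way, unlike the non-temporal (write-combining) stores objected to above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096UL   /* assumption: 4 KiB pages */

/* Zero one page with "rep stosq" (hypothetical helper name). */
void clear_page_rep_stosq(void *page)
{
    assert(page != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    void *dst = page;
    unsigned long count = PAGE_BYTES / 8;   /* number of 8-byte stores */
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(count)   /* rdi, rcx are updated */
                     : "a"(0UL)                 /* rax holds the fill value */
                     : "memory");
#else
    memset(page, 0, PAGE_BYTES);            /* portable fallback */
#endif
}

/* Copy one page with "rep movsq" (hypothetical helper name). */
void copy_page_rep_movsq(void *to, const void *from)
{
    assert(to != NULL && from != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned long count = PAGE_BYTES / 8;   /* number of 8-byte moves */
    __asm__ volatile("rep movsq"
                     : "+D"(to), "+S"(from), "+c"(count)
                     :
                     : "memory");
#else
    memcpy(to, from, PAGE_BYTES);           /* portable fallback */
#endif
}
```

Both helpers expect buffers of at least PAGE_BYTES bytes; timing them against
the other variants would still need a harness like the test program below.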
No, it doesn't like this sample at all; I get a segmentation fault on
that run.
RUN 1:
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13516 cycles per page
clear_page function 'kernel clear' took 6539 cycles per page
clear_page function '2.4 non MMX' took 6354 cycles per page
clear_page function '2.4 MMX fallback' took 6205 cycles per page
clear_page function '2.4 MMX version' took 6830 cycles per page
clear_page function 'faster_clear_page' took 6240 cycles per page
clear_page function 'even_faster_clear' took 5746 cycles per page
clear_page function 'xmm_clear ' took 4580 cycles per page
Segmentation fault
xmm64.o[9485] general protection rip:400814 rsp:7fffffc74118 error:0
xmm64.o[9486] general protection rip:400814 rsp:7fffff8b1498 error:0
xmm64.o[9487] general protection rip:400814 rsp:7fffffc31848 error:0
RUN 2:
Tell gcc to use processor-specific flags:
gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13419 cycles per page
clear_page function 'kernel clear' took 6403 cycles per page
clear_page function '2.4 non MMX' took 6290 cycles per page
clear_page function '2.4 MMX fallback' took 6156 cycles per page
clear_page function '2.4 MMX version' took 6605 cycles per page
clear_page function 'faster_clear_page' took 5607 cycles per page
clear_page function 'even_faster_clear' took 5173 cycles per page
clear_page function 'xmm_clear ' took 4307 cycles per page
clear_page function 'xmma_clear ' took 6230 cycles per page
clear_page function 'xmm2_clear ' took 4908 cycles per page
clear_page function 'xmma2_clear ' took 6256 cycles per page
clear_page function 'kernel clear' took 6506 cycles per page
copy_page() tests
copy_page function 'warm up run' took 10352 cycles per page
copy_page function '2.4 non MMX' took 9440 cycles per page
copy_page function '2.4 MMX fallback' took 9300 cycles per page
copy_page function '2.4 MMX version' took 10238 cycles per page
copy_page function 'faster_copy' took 9497 cycles per page
copy_page function 'even_faster' took 9229 cycles per page
copy_page function 'xmm_copy_page_no' took 7810 cycles per page
copy_page function 'xmm_copy_page' took 7397 cycles per page
copy_page function 'xmma_copy_page' took 9430 cycles per page
copy_page function 'v26_copy_page' took 9234 cycles per page
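
To put the RUN 2 numbers in relation to the "almost twice as fast" and
"a third faster" figures quoted above, the ratios can be worked out with a
small helper like the one below. It is only a sketch: the function and
constant names are made up here, the first 'kernel clear' line is taken as
the clear_page baseline, and 'v26_copy_page' is assumed (from its name) to be
the 2.6 kernel's copy_page. With these figures, xmm_clear comes out at
roughly 1.49x the kernel clear and xmm_copy_page at roughly 1.25x
v26_copy_page on this P4.

```c
#include <assert.h>

/* Cycles per page as reported in RUN 2 of this mail. */
enum {
    KERNEL_CLEAR_CYCLES  = 6403,  /* 'kernel clear' (first occurrence) */
    XMM_CLEAR_CYCLES     = 4307,  /* 'xmm_clear' */
    V26_COPY_CYCLES      = 9234,  /* 'v26_copy_page', assumed baseline */
    XMM_COPY_CYCLES      = 7397   /* 'xmm_copy_page' */
};

/* How many times faster the candidate is than the baseline. */
double speedup(unsigned long baseline_cycles, unsigned long candidate_cycles)
{
    assert(candidate_cycles > 0);
    return (double)baseline_cycles / (double)candidate_cycles;
}

/* Share of the baseline's cycles that the candidate saves, in percent. */
double percent_saved(unsigned long baseline_cycles, unsigned long candidate_cycles)
{
    assert(baseline_cycles > 0);
    return 100.0 * ((double)baseline_cycles - (double)candidate_cycles)
                 / (double)baseline_cycles;
}
```

For example, speedup(KERNEL_CLEAR_CYCLES, XMM_CLEAR_CYCLES) gives the
clear_page ratio and speedup(V26_COPY_CYCLES, XMM_COPY_CYCLES) the copy_page
ratio.
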
CPU flags on an Intel Pentium 4 640, x86_64 Gentoo GNU/Linux:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr
Greets
Michael