From: Michael Thonke <iogl64nx@gmail.com>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] x86-64: Use SSE for copy_page and clear_page
Date: Mon, 30 May 2005 22:42:29 +0200
Message-ID: <429B7AB5.5080400@gmail.com>
In-Reply-To: <20050530201419.GB10212@kvack.org>
Benjamin LaHaise wrote:
>On Mon, May 30, 2005 at 10:05:28PM +0200, Michael Thonke wrote:
>
>
>>No, it doesn't like this sample at all; I get a segmentation fault on
>>that run.
>>
>>
>
>Grab a new copy -- one of the routines had an unaligned store instead of
>aligned for the register save.
>
> -ben
>
>
>
Hi Benjamin,
Here are the results with the new copy.
*RUN 1: cc -o xmm64.o xmm64.c*
ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13632 cycles per page
clear_page function 'kernel clear' took 6599 cycles per page
clear_page function '2.4 non MMX' took 6482 cycles per page
clear_page function '2.4 MMX fallback' took 6367 cycles per page
clear_page function '2.4 MMX version' took 6644 cycles per page
clear_page function 'faster_clear_page' took 6088 cycles per page
clear_page function 'even_faster_clear' took 5692 cycles per page
clear_page function 'xmm_clear' took 4270 cycles per page
clear_page function 'xmma_clear' took 6351 cycles per page
clear_page function 'xmm2_clear' took 4710 cycles per page
clear_page function 'xmma2_clear' took 6198 cycles per page
clear_page function 'xmm3_clear' took 6583 cycles per page
clear_page function 'nt clear ' took 4746 cycles per page
clear_page function 'kernel clear' took 6158 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9210 cycles per page
copy_page function '2.4 non MMX' took 6740 cycles per page
copy_page function '2.4 MMX fallback' took 6697 cycles per page
copy_page function '2.4 MMX version' took 9178 cycles per page
copy_page function 'faster_copy' took 11360 cycles per page
copy_page function 'even_faster' took 10133 cycles per page
copy_page function 'xmm_copy_page_no' took 8885 cycles per page
copy_page function 'xmm_copy_page' took 8725 cycles per page
copy_page function 'xmma_copy_page' took 9964 cycles per page
copy_page function 'xmm3_copy_page' took 7176 cycles per page
copy_page function 'v26_copy_page' took 6879 cycles per page
copy_page function 'nt_copy_page' took 10858 cycles per page
*RUN 2: gcc -o xmm64.o xmm64.c*
ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13981 cycles per page
clear_page function 'kernel clear' took 6708 cycles per page
clear_page function '2.4 non MMX' took 6505 cycles per page
clear_page function '2.4 MMX fallback' took 6235 cycles per page
clear_page function '2.4 MMX version' took 7251 cycles per page
clear_page function 'faster_clear_page' took 6390 cycles per page
clear_page function 'even_faster_clear' took 5932 cycles per page
clear_page function 'xmm_clear' took 4876 cycles per page
clear_page function 'xmma_clear' took 6379 cycles per page
clear_page function 'xmm2_clear' took 5264 cycles per page
clear_page function 'xmma2_clear' took 6373 cycles per page
clear_page function 'xmm3_clear' took 6651 cycles per page
clear_page function 'nt clear ' took 5186 cycles per page
clear_page function 'kernel clear' took 6326 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9537 cycles per page
copy_page function '2.4 non MMX' took 6776 cycles per page
copy_page function '2.4 MMX fallback' took 7407 cycles per page
copy_page function '2.4 MMX version' took 8812 cycles per page
copy_page function 'faster_copy' took 10992 cycles per page
copy_page function 'even_faster' took 10232 cycles per page
copy_page function 'xmm_copy_page_no' took 8918 cycles per page
copy_page function 'xmm_copy_page' took 9579 cycles per page
copy_page function 'xmma_copy_page' took 9854 cycles per page
copy_page function 'xmm3_copy_page' took 7602 cycles per page
copy_page function 'v26_copy_page' took 6811 cycles per page
copy_page function 'nt_copy_page' took 10958 cycles per page
*RUN 3: gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c*
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13626 cycles per page
clear_page function 'kernel clear' took 6780 cycles per page
clear_page function '2.4 non MMX' took 6755 cycles per page
clear_page function '2.4 MMX fallback' took 6283 cycles per page
clear_page function '2.4 MMX version' took 6764 cycles per page
clear_page function 'faster_clear_page' took 5764 cycles per page
clear_page function 'even_faster_clear' took 5240 cycles per page
clear_page function 'xmm_clear' took 4532 cycles per page
clear_page function 'xmma_clear' took 6352 cycles per page
clear_page function 'xmm2_clear' took 4983 cycles per page
clear_page function 'xmma2_clear' took 6211 cycles per page
clear_page function 'xmm3_clear' took 6748 cycles per page
clear_page function 'nt clear ' took 5166 cycles per page
clear_page function 'kernel clear' took 6201 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9651 cycles per page
copy_page function '2.4 non MMX' took 6724 cycles per page
copy_page function '2.4 MMX fallback' took 6905 cycles per page
copy_page function '2.4 MMX version' took 9722 cycles per page
copy_page function 'faster_copy' took 9738 cycles per page
copy_page function 'even_faster' took 9609 cycles per page
copy_page function 'xmm_copy_page_no' took 8846 cycles per page
copy_page function 'xmm_copy_page' took 8591 cycles per page
copy_page function 'xmma_copy_page' took 8250 cycles per page
copy_page function 'xmm3_copy_page' took 7879 cycles per page
copy_page function 'v26_copy_page' took 7512 cycles per page
copy_page function 'nt_copy_page' took 10424 cycles per page
*RUN 4: gcc -pipe -march=nocona -O2 -fPIC -o xmm64.o xmm64.c*
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13713 cycles per page
clear_page function 'kernel clear' took 6655 cycles per page
clear_page function '2.4 non MMX' took 6448 cycles per page
clear_page function '2.4 MMX fallback' took 6270 cycles per page
clear_page function '2.4 MMX version' took 7001 cycles per page
clear_page function 'faster_clear_page' took 5671 cycles per page
clear_page function 'even_faster_clear' took 5366 cycles per page
clear_page function 'xmm_clear' took 4737 cycles per page
clear_page function 'xmma_clear' took 6464 cycles per page
clear_page function 'xmm2_clear' took 5214 cycles per page
clear_page function 'xmma2_clear' took 6371 cycles per page
clear_page function 'xmm3_clear' took 6660 cycles per page
clear_page function 'nt clear ' took 5066 cycles per page
clear_page function 'kernel clear' took 6314 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9464 cycles per page
copy_page function '2.4 non MMX' took 7179 cycles per page
copy_page function '2.4 MMX fallback' took 6928 cycles per page
copy_page function '2.4 MMX version' took 9091 cycles per page
copy_page function 'faster_copy' took 9996 cycles per page
copy_page function 'even_faster' took 9824 cycles per page
copy_page function 'xmm_copy_page_no' took 8724 cycles per page
copy_page function 'xmm_copy_page' took 8920 cycles per page
copy_page function 'xmma_copy_page' took 8859 cycles per page
copy_page function 'xmm3_copy_page' took 7794 cycles per page
copy_page function 'v26_copy_page' took 7808 cycles per page
copy_page function 'nt_copy_page' took 9264 cycles per page
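For anyone reading the numbers above: a "cycles per page" figure of this kind is usually obtained by reading the CPU timestamp counter before and after a loop of page operations and dividing the difference by the number of pages. The following is only a minimal sketch under that assumption, not the code from xmm64.c. The names read_ticks, ticks_per_page, memset_clear and BENCH_PAGE_SIZE are hypothetical, and the 4096-byte page size is assumed.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() in the fallback path */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>           /* __rdtsc() */
#else
#include <time.h>
#endif

#define BENCH_PAGE_SIZE 4096     /* assumed page size, not taken from xmm64.c */

/* Read an increasing tick counter: the TSC on x86, nanoseconds elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Signature of a clear_page candidate: zero one page at the given address. */
typedef void (*clear_fn)(void *page);

/* Hypothetical stand-in for a routine under test; simply uses memset(). */
static void memset_clear(void *page)
{
    memset(page, 0, BENCH_PAGE_SIZE);
}

/* Run fn over npages consecutive pages and return the average ticks per page. */
static uint64_t ticks_per_page(clear_fn fn, unsigned char *buf, size_t npages)
{
    assert(npages > 0);
    assert(((uintptr_t)buf % BENCH_PAGE_SIZE) == 0);  /* page-aligned buffer */

    uint64_t start = read_ticks();
    for (size_t i = 0; i < npages; i++)
        fn(buf + i * BENCH_PAGE_SIZE);
    uint64_t end = read_ticks();

    return (end - start) / npages;
}
```

Calling ticks_per_page(memset_clear, buf, n) on a page-aligned buffer of n pages returns the average for that routine. A first call is normally discarded as a warm-up, which would match the much higher 'warm up run' lines in each run above.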
Do you need more results or tests, Benjamin?
Greetings and best regards,
Michael