From: Michael Thonke <iogl64nx@gmail.com>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] x86-64: Use SSE for copy_page and clear_page
Date: Mon, 30 May 2005 22:42:29 +0200
Message-ID: <429B7AB5.5080400@gmail.com>
In-Reply-To: <20050530201419.GB10212@kvack.org>
Benjamin LaHaise wrote:
>On Mon, May 30, 2005 at 10:05:28PM +0200, Michael Thonke wrote:
>
>
>>No, it doesn't like this sample at all; I get a segmentation fault on
>>that run.
>>
>>
>
>Grab a new copy -- one of the routines had an unaligned store instead of
>aligned for the register save.
>
> -ben
>
>
>
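Background on the fix Benjamin describes: SSE has both aligned and unaligned store forms, and only the aligned one (movdqa/movaps) faults when the target address is not a multiple of 16. That makes a register-save path a plausible place for a segmentation fault. The sketch below is illustrative only and does not reproduce the test program's code; the struct and function names are hypothetical, and it assumes GCC or Clang on x86-64, where the SSE2 intrinsics in <emmintrin.h> are always available.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_store_si128 (movdqa), _mm_storeu_si128 (movdqu) */

/* Hypothetical save area for four XMM registers. __m128i is itself
 * 16-byte aligned; the attribute just makes the requirement explicit. */
struct xmm_save {
    __m128i reg[4];
} __attribute__((aligned(16)));

/* Aligned stores: the address must be a multiple of 16, otherwise the
 * CPU raises #GP, which a Linux user-space program sees as SIGSEGV. */
void save_aligned(struct xmm_save *area, const __m128i v[4])
{
    assert(((uintptr_t)area & 15) == 0);
    for (int i = 0; i < 4; i++)
        _mm_store_si128(&area->reg[i], v[i]);
}

/* Unaligned stores: valid at any byte address, possibly slower on
 * older CPUs. */
void save_unaligned(unsigned char *buf, const __m128i v[4])
{
    for (int i = 0; i < 4; i++)
        _mm_storeu_si128((__m128i *)(buf + 16 * i), v[i]);
}
```

Calling save_aligned() with a pointer such as (struct xmm_save *)(raw + 1) would fault, while save_unaligned(raw + 1, v) is fine.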
Hi Benjamin,
Here are the results with the new copy.
*RUN 1: cc -o xmm64.o xmm64.c*
ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13632 cycles per page
clear_page function 'kernel clear' took 6599 cycles per page
clear_page function '2.4 non MMX' took 6482 cycles per page
clear_page function '2.4 MMX fallback' took 6367 cycles per page
clear_page function '2.4 MMX version' took 6644 cycles per page
clear_page function 'faster_clear_page' took 6088 cycles per page
clear_page function 'even_faster_clear' took 5692 cycles per page
clear_page function 'xmm_clear' took 4270 cycles per page
clear_page function 'xmma_clear' took 6351 cycles per page
clear_page function 'xmm2_clear' took 4710 cycles per page
clear_page function 'xmma2_clear' took 6198 cycles per page
clear_page function 'xmm3_clear' took 6583 cycles per page
clear_page function 'nt clear ' took 4746 cycles per page
clear_page function 'kernel clear' took 6158 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9210 cycles per page
copy_page function '2.4 non MMX' took 6740 cycles per page
copy_page function '2.4 MMX fallback' took 6697 cycles per page
copy_page function '2.4 MMX version' took 9178 cycles per page
copy_page function 'faster_copy' took 11360 cycles per page
copy_page function 'even_faster' took 10133 cycles per page
copy_page function 'xmm_copy_page_no' took 8885 cycles per page
copy_page function 'xmm_copy_page' took 8725 cycles per page
copy_page function 'xmma_copy_page' took 9964 cycles per page
copy_page function 'xmm3_copy_page' took 7176 cycles per page
copy_page function 'v26_copy_page' took 6879 cycles per page
copy_page function 'nt_copy_page' took 10858 cycles per page
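The clear_page figures come from the test program's own routines, which are not included in this message. As a rough illustration of the two families being compared, here is a user-space sketch of a page clear using ordinary cached 128-bit stores and one using non-temporal stores (movntdq via _mm_stream_si128). That 'xmm_clear' and 'nt clear' correspond to these two approaches is an assumption based on their names; the function names below are hypothetical, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in _mm_sfence */

#define PAGE_SIZE 4096 /* assumed page size */

/* Clear a 16-byte aligned page with cached stores (movdqa),
 * four 16-byte stores (64 bytes) per iteration. */
void sse_clear_page(void *page)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = page;

    assert(((uintptr_t)page & 15) == 0);
    for (size_t i = 0; i < PAGE_SIZE / 16; i += 4) {
        _mm_store_si128(p + i, zero);
        _mm_store_si128(p + i + 1, zero);
        _mm_store_si128(p + i + 2, zero);
        _mm_store_si128(p + i + 3, zero);
    }
}

/* Same loop with non-temporal stores (movntdq), which write around the
 * cache; the sfence orders them before any later stores. */
void nt_clear_page(void *page)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = page;

    assert(((uintptr_t)page & 15) == 0);
    for (size_t i = 0; i < PAGE_SIZE / 16; i += 4) {
        _mm_stream_si128(p + i, zero);
        _mm_stream_si128(p + i + 1, zero);
        _mm_stream_si128(p + i + 2, zero);
        _mm_stream_si128(p + i + 3, zero);
    }
    _mm_sfence();
}
```

The trade-off is that non-temporal stores avoid evicting useful data from the cache, but a page cleared this way is not in cache when it is first used.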
*RUN 2: gcc -o xmm64.o xmm64.c*
ioGL64NX_EMT64 ~ # ./xmm64.o
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13981 cycles per page
clear_page function 'kernel clear' took 6708 cycles per page
clear_page function '2.4 non MMX' took 6505 cycles per page
clear_page function '2.4 MMX fallback' took 6235 cycles per page
clear_page function '2.4 MMX version' took 7251 cycles per page
clear_page function 'faster_clear_page' took 6390 cycles per page
clear_page function 'even_faster_clear' took 5932 cycles per page
clear_page function 'xmm_clear' took 4876 cycles per page
clear_page function 'xmma_clear' took 6379 cycles per page
clear_page function 'xmm2_clear' took 5264 cycles per page
clear_page function 'xmma2_clear' took 6373 cycles per page
clear_page function 'xmm3_clear' took 6651 cycles per page
clear_page function 'nt clear ' took 5186 cycles per page
clear_page function 'kernel clear' took 6326 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9537 cycles per page
copy_page function '2.4 non MMX' took 6776 cycles per page
copy_page function '2.4 MMX fallback' took 7407 cycles per page
copy_page function '2.4 MMX version' took 8812 cycles per page
copy_page function 'faster_copy' took 10992 cycles per page
copy_page function 'even_faster' took 10232 cycles per page
copy_page function 'xmm_copy_page_no' took 8918 cycles per page
copy_page function 'xmm_copy_page' took 9579 cycles per page
copy_page function 'xmma_copy_page' took 9854 cycles per page
copy_page function 'xmm3_copy_page' took 7602 cycles per page
copy_page function 'v26_copy_page' took 6811 cycles per page
copy_page function 'nt_copy_page' took 10958 cycles per page
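For copy_page the same split applies: load 16 bytes at a time from the source and write the destination with either cached or non-temporal stores, optionally prefetching ahead. The sketch below is a hypothetical illustration of that pattern, not the code behind 'xmm_copy_page' or 'nt_copy_page'. The prefetch distance and the _MM_HINT_NTA hint are arbitrary choices, and a 4096-byte page with 16-byte aligned buffers is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 loads/stores; _mm_prefetch and _mm_sfence via xmmintrin.h */

#define PAGE_SIZE 4096 /* assumed page size */

/* Copy one page between 16-byte aligned buffers, 64 bytes per iteration.
 * If nt is nonzero, the destination is written with non-temporal stores. */
void sse_copy_page(void *to, const void *from, int nt)
{
    __m128i *d = to;
    const __m128i *s = from;
    const size_t n = PAGE_SIZE / 16; /* number of 16-byte chunks */

    assert((((uintptr_t)to | (uintptr_t)from) & 15) == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Prefetch 256 bytes ahead; the distance is a guess, not tuned. */
        if (i + 16 < n)
            _mm_prefetch((const char *)(s + i + 16), _MM_HINT_NTA);

        __m128i v0 = _mm_load_si128(s + i);
        __m128i v1 = _mm_load_si128(s + i + 1);
        __m128i v2 = _mm_load_si128(s + i + 2);
        __m128i v3 = _mm_load_si128(s + i + 3);

        if (nt) {
            _mm_stream_si128(d + i, v0);
            _mm_stream_si128(d + i + 1, v1);
            _mm_stream_si128(d + i + 2, v2);
            _mm_stream_si128(d + i + 3, v3);
        } else {
            _mm_store_si128(d + i, v0);
            _mm_store_si128(d + i + 1, v1);
            _mm_store_si128(d + i + 2, v2);
            _mm_store_si128(d + i + 3, v3);
        }
    }
    if (nt)
        _mm_sfence(); /* order the streaming stores before later stores */
}
```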
*RUN 3: gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c*
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13626 cycles per page
clear_page function 'kernel clear' took 6780 cycles per page
clear_page function '2.4 non MMX' took 6755 cycles per page
clear_page function '2.4 MMX fallback' took 6283 cycles per page
clear_page function '2.4 MMX version' took 6764 cycles per page
clear_page function 'faster_clear_page' took 5764 cycles per page
clear_page function 'even_faster_clear' took 5240 cycles per page
clear_page function 'xmm_clear' took 4532 cycles per page
clear_page function 'xmma_clear' took 6352 cycles per page
clear_page function 'xmm2_clear' took 4983 cycles per page
clear_page function 'xmma2_clear' took 6211 cycles per page
clear_page function 'xmm3_clear' took 6748 cycles per page
clear_page function 'nt clear ' took 5166 cycles per page
clear_page function 'kernel clear' took 6201 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9651 cycles per page
copy_page function '2.4 non MMX' took 6724 cycles per page
copy_page function '2.4 MMX fallback' took 6905 cycles per page
copy_page function '2.4 MMX version' took 9722 cycles per page
copy_page function 'faster_copy' took 9738 cycles per page
copy_page function 'even_faster' took 9609 cycles per page
copy_page function 'xmm_copy_page_no' took 8846 cycles per page
copy_page function 'xmm_copy_page' took 8591 cycles per page
copy_page function 'xmma_copy_page' took 8250 cycles per page
copy_page function 'xmm3_copy_page' took 7879 cycles per page
copy_page function 'v26_copy_page' took 7512 cycles per page
copy_page function 'nt_copy_page' took 10424 cycles per page
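The message does not show how the program arrives at "cycles per page". One common approach is to read the time-stamp counter around a loop over several pages and divide by the page count; the sketch below assumes that approach and uses GCC/Clang's __rdtsc() from <x86intrin.h>. It is not the test program's code, the helper names are hypothetical, and since rdtsc is not a serializing instruction the result is only approximate. Calling the timer once before measuring mirrors the 'warm up run' rows, which are presumably inflated by first-touch page faults and cold caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() on GCC and Clang */

#define PAGE_SIZE 4096 /* assumed page size */

typedef void (*page_fn)(void *page);

/* Reference routine to time: clear a page with plain 64-bit stores. */
void plain_clear_page(void *page)
{
    uint64_t *p = page;
    for (size_t i = 0; i < PAGE_SIZE / sizeof *p; i++)
        p[i] = 0;
}

/* Average TSC ticks per call of fn over npages consecutive pages of buf.
 * Returns 0 when npages is 0. */
uint64_t cycles_per_page(page_fn fn, unsigned char *buf, size_t npages)
{
    assert(fn != NULL);
    if (npages == 0)
        return 0;

    uint64_t start = __rdtsc();
    for (size_t i = 0; i < npages; i++)
        fn(buf + i * PAGE_SIZE);
    return (__rdtsc() - start) / npages;
}
```

For example, call cycles_per_page(plain_clear_page, buf, 16) once to warm up, then again to take the measurement.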
*RUN 4: gcc -pipe -march=nocona -O2 -fPIC -o xmm64.o xmm64.c*
SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
buffer = 0x2aaaaade7000
clear_page() tests
clear_page function 'warm up run' took 13713 cycles per page
clear_page function 'kernel clear' took 6655 cycles per page
clear_page function '2.4 non MMX' took 6448 cycles per page
clear_page function '2.4 MMX fallback' took 6270 cycles per page
clear_page function '2.4 MMX version' took 7001 cycles per page
clear_page function 'faster_clear_page' took 5671 cycles per page
clear_page function 'even_faster_clear' took 5366 cycles per page
clear_page function 'xmm_clear' took 4737 cycles per page
clear_page function 'xmma_clear' took 6464 cycles per page
clear_page function 'xmm2_clear' took 5214 cycles per page
clear_page function 'xmma2_clear' took 6371 cycles per page
clear_page function 'xmm3_clear' took 6660 cycles per page
clear_page function 'nt clear ' took 5066 cycles per page
clear_page function 'kernel clear' took 6314 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9464 cycles per page
copy_page function '2.4 non MMX' took 7179 cycles per page
copy_page function '2.4 MMX fallback' took 6928 cycles per page
copy_page function '2.4 MMX version' took 9091 cycles per page
copy_page function 'faster_copy' took 9996 cycles per page
copy_page function 'even_faster' took 9824 cycles per page
copy_page function 'xmm_copy_page_no' took 8724 cycles per page
copy_page function 'xmm_copy_page' took 8920 cycles per page
copy_page function 'xmma_copy_page' took 8859 cycles per page
copy_page function 'xmm3_copy_page' took 7794 cycles per page
copy_page function 'v26_copy_page' took 7808 cycles per page
copy_page function 'nt_copy_page' took 9264 cycles per page
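The four runs differ by several hundred cycles for the same routine, so averaging them gives a steadier comparison. This small helper only does that arithmetic on a few rows copied from the output above; the array and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n cycles-per-page samples. */
double mean_cycles(const int *samples, size_t n)
{
    assert(n > 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (double)sum / (double)n;
}

/* Figures copied from RUN 1..4 above, in run order. */
const int kernel_clear[4] = { 6599, 6708, 6780, 6655 };    /* first 'kernel clear' row */
const int xmm_clear[4]    = { 4270, 4876, 4532, 4737 };    /* 'xmm_clear' */
const int non_mmx_copy[4] = { 6740, 6776, 6724, 7179 };    /* copy '2.4 non MMX' */
const int nt_copy[4]      = { 10858, 10958, 10424, 9264 }; /* 'nt_copy_page' */
```

With these figures, xmm_clear averages 4603.75 cycles against 6685.5 for the first 'kernel clear' row (about 31% fewer), while nt_copy_page averages 10376 against 6854.75 for '2.4 non MMX' (about 51% more).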
Do you need more results or tests, Benjamin?
Greetings and best regards
Michael
Thread overview: 21+ messages
2005-05-30 18:16 [RFC] x86-64: Use SSE for copy_page and clear_page Benjamin LaHaise
2005-05-30 18:45 ` Jeff Garzik
2005-05-30 19:06 ` dean gaudet
2005-05-30 19:11 ` dean gaudet
2005-05-30 19:32 ` Andi Kleen
2005-05-31 8:37 ` Denis Vlasenko
2005-05-31 9:15 ` Denis Vlasenko
2005-05-31 9:23 ` Andi Kleen
2005-05-31 13:59 ` Benjamin LaHaise
2005-06-01 6:22 ` Denis Vlasenko
2005-06-01 6:47 ` Denis Vlasenko
2005-06-01 7:22 ` michael
2005-06-01 7:48 ` Andi Kleen
2005-06-01 7:48 ` Denis Vlasenko
2005-06-01 21:46 ` dean gaudet
2005-06-01 8:01 ` Nick Piggin
2005-05-30 19:38 ` Andi Kleen
2005-05-30 20:05 ` Michael Thonke
2005-05-30 20:14 ` Benjamin LaHaise
2005-05-30 20:42 ` Michael Thonke [this message]
2005-05-31 7:11 ` Andi Kleen