From: Michael Thonke <iogl64nx@gmail.com>
To: Benjamin LaHaise <bcrl@kvack.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC] x86-64: Use SSE for copy_page and clear_page
Date: Mon, 30 May 2005 22:42:29 +0200
Message-ID: <429B7AB5.5080400@gmail.com>
In-Reply-To: <20050530201419.GB10212@kvack.org>

Benjamin LaHaise wrote:

>On Mon, May 30, 2005 at 10:05:28PM +0200, Michael Thonke wrote:
>>No, it doesn't like this sample at all; I get a segmentation fault on
>>that run.
>
>Grab a new copy -- one of the routines had an unaligned store instead of
>aligned for the register save.
>
>		-ben

Hi Benjamin,

Here are the results with the new copy.

    *RUN 1: cc -o xmm64.o xmm64.c*

    ioGL64NX_EMT64 ~ # ./xmm64.o
    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13632 cycles per page
    clear_page function 'kernel clear'       took 6599 cycles per page
    clear_page function '2.4 non MMX'        took 6482 cycles per page
    clear_page function '2.4 MMX fallback'   took 6367 cycles per page
    clear_page function '2.4 MMX version'    took 6644 cycles per page
    clear_page function 'faster_clear_page'  took 6088 cycles per page
    clear_page function 'even_faster_clear'  took 5692 cycles per page
    clear_page function 'xmm_clear'          took 4270 cycles per page
    clear_page function 'xmma_clear'         took 6351 cycles per page
    clear_page function 'xmm2_clear'         took 4710 cycles per page
    clear_page function 'xmma2_clear'        took 6198 cycles per page
    clear_page function 'xmm3_clear'         took 6583 cycles per page
    clear_page function 'nt clear  '         took 4746 cycles per page
    clear_page function 'kernel clear'       took 6158 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9210 cycles per page
    copy_page function '2.4 non MMX'         took 6740 cycles per page
    copy_page function '2.4 MMX fallback'    took 6697 cycles per page
    copy_page function '2.4 MMX version'     took 9178 cycles per page
    copy_page function 'faster_copy'         took 11360 cycles per page
    copy_page function 'even_faster'         took 10133 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8885 cycles per page
    copy_page function 'xmm_copy_page'       took 8725 cycles per page
    copy_page function 'xmma_copy_page'      took 9964 cycles per page
    copy_page function 'xmm3_copy_page'      took 7176 cycles per page
    copy_page function 'v26_copy_page'       took 6879 cycles per page
    copy_page function 'nt_copy_page'        took 10858 cycles per page


    *RUN 2: gcc -o xmm64.o xmm64.c*

    ioGL64NX_EMT64 ~ # ./xmm64.o
    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13981 cycles per page
    clear_page function 'kernel clear'       took 6708 cycles per page
    clear_page function '2.4 non MMX'        took 6505 cycles per page
    clear_page function '2.4 MMX fallback'   took 6235 cycles per page
    clear_page function '2.4 MMX version'    took 7251 cycles per page
    clear_page function 'faster_clear_page'  took 6390 cycles per page
    clear_page function 'even_faster_clear'  took 5932 cycles per page
    clear_page function 'xmm_clear'          took 4876 cycles per page
    clear_page function 'xmma_clear'         took 6379 cycles per page
    clear_page function 'xmm2_clear'         took 5264 cycles per page
    clear_page function 'xmma2_clear'        took 6373 cycles per page
    clear_page function 'xmm3_clear'         took 6651 cycles per page
    clear_page function 'nt clear  '         took 5186 cycles per page
    clear_page function 'kernel clear'       took 6326 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9537 cycles per page
    copy_page function '2.4 non MMX'         took 6776 cycles per page
    copy_page function '2.4 MMX fallback'    took 7407 cycles per page
    copy_page function '2.4 MMX version'     took 8812 cycles per page
    copy_page function 'faster_copy'         took 10992 cycles per page
    copy_page function 'even_faster'         took 10232 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8918 cycles per page
    copy_page function 'xmm_copy_page'       took 9579 cycles per page
    copy_page function 'xmma_copy_page'      took 9854 cycles per page
    copy_page function 'xmm3_copy_page'      took 7602 cycles per page
    copy_page function 'v26_copy_page'       took 6811 cycles per page
    copy_page function 'nt_copy_page'        took 10958 cycles per page

    *RUN 3: gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c*

    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13626 cycles per page
    clear_page function 'kernel clear'       took 6780 cycles per page
    clear_page function '2.4 non MMX'        took 6755 cycles per page
    clear_page function '2.4 MMX fallback'   took 6283 cycles per page
    clear_page function '2.4 MMX version'    took 6764 cycles per page
    clear_page function 'faster_clear_page'  took 5764 cycles per page
    clear_page function 'even_faster_clear'  took 5240 cycles per page
    clear_page function 'xmm_clear'          took 4532 cycles per page
    clear_page function 'xmma_clear'         took 6352 cycles per page
    clear_page function 'xmm2_clear'         took 4983 cycles per page
    clear_page function 'xmma2_clear'        took 6211 cycles per page
    clear_page function 'xmm3_clear'         took 6748 cycles per page
    clear_page function 'nt clear  '         took 5166 cycles per page
    clear_page function 'kernel clear'       took 6201 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9651 cycles per page
    copy_page function '2.4 non MMX'         took 6724 cycles per page
    copy_page function '2.4 MMX fallback'    took 6905 cycles per page
    copy_page function '2.4 MMX version'     took 9722 cycles per page
    copy_page function 'faster_copy'         took 9738 cycles per page
    copy_page function 'even_faster'         took 9609 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8846 cycles per page
    copy_page function 'xmm_copy_page'       took 8591 cycles per page
    copy_page function 'xmma_copy_page'      took 8250 cycles per page
    copy_page function 'xmm3_copy_page'      took 7879 cycles per page
    copy_page function 'v26_copy_page'       took 7512 cycles per page
    copy_page function 'nt_copy_page'        took 10424 cycles per page

    *RUN 4: gcc -pipe -march=nocona -O2 -fPIC -o xmm64.o xmm64.c*

    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13713 cycles per page
    clear_page function 'kernel clear'       took 6655 cycles per page
    clear_page function '2.4 non MMX'        took 6448 cycles per page
    clear_page function '2.4 MMX fallback'   took 6270 cycles per page
    clear_page function '2.4 MMX version'    took 7001 cycles per page
    clear_page function 'faster_clear_page'  took 5671 cycles per page
    clear_page function 'even_faster_clear'  took 5366 cycles per page
    clear_page function 'xmm_clear'          took 4737 cycles per page
    clear_page function 'xmma_clear'         took 6464 cycles per page
    clear_page function 'xmm2_clear'         took 5214 cycles per page
    clear_page function 'xmma2_clear'        took 6371 cycles per page
    clear_page function 'xmm3_clear'         took 6660 cycles per page
    clear_page function 'nt clear  '         took 5066 cycles per page
    clear_page function 'kernel clear'       took 6314 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9464 cycles per page
    copy_page function '2.4 non MMX'         took 7179 cycles per page
    copy_page function '2.4 MMX fallback'    took 6928 cycles per page
    copy_page function '2.4 MMX version'     took 9091 cycles per page
    copy_page function 'faster_copy'         took 9996 cycles per page
    copy_page function 'even_faster'         took 9824 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8724 cycles per page
    copy_page function 'xmm_copy_page'       took 8920 cycles per page
    copy_page function 'xmma_copy_page'      took 8859 cycles per page
    copy_page function 'xmm3_copy_page'      took 7794 cycles per page
    copy_page function 'v26_copy_page'       took 7808 cycles per page
    copy_page function 'nt_copy_page'        took 9264 cycles per page
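
For readers without the test program at hand, here is a minimal sketch of
what a "cycles per page" figure for an aligned SSE clear could look like.
It is not the code from xmm64.c: the names sketch_xmm_clear,
sketch_cycles_per_page, sketch_alloc_page and SKETCH_PAGE_SIZE are
hypothetical, and it assumes GCC or Clang on x86-64 with a 4096-byte page.
Note that the aligned store used here faults if the buffer is not 16-byte
aligned, which is why the buffer is allocated page-aligned.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign */
#include <emmintrin.h>           /* SSE2: _mm_setzero_si128, _mm_store_si128 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <x86intrin.h>           /* __rdtsc (GCC/Clang, x86 only) */

#define SKETCH_PAGE_SIZE 4096    /* assumed page size */

/* Clear one page with aligned 16-byte stores.
 * The page pointer must be 16-byte aligned, otherwise the store faults. */
static void sketch_xmm_clear(void *page)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_store_si128(&p[i], zero);
}

/* Average number of TSC ticks per call of fn over `runs` calls.
 * The empty asm statement is a compiler barrier so that repeated
 * clears are not merged by the optimizer. */
static uint64_t sketch_cycles_per_page(void (*fn)(void *), void *page,
                                       unsigned runs)
{
    uint64_t start = __rdtsc();
    unsigned i;

    for (i = 0; i < runs; i++) {
        fn(page);
        __asm__ volatile("" ::: "memory");
    }
    return (__rdtsc() - start) / runs;
}

/* Allocate one page-aligned buffer filled with a non-zero pattern,
 * so that a later check can tell whether the clear really ran. */
static void *sketch_alloc_page(void)
{
    void *page = NULL;

    if (posix_memalign(&page, SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE) != 0)
        return NULL;
    memset(page, 0xAA, SKETCH_PAGE_SIZE);
    return page;
}
```

A caller would allocate a buffer with sketch_alloc_page(), pass it together
with sketch_xmm_clear to sketch_cycles_per_page(), and print the result.
The first call is usually much slower because of page faults and cold
caches, which matches the 'warm up run' lines above.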

    Do you need more results or tests, Benjamin?

    Greets and best regards
        Michael

