public inbox for linux-kernel@vger.kernel.org
From: Michael Thonke <iogl64nx@gmail.com>
To: Andi Kleen <ak@muc.de>
Cc: Benjamin LaHaise <bcrl@kvack.org>, linux-kernel@vger.kernel.org
Subject: Re: [RFC] x86-64: Use SSE for copy_page and clear_page
Date: Mon, 30 May 2005 22:05:28 +0200
Message-ID: <429B7208.6070804@gmail.com>
In-Reply-To: <20050530193823.GD25794@muc.de>

Andi Kleen wrote:

>>The SSE clear page function is almost twice as fast as the kernel's 
>>current clear_page, while the copy_page implementation is roughly a 
>>third faster.  This is likely due to the fact that SSE instructions 
>>can keep the 256 bit wide L2 cache bus at a higher utilisation than 
>>64 bit movs are able to.  Comments?
>>    
>>
>
>Any use of write combining is wrong here because it forces
>the destination out of cache, which causes performance issues later on. 
>Believe me we went through this years ago.
>
>If you can code up a better function for P4 that does not use
>write combining I would be happy to add. I never tuned the functions
>for P4. 
>
>One simple experiment would be to just test if P4 likes the
>simple rep ; movsq / rep ; stosq loops and enable them.
>  
>
No, it doesn't like this sample at all; I get a segmentation fault on
that run.
RUN 1:

    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13516 cycles per page
    clear_page function 'kernel clear'       took 6539 cycles per page
    clear_page function '2.4 non MMX'        took 6354 cycles per page
    clear_page function '2.4 MMX fallback'   took 6205 cycles per page
    clear_page function '2.4 MMX version'    took 6830 cycles per page
    clear_page function 'faster_clear_page'  took 6240 cycles per page
    clear_page function 'even_faster_clear'  took 5746 cycles per page
    clear_page function 'xmm_clear '         took 4580 cycles per page
    Segmentation fault

    xmm64.o[9485] general protection rip:400814 rsp:7fffffc74118 error:0
    xmm64.o[9486] general protection rip:400814 rsp:7fffff8b1498 error:0
    xmm64.o[9487] general protection rip:400814 rsp:7fffffc31848 error:0

RUN 2:
Tell gcc to use processor-specific flags:
    gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c

    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13419 cycles per page
    clear_page function 'kernel clear'       took 6403 cycles per page
    clear_page function '2.4 non MMX'        took 6290 cycles per page
    clear_page function '2.4 MMX fallback'   took 6156 cycles per page
    clear_page function '2.4 MMX version'    took 6605 cycles per page
    clear_page function 'faster_clear_page'  took 5607 cycles per page
    clear_page function 'even_faster_clear'  took 5173 cycles per page
    clear_page function 'xmm_clear '         took 4307 cycles per page
    clear_page function 'xmma_clear '        took 6230 cycles per page
    clear_page function 'xmm2_clear '        took 4908 cycles per page
    clear_page function 'xmma2_clear '       took 6256 cycles per page
    clear_page function 'kernel clear'       took 6506 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 10352 cycles per page
    copy_page function '2.4 non MMX'         took 9440 cycles per page
    copy_page function '2.4 MMX fallback'    took 9300 cycles per page
    copy_page function '2.4 MMX version'     took 10238 cycles per page
    copy_page function 'faster_copy'         took 9497 cycles per page
    copy_page function 'even_faster'         took 9229 cycles per page
    copy_page function 'xmm_copy_page_no'    took 7810 cycles per page
    copy_page function 'xmm_copy_page'       took 7397 cycles per page
    copy_page function 'xmma_copy_page'      took 9430 cycles per page
    copy_page function 'v26_copy_page'       took 9234 cycles per page
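
For reference, the simple rep ; movsq / rep ; stosq loops mentioned in the
quoted text could be tried in user space roughly like this. This is only a
sketch, not the kernel's code and not part of the test program above: the
names rep_stosq_clear_page, rep_movsq_copy_page and PAGE_BYTES are made up
for illustration, a 4096-byte page is assumed, and the non-x86-64 branch is
just a portability fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumed page size */

/* Hypothetical helper: zero one page with ordinary cached 8-byte stores. */
static void rep_stosq_clear_page(void *page)
{
    assert(page != NULL);
#if defined(__x86_64__)
    void *dst = page;
    unsigned long count = PAGE_BYTES / 8;   /* 512 quadwords */
    /* rep stosq stores RAX to [RDI], RCX times, advancing RDI by 8. */
    __asm__ volatile("rep stosq"
                     : "+D"(dst), "+c"(count)
                     : "a"(0UL)
                     : "memory");
#else
    memset(page, 0, PAGE_BYTES);            /* fallback on other targets */
#endif
}

/* Hypothetical helper: copy one page with rep movsq (pages must not overlap). */
static void rep_movsq_copy_page(void *to, const void *from)
{
    assert(to != NULL && from != NULL);
#if defined(__x86_64__)
    unsigned long count = PAGE_BYTES / 8;
    /* rep movsq copies RCX quadwords from [RSI] to [RDI]. */
    __asm__ volatile("rep movsq"
                     : "+D"(to), "+S"(from), "+c"(count)
                     :
                     : "memory");
#else
    memcpy(to, from, PAGE_BYTES);
#endif
}
```

Neither loop uses non-temporal stores, so the destination is written through
the normal cache path. Whether a P4 likes them would still have to be
measured.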

CPU flags on Intel Pentium 4 640 x86_64 Gentoo GNU/Linux

    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
    pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
    syscall nx lm constant_tsc pni monitor ds_cpl est cid cx16 xtpr

Greets
    Michael
