From: Borislav Petkov <bp@amd64.org>
To: "Valdis.Kletnieks@vt.edu" <Valdis.Kletnieks@vt.edu>
Cc: Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@elte.hu>,
melwyn lobo <linux.melwyn@gmail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: x86 memcpy performance
Date: Tue, 16 Aug 2011 14:16:04 +0200 [thread overview]
Message-ID: <20110816121604.GA29251@aftab> (raw)
In-Reply-To: <6296.1313462075@turing-police.cc.vt.edu>
[-- Attachment #1: Type: text/plain, Size: 2448 bytes --]
On Mon, Aug 15, 2011 at 10:34:35PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
>
> > Benchmarking with 10000 iterations, average results:
> > size XM MM speedup
> > 119 540.58 449.491 0.8314969419
>
> > 12273 2307.86 4042.88 1.751787902
> > 13924 2431.8 4224.48 1.737184756
> > 14335 2469.4 4218.82 1.708440514
> > 15018 2675.67 1904.07 0.711622886
> > 16374 2989.75 5296.26 1.771470902
> > 24564 4262.15 7696.86 1.805863077
> > 27852 4362.53 3347.72 0.7673805572
> > 28672 5122.8 7113.14 1.388524413
> > 30033 4874.62 8740.04 1.792967931
>
> The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
> really good about this till we understand what happened for those two cases.
Yep.
> Also, anytime I see "10000 iterations", I ask myself if the benchmark
> rigging took proper note of hot/cold cache issues. That *may* explain
> the two oddball results we see above - but not knowing more about how
> it was benched, it's hard to say.
Yeah, the more scrutiny this gets the better. So I've cleaned up my
setup and have attached it.
xm_mem.c does the benchmarking and in bench_memcpy() there's the
sse_memcpy call which is the SSE memcpy implementation using inline asm.
It looks like gcc produces pretty crappy code here because if I replace
the sse_memcpy call with xm_memcpy() from xm_memcpy.S - this is the
same function but in pure asm - I get much better numbers, sometimes
even over 2x. It all depends on the alignment of the buffers though.
Also, those numbers don't include the context saving/restoring which the
kernel does for us.
7491 1509.89 2346.94 1.554378381
8170 2166.81 2857.78 1.318890326
12277 2659.03 4179.31 1.571744176
13907 2571.24 4125.7 1.604558427
14319 2638.74 5799.67 2.19789466 <----
14993 2752.42 4413.85 1.603625603
16371 3479.11 5562.65 1.59887055
So please take a look and let me know what you think.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
[-- Attachment #2: sse_memcpy.tar.bz2 --]
[-- Type: application/octet-stream, Size: 3508 bytes --]
next prev parent reply other threads:[~2011-08-16 12:16 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-12 17:59 x86 memcpy performance melwyn lobo
2011-08-12 18:33 ` Andi Kleen
2011-08-12 19:52 ` Ingo Molnar
2011-08-14 9:59 ` Borislav Petkov
2011-08-14 11:13 ` Denys Vlasenko
2011-08-14 12:40 ` Borislav Petkov
2011-08-15 13:27 ` melwyn lobo
2011-08-15 13:44 ` Denys Vlasenko
2011-08-16 2:34 ` Valdis.Kletnieks
2011-08-16 12:16 ` Borislav Petkov [this message]
2011-09-01 15:15 ` Maarten Lankhorst
2011-09-01 16:18 ` Linus Torvalds
2011-09-08 8:35 ` Borislav Petkov
2011-09-08 10:58 ` Maarten Lankhorst
2011-09-09 8:14 ` Borislav Petkov
2011-09-09 10:12 ` Maarten Lankhorst
2011-09-09 11:23 ` Maarten Lankhorst
2011-09-09 13:42 ` Borislav Petkov
2011-09-09 14:39 ` Linus Torvalds
2011-09-09 15:35 ` Borislav Petkov
2011-12-05 12:20 ` melwyn lobo
2011-12-05 12:54 ` melwyn lobo
2011-12-05 14:36 ` Alan Cox
-- strict thread matches above, loose matches on Subject: below --
2011-08-15 14:55 Borislav Petkov
2011-08-15 14:59 ` Andy Lutomirski
2011-08-15 15:29 ` Borislav Petkov
2011-08-15 15:36 ` Andrew Lutomirski
2011-08-15 16:12 ` Borislav Petkov
2011-08-15 17:04 ` Andrew Lutomirski
2011-08-15 18:49 ` Borislav Petkov
2011-08-15 19:11 ` Andrew Lutomirski
2011-08-15 20:05 ` Borislav Petkov
2011-08-15 20:08 ` Andrew Lutomirski
2011-08-15 16:12 ` H. Peter Anvin
2011-08-15 16:58 ` Andrew Lutomirski
2011-08-15 18:26 ` H. Peter Anvin
2011-08-15 18:35 ` Andrew Lutomirski
2011-08-15 18:52 ` H. Peter Anvin
2011-08-16 7:19 ` melwyn lobo
2011-08-16 7:43 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110816121604.GA29251@aftab \
--to=bp@amd64.org \
--cc=Valdis.Kletnieks@vt.edu \
--cc=a.p.zijlstra@chello.nl \
--cc=bp@alien8.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux.melwyn@gmail.com \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.