From: Ingo Molnar <mingo@elte.hu>
To: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: linux-kernel@vger.kernel.org, h.mitake@gmail.com,
	Ma Ling <ling.ma@intel.com>, Zhao Yakui <yakui.zhao@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 2/2] perf bench: add x86-64 specific benchmarks to perf bench mem memcpy
Date: Mon, 1 Nov 2010 10:02:51 +0100
Message-ID: <20101101090251.GA28039@elte.hu>
In-Reply-To: <4CCE51E6.7060908@dcl.info.waseda.ac.jp>


* Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> wrote:

> On 2010-10-31 04:23, Ingo Molnar wrote:
> >
> >* Hitoshi Mitake<mitake@dcl.info.waseda.ac.jp>  wrote:
> >
> >>This patch adds a new file, mem-memcpy-x86-64-asm.S,
> >>for x86-64 specific memcpy() benchmarking.
> >>The newly added benchmarks are:
> >>  x86-64-rep:      memcpy() implemented with the rep instruction
> >>  x86-64-unrolled: unrolled memcpy()
> >>
> >>The original idea of including kernel source files
> >>for benchmarking was suggested by Ingo Molnar.
> >>This is more effective than write-once programs for quantitative
> >>evaluation of small in-kernel leaf functions that are called very frequently,
> >>because perf bench lives in the kernel source tree and is easy to run
> >>on various hardware, especially new CPU models.
> >>
> >>This approach can also be used for other kernel functions, e.g. checksum functions.
> >>
> >>Example of usage on Core i3 M330:
> >>
> >>| % ./perf bench mem memcpy -l 500MB
> >>| # Running mem/memcpy benchmark...
> >>| # Copying 500MB Bytes from 0x7f911f94c010 to 0x7f913ed4d010 ...
> >>|
> >>|      578.732506 MB/Sec
> >>| % ./perf bench mem memcpy -l 500MB -r x86-64-rep
> >>| # Running mem/memcpy benchmark...
> >>| # Copying 500MB Bytes from 0x7fb4b6fe4010 to 0x7fb4d63e5010 ...
> >>|
> >>|      738.184980 MB/Sec
> >>| % ./perf bench mem memcpy -l 500MB -r x86-64-unrolled
> >>| # Running mem/memcpy benchmark...
> >>| # Copying 500MB Bytes from 0x7f6f2e668010 to 0x7f6f4da69010 ...
> >>|
> >>|      767.483269 MB/Sec
> >>
> >>This clearly shows that the unrolled memcpy() is more efficient
> >>than the rep version and glibc's :)
> >
> >Hey, really cool output :-)
> >
> >Might also make sense to measure Ma Ling's patched version?
> 
> Does "Ma Ling's patched version" mean
> 
> http://marc.info/?l=linux-kernel&m=128652296500989&w=2
> 
> memcpy with the patch at this URL applied?
> (It seems that this patch was written by Miao Xie.)
> 
> I'll include the results of the patched version in the next post.

(Indeed it is Miao Xie - sorry!)

> >># checkpatch.pl warns about two externs in bench/mem-memcpy.c
> >># added by this patch. But I think it is not a problem.
> >
> >You should put these:
> >
> >  +#ifdef ARCH_X86_64
> >  +extern void *memcpy_x86_64_unrolled(void *to, const void *from, size_t len);
> >  +extern void *memcpy_x86_64_rep(void *to, const void *from, size_t len);
> >  +#endif
> >
> >into a .h file - a new one if needed.
> >
> >That will make both checkpatch and me happier ;-)
> >
> 
> OK, I'll separate these files.
> 
> BTW, I found a really interesting evaluation result.
> The current results of "perf bench mem memcpy" include
> the overhead of page faults, because the measured memcpy()
> is the first access to the allocated memory area.
> 
> I tested another version of perf bench mem memcpy,
> which does a memcpy() before the measured memcpy() to remove
> the overhead that comes from page faults.
> 
> And this is the result:
> 
> % ./perf bench mem memcpy -l 500MB -r x86-64-unrolled
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f19d488f010 to 0x7f19f3c90010 ...
> 
>        4.608340 GB/Sec
> 
> % ./perf bench mem memcpy -l 500MB
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f696c3cc010 to 0x7f698b7cd010 ...
> 
>        4.856442 GB/Sec
> 
> % ./perf bench mem memcpy -l 500MB -r x86-64-rep
> # Running mem/memcpy benchmark...
> # Copying 500MB Bytes from 0x7f45d6cff010 to 0x7f45f6100010 ...
> 
>        6.024445 GB/Sec
> 
> The ranking of the scores is reversed!
> I cannot explain the cause of this result, and
> this is a really interesting phenomenon.

Interesting indeed, and it would be nice to analyse that! (It should be possible, 
using various PMU metrics in a clever way, to figure out what's happening inside the 
CPU, right?)

> So I'd like to add a new command line option,
> like "--pre-page-faults", to perf bench mem memcpy,
> for doing a memcpy() before the measured memcpy().
> 
> What do you think about this idea?

Agreed. (Maybe name it --prefault, as 'prefaulting' is the term we generally use for 
things like this.)

An even better solution would be to output _both_ results by default, so that people 
can see both characteristics at a glance?

Thanks,

	Ingo

Thread overview: 30+ messages
2010-10-29 16:01 [PATCH 1/2] perf bench: port memcpy_64.S to perf bench Hitoshi Mitake
2010-10-29 16:01 ` [PATCH 2/2] perf bench: add x86-64 specific benchmarks to perf bench mem memcpy Hitoshi Mitake
2010-10-30 19:23   ` Ingo Molnar
2010-11-01  5:36     ` Hitoshi Mitake
2010-11-01  9:02       ` Ingo Molnar [this message]
2010-11-05 17:05         ` Hitoshi Mitake
2010-11-10  9:12           ` Ingo Molnar
2010-11-12 15:01             ` Hitoshi Mitake
2010-11-12 15:02               ` [PATCH] perf bench: print both of prefaulted and no prefaulted results Hitoshi Mitake
2010-11-18  7:58                 ` Ingo Molnar
2010-11-25  7:04                   ` Hitoshi Mitake
2010-11-25  7:04                     ` [PATCH v2 1/2] " Hitoshi Mitake
2010-11-26 10:30                       ` [tip:perf/core] perf bench: Print both of prefaulted and no prefaulted results by default tip-bot for Hitoshi Mitake
     [not found]                         ` <4D03B1AD.7000606@dcl.info.waseda.ac.jp>
2010-12-12 13:46                           ` perf monitoring triggers Was: " Arnaldo Carvalho de Melo
2010-12-13 11:14                             ` Peter Zijlstra
2010-12-13 12:38                               ` Arnaldo Carvalho de Melo
2010-12-13 12:40                                 ` Peter Zijlstra
2010-12-13 13:12                                   ` Arnaldo Carvalho de Melo
2010-12-13 17:37                                     ` Hitoshi Mitake
2010-12-14  5:46                                       ` [RFC PATCH 1/2] perf stat: wait on unix domain socket before calling sys_perf_event_open() Hitoshi Mitake
2010-12-14  5:46                                       ` [RFC PATCH 2/2] perf bench: more fine grain monitoring for prefault memcpy() Hitoshi Mitake
2010-11-25  7:04                     ` [PATCH v2 2/2] perf bench: port arch/x86/lib/memcpy_64.S to perf bench mem memcpy Hitoshi Mitake
2010-11-26 10:31                       ` [tip:perf/core] perf bench: Add feature that measures the performance of the arch/x86/lib/memcpy_64.S memcpy routines via 'perf bench mem' tip-bot for Hitoshi Mitake
2010-11-29 13:26                         ` Hitoshi Mitake
2011-01-11 16:27         ` [PATCH 2/2] perf bench: add x86-64 specific benchmarks to perf bench mem memcpy Hitoshi Mitake
2010-10-29 19:49 ` [PATCH 1/2] perf bench: port memcpy_64.S to perf bench Peter Zijlstra
2010-10-30 19:21   ` Ingo Molnar
     [not found]     ` <4D0CE05C.1070600@dcl.info.waseda.ac.jp>
2010-12-20  6:30       ` Miao Xie
2010-12-20 15:34         ` Hitoshi Mitake
     [not found]   ` <20101029210824.GB13385@ghostprotocols.net>
2010-11-05 17:10     ` Hitoshi Mitake
