From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	David Ahern <dsahern@gmail.com>, Jiri Olsa <jolsa@redhat.com>,
	Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 02/14] perf/bench: Default to all routines in 'perf bench mem'
Date: Mon, 19 Oct 2015 19:47:44 +0200
Message-ID: <20151019174744.GA2031@gmail.com>
In-Reply-To: <CA+55aFzAZwC69kZg3W8Mp6ERQOXXY3=U7jKpWqxDWi6W=6Z12Q@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 19, 2015 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >         triton:~> perf bench mem all
> >         # Running mem/memcpy benchmark...
> >         Routine default (Default memcpy() provided by glibc)
> >                4.957170 GB/Sec (with prefault)
> >         Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
> >                4.379204 GB/Sec (with prefault)
> >         Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
> >                4.264465 GB/Sec (with prefault)
> >         Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
> >                6.554111 GB/Sec (with prefault)
> 
> Is this skylake? And why are the numbers so low? Even on my laptop
> (Haswell), I get ~21GB/s (when setting cpufreq to performance).

No, this was on my desktop, which is a water cooled IvyBridge running at 3.6GHz:

 processor       : 11
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 62
 model name      : Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz
 stepping        : 4
 microcode       : 0x416
 cpu MHz         : 1303.031
 cache size      : 15360 KB

and I didn't really think about the validity of the numbers when I made the 
changelog, as I rarely benchmark on this box, due to it having various desktop 
loads running all the time.

As you noticed, the results are highly variable with default settings:

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb 2>&1 | grep GB
       5.580357 GB/sec
       5.580357 GB/sec
      16.551907 GB/sec
      16.551907 GB/sec
      15.258789 GB/sec
      16.837284 GB/sec
      16.837284 GB/sec
      16.837284 GB/sec
      16.551907 GB/sec
      16.837284 GB/sec
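
The repeated values in this run are what a timer with whole-microsecond
resolution would produce for a single 1MB copy, assuming 'GB' here means 2^30
bytes. A minimal sketch of that arithmetic follows; the helper name is made up
for illustration, and the microsecond counts are inferred from the output
above, not taken from the perf source:

```python
def gb_per_sec(nbytes, elapsed_us):
    """Throughput in GB/sec, assuming 1 GB = 2**30 bytes (an assumption)."""
    return nbytes / 2**30 / (elapsed_us / 1e6)


ONE_MB = 1 << 20

if __name__ == "__main__":
    # Elapsed times (in microseconds) inferred from the figures above.
    for us in (175, 64, 59, 58):
        print(f"{us:4d} us -> {gb_per_sec(ONE_MB, us):.6f} GB/sec")
    # 175 us ->  5.580357 GB/sec
    #  64 us -> 15.258789 GB/sec
    #  59 us -> 16.551907 GB/sec
    #  58 us -> 16.837284 GB/sec
```

If that reading is right, a single microsecond of timer granularity moves the
1-loop result by roughly 0.3 GB/sec, independent of any cpufreq effects.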

They get more reliable with '-l 10000' (10,000 loops instead of the default 1):

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb -l 10000 2>&1 | grep GB
      15.483591 GB/sec
      16.975429 GB/sec
      17.088396 GB/sec
      20.920407 GB/sec
      21.346655 GB/sec
      21.322372 GB/sec
      21.338306 GB/sec
      21.342130 GB/sec
      21.339984 GB/sec
      21.373145 GB/sec

That's purely cached. Also note how it gets faster after a few seconds, due to 
cpufreq as you suspected.

So once I fix the frequency of all cores to the max, I get much more reliable 
results:

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb -l 10000 2>&1 | grep -E 'GB|elaps'
      21.356879 GB/sec
      21.378526 GB/sec
      21.351976 GB/sec
      21.375203 GB/sec
      21.369824 GB/sec
      21.353236 GB/sec
      21.283708 GB/sec
      21.380679 GB/sec
      21.347915 GB/sec
      21.378572 GB/sec
       0.459286278 seconds time elapsed                                          ( +-  0.04% )
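
To put a number on "more reliable", one can compare the relative spread of the
three runs above. This is only a sketch: the figures are copied from the
output in this mail, and the coefficient of variation is my choice of metric,
not something 'perf bench' reports.

```python
import statistics

# GB/sec figures copied from the three runs above.
DEFAULT_1_LOOP = [5.580357, 5.580357, 16.551907, 16.551907, 15.258789,
                  16.837284, 16.837284, 16.837284, 16.551907, 16.837284]
LOOPS_10000 = [15.483591, 16.975429, 17.088396, 20.920407, 21.346655,
               21.322372, 21.338306, 21.342130, 21.339984, 21.373145]
FIXED_FREQ = [21.356879, 21.378526, 21.351976, 21.375203, 21.369824,
              21.353236, 21.283708, 21.380679, 21.347915, 21.378572]


def coeff_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for name, run in (("default, 1 loop", DEFAULT_1_LOOP),
                      ("-l 10000", LOOPS_10000),
                      ("-l 10000, fixed cpufreq", FIXED_FREQ)):
        print(f"{name:26s} cv = {coeff_of_variation(run):.4f}")
```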

I'll add a debug check to 'perf bench' to warn about systems that have variable 
cpufreq running - this is too easy a mistake to make :-/
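
A minimal sketch of what such a check could look like, written as a standalone
script and not as the actual 'perf bench' change. It assumes the usual cpufreq
sysfs layout (cpuN/cpufreq/scaling_governor under /sys/devices/system/cpu) and
treats any governor other than 'performance' as "variable"; the function names
are hypothetical. Note that even the 'performance' governor does not rule out
turbo or thermal frequency changes.

```python
import glob
import os
import sys


def non_performance_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for CPUs not using 'performance'."""
    found = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq",
                           "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            governor = f.read().strip()
        if governor != "performance":
            # path ends in .../cpuN/cpufreq/scaling_governor
            cpu_name = path.split(os.sep)[-3]
            found[cpu_name] = governor
    return found


def warn_if_variable_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Print a warning to stderr; return True if a warning was issued."""
    variable = non_performance_cpus(sysfs_root)
    if variable:
        governors = ", ".join(sorted(set(variable.values())))
        print(f"warning: {len(variable)} CPU(s) use a variable cpufreq "
              f"governor ({governors}); results may be unreliable",
              file=sys.stderr)
    return bool(variable)


if __name__ == "__main__":
    warn_if_variable_cpufreq()
```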

So with the benchmark stabilized, I get the following results:

  triton:~/tip> taskset 1 perf bench mem memcpy -f all -l 10000
  # Running 'mem/memcpy' benchmark:
  # function 'default' (Default memcpy() provided by glibc)
  # Copying 1MB bytes ...
 
        18.356783 GB/sec
  # function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...
 
        16.294889 GB/sec
  # function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...
 
        15.760032 GB/sec
  # function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...

        21.145818 GB/sec

which matches your observations:

> It's interesting that 'movsb' for you is so much better. It's been
> promising before, and it *should* be able to do better than manual
> copying, but it's not been that noticeable on the machines I've
> tested. But I haven't used Skylake or Broadwell yet.
> 
> cpufreq might be making a difference too. Maybe it's just ramping up
> the CPU? Or is that really repeatable?

So modulo the cpufreq multiplier it seems repeatable on this IB system - will try 
it on SkyLake as well.

Before relying on it I also wanted to implement the following 'perf bench' 
improvements:

 - make it more representative of kernel usage by benchmarking a list of
   characteristic lengths, not just the single stupid 1MB buffer. At smaller 
   buffer sizes I'd expect MOVSB to have even more of a fundamental advantage (due 
   to having all the differentiation in hardware) - but we don't know the
   latencies of those cases, some of which are in microcode I suspect.

 - measure aligned/unaligned buffer address and length effects as well

 - measure cache-cold numbers as well. This is pretty hard but not impossible.
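
The first two items amount to sweeping a (length, source offset, destination
offset) matrix. The sketch below shows only that structure, in user space,
using Python's ctypes.memmove() as a stand-in for the routine under test. The
list of lengths, the offsets, and all names are illustrative assumptions, not
proposed 'perf bench' options. Interpreter overhead dominates at small sizes,
so the absolute numbers say little about the copy routine itself, and
cache-cold measurement is not attempted here.

```python
import ctypes
import time

LENGTHS = [8, 64, 256, 4096, 65536, 1 << 20]  # hypothetical size list
OFFSETS = [0, 1]  # 0 = 64-byte aligned, 1 = deliberately misaligned
ALIGN = 64


def aligned_addr(buf):
    """Round the buffer's address up to the next ALIGN boundary."""
    return (ctypes.addressof(buf) + ALIGN - 1) & ~(ALIGN - 1)


def bench_memmove(length, src_off, dst_off, loops=1000):
    """Copy 'length' bytes 'loops' times; return GB/sec (1 GB = 2**30)."""
    # Over-allocate so that alignment plus offset stays inside the buffer.
    src = ctypes.create_string_buffer(length + 2 * ALIGN)
    dst = ctypes.create_string_buffer(length + 2 * ALIGN)
    src_addr = aligned_addr(src) + src_off
    dst_addr = aligned_addr(dst) + dst_off
    start = time.perf_counter()
    for _ in range(loops):
        ctypes.memmove(dst_addr, src_addr, length)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return length * loops / elapsed / 2**30


def run_matrix(loops=1000):
    """Return a list of (length, src_off, dst_off, gb_per_sec) tuples."""
    return [(n, s, d, bench_memmove(n, s, d, loops))
            for n in LENGTHS for s in OFFSETS for d in OFFSETS]


if __name__ == "__main__":
    for n, s, d, rate in run_matrix(loops=100):
        print(f"len={n:8d} src+{s} dst+{d}  {rate:10.3f} GB/sec")
```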

With that we could start validating our fundamental memory op routines in 
user-space.

Thanks,

	Ingo

