Re: [PATCH 02/14] perf/bench: Default to all routines in 'perf bench mem'

From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	David Ahern <dsahern@gmail.com>, Jiri Olsa <jolsa@redhat.com>,
	Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 02/14] perf/bench: Default to all routines in 'perf bench mem'
Date: Mon, 19 Oct 2015 19:47:44 +0200
Message-ID: <20151019174744.GA2031@gmail.com>
In-Reply-To: <CA+55aFzAZwC69kZg3W8Mp6ERQOXXY3=U7jKpWqxDWi6W=6Z12Q@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Oct 19, 2015 at 1:04 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >         triton:~> perf bench mem all
> >         # Running mem/memcpy benchmark...
> >         Routine default (Default memcpy() provided by glibc)
> >                4.957170 GB/Sec (with prefault)
> >         Routine x86-64-unrolled (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
> >                4.379204 GB/Sec (with prefault)
> >         Routine x86-64-movsq (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
> >                4.264465 GB/Sec (with prefault)
> >         Routine x86-64-movsb (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
> >                6.554111 GB/Sec (with prefault)
> 
> Is this skylake? And why are the numbers so low? Even on my laptop
> (Haswell), I get ~21GB/s (when setting cpufreq to performance).

No, this was on my desktop, which is a water cooled IvyBridge running at 3.6GHz:

 processor       : 11
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 62
 model name      : Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz
 stepping        : 4
 microcode       : 0x416
 cpu MHz         : 1303.031
 cache size      : 15360 KB

and I didn't really think about the validity of the numbers when I made the 
changelog, as I rarely benchmark on this box, due to it having various desktop 
loads running all the time.

As you noticed, the results are highly variable with default settings:

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb 2>&1 | grep GB
       5.580357 GB/sec
       5.580357 GB/sec
      16.551907 GB/sec
      16.551907 GB/sec
      15.258789 GB/sec
      16.837284 GB/sec
      16.837284 GB/sec
      16.837284 GB/sec
      16.551907 GB/sec
      16.837284 GB/sec

They get more reliable with '-l 10000' (10,000 loops instead of the default 1):

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb -l 10000 2>&1 | grep GB
      15.483591 GB/sec
      16.975429 GB/sec
      17.088396 GB/sec
      20.920407 GB/sec
      21.346655 GB/sec
      21.322372 GB/sec
      21.338306 GB/sec
      21.342130 GB/sec
      21.339984 GB/sec
      21.373145 GB/sec

that's purely cached. Also note how after a few seconds it gets faster, due to 
cpufreq as you suspected.
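
To put a number on the spread in the two runs above, the 'GB/sec' lines can be
piped through a small filter. This is only a sketch: the function name
summarize_gbs and the output format are made up for illustration, and it
assumes the throughput lines look like the ones shown in this mail (a number
followed by 'GB/sec', in any letter case).

```shell
# Summarize 'perf bench mem' throughput lines read from stdin.
# Hypothetical helper, not part of perf: prints sample count, min, max,
# mean and the max/min ratio of all "<number> GB/sec" lines.
summarize_gbs() {
    awk '
        # Match lines such as "      16.551907 GB/sec" (unit case varies).
        tolower($2) == "gb/sec" {
            v = $1 + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v
            n++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "n=%d min=%.2f max=%.2f mean=%.2f spread=%.2fx\n", n, min, max, sum / n, max / min
        }'
}
```

Appending '| summarize_gbs' instead of '| grep GB' to the first command above
would report a spread of about 3x for the ten default-setting samples
(5.58 to 16.84 GB/sec).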

So once I fix the frequency of all cores to the max, I get much more reliable 
results:

  triton:~/tip> taskset 1 perf stat --null --repeat 10 perf bench mem memcpy -f x86-64-movsb -l 10000 2>&1 | grep -E 'GB|elaps'
      21.356879 GB/sec
      21.378526 GB/sec
      21.351976 GB/sec
      21.375203 GB/sec
      21.369824 GB/sec
      21.353236 GB/sec
      21.283708 GB/sec
      21.380679 GB/sec
      21.347915 GB/sec
      21.378572 GB/sec
       0.459286278 seconds time elapsed                                          ( +-  0.04% )

I'll add a debug check to 'perf bench' to warn about systems that have variable 
cpufreq running - this is too easy a mistake to make :-/
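
Until such a check exists in 'perf bench' itself, something along these lines
can be run by hand before benchmarking. It is a rough shell stand-in, not the
planned implementation: the function name is made up, and treating every
governor other than 'performance' as "variable" is an assumption of this
sketch (turbo and thermal effects can still move the clock under
'performance').

```shell
# Warn if any CPU uses a cpufreq governor other than 'performance'.
# Hypothetical helper, not part of perf.
# $1: cpu sysfs directory (defaults to the usual Linux location).
# Returns 0 when every governor found is 'performance', 1 otherwise.
check_cpufreq_governors() {
    cpu_root=${1:-/sys/devices/system/cpu}
    status=0
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue    # no cpufreq support, or the glob did not match
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            echo "warning: $f is '$gov'; results may vary with CPU frequency" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_cpufreq_governors with no argument inspects the running system;
passing a directory makes it easy to try against a fake sysfs tree.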

So with the benchmark stabilized, I get the following results:

  triton:~/tip> taskset 1 perf bench mem memcpy -f all -l 10000
  # Running 'mem/memcpy' benchmark:
  # function 'default' (Default memcpy() provided by glibc)
  # Copying 1MB bytes ...
 
        18.356783 GB/sec
  # function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...
 
        16.294889 GB/sec
  # function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...
 
        15.760032 GB/sec
  # function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
  # Copying 1MB bytes ...

        21.145818 GB/sec

which matches your observations:

> It's interesting that 'movsb' for you is so much better. It's been
> promising before, and it *should* be able to do better than manual
> copying, but it's not been that noticeable on the machines I've
> tested. But I haven't used Skylake or Broadwell yet.
> 
> cpufreq might be making a difference too. Maybe it's just ramping up
> the CPU? Or is that really repeatable?

So modulo the cpufreq multiplier it seems repeatable on this IB system - will try 
it on SkyLake as well.

Before relying on it I also wanted to implement the following 'perf bench' 
improvements:

 - make it more representative of kernel usage by benchmarking a list of
   characteristic lengths, not just the single stupid 1MB buffer. At smaller 
   buffer sizes I'd expect MOVSB to have even more of a fundamental advantage (due 
   to having all the differentiation in hardware) - but we don't know the
   latencies of those cases, some of which are in microcode I suspect.

 - measure aligned/unaligned buffer address and length effects as well

 - measure cache-cold numbers as well. This is pretty hard but not impossible.

With that we could start validating our fundamental memory op routines in 
user-space.
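
As a stopgap for the first item, the existing benchmark can be driven over a
list of buffer sizes from the shell. This is a sketch under assumptions: it
relies on the -s/--size option of 'perf bench mem memcpy' (check
'perf bench mem memcpy -h' on your version), the function name is made up, and
the sizes in the usage note are placeholders, not a vetted list of
kernel-typical lengths. It does nothing about alignment or cache-cold runs.

```shell
# Print (or run) one 'perf bench mem memcpy' command per buffer size.
# Hypothetical helper, not part of perf.
# $1: function name as accepted by -f (e.g. x86-64-movsb or all)
# remaining args: buffer sizes, e.g. 64B 4KB 1MB
# Set RUN=1 in the environment to execute the commands instead of printing them.
sweep_memcpy_sizes() {
    func=$1
    shift
    for size in "$@"; do
        cmd="taskset 1 perf bench mem memcpy -f $func -s $size -l 10000"
        if [ "${RUN:-0}" = 1 ]; then
            echo "# size $size"
            $cmd    # intentionally unquoted: split into command and arguments
        else
            echo "$cmd"
        fi
    done
}
```

For example, 'sweep_memcpy_sizes all 64B 256B 4KB 1MB' prints the four command
lines for review, and 'RUN=1 sweep_memcpy_sizes all 64B 256B 4KB 1MB' runs
them.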

Thanks,

	Ingo
