From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, David Ahern <dsahern@gmail.com>,
	Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>,
	Jiri Olsa <jolsa@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 07/23] perf bench: Remove the prefaulting complication from 'perf bench mem mem*'
Date: Mon, 19 Oct 2015 18:39:18 -0300
Message-ID: <1445290774-13344-8-git-send-email-acme@kernel.org>
In-Reply-To: <1445290774-13344-1-git-send-email-acme@kernel.org>

From: Ingo Molnar <mingo@kernel.org>

'perf bench mem memcpy/memset' has elaborate code to measure
memcpy()/memset() performance both with freshly allocated buffers (which
includes initial page fault overhead) and with prefaulted buffers.

But the bandwidth results for freshly allocated buffers are mostly
meaningless, because page faults dominate the cost.

It might make sense to measure cache-cold vs. cache-hot performance, but
the code does not do this.

So remove this complication and always prefault the ranges before using
them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-6-git-send-email-mingo@kernel.org
[ Remove --no-prefault, --only-prefault from docs, noticed by David Ahern ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-bench.txt |  16 ----
 tools/perf/bench/mem-functions.c        | 146 +++++++++++---------------------
 2 files changed, 50 insertions(+), 112 deletions(-)
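
For readers who want to see the measurement pattern in isolation, here is
a minimal stand-alone sketch of "prefault first, then time the copies".
It is not code from this patch: elapsed_sec() and
bench_memcpy_prefaulted() are hypothetical names, and it assumes a POSIX
system with gettimeofday(). The call through a volatile function pointer
is only there so that an optimizing compiler does not drop the copies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

typedef void *(*memcpy_t)(void *, const void *, size_t);

/* Hypothetical helper (not from the patch): seconds between two timestamps. */
static double elapsed_sec(const struct timeval *start, const struct timeval *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_usec - start->tv_usec) / 1000000.0;
}

/*
 * Hypothetical stand-alone benchmark: returns bytes per second for
 * 'iterations' copies of 'len' bytes, or -1.0 on failure.
 */
static double bench_memcpy_prefaulted(size_t len, int iterations)
{
	/* Call through a volatile pointer so the compiler keeps the copies. */
	memcpy_t volatile fn = memcpy;
	struct timeval tv_start, tv_end;
	void *src, *dst;
	double secs;
	int i;

	if (len == 0 || iterations <= 0)
		return -1.0;

	src = malloc(len);
	dst = malloc(len);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}

	/*
	 * Touch every source page, then do one untimed copy so that the
	 * destination pages are faulted in before the clock starts.
	 */
	memset(src, 0, len);
	fn(dst, src, len);

	gettimeofday(&tv_start, NULL);
	for (i = 0; i < iterations; ++i)
		fn(dst, src, len);
	gettimeofday(&tv_end, NULL);

	secs = elapsed_sec(&tv_start, &tv_end);

	free(src);
	free(dst);

	/* Guard against a zero interval on very small runs. */
	if (secs <= 0.0)
		return -1.0;

	return ((double)len * iterations) / secs;
}
```

Calling bench_memcpy_prefaulted(1 << 20, 100) from a main() and printing
the result gives a bytes-per-second figure. Dropping the untimed fn() call
would fold first-touch page faults of the destination into the timed
loop, which is the case this patch stops reporting.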

diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index ab632d9fbd7d..9cb60abe03aa 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -157,14 +157,6 @@ Repeat memcpy invocation this number of times.
 --cycle::
 Use perf's cpu-cycles event instead of gettimeofday syscall.
 
--o::
---only-prefault::
-Show only the result with page faults before memcpy.
-
--n::
---no-prefault::
-Show only the result without page faults before memcpy.
-
 *memset*::
 Suite for evaluating performance of simple memory set in various ways.
 
@@ -189,14 +181,6 @@ Repeat memset invocation this number of times.
 --cycle::
 Use perf's cpu-cycles event instead of gettimeofday syscall.
 
--o::
---only-prefault::
-Show only the result with page faults before memset.
-
--n::
---no-prefault::
-Show only the result without page faults before memset.
-
 SUITES FOR 'numa'
 ~~~~~~~~~~~~~~~~~
 *mem*::
diff --git a/tools/perf/bench/mem-functions.c b/tools/perf/bench/mem-functions.c
index 7acb9b83382c..9c18a4b976b6 100644
--- a/tools/perf/bench/mem-functions.c
+++ b/tools/perf/bench/mem-functions.c
@@ -28,8 +28,6 @@ static const char	*routine	= "all";
 static int		iterations	= 1;
 static bool		use_cycle;
 static int		cycle_fd;
-static bool		only_prefault;
-static bool		no_prefault;
 
 static const struct option options[] = {
 	OPT_STRING('l', "length", &length_str, "1MB",
@@ -41,10 +39,6 @@ static const struct option options[] = {
 		    "repeat memcpy() invocation this number of times"),
 	OPT_BOOLEAN('c', "cycle", &use_cycle,
 		    "Use cycles event instead of gettimeofday() for measuring"),
-	OPT_BOOLEAN('o', "only-prefault", &only_prefault,
-		    "Show only the result with page faults before memcpy()"),
-	OPT_BOOLEAN('n', "no-prefault", &no_prefault,
-		    "Show only the result without page faults before memcpy()"),
 	OPT_END()
 };
 
@@ -110,103 +104,60 @@ static double timeval2double(struct timeval *ts)
 	return (double)ts->tv_sec + (double)ts->tv_usec / (double)1000000;
 }
 
-#define print_bps(x) do {					\
-		if (x < K)					\
-			printf(" %14lf B/Sec", x);		\
-		else if (x < K * K)				\
-			printf(" %14lfd KB/Sec", x / K);	\
-		else if (x < K * K * K)				\
-			printf(" %14lf MB/Sec", x / K / K);	\
-		else						\
-			printf(" %14lf GB/Sec", x / K / K / K); \
+#define print_bps(x) do {						\
+		if (x < K)						\
+			printf(" %14lf B/Sec\n", x);			\
+		else if (x < K * K)					\
+			printf(" %14lfd KB/Sec\n", x / K);		\
+		else if (x < K * K * K)					\
+			printf(" %14lf MB/Sec\n", x / K / K);		\
+		else							\
+			printf(" %14lf GB/Sec\n", x / K / K / K);	\
 	} while (0)
 
 struct bench_mem_info {
 	const struct routine *routines;
-	u64 (*do_cycle)(const struct routine *r, size_t len, bool prefault);
-	double (*do_gettimeofday)(const struct routine *r, size_t len, bool prefault);
+	u64 (*do_cycle)(const struct routine *r, size_t len);
+	double (*do_gettimeofday)(const struct routine *r, size_t len);
 	const char *const *usage;
 };
 
 static void __bench_mem_routine(struct bench_mem_info *info, int r_idx, size_t len, double totallen)
 {
 	const struct routine *r = &info->routines[r_idx];
-	double result_bps[2];
-	u64 result_cycle[2];
-	int prefault = no_prefault ? 0 : 1;
-
-	result_cycle[0] = result_cycle[1] = 0ULL;
-	result_bps[0] = result_bps[1] = 0.0;
+	double result_bps = 0.0;
+	u64 result_cycle = 0;
 
 	printf("Routine %s (%s)\n", r->name, r->desc);
 
 	if (bench_format == BENCH_FORMAT_DEFAULT)
 		printf("# Copying %s Bytes ...\n\n", length_str);
 
-	if (!only_prefault && prefault) {
-		/* Show both results: */
-		if (use_cycle) {
-			result_cycle[0] = info->do_cycle(r, len, false);
-			result_cycle[1] = info->do_cycle(r, len, true);
-		} else {
-			result_bps[0]   = info->do_gettimeofday(r, len, false);
-			result_bps[1]   = info->do_gettimeofday(r, len, true);
-		}
+	if (use_cycle) {
+		result_cycle = info->do_cycle(r, len);
 	} else {
-		if (use_cycle)
-			result_cycle[prefault] = info->do_cycle(r, len, only_prefault);
-		else
-			result_bps[prefault] = info->do_gettimeofday(r, len, only_prefault);
+		result_bps = info->do_gettimeofday(r, len);
 	}
 
 	switch (bench_format) {
 	case BENCH_FORMAT_DEFAULT:
-		if (!only_prefault && prefault) {
-			if (use_cycle) {
-				printf(" %14lf Cycle/Byte\n",
-					(double)result_cycle[0]
-					/ totallen);
-				printf(" %14lf Cycle/Byte (with prefault)\n",
-					(double)result_cycle[1]
-					/ totallen);
-			} else {
-				print_bps(result_bps[0]);
-				printf("\n");
-				print_bps(result_bps[1]);
-				printf(" (with prefault)\n");
-			}
+		if (use_cycle) {
+			printf(" %14lf Cycle/Byte\n", (double)result_cycle/totallen);
 		} else {
-			if (use_cycle) {
-				printf(" %14lf Cycle/Byte",
-					(double)result_cycle[prefault]
-					/ totallen);
-			} else
-				print_bps(result_bps[prefault]);
-
-			printf("%s\n", only_prefault ? " (with prefault)" : "");
+			print_bps(result_bps);
 		}
 		break;
+
 	case BENCH_FORMAT_SIMPLE:
-		if (!only_prefault && prefault) {
-			if (use_cycle) {
-				printf("%lf %lf\n",
-					(double)result_cycle[0] / totallen,
-					(double)result_cycle[1] / totallen);
-			} else {
-				printf("%lf %lf\n",
-					result_bps[0], result_bps[1]);
-			}
+		if (use_cycle) {
+			printf("%lf\n", (double)result_cycle/totallen);
 		} else {
-			if (use_cycle) {
-				printf("%lf\n", (double)result_cycle[prefault]
-					/ totallen);
-			} else
-				printf("%lf\n", result_bps[prefault]);
+			printf("%lf\n", result_bps);
 		}
 		break;
+
 	default:
-		/* Reaching this means there's some disaster: */
-		die("unknown format: %d\n", bench_format);
+		BUG_ON(1);
 		break;
 	}
 }
@@ -219,11 +170,6 @@ static int bench_mem_common(int argc, const char **argv, struct bench_mem_info *
 
 	argc = parse_options(argc, argv, options, info->usage, 0);
 
-	if (no_prefault && only_prefault) {
-		fprintf(stderr, "Invalid options: -o and -n are mutually exclusive\n");
-		return 1;
-	}
-
 	if (use_cycle)
 		init_cycle();
 
@@ -235,10 +181,6 @@ static int bench_mem_common(int argc, const char **argv, struct bench_mem_info *
 		return 1;
 	}
 
-	/* Same as without specifying either of prefault and no-prefault: */
-	if (only_prefault && no_prefault)
-		only_prefault = no_prefault = false;
-
 	if (!strncmp(routine, "all", 3)) {
 		for (i = 0; info->routines[i].name; i++)
 			__bench_mem_routine(info, i, len, totallen);
@@ -278,7 +220,7 @@ static void memcpy_alloc_mem(void **dst, void **src, size_t length)
 	memset(*src, 0, length);
 }
 
-static u64 do_memcpy_cycle(const struct routine *r, size_t len, bool prefault)
+static u64 do_memcpy_cycle(const struct routine *r, size_t len)
 {
 	u64 cycle_start = 0ULL, cycle_end = 0ULL;
 	void *src = NULL, *dst = NULL;
@@ -287,8 +229,11 @@ static u64 do_memcpy_cycle(const struct routine *r, size_t len, bool prefault)
 
 	memcpy_alloc_mem(&dst, &src, len);
 
-	if (prefault)
-		fn(dst, src, len);
+	/*
+	 * We prefault the freshly allocated memory range here,
+	 * to not measure page fault overhead:
+	 */
+	fn(dst, src, len);
 
 	cycle_start = get_cycle();
 	for (i = 0; i < iterations; ++i)
@@ -300,7 +245,7 @@ static u64 do_memcpy_cycle(const struct routine *r, size_t len, bool prefault)
 	return cycle_end - cycle_start;
 }
 
-static double do_memcpy_gettimeofday(const struct routine *r, size_t len, bool prefault)
+static double do_memcpy_gettimeofday(const struct routine *r, size_t len)
 {
 	struct timeval tv_start, tv_end, tv_diff;
 	memcpy_t fn = r->fn.memcpy;
@@ -309,8 +254,11 @@ static double do_memcpy_gettimeofday(const struct routine *r, size_t len, bool p
 
 	memcpy_alloc_mem(&dst, &src, len);
 
-	if (prefault)
-		fn(dst, src, len);
+	/*
+	 * We prefault the freshly allocated memory range here,
+	 * to not measure page fault overhead:
+	 */
+	fn(dst, src, len);
 
 	BUG_ON(gettimeofday(&tv_start, NULL));
 	for (i = 0; i < iterations; ++i)
@@ -321,6 +269,7 @@ static double do_memcpy_gettimeofday(const struct routine *r, size_t len, bool p
 
 	free(src);
 	free(dst);
+
 	return (double)(((double)len * iterations) / timeval2double(&tv_diff));
 }
 
@@ -343,7 +292,7 @@ static void memset_alloc_mem(void **dst, size_t length)
 		die("memory allocation failed - maybe length is too large?\n");
 }
 
-static u64 do_memset_cycle(const struct routine *r, size_t len, bool prefault)
+static u64 do_memset_cycle(const struct routine *r, size_t len)
 {
 	u64 cycle_start = 0ULL, cycle_end = 0ULL;
 	memset_t fn = r->fn.memset;
@@ -352,8 +301,11 @@ static u64 do_memset_cycle(const struct routine *r, size_t len, bool prefault)
 
 	memset_alloc_mem(&dst, len);
 
-	if (prefault)
-		fn(dst, -1, len);
+	/*
+	 * We prefault the freshly allocated memory range here,
+	 * to not measure page fault overhead:
+	 */
+	fn(dst, -1, len);
 
 	cycle_start = get_cycle();
 	for (i = 0; i < iterations; ++i)
@@ -364,8 +316,7 @@ static u64 do_memset_cycle(const struct routine *r, size_t len, bool prefault)
 	return cycle_end - cycle_start;
 }
 
-static double do_memset_gettimeofday(const struct routine *r, size_t len,
-				     bool prefault)
+static double do_memset_gettimeofday(const struct routine *r, size_t len)
 {
 	struct timeval tv_start, tv_end, tv_diff;
 	memset_t fn = r->fn.memset;
@@ -374,8 +325,11 @@ static double do_memset_gettimeofday(const struct routine *r, size_t len,
 
 	memset_alloc_mem(&dst, len);
 
-	if (prefault)
-		fn(dst, -1, len);
+	/*
+	 * We prefault the freshly allocated memory range here,
+	 * to not measure page fault overhead:
+	 */
+	fn(dst, -1, len);
 
 	BUG_ON(gettimeofday(&tv_start, NULL));
 	for (i = 0; i < iterations; ++i)
-- 
2.1.0


Thread overview: 25+ messages
2015-10-19 21:39 [GIT PULL 00/23] perf/core improvements and fixes Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 01/23] perf test: Silence tracepoint event failures Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 02/23] perf test: Suppress libtraceevent warnings Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 03/23] perf bench: Improve the 'perf bench mem memcpy' code readability Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 04/23] perf bench: Default to all routines in 'perf bench mem' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 05/23] perf bench: Eliminate unused argument from bench_mem_common() Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 06/23] perf bench: Rename 'mem-memcpy.c' => 'mem-functions.c' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 07/23] perf bench: Remove the prefaulting complication from 'perf bench mem mem*' Arnaldo Carvalho de Melo [this message]
2015-10-19 21:39 ` [PATCH 08/23] perf bench: List output formatting options on 'perf bench -h' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 09/23] perf bench mem: Change 'cycle' to 'cycles' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 10/23] perf bench mem: Rename 'routine' to 'routine_str' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 11/23] perf bench mem: Fix 'length' vs. 'size' naming confusion Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 12/23] perf bench mem: Improve user visible strings Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 13/23] perf bench mem: Reorganize the code a bit Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 14/23] perf bench: Harmonize all the -l/--nr_loops options Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 15/23] perf bench mem: Rename 'routine' to 'function' Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 16/23] perf bench: Run benchmarks, don't test them Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 17/23] perf help: Change 'usage' to 'Usage' for consistency Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 18/23] perf stat: Rename perf_stat struct into perf_stat_evsel Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 19/23] perf stat: Add AGGR_UNSET mode Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 20/23] perf cpu_map: Make cpu_map__build_map global Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 21/23] perf cpu_map: Add data arg to cpu_map__build_map callback Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 22/23] perf script: Check output fields only for samples Arnaldo Carvalho de Melo
2015-10-19 21:39 ` [PATCH 23/23] perf bench: Use named initializers in the trailer too Arnaldo Carvalho de Melo
2015-10-20  7:32 ` [GIT PULL 00/23] perf/core improvements and fixes Ingo Molnar
