All of lore.kernel.org
 help / color / mirror / Atom feed
From: tip-bot for Adrian Hunter <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: jolsa@redhat.com, linux-kernel@vger.kernel.org, mingo@kernel.org,
	tglx@linutronix.de, adrian.hunter@intel.com, hpa@zytor.com,
	acme@redhat.com
Subject: [tip:perf/core] perf intel-pt: Add mispred-all config option to aid use with autofdo
Date: Tue, 29 Sep 2015 01:49:23 -0700	[thread overview]
Message-ID: <tip-ba11ba65e02836c475427ae199adfc2d8cc4a900@git.kernel.org> (raw)
In-Reply-To: <1443186956-18718-26-git-send-email-adrian.hunter@intel.com>

Commit-ID:  ba11ba65e02836c475427ae199adfc2d8cc4a900
Gitweb:     http://git.kernel.org/tip/ba11ba65e02836c475427ae199adfc2d8cc4a900
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 25 Sep 2015 16:15:56 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 28 Sep 2015 17:21:00 -0300

perf intel-pt: Add mispred-all config option to aid use with autofdo

autofdo incorrectly expects branch flags to include either mispred or
predicted.  In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT.

To make autofdo work, add a config option which will cause Intel PT
decoder to set the mispred flag on all branches.

Below is an example of using Intel PT with autofdo.  The example is
also added to the Intel PT documentation.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
amended to take the number of elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 29 +++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.c            | 14 ++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index a0fbb5d..be764f9 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -764,3 +764,32 @@ perf inject also accepts the --itrace option in which case tracing data is
 removed and replaced with the synthesized events. e.g.
 
 	perf inject --itrace -i perf.data -o perf.data.new
+
+Below is an example of using Intel PT with autofdo.  It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
+amended to take the number of elements as a parameter.
+
+	$ gcc-5 -O3 sort.c -o sort_optimized
+	$ ./sort_optimized 30000
+	Bubble sorting array of 30000 elements
+	2254 ms
+
+	$ cat ~/.perfconfig
+	[intel-pt]
+		mispred-all
+
+	$ perf record -e intel_pt//u ./sort 3000
+	Bubble sorting array of 3000 elements
+	58 ms
+	[ perf record: Woken up 2 times to write data ]
+	[ perf record: Captured and wrote 3.939 MB perf.data ]
+	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
+	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ ./sort_autofdo 30000
+	Bubble sorting array of 30000 elements
+	2155 ms
+
+Note there is currently no advantage to using Intel PT instead of LBR, but
+that may change in the future if greater use is made of the data.
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 05e8fcc51..03ff072 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -64,6 +64,7 @@ struct intel_pt {
 	bool data_queued;
 	bool est_tsc;
 	bool sync_switch;
+	bool mispred_all;
 	int have_sched_switch;
 	u32 pmu_type;
 	u64 kernel_start;
@@ -943,6 +944,7 @@ static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
 	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
 	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
 	/* No support for mispredict */
+	be->flags.mispred = ptq->pt->mispred_all;
 
 	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
 		bs->nr += 1;
@@ -1967,6 +1969,16 @@ static bool intel_pt_find_switch(struct perf_evlist *evlist)
 	return false;
 }
 
+static int intel_pt_perf_config(const char *var, const char *value, void *data)
+{
+	struct intel_pt *pt = data;
+
+	if (!strcmp(var, "intel-pt.mispred-all"))
+		pt->mispred_all = perf_config_bool(var, value);
+
+	return 0;
+}
+
 static const char * const intel_pt_info_fmts[] = {
 	[INTEL_PT_PMU_TYPE]		= "  PMU Type            %"PRId64"\n",
 	[INTEL_PT_TIME_SHIFT]		= "  Time Shift          %"PRIu64"\n",
@@ -2011,6 +2023,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (!pt)
 		return -ENOMEM;
 
+	perf_config(intel_pt_perf_config, pt);
+
 	err = auxtrace_queues__init(&pt->queues);
 	if (err)
 		goto err_free;

  reply	other threads:[~2015-09-29  8:50 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-25 13:15 [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Adrian Hunter
2015-09-25 13:15 ` [PATCH 01/25] perf auxtrace: Fix 'instructions' period of zero Adrian Hunter
2015-09-28 14:12   ` Arnaldo Carvalho de Melo
2015-09-28 14:16     ` Arnaldo Carvalho de Melo
2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 02/25] perf report: Fix sample type validation for synthesized callchains Adrian Hunter
2015-09-29  8:41   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 03/25] perf intel-pt: Fix potential loop forever Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 04/25] perf intel-pt: Make logging slightly more efficient Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 05/25] perf script: Allow time to be displayed in nanoseconds Adrian Hunter
2015-09-29  8:42   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 06/25] perf tools: Warn when AUX data has been lost Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] perf session: " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 07/25] perf tools: Add more documentation to export-to-postgresql.py script Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 08/25] perf auxtrace: Add option to synthesize branch stacks on samples Adrian Hunter
2015-09-29  8:43   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 09/25] perf report: Adjust sample type validation for synthesized branch stacks Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 10/25] perf report: Also do default setup " Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 11/25] perf report: Skip events with null " Adrian Hunter
2015-09-29  8:44   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 12/25] perf inject: Set branch stack feature flag when synthesizing " Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 13/25] perf intel-pt: Move branch filter logic Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 14/25] perf intel-pt: Support generating branch stack Adrian Hunter
2015-09-29  8:45   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 15/25] perf report: Make max_stack value allow for synthesized callchains Adrian Hunter
2015-09-28 20:03   ` Arnaldo Carvalho de Melo
2015-09-29  8:52     ` Adrian Hunter
2015-09-29 15:51       ` Arnaldo Carvalho de Melo
2015-09-30  8:43         ` Adrian Hunter
2015-09-30 13:17           ` Arnaldo Carvalho de Melo
2015-10-01  7:10       ` [tip:perf/core] perf report: Amend documentation about max_stack and " tip-bot for Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] perf report: Make max_stack value allow for " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 16/25] perf hists: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 17/25] perf callchain: " Adrian Hunter
2015-09-28 20:08   ` Arnaldo Carvalho de Melo
2015-09-29  8:16     ` Adrian Hunter
2015-10-01 11:45       ` Adrian Hunter
2015-10-03  7:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 18/25] perf script: Add a setting for maximum stack depth Adrian Hunter
2015-09-29  8:46   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 19/25] perf scripting python: Allow for max_stack greater than PERF_MAX_STACK_DEPTH Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 20/25] perf script: Make scripting_max_stack value allow for synthesized callchains Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 21/25] perf tools: Add perf_evlist__id2evsel_strict() Adrian Hunter
2015-09-29  8:47   ` [tip:perf/core] perf evlist: " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 22/25] perf tools: Add perf_evlist__del() Adrian Hunter
2015-09-28 13:33   ` Arnaldo Carvalho de Melo
2015-09-28 20:14     ` Arnaldo Carvalho de Melo
2015-09-29  8:48   ` [tip:perf/core] perf evlist: Add perf_evlist__remove() tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 23/25] perf inject: Remove more aux-related stuff when processing instruction traces Adrian Hunter
2015-09-29  8:48   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 24/25] perf inject: Add --strip option to strip out non-synthesized events Adrian Hunter
2015-09-29  8:49   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-09-25 13:15 ` [PATCH 25/25] perf intel-pt: Add mispred-all config option to aid use with autofdo Adrian Hunter
2015-09-29  8:49   ` tip-bot for Adrian Hunter [this message]
2015-09-28 20:33 ` [PATCH 00/25] perf tools: minor improvements to Intel PT related stuff Arnaldo Carvalho de Melo
2015-09-29 11:13   ` Adrian Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=tip-ba11ba65e02836c475427ae199adfc2d8cc4a900@git.kernel.org \
    --to=tipbot@zytor.com \
    --cc=acme@redhat.com \
    --cc=adrian.hunter@intel.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.