linux-kernel.vger.kernel.org archive mirror
* [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles
@ 2021-10-26  9:01 Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 1/6] perf auxtrace: Add missing Z option to ITRACE_HELP Adrian Hunter
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

Hi

The issue with V1 was that the IPC information used to accumulate cycle
counts had coarse granularity, because it is output only when the cycle
count correlates to the IP of the event, i.e. only when the IPC is exactly
correct.

To enable more frequent updates to the cycle count, itrace option 'A' is
added, which specifies that IPC information can be approximate.

In addition there are some new miscellaneous patches.


Changes in V2:

    perf dlfilter: Add dlfilter-show-cycles
      Separate counts for branches, instructions or other events.

    New patches:
      perf auxtrace: Add missing Z option to ITRACE_HELP
      perf auxtrace: Add itrace A option to approximate IPC
      perf intel-pt: Support itrace A option to approximate IPC
      perf auxtrace: Add itrace d+o option to direct debug log to stdout
      perf intel-pt: Support itrace d+o option to direct debug log to stdout


Adrian Hunter (6):
      perf auxtrace: Add missing Z option to ITRACE_HELP
      perf auxtrace: Add itrace A option to approximate IPC
      perf intel-pt: Support itrace A option to approximate IPC
      perf dlfilter: Add dlfilter-show-cycles
      perf auxtrace: Add itrace d+o option to direct debug log to stdout
      perf intel-pt: Support itrace d+o option to direct debug log to stdout

 tools/perf/Documentation/itrace.txt                |   2 +
 tools/perf/Documentation/perf-intel-pt.txt         |  23 ++++
 tools/perf/Makefile.perf                           |   2 +-
 tools/perf/dlfilters/dlfilter-show-cycles.c        | 144 +++++++++++++++++++++
 tools/perf/util/auxtrace.c                         |   3 +
 tools/perf/util/auxtrace.h                         |   6 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  |   1 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |   1 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |   8 +-
 tools/perf/util/intel-pt.c                         |  21 ++-
 10 files changed, 200 insertions(+), 11 deletions(-)
 create mode 100644 tools/perf/dlfilters/dlfilter-show-cycles.c


Regards
Adrian


* [PATCH V2 1/6] perf auxtrace: Add missing Z option to ITRACE_HELP
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 2/6] perf auxtrace: Add itrace A option to approximate IPC Adrian Hunter
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

ITRACE_HELP is used by perf commands to display help text for the --itrace
option. Add the missing Z option.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 5f383908ca6e..20dc78d86d54 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -649,6 +649,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 "				L[len]:			synthesize last branch entries on existing event records\n" \
 "				sNUMBER:    		skip initial number of events\n"		\
 "				q:			quicker (less detailed) decoding\n" \
+"				Z:			prefer to ignore timestamps (so-called \"timeless\" decoding)\n" \
 "				PERIOD[ns|us|ms|i|t]:   specify period to sample stream\n" \
 "				concatenate multiple options. Default is ibxwpe or cewp\n"
 
-- 
2.25.1



* [PATCH V2 2/6] perf auxtrace: Add itrace A option to approximate IPC
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 1/6] perf auxtrace: Add missing Z option to ITRACE_HELP Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 3/6] perf intel-pt: Support " Adrian Hunter
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

Add an option to specify that synthesized IPC can be approximate, rather
than completely accurate.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/itrace.txt | 1 +
 tools/perf/util/auxtrace.c          | 3 +++
 tools/perf/util/auxtrace.h          | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 2d586fe5e4c5..141449e97bed 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -20,6 +20,7 @@
 		L	synthesize last branch entries on existing event records
 		s       skip initial number of events
 		q	quicker (less detailed) decoding
+		A	approximate IPC
 		Z	prefer to ignore timestamps (so-called "timeless" decoding)
 
 	The default is all events i.e. the same as --itrace=ibxwpe,
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 8d2865b9ade2..c679394b898d 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1564,6 +1564,9 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 		case 'q':
 			synth_opts->quick += 1;
 			break;
+		case 'A':
+			synth_opts->approx_ipc = true;
+			break;
 		case 'Z':
 			synth_opts->timeless_decoding = true;
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 20dc78d86d54..889f976ea1a0 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -84,6 +84,7 @@ enum itrace_period_type {
  * @thread_stack: feed branches to the thread_stack
  * @last_branch: add branch context to 'instruction' events
  * @add_last_branch: add branch context to existing event records
+ * @approx_ipc: approximate IPC
  * @flc: whether to synthesize first level cache events
  * @llc: whether to synthesize last level cache events
  * @tlb: whether to synthesize TLB events
@@ -127,6 +128,7 @@ struct itrace_synth_opts {
 	bool			thread_stack;
 	bool			last_branch;
 	bool			add_last_branch;
+	bool			approx_ipc;
 	bool			flc;
 	bool			llc;
 	bool			tlb;
@@ -649,6 +651,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 "				L[len]:			synthesize last branch entries on existing event records\n" \
 "				sNUMBER:    		skip initial number of events\n"		\
 "				q:			quicker (less detailed) decoding\n" \
+"				A:			approximate IPC\n" \
 "				Z:			prefer to ignore timestamps (so-called \"timeless\" decoding)\n" \
 "				PERIOD[ns|us|ms|i|t]:   specify period to sample stream\n" \
 "				concatenate multiple options. Default is ibxwpe or cewp\n"
-- 
2.25.1
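
Editorial sketch (not part of the patch): the switch added above can be
pictured with the stand-alone C fragment below. The struct and function
names (demo_synth_opts, demo_parse_synth_opts) are hypothetical stand-ins
for struct itrace_synth_opts and itrace_do_parse_synth_opts(), and only
the three option letters visible in the hunk are handled, so treat it as
an illustration under those assumptions rather than perf's actual parser.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct itrace_synth_opts. */
struct demo_synth_opts {
	int  quick;              /* 'q' may be repeated to increase the effect */
	bool approx_ipc;         /* 'A': IPC may be approximate */
	bool timeless_decoding;  /* 'Z': prefer to ignore timestamps */
};

/*
 * Walk the option string one letter at a time.
 * Returns 0 on success, -1 on NULL input or on a letter this sketch
 * does not know (the real parser accepts many more letters).
 */
int demo_parse_synth_opts(struct demo_synth_opts *opts, const char *str)
{
	const char *p;

	if (!opts || !str)
		return -1;

	for (p = str; *p; p++) {
		switch (*p) {
		case 'q':
			opts->quick += 1;
			break;
		case 'A':
			opts->approx_ipc = true;
			break;
		case 'Z':
			opts->timeless_decoding = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

With a zero-initialized struct, parsing "qAq" in this sketch leaves
quick at 2 and approx_ipc set, while timeless_decoding stays false.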



* [PATCH V2 3/6] perf intel-pt: Support itrace A option to approximate IPC
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 1/6] perf auxtrace: Add missing Z option to ITRACE_HELP Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 2/6] perf auxtrace: Add itrace A option to approximate IPC Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
  2021-10-26 17:03   ` Andi Kleen
  2021-10-26  9:01 ` [PATCH V2 4/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

Normally, for cycle-accurate mode, IPC values are an exact number of
instructions and cycles. Due to the granularity of timestamps, that happens
only when a CYC packet correlates to the event.

Support the itrace 'A' option to use, instead, the number of cycles
associated with the current timestamp. This provides IPC information for
every change of timestamp, but at the expense of accuracy.

Furthermore, it can be used in conjunction with dlfilter-show-cycles.so
to provide higher granularity cycle information.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-intel-pt.txt       |  5 +++++
 .../util/intel-pt-decoder/intel-pt-decoder.c     |  1 +
 .../util/intel-pt-decoder/intel-pt-decoder.h     |  1 +
 tools/perf/util/intel-pt.c                       | 16 ++++++++++++----
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 184ba62420f0..31f1f373c463 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -157,6 +157,10 @@ of instructions and number of cycles since the last update, and thus represent
 the average IPC since the last IPC for that event type.  Note IPC for "branches"
 events is calculated separately from IPC for "instructions" events.
 
+Even with the 'cyc' config term, it is possible to produce IPC information for
+every change of timestamp, but at the expense of accuracy.  That is selected by
+specifying the itrace 'A' option.
+
 Also note that the IPC instruction count may or may not include the current
 instruction.  If the cycle count is associated with an asynchronous branch
 (e.g. page fault or interrupt), then the instruction count does not include the
@@ -873,6 +877,7 @@ The letters are:
 	L	synthesize last branch entries on existing event records
 	s	skip initial number of events
 	q	quicker (less detailed) decoding
+	A	approximate IPC
 	Z	prefer to ignore timestamps (so-called "timeless" decoding)
 
 "Instructions" events look like they were recorded by "perf record -e
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 5ab631702769..5f83937bf8f3 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -608,6 +608,7 @@ static inline void intel_pt_update_sample_time(struct intel_pt_decoder *decoder)
 {
 	decoder->sample_timestamp = decoder->timestamp;
 	decoder->sample_insn_cnt = decoder->timestamp_insn_cnt;
+	decoder->state.cycles = decoder->tot_cyc_cnt;
 }
 
 static void intel_pt_reposition(struct intel_pt_decoder *decoder)
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 4b5e79fcf557..8fd68f7a0963 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -218,6 +218,7 @@ struct intel_pt_state {
 	uint64_t to_ip;
 	uint64_t tot_insn_cnt;
 	uint64_t tot_cyc_cnt;
+	uint64_t cycles;
 	uint64_t timestamp;
 	uint64_t est_timestamp;
 	uint64_t trace_nr;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 6f852b305e92..57e49b23ad25 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -163,6 +163,7 @@ struct intel_pt_queue {
 	bool step_through_buffers;
 	bool use_buffer_pid_tid;
 	bool sync_switch;
+	bool sample_ipc;
 	pid_t pid, tid;
 	int cpu;
 	int switch_state;
@@ -1571,7 +1572,7 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 		sample.branch_stack = (struct branch_stack *)&dummy_bs;
 	}
 
-	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+	if (ptq->sample_ipc)
 		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_br_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_br_insn_cnt;
@@ -1622,7 +1623,7 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 	else
 		sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
 
-	if (ptq->state->flags & INTEL_PT_SAMPLE_IPC)
+	if (ptq->sample_ipc)
 		sample.cyc_cnt = ptq->ipc_cyc_cnt - ptq->last_in_cyc_cnt;
 	if (sample.cyc_cnt) {
 		sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_in_insn_cnt;
@@ -2198,8 +2199,15 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 
 	ptq->have_sample = false;
 
-	ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
-	ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
+	if (pt->synth_opts.approx_ipc) {
+		ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
+		ptq->ipc_cyc_cnt = ptq->state->cycles;
+		ptq->sample_ipc = true;
+	} else {
+		ptq->ipc_insn_cnt = ptq->state->tot_insn_cnt;
+		ptq->ipc_cyc_cnt = ptq->state->tot_cyc_cnt;
+		ptq->sample_ipc = ptq->state->flags & INTEL_PT_SAMPLE_IPC;
+	}
 
 	/*
 	 * Do PEBS first to allow for the possibility that the PEBS timestamp
-- 
2.25.1
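
Editorial sketch (not part of the patch): the selection logic in
intel_pt_sample() and its use in the sample synthesis functions can be
modelled roughly as below. All demo_* names are hypothetical; exact_ipc
stands in for the INTEL_PT_SAMPLE_IPC flag, and updating the "last"
counters after a non-zero delta is an assumption, since that part of the
function is outside the hunks shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder state fields used by the patch. */
struct demo_state {
	uint64_t tot_insn_cnt;  /* instructions decoded so far */
	uint64_t tot_cyc_cnt;   /* cycles from CYC packets so far */
	uint64_t cycles;        /* cycle count snapshot taken at the last timestamp update */
	bool     exact_ipc;     /* stands in for INTEL_PT_SAMPLE_IPC */
};

/* Hypothetical stand-in for the per-queue IPC bookkeeping. */
struct demo_queue {
	uint64_t ipc_insn_cnt, ipc_cyc_cnt;
	uint64_t last_insn_cnt, last_cyc_cnt;
	bool     sample_ipc;
};

struct demo_sample {
	uint64_t insn_cnt, cyc_cnt;
};

/* Mirrors the if/else added to intel_pt_sample(). */
void demo_select_ipc(struct demo_queue *q, const struct demo_state *s,
		     bool approx_ipc)
{
	q->ipc_insn_cnt = s->tot_insn_cnt;
	if (approx_ipc) {
		q->ipc_cyc_cnt = s->cycles;
		q->sample_ipc = true;
	} else {
		q->ipc_cyc_cnt = s->tot_cyc_cnt;
		q->sample_ipc = s->exact_ipc;
	}
}

/* Report deltas since the last sample that carried IPC information. */
struct demo_sample demo_synth(struct demo_queue *q)
{
	struct demo_sample sample = { 0, 0 };

	if (q->sample_ipc)
		sample.cyc_cnt = q->ipc_cyc_cnt - q->last_cyc_cnt;
	if (sample.cyc_cnt) {
		sample.insn_cnt = q->ipc_insn_cnt - q->last_insn_cnt;
		/* Assumption: remember what has been reported. */
		q->last_insn_cnt = q->ipc_insn_cnt;
		q->last_cyc_cnt = q->ipc_cyc_cnt;
	}
	return sample;
}
```

In this model, a sample with exact_ipc clear reports no cycles in the
default mode, whereas with approx_ipc it reports whatever cycle count was
current at the last timestamp update.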



* [PATCH V2 4/6] perf dlfilter: Add dlfilter-show-cycles
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
                   ` (2 preceding siblings ...)
  2021-10-26  9:01 ` [PATCH V2 3/6] perf intel-pt: Support " Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 5/6] perf auxtrace: Add itrace d+o option to direct debug log to stdout Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 6/6] perf intel-pt: Support " Adrian Hunter
  5 siblings, 0 replies; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

Add a new dlfilter to show cycles.

Cycle counts are accumulated per CPU (or per thread if CPU is not recorded)
from IPC information, and printed together with the change since the last
print, at the start of each line. Separate counts are kept for branches,
instructions or other events.

Note also, the itrace A option can be useful to provide higher granularity
cycle information.

Example:

 $ perf record -e intel_pt/cyc/u uname
 Linux
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.044 MB perf.data ]
 $ perf script --itrace=A --call-trace --dlfilter dlfilter-show-cycles.so --deltatime | head
         0                   perf-exec  8509 [001]     0.000000000:  psb offs: 0
         0                   perf-exec  8509 [001]     0.000000000:  cbr: 42 freq: 4219 MHz (156%)
       833        833            uname  8509 [001]     0.000047689: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )        _start
       833                       uname  8509 [001]     0.000003261: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      2015       1182            uname  8509 [001]     0.000000282: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      2676        661            uname  8509 [001]     0.000002629: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      3612        936            uname  8509 [001]     0.000001232: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      4579        967            uname  8509 [001]     0.000002519: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )            _dl_start
      6145       1566            uname  8509 [001]     0.000001050: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )                _dl_setup_hash
      6239         94            uname  8509 [001]     0.000000023: (/usr/lib/x86_64-linux-gnu/ld-2.31.so              )                _dl_sysdep_start

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-intel-pt.txt  |  19 ++-
 tools/perf/Makefile.perf                    |   2 +-
 tools/perf/dlfilters/dlfilter-show-cycles.c | 144 ++++++++++++++++++++
 3 files changed, 163 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/dlfilters/dlfilter-show-cycles.c

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 31f1f373c463..81dd27be3d09 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -159,7 +159,9 @@ events is calculated separately from IPC for "instructions" events.
 
 Even with the 'cyc' config term, it is possible to produce IPC information for
 every change of timestamp, but at the expense of accuracy.  That is selected by
-specifying the itrace 'A' option.
+specifying the itrace 'A' option.  It may also be useful to use the 'A' option
+in conjunction with dlfilter-show-cycles.so to provide higher granularity cycle
+information.
 
 Also note that the IPC instruction count may or may not include the current
 instruction.  If the cycle count is associated with an asynchronous branch
@@ -1077,6 +1079,21 @@ The Z option is equivalent to having recorded a trace without TSC
 decoding a trace of a virtual machine.
 
 
+dlfilter-show-cycles.so
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Cycles can be displayed using dlfilter-show-cycles.so in which case the itrace A
+option can be useful to provide higher granularity cycle information:
+
+	perf script --itrace=A --call-trace --dlfilter dlfilter-show-cycles.so
+
+To see a list of dlfilters:
+
+	perf script -v --list-dlfilters
+
+See also linkperf:perf-dlfilters[1]
+
+
 dump option
 ~~~~~~~~~~~
 
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 7df13e74450c..e155570cb662 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -362,7 +362,7 @@ ifndef NO_JVMTI
 PROGRAMS += $(OUTPUT)$(LIBJVMTI)
 endif
 
-DLFILTERS := dlfilter-test-api-v0.so
+DLFILTERS := dlfilter-test-api-v0.so dlfilter-show-cycles.so
 DLFILTERS := $(patsubst %,$(OUTPUT)dlfilters/%,$(DLFILTERS))
 
 # what 'all' will build and 'install' will install, in perfexecdir
diff --git a/tools/perf/dlfilters/dlfilter-show-cycles.c b/tools/perf/dlfilters/dlfilter-show-cycles.c
new file mode 100644
index 000000000000..9eccc97bff82
--- /dev/null
+++ b/tools/perf/dlfilters/dlfilter-show-cycles.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * dlfilter-show-cycles.c: Print the number of cycles at the start of each line
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#include <perf/perf_dlfilter.h>
+#include <string.h>
+#include <stdio.h>
+
+#define MAX_CPU 4096
+
+enum {
+	INSTR_CYC,
+	BRNCH_CYC,
+	OTHER_CYC,
+	MAX_ENTRY
+};
+
+static __u64 cycles[MAX_CPU][MAX_ENTRY];
+static __u64 cycles_rpt[MAX_CPU][MAX_ENTRY];
+
+#define BITS		16
+#define TABLESZ		(1 << BITS)
+#define TABLEMAX	(TABLESZ / 2)
+#define MASK		(TABLESZ - 1)
+
+static struct entry {
+	__u32 used;
+	__s32 tid;
+	__u64 cycles[MAX_ENTRY];
+	__u64 cycles_rpt[MAX_ENTRY];
+} table[TABLESZ];
+
+static int tid_cnt;
+
+static int event_entry(const char *event)
+{
+	if (!event)
+		return OTHER_CYC;
+	if (!strncmp(event, "instructions", 12))
+		return INSTR_CYC;
+	if (!strncmp(event, "branches", 8))
+		return BRNCH_CYC;
+	return OTHER_CYC;
+}
+
+static struct entry *find_entry(__s32 tid)
+{
+	__u32 pos = tid & MASK;
+	struct entry *e;
+
+	e = &table[pos];
+	while (e->used) {
+		if (e->tid == tid)
+			return e;
+		if (++pos == TABLESZ)
+			pos = 0;
+		e = &table[pos];
+	}
+
+	if (tid_cnt >= TABLEMAX) {
+		fprintf(stderr, "Too many threads\n");
+		return NULL;
+	}
+
+	tid_cnt += 1;
+	e->used = 1;
+	e->tid = tid;
+	return e;
+}
+
+static void add_entry(__s32 tid, int pos, __u64 cnt)
+{
+	struct entry *e = find_entry(tid);
+
+	if (e)
+		e->cycles[pos] += cnt;
+}
+
+int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
+{
+	__s32 cpu = sample->cpu;
+	__s32 tid = sample->tid;
+	int pos;
+
+	if (!sample->cyc_cnt)
+		return 0;
+
+	pos = event_entry(sample->event);
+
+	if (cpu >= 0 && cpu < MAX_CPU)
+		cycles[cpu][pos] += sample->cyc_cnt;
+	else if (tid != -1)
+		add_entry(tid, pos, sample->cyc_cnt);
+	return 0;
+}
+
+static void print_vals(__u64 cycles, __u64 delta)
+{
+	if (delta)
+		printf("%10llu %10llu ", cycles, delta);
+	else
+		printf("%10llu %10s ", cycles, "");
+}
+
+int filter_event(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
+{
+	__s32 cpu = sample->cpu;
+	__s32 tid = sample->tid;
+	int pos;
+
+	pos = event_entry(sample->event);
+
+	if (cpu >= 0 && cpu < MAX_CPU) {
+		print_vals(cycles[cpu][pos], cycles[cpu][pos] - cycles_rpt[cpu][pos]);
+		cycles_rpt[cpu][pos] = cycles[cpu][pos];
+		return 0;
+	}
+
+	if (tid != -1) {
+		struct entry *e = find_entry(tid);
+
+		if (e) {
+			print_vals(e->cycles[pos], e->cycles[pos] - e->cycles_rpt[pos]);
+			e->cycles_rpt[pos] = e->cycles[pos];
+			return 0;
+		}
+	}
+
+	printf("%22s", "");
+	return 0;
+}
+
+const char *filter_description(const char **long_description)
+{
+	static char *long_desc = "Cycle counts are accumulated per CPU (or "
+		"per thread if CPU is not recorded) from IPC information, and "
+		"printed together with the change since the last print, at the "
+		"start of each line. Separate counts are kept for branches, "
+		"instructions or other events.";
+
+	*long_description = long_desc;
+	return "Print the number of cycles at the start of each line";
+}
-- 
2.25.1
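
Editorial sketch (not part of the patch): the accumulate-then-report
behaviour of the filter can be reduced to the per-CPU fragment below. It
leaves out the dlfilter API, the per-event-type split and the per-thread
hash table, and formats into a caller-supplied buffer instead of calling
printf(). The demo_* names and the small DEMO_MAX_CPU bound are
hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_CPU 8

static uint64_t demo_cycles[DEMO_MAX_CPU];      /* running totals */
static uint64_t demo_cycles_rpt[DEMO_MAX_CPU];  /* totals at the last report */

/* Early stage: add a sample's cycle count to its CPU's running total. */
void demo_account(int cpu, uint64_t cyc_cnt)
{
	if (cpu >= 0 && cpu < DEMO_MAX_CPU)
		demo_cycles[cpu] += cyc_cnt;
}

/*
 * Print stage: format the total and the change since the last report.
 * The delta column is left blank when nothing changed, and the whole
 * 22-character prefix is blank when the CPU is out of range.
 * Returns the snprintf() result.
 */
int demo_format(int cpu, char *buf, size_t len)
{
	uint64_t total, delta;

	if (cpu < 0 || cpu >= DEMO_MAX_CPU)
		return snprintf(buf, len, "%22s", "");

	total = demo_cycles[cpu];
	delta = total - demo_cycles_rpt[cpu];
	demo_cycles_rpt[cpu] = total;

	if (delta)
		return snprintf(buf, len, "%10" PRIu64 " %10" PRIu64 " ",
				total, delta);
	return snprintf(buf, len, "%10" PRIu64 " %10s ", total, "");
}
```

Feeding this sketch 833 cycles and then 1182 cycles for the same CPU
gives totals of 833 and 2015 with deltas of 833 and 1182, which is the
pattern seen in the first columns of the example output in the commit
message.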



* [PATCH V2 5/6] perf auxtrace: Add itrace d+o option to direct debug log to stdout
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
                   ` (3 preceding siblings ...)
  2021-10-26  9:01 ` [PATCH V2 4/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
  2021-10-26  9:01 ` [PATCH V2 6/6] perf intel-pt: Support " Adrian Hunter
  5 siblings, 0 replies; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

It can be useful to see debug output in between normal output.

Add 'o' to the flags of debug option 'd', so that '--itrace=d+o' can
specify output of the debug log to stdout.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/itrace.txt | 1 +
 tools/perf/util/auxtrace.h          | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index 141449e97bed..c52755481e2f 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -62,5 +62,6 @@
 	debug messages will or will not be logged. Each flag must be preceded
 	by either '+' or '-'. The flags are:
 		a	all perf events
+		o	output to stdout
 
 	If supported, the 'q' option may be repeated to increase the effect.
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 889f976ea1a0..bbf0d78c6401 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -59,6 +59,7 @@ enum itrace_period_type {
 #define AUXTRACE_ERR_FLG_DATA_LOST	(1 << ('l' - 'a'))
 
 #define AUXTRACE_LOG_FLG_ALL_PERF_EVTS	(1 << ('a' - 'a'))
+#define AUXTRACE_LOG_FLG_USE_STDOUT	(1 << ('o' - 'a'))
 
 /**
  * struct itrace_synth_opts - AUX area tracing synthesis options.
@@ -641,6 +642,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 "				d[flags]:		create a debug log\n" \
 "							each flag must be preceded by + or -\n" \
 "							log flags are: a (all perf events)\n" \
+"							               o (output to stdout)\n" \
 "				f:	    		synthesize first level cache events\n" \
 "				m:	    		synthesize last level cache events\n" \
 "				t:	    		synthesize TLB events\n" \
-- 
2.25.1
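
Editorial sketch (not part of the patch): each log flag letter maps to
one bit, so 'a' is bit 0 and 'o' is bit 14 (value 16384). The fragment
below shows that encoding with a simplified parser for a string such as
"+a+o". The DEMO_* macros and demo_parse_log_flags() are hypothetical
names, and perf's own flag parsing may differ in detail.

```c
/* Same encoding as the AUXTRACE_LOG_FLG_* macros: one bit per letter. */
#define DEMO_LOG_FLG_ALL_PERF_EVTS	(1u << ('a' - 'a'))
#define DEMO_LOG_FLG_USE_STDOUT		(1u << ('o' - 'a'))

/*
 * Parse a run of flags, each preceded by '+' or '-'.
 * '+' flags are collected in *plus, '-' flags in *minus.
 * Returns 0 on success, -1 on anything this sketch does not accept.
 */
int demo_parse_log_flags(const char *s, unsigned int *plus,
			 unsigned int *minus)
{
	*plus = 0;
	*minus = 0;

	while (*s == '+' || *s == '-') {
		char sign = *s++;

		if (*s < 'a' || *s > 'z')
			return -1;
		if (sign == '+')
			*plus |= 1u << (*s - 'a');
		else
			*minus |= 1u << (*s - 'a');
		s++;
	}
	return *s ? -1 : 0;
}
```

In this sketch "+o" yields a plus mask equal to DEMO_LOG_FLG_USE_STDOUT,
and "+a+o" sets both bits.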



* [PATCH V2 6/6] perf intel-pt: Support itrace d+o option to direct debug log to stdout
  2021-10-26  9:01 [PATCH V2 0/6] perf dlfilter: Add dlfilter-show-cycles Adrian Hunter
                   ` (4 preceding siblings ...)
  2021-10-26  9:01 ` [PATCH V2 5/6] perf auxtrace: Add itrace d+o option to direct debug log to stdout Adrian Hunter
@ 2021-10-26  9:01 ` Adrian Hunter
       [not found]   ` <dd9f91af-8b74-bf75-b3a4-c3826be7b190@linux.intel.com>
  5 siblings, 1 reply; 9+ messages in thread
From: Adrian Hunter @ 2021-10-26  9:01 UTC
  To: Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, Andi Kleen, linux-kernel

It can be useful to see debug output in between normal output.

Add support for AUXTRACE_LOG_FLG_USE_STDOUT to Intel PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-intel-pt.txt      | 1 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 8 ++++----
 tools/perf/util/intel-pt.c                      | 5 +++--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 81dd27be3d09..b94dca105ebd 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -948,6 +948,7 @@ by flags which affect what debug messages will or will not be logged. Each flag
 must be preceded by either '+' or '-'. The flags support by Intel PT are:
 		-a	Suppress logging of perf events
 		+a	Log all perf events
+		+o	Output to stdout instead of "intel_pt.log"
 By default, logged perf events are filtered by any specified time ranges, but
 flag +a overrides that.
 
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
index 09feb5b07d32..5f5dfc8753f3 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-log.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -82,10 +82,10 @@ static int intel_pt_log_open(void)
 	if (f)
 		return 0;
 
-	if (!log_name[0])
-		return -1;
-
-	f = fopen(log_name, "w+");
+	if (log_name[0])
+		f = fopen(log_name, "w+");
+	else
+		f = stdout;
 	if (!f) {
 		intel_pt_enable_logging = false;
 		return -1;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 57e49b23ad25..793bac850268 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -3659,8 +3659,6 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_free;
 
-	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
-
 	if (session->itrace_synth_opts->set) {
 		pt->synth_opts = *session->itrace_synth_opts;
 	} else {
@@ -3675,6 +3673,9 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 		pt->synth_opts.thread_stack = opts->thread_stack;
 	}
 
+	if (!(pt->synth_opts.log_plus_flags & AUXTRACE_LOG_FLG_USE_STDOUT))
+		intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
 	pt->session = session;
 	pt->machine = &session->machines.host; /* No kvm support */
 	pt->auxtrace_type = auxtrace_info->type;
-- 
2.25.1
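
Editorial sketch (not part of the patch): the two hunks above combine so
that an empty log name now means "use stdout", and the log name is only
set when the 'o' flag is absent. A stand-alone model of that decision is
shown below. The demo_* names are hypothetical, and the file name
"intel_pt.log" is taken from the documentation hunk.

```c
#include <stdio.h>

#define DEMO_LOG_FLG_USE_STDOUT	(1u << ('o' - 'a'))

/* Choose the log name: empty when +o was given, else the default file. */
const char *demo_log_name(unsigned int log_plus_flags)
{
	if (log_plus_flags & DEMO_LOG_FLG_USE_STDOUT)
		return "";
	return "intel_pt.log";
}

/*
 * Open the log destination: a named file if there is a name,
 * otherwise stdout. Returns NULL only if fopen() fails.
 */
FILE *demo_log_open(const char *log_name)
{
	if (log_name && log_name[0])
		return fopen(log_name, "w+");
	return stdout;
}
```

Calling demo_log_open(demo_log_name(DEMO_LOG_FLG_USE_STDOUT)) in this
sketch returns stdout, so debug lines would be interleaved with normal
output as the commit message describes.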



* Re: [PATCH V2 3/6] perf intel-pt: Support itrace A option to approximate IPC
  2021-10-26  9:01 ` [PATCH V2 3/6] perf intel-pt: Support " Adrian Hunter
@ 2021-10-26 17:03   ` Andi Kleen
  0 siblings, 0 replies; 9+ messages in thread
From: Andi Kleen @ 2021-10-26 17:03 UTC
  To: Adrian Hunter, Arnaldo Carvalho de Melo; +Cc: Jiri Olsa, linux-kernel


On 10/26/2021 2:01 AM, Adrian Hunter wrote:
> Normally, for cycle-accurate mode, IPC values are an exact number of
> instructions and cycles. Due to the granularity of timestamps, that happens
> only when a CYC packet correlates to the event.
>
> Support the itrace 'A' option to use, instead, the number of cycles
> associated with the current timestamp. This provides IPC information for
> every change of timestamp, but at the expense of accuracy.

Can you expand a bit on what exactly the accuracy loss is?

Would be good to describe that in the manpage too.



* Re: [PATCH V2 6/6] perf intel-pt: Support itrace d+o option to direct debug log to stdout
       [not found]   ` <dd9f91af-8b74-bf75-b3a4-c3826be7b190@linux.intel.com>
@ 2021-10-27 18:59     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-10-27 18:59 UTC
  To: Andi Kleen; +Cc: Adrian Hunter, Jiri Olsa, linux-kernel

On Tue, Oct 26, 2021 at 10:05:15AM -0700, Andi Kleen wrote:
> Except for the documentation comments everything looks good to me. Thanks
> for implementing that.
> 
> 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>

Thanks, he can send the expansion in a follow-up patch; reviewing and
applying.

- Arnaldo

