From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>, Andi Kleen <ak@linux.intel.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf scripts python: intel-pt-events.py: Add ability to interleave output
Date: Tue, 25 Oct 2022 17:45:53 -0300
Message-ID: <Y1hLAT5L4ffmFhWe@kernel.org>
In-Reply-To: <20221020152509.5298-1-adrian.hunter@intel.com>

Em Thu, Oct 20, 2022 at 06:25:09PM +0300, Adrian Hunter escreveu:
> Intel PT timestamps are not provided for every branch, let alone every
> instruction, so there can be many samples with the same timestamp. With
> per-cpu contexts, decoding is done for each CPU in turn, which can make it
> difficult to see what is happening on different CPUs at the same time.
> Currently the interleaving from perf script --itrace=i0ns is quite
> coarse-grained. There are often long stretches executing on one CPU and
> nothing on another.

Thanks, applied.

- Arnaldo

 
> Some people are interested in seeing what happened on multiple CPUs before
> a crash to debug races etc.
> 
> To improve perf script interleaving for parallel execution, the
> intel-pt-events.py script has been enhanced to enable interleaving the
> output with the same timestamp from different CPUs. It is understood that
> interleaving is not perfect or causal.
> 
> Add parameter --interleave [<n>] to interleave sample output for the same
> timestamp so that no more than n samples for a CPU are displayed in a row.
> 'n' defaults to 4. Note this only affects the order of output, and only
> when the timestamp is the same.
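> 
> To make the ordering concrete, here is a minimal standalone sketch of the
> round-robin idea (simplified from flush_stashed_output() below), assuming
> stashed output is kept as a dict mapping each CPU to its list of output
> strings and n is the --interleave value:
> 
>   def flush(stash, n):
>       # stash: {cpu: [output strings]}, all for the same timestamp
>       while stash:
>           for cpu in list(stash.keys()):
>               items = stash[cpu]
>               # Print at most n consecutive strings for this CPU,
>               # then move on to the next CPU.
>               for s in items[:n]:
>                   print(s, end='')
>               del items[:n]
>               if not items:
>                   del stash[cpu]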
> 
> Example:
> 
>   $ perf script intel-pt-events.py --insn-trace --interleave 3
>   ...
>   bash  2267/2267  [004]  9323.692625625  563caa3c86f0  jz 0x563caa3c89c7        run_pending_traps+0x30 (/usr/bin/bash)   IPC: 1.52 (38/25)
>   bash  2267/2267  [004]  9323.692625625  563caa3c89c7  movq  0x118(%rsp), %rax  run_pending_traps+0x307 (/usr/bin/bash)
>   bash  2267/2267  [004]  9323.692625625  563caa3c89cf  subq  %fs:0x28, %rax     run_pending_traps+0x30f (/usr/bin/bash)
>   bash  2270/2270  [007]  9323.692625625  55dc58cabf02  jz 0x55dc58cabf48        unquoted_glob_pattern_p+0x102 (/usr/bin/bash)   IPC: 1.56 (25/16)
>   bash  2270/2270  [007]  9323.692625625  55dc58cabf04  cmp $0x5d, %al           unquoted_glob_pattern_p+0x104 (/usr/bin/bash)
>   bash  2270/2270  [007]  9323.692625625  55dc58cabf06  jnz 0x55dc58cabf10       unquoted_glob_pattern_p+0x106 (/usr/bin/bash)
>   bash  2264/2264  [001]  9323.692625625  7fd556a4376c  jbe 0x7fd556a43ac8       round_and_return+0x3fc (/usr/lib/x86_64-linux-gnu/libc.so.6)   IPC: 4.30 (43/10)
>   bash  2264/2264  [001]  9323.692625625  7fd556a43772  and $0x8, %edx           round_and_return+0x402 (/usr/lib/x86_64-linux-gnu/libc.so.6)
>   bash  2264/2264  [001]  9323.692625625  7fd556a43775  jnz 0x7fd556a43ac8       round_and_return+0x405 (/usr/lib/x86_64-linux-gnu/libc.so.6)
>   bash  2267/2267  [004]  9323.692625625  563caa3c89d8  jnz 0x563caa3c8b11       run_pending_traps+0x318 (/usr/bin/bash)
>   bash  2267/2267  [004]  9323.692625625  563caa3c89de  add $0x128, %rsp         run_pending_traps+0x31e (/usr/bin/bash)
>   bash  2267/2267  [004]  9323.692625625  563caa3c89e5  popq  %rbx               run_pending_traps+0x325 (/usr/bin/bash)
>   ...
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-intel-pt.txt   | 12 +++-
>  tools/perf/scripts/python/intel-pt-events.py | 65 +++++++++++++++++++-
>  2 files changed, 74 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
> index 92464a5d7eaf..7b6ccd2fa3bf 100644
> --- a/tools/perf/Documentation/perf-intel-pt.txt
> +++ b/tools/perf/Documentation/perf-intel-pt.txt
> @@ -189,8 +189,16 @@ There is also script intel-pt-events.py which provides an example of how to
>  unpack the raw data for power events and PTWRITE. The script also displays
>  branches, and supports 2 additional modes selected by option:
>  
> - --insn-trace - instruction trace
> - --src-trace - source trace
> + - --insn-trace - instruction trace
> + - --src-trace - source trace
> +
> +The intel-pt-events.py script also has options:
> +
> + - --all-switch-events - display all switch events, not only the last consecutive.
> + - --interleave [<n>] - interleave sample output for the same timestamp so that
> + no more than n samples for a CPU are displayed in a row. 'n' defaults to 4.
> + Note this only affects the order of output, and only when the timestamp is the
> + same.
>  
>  As mentioned above, it is easy to capture too much data.  One way to limit the
>  data captured is to use 'snapshot' mode which is explained further below.
> diff --git a/tools/perf/scripts/python/intel-pt-events.py b/tools/perf/scripts/python/intel-pt-events.py
> index 6be7fd8fd615..08862a2582f4 100644
> --- a/tools/perf/scripts/python/intel-pt-events.py
> +++ b/tools/perf/scripts/python/intel-pt-events.py
> @@ -13,10 +13,12 @@
>  
>  from __future__ import print_function
>  
> +import io
>  import os
>  import sys
>  import struct
>  import argparse
> +import contextlib
>  
>  from libxed import LibXED
>  from ctypes import create_string_buffer, addressof
> @@ -39,6 +41,11 @@ glb_src			= False
>  glb_source_file_name	= None
>  glb_line_number		= None
>  glb_dso			= None
> +glb_stash_dict		= {}
> +glb_output		= None
> +glb_output_pos		= 0
> +glb_cpu			= -1
> +glb_time		= 0
>  
>  def get_optional_null(perf_dict, field):
>  	if field in perf_dict:
> @@ -70,6 +77,7 @@ def trace_begin():
>  	ap.add_argument("--insn-trace", action='store_true')
>  	ap.add_argument("--src-trace", action='store_true')
>  	ap.add_argument("--all-switch-events", action='store_true')
> +	ap.add_argument("--interleave", type=int, nargs='?', const=4, default=0)
>  	global glb_args
>  	global glb_insn
>  	global glb_src
> @@ -94,11 +102,39 @@ def trace_begin():
>  	perf_set_itrace_options(perf_script_context, itrace)
>  
>  def trace_end():
> +	if glb_args.interleave:
> +		flush_stashed_output()
>  	print("End")
>  
>  def trace_unhandled(event_name, context, event_fields_dict):
>  		print(' '.join(['%s=%s'%(k,str(v))for k,v in sorted(event_fields_dict.items())]))
>  
> +def stash_output():
> +	global glb_stash_dict
> +	global glb_output_pos
> +	output_str = glb_output.getvalue()[glb_output_pos:]
> +	n = len(output_str)
> +	if n:
> +		glb_output_pos += n
> +		if glb_cpu not in glb_stash_dict:
> +			glb_stash_dict[glb_cpu] = []
> +		glb_stash_dict[glb_cpu].append(output_str)
> +
> +def flush_stashed_output():
> +	global glb_stash_dict
> +	while glb_stash_dict:
> +		cpus = list(glb_stash_dict.keys())
> +		# Output at most glb_args.interleave output strings per cpu
> +		for cpu in cpus:
> +			items = glb_stash_dict[cpu]
> +			countdown = glb_args.interleave
> +			while len(items) and countdown:
> +				sys.stdout.write(items[0])
> +				del items[0]
> +				countdown -= 1
> +			if not items:
> +				del glb_stash_dict[cpu]
> +
>  def print_ptwrite(raw_buf):
>  	data = struct.unpack_from("<IQ", raw_buf)
>  	flags = data[0]
> @@ -375,15 +411,40 @@ def do_process_event(param_dict):
>  		print_common_start(comm, sample, name)
>  		print_common_ip(param_dict, sample, symbol, dso)
>  
> +def interleave_events(param_dict):
> +	global glb_cpu
> +	global glb_time
> +	global glb_output
> +	global glb_output_pos
> +
> +	sample  = param_dict["sample"]
> +	glb_cpu = sample["cpu"]
> +	ts      = sample["time"]
> +
> +	if glb_time != ts:
> +		glb_time = ts
> +		flush_stashed_output()
> +
> +	glb_output_pos = 0
> +	with contextlib.redirect_stdout(io.StringIO()) as glb_output:
> +		do_process_event(param_dict)
> +
> +	stash_output()
> +
>  def process_event(param_dict):
>  	try:
> -		do_process_event(param_dict)
> +		if glb_args.interleave:
> +			interleave_events(param_dict)
> +		else:
> +			do_process_event(param_dict)
>  	except broken_pipe_exception:
>  		# Stop python printing broken pipe errors and traceback
>  		sys.stdout = open(os.devnull, 'w')
>  		sys.exit(1)
>  
>  def auxtrace_error(typ, code, cpu, pid, tid, ip, ts, msg, cpumode, *x):
> +	if glb_args.interleave:
> +		flush_stashed_output()
>  	if len(x) >= 2 and x[0]:
>  		machine_pid = x[0]
>  		vcpu = x[1]
> @@ -403,6 +464,8 @@ def auxtrace_error(typ, code, cpu, pid, tid, ip, ts, msg, cpumode, *x):
>  		sys.exit(1)
>  
>  def context_switch(ts, cpu, pid, tid, np_pid, np_tid, machine_pid, out, out_preempt, *x):
> +	if glb_args.interleave:
> +		flush_stashed_output()
>  	if out:
>  		out_str = "Switch out "
>  	else:
> -- 
> 2.34.1
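
Side note for the archive: the per-sample capture in interleave_events()
works by redirecting stdout into an io.StringIO and stashing whatever text
was added since the last flush. A minimal standalone sketch of that
technique, with emit() as a hypothetical stand-in for do_process_event():

  import io
  import contextlib

  def emit():
      print("sample output")     # stand-in for do_process_event()

  buf = io.StringIO()
  with contextlib.redirect_stdout(buf):
      emit()                     # print() output lands in buf, not the tty

  captured = buf.getvalue()      # text to stash, keyed by the sample's cpu
  print(captured, end='')        # later, written out in interleaved order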

-- 

- Arnaldo

