From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Clark Williams <williams@redhat.com>,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Adrian Hunter <adrian.hunter@intel.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 02/37] perf thread-stack: Improve thread_stack__no_call_return()
Date: Mon, 25 Feb 2019 18:20:00 -0300
Message-ID: <20190225212035.24781-3-acme@kernel.org>
In-Reply-To: <20190225212035.24781-1-acme@kernel.org>
From: Adrian Hunter <adrian.hunter@intel.com>
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack, i.e. 'no call'. See the code comments for details.
The example below shows how retpolines are affected:
Example:
$ cat simple-retpoline.c
__attribute__((noinline)) int bar(void)
{
	return -1;
}

int foo(void)
{
	return bar() + 1;
}

__attribute__((indirect_branch("thunk"))) int main()
{
	int (*volatile fn)(void) = foo;

	fn();
	return fn();
}
$ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
$ objdump -d simple-retpoline
<SNIP>
0000000000001040 <main>:
1040: 48 83 ec 18 sub $0x18,%rsp
1044: 48 8d 05 25 01 00 00 lea 0x125(%rip),%rax # 1170 <foo>
104b: 48 89 44 24 08 mov %rax,0x8(%rsp)
1050: 48 8b 44 24 08 mov 0x8(%rsp),%rax
1055: e8 1f 01 00 00 callq 1179 <__x86_indirect_thunk_rax>
105a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
105f: 48 83 c4 18 add $0x18,%rsp
1063: e9 11 01 00 00 jmpq 1179 <__x86_indirect_thunk_rax>
<SNIP>
0000000000001160 <bar>:
1160: b8 ff ff ff ff mov $0xffffffff,%eax
1165: c3 retq
<SNIP>
0000000000001170 <foo>:
1170: e8 eb ff ff ff callq 1160 <bar>
1175: 83 c0 01 add $0x1,%eax
1178: c3 retq
0000000000001179 <__x86_indirect_thunk_rax>:
1179: e8 07 00 00 00 callq 1185 <__x86_indirect_thunk_rax+0xc>
117e: f3 90 pause
1180: 0f ae e8 lfence
1183: eb f9 jmp 117e <__x86_indirect_thunk_rax+0x5>
1185: 48 89 04 24 mov %rax,(%rsp)
1189: c3 retq
<SNIP>
$ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
$ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-01-08 14:03:37.851655 Creating database...
2019-01-08 14:03:37.863256 Writing records...
2019-01-08 14:03:38.069750 Adding indexes
2019-01-08 14:03:38.078799 Done
$ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
Before:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> bar
After:
main
-> __x86_indirect_thunk_rax
-> __x86_indirect_thunk_rax
-> foo
-> bar
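
The three cases the patch distinguishes can be modelled in a few lines of standalone C. This is only a minimal sketch, not the perf implementation: the toy_* names, the fixed-size array, and the empty string standing in for the root are assumptions made for illustration, and timestamps, call paths and kernel/user transitions are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_DEPTH 16

/* Hypothetical stand-in for one thread-stack entry. */
struct toy_entry {
	const char *sym;	/* symbol name of this frame */
	bool non_call;		/* entered via a 'return' used as a jump */
};

struct toy_stack {
	struct toy_entry e[TOY_MAX_DEPTH];
	int cnt;
};

void toy_push(struct toy_stack *ts, const char *sym, bool non_call)
{
	assert(ts->cnt < TOY_MAX_DEPTH);
	ts->e[ts->cnt++] = (struct toy_entry){ sym, non_call };
}

/* Handle a 'return' from 'from' to 'to' that matched no stack entry. */
void toy_no_call_return(struct toy_stack *ts, const char *from, const char *to)
{
	/* Assumption: an empty string represents the root (no symbol). */
	const char *parent = ts->cnt ? ts->e[ts->cnt - 1].sym : "";

	if (strcmp(parent, from) == 0) {
		/* Bottom of stack: the missing call predates the trace. */
		if (ts->cnt == 1)
			ts->cnt--;		/* pop the current symbol */
		if (ts->cnt == 0) {
			toy_push(ts, to, false);
			return;
		}
		/* Otherwise the 'return' acts as a jump, e.g. a retpoline. */
		toy_push(ts, to, true);
		return;
	}

	/* 'parent' has not returned yet: push 'to', push and pop 'from'. */
	toy_push(ts, to, false);
	toy_push(ts, from, false);
	ts->cnt--;
}

/* Replay the simple-retpoline example up to the call into bar(). */
void toy_retpoline_example(struct toy_stack *ts)
{
	ts->cnt = 0;
	toy_push(ts, "main", false);
	toy_push(ts, "__x86_indirect_thunk_rax", false); /* callq at 1055 */
	toy_push(ts, "__x86_indirect_thunk_rax", false); /* callq at 1179 */
	/* retq at 1189 lands in foo instead of returning to 117e */
	toy_no_call_return(ts, "__x86_indirect_thunk_rax", "foo");
	toy_push(ts, "bar", false);			  /* callq at 1170 */
}
```

Replaying the example leaves main, two thunk entries, foo (flagged non_call) and bar on the toy stack, which is the shape of the 'After' listing.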
Committer testing:
Choose "Reports", then "Context-Sensitive Call Graph", and keep
expanding:
Before:
simple-retpolin
PID:PID
_start
_start
__libc_start_main
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
bar
After:
Remove the "simple-retpoline.db" file, rerun the 'perf script' line to
regenerate the .db file, and run exported-sql-viewer.py again. The output
is the same all the way down to 'main'; from there, including 'main':
main
__x86_indirect_thunk_rax
__x86_indirect_thunk_rax
foo
bar
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/thread-stack.c | 49 +++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index f52c0f90915d..632c07a125ab 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -638,14 +638,57 @@ static int thread_stack__no_call_return(struct thread *thread,
 	else
 		parent = root;
 
-	/* This 'return' had no 'call', so push and pop top of stack */
-	cp = call_path__findnew(cpr, parent, fsym, ip, ks);
+	if (parent->sym == from_al->sym) {
+		/*
+		 * At the bottom of the stack, assume the missing 'call' was
+		 * before the trace started. So, pop the current symbol and push
+		 * the 'to' symbol.
+		 */
+		if (ts->cnt == 1) {
+			err = thread_stack__call_return(thread, ts, --ts->cnt,
+							tm, ref, false);
+			if (err)
+				return err;
+		}
+
+		if (!ts->cnt) {
+			cp = call_path__findnew(cpr, root, tsym, addr, ks);
+
+			return thread_stack__push_cp(ts, addr, tm, ref, cp,
+						     true, false);
+		}
+
+		/*
+		 * Otherwise assume the 'return' is being used as a jump (e.g.
+		 * retpoline) and just push the 'to' symbol.
+		 */
+		cp = call_path__findnew(cpr, parent, tsym, addr, ks);
+
+		err = thread_stack__push_cp(ts, 0, tm, ref, cp, true, false);
+		if (!err)
+			ts->stack[ts->cnt - 1].non_call = true;
+
+		return err;
+	}
+
+	/*
+	 * Assume 'parent' has not yet returned, so push 'to', and then push and
+	 * pop 'from'.
+	 */
+
+	cp = call_path__findnew(cpr, parent, tsym, addr, ks);
 
 	err = thread_stack__push_cp(ts, addr, tm, ref, cp, true, false);
 	if (err)
 		return err;
 
-	return thread_stack__pop_cp(thread, ts, addr, tm, ref, tsym);
+	cp = call_path__findnew(cpr, cp, fsym, ip, ks);
+
+	err = thread_stack__push_cp(ts, ip, tm, ref, cp, true, false);
+	if (err)
+		return err;
+
+	return thread_stack__call_return(thread, ts, --ts->cnt, tm, ref, false);
 }
 
 static int thread_stack__trace_begin(struct thread *thread,
--
2.20.1