linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller
@ 2014-08-14  6:01 Namhyung Kim
  2014-08-14  6:01 ` [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order Namhyung Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Namhyung Kim @ 2014-08-14  6:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim,
	Namhyung Kim, LKML, Jiri Olsa, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

Hello,

This patchset tries to improve the callchain behavior when caller
ordering (which is not default) and --children (which is default)
enabled.  I think that we might need to change default to caller
ordering when --children is enabled - but it needs more discussion of
course.

The "perf report -g" displays callchains callee-first order by default.
That means it can see the callers of the sample in a reverse order.
For example, "intel_idle" entry shows following callchain on my data.
    
      Children      Self  Command          Shared Object       Symbol
      --------  --------  ---------------  ------------------  ----------------
        40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
                  |
                   --- intel_idle
                       cpuidle_enter
                       cpuidle_wrap_enter
                       cpuidle_enter_tk
                       cpuidle_idle_call
                       cpu_idle
                      |
                      |--85.25%-- start_secondary
                      |
                       --14.75%-- rest_init
                                 start_kernel
                                 x86_64_start_reservations
                                 x86_64_start_kernel
    
So "intel_idle" was called by "cpuidle_enter", and in turn, it's
called by "cpuidle_wrap_enter" and so on.
    
When with -g "caller", it shows callchains in a reversed order -
"caller-first".  I know it's sometimes useful but never used it
by myself though.
    
        40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
                        |
                        |--85.25%-- start_secondary
                        |          cpu_idle
                        |          cpuidle_idle_call
                        |          cpuidle_enter_tk
                        |          cpuidle_wrap_enter
                        |          cpuidle_enter
                        |          intel_idle
                        |
                         --14.75%-- x86_64_start_kernel
                                   x86_64_start_reservations
                                   start_kernel
                                   rest_init
                                   cpu_idle
                                   cpuidle_idle_call
                                   cpuidle_enter_tk
                                   cpuidle_wrap_enter
                                   cpuidle_enter
                                   intel_idle
    
However, with --children feature added, it now can show all *callees*
of the entry and I think it could be very useful.  For example,
"start_kernel" entry now can display it calls rest_init and in turn
cpu_idle and then cpuidle_idle_call (95.72%).
    
         6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
                       |
                        --- start_kernel
                            rest_init
                            cpu_idle
                           |
                           |--97.52%-- cpuidle_idle_call
                           |          cpuidle_enter_tk
                           |          |
                           |          |--99.91%-- cpuidle_wrap_enter
                           |          |          cpuidle_enter
                           |          |          intel_idle
                           |           --0.09%-- [...]
                            --2.48%-- [...]
    
Note that start_kernel has no self overhead - meaning that it never
get sampled by itself but constructs such a nice callgraph.  But,
sadly, if an entry has self overhead, callchain will get confused with
generated callchain (like above) and self callchains (which reversed
order) like the eariler example.
    
To be consistent with other entries, I'd like to make it just to show
a single entry - itself - like below since it doesn't have callees
(children) at all.  But still use the whole callchain to construct
children entries (like the start_kernel) as usual.
    
        40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
                        |
                        --- intel_idle
    
Note that this is just for --children is enabled.  Without that, it
will do the original behavior.

Okay, here is a real case other that pure self (intel_idle) and pure
generated (start_kernel) entries.  Without this patch the "cpu_idle"
entry in the above data shows callchain info like following.  Note
that it has 0.15% of self overhead.

  $ perf report --children --g caller
  ...
  # Children      Self  Command          Shared Object       Symbol           
  # ........  ........  ...............  ..................  .................
  #
      53.42%     0.15%  swapper          [kernel.kallsyms]   [k] cpu_idle
                    |--99.72%-- cpu_idle
                    |          |          
                    |          |--84.44%-- cpuidle_idle_call
                    |          |          |          
                    |          |          |--92.67%-- cpuidle_enter_tk
                    |          |          |
                    |          |          /* snip */
                    |          |          
                    |          |--7.83%-- tick_nohz_idle_exit
                    |          |          |          
                    |          |          |--77.61%-- hrtimer_cancel
                    |          |          |          hrtimer_try_to_cancel
                    |          |          |          __remove_hrtimer
                    |          |          |          timerqueue_del
                    |          |          |          rb_erase
                    |          |          |          
                    |          |           --22.39%-- account_idle_ticks
                    |          |                     irqtime_account_process_tick.isra.2
                    |          |          
                    |          |--7.07%-- schedule
                    |          |          __schedule
                    |          |          perf_event_task_sched_out
                    |          |          
                    |           --0.65%-- cpuidle_get_cpu_driver
                    |          
                     --0.28%-- x86_64_start_kernel         /* <==== XXX */
                               x86_64_start_reservations
                               start_kernel
                               rest_init
                               cpu_idle

As you can see, mostly it just calls other functions - cpuidle_idle_call,
tick_nohz_idle_exit schedule or cpuidle_get_cpu_driver.  But 0.28% of it
shows as it it called x86_64_start_kernel which is not correct.  It just
came from the reversed self callchain info.

With this patchset applied, the incorrect 0.28% is disappeared and it
moves under cpu_idle itself (with [...]).

      53.42%     0.15%  swapper          [kernel.kallsyms]   [k] cpu_idle
                    |
                    --- cpu_idle
                       |          
                       |--84.20%-- cpuidle_idle_call
                       |          |          
                       |          |--92.67%-- cpuidle_enter_tk
                       |          |          |          
                       |          |          /* snip */
                       |
                       |--7.81%-- tick_nohz_idle_exit
                       |          |          
                       |          |--77.61%-- hrtimer_cancel
                       |          |          hrtimer_try_to_cancel
                       |          |          __remove_hrtimer
                       |          |          timerqueue_del
                       |          |          rb_erase
                       |          |          
                       |           --22.39%-- account_idle_ticks
                       |                     irqtime_account_process_tick.isra.2
                       |          
                       |--7.05%-- schedule
                       |          __schedule
                       |          perf_event_task_sched_out
                       |          
                       |--0.65%-- cpuidle_get_cpu_driver
                        --0.28%-- [...]   /*  <=====  this is the self overhead */


You can get this from 'perf/caller-chain-v1' branch on my tree

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcomed, thanks
Namhyung


Namhyung Kim (3):
  perf report: Relax -g option parsing not to limit the option order
  perf tools: Put callers above callee when callchain order is caller
  perf callchain: Prune misleading callchains for self entries

 tools/perf/ui/hist.c        |  8 ++--
 tools/perf/util/callchain.c | 95 ++++++++++++++++++---------------------------
 tools/perf/util/hist.c      | 23 ++++++++++-
 3 files changed, 65 insertions(+), 61 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order
  2014-08-14  6:01 [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Namhyung Kim
@ 2014-08-14  6:01 ` Namhyung Kim
  2014-08-18  8:22   ` [tip:perf/core] " tip-bot for Namhyung Kim
  2014-08-14  6:01 ` [PATCH 2/3] perf tools: Put callers above callee when callchain order is caller Namhyung Kim
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2014-08-14  6:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim,
	Namhyung Kim, LKML, Jiri Olsa, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

Current perf report -g/--call-graph option parser requires for option
argument having following order:

  type,min_percent[,print_limit],order,key

But sometimes it's annoying to type all even if one just wants to
change the "order" or "key" setting.

This patch fixes it to remove the ordering restriction so that one can
use just "-g caller", for instance.  The only remaining restriction is
that the "print_limit" always comes after the "min_percent".

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/callchain.c | 95 ++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 57 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 437ee09727e6..08f0fbf5527c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -28,74 +28,55 @@ __thread struct callchain_cursor callchain_cursor;
 int
 parse_callchain_report_opt(const char *arg)
 {
-	char *tok, *tok2;
+	char *tok;
 	char *endptr;
+	bool minpcnt_set = false;
 
 	symbol_conf.use_callchain = true;
 
 	if (!arg)
 		return 0;
 
-	tok = strtok((char *)arg, ",");
-	if (!tok)
-		return -1;
-
-	/* get the output mode */
-	if (!strncmp(tok, "graph", strlen(arg))) {
-		callchain_param.mode = CHAIN_GRAPH_ABS;
-
-	} else if (!strncmp(tok, "flat", strlen(arg))) {
-		callchain_param.mode = CHAIN_FLAT;
-	} else if (!strncmp(tok, "fractal", strlen(arg))) {
-		callchain_param.mode = CHAIN_GRAPH_REL;
-	} else if (!strncmp(tok, "none", strlen(arg))) {
-		callchain_param.mode = CHAIN_NONE;
-		symbol_conf.use_callchain = false;
-		return 0;
-	} else {
-		return -1;
-	}
-
-	/* get the min percentage */
-	tok = strtok(NULL, ",");
-	if (!tok)
-		goto setup;
-
-	callchain_param.min_percent = strtod(tok, &endptr);
-	if (tok == endptr)
-		return -1;
+	while ((tok = strtok((char *)arg, ",")) != NULL) {
+		if (!strncmp(tok, "none", strlen(tok))) {
+			callchain_param.mode = CHAIN_NONE;
+			symbol_conf.use_callchain = false;
+			return 0;
+		}
 
-	/* get the print limit */
-	tok2 = strtok(NULL, ",");
-	if (!tok2)
-		goto setup;
+		/* try to get the output mode */
+		if (!strncmp(tok, "graph", strlen(tok)))
+			callchain_param.mode = CHAIN_GRAPH_ABS;
+		else if (!strncmp(tok, "flat", strlen(tok)))
+			callchain_param.mode = CHAIN_FLAT;
+		else if (!strncmp(tok, "fractal", strlen(tok)))
+			callchain_param.mode = CHAIN_GRAPH_REL;
+		/* try to get the call chain order */
+		else if (!strncmp(tok, "caller", strlen(tok)))
+			callchain_param.order = ORDER_CALLER;
+		else if (!strncmp(tok, "callee", strlen(tok)))
+			callchain_param.order = ORDER_CALLEE;
+		/* try to get the sort key */
+		else if (!strncmp(tok, "function", strlen(tok)))
+			callchain_param.key = CCKEY_FUNCTION;
+		else if (!strncmp(tok, "address", strlen(tok)))
+			callchain_param.key = CCKEY_ADDRESS;
+		/* try to get the min percent */
+		else if (!minpcnt_set) {
+			callchain_param.min_percent = strtod(tok, &endptr);
+			if (tok == endptr)
+				return -1;
+			minpcnt_set = true;
+		} else {
+			/* try print limit at last */
+			callchain_param.print_limit = strtoul(tok, &endptr, 0);
+			if (tok == endptr)
+				return -1;
+		}
 
-	if (tok2[0] != 'c') {
-		callchain_param.print_limit = strtoul(tok2, &endptr, 0);
-		tok2 = strtok(NULL, ",");
-		if (!tok2)
-			goto setup;
+		arg = NULL;
 	}
 
-	/* get the call chain order */
-	if (!strncmp(tok2, "caller", strlen("caller")))
-		callchain_param.order = ORDER_CALLER;
-	else if (!strncmp(tok2, "callee", strlen("callee")))
-		callchain_param.order = ORDER_CALLEE;
-	else
-		return -1;
-
-	/* Get the sort key */
-	tok2 = strtok(NULL, ",");
-	if (!tok2)
-		goto setup;
-	if (!strncmp(tok2, "function", strlen("function")))
-		callchain_param.key = CCKEY_FUNCTION;
-	else if (!strncmp(tok2, "address", strlen("address")))
-		callchain_param.key = CCKEY_ADDRESS;
-	else
-		return -1;
-setup:
 	if (callchain_register_param(&callchain_param) < 0) {
 		pr_err("Can't register callchain params\n");
 		return -1;
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] perf tools: Put callers above callee when callchain order is caller
  2014-08-14  6:01 [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Namhyung Kim
  2014-08-14  6:01 ` [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order Namhyung Kim
@ 2014-08-14  6:01 ` Namhyung Kim
  2014-08-14  6:01 ` [PATCH 3/3] perf callchain: Prune misleading callchains for self entries Namhyung Kim
  2014-08-15 13:52 ` [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2014-08-14  6:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim,
	Namhyung Kim, LKML, Jiri Olsa, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

The perf report/top shows callers above their callees, but if user
gives a callchain param to caller it's inverted.  Fix it.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/ui/hist.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/hist.c b/tools/perf/ui/hist.c
index b5fa7019d2e2..eacbb561af78 100644
--- a/tools/perf/ui/hist.c
+++ b/tools/perf/ui/hist.c
@@ -197,14 +197,16 @@ static int __hpp__sort_acc(struct hist_entry *a, struct hist_entry *b,
 	s64 ret = 0;
 
 	if (symbol_conf.cumulate_callchain) {
-		/*
-		 * Put caller above callee when they have equal period.
-		 */
 		ret = field_cmp(get_field(a), get_field(b));
 		if (ret)
 			return ret;
 
+		/*
+		 * Put caller above callee when they have equal period.
+		 */
 		ret = b->callchain->max_depth - a->callchain->max_depth;
+		if (callchain_param.order == ORDER_CALLER)
+			ret = -ret;
 	}
 	return ret;
 }
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-14  6:01 [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Namhyung Kim
  2014-08-14  6:01 ` [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order Namhyung Kim
  2014-08-14  6:01 ` [PATCH 2/3] perf tools: Put callers above callee when callchain order is caller Namhyung Kim
@ 2014-08-14  6:01 ` Namhyung Kim
  2014-08-14 14:10   ` Jiri Olsa
  2014-08-15 13:52 ` [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Arnaldo Carvalho de Melo
  3 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2014-08-14  6:01 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim,
	Namhyung Kim, LKML, Jiri Olsa, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

The "perf report -g" displays callchains callee-first order.  That
means it can see the callers of the sample in a reverse order.  For
example, "intel_idle" entry shows following callchain on my data.

  Children      Self  Command          Shared Object       Symbol
  --------  --------  ---------------  ------------------  ----------------
    40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
              |
               --- intel_idle
                   cpuidle_enter
                   cpuidle_wrap_enter
                   cpuidle_enter_tk
                   cpuidle_idle_call
                   cpu_idle
                  |
                  |--85.25%-- start_secondary
                  |
                   --14.75%-- rest_init
                             start_kernel
                             x86_64_start_reservations
                             x86_64_start_kernel

So "intel_idle" was called by "cpuidle_enter", and in turn, it's
called by "cpuidle_wrap_enter" and so on.

When with -g "caller", it shows callchains in a reversed order -
"caller-first".  I know it's sometimes useful but never used it.

    40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
                    |
                    |--85.25%-- start_secondary
                    |          cpu_idle
                    |          cpuidle_idle_call
                    |          cpuidle_enter_tk
                    |          cpuidle_wrap_enter
                    |          cpuidle_enter
                    |          intel_idle
                    |
                     --14.75%-- x86_64_start_kernel
                               x86_64_start_reservations
                               start_kernel
                               rest_init
                               cpu_idle
                               cpuidle_idle_call
                               cpuidle_enter_tk
                               cpuidle_wrap_enter
                               cpuidle_enter
                               intel_idle

However, with --children feature added, it now can show all callees of
the entry.  For example, "start_kernel" entry now can display it calls
rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).

     6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
                   |
                    --- start_kernel
                        rest_init
                        cpu_idle
                       |
                       |--97.52%-- cpuidle_idle_call
                       |          cpuidle_enter_tk
                       |          |
                       |          |--99.91%-- cpuidle_wrap_enter
                       |          |          cpuidle_enter
                       |          |          intel_idle
                       |           --0.09%-- [...]
                        --2.48%-- [...]

Note that start_kernel has no self overhead - meaning that it never
get sampled by itself but constructs such a nice callgraph.  But,
sadly, if an entry has self overhead, callchain will get confused with
generated callchain (like above) and self callchains (which reversed
order) like the eariler example.

To be consistent with other entries, I'd like to make it just to show
a single entry - itself - like below since it doesn't have callees
(children) at all.  But still use the whole callchain to construct
children entries (like the start_kernel) as usual.

    40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
                    |
                    --- intel_idle

Note that this is just for --children is enabled.  Without that, it
will do the original behavior.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 30df6187ee02..67f249daf34e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -716,7 +716,28 @@ iter_add_single_cumulative_entry(struct hist_entry_iter *iter,
 	iter->he = he;
 	he_cache[iter->curr++] = he;
 
-	callchain_append(he->callchain, &callchain_cursor, sample->period);
+	if (callchain_param.order == ORDER_CALLER) {
+		struct callchain_cursor cursor;
+		int nr = callchain_cursor.nr;
+
+		/*
+		 * When --children with -g caller, it just adds noise to
+		 * self entries.  Just adding last node (self) is enough
+		 * and it'd privide a consistent view with other (cumulative)
+		 * entries.
+		 */
+		while (--nr) {
+			callchain_cursor_advance(&callchain_cursor);
+		}
+
+		BUG_ON(callchain_cursor.nr != callchain_cursor.pos + 1);
+
+		callchain_cursor_snapshot(&cursor, &callchain_cursor);
+
+		callchain_append(he->callchain, &cursor, sample->period);
+	} else {
+		callchain_append(he->callchain, &callchain_cursor, sample->period);
+	}
 
 	/*
 	 * We need to re-initialize the cursor since callchain_append()
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-14  6:01 ` [PATCH 3/3] perf callchain: Prune misleading callchains for self entries Namhyung Kim
@ 2014-08-14 14:10   ` Jiri Olsa
  2014-08-15  1:57     ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Olsa @ 2014-08-14 14:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:

SNIP

> However, with --children feature added, it now can show all callees of
> the entry.  For example, "start_kernel" entry now can display it calls
> rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> 
>      6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
>                    |
>                     --- start_kernel
>                         rest_init
>                         cpu_idle
>                        |
>                        |--97.52%-- cpuidle_idle_call
>                        |          cpuidle_enter_tk
>                        |          |
>                        |          |--99.91%-- cpuidle_wrap_enter
>                        |          |          cpuidle_enter
>                        |          |          intel_idle
>                        |           --0.09%-- [...]
>                         --2.48%-- [...]
> 
> Note that start_kernel has no self overhead - meaning that it never
> get sampled by itself but constructs such a nice callgraph.  But,
> sadly, if an entry has self overhead, callchain will get confused with
> generated callchain (like above) and self callchains (which reversed
> order) like the eariler example.
> 
> To be consistent with other entries, I'd like to make it just to show
> a single entry - itself - like below since it doesn't have callees
> (children) at all.  But still use the whole callchain to construct
> children entries (like the start_kernel) as usual.
> 
>     40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
>                     |
>                     --- intel_idle

I understand the consistency point, but I think we'd loose
usefull info by cutting this off

I guess I can run 'report -g callee' to find out who called intel_idle
instead.. but I would not need to if the callchain stays here

jirka

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-14 14:10   ` Jiri Olsa
@ 2014-08-15  1:57     ` Namhyung Kim
  2014-08-15 19:51       ` Jiri Olsa
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2014-08-15  1:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

Hi Jiri,

2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> > However, with --children feature added, it now can show all callees of
> > the entry.  For example, "start_kernel" entry now can display it calls
> > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > 
> >      6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
> >                    |
> >                     --- start_kernel
> >                         rest_init
> >                         cpu_idle
> >                        |
> >                        |--97.52%-- cpuidle_idle_call
> >                        |          cpuidle_enter_tk
> >                        |          |
> >                        |          |--99.91%-- cpuidle_wrap_enter
> >                        |          |          cpuidle_enter
> >                        |          |          intel_idle
> >                        |           --0.09%-- [...]
> >                         --2.48%-- [...]
> > 
> > Note that start_kernel has no self overhead - meaning that it never
> > get sampled by itself but constructs such a nice callgraph.  But,
> > sadly, if an entry has self overhead, callchain will get confused with
> > generated callchain (like above) and self callchains (which reversed
> > order) like the eariler example.
> > 
> > To be consistent with other entries, I'd like to make it just to show
> > a single entry - itself - like below since it doesn't have callees
> > (children) at all.  But still use the whole callchain to construct
> > children entries (like the start_kernel) as usual.
> > 
> >     40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
> >                     |
> >                     --- intel_idle
> 
> I understand the consistency point, but I think we'd loose
> usefull info by cutting this off
> 
> I guess I can run 'report -g callee' to find out who called intel_idle
> instead.. but I would not need to if the callchain stays here

Yeah, but current behavior intermixes caller-callchains and
callee-callchains together so adds confusion to users.  This is a
problem IMHO.

And with --children you can easily see the callers right above the entry
as they likely to have same or higher children overhead.

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller
  2014-08-14  6:01 [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Namhyung Kim
                   ` (2 preceding siblings ...)
  2014-08-14  6:01 ` [PATCH 3/3] perf callchain: Prune misleading callchains for self entries Namhyung Kim
@ 2014-08-15 13:52 ` Arnaldo Carvalho de Melo
  3 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-08-15 13:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Paul Mackerras, Namhyung Kim, LKML,
	Jiri Olsa, David Ahern, Andi Kleen, Frederic Weisbecker,
	Arun Sharma, Rodrigo Campos

Em Thu, Aug 14, 2014 at 03:01:37PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> This patchset tries to improve the callchain behavior when caller
> ordering (which is not default) and --children (which is default)
> enabled.  I think that we might need to change default to caller
> ordering when --children is enabled - but it needs more discussion of
> course.

Applied the first one, waiting for a conclusion to the discussion with
Jiri to process the other two.

- Arnaldo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-15  1:57     ` Namhyung Kim
@ 2014-08-15 19:51       ` Jiri Olsa
  2014-08-16  2:26         ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Olsa @ 2014-08-15 19:51 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> 2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> > On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> > 
> > SNIP
> > 
> > > However, with --children feature added, it now can show all callees of
> > > the entry.  For example, "start_kernel" entry now can display it calls
> > > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > > 
> > >      6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
> > >                    |
> > >                     --- start_kernel
> > >                         rest_init
> > >                         cpu_idle
> > >                        |
> > >                        |--97.52%-- cpuidle_idle_call
> > >                        |          cpuidle_enter_tk
> > >                        |          |
> > >                        |          |--99.91%-- cpuidle_wrap_enter
> > >                        |          |          cpuidle_enter
> > >                        |          |          intel_idle
> > >                        |           --0.09%-- [...]
> > >                         --2.48%-- [...]
> > > 
> > > Note that start_kernel has no self overhead - meaning that it never
> > > get sampled by itself but constructs such a nice callgraph.  But,
> > > sadly, if an entry has self overhead, callchain will get confused with
> > > generated callchain (like above) and self callchains (which reversed
> > > order) like the eariler example.
> > > 
> > > To be consistent with other entries, I'd like to make it just to show
> > > a single entry - itself - like below since it doesn't have callees
> > > (children) at all.  But still use the whole callchain to construct
> > > children entries (like the start_kernel) as usual.
> > > 
> > >     40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
> > >                     |
> > >                     --- intel_idle
> > 
> > I understand the consistency point, but I think we'd loose
> > usefull info by cutting this off
> > 
> > I guess I can run 'report -g callee' to find out who called intel_idle
> > instead.. but I would not need to if the callchain stays here
> 
> Yeah, but current behavior intermixes caller-callchains and
> callee-callchains together so adds confusion to users.  This is a
> problem IMHO.

hum, where is it callee/caller mixed?  with following example:

---
void c(void)
{
}

void b(void)
{
        c();
}

void a(void)
{
        b();
}

int main(void)
{
        while(1) {
                a();
                b();
                c();
        }
}
---

for 'c' the current code will display:

-   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
   - __libc_start_main                                                                                                                    ▒
      - 86.33% main                                                                                                                       ▒
           67.08% c                                                                                                                       ▒
         - 32.91% a                                                                                                                       ▒
              99.44% c                                                                                                                    ▒
            - 0.56% b                                                                                                                     ▒
                 c                                                                                                                        ▒
        13.67% c                                                                                                                          ▒

and with this patch:

-   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
     c                                                                                                                                    ▒


The 'c' callchain is still in caller order. IMO we should
keep whole callchain here.

jirka

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-15 19:51       ` Jiri Olsa
@ 2014-08-16  2:26         ` Namhyung Kim
  2014-08-18 11:31           ` Jiri Olsa
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2014-08-16  2:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

2014-08-15 (금), 21:51 +0200, Jiri Olsa:
> On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > 2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> > > On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> > > 
> > > SNIP
> > > 
> > > > However, with --children feature added, it now can show all callees of
> > > > the entry.  For example, "start_kernel" entry now can display it calls
> > > > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > > > 
> > > >      6.14%     0.00%  swapper          [kernel.kallsyms]   [k] start_kernel
> > > >                    |
> > > >                     --- start_kernel
> > > >                         rest_init
> > > >                         cpu_idle
> > > >                        |
> > > >                        |--97.52%-- cpuidle_idle_call
> > > >                        |          cpuidle_enter_tk
> > > >                        |          |
> > > >                        |          |--99.91%-- cpuidle_wrap_enter
> > > >                        |          |          cpuidle_enter
> > > >                        |          |          intel_idle
> > > >                        |           --0.09%-- [...]
> > > >                         --2.48%-- [...]
> > > > 
> > > > Note that start_kernel has no self overhead - meaning that it never
> > > > get sampled by itself but constructs such a nice callgraph.  But,
> > > > sadly, if an entry has self overhead, callchain will get confused with
> > > > generated callchain (like above) and self callchains (which reversed
> > > > order) like the eariler example.
> > > > 
> > > > To be consistent with other entries, I'd like to make it just to show
> > > > a single entry - itself - like below since it doesn't have callees
> > > > (children) at all.  But still use the whole callchain to construct
> > > > children entries (like the start_kernel) as usual.
> > > > 
> > > >     40.53%    40.53%  swapper          [kernel.kallsyms]   [k] intel_idle
> > > >                     |
> > > >                     --- intel_idle
> > > 
> > > I understand the consistency point, but I think we'd loose
> > > usefull info by cutting this off
> > > 
> > > I guess I can run 'report -g callee' to find out who called intel_idle
> > > instead.. but I would not need to if the callchain stays here
> > 
> > Yeah, but current behavior intermixes caller-callchains and
> > callee-callchains together so adds confusion to users.  This is a
> > problem IMHO.
> 
> hum, where is it callee/caller mixed?  with following example:
> 
> ---
> void c(void)
> {
> }
> 
> void b(void)
> {
>         c();
> }
> 
> void a(void)
> {
>         b();
> }
> 
> int main(void)
> {
>         while(1) {
>                 a();
>                 b();
>                 c();
>         }
> }
> ---
> 
> for 'c' the current code will display:
> 
> -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
>    - __libc_start_main                                                                                                                    ▒
>       - 86.33% main                                                                                                                       ▒
>            67.08% c                                                                                                                       ▒
>          - 32.91% a                                                                                                                       ▒
>               99.44% c                                                                                                                    ▒
>             - 0.56% b                                                                                                                     ▒
>                  c                                                                                                                        ▒
>         13.67% c                                                                                                                          ▒
> 
> and with this patch:
> 
> -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
>      c                                                                                                                                    ▒
> 
> 
> The 'c' callchain is still in caller order. IMO we should
> keep whole callchain here.

The problem is not in pure self entry (that has self overhead = children
overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
entries, please see last two examples in the description 0/3.

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip:perf/core] perf report: Relax -g option parsing not to limit the option order
  2014-08-14  6:01 ` [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order Namhyung Kim
@ 2014-08-18  8:22   ` tip-bot for Namhyung Kim
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Namhyung Kim @ 2014-08-18  8:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, paulus, hpa, mingo, andi, a.p.zijlstra,
	namhyung.kim, rodrigo, namhyung, jolsa, fweisbec, dsahern, tglx,
	asharma

Commit-ID:  e8232f1ad4682c34e7e774c212ccd0c15bb5aa26
Gitweb:     http://git.kernel.org/tip/e8232f1ad4682c34e7e774c212ccd0c15bb5aa26
Author:     Namhyung Kim <namhyung@kernel.org>
AuthorDate: Thu, 14 Aug 2014 15:01:38 +0900
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 15 Aug 2014 10:50:07 -0300

perf report: Relax -g option parsing not to limit the option order

Current perf report -g/--call-graph option parser requires for option
argument having following order:

  type,min_percent[,print_limit],order,key

But sometimes it's annoying to type all even if one just wants to change
the "order" or "key" setting.

This patch fixes it to remove the ordering restriction so that one can
use just "-g caller", for instance.  The only remaining restriction is
that the "print_limit" always comes after the "min_percent".

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rodrigo Campos <rodrigo@sdfg.com.ar>
Link: http://lkml.kernel.org/r/1407996100-6359-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.c | 95 ++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 57 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 437ee09..08f0fbf 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -28,74 +28,55 @@ __thread struct callchain_cursor callchain_cursor;
 int
 parse_callchain_report_opt(const char *arg)
 {
-	char *tok, *tok2;
+	char *tok;
 	char *endptr;
+	bool minpcnt_set = false;
 
 	symbol_conf.use_callchain = true;
 
 	if (!arg)
 		return 0;
 
-	tok = strtok((char *)arg, ",");
-	if (!tok)
-		return -1;
-
-	/* get the output mode */
-	if (!strncmp(tok, "graph", strlen(arg))) {
-		callchain_param.mode = CHAIN_GRAPH_ABS;
-
-	} else if (!strncmp(tok, "flat", strlen(arg))) {
-		callchain_param.mode = CHAIN_FLAT;
-	} else if (!strncmp(tok, "fractal", strlen(arg))) {
-		callchain_param.mode = CHAIN_GRAPH_REL;
-	} else if (!strncmp(tok, "none", strlen(arg))) {
-		callchain_param.mode = CHAIN_NONE;
-		symbol_conf.use_callchain = false;
-		return 0;
-	} else {
-		return -1;
-	}
-
-	/* get the min percentage */
-	tok = strtok(NULL, ",");
-	if (!tok)
-		goto setup;
-
-	callchain_param.min_percent = strtod(tok, &endptr);
-	if (tok == endptr)
-		return -1;
+	while ((tok = strtok((char *)arg, ",")) != NULL) {
+		if (!strncmp(tok, "none", strlen(tok))) {
+			callchain_param.mode = CHAIN_NONE;
+			symbol_conf.use_callchain = false;
+			return 0;
+		}
 
-	/* get the print limit */
-	tok2 = strtok(NULL, ",");
-	if (!tok2)
-		goto setup;
+		/* try to get the output mode */
+		if (!strncmp(tok, "graph", strlen(tok)))
+			callchain_param.mode = CHAIN_GRAPH_ABS;
+		else if (!strncmp(tok, "flat", strlen(tok)))
+			callchain_param.mode = CHAIN_FLAT;
+		else if (!strncmp(tok, "fractal", strlen(tok)))
+			callchain_param.mode = CHAIN_GRAPH_REL;
+		/* try to get the call chain order */
+		else if (!strncmp(tok, "caller", strlen(tok)))
+			callchain_param.order = ORDER_CALLER;
+		else if (!strncmp(tok, "callee", strlen(tok)))
+			callchain_param.order = ORDER_CALLEE;
+		/* try to get the sort key */
+		else if (!strncmp(tok, "function", strlen(tok)))
+			callchain_param.key = CCKEY_FUNCTION;
+		else if (!strncmp(tok, "address", strlen(tok)))
+			callchain_param.key = CCKEY_ADDRESS;
+		/* try to get the min percent */
+		else if (!minpcnt_set) {
+			callchain_param.min_percent = strtod(tok, &endptr);
+			if (tok == endptr)
+				return -1;
+			minpcnt_set = true;
+		} else {
+			/* try print limit at last */
+			callchain_param.print_limit = strtoul(tok, &endptr, 0);
+			if (tok == endptr)
+				return -1;
+		}
 
-	if (tok2[0] != 'c') {
-		callchain_param.print_limit = strtoul(tok2, &endptr, 0);
-		tok2 = strtok(NULL, ",");
-		if (!tok2)
-			goto setup;
+		arg = NULL;
 	}
 
-	/* get the call chain order */
-	if (!strncmp(tok2, "caller", strlen("caller")))
-		callchain_param.order = ORDER_CALLER;
-	else if (!strncmp(tok2, "callee", strlen("callee")))
-		callchain_param.order = ORDER_CALLEE;
-	else
-		return -1;
-
-	/* Get the sort key */
-	tok2 = strtok(NULL, ",");
-	if (!tok2)
-		goto setup;
-	if (!strncmp(tok2, "function", strlen("function")))
-		callchain_param.key = CCKEY_FUNCTION;
-	else if (!strncmp(tok2, "address", strlen("address")))
-		callchain_param.key = CCKEY_ADDRESS;
-	else
-		return -1;
-setup:
 	if (callchain_register_param(&callchain_param) < 0) {
 		pr_err("Can't register callchain params\n");
 		return -1;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-16  2:26         ` Namhyung Kim
@ 2014-08-18 11:31           ` Jiri Olsa
  2014-08-19  5:51             ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Olsa @ 2014-08-18 11:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:

SNIP

> > 
> > hum, where is it callee/caller mixed?  with following example:
> > 
> > ---
> > void c(void)
> > {
> > }
> > 
> > void b(void)
> > {
> >         c();
> > }
> > 
> > void a(void)
> > {
> >         b();
> > }
> > 
> > int main(void)
> > {
> >         while(1) {
> >                 a();
> >                 b();
> >                 c();
> >         }
> > }
> > ---
> > 
> > for 'c' the current code will display:
> > 
> > -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
> >    - __libc_start_main                                                                                                                    ▒
> >       - 86.33% main                                                                                                                       ▒
> >            67.08% c                                                                                                                       ▒
> >          - 32.91% a                                                                                                                       ▒
> >               99.44% c                                                                                                                    ▒
> >             - 0.56% b                                                                                                                     ▒
> >                  c                                                                                                                        ▒
> >         13.67% c                                                                                                                          ▒
> > 
> > and with this patch:
> > 
> > -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
> >      c                                                                                                                                    ▒
> > 
> > 
> > The 'c' callchain is still in caller order. IMO we should
> > keep whole callchain here.
> 
> The problem is not in pure self entry (that has self overhead = children
> overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
> entries, please see last two examples in the description 0/3.

right, but it still affects pure entries as well
anyway, let's see the mixed entry

for 'a' the current code will display:
-   31.97%    17.16%  t        t                  [.] a                                                        ▒
   - __libc_start_main                                                                                         ◆
        81.08% a                                                                                               ▒
      - 18.92% main                                                                                            ▒
           a                                                                                                   ▒
   - a                                                                                                         ▒
        85.05% c                                                                                               ▒
      - 14.91% b                                                                                               ▒
           100.00% c                                                                                           ▒

and with this patch:
-   31.97%    17.16%  t        t                  [.] a                                                        ▒
   - a                                                                                                         ▒
        85.05% c                                                                                               ▒
      - 14.91% b                                                                                               ▒
           100.00% c                                                                                           ▒

so we'll miss the 'self' callchain of 'a' symbol

if we want to avoid the confusion about 2 different callchains, how
about marking them with 'self' and 'children' tags, instead of removing
one of them, like:

for 'a' the current code will display:
-   31.97%    17.16%  t        t                  [.] a                                                        ▒
   - [self]
     __libc_start_main                                                                                         ◆
        81.08% a                                                                                               ▒
      - 18.92% main                                                                                            ▒
           a                                                                                                   ▒
   - [children]
     a                                                                                                         ▒
        85.05% c                                                                                               ▒
      - 14.91% b                                                                                               ▒
           100.00% c                                                                                           ▒


jirka

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-18 11:31           ` Jiri Olsa
@ 2014-08-19  5:51             ` Namhyung Kim
  2014-08-19  7:10               ` Jiri Olsa
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2014-08-19  5:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

Hi Jiri,

On Mon, 18 Aug 2014 13:31:47 +0200, Jiri Olsa wrote:
> On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:
>
> SNIP
>
>> > 
>> > hum, where is it callee/caller mixed?  with following example:
>> > 
>> > ---
>> > void c(void)
>> > {
>> > }
>> > 
>> > void b(void)
>> > {
>> >         c();
>> > }
>> > 
>> > void a(void)
>> > {
>> >         b();
>> > }
>> > 
>> > int main(void)
>> > {
>> >         while(1) {
>> >                 a();
>> >                 b();
>> >                 c();
>> >         }
>> > }
>> > ---
>> > 
>> > for 'c' the current code will display:
>> > 
>> > -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
>> >    - __libc_start_main                                                                                                                    ▒
>> >       - 86.33% main                                                                                                                       ▒
>> >            67.08% c                                                                                                                       ▒
>> >          - 32.91% a                                                                                                                       ▒
>> >               99.44% c                                                                                                                    ▒
>> >             - 0.56% b                                                                                                                     ▒
>> >                  c                                                                                                                        ▒
>> >         13.67% c                                                                                                                          ▒
>> > 
>> > and with this patch:
>> > 
>> > -   43.74%    43.74%  t        t                 [.] c                                                                                    ▒
>> >      c                                                                                                                                    ▒
>> > 
>> > 
>> > The 'c' callchain is still in caller order. IMO we should
>> > keep whole callchain here.
>> 
>> The problem is not in pure self entry (that has self overhead = children
>> overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
>> entries, please see last two examples in the description 0/3.
>
> right, but it still affects pure entries as well
> anyway, let's see the mixed entry
>
> for 'a' the current code will display:
> -   31.97%    17.16%  t        t                  [.] a                                                        ▒
>    - __libc_start_main                                                                                         ◆
>         81.08% a                                                                                               ▒
>       - 18.92% main                                                                                            ▒
>            a                                                                                                   ▒
>    - a                                                                                                         ▒
>         85.05% c                                                                                               ▒
>       - 14.91% b                                                                                               ▒
>            100.00% c                                                                                           ▒
>
> and with this patch:
> -   31.97%    17.16%  t        t                  [.] a                                                        ▒
>    - a                                                                                                         ▒
>         85.05% c                                                                                               ▒
>       - 14.91% b                                                                                               ▒
>            100.00% c                                                                                           ▒
>
> so we'll miss the 'self' callchain of 'a' symbol
>
> if we want to avoid the confusion about 2 different callchains, how
> about marking them with 'self' and 'children' tags, instead of removing
> one of them, like:
>
> for 'a' the current code will display:
> -   31.97%    17.16%  t        t                  [.] a                                                        ▒
>    - [self]
>      __libc_start_main                                                                                         ◆
>         81.08% a                                                                                               ▒
>       - 18.92% main                                                                                            ▒
>            a                                                                                                   ▒
>    - [children]
>      a                                                                                                         ▒
>         85.05% c                                                                                               ▒
>       - 14.91% b                                                                                               ▒
>            100.00% c                                                                                           ▒

I think it just adds complexity in implementation and confuses users.
Why do we keep those two callchains just for the caller mode?

I ran same code, recorded data and report it with current code.  It shows
who are the callers of the symbol:

  $ perf report   # same as passing -g callee
    Children      Self  Command  Shared Object      Symbols
  ---------------------------------------------------------------------------------
  +   99.98%     0.00%  abc2     libc-2.17.so       [.] __libc_start_main
  -   79.75%    11.29%  abc2     abc2               [.] main
       main
       __libc_start_main
  -   53.07%    52.99%  abc2     abc2               [.] c
     - c
        + 46.12% main
        + 30.99% a
          12.88% __libc_start_main
        + 10.01% b
  -   34.12%    28.75%  abc2     abc2               [.] b
     - b
        + 52.61% main
          25.93% __libc_start_main
        + 21.46% a
  -   30.56%     6.78%  abc2     abc2               [.] a
     - a
        + 85.14% main
          14.86% __libc_start_main
  +    0.17%     0.01%  abc2     [kernel.kallsyms]  [k] apic_timer_interrupt
  +    0.15%     0.00%  abc2     [kernel.kallsyms]  [k] smp_apic_timer_interrupt
  +    0.11%     0.00%  abc2     [kernel.kallsyms]  [k] hrtimer_interrupt


With -g caller option, the current code now shows callees and callers
altogether:

  $ perf report -g caller
    Children      Self  Command  Shared Object      Symbols
  ---------------------------------------------------------------------------------
  +   99.98%     0.00%  abc2     libc-2.17.so       [.] __libc_start_main
  -   79.75%    11.29%  abc2     abc2               [.] main
     - main
        + 38.01% a
        + 35.75% c
        + 26.22% b
       __libc_start_main
       main
  -   53.07%    52.99%  abc2     abc2               [.] c
     - __libc_start_main
        + 87.15% main
          12.85% c
  -   34.12%    28.75%  abc2     abc2               [.] b
     - __libc_start_main
        + 69.27% main
          30.73% b
     - b
          98.85% c
        + 1.15% apic_timer_interrupt
  -    30.56%     6.78%  abc2     abc2              [.] a
     - a
        + 69.15% c
        + 30.80% b
     - __libc_start_main
          66.86% a
        + 33.14% main
  +    0.17%     0.01%  abc2     [kernel.kallsyms]  [k] apic_timer_interrupt
  +    0.15%     0.00%  abc2     [kernel.kallsyms]  [k] smp_apic_timer_interrupt
  +    0.11%     0.00%  abc2     [kernel.kallsyms]  [k] hrtimer_interrupt


IMHO, it's harder to find out the right info.  Even if we might add
self/children tag to the callchains, I still think it'd be better just
to trim down irrelavant info from the output.

After applying this patch, it'll only show the callees of the symbol and
I think this is what most users expect to see:

  +   99.98%     0.00%  abc2     libc-2.17.so       [.] __libc_start_main
  -   79.75%    11.29%  abc2     abc2               [.] main
     - main
        + 38.01% a
        + 35.75% c
        + 26.22% b
  -   53.07%    52.99%  abc2     abc2               [.] c
     - c
        + 100.00% apic_timer_interrupt
  -   34.12%    28.75%  abc2     abc2               [.] b
     - b
          98.85% c
        + 1.15% apic_timer_interrupt
  -   30.56%     6.78%  abc2     abc2               [.] a
     - a
        + 69.15% c
        + 30.80% b
  +    0.17%     0.01%  abc2     [kernel.kallsyms]  [k] apic_timer_interrupt
  +    0.15%     0.00%  abc2     [kernel.kallsyms]  [k] smp_apic_timer_interrupt
  +    0.11%     0.00%  abc2     [kernel.kallsyms]  [k] hrtimer_interrupt


If you want to see the callers of 'c', you may go to above entries (main
and __libc_start_main in this case) and then follow the callchains.  If
it's not enough, you can re-run perf report with -g callee option.

Now It becomes a problem of dispalying whether callers or callees of
symbols rather than which order (callee-first or caller-first) it
prints.  So what current name (caller and callee) does is exactly the
opposite.  :-/

Btw I can see some other bugs in the callchain result too and will try
to fix it soon.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-19  5:51             ` Namhyung Kim
@ 2014-08-19  7:10               ` Jiri Olsa
  2014-08-19  8:20                 ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Jiri Olsa @ 2014-08-19  7:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
SNIP

> IMHO, it's harder to find out the right info.  Even if we might add
> self/children tag to the callchains, I still think it'd be better just
> to trim down irrelavant info from the output.
> 
> After applying this patch, it'll only show the callees of the symbol and
> I think this is what most users expect to see:
> 
>   +   99.98%     0.00%  abc2     libc-2.17.so       [.] __libc_start_main
>   -   79.75%    11.29%  abc2     abc2               [.] main
>      - main
>         + 38.01% a
>         + 35.75% c
>         + 26.22% b
>   -   53.07%    52.99%  abc2     abc2               [.] c
>      - c
>         + 100.00% apic_timer_interrupt
>   -   34.12%    28.75%  abc2     abc2               [.] b
>      - b
>           98.85% c
>         + 1.15% apic_timer_interrupt
>   -   30.56%     6.78%  abc2     abc2               [.] a
>      - a
>         + 69.15% c
>         + 30.80% b
>   +    0.17%     0.01%  abc2     [kernel.kallsyms]  [k] apic_timer_interrupt
>   +    0.15%     0.00%  abc2     [kernel.kallsyms]  [k] smp_apic_timer_interrupt
>   +    0.11%     0.00%  abc2     [kernel.kallsyms]  [k] hrtimer_interrupt
> 
> 
> If you want to see the callers of 'c', you may go to above entries (main
> and __libc_start_main in this case) and then follow the callchains.  If
> it's not enough, you can re-run perf report with -g callee option.

but only if you know who called them, like in this simple example

IMHO this part of callchain is not confusing enough to be cut,
for me it's usefull info for which I'd need to make more steps
to get it

anyway, this is from review point of view.. as a user I dont use
'-g caller' often enough to actually care ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries
  2014-08-19  7:10               ` Jiri Olsa
@ 2014-08-19  8:20                 ` Namhyung Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2014-08-19  8:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Paul Mackerras, Namhyung Kim, LKML, David Ahern, Andi Kleen,
	Frederic Weisbecker, Arun Sharma, Rodrigo Campos

On Tue, 19 Aug 2014 09:10:24 +0200, Jiri Olsa wrote:
> On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
> SNIP
>
>> IMHO, it's harder to find out the right info.  Even if we might add
>> self/children tag to the callchains, I still think it'd be better just
>> to trim down irrelavant info from the output.
>> 
>> After applying this patch, it'll only show the callees of the symbol and
>> I think this is what most users expect to see:
>> 
>>   +   99.98%     0.00%  abc2     libc-2.17.so       [.] __libc_start_main
>>   -   79.75%    11.29%  abc2     abc2               [.] main
>>      - main
>>         + 38.01% a
>>         + 35.75% c
>>         + 26.22% b
>>   -   53.07%    52.99%  abc2     abc2               [.] c
>>      - c
>>         + 100.00% apic_timer_interrupt
>>   -   34.12%    28.75%  abc2     abc2               [.] b
>>      - b
>>           98.85% c
>>         + 1.15% apic_timer_interrupt
>>   -   30.56%     6.78%  abc2     abc2               [.] a
>>      - a
>>         + 69.15% c
>>         + 30.80% b
>>   +    0.17%     0.01%  abc2     [kernel.kallsyms]  [k] apic_timer_interrupt
>>   +    0.15%     0.00%  abc2     [kernel.kallsyms]  [k] smp_apic_timer_interrupt
>>   +    0.11%     0.00%  abc2     [kernel.kallsyms]  [k] hrtimer_interrupt
>> 
>> 
>> If you want to see the callers of 'c', you may go to above entries (main
>> and __libc_start_main in this case) and then follow the callchains.  If
>> it's not enough, you can re-run perf report with -g callee option.
>
> but only if you know who called them, like in this simple example

Nope.  The --children option tends to collect callers above a callee.
This is not guaranteed though, but for most cases it'll work.  If it's
not case, one can use -g callee.


>
> IMHO this part of callchain is not confusing enough to be cut,
> for me it's usefull info for which I'd need to make more steps
> to get it

Well, that might be true for this simple case, but I'm not sure for
large and complex data.  I'd really like to keep the code and the output
simpler, more consistent and more intuitive.


>
> anyway, this is from review point of view.. as a user I dont use
> '-g caller' often enough to actually care ;-)

Yeah, me too.  But with --children option, it becomes very powerful and
can give another viewpoint of the performance analysis IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-08-19  8:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-14  6:01 [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Namhyung Kim
2014-08-14  6:01 ` [PATCH 1/3] perf report: Relax -g option parsing not to limit the option order Namhyung Kim
2014-08-18  8:22   ` [tip:perf/core] " tip-bot for Namhyung Kim
2014-08-14  6:01 ` [PATCH 2/3] perf tools: Put callers above callee when callchain order is caller Namhyung Kim
2014-08-14  6:01 ` [PATCH 3/3] perf callchain: Prune misleading callchains for self entries Namhyung Kim
2014-08-14 14:10   ` Jiri Olsa
2014-08-15  1:57     ` Namhyung Kim
2014-08-15 19:51       ` Jiri Olsa
2014-08-16  2:26         ` Namhyung Kim
2014-08-18 11:31           ` Jiri Olsa
2014-08-19  5:51             ` Namhyung Kim
2014-08-19  7:10               ` Jiri Olsa
2014-08-19  8:20                 ` Namhyung Kim
2014-08-15 13:52 ` [RFC/PATCHSET 0/3] perf tools: Callchain improvement with --children and -g caller Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).