linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] perfcounter: callchain symbol resolving and fixes
@ 2009-07-01  3:35 Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01  3:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo, Frederic Weisbecker

Hi,

This patchset provides the symbol resolving for callchains.
Example:

perf report -s sym -c

5.40%  [k] __d_lookup
             3.60%
                __d_lookup
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
                0x406fb1

             1.26%
                __d_lookup
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                vfs_fstatat
                vfs_lstat
                sys_newlstat
                system_call_fastpath
                __lxstat
[...]

Sorry about the third patch, it's a kind of all-in-one monolithic thing which
gathers various fixes. I should have granulate it...

Still in my plans:

- profit we have a tree to display a better graph hierarchy
- let the user provide a limit for hit percentage, depth, number of
  backtraces, etc...
- better output
- colors

And another one:

- remove the perfcounter internal nmi call frame (ie: every nmi frame)
  so that we drop this header from each callchain:

                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi

Frederic.

Frederic Weisbecker (3):
  perfcounter: Fix storage size allocation of callchain list
  perfcounter: Resolve symbols in callchains
  perfcounter: Various fixes for callchains

 tools/perf/builtin-report.c |  102 ++++++++++++++++++++++-----------
 tools/perf/util/callchain.c |  131 ++++++++++++++++++++++++++++++++-----------
 tools/perf/util/callchain.h |    5 +-
 3 files changed, 168 insertions(+), 70 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list
  2009-07-01  3:35 [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Frederic Weisbecker
@ 2009-07-01  3:35 ` Frederic Weisbecker
  2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 2/3] perfcounter: Resolve symbols in callchains Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01  3:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo, Frederic Weisbecker

Fix a confusion while giving the size of a callchain list during
its allocation. We are using the wrong structure size.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/util/callchain.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index ad3c285..bbf7813 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -74,7 +74,7 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
 	for (i = start; i < chain->nr; i++) {
 		struct callchain_list *call;
 
-		call = malloc(sizeof(*chain));
+		call = malloc(sizeof(*call));
 		if (!call) {
 			perror("not enough memory for the code path tree");
 			return;
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] perfcounter: Resolve symbols in callchains
  2009-07-01  3:35 [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list Frederic Weisbecker
@ 2009-07-01  3:35 ` Frederic Weisbecker
  2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 3/3] perfcounter: Various fixes for callchains Frederic Weisbecker
  2009-07-01  8:18 ` [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Ingo Molnar
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01  3:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo, Frederic Weisbecker

This patch resolves the names, when possible, of each ip present in
the callchains while using the -c option with perf report.

Example:

5.40%  [k] __d_lookup
             5.37%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat
                sys_access
                system_call_fastpath
                0x7fb609846f77

             0.01%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/builtin-report.c |  102 ++++++++++++++++++++++++++++---------------
 tools/perf/util/callchain.c |   33 ++++++++------
 tools/perf/util/callchain.h |    5 ++-
 3 files changed, 90 insertions(+), 50 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ed391db..8f17dc7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -784,8 +784,15 @@ callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
 	ret += callchain__fprintf(fp, self->parent, total_samples);
 
 
-	list_for_each_entry(chain, &self->val, list)
-		ret += fprintf(fp, "                %p\n", (void *)chain->ip);
+	list_for_each_entry(chain, &self->val, list) {
+		if (chain->ip >= PERF_CONTEXT_MAX)
+			continue;
+		if (chain->sym)
+			ret += fprintf(fp, "                %s\n", chain->sym->name);
+		else
+			ret += fprintf(fp, "                %p\n",
+					(void *)chain->ip);
+	}
 
 	return ret;
 }
@@ -920,6 +927,55 @@ static int call__match(struct symbol *sym)
 	return 0;
 }
 
+static struct symbol **
+resolve_callchain(struct thread *thread, struct map *map,
+		    struct ip_callchain *chain, struct hist_entry *entry)
+{
+	int i;
+	struct symbol **syms;
+	u64 context = PERF_CONTEXT_MAX;
+
+	if (callchain) {
+		syms = calloc(chain->nr, sizeof(*syms));
+		if (!syms) {
+			fprintf(stderr, "Can't allocate memory for symbols\n");
+			exit(-1);
+		}
+	}
+
+	for (i = 0; i < chain->nr; i++) {
+		u64 ip = chain->ips[i];
+		struct dso *dso = NULL;
+		struct symbol *sym;
+
+		if (ip >= PERF_CONTEXT_MAX) {
+			context = ip;
+			continue;
+		}
+
+		switch (context) {
+		case PERF_CONTEXT_KERNEL:
+			dso = kernel_dso;
+			break;
+		default:
+			break;
+		}
+
+		sym = resolve_symbol(thread, NULL, &dso, &ip);
+
+		if (sym) {
+			if (sort__has_parent && call__match(sym) &&
+			    !entry->parent)
+				entry->parent = sym;
+			if (!callchain)
+				break;
+			syms[i] = sym;
+		}
+	}
+
+	return syms;
+}
+
 /*
  * collect histogram counts
  */
@@ -932,6 +988,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	struct rb_node **p = &hist.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
+	struct symbol **syms = NULL;
 	struct hist_entry entry = {
 		.thread	= thread,
 		.map	= map,
@@ -945,36 +1002,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	};
 	int cmp;
 
-	if (sort__has_parent && chain) {
-		u64 context = PERF_CONTEXT_MAX;
-		int i;
-
-		for (i = 0; i < chain->nr; i++) {
-			u64 ip = chain->ips[i];
-			struct dso *dso = NULL;
-			struct symbol *sym;
-
-			if (ip >= PERF_CONTEXT_MAX) {
-				context = ip;
-				continue;
-			}
-
-			switch (context) {
-			case PERF_CONTEXT_KERNEL:
-				dso = kernel_dso;
-				break;
-			default:
-				break;
-			}
-
-			sym = resolve_symbol(thread, NULL, &dso, &ip);
-
-			if (sym && call__match(sym)) {
-				entry.parent = sym;
-				break;
-			}
-		}
-	}
+	if ((sort__has_parent || callchain) && chain)
+		syms = resolve_callchain(thread, map, chain, &entry);
 
 	while (*p != NULL) {
 		parent = *p;
@@ -984,8 +1013,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 
 		if (!cmp) {
 			he->count += count;
-			if (callchain)
-				append_chain(&he->callchain, chain);
+			if (callchain) {
+				append_chain(&he->callchain, chain, syms);
+				free(syms);
+			}
 			return 0;
 		}
 
@@ -1001,7 +1032,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	*he = entry;
 	if (callchain) {
 		callchain_init(&he->callchain);
-		append_chain(&he->callchain, chain);
+		append_chain(&he->callchain, chain, syms);
+		free(syms);
 	}
 	rb_link_node(&he->rb_node, parent, p);
 	rb_insert_color(&he->rb_node, &hist);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index bbf7813..6568cb1 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -67,7 +67,8 @@ static struct callchain_node *create_child(struct callchain_node *parent)
 }
 
 static void
-fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
+	  struct symbol **syms)
 {
 	int i;
 
@@ -80,24 +81,26 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
 			return;
 		}
 		call->ip = chain->ips[i];
+		call->sym = syms[i];
 		list_add_tail(&call->list, &node->val);
 	}
 	node->val_nr = i - start;
 }
 
-static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain,
+		      struct symbol **syms)
 {
 	struct callchain_node *new;
 
 	new = create_child(parent);
-	fill_node(new, chain, parent->val_nr);
+	fill_node(new, chain, parent->val_nr, syms);
 
 	new->hit = 1;
 }
 
 static void
 split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		struct callchain_list *to_split, int idx)
+		struct callchain_list *to_split, int idx, struct symbol **syms)
 {
 	struct callchain_node *new;
 
@@ -109,21 +112,22 @@ split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
 	parent->val_nr = idx;
 
 	/* create the new one */
-	add_child(parent, chain);
+	add_child(parent, chain, syms);
 }
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
-		int start);
+	       int start, struct symbol **syms);
 
 static int
-__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain,
+			struct symbol **syms)
 {
 	struct callchain_node *rnode;
 
 	/* lookup in childrens */
 	list_for_each_entry(rnode, &root->children, brothers) {
-		int ret = __append_chain(rnode, chain, root->val_nr);
+		int ret = __append_chain(rnode, chain, root->val_nr, syms);
 		if (!ret)
 			return 0;
 	}
@@ -132,7 +136,7 @@ __append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
-		int start)
+	       int start, struct symbol **syms)
 {
 	struct callchain_list *cnode;
 	int i = start;
@@ -154,7 +158,7 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 
 	/* we match only a part of the node. Split it and add the new chain */
 	if (i < root->val_nr) {
-		split_add_child(root, chain, cnode, i);
+		split_add_child(root, chain, cnode, i, syms);
 		return 0;
 	}
 
@@ -164,11 +168,12 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		return 0;
 	}
 
-	return __append_chain_children(root, chain);
+	return __append_chain_children(root, chain, syms);
 }
 
-void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+void append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		  struct symbol **syms)
 {
-	if (__append_chain_children(root, chain) == -1)
-		add_child(root, chain);
+	if (__append_chain_children(root, chain, syms) == -1)
+		add_child(root, chain, syms);
 }
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fa1cd2f..c942daa 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -4,6 +4,7 @@
 #include "../perf.h"
 #include "list.h"
 #include "rbtree.h"
+#include "symbol.h"
 
 
 struct callchain_node {
@@ -18,6 +19,7 @@ struct callchain_node {
 
 struct callchain_list {
 	unsigned long		ip;
+	struct symbol		*sym;
 	struct list_head	list;
 };
 
@@ -28,6 +30,7 @@ static inline void callchain_init(struct callchain_node *node)
 	INIT_LIST_HEAD(&node->val);
 }
 
-void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		  struct symbol **syms);
 void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
 #endif
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] perfcounter: Various fixes for callchains
  2009-07-01  3:35 [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list Frederic Weisbecker
  2009-07-01  3:35 ` [PATCH 2/3] perfcounter: Resolve symbols in callchains Frederic Weisbecker
@ 2009-07-01  3:35 ` Frederic Weisbecker
  2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
  2009-07-01  8:18 ` [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Ingo Molnar
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01  3:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo, Frederic Weisbecker

The symbol resolving has of course revealed some bugs
in the callchain tree handling.
This patch fixes some of them, including:

- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in
  the existent children (was only done on the root)
- compare using symbols when possible so that we can match a function
  using any ip inside by referring to its start address.

The practical effects are

- remove double callchains
- fix upside down or any random order of callchains
- fix wrong paths
- fix bad hits and percentage accounts

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 tools/perf/util/callchain.c |  122 +++++++++++++++++++++++++++++++-----------
 1 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 6568cb1..440db12 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -4,6 +4,9 @@
  * Handle the callchains from the stream in an ad-hoc radix tree and then
  * sort them in an rbtree.
  *
+ * Using a radix for code path provides a fast retrieval and factorizes
+ * memory use. Also that lets us use the paths in a hierarchical graph view.
+ *
  */
 
 #include <stdlib.h>
@@ -14,7 +17,8 @@
 #include "callchain.h"
 
 
-static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+static void
+rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
 {
 	struct rb_node **p = &root->rb_node;
 	struct rb_node *parent = NULL;
@@ -49,7 +53,12 @@ void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
 		rb_insert_callchain(rb_root, node);
 }
 
-static struct callchain_node *create_child(struct callchain_node *parent)
+/*
+ * Create a child for a parent. If inherit_children, then the new child
+ * will become the new parent of it's parent children
+ */
+static struct callchain_node *
+create_child(struct callchain_node *parent, bool inherit_children)
 {
 	struct callchain_node *new;
 
@@ -61,14 +70,27 @@ static struct callchain_node *create_child(struct callchain_node *parent)
 	new->parent = parent;
 	INIT_LIST_HEAD(&new->children);
 	INIT_LIST_HEAD(&new->val);
+
+	if (inherit_children) {
+		struct callchain_node *next;
+
+		list_splice(&parent->children, &new->children);
+		INIT_LIST_HEAD(&parent->children);
+
+		list_for_each_entry(next, &new->children, brothers)
+			next->parent = new;
+	}
 	list_add_tail(&new->brothers, &parent->children);
 
 	return new;
 }
 
+/*
+ * Fill the node with callchain values
+ */
 static void
-fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
-	  struct symbol **syms)
+fill_node(struct callchain_node *node, struct ip_callchain *chain,
+	  int start, struct symbol **syms)
 {
 	int i;
 
@@ -84,54 +106,80 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
 		call->sym = syms[i];
 		list_add_tail(&call->list, &node->val);
 	}
-	node->val_nr = i - start;
+	node->val_nr = chain->nr - start;
+	if (!node->val_nr)
+		printf("Warning: empty node in callchain tree\n");
 }
 
-static void add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		      struct symbol **syms)
+static void
+add_child(struct callchain_node *parent, struct ip_callchain *chain,
+	  int start, struct symbol **syms)
 {
 	struct callchain_node *new;
 
-	new = create_child(parent);
-	fill_node(new, chain, parent->val_nr, syms);
+	new = create_child(parent, false);
+	fill_node(new, chain, start, syms);
 
 	new->hit = 1;
 }
 
+/*
+ * Split the parent in two parts (a new child is created) and
+ * give a part of its callchain to the created child.
+ * Then create another child to host the given callchain of new branch
+ */
 static void
 split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		struct callchain_list *to_split, int idx, struct symbol **syms)
+		struct callchain_list *to_split, int idx_parents, int idx_local,
+		struct symbol **syms)
 {
 	struct callchain_node *new;
+	struct list_head *old_tail;
+	int idx_total = idx_parents + idx_local;
 
 	/* split */
-	new = create_child(parent);
-	list_move_tail(&to_split->list, &new->val);
-	new->hit = parent->hit;
-	parent->hit = 0;
-	parent->val_nr = idx;
+	new = create_child(parent, true);
+
+	/* split the callchain and move a part to the new child */
+	old_tail = parent->val.prev;
+	list_del_range(&to_split->list, old_tail);
+	new->val.next = &to_split->list;
+	new->val.prev = old_tail;
+	to_split->list.prev = &new->val;
+	old_tail->next = &new->val;
 
-	/* create the new one */
-	add_child(parent, chain, syms);
+	/* split the hits */
+	new->hit = parent->hit;
+	new->val_nr = parent->val_nr - idx_local;
+	parent->val_nr = idx_local;
+
+	/* create a new child for the new branch if any */
+	if (idx_total < chain->nr) {
+		parent->hit = 0;
+		add_child(parent, chain, idx_total, syms);
+	} else {
+		parent->hit = 1;
+	}
 }
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 	       int start, struct symbol **syms);
 
-static int
+static void
 __append_chain_children(struct callchain_node *root, struct ip_callchain *chain,
-			struct symbol **syms)
+			struct symbol **syms, int start)
 {
 	struct callchain_node *rnode;
 
 	/* lookup in childrens */
 	list_for_each_entry(rnode, &root->children, brothers) {
-		int ret = __append_chain(rnode, chain, root->val_nr, syms);
+		int ret = __append_chain(rnode, chain, start, syms);
 		if (!ret)
-			return 0;
+			return;
 	}
-	return -1;
+	/* nothing in children, add to the current node */
+	add_child(root, chain, start, syms);
 }
 
 static int
@@ -142,14 +190,22 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 	int i = start;
 	bool found = false;
 
-	/* lookup in the current node */
+	/*
+	 * Lookup in the current node
+	 * If we have a symbol, then compare the start to match
+	 * anywhere inside a function.
+	 */
 	list_for_each_entry(cnode, &root->val, list) {
-		if (cnode->ip != chain->ips[i++])
+		if (i == chain->nr)
+			break;
+		if (cnode->sym && syms[i]) {
+			if (cnode->sym->start != syms[i]->start)
+				break;
+		} else if (cnode->ip != chain->ips[i])
 			break;
 		if (!found)
 			found = true;
-		if (i == chain->nr)
-			break;
+		i++;
 	}
 
 	/* matches not, relay on the parent */
@@ -157,23 +213,25 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		return -1;
 
 	/* we match only a part of the node. Split it and add the new chain */
-	if (i < root->val_nr) {
-		split_add_child(root, chain, cnode, i, syms);
+	if (i - start < root->val_nr) {
+		split_add_child(root, chain, cnode, start, i - start, syms);
 		return 0;
 	}
 
 	/* we match 100% of the path, increment the hit */
-	if (i == root->val_nr) {
+	if (i - start == root->val_nr && i == chain->nr) {
 		root->hit++;
 		return 0;
 	}
 
-	return __append_chain_children(root, chain, syms);
+	/* We match the node and still have a part remaining */
+	__append_chain_children(root, chain, syms, i);
+
+	return 0;
 }
 
 void append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		  struct symbol **syms)
 {
-	if (__append_chain_children(root, chain, syms) == -1)
-		add_child(root, chain, syms);
+	__append_chain_children(root, chain, syms, 0);
 }
-- 
1.6.2.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] perfcounter: callchain symbol resolving and fixes
  2009-07-01  3:35 [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2009-07-01  3:35 ` [PATCH 3/3] perfcounter: Various fixes for callchains Frederic Weisbecker
@ 2009-07-01  8:18 ` Ingo Molnar
  2009-07-01 16:33   ` Frederic Weisbecker
  3 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2009-07-01  8:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Hi,
> 
> This patchset provides the symbol resolving for callchains.
> Example:
> 
> perf report -s sym -c
> 
> 5.40%  [k] __d_lookup
>              3.60%
>                 __d_lookup
>                 perf_callchain
>                 perf_counter_overflow
>                 intel_pmu_handle_irq
>                 perf_counter_nmi_handler
>                 notifier_call_chain
>                 atomic_notifier_call_chain
>                 notify_die
>                 do_nmi
>                 nmi
>                 do_lookup
>                 __link_path_walk
>                 path_walk
>                 do_path_lookup
>                 user_path_at
>                 vfs_fstatat
>                 vfs_lstat
>                 sys_newlstat
>                 system_call_fastpath
>                 __lxstat
>                 0x406fb1

nice!

> Sorry about the third patch, it's a kind of all-in-one monolithic 
> thing which gathers various fixes. I should have granulate it...

No problem, it's good enough - it's all about the same topic.

> 
> Still in my plans:
> 
> - profit we have a tree to display a better graph hierarchy
> - let the user provide a limit for hit percentage, depth, number of
>   backtraces, etc...
> - better output
> - colors
> 
> And another one:
> 
> - remove the perfcounter internal nmi call frame (ie: every nmi frame)
>   so that we drop this header from each callchain:
> 
>                 perf_callchain
>                 perf_counter_overflow
>                 intel_pmu_handle_irq
>                 perf_counter_nmi_handler
>                 notifier_call_chain
>                 atomic_notifier_call_chain
>                 notify_die
>                 do_nmi
>                 nmi

Sounds good. I suspect this latter one is the most important one 
because right now the backtrace output screen real estate is 
dominated by the repetitive nmi entries, making it hard to interpret 
the result 'at a glance'.

I think we should skip those NMI entries right in the kernel - that 
will also make call-chain event records quite a bit smaller, by 
about 72 bytes per call-chain record.

We can do the skipping by using this backtrace-generator callback in 
arch/x86/kernel/cpu/perf_counter.c:

 static int backtrace_stack(void *data, char *name)
 {
         /* Process all stacks: */
         return 0;
 }

The 'name' parameter passed in signals the type of stack frame we 
are processing. If you look into arch/x86/kernel/dumpstack_64.c, it 
can be one of these strings:

        static char ids[][8] = {
                [DEBUG_STACK - 1] = "#DB",
                [NMI_STACK - 1] = "NMI",
                [DOUBLEFAULT_STACK - 1] = "#DF",
                [STACKFAULT_STACK - 1] = "#SS",
                [MCE_STACK - 1] = "#MC",

A quick check to see whether this concept works would be expose the 
ids array and do:

 static int PER_CPU(int, is_nmi_frame);

 static int backtrace_stack(void *data, char *name)
 {
	if (name == x86_stack_ids[NMI_STACK-1])
		per_cpu(is_nmi_frame, raw_processor_id()) = 1;
	else
		per_cpu(is_nmi_frame, raw_processor_id()) = 0;

        /* Process all stacks: */
        return 0;
 }

and to add something like this to backtrace_address():

	if (per_cpu(is_nmi_frame, raw_processor_id())
		return;

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:perfcounters/urgent] perf_counter tools: Fix storage size allocation of callchain list
  2009-07-01  3:35 ` [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list Frederic Weisbecker
@ 2009-07-01  8:46   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2009-07-01  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, anton, paulus, hpa, mingo, a.p.zijlstra,
	efault, fweisbec, tglx, mingo

Commit-ID:  9198aa77b69647d1d91207f8075763abe7dc0bf4
Gitweb:     http://git.kernel.org/tip/9198aa77b69647d1d91207f8075763abe7dc0bf4
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 1 Jul 2009 05:35:13 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 1 Jul 2009 09:58:23 +0200

perf_counter tools: Fix storage size allocation of callchain list

Fix a confusion while giving the size of a callchain list
during its allocation. We are using the wrong structure size.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/util/callchain.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index ad3c285..bbf7813 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -74,7 +74,7 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
 	for (i = start; i < chain->nr; i++) {
 		struct callchain_list *call;
 
-		call = malloc(sizeof(*chain));
+		call = malloc(sizeof(*call));
 		if (!call) {
 			perror("not enough memory for the code path tree");
 			return;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:perfcounters/urgent] perf_counter tools: Resolve symbols in callchains
  2009-07-01  3:35 ` [PATCH 2/3] perfcounter: Resolve symbols in callchains Frederic Weisbecker
@ 2009-07-01  8:46   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2009-07-01  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, anton, paulus, hpa, mingo, a.p.zijlstra,
	efault, fweisbec, tglx, mingo

Commit-ID:  4424961ad6621a02c6b4c9093e801002c1bb9f65
Gitweb:     http://git.kernel.org/tip/4424961ad6621a02c6b4c9093e801002c1bb9f65
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 1 Jul 2009 05:35:14 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 1 Jul 2009 09:58:26 +0200

perf_counter tools: Resolve symbols in callchains

This patch resolves the names, when possible, of each ip
present in the callchains while using the -c option with perf
report.

Example:

5.40%  [k] __d_lookup
             5.37%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat
                sys_access
                system_call_fastpath
                0x7fb609846f77

             0.01%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/builtin-report.c |  102 ++++++++++++++++++++++++++++---------------
 tools/perf/util/callchain.c |   33 ++++++++------
 tools/perf/util/callchain.h |    5 ++-
 3 files changed, 90 insertions(+), 50 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3f5d8ea..1977930 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -794,8 +794,15 @@ callchain__fprintf(FILE *fp, struct callchain_node *self, u64 total_samples)
 	ret += callchain__fprintf(fp, self->parent, total_samples);
 
 
-	list_for_each_entry(chain, &self->val, list)
-		ret += fprintf(fp, "                %p\n", (void *)chain->ip);
+	list_for_each_entry(chain, &self->val, list) {
+		if (chain->ip >= PERF_CONTEXT_MAX)
+			continue;
+		if (chain->sym)
+			ret += fprintf(fp, "                %s\n", chain->sym->name);
+		else
+			ret += fprintf(fp, "                %p\n",
+					(void *)chain->ip);
+	}
 
 	return ret;
 }
@@ -930,6 +937,55 @@ static int call__match(struct symbol *sym)
 	return 0;
 }
 
+static struct symbol **
+resolve_callchain(struct thread *thread, struct map *map,
+		    struct ip_callchain *chain, struct hist_entry *entry)
+{
+	int i;
+	struct symbol **syms;
+	u64 context = PERF_CONTEXT_MAX;
+
+	if (callchain) {
+		syms = calloc(chain->nr, sizeof(*syms));
+		if (!syms) {
+			fprintf(stderr, "Can't allocate memory for symbols\n");
+			exit(-1);
+		}
+	}
+
+	for (i = 0; i < chain->nr; i++) {
+		u64 ip = chain->ips[i];
+		struct dso *dso = NULL;
+		struct symbol *sym;
+
+		if (ip >= PERF_CONTEXT_MAX) {
+			context = ip;
+			continue;
+		}
+
+		switch (context) {
+		case PERF_CONTEXT_KERNEL:
+			dso = kernel_dso;
+			break;
+		default:
+			break;
+		}
+
+		sym = resolve_symbol(thread, NULL, &dso, &ip);
+
+		if (sym) {
+			if (sort__has_parent && call__match(sym) &&
+			    !entry->parent)
+				entry->parent = sym;
+			if (!callchain)
+				break;
+			syms[i] = sym;
+		}
+	}
+
+	return syms;
+}
+
 /*
  * collect histogram counts
  */
@@ -942,6 +998,7 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	struct rb_node **p = &hist.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
+	struct symbol **syms = NULL;
 	struct hist_entry entry = {
 		.thread	= thread,
 		.map	= map,
@@ -955,39 +1012,11 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	};
 	int cmp;
 
-	if (sort__has_parent && chain) {
-		u64 context = PERF_CONTEXT_MAX;
-		int i;
-
-		for (i = 0; i < chain->nr; i++) {
-			u64 ip = chain->ips[i];
-			struct dso *dso = NULL;
-			struct symbol *sym;
-
-			if (ip >= PERF_CONTEXT_MAX) {
-				context = ip;
-				continue;
-			}
-
-			switch (context) {
 			case PERF_CONTEXT_HV:
 				dso = hypervisor_dso;
 				break;
-			case PERF_CONTEXT_KERNEL:
-				dso = kernel_dso;
-				break;
-			default:
-				break;
-			}
-
-			sym = resolve_symbol(thread, NULL, &dso, &ip);
-
-			if (sym && call__match(sym)) {
-				entry.parent = sym;
-				break;
-			}
-		}
-	}
+	if ((sort__has_parent || callchain) && chain)
+		syms = resolve_callchain(thread, map, chain, &entry);
 
 	while (*p != NULL) {
 		parent = *p;
@@ -997,8 +1026,10 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 
 		if (!cmp) {
 			he->count += count;
-			if (callchain)
-				append_chain(&he->callchain, chain);
+			if (callchain) {
+				append_chain(&he->callchain, chain, syms);
+				free(syms);
+			}
 			return 0;
 		}
 
@@ -1014,7 +1045,8 @@ hist_entry__add(struct thread *thread, struct map *map, struct dso *dso,
 	*he = entry;
 	if (callchain) {
 		callchain_init(&he->callchain);
-		append_chain(&he->callchain, chain);
+		append_chain(&he->callchain, chain, syms);
+		free(syms);
 	}
 	rb_link_node(&he->rb_node, parent, p);
 	rb_insert_color(&he->rb_node, &hist);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index bbf7813..6568cb1 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -67,7 +67,8 @@ static struct callchain_node *create_child(struct callchain_node *parent)
 }
 
 static void
-fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
+fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
+	  struct symbol **syms)
 {
 	int i;
 
@@ -80,24 +81,26 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start)
 			return;
 		}
 		call->ip = chain->ips[i];
+		call->sym = syms[i];
 		list_add_tail(&call->list, &node->val);
 	}
 	node->val_nr = i - start;
 }
 
-static void add_child(struct callchain_node *parent, struct ip_callchain *chain)
+static void add_child(struct callchain_node *parent, struct ip_callchain *chain,
+		      struct symbol **syms)
 {
 	struct callchain_node *new;
 
 	new = create_child(parent);
-	fill_node(new, chain, parent->val_nr);
+	fill_node(new, chain, parent->val_nr, syms);
 
 	new->hit = 1;
 }
 
 static void
 split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		struct callchain_list *to_split, int idx)
+		struct callchain_list *to_split, int idx, struct symbol **syms)
 {
 	struct callchain_node *new;
 
@@ -109,21 +112,22 @@ split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
 	parent->val_nr = idx;
 
 	/* create the new one */
-	add_child(parent, chain);
+	add_child(parent, chain, syms);
 }
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
-		int start);
+	       int start, struct symbol **syms);
 
 static int
-__append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
+__append_chain_children(struct callchain_node *root, struct ip_callchain *chain,
+			struct symbol **syms)
 {
 	struct callchain_node *rnode;
 
 	/* lookup in childrens */
 	list_for_each_entry(rnode, &root->children, brothers) {
-		int ret = __append_chain(rnode, chain, root->val_nr);
+		int ret = __append_chain(rnode, chain, root->val_nr, syms);
 		if (!ret)
 			return 0;
 	}
@@ -132,7 +136,7 @@ __append_chain_children(struct callchain_node *root, struct ip_callchain *chain)
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
-		int start)
+	       int start, struct symbol **syms)
 {
 	struct callchain_list *cnode;
 	int i = start;
@@ -154,7 +158,7 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 
 	/* we match only a part of the node. Split it and add the new chain */
 	if (i < root->val_nr) {
-		split_add_child(root, chain, cnode, i);
+		split_add_child(root, chain, cnode, i, syms);
 		return 0;
 	}
 
@@ -164,11 +168,12 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		return 0;
 	}
 
-	return __append_chain_children(root, chain);
+	return __append_chain_children(root, chain, syms);
 }
 
-void append_chain(struct callchain_node *root, struct ip_callchain *chain)
+void append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		  struct symbol **syms)
 {
-	if (__append_chain_children(root, chain) == -1)
-		add_child(root, chain);
+	if (__append_chain_children(root, chain, syms) == -1)
+		add_child(root, chain, syms);
 }
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index fa1cd2f..c942daa 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -4,6 +4,7 @@
 #include "../perf.h"
 #include "list.h"
 #include "rbtree.h"
+#include "symbol.h"
 
 
 struct callchain_node {
@@ -18,6 +19,7 @@ struct callchain_node {
 
 struct callchain_list {
 	unsigned long		ip;
+	struct symbol		*sym;
 	struct list_head	list;
 };
 
@@ -28,6 +30,7 @@ static inline void callchain_init(struct callchain_node *node)
 	INIT_LIST_HEAD(&node->val);
 }
 
-void append_chain(struct callchain_node *root, struct ip_callchain *chain);
+void append_chain(struct callchain_node *root, struct ip_callchain *chain,
+		  struct symbol **syms);
 void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node);
 #endif

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [tip:perfcounters/urgent] perf_counter tools: Various fixes for callchains
  2009-07-01  3:35 ` [PATCH 3/3] perfcounter: Various fixes for callchains Frederic Weisbecker
@ 2009-07-01  8:46   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2009-07-01  8:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, anton, paulus, hpa, mingo, a.p.zijlstra,
	efault, fweisbec, tglx, mingo

Commit-ID:  deac911cbdcb124fa0cee47c588e0cb0400b23b7
Gitweb:     http://git.kernel.org/tip/deac911cbdcb124fa0cee47c588e0cb0400b23b7
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 1 Jul 2009 05:35:15 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 1 Jul 2009 09:58:52 +0200

perf_counter tools: Various fixes for callchains

The symbol resolving has of course revealed some bugs in the
callchain tree handling. This patch fixes some of them,
including:

- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in
  the existent children (was only done on the root)
- compare using symbols when possible so that we can match a function
  using any ip inside by referring to its start address.

The practical effects are:

- remove double callchains
- fix upside down or any random order of callchains
- fix wrong paths
- fix bad hits and percentage accounts

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


---
 tools/perf/util/callchain.c |  122 +++++++++++++++++++++++++++++++-----------
 1 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 6568cb1..440db12 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -4,6 +4,9 @@
  * Handle the callchains from the stream in an ad-hoc radix tree and then
  * sort them in an rbtree.
  *
+ * Using a radix for code path provides a fast retrieval and factorizes
+ * memory use. Also that lets us use the paths in a hierarchical graph view.
+ *
  */
 
 #include <stdlib.h>
@@ -14,7 +17,8 @@
 #include "callchain.h"
 
 
-static void rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
+static void
+rb_insert_callchain(struct rb_root *root, struct callchain_node *chain)
 {
 	struct rb_node **p = &root->rb_node;
 	struct rb_node *parent = NULL;
@@ -49,7 +53,12 @@ void sort_chain_to_rbtree(struct rb_root *rb_root, struct callchain_node *node)
 		rb_insert_callchain(rb_root, node);
 }
 
-static struct callchain_node *create_child(struct callchain_node *parent)
+/*
+ * Create a child for a parent. If inherit_children, then the new child
+ * will become the new parent of it's parent children
+ */
+static struct callchain_node *
+create_child(struct callchain_node *parent, bool inherit_children)
 {
 	struct callchain_node *new;
 
@@ -61,14 +70,27 @@ static struct callchain_node *create_child(struct callchain_node *parent)
 	new->parent = parent;
 	INIT_LIST_HEAD(&new->children);
 	INIT_LIST_HEAD(&new->val);
+
+	if (inherit_children) {
+		struct callchain_node *next;
+
+		list_splice(&parent->children, &new->children);
+		INIT_LIST_HEAD(&parent->children);
+
+		list_for_each_entry(next, &new->children, brothers)
+			next->parent = new;
+	}
 	list_add_tail(&new->brothers, &parent->children);
 
 	return new;
 }
 
+/*
+ * Fill the node with callchain values
+ */
 static void
-fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
-	  struct symbol **syms)
+fill_node(struct callchain_node *node, struct ip_callchain *chain,
+	  int start, struct symbol **syms)
 {
 	int i;
 
@@ -84,54 +106,80 @@ fill_node(struct callchain_node *node, struct ip_callchain *chain, int start,
 		call->sym = syms[i];
 		list_add_tail(&call->list, &node->val);
 	}
-	node->val_nr = i - start;
+	node->val_nr = chain->nr - start;
+	if (!node->val_nr)
+		printf("Warning: empty node in callchain tree\n");
 }
 
-static void add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		      struct symbol **syms)
+static void
+add_child(struct callchain_node *parent, struct ip_callchain *chain,
+	  int start, struct symbol **syms)
 {
 	struct callchain_node *new;
 
-	new = create_child(parent);
-	fill_node(new, chain, parent->val_nr, syms);
+	new = create_child(parent, false);
+	fill_node(new, chain, start, syms);
 
 	new->hit = 1;
 }
 
+/*
+ * Split the parent in two parts (a new child is created) and
+ * give a part of its callchain to the created child.
+ * Then create another child to host the given callchain of new branch
+ */
 static void
 split_add_child(struct callchain_node *parent, struct ip_callchain *chain,
-		struct callchain_list *to_split, int idx, struct symbol **syms)
+		struct callchain_list *to_split, int idx_parents, int idx_local,
+		struct symbol **syms)
 {
 	struct callchain_node *new;
+	struct list_head *old_tail;
+	int idx_total = idx_parents + idx_local;
 
 	/* split */
-	new = create_child(parent);
-	list_move_tail(&to_split->list, &new->val);
-	new->hit = parent->hit;
-	parent->hit = 0;
-	parent->val_nr = idx;
+	new = create_child(parent, true);
+
+	/* split the callchain and move a part to the new child */
+	old_tail = parent->val.prev;
+	list_del_range(&to_split->list, old_tail);
+	new->val.next = &to_split->list;
+	new->val.prev = old_tail;
+	to_split->list.prev = &new->val;
+	old_tail->next = &new->val;
 
-	/* create the new one */
-	add_child(parent, chain, syms);
+	/* split the hits */
+	new->hit = parent->hit;
+	new->val_nr = parent->val_nr - idx_local;
+	parent->val_nr = idx_local;
+
+	/* create a new child for the new branch if any */
+	if (idx_total < chain->nr) {
+		parent->hit = 0;
+		add_child(parent, chain, idx_total, syms);
+	} else {
+		parent->hit = 1;
+	}
 }
 
 static int
 __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 	       int start, struct symbol **syms);
 
-static int
+static void
 __append_chain_children(struct callchain_node *root, struct ip_callchain *chain,
-			struct symbol **syms)
+			struct symbol **syms, int start)
 {
 	struct callchain_node *rnode;
 
 	/* lookup in childrens */
 	list_for_each_entry(rnode, &root->children, brothers) {
-		int ret = __append_chain(rnode, chain, root->val_nr, syms);
+		int ret = __append_chain(rnode, chain, start, syms);
 		if (!ret)
-			return 0;
+			return;
 	}
-	return -1;
+	/* nothing in children, add to the current node */
+	add_child(root, chain, start, syms);
 }
 
 static int
@@ -142,14 +190,22 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 	int i = start;
 	bool found = false;
 
-	/* lookup in the current node */
+	/*
+	 * Lookup in the current node
+	 * If we have a symbol, then compare the start to match
+	 * anywhere inside a function.
+	 */
 	list_for_each_entry(cnode, &root->val, list) {
-		if (cnode->ip != chain->ips[i++])
+		if (i == chain->nr)
+			break;
+		if (cnode->sym && syms[i]) {
+			if (cnode->sym->start != syms[i]->start)
+				break;
+		} else if (cnode->ip != chain->ips[i])
 			break;
 		if (!found)
 			found = true;
-		if (i == chain->nr)
-			break;
+		i++;
 	}
 
 	/* matches not, relay on the parent */
@@ -157,23 +213,25 @@ __append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		return -1;
 
 	/* we match only a part of the node. Split it and add the new chain */
-	if (i < root->val_nr) {
-		split_add_child(root, chain, cnode, i, syms);
+	if (i - start < root->val_nr) {
+		split_add_child(root, chain, cnode, start, i - start, syms);
 		return 0;
 	}
 
 	/* we match 100% of the path, increment the hit */
-	if (i == root->val_nr) {
+	if (i - start == root->val_nr && i == chain->nr) {
 		root->hit++;
 		return 0;
 	}
 
-	return __append_chain_children(root, chain, syms);
+	/* We match the node and still have a part remaining */
+	__append_chain_children(root, chain, syms, i);
+
+	return 0;
 }
 
 void append_chain(struct callchain_node *root, struct ip_callchain *chain,
 		  struct symbol **syms)
 {
-	if (__append_chain_children(root, chain, syms) == -1)
-		add_child(root, chain, syms);
+	__append_chain_children(root, chain, syms, 0);
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] perfcounter: callchain symbol resolving and fixes
  2009-07-01  8:18 ` [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Ingo Molnar
@ 2009-07-01 16:33   ` Frederic Weisbecker
  2009-07-01 17:14     ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01 16:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo

On Wed, Jul 01, 2009 at 10:18:14AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 
> > Hi,
> > 
> > This patchset provides the symbol resolving for callchains.
> > Example:
> > 
> > perf report -s sym -c
> > 
> > 5.40%  [k] __d_lookup
> >              3.60%
> >                 __d_lookup
> >                 perf_callchain
> >                 perf_counter_overflow
> >                 intel_pmu_handle_irq
> >                 perf_counter_nmi_handler
> >                 notifier_call_chain
> >                 atomic_notifier_call_chain
> >                 notify_die
> >                 do_nmi
> >                 nmi
> >                 do_lookup
> >                 __link_path_walk
> >                 path_walk
> >                 do_path_lookup
> >                 user_path_at
> >                 vfs_fstatat
> >                 vfs_lstat
> >                 sys_newlstat
> >                 system_call_fastpath
> >                 __lxstat
> >                 0x406fb1
> 
> nice!
> 
> > Sorry about the third patch, it's a kind of all-in-one monolithic 
> > thing which gathers various fixes. I should have granulate it...
> 
> No problem, it's good enough - it's all about the same topic.
> 
> > 
> > Still in my plans:
> > 
> > - profit we have a tree to display a better graph hierarchy
> > - let the user provide a limit for hit percentage, depth, number of
> >   backtraces, etc...
> > - better output
> > - colors
> > 
> > And another one:
> > 
> > - remove the perfcounter internal nmi call frame (ie: every nmi frame)
> >   so that we drop this header from each callchain:
> > 
> >                 perf_callchain
> >                 perf_counter_overflow
> >                 intel_pmu_handle_irq
> >                 perf_counter_nmi_handler
> >                 notifier_call_chain
> >                 atomic_notifier_call_chain
> >                 notify_die
> >                 do_nmi
> >                 nmi
> 
> Sounds good. I suspect this latter one is the most important one 
> because right now the backtrace output screen real estate is 
> dominated by the repetitive nmi entries, making it hard to interpret 
> the result 'at a glance'.
> 
> I think we should skip those NMI entries right in the kernel - that 
> will also make call-chain event records quite a bit smaller, by 
> about 72 bytes per call-chain record.
> 
> We can do the skipping by using this backtrace-generator callback in 
> arch/x86/kernel/cpu/perf_counter.c:
> 
>  static int backtrace_stack(void *data, char *name)
>  {
>          /* Process all stacks: */
>          return 0;
>  }
> 
> The 'name' parameter passed in signals the type of stack frame we 
> are processing. If you look into arch/x86/kernel/dumpstack_64.c, it 
> can be one of these strings:
> 
>         static char ids[][8] = {
>                 [DEBUG_STACK - 1] = "#DB",
>                 [NMI_STACK - 1] = "NMI",
>                 [DOUBLEFAULT_STACK - 1] = "#DF",
>                 [STACKFAULT_STACK - 1] = "#SS",
>                 [MCE_STACK - 1] = "#MC",
> 
> A quick check to see whether this concept works would be expose the 
> ids array and do:
> 
>  static int PER_CPU(int, is_nmi_frame);
> 
>  static int backtrace_stack(void *data, char *name)
>  {
> 	if (name == x86_stack_ids[NMI_STACK-1])


IIRC, gcc manages to factorize the string table in the elf
format right?
So that a simple == should indeed work here.

Because if you look at dumpstack_64.c, the calls to ->stack()
use plain const string for some of them:

ops->stack(data, "IRQ")

But "NMI" is always passed by its real address in the ids so
that should work without problem here.

(I just feared about using strcmp is such a fastpath).



> 		per_cpu(is_nmi_frame, raw_processor_id()) = 1;
> 	else
> 		per_cpu(is_nmi_frame, raw_processor_id()) = 0;
> 
>         /* Process all stacks: */
>         return 0;
>  }
> 
> and to add something like this to backtrace_address():
> 
> 	if (per_cpu(is_nmi_frame, raw_processor_id())
> 		return;
> 
> 	Ingo


Heh, looks like I'll almost only have to copy-paste this mail :)

Another solution would be to handle an IGNORE return value
from dump_trace() instead of always terminate the trace when
->stack() < 0

Would it be useful for other kind of uses?
For now I just asssume ignoring a stack is not a known pattern
so I'll just implement your solution.

Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/3] perfcounter: callchain symbol resolving and fixes
  2009-07-01 16:33   ` Frederic Weisbecker
@ 2009-07-01 17:14     ` Frederic Weisbecker
  0 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2009-07-01 17:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Peter Zijlstra, Mike Galbraith, Paul Mackerras,
	Anton Blanchard, Arnaldo Carvalho de Melo

On Wed, Jul 01, 2009 at 06:33:25PM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 01, 2009 at 10:18:14AM +0200, Ingo Molnar wrote:
> > 
> > * Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > 
> > > Hi,
> > > 
> > > This patchset provides the symbol resolving for callchains.
> > > Example:
> > > 
> > > perf report -s sym -c
> > > 
> > > 5.40%  [k] __d_lookup
> > >              3.60%
> > >                 __d_lookup
> > >                 perf_callchain
> > >                 perf_counter_overflow
> > >                 intel_pmu_handle_irq
> > >                 perf_counter_nmi_handler
> > >                 notifier_call_chain
> > >                 atomic_notifier_call_chain
> > >                 notify_die
> > >                 do_nmi
> > >                 nmi
> > >                 do_lookup
> > >                 __link_path_walk
> > >                 path_walk
> > >                 do_path_lookup
> > >                 user_path_at
> > >                 vfs_fstatat
> > >                 vfs_lstat
> > >                 sys_newlstat
> > >                 system_call_fastpath
> > >                 __lxstat
> > >                 0x406fb1
> > 
> > nice!
> > 
> > > Sorry about the third patch, it's a kind of all-in-one monolithic 
> > > thing which gathers various fixes. I should have granulate it...
> > 
> > No problem, it's good enough - it's all about the same topic.
> > 
> > > 
> > > Still in my plans:
> > > 
> > > - profit we have a tree to display a better graph hierarchy
> > > - let the user provide a limit for hit percentage, depth, number of
> > >   backtraces, etc...
> > > - better output
> > > - colors
> > > 
> > > And another one:
> > > 
> > > - remove the perfcounter internal nmi call frame (ie: every nmi frame)
> > >   so that we drop this header from each callchain:
> > > 
> > >                 perf_callchain
> > >                 perf_counter_overflow
> > >                 intel_pmu_handle_irq
> > >                 perf_counter_nmi_handler
> > >                 notifier_call_chain
> > >                 atomic_notifier_call_chain
> > >                 notify_die
> > >                 do_nmi
> > >                 nmi
> > 
> > Sounds good. I suspect this latter one is the most important one 
> > because right now the backtrace output screen real estate is 
> > dominated by the repetitive nmi entries, making it hard to interpret 
> > the result 'at a glance'.
> > 
> > I think we should skip those NMI entries right in the kernel - that 
> > will also make call-chain event records quite a bit smaller, by 
> > about 72 bytes per call-chain record.
> > 
> > We can do the skipping by using this backtrace-generator callback in 
> > arch/x86/kernel/cpu/perf_counter.c:
> > 
> >  static int backtrace_stack(void *data, char *name)
> >  {
> >          /* Process all stacks: */
> >          return 0;
> >  }
> > 
> > The 'name' parameter passed in signals the type of stack frame we 
> > are processing. If you look into arch/x86/kernel/dumpstack_64.c, it 
> > can be one of these strings:
> > 
> >         static char ids[][8] = {
> >                 [DEBUG_STACK - 1] = "#DB",
> >                 [NMI_STACK - 1] = "NMI",
> >                 [DOUBLEFAULT_STACK - 1] = "#DF",
> >                 [STACKFAULT_STACK - 1] = "#SS",
> >                 [MCE_STACK - 1] = "#MC",
> > 
> > A quick check to see whether this concept works would be expose the 
> > ids array and do:
> > 
> >  static int PER_CPU(int, is_nmi_frame);
> > 
> >  static int backtrace_stack(void *data, char *name)
> >  {
> > 	if (name == x86_stack_ids[NMI_STACK-1])
> 
> 
> IIRC, gcc manages to factorize the string table in the elf
> format right?
> So that a simple == should indeed work here.


Actually, the strings mapped to stack ids are
not const pointers so they won't go the string table I guess.
Anyway "NMI" is never passed as a plain string so it should work
fine.



> 
> Because if you look at dumpstack_64.c, the calls to ->stack()
> use plain const string for some of them:
> 
> ops->stack(data, "IRQ")
> 
> But "NMI" is always passed by its real address in the ids so
> that should work without problem here.
> 
> (I just feared about using strcmp is such a fastpath).
> 
> 
> 
> > 		per_cpu(is_nmi_frame, raw_processor_id()) = 1;
> > 	else
> > 		per_cpu(is_nmi_frame, raw_processor_id()) = 0;
> > 
> >         /* Process all stacks: */
> >         return 0;
> >  }
> > 
> > and to add something like this to backtrace_address():
> > 
> > 	if (per_cpu(is_nmi_frame, raw_processor_id())
> > 		return;
> > 
> > 	Ingo
> 
> 
> Heh, looks like I'll almost only have to copy-paste this mail :)
> 
> Another solution would be to handle an IGNORE return value
> from dump_trace() instead of always terminate the trace when
> ->stack() < 0
> 
> Would it be useful for other kind of uses?
> For now I just asssume ignoring a stack is not a known pattern
> so I'll just implement your solution.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-07-01 17:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-01  3:35 [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Frederic Weisbecker
2009-07-01  3:35 ` [PATCH 1/3] perfcounter: Fix storage size allocation of callchain list Frederic Weisbecker
2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
2009-07-01  3:35 ` [PATCH 2/3] perfcounter: Resolve symbols in callchains Frederic Weisbecker
2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
2009-07-01  3:35 ` [PATCH 3/3] perfcounter: Various fixes for callchains Frederic Weisbecker
2009-07-01  8:46   ` [tip:perfcounters/urgent] perf_counter tools: " tip-bot for Frederic Weisbecker
2009-07-01  8:18 ` [PATCH 0/3] perfcounter: callchain symbol resolving and fixes Ingo Molnar
2009-07-01 16:33   ` Frederic Weisbecker
2009-07-01 17:14     ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).