[PATCH 0/5][RFC] [RFC] function profiler

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5][RFC] [RFC] function profiler
@ 2009-03-25  3:56 Steven Rostedt
  2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird

The following patches are RFC, and to discuss is this work would be
useful.

This patch series adds a function profiler hook to the function tracer
and function graph tracer. That is, it uses the mcount calls to track the
functions being called. Now with the function graph tracer it also
keeps track of the times the functions have been executing.

Note, this is similar to oprofile, but is not a sampling approach, but
a true recorder. Of course, by recording every function the overhead
compared to oprofile is higher. With the use of dynamic ftrace, you could
just trace certain functions, or just a certain module. The flexibility
is a bit greater with this profiling.

The last patch adds another feature to the function graph. It gives
the ability to only trace the amount of time inside a function and
not the amount spent in sub functions that are also recorded. That is,
if A calls B, you can decide if the time spent in A should include the
time spent in B. This is an option added to trace_options.

Again, this is all RFC, but if there is any interest in having this
feature, prehaps we should include it.

-- Steve

The following patches are in:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

    branch: rfc/tip/tracing/function-profiler

Steven Rostedt (5):
      tracing: add function profiler
      tracing: move function profiler data out of function struct
      tracing: adding function timings to function profiler
      tracing: make the function profiler per cpu
      function-graph: add option to calculate graph time or not

----
 include/linux/ftrace.h               |    3 +-
 kernel/trace/Kconfig                 |   15 +
 kernel/trace/ftrace.c                |  558 +++++++++++++++++++++++++++++++++-
 kernel/trace/trace.c                 |   15 +-
 kernel/trace/trace.h                 |    4 +-
 kernel/trace/trace_functions_graph.c |   25 ++-
 kernel/trace/trace_output.c          |   10 +
 kernel/trace/trace_output.h          |    2 +
 8 files changed, 608 insertions(+), 24 deletions(-)
-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/5][RFC] tracing: add function profiler
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
@ 2009-03-25  3:56 ` Steven Rostedt
  2009-03-25 10:02   ` Ingo Molnar
  2009-03-25 19:11   ` [PATCH 1/5][RFC] tracing: add function profiler Frederic Weisbecker
  2009-03-25  3:56 ` [PATCH 2/5][RFC] tracing: move function profiler data out of function struct Steven Rostedt
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

[-- Attachment #1: 0001-tracing-add-function-profiler.patch --]
[-- Type: text/plain, Size: 11700 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: new profiling feature

This patch adds a function profiler. In debugfs/tracing/ two new
files are created.

  function_profile_enabled  - to enable or disable profiling

  trace_stat/functions   - the profiled functions.

For example:

  echo 1 > /debugfs/tracing/function_profile_enabled
  ./hackbench 50
  echo 0 > /debugfs/tracing/function_profile_enabled

yields:

  cat /debugfs/tracing/trace_stat/functions

  Function                               Hit
  --------                               ---
  _spin_lock                        10106442
  _spin_unlock                      10097492
  kfree                              6013704
  _spin_unlock_irqrestore            4423941
  _spin_lock_irqsave                 4406825
  __phys_addr                        4181686
  __slab_free                        4038222
  dput                               4030130
  path_put                           4023387
  unroll_tree_refs                   4019532
[...]

The most hit functions are listed first. Functions that are not
hit are not listed.

This feature depends on and uses dynamic function tracing. When the
function profiling is disabled, no overhead occurs. But it still
takes up around 300KB to hold the data, thus it is not recomended
to keep it enabled for systems low on memory.

When a '1' is echoed into the function_profile_enabled file, the
counters for is function is reset back to zero. Thus you can see what
functions are hit most by different programs.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/ftrace.h |    4 +
 kernel/trace/Kconfig   |   19 +++
 kernel/trace/ftrace.c  |  313 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 334 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 015a3d2..0456c3a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -153,6 +153,10 @@ struct dyn_ftrace {
 		unsigned long		flags;
 		struct dyn_ftrace	*newlist;
 	};
+#ifdef CONFIG_FUNCTION_PROFILER
+	unsigned long			counter;
+	struct hlist_node		node;
+#endif
 	struct dyn_arch_ftrace		arch;
 };
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8a4d729..95e9ad5 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -105,6 +105,7 @@ config FUNCTION_GRAPH_TRACER
 	  This is done by setting the current return address on the current
 	  task structure into a stack of calls.
 
+
 config IRQSOFF_TRACER
 	bool "Interrupts-off Latency Tracer"
 	default n
@@ -376,6 +377,24 @@ config DYNAMIC_FTRACE
 	 were made. If so, it runs stop_machine (stops all CPUS)
 	 and modifies the code to jump over the call to ftrace.
 
+config FUNCTION_PROFILER
+	bool "Kernel function profiler"
+	depends on DYNAMIC_FTRACE
+	default n
+	help
+	 This option enables the kernel function profiler. When the dynamic
+	 function tracing is enabled, a counter is added into the function
+	 records used by the dynamic function tracer. A file is created in
+	 debugfs called function_profile_enabled which defaults to zero.
+	 When a 1 is echoed into this file profiling begins, and when a
+	 zero is entered, profiling stops. A file in the trace_stats
+	 directory called functions, that show the list of functions that
+	 have been hit and their counters.
+
+	 This takes up around 320K more memory.
+
+	 If in doubt, say N
+
 config FTRACE_MCOUNT_RECORD
 	def_bool y
 	depends on DYNAMIC_FTRACE
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7b8722b..11f364c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -34,6 +34,7 @@
 #include <asm/ftrace.h>
 
 #include "trace.h"
+#include "trace_stat.h"
 
 #define FTRACE_WARN_ON(cond)			\
 	do {					\
@@ -261,7 +262,6 @@ struct ftrace_func_probe {
 	struct rcu_head		rcu;
 };
 
-
 enum {
 	FTRACE_ENABLE_CALLS		= (1 << 0),
 	FTRACE_DISABLE_CALLS		= (1 << 1),
@@ -309,6 +309,307 @@ static struct dyn_ftrace *ftrace_free_records;
 		}				\
 	}
 
+#ifdef CONFIG_FUNCTION_PROFILER
+static struct hlist_head *ftrace_profile_hash;
+static int ftrace_profile_bits;
+static int ftrace_profile_enabled;
+static DEFINE_MUTEX(ftrace_profile_lock);
+
+static void *
+function_stat_next(void *v, int idx)
+{
+	struct dyn_ftrace *rec = v;
+	struct ftrace_page *pg;
+
+	pg = (struct ftrace_page *)((unsigned long)rec & PAGE_MASK);
+
+ again:
+	rec++;
+	if ((void *)rec >= (void *)&pg->records[pg->index]) {
+		pg = pg->next;
+		if (!pg)
+			return NULL;
+		rec = &pg->records[0];
+	}
+
+	if (rec->flags & FTRACE_FL_FREE ||
+	    rec->flags & FTRACE_FL_FAILED ||
+	    !(rec->flags & FTRACE_FL_CONVERTED) ||
+	    /* ignore non hit functions */
+	    !rec->counter)
+		goto again;
+
+	return rec;
+}
+
+static void *function_stat_start(struct tracer_stat *trace)
+{
+	return function_stat_next(&ftrace_pages_start->records[0], 0);
+}
+
+static int function_stat_cmp(void *p1, void *p2)
+{
+	struct dyn_ftrace *a = p1;
+	struct dyn_ftrace *b = p2;
+
+	if (a->counter < b->counter)
+		return -1;
+	if (a->counter > b->counter)
+		return 1;
+	else
+		return 0;
+}
+
+static int function_stat_headers(struct seq_file *m)
+{
+	seq_printf(m, "  Function                               Hit\n"
+		      "  --------                               ---\n");
+	return 0;
+}
+
+static int function_stat_show(struct seq_file *m, void *v)
+{
+	struct dyn_ftrace *rec = v;
+	char str[KSYM_SYMBOL_LEN];
+
+	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
+
+	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
+	return 0;
+}
+
+static struct tracer_stat function_stats = {
+	.name = "functions",
+	.stat_start = function_stat_start,
+	.stat_next = function_stat_next,
+	.stat_cmp = function_stat_cmp,
+	.stat_headers = function_stat_headers,
+	.stat_show = function_stat_show
+};
+
+static void ftrace_profile_init(int nr_funcs)
+{
+	unsigned long addr;
+	int order;
+	int size;
+
+	/*
+	 * We are profiling all functions, lets make it 1/4th of the
+	 * number of functions that are in core kernel. So we have to
+	 * iterate 4 times.
+	 */
+	order = (sizeof(struct hlist_head) * nr_funcs) / 4;
+	order = get_order(order);
+	size = 1 << (PAGE_SHIFT + order);
+
+	pr_info("Allocating %d KB for profiler hash\n", size >> 10);
+
+	addr = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+	if (!addr) {
+		pr_warning("Could not allocate function profiler hash\n");
+		return;
+	}
+
+	ftrace_profile_hash = (void *)addr;
+
+	/*
+	 * struct hlist_head should be a pointer of 4 or 8 bytes.
+	 * And a simple bit manipulation can be done, but if for
+	 * some reason struct hlist_head is not a mulitple of 2,
+	 * then we play it safe, and simply count. This function
+	 * is done once at boot up, so it is not that critical in
+	 * performance.
+	 */
+
+	size--;
+	size /= sizeof(struct hlist_head);
+
+	for (; size; size >>= 1)
+		ftrace_profile_bits++;
+
+	pr_info("Function profiler has %d hash buckets\n",
+		1 << ftrace_profile_bits);
+
+	return;
+}
+
+static ssize_t
+ftrace_profile_read(struct file *filp, char __user *ubuf,
+		     size_t cnt, loff_t *ppos)
+{
+	char buf[64];
+	int r;
+
+	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static void ftrace_profile_reset(void)
+{
+	struct dyn_ftrace *rec;
+	struct ftrace_page *pg;
+
+	do_for_each_ftrace_rec(pg, rec) {
+		rec->counter = 0;
+	} while_for_each_ftrace_rec();
+}
+
+static struct dyn_ftrace *ftrace_find_profiled_func(unsigned long ip)
+{
+	struct dyn_ftrace *rec;
+	struct hlist_head *hhd;
+	struct hlist_node *n;
+	unsigned long flags;
+	unsigned long key;
+
+	if (!ftrace_profile_hash)
+		return NULL;
+
+	key = hash_long(ip, ftrace_profile_bits);
+	hhd = &ftrace_profile_hash[key];
+
+	if (hlist_empty(hhd))
+		return NULL;
+
+	local_irq_save(flags);
+	hlist_for_each_entry_rcu(rec, n, hhd, node) {
+		if (rec->ip == ip)
+			goto out;
+	}
+	rec = NULL;
+ out:
+	local_irq_restore(flags);
+
+	return rec;
+}
+
+static void
+function_profile_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct dyn_ftrace *rec;
+	unsigned long flags;
+
+	if (!ftrace_profile_enabled)
+		return;
+
+	local_irq_save(flags);
+	rec = ftrace_find_profiled_func(ip);
+	if (!rec)
+		goto out;
+
+	rec->counter++;
+ out:
+	local_irq_restore(flags);
+}
+
+static struct ftrace_ops ftrace_profile_ops __read_mostly =
+{
+	.func = function_profile_call,
+};
+
+static ssize_t
+ftrace_profile_write(struct file *filp, const char __user *ubuf,
+		     size_t cnt, loff_t *ppos)
+{
+	unsigned long val;
+	char buf[64];
+	int ret;
+
+	if (!ftrace_profile_hash) {
+		pr_info("Can not enable hash due to earlier problems\n");
+		return -ENODEV;
+	}
+
+	if (cnt >= sizeof(buf))
+		return -EINVAL;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	ret = strict_strtoul(buf, 10, &val);
+	if (ret < 0)
+		return ret;
+
+	val = !!val;
+
+	mutex_lock(&ftrace_profile_lock);
+	if (ftrace_profile_enabled ^ val) {
+		if (val) {
+			ftrace_profile_reset();
+			register_ftrace_function(&ftrace_profile_ops);
+			ftrace_profile_enabled = 1;
+		} else {
+			ftrace_profile_enabled = 0;
+			unregister_ftrace_function(&ftrace_profile_ops);
+		}
+	}
+	mutex_unlock(&ftrace_profile_lock);
+
+	filp->f_pos += cnt;
+
+	return cnt;
+}
+
+static const struct file_operations ftrace_profile_fops = {
+	.open		= tracing_open_generic,
+	.read		= ftrace_profile_read,
+	.write		= ftrace_profile_write,
+};
+
+static void ftrace_profile_debugfs(struct dentry *d_tracer)
+{
+	struct dentry *entry;
+	int ret;
+
+	ret = register_stat_tracer(&function_stats);
+	if (ret) {
+		pr_warning("Warning: could not register "
+			   "function stats\n");
+		return;
+	}
+
+	entry = debugfs_create_file("function_profile_enabled", 0644,
+				    d_tracer, NULL, &ftrace_profile_fops);
+	if (!entry)
+		pr_warning("Could not create debugfs "
+			   "'function_profile_enabled' entry\n");
+}
+
+static void ftrace_add_profile(struct dyn_ftrace *rec)
+{
+	unsigned long key;
+
+	if (!ftrace_profile_hash)
+		return;
+
+	key = hash_long(rec->ip, ftrace_profile_bits);
+	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
+}
+
+static void ftrace_profile_release(struct dyn_ftrace *rec)
+{
+	mutex_lock(&ftrace_profile_lock);
+	hlist_del(&rec->node);
+	mutex_unlock(&ftrace_profile_lock);
+}
+
+#else /* CONFIG_FUNCTION_PROFILER */
+static void ftrace_profile_init(int nr_funcs)
+{
+}
+static void ftrace_add_profile(struct dyn_ftrace *rec)
+{
+}
+static void ftrace_profile_debugfs(struct dentry *d_tracer)
+{
+}
+static void ftrace_profile_release(struct dyn_ftrace *rec)
+{
+}
+#endif /* CONFIG_FUNCTION_PROFILER */
+
 #ifdef CONFIG_KPROBES
 
 static int frozen_record_count;
@@ -359,8 +660,10 @@ void ftrace_release(void *start, unsigned long size)
 	mutex_lock(&ftrace_lock);
 	do_for_each_ftrace_rec(pg, rec) {
 		if ((rec->ip >= s) && (rec->ip < e) &&
-		    !(rec->flags & FTRACE_FL_FREE))
+		    !(rec->flags & FTRACE_FL_FREE)) {
 			ftrace_free_rec(rec);
+			ftrace_profile_release(rec);
+		}
 	} while_for_each_ftrace_rec();
 	mutex_unlock(&ftrace_lock);
 }
@@ -414,6 +717,8 @@ ftrace_record_ip(unsigned long ip)
 	rec->newlist = ftrace_new_addrs;
 	ftrace_new_addrs = rec;
 
+	ftrace_add_profile(rec);
+
 	return rec;
 }
 
@@ -2157,6 +2462,8 @@ static __init int ftrace_init_dyn_debugfs(struct dentry *d_tracer)
 			   "'set_graph_function' entry\n");
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
+	ftrace_profile_debugfs(d_tracer);
+
 	return 0;
 }
 
@@ -2225,6 +2532,8 @@ void __init ftrace_init(void)
 	if (ret)
 		goto failed;
 
+	ftrace_profile_init(count);
+
 	last_ftrace_enabled = ftrace_enabled = 1;
 
 	ret = ftrace_convert_nops(NULL,
-- 
1.6.2

-- 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
  2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
@ 2009-03-25  3:56 ` Steven Rostedt
  2009-03-25 10:07   ` Ingo Molnar
  2009-03-25  3:56 ` [PATCH 3/5][RFC] tracing: adding function timings to function profiler Steven Rostedt
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

[-- Attachment #1: 0002-tracing-move-function-profiler-data-out-of-function.patch --]
[-- Type: text/plain, Size: 18390 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: reduce size of memory in function profiler

The function profiler originally introduces its counters into the
function records itself. There is 20 thousand different functions on
a normal system, and that is adding 20 thousand counters for profiling
event when not needed.

A normal run of the profiler yields only a couple of thousand functions
executed, depending on what is being profiled. This means we have around
18 thousand useless counters.

This patch rectifies this by moving the data out of the function
records used by dynamic ftrace. Data is preallocated to hold the functions
when the profiling begins. Checks are made during profiling to see if
more recorcds should be allocated, and they are allocated if it is safe
to do so.

This also removes the dependency from using dynamic ftrace, and also
removes the overhead by having it enabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/ftrace.h |    4 -
 kernel/trace/Kconfig   |   10 +-
 kernel/trace/ftrace.c  |  440 ++++++++++++++++++++++++++++--------------------
 3 files changed, 263 insertions(+), 191 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0456c3a..015a3d2 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -153,10 +153,6 @@ struct dyn_ftrace {
 		unsigned long		flags;
 		struct dyn_ftrace	*newlist;
 	};
-#ifdef CONFIG_FUNCTION_PROFILER
-	unsigned long			counter;
-	struct hlist_node		node;
-#endif
 	struct dyn_arch_ftrace		arch;
 };
 
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 95e9ad5..8a41360 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -379,20 +379,16 @@ config DYNAMIC_FTRACE
 
 config FUNCTION_PROFILER
 	bool "Kernel function profiler"
-	depends on DYNAMIC_FTRACE
+	depends on FUNCTION_TRACER
 	default n
 	help
-	 This option enables the kernel function profiler. When the dynamic
-	 function tracing is enabled, a counter is added into the function
-	 records used by the dynamic function tracer. A file is created in
-	 debugfs called function_profile_enabled which defaults to zero.
+	 This option enables the kernel function profiler. A file is created
+	 in debugfs called function_profile_enabled which defaults to zero.
 	 When a 1 is echoed into this file profiling begins, and when a
 	 zero is entered, profiling stops. A file in the trace_stats
 	 directory called functions, that show the list of functions that
 	 have been hit and their counters.
 
-	 This takes up around 320K more memory.
-
 	 If in doubt, say N
 
 config FTRACE_MCOUNT_RECORD
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 11f364c..24dac44 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -241,87 +241,48 @@ static void ftrace_update_pid_func(void)
 #endif
 }
 
-/* set when tracing only a pid */
-struct pid *ftrace_pid_trace;
-static struct pid * const ftrace_swapper_pid = &init_struct_pid;
-
-#ifdef CONFIG_DYNAMIC_FTRACE
-
-#ifndef CONFIG_FTRACE_MCOUNT_RECORD
-# error Dynamic ftrace depends on MCOUNT_RECORD
-#endif
-
-static struct hlist_head ftrace_func_hash[FTRACE_FUNC_HASHSIZE] __read_mostly;
-
-struct ftrace_func_probe {
-	struct hlist_node	node;
-	struct ftrace_probe_ops	*ops;
-	unsigned long		flags;
-	unsigned long		ip;
-	void			*data;
-	struct rcu_head		rcu;
+#ifdef CONFIG_FUNCTION_PROFILER
+struct ftrace_profile {
+	struct hlist_node		node;
+	unsigned long			ip;
+	unsigned long			counter;
 };
 
-enum {
-	FTRACE_ENABLE_CALLS		= (1 << 0),
-	FTRACE_DISABLE_CALLS		= (1 << 1),
-	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
-	FTRACE_ENABLE_MCOUNT		= (1 << 3),
-	FTRACE_DISABLE_MCOUNT		= (1 << 4),
-	FTRACE_START_FUNC_RET		= (1 << 5),
-	FTRACE_STOP_FUNC_RET		= (1 << 6),
+struct ftrace_profile_page {
+	struct ftrace_profile_page	*next;
+	unsigned long			index;
+	struct ftrace_profile		records[];
 };
 
-static int ftrace_filtered;
+#define PROFILE_RECORDS_SIZE						\
+	(PAGE_SIZE - offsetof(struct ftrace_profile_page, records))
 
-static struct dyn_ftrace *ftrace_new_addrs;
+#define PROFILES_PER_PAGE					\
+	(PROFILE_RECORDS_SIZE / sizeof(struct ftrace_profile))
 
-static DEFINE_MUTEX(ftrace_regex_lock);
-
-struct ftrace_page {
-	struct ftrace_page	*next;
-	int			index;
-	struct dyn_ftrace	records[];
-};
+/* TODO: make these percpu, to prevent cache line bouncing */
+static struct ftrace_profile_page *profile_pages_start;
+static struct ftrace_profile_page *profile_pages;
 
-#define ENTRIES_PER_PAGE \
-  ((PAGE_SIZE - sizeof(struct ftrace_page)) / sizeof(struct dyn_ftrace))
-
-/* estimate from running different kernels */
-#define NR_TO_INIT		10000
-
-static struct ftrace_page	*ftrace_pages_start;
-static struct ftrace_page	*ftrace_pages;
-
-static struct dyn_ftrace *ftrace_free_records;
-
-/*
- * This is a double for. Do not use 'break' to break out of the loop,
- * you must use a goto.
- */
-#define do_for_each_ftrace_rec(pg, rec)					\
-	for (pg = ftrace_pages_start; pg; pg = pg->next) {		\
-		int _____i;						\
-		for (_____i = 0; _____i < pg->index; _____i++) {	\
-			rec = &pg->records[_____i];
-
-#define while_for_each_ftrace_rec()		\
-		}				\
-	}
-
-#ifdef CONFIG_FUNCTION_PROFILER
 static struct hlist_head *ftrace_profile_hash;
 static int ftrace_profile_bits;
 static int ftrace_profile_enabled;
 static DEFINE_MUTEX(ftrace_profile_lock);
 
+static DEFINE_PER_CPU(atomic_t, ftrace_profile_disable);
+
+#define FTRACE_PROFILE_HASH_SIZE 1024 /* must be power of 2 */
+
+static raw_spinlock_t ftrace_profile_rec_lock =
+	(raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
+
 static void *
 function_stat_next(void *v, int idx)
 {
-	struct dyn_ftrace *rec = v;
-	struct ftrace_page *pg;
+	struct ftrace_profile *rec = v;
+	struct ftrace_profile_page *pg;
 
-	pg = (struct ftrace_page *)((unsigned long)rec & PAGE_MASK);
+	pg = (struct ftrace_profile_page *)((unsigned long)rec & PAGE_MASK);
 
  again:
 	rec++;
@@ -330,27 +291,22 @@ function_stat_next(void *v, int idx)
 		if (!pg)
 			return NULL;
 		rec = &pg->records[0];
+		if (!rec->counter)
+			goto again;
 	}
 
-	if (rec->flags & FTRACE_FL_FREE ||
-	    rec->flags & FTRACE_FL_FAILED ||
-	    !(rec->flags & FTRACE_FL_CONVERTED) ||
-	    /* ignore non hit functions */
-	    !rec->counter)
-		goto again;
-
 	return rec;
 }
 
 static void *function_stat_start(struct tracer_stat *trace)
 {
-	return function_stat_next(&ftrace_pages_start->records[0], 0);
+	return function_stat_next(&profile_pages_start->records[0], 0);
 }
 
 static int function_stat_cmp(void *p1, void *p2)
 {
-	struct dyn_ftrace *a = p1;
-	struct dyn_ftrace *b = p2;
+	struct ftrace_profile *a = p1;
+	struct ftrace_profile *b = p2;
 
 	if (a->counter < b->counter)
 		return -1;
@@ -369,7 +325,7 @@ static int function_stat_headers(struct seq_file *m)
 
 static int function_stat_show(struct seq_file *m, void *v)
 {
-	struct dyn_ftrace *rec = v;
+	struct ftrace_profile *rec = v;
 	char str[KSYM_SYMBOL_LEN];
 
 	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
@@ -387,115 +343,191 @@ static struct tracer_stat function_stats = {
 	.stat_show = function_stat_show
 };
 
-static void ftrace_profile_init(int nr_funcs)
+static void ftrace_profile_reset(void)
 {
-	unsigned long addr;
-	int order;
-	int size;
+	struct ftrace_profile_page *pg;
 
-	/*
-	 * We are profiling all functions, lets make it 1/4th of the
-	 * number of functions that are in core kernel. So we have to
-	 * iterate 4 times.
-	 */
-	order = (sizeof(struct hlist_head) * nr_funcs) / 4;
-	order = get_order(order);
-	size = 1 << (PAGE_SHIFT + order);
-
-	pr_info("Allocating %d KB for profiler hash\n", size >> 10);
+	pg = profile_pages = profile_pages_start;
 
-	addr = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
-	if (!addr) {
-		pr_warning("Could not allocate function profiler hash\n");
-		return;
+	while (pg) {
+		memset(pg->records, 0, PROFILE_RECORDS_SIZE);
+		pg->index = 0;
+		pg = pg->next;
 	}
 
-	ftrace_profile_hash = (void *)addr;
+	memset(ftrace_profile_hash, 0,
+	       FTRACE_PROFILE_HASH_SIZE * sizeof(struct hlist_head));
+}
 
-	/*
-	 * struct hlist_head should be a pointer of 4 or 8 bytes.
-	 * And a simple bit manipulation can be done, but if for
-	 * some reason struct hlist_head is not a mulitple of 2,
-	 * then we play it safe, and simply count. This function
-	 * is done once at boot up, so it is not that critical in
-	 * performance.
-	 */
+int ftrace_profile_pages_init(void)
+{
+	struct ftrace_profile_page *pg;
+	int i;
 
-	size--;
-	size /= sizeof(struct hlist_head);
+	/* If we already allocated, do nothing */
+	if (profile_pages)
+		return 0;
 
-	for (; size; size >>= 1)
-		ftrace_profile_bits++;
+	profile_pages = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!profile_pages)
+		return -ENOMEM;
 
-	pr_info("Function profiler has %d hash buckets\n",
-		1 << ftrace_profile_bits);
+	pg = profile_pages_start = profile_pages;
 
-	return;
+	/* allocate 10 more pages to start */
+	for (i = 0; i < 10; i++) {
+		pg->next = (void *)get_zeroed_page(GFP_KERNEL);
+		/*
+		 * We only care about allocating profile_pages, if
+		 * we failed to allocate here, hopefully we will allocate
+		 * later.
+		 */
+		if (!pg->next)
+			break;
+		pg = pg->next;
+	}
+
+	return 0;
 }
 
-static ssize_t
-ftrace_profile_read(struct file *filp, char __user *ubuf,
-		     size_t cnt, loff_t *ppos)
+static int ftrace_profile_init(void)
 {
-	char buf[64];
-	int r;
+	int size;
 
-	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
-	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
-}
+	if (ftrace_profile_hash) {
+		/* If the profile is already created, simply reset it */
+		ftrace_profile_reset();
+		return 0;
+	}
 
-static void ftrace_profile_reset(void)
-{
-	struct dyn_ftrace *rec;
-	struct ftrace_page *pg;
+	/*
+	 * We are profiling all functions, but usually only a few thousand
+	 * functions are hit. We'll make a hash of 1024 items.
+	 */
+	size = FTRACE_PROFILE_HASH_SIZE;
 
-	do_for_each_ftrace_rec(pg, rec) {
-		rec->counter = 0;
-	} while_for_each_ftrace_rec();
+	ftrace_profile_hash =
+		kzalloc(sizeof(struct hlist_head) * size, GFP_KERNEL);
+
+	if (!ftrace_profile_hash)
+		return -ENOMEM;
+
+	size--;
+
+	for (; size; size >>= 1)
+		ftrace_profile_bits++;
+
+	/* Preallocate a few pages */
+	if (ftrace_profile_pages_init() < 0) {
+		kfree(ftrace_profile_hash);
+		ftrace_profile_hash = NULL;
+		return -ENOMEM;
+	}
+
+	return 0;
 }
 
-static struct dyn_ftrace *ftrace_find_profiled_func(unsigned long ip)
+/* interrupts must be disabled */
+static struct ftrace_profile *ftrace_find_profiled_func(unsigned long ip)
 {
-	struct dyn_ftrace *rec;
+	struct ftrace_profile *rec;
 	struct hlist_head *hhd;
 	struct hlist_node *n;
-	unsigned long flags;
 	unsigned long key;
 
-	if (!ftrace_profile_hash)
-		return NULL;
-
 	key = hash_long(ip, ftrace_profile_bits);
 	hhd = &ftrace_profile_hash[key];
 
 	if (hlist_empty(hhd))
 		return NULL;
 
-	local_irq_save(flags);
 	hlist_for_each_entry_rcu(rec, n, hhd, node) {
 		if (rec->ip == ip)
-			goto out;
+			return rec;
+	}
+
+	return NULL;
+}
+
+static void ftrace_add_profile(struct ftrace_profile *rec)
+{
+	unsigned long key;
+
+	key = hash_long(rec->ip, ftrace_profile_bits);
+	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
+}
+
+/* Interrupts must be disabled calling this */
+static struct ftrace_profile *
+ftrace_profile_alloc(unsigned long ip, bool alloc_safe)
+{
+	struct ftrace_profile *rec = NULL;
+
+	/* prevent recursion */
+	if (atomic_inc_return(&__get_cpu_var(ftrace_profile_disable)) != 1)
+		goto out;
+
+	__raw_spin_lock(&ftrace_profile_rec_lock);
+
+	/* Try to always keep another page available */
+	if (!profile_pages->next && alloc_safe)
+		profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
+
+	/*
+	 * Try to find the function again since another
+	 * task on another CPU could have added it
+	 */
+	rec = ftrace_find_profiled_func(ip);
+	if (rec)
+		goto out_unlock;
+
+	if (profile_pages->index == PROFILES_PER_PAGE) {
+		if (!profile_pages->next)
+			goto out_unlock;
+		profile_pages = profile_pages->next;
 	}
-	rec = NULL;
+
+	rec = &profile_pages->records[profile_pages->index++];
+	rec->ip = ip;
+	ftrace_add_profile(rec);
+
+ out_unlock:
+	__raw_spin_unlock(&ftrace_profile_rec_lock);
  out:
-	local_irq_restore(flags);
+	atomic_dec(&__get_cpu_var(ftrace_profile_disable));
 
 	return rec;
 }
 
+/*
+ * If we are not in an interrupt, or softirq and
+ * and interrupts are disabled and preemption is not enabled
+ * (not in a spinlock) then it should be safe to allocate memory.
+ */
+static bool ftrace_safe_to_allocate(void)
+{
+	return !in_interrupt() && irqs_disabled() && !preempt_count();
+}
+
 static void
 function_profile_call(unsigned long ip, unsigned long parent_ip)
 {
-	struct dyn_ftrace *rec;
+	struct ftrace_profile *rec;
 	unsigned long flags;
+	bool alloc_safe;
 
 	if (!ftrace_profile_enabled)
 		return;
 
+	alloc_safe = ftrace_safe_to_allocate();
+
 	local_irq_save(flags);
 	rec = ftrace_find_profiled_func(ip);
-	if (!rec)
-		goto out;
+	if (!rec) {
+		rec = ftrace_profile_alloc(ip, alloc_safe);
+		if (!rec)
+			goto out;
+	}
 
 	rec->counter++;
  out:
@@ -515,11 +547,6 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 	char buf[64];
 	int ret;
 
-	if (!ftrace_profile_hash) {
-		pr_info("Can not enable hash due to earlier problems\n");
-		return -ENODEV;
-	}
-
 	if (cnt >= sizeof(buf))
 		return -EINVAL;
 
@@ -537,7 +564,12 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 	mutex_lock(&ftrace_profile_lock);
 	if (ftrace_profile_enabled ^ val) {
 		if (val) {
-			ftrace_profile_reset();
+			ret = ftrace_profile_init();
+			if (ret < 0) {
+				cnt = ret;
+				goto out;
+			}
+
 			register_ftrace_function(&ftrace_profile_ops);
 			ftrace_profile_enabled = 1;
 		} else {
@@ -545,6 +577,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 			unregister_ftrace_function(&ftrace_profile_ops);
 		}
 	}
+ out:
 	mutex_unlock(&ftrace_profile_lock);
 
 	filp->f_pos += cnt;
@@ -552,6 +585,17 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
+static ssize_t
+ftrace_profile_read(struct file *filp, char __user *ubuf,
+		     size_t cnt, loff_t *ppos)
+{
+	char buf[64];
+	int r;
+
+	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
 static const struct file_operations ftrace_profile_fops = {
 	.open		= tracing_open_generic,
 	.read		= ftrace_profile_read,
@@ -577,39 +621,80 @@ static void ftrace_profile_debugfs(struct dentry *d_tracer)
 			   "'function_profile_enabled' entry\n");
 }
 
-static void ftrace_add_profile(struct dyn_ftrace *rec)
-{
-	unsigned long key;
-
-	if (!ftrace_profile_hash)
-		return;
-
-	key = hash_long(rec->ip, ftrace_profile_bits);
-	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
-}
-
-static void ftrace_profile_release(struct dyn_ftrace *rec)
-{
-	mutex_lock(&ftrace_profile_lock);
-	hlist_del(&rec->node);
-	mutex_unlock(&ftrace_profile_lock);
-}
-
 #else /* CONFIG_FUNCTION_PROFILER */
-static void ftrace_profile_init(int nr_funcs)
-{
-}
-static void ftrace_add_profile(struct dyn_ftrace *rec)
-{
-}
 static void ftrace_profile_debugfs(struct dentry *d_tracer)
 {
 }
-static void ftrace_profile_release(struct dyn_ftrace *rec)
-{
-}
 #endif /* CONFIG_FUNCTION_PROFILER */
 
+/* set when tracing only a pid */
+struct pid *ftrace_pid_trace;
+static struct pid * const ftrace_swapper_pid = &init_struct_pid;
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#ifndef CONFIG_FTRACE_MCOUNT_RECORD
+# error Dynamic ftrace depends on MCOUNT_RECORD
+#endif
+
+static struct hlist_head ftrace_func_hash[FTRACE_FUNC_HASHSIZE] __read_mostly;
+
+struct ftrace_func_probe {
+	struct hlist_node	node;
+	struct ftrace_probe_ops	*ops;
+	unsigned long		flags;
+	unsigned long		ip;
+	void			*data;
+	struct rcu_head		rcu;
+};
+
+enum {
+	FTRACE_ENABLE_CALLS		= (1 << 0),
+	FTRACE_DISABLE_CALLS		= (1 << 1),
+	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
+	FTRACE_ENABLE_MCOUNT		= (1 << 3),
+	FTRACE_DISABLE_MCOUNT		= (1 << 4),
+	FTRACE_START_FUNC_RET		= (1 << 5),
+	FTRACE_STOP_FUNC_RET		= (1 << 6),
+};
+
+static int ftrace_filtered;
+
+static struct dyn_ftrace *ftrace_new_addrs;
+
+static DEFINE_MUTEX(ftrace_regex_lock);
+
+struct ftrace_page {
+	struct ftrace_page	*next;
+	int			index;
+	struct dyn_ftrace	records[];
+};
+
+#define ENTRIES_PER_PAGE \
+  ((PAGE_SIZE - sizeof(struct ftrace_page)) / sizeof(struct dyn_ftrace))
+
+/* estimate from running different kernels */
+#define NR_TO_INIT		10000
+
+static struct ftrace_page	*ftrace_pages_start;
+static struct ftrace_page	*ftrace_pages;
+
+static struct dyn_ftrace *ftrace_free_records;
+
+/*
+ * This is a double for. Do not use 'break' to break out of the loop,
+ * you must use a goto.
+ */
+#define do_for_each_ftrace_rec(pg, rec)					\
+	for (pg = ftrace_pages_start; pg; pg = pg->next) {		\
+		int _____i;						\
+		for (_____i = 0; _____i < pg->index; _____i++) {	\
+			rec = &pg->records[_____i];
+
+#define while_for_each_ftrace_rec()		\
+		}				\
+	}
+
 #ifdef CONFIG_KPROBES
 
 static int frozen_record_count;
@@ -660,10 +745,8 @@ void ftrace_release(void *start, unsigned long size)
 	mutex_lock(&ftrace_lock);
 	do_for_each_ftrace_rec(pg, rec) {
 		if ((rec->ip >= s) && (rec->ip < e) &&
-		    !(rec->flags & FTRACE_FL_FREE)) {
+		    !(rec->flags & FTRACE_FL_FREE))
 			ftrace_free_rec(rec);
-			ftrace_profile_release(rec);
-		}
 	} while_for_each_ftrace_rec();
 	mutex_unlock(&ftrace_lock);
 }
@@ -717,8 +800,6 @@ ftrace_record_ip(unsigned long ip)
 	rec->newlist = ftrace_new_addrs;
 	ftrace_new_addrs = rec;
 
-	ftrace_add_profile(rec);
-
 	return rec;
 }
 
@@ -2462,8 +2543,6 @@ static __init int ftrace_init_dyn_debugfs(struct dentry *d_tracer)
 			   "'set_graph_function' entry\n");
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
-	ftrace_profile_debugfs(d_tracer);
-
 	return 0;
 }
 
@@ -2532,8 +2611,6 @@ void __init ftrace_init(void)
 	if (ret)
 		goto failed;
 
-	ftrace_profile_init(count);
-
 	last_ftrace_enabled = ftrace_enabled = 1;
 
 	ret = ftrace_convert_nops(NULL,
@@ -2734,6 +2811,9 @@ static __init int ftrace_init_debugfs(void)
 	if (!entry)
 		pr_warning("Could not create debugfs "
 			   "'set_ftrace_pid' entry\n");
+
+	ftrace_profile_debugfs(d_tracer);
+
 	return 0;
 }
 fs_initcall(ftrace_init_debugfs);
-- 
1.6.2

-- 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/5][RFC] tracing: adding function timings to function profiler
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
  2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
  2009-03-25  3:56 ` [PATCH 2/5][RFC] tracing: move function profiler data out of function struct Steven Rostedt
@ 2009-03-25  3:56 ` Steven Rostedt
  2009-03-25 18:30   ` Frederic Weisbecker
  2009-03-25  3:56 ` [PATCH 4/5][RFC] tracing: make the function profiler per cpu Steven Rostedt
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

[-- Attachment #1: 0003-tracing-adding-function-timings-to-function-profile.patch --]
[-- Type: text/plain, Size: 9252 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.

 cat /debug/tracing/trace_stat/functions

  Function                               Hit    Time
  --------                               ---    ----
  mwait_idle                             127    183028.4 us
  schedule                                26    151997.7 us
  __schedule                              31    151975.1 us
  sys_wait4                                2    74080.53 us
  do_wait                                  2    74077.80 us
  sys_newlstat                           138    39929.16 us
  do_path_lookup                         179    39845.79 us
  vfs_lstat_fd                           138    39761.97 us
  user_path_at                           153    39469.58 us
  path_walk                              179    39435.76 us
  __link_path_walk                       189    39143.73 us
[...]

Note the times are skewed due to the function graph tracer not taking
into account schedules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ftrace.c                |   93 ++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c                 |   11 ----
 kernel/trace/trace.h                 |    3 +-
 kernel/trace/trace_functions_graph.c |   17 +++++-
 kernel/trace/trace_output.c          |   10 ++++
 kernel/trace/trace_output.h          |    2 +
 6 files changed, 117 insertions(+), 19 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 24dac44..a9ccd71 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -33,7 +33,7 @@
 
 #include <asm/ftrace.h>
 
-#include "trace.h"
+#include "trace_output.h"
 #include "trace_stat.h"
 
 #define FTRACE_WARN_ON(cond)			\
@@ -246,6 +246,9 @@ struct ftrace_profile {
 	struct hlist_node		node;
 	unsigned long			ip;
 	unsigned long			counter;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	unsigned long long		time;
+#endif
 };
 
 struct ftrace_profile_page {
@@ -303,6 +306,22 @@ static void *function_stat_start(struct tracer_stat *trace)
 	return function_stat_next(&profile_pages_start->records[0], 0);
 }
 
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+/* function graph compares on total time */
+static int function_stat_cmp(void *p1, void *p2)
+{
+	struct ftrace_profile *a = p1;
+	struct ftrace_profile *b = p2;
+
+	if (a->time < b->time)
+		return -1;
+	if (a->time > b->time)
+		return 1;
+	else
+		return 0;
+}
+#else
+/* not function graph compares against hits */
 static int function_stat_cmp(void *p1, void *p2)
 {
 	struct ftrace_profile *a = p1;
@@ -315,11 +334,17 @@ static int function_stat_cmp(void *p1, void *p2)
 	else
 		return 0;
 }
+#endif
 
 static int function_stat_headers(struct seq_file *m)
 {
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	seq_printf(m, "  Function                               Hit    Time\n"
+		      "  --------                               ---    ----\n");
+#else
 	seq_printf(m, "  Function                               Hit\n"
 		      "  --------                               ---\n");
+#endif
 	return 0;
 }
 
@@ -327,10 +352,25 @@ static int function_stat_show(struct seq_file *m, void *v)
 {
 	struct ftrace_profile *rec = v;
 	char str[KSYM_SYMBOL_LEN];
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	static struct trace_seq s;
+	static DEFINE_MUTEX(mutex);
+
+	mutex_lock(&mutex);
+	trace_seq_init(&s);
+	trace_print_graph_duration(rec->time, &s);
+#endif
 
 	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
+	seq_printf(m, "  %-30.30s  %10lu", str, rec->counter);
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	seq_printf(m, "    ");
+	trace_print_seq(m, &s);
+	mutex_unlock(&mutex);
+#endif
+	seq_putc(m, '\n');
 
-	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
 	return 0;
 }
 
@@ -534,11 +574,52 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
 	local_irq_restore(flags);
 }
 
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static int profile_graph_entry(struct ftrace_graph_ent *trace)
+{
+	function_profile_call(trace->func, 0);
+	return 1;
+}
+
+static void profile_graph_return(struct ftrace_graph_ret *trace)
+{
+	unsigned long flags;
+	struct ftrace_profile *rec;
+
+	local_irq_save(flags);
+	rec = ftrace_find_profiled_func(trace->func);
+	if (rec)
+		rec->time += trace->rettime - trace->calltime;
+	local_irq_restore(flags);
+}
+
+static int register_ftrace_profiler(void)
+{
+	return register_ftrace_graph(&profile_graph_return,
+				     &profile_graph_entry);
+}
+
+static void unregister_ftrace_profiler(void)
+{
+	unregister_ftrace_graph();
+}
+#else
 static struct ftrace_ops ftrace_profile_ops __read_mostly =
 {
 	.func = function_profile_call,
 };
 
+static int register_ftrace_profiler(void)
+{
+	return register_ftrace_function(&ftrace_profile_ops);
+}
+
+static void unregister_ftrace_profiler(void)
+{
+	unregister_ftrace_function(&ftrace_profile_ops);
+}
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
 static ssize_t
 ftrace_profile_write(struct file *filp, const char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
@@ -570,11 +651,15 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 				goto out;
 			}
 
-			register_ftrace_function(&ftrace_profile_ops);
+			ret = register_ftrace_profiler();
+			if (ret < 0) {
+				cnt = ret;
+				goto out;
+			}
 			ftrace_profile_enabled = 1;
 		} else {
 			ftrace_profile_enabled = 0;
-			unregister_ftrace_function(&ftrace_profile_ops);
+			unregister_ftrace_profiler();
 		}
 	}
  out:
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 67c6a21..821bf49 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -402,17 +402,6 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
 	return cnt;
 }
 
-static void
-trace_print_seq(struct seq_file *m, struct trace_seq *s)
-{
-	int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
-
-	s->buffer[len] = 0;
-	seq_puts(m, s->buffer);
-
-	trace_seq_init(s);
-}
-
 /**
  * update_max_tr - snapshot all trace buffers from global_trace to max_tr
  * @tr: tracer
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d7410bb..c66ca3b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -605,6 +605,8 @@ extern unsigned long trace_flags;
 /* Standard output formatting function used for function return traces */
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 extern enum print_line_t print_graph_function(struct trace_iterator *iter);
+extern enum print_line_t
+trace_print_graph_duration(unsigned long long duration, struct trace_seq *s);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 /* TODO: make this variable */
@@ -636,7 +638,6 @@ static inline int ftrace_graph_addr(unsigned long addr)
 	return 1;
 }
 #endif /* CONFIG_DYNAMIC_FTRACE */
-
 #else /* CONFIG_FUNCTION_GRAPH_TRACER */
 static inline enum print_line_t
 print_graph_function(struct trace_iterator *iter)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index d28687e..85bba0f 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -426,8 +426,8 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
 	return TRACE_TYPE_HANDLED;
 }
 
-static enum print_line_t
-print_graph_duration(unsigned long long duration, struct trace_seq *s)
+enum print_line_t
+trace_print_graph_duration(unsigned long long duration, struct trace_seq *s)
 {
 	unsigned long nsecs_rem = do_div(duration, 1000);
 	/* log10(ULONG_MAX) + '\0' */
@@ -464,12 +464,23 @@ print_graph_duration(unsigned long long duration, struct trace_seq *s)
 		if (!ret)
 			return TRACE_TYPE_PARTIAL_LINE;
 	}
+	return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t
+print_graph_duration(unsigned long long duration, struct trace_seq *s)
+{
+	int ret;
+
+	ret = trace_print_graph_duration(duration, s);
+	if (ret != TRACE_TYPE_HANDLED)
+		return ret;
 
 	ret = trace_seq_printf(s, "|  ");
 	if (!ret)
 		return TRACE_TYPE_PARTIAL_LINE;
-	return TRACE_TYPE_HANDLED;
 
+	return TRACE_TYPE_HANDLED;
 }
 
 /* Case of a leaf function on its call entry */
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 19261fd..a3b6e3f 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -19,6 +19,16 @@ static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
 
 static int next_event_type = __TRACE_LAST_TYPE + 1;
 
+void trace_print_seq(struct seq_file *m, struct trace_seq *s)
+{
+	int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
+
+	s->buffer[len] = 0;
+	seq_puts(m, s->buffer);
+
+	trace_seq_init(s);
+}
+
 enum print_line_t trace_print_bprintk_msg_only(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
index 35c422f..1eac297 100644
--- a/kernel/trace/trace_output.h
+++ b/kernel/trace/trace_output.h
@@ -20,6 +20,8 @@ trace_print_bprintk_msg_only(struct trace_iterator *iter);
 extern enum print_line_t
 trace_print_printk_msg_only(struct trace_iterator *iter);
 
+extern void trace_print_seq(struct seq_file *m, struct trace_seq *s);
+
 extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
 	__attribute__ ((format (printf, 2, 3)));
 extern int
-- 
1.6.2

-- 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/5][RFC] tracing: make the function profiler per cpu
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
                   ` (2 preceding siblings ...)
  2009-03-25  3:56 ` [PATCH 3/5][RFC] tracing: adding function timings to function profiler Steven Rostedt
@ 2009-03-25  3:56 ` Steven Rostedt
  2009-03-25  3:56 ` [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not Steven Rostedt
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

[-- Attachment #1: 0004-tracing-make-the-function-profiler-per-cpu.patch --]
[-- Type: text/plain, Size: 10394 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: speed enhancement

By making the function profiler record in per cpu data we not only
get better readings, avoid races, we also do not have to take any
locks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 kernel/trace/ftrace.c |  201 ++++++++++++++++++++++++++++++++-----------------
 1 files changed, 131 insertions(+), 70 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a9ccd71..ed1fc50 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -257,28 +257,28 @@ struct ftrace_profile_page {
 	struct ftrace_profile		records[];
 };
 
+struct ftrace_profile_stat {
+	atomic_t			disabled;
+	struct hlist_head		*hash;
+	struct ftrace_profile_page	*pages;
+	struct ftrace_profile_page	*start;
+	struct tracer_stat		stat;
+};
+
 #define PROFILE_RECORDS_SIZE						\
 	(PAGE_SIZE - offsetof(struct ftrace_profile_page, records))
 
 #define PROFILES_PER_PAGE					\
 	(PROFILE_RECORDS_SIZE / sizeof(struct ftrace_profile))
 
-/* TODO: make these percpu, to prevent cache line bouncing */
-static struct ftrace_profile_page *profile_pages_start;
-static struct ftrace_profile_page *profile_pages;
-
-static struct hlist_head *ftrace_profile_hash;
 static int ftrace_profile_bits;
 static int ftrace_profile_enabled;
 static DEFINE_MUTEX(ftrace_profile_lock);
 
-static DEFINE_PER_CPU(atomic_t, ftrace_profile_disable);
+static DEFINE_PER_CPU(struct ftrace_profile_stat, ftrace_profile_stats);
 
 #define FTRACE_PROFILE_HASH_SIZE 1024 /* must be power of 2 */
 
-static raw_spinlock_t ftrace_profile_rec_lock =
-	(raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED;
-
 static void *
 function_stat_next(void *v, int idx)
 {
@@ -303,7 +303,13 @@ function_stat_next(void *v, int idx)
 
 static void *function_stat_start(struct tracer_stat *trace)
 {
-	return function_stat_next(&profile_pages_start->records[0], 0);
+	struct ftrace_profile_stat *stat =
+		container_of(trace, struct ftrace_profile_stat, stat);
+
+	if (!stat || !stat->start)
+		return NULL;
+
+	return function_stat_next(&stat->start->records[0], 0);
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -374,20 +380,11 @@ static int function_stat_show(struct seq_file *m, void *v)
 	return 0;
 }
 
-static struct tracer_stat function_stats = {
-	.name = "functions",
-	.stat_start = function_stat_start,
-	.stat_next = function_stat_next,
-	.stat_cmp = function_stat_cmp,
-	.stat_headers = function_stat_headers,
-	.stat_show = function_stat_show
-};
-
-static void ftrace_profile_reset(void)
+static void ftrace_profile_reset(struct ftrace_profile_stat *stat)
 {
 	struct ftrace_profile_page *pg;
 
-	pg = profile_pages = profile_pages_start;
+	pg = stat->pages = stat->start;
 
 	while (pg) {
 		memset(pg->records, 0, PROFILE_RECORDS_SIZE);
@@ -395,24 +392,24 @@ static void ftrace_profile_reset(void)
 		pg = pg->next;
 	}
 
-	memset(ftrace_profile_hash, 0,
+	memset(stat->hash, 0,
 	       FTRACE_PROFILE_HASH_SIZE * sizeof(struct hlist_head));
 }
 
-int ftrace_profile_pages_init(void)
+int ftrace_profile_pages_init(struct ftrace_profile_stat *stat)
 {
 	struct ftrace_profile_page *pg;
 	int i;
 
 	/* If we already allocated, do nothing */
-	if (profile_pages)
+	if (stat->pages)
 		return 0;
 
-	profile_pages = (void *)get_zeroed_page(GFP_KERNEL);
-	if (!profile_pages)
+	stat->pages = (void *)get_zeroed_page(GFP_KERNEL);
+	if (!stat->pages)
 		return -ENOMEM;
 
-	pg = profile_pages_start = profile_pages;
+	pg = stat->start = stat->pages;
 
 	/* allocate 10 more pages to start */
 	for (i = 0; i < 10; i++) {
@@ -430,13 +427,16 @@ int ftrace_profile_pages_init(void)
 	return 0;
 }
 
-static int ftrace_profile_init(void)
+static int ftrace_profile_init_cpu(int cpu)
 {
+	struct ftrace_profile_stat *stat;
 	int size;
 
-	if (ftrace_profile_hash) {
+	stat = &per_cpu(ftrace_profile_stats, cpu);
+
+	if (stat->hash) {
 		/* If the profile is already created, simply reset it */
-		ftrace_profile_reset();
+		ftrace_profile_reset(stat);
 		return 0;
 	}
 
@@ -446,29 +446,45 @@ static int ftrace_profile_init(void)
 	 */
 	size = FTRACE_PROFILE_HASH_SIZE;
 
-	ftrace_profile_hash =
-		kzalloc(sizeof(struct hlist_head) * size, GFP_KERNEL);
+	stat->hash = kzalloc(sizeof(struct hlist_head) * size, GFP_KERNEL);
 
-	if (!ftrace_profile_hash)
+	if (!stat->hash)
 		return -ENOMEM;
 
-	size--;
+	if (!ftrace_profile_bits) {
+		size--;
 
-	for (; size; size >>= 1)
-		ftrace_profile_bits++;
+		for (; size; size >>= 1)
+			ftrace_profile_bits++;
+	}
 
 	/* Preallocate a few pages */
-	if (ftrace_profile_pages_init() < 0) {
-		kfree(ftrace_profile_hash);
-		ftrace_profile_hash = NULL;
+	if (ftrace_profile_pages_init(stat) < 0) {
+		kfree(stat->hash);
+		stat->hash = NULL;
 		return -ENOMEM;
 	}
 
 	return 0;
 }
 
+static int ftrace_profile_init(void)
+{
+	int cpu;
+	int ret = 0;
+
+	for_each_online_cpu(cpu) {
+		ret = ftrace_profile_init_cpu(cpu);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 /* interrupts must be disabled */
-static struct ftrace_profile *ftrace_find_profiled_func(unsigned long ip)
+static struct ftrace_profile *
+ftrace_find_profiled_func(struct ftrace_profile_stat *stat, unsigned long ip)
 {
 	struct ftrace_profile *rec;
 	struct hlist_head *hhd;
@@ -476,7 +492,7 @@ static struct ftrace_profile *ftrace_find_profiled_func(unsigned long ip)
 	unsigned long key;
 
 	key = hash_long(ip, ftrace_profile_bits);
-	hhd = &ftrace_profile_hash[key];
+	hhd = &stat->hash[key];
 
 	if (hlist_empty(hhd))
 		return NULL;
@@ -489,52 +505,50 @@ static struct ftrace_profile *ftrace_find_profiled_func(unsigned long ip)
 	return NULL;
 }
 
-static void ftrace_add_profile(struct ftrace_profile *rec)
+static void ftrace_add_profile(struct ftrace_profile_stat *stat,
+			       struct ftrace_profile *rec)
 {
 	unsigned long key;
 
 	key = hash_long(rec->ip, ftrace_profile_bits);
-	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
+	hlist_add_head_rcu(&rec->node, &stat->hash[key]);
 }
 
 /* Interrupts must be disabled calling this */
 static struct ftrace_profile *
-ftrace_profile_alloc(unsigned long ip, bool alloc_safe)
+ftrace_profile_alloc(struct ftrace_profile_stat *stat,
+		     unsigned long ip, bool alloc_safe)
 {
 	struct ftrace_profile *rec = NULL;
 
 	/* prevent recursion */
-	if (atomic_inc_return(&__get_cpu_var(ftrace_profile_disable)) != 1)
+	if (atomic_inc_return(&stat->disabled) != 1)
 		goto out;
 
-	__raw_spin_lock(&ftrace_profile_rec_lock);
-
 	/* Try to always keep another page available */
-	if (!profile_pages->next && alloc_safe)
-		profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
+	if (!stat->pages->next && alloc_safe)
+		stat->pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
 
 	/*
 	 * Try to find the function again since another
 	 * task on another CPU could have added it
 	 */
-	rec = ftrace_find_profiled_func(ip);
+	rec = ftrace_find_profiled_func(stat, ip);
 	if (rec)
-		goto out_unlock;
+		goto out;
 
-	if (profile_pages->index == PROFILES_PER_PAGE) {
-		if (!profile_pages->next)
-			goto out_unlock;
-		profile_pages = profile_pages->next;
+	if (stat->pages->index == PROFILES_PER_PAGE) {
+		if (!stat->pages->next)
+			goto out;
+		stat->pages = stat->pages->next;
 	}
 
-	rec = &profile_pages->records[profile_pages->index++];
+	rec = &stat->pages->records[stat->pages->index++];
 	rec->ip = ip;
-	ftrace_add_profile(rec);
+	ftrace_add_profile(stat, rec);
 
- out_unlock:
-	__raw_spin_unlock(&ftrace_profile_rec_lock);
  out:
-	atomic_dec(&__get_cpu_var(ftrace_profile_disable));
+	atomic_dec(&stat->disabled);
 
 	return rec;
 }
@@ -552,6 +566,7 @@ static bool ftrace_safe_to_allocate(void)
 static void
 function_profile_call(unsigned long ip, unsigned long parent_ip)
 {
+	struct ftrace_profile_stat *stat;
 	struct ftrace_profile *rec;
 	unsigned long flags;
 	bool alloc_safe;
@@ -562,9 +577,14 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
 	alloc_safe = ftrace_safe_to_allocate();
 
 	local_irq_save(flags);
-	rec = ftrace_find_profiled_func(ip);
+
+	stat = &__get_cpu_var(ftrace_profile_stats);
+	if (!stat->hash)
+		goto out;
+
+	rec = ftrace_find_profiled_func(stat, ip);
 	if (!rec) {
-		rec = ftrace_profile_alloc(ip, alloc_safe);
+		rec = ftrace_profile_alloc(stat, ip, alloc_safe);
 		if (!rec)
 			goto out;
 	}
@@ -583,13 +603,19 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
 
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
-	unsigned long flags;
+	struct ftrace_profile_stat *stat;
 	struct ftrace_profile *rec;
+	unsigned long flags;
 
 	local_irq_save(flags);
-	rec = ftrace_find_profiled_func(trace->func);
+	stat = &__get_cpu_var(ftrace_profile_stats);
+	if (!stat->hash)
+		goto out;
+
+	rec = ftrace_find_profiled_func(stat, trace->func);
 	if (rec)
 		rec->time += trace->rettime - trace->calltime;
+ out:
 	local_irq_restore(flags);
 }
 
@@ -687,16 +713,51 @@ static const struct file_operations ftrace_profile_fops = {
 	.write		= ftrace_profile_write,
 };
 
+/* used to initialize the real stat files */
+static struct tracer_stat function_stats __initdata = {
+	.name = "functions",
+	.stat_start = function_stat_start,
+	.stat_next = function_stat_next,
+	.stat_cmp = function_stat_cmp,
+	.stat_headers = function_stat_headers,
+	.stat_show = function_stat_show
+};
+
 static void ftrace_profile_debugfs(struct dentry *d_tracer)
 {
+	struct ftrace_profile_stat *stat;
 	struct dentry *entry;
+	char *name;
 	int ret;
+	int cpu;
 
-	ret = register_stat_tracer(&function_stats);
-	if (ret) {
-		pr_warning("Warning: could not register "
-			   "function stats\n");
-		return;
+	for_each_possible_cpu(cpu) {
+		stat = &per_cpu(ftrace_profile_stats, cpu);
+
+		/* allocate enough for function name + cpu number */
+		name = kmalloc(32, GFP_KERNEL);
+		if (!name) {
+			/*
+			 * The files created are permanent, if something happens
+			 * we still do not free memory.
+			 */
+			kfree(stat);
+			WARN(1,
+			     "Could not allocate stat file for cpu %d\n",
+			     cpu);
+			return;
+		}
+		stat->stat = function_stats;
+		snprintf(name, 32, "function%d", cpu);
+		stat->stat.name = name;
+		ret = register_stat_tracer(&stat->stat);
+		if (ret) {
+			WARN(1,
+			     "Could not register function stat for cpu %d\n",
+			     cpu);
+			kfree(name);
+			return;
+		}
 	}
 
 	entry = debugfs_create_file("function_profile_enabled", 0644,
-- 
1.6.2

-- 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
                   ` (3 preceding siblings ...)
  2009-03-25  3:56 ` [PATCH 4/5][RFC] tracing: make the function profiler per cpu Steven Rostedt
@ 2009-03-25  3:56 ` Steven Rostedt
  2009-03-25 10:11   ` Ingo Molnar
  2009-03-25  9:50 ` [PATCH 0/5][RFC] [RFC] function profiler Ingo Molnar
  2009-03-25  9:52 ` Ingo Molnar
  6 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25  3:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

[-- Attachment #1: 0005-function-graph-add-option-to-calculate-graph-time-o.patch --]
[-- Type: text/plain, Size: 4758 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/ftrace.h               |    3 +--
 kernel/trace/ftrace.c                |   21 ++++++++++++++++++++-
 kernel/trace/trace.c                 |    4 +++-
 kernel/trace/trace.h                 |    1 +
 kernel/trace/trace_functions_graph.c |    8 ++++----
 5 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 015a3d2..9e0a8d2 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -365,6 +365,7 @@ struct ftrace_ret_stack {
 	unsigned long ret;
 	unsigned long func;
 	unsigned long long calltime;
+	unsigned long long subtime;
 };
 
 /*
@@ -376,8 +377,6 @@ extern void return_to_handler(void);
 
 extern int
 ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth);
-extern void
-ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret);
 
 /*
  * Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index ed1fc50..71e5fae 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -604,6 +604,7 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
 	struct ftrace_profile_stat *stat;
+	unsigned long long calltime;
 	struct ftrace_profile *rec;
 	unsigned long flags;
 
@@ -612,9 +613,27 @@ static void profile_graph_return(struct ftrace_graph_ret *trace)
 	if (!stat->hash)
 		goto out;
 
+	calltime = trace->rettime - trace->calltime;
+
+	if (!(trace_flags & TRACE_ITER_GRAPH_TIME)) {
+		int index;
+
+		index = trace->depth;
+
+		/* Append this call time to the parent time to subtract */
+		if (index)
+			current->ret_stack[index - 1].subtime += calltime;
+
+		if (current->ret_stack[index].subtime < calltime)
+			calltime -= current->ret_stack[index].subtime;
+		else
+			calltime = 0;
+	}
+
 	rec = ftrace_find_profiled_func(stat, trace->func);
 	if (rec)
-		rec->time += trace->rettime - trace->calltime;
+		rec->time += calltime;
+
  out:
 	local_irq_restore(flags);
 }
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 821bf49..5d1a16c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -255,7 +255,8 @@ static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
 
 /* trace_flags holds trace_options default values */
 unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
-	TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME;
+	TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
+	TRACE_ITER_GRAPH_TIME;
 
 /**
  * trace_wake_up - wake up tasks waiting for trace input
@@ -317,6 +318,7 @@ static const char *trace_options[] = {
 	"latency-format",
 	"global-clock",
 	"sleep-time",
+	"graph-time",
 	NULL
 };
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c66ca3b..e3429a8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -685,6 +685,7 @@ enum trace_iterator_flags {
 	TRACE_ITER_LATENCY_FMT		= 0x40000,
 	TRACE_ITER_GLOBAL_CLK		= 0x80000,
 	TRACE_ITER_SLEEP_TIME		= 0x100000,
+	TRACE_ITER_GRAPH_TIME		= 0x200000,
 };
 
 /*
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 85bba0f..10f6ad7 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -78,13 +78,14 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth)
 	current->ret_stack[index].ret = ret;
 	current->ret_stack[index].func = func;
 	current->ret_stack[index].calltime = calltime;
+	current->ret_stack[index].subtime = 0;
 	*depth = index;
 
 	return 0;
 }
 
 /* Retrieve a function return address to the trace stack on thread info.*/
-void
+static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret)
 {
 	int index;
@@ -104,9 +105,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret)
 	trace->calltime = current->ret_stack[index].calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = index;
-	barrier();
-	current->curr_ret_stack--;
-
 }
 
 /*
@@ -121,6 +119,8 @@ unsigned long ftrace_return_to_handler(void)
 	ftrace_pop_return_trace(&trace, &ret);
 	trace.rettime = trace_clock_local();
 	ftrace_graph_return(&trace);
+	barrier();
+	current->curr_ret_stack--;
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
-- 
1.6.2

-- 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5][RFC] [RFC] function profiler
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
                   ` (4 preceding siblings ...)
  2009-03-25  3:56 ` [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not Steven Rostedt
@ 2009-03-25  9:50 ` Ingo Molnar
  2009-03-25  9:52 ` Ingo Molnar
  6 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25  9:50 UTC (permalink / raw)
  To: Steven Rostedt, Tom Zanussi
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird


* Steven Rostedt <rostedt@goodmis.org> wrote:

> The following patches are RFC, and to discuss is this work would 
> be useful.
> 
> This patch series adds a function profiler hook to the function 
> tracer and function graph tracer. That is, it uses the mcount 
> calls to track the functions being called. Now with the function 
> graph tracer it also keeps track of the times the functions have 
> been executing.
> 
> Note, this is similar to oprofile, but is not a sampling approach, 
> but a true recorder. Of course, by recording every function the 
> overhead compared to oprofile is higher. With the use of dynamic 
> ftrace, you could just trace certain functions, or just a certain 
> module. The flexibility is a bit greater with this profiling.

Also, with the built-in user-defined filter code in the latest 
tracing tree this can also be used to only trace a given PID, or a 
range of PIDs.

Or we can trace preempt-disabled functions only, via this filter 
expression:

  echo 'common_preempt_count != 0' > /debug/tracing/events/ftrace/function/filter

Or (future planned) we can profile functions only taking longer than 
100 microseconds:

  echo 'duration >= 0' > /debug/tracing/events/ftrace/function/filter

There are other possibilities as well.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/5][RFC] [RFC] function profiler
  2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
                   ` (5 preceding siblings ...)
  2009-03-25  9:50 ` [PATCH 0/5][RFC] [RFC] function profiler Ingo Molnar
@ 2009-03-25  9:52 ` Ingo Molnar
  6 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25  9:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird


* Steven Rostedt <rostedt@goodmis.org> wrote:

> The following patches are RFC, and to discuss is this work would be
> useful.
> 
> This patch series adds a function profiler hook to the function tracer
> and function graph tracer. That is, it uses the mcount calls to track the
> functions being called. Now with the function graph tracer it also
> keeps track of the times the functions have been executing.
> 
> Note, this is similar to oprofile, but is not a sampling approach, but
> a true recorder. Of course, by recording every function the overhead
> compared to oprofile is higher. With the use of dynamic ftrace, you could
> just trace certain functions, or just a certain module. The flexibility
> is a bit greater with this profiling.
> 
> The last patch adds another feature to the function graph. It gives
> the ability to only trace the amount of time inside a function and
> not the amount spent in sub functions that are also recorded. That is,
> if A calls B, you can decide if the time spent in A should include the
> time spent in B. This is an option added to trace_options.
> 
> Again, this is all RFC, but if there is any interest in having this
> feature, prehaps we should include it.
> 
> -- Steve
> 
> 
> The following patches are in:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> 
>     branch: rfc/tip/tracing/function-profiler
> 
> 
> Steven Rostedt (5):
>       tracing: add function profiler
>       tracing: move function profiler data out of function struct
>       tracing: adding function timings to function profiler
>       tracing: make the function profiler per cpu
>       function-graph: add option to calculate graph time or not
> 
> ----
>  include/linux/ftrace.h               |    3 +-
>  kernel/trace/Kconfig                 |   15 +
>  kernel/trace/ftrace.c                |  558 +++++++++++++++++++++++++++++++++-
>  kernel/trace/trace.c                 |   15 +-
>  kernel/trace/trace.h                 |    4 +-
>  kernel/trace/trace_functions_graph.c |   25 ++-
>  kernel/trace/trace_output.c          |   10 +
>  kernel/trace/trace_output.h          |    2 +
>  8 files changed, 608 insertions(+), 24 deletions(-)

Pulled into tip:tracing/profiling, thanks Steve!

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5][RFC] tracing: add function profiler
  2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
@ 2009-03-25 10:02   ` Ingo Molnar
  2009-03-25 15:42     ` Steven Rostedt
  2009-03-25 19:11   ` [PATCH 1/5][RFC] tracing: add function profiler Frederic Weisbecker
  1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 10:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


(a few minor nits - all delta-fixable.)

* Steven Rostedt <rostedt@goodmis.org> wrote:

> +#ifdef CONFIG_FUNCTION_PROFILER
> +static struct hlist_head *ftrace_profile_hash;
> +static int ftrace_profile_bits;
> +static int ftrace_profile_enabled;

should be __read_mostly.

> +static DEFINE_MUTEX(ftrace_profile_lock);

should have a comment explaining its rules.

> +
> +static void *
> +function_stat_next(void *v, int idx)
> +{
> +	struct dyn_ftrace *rec = v;
> +	struct ftrace_page *pg;
> +
> +	pg = (struct ftrace_page *)((unsigned long)rec & PAGE_MASK);
> +
> + again:
> +	rec++;
> +	if ((void *)rec >= (void *)&pg->records[pg->index]) {
> +		pg = pg->next;
> +		if (!pg)
> +			return NULL;
> +		rec = &pg->records[0];
> +	}
> +
> +	if (rec->flags & FTRACE_FL_FREE ||
> +	    rec->flags & FTRACE_FL_FAILED ||
> +	    !(rec->flags & FTRACE_FL_CONVERTED) ||
> +	    /* ignore non hit functions */
> +	    !rec->counter)
> +		goto again;
> +
> +	return rec;
> +}
> +
> +static void *function_stat_start(struct tracer_stat *trace)
> +{
> +	return function_stat_next(&ftrace_pages_start->records[0], 0);
> +}
> +
> +static int function_stat_cmp(void *p1, void *p2)
> +{
> +	struct dyn_ftrace *a = p1;
> +	struct dyn_ftrace *b = p2;
> +
> +	if (a->counter < b->counter)
> +		return -1;
> +	if (a->counter > b->counter)
> +		return 1;
> +	else
> +		return 0;
> +}
> +
> +static int function_stat_headers(struct seq_file *m)
> +{
> +	seq_printf(m, "  Function                               Hit\n"
> +		      "  --------                               ---\n");
> +	return 0;
> +}
> +
> +static int function_stat_show(struct seq_file *m, void *v)
> +{
> +	struct dyn_ftrace *rec = v;
> +	char str[KSYM_SYMBOL_LEN];
> +
> +	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
> +
> +	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
> +	return 0;
> +}
> +
> +static struct tracer_stat function_stats = {
> +	.name = "functions",
> +	.stat_start = function_stat_start,
> +	.stat_next = function_stat_next,
> +	.stat_cmp = function_stat_cmp,
> +	.stat_headers = function_stat_headers,
> +	.stat_show = function_stat_show
> +};

vertical spaces please.

> +
> +static void ftrace_profile_init(int nr_funcs)
> +{
> +	unsigned long addr;
> +	int order;
> +	int size;
> +
> +	/*
> +	 * We are profiling all functions, lets make it 1/4th of the
> +	 * number of functions that are in core kernel. So we have to
> +	 * iterate 4 times.
> +	 */
> +	order = (sizeof(struct hlist_head) * nr_funcs) / 4;
> +	order = get_order(order);
> +	size = 1 << (PAGE_SHIFT + order);
> +
> +	pr_info("Allocating %d KB for profiler hash\n", size >> 10);
> +
> +	addr = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!addr) {
> +		pr_warning("Could not allocate function profiler hash\n");
> +		return;
> +	}
> +
> +	ftrace_profile_hash = (void *)addr;
> +
> +	/*
> +	 * struct hlist_head should be a pointer of 4 or 8 bytes.
> +	 * And a simple bit manipulation can be done, but if for
> +	 * some reason struct hlist_head is not a mulitple of 2,
> +	 * then we play it safe, and simply count. This function
> +	 * is done once at boot up, so it is not that critical in
> +	 * performance.
> +	 */
> +
> +	size--;
> +	size /= sizeof(struct hlist_head);
> +
> +	for (; size; size >>= 1)
> +		ftrace_profile_bits++;
> +
> +	pr_info("Function profiler has %d hash buckets\n",
> +		1 << ftrace_profile_bits);
> +
> +	return;
> +}

stale return.

> +
> +static ssize_t
> +ftrace_profile_read(struct file *filp, char __user *ubuf,
> +		     size_t cnt, loff_t *ppos)
> +{
> +	char buf[64];

Magic constant.

> +	int r;
> +
> +	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);

What guarantees that 'cnt' cannot be larger than ~44?

> +}
> +
> +static void ftrace_profile_reset(void)
> +{
> +	struct dyn_ftrace *rec;
> +	struct ftrace_page *pg;
> +
> +	do_for_each_ftrace_rec(pg, rec) {
> +		rec->counter = 0;
> +	} while_for_each_ftrace_rec();
> +}
> +
> +static struct dyn_ftrace *ftrace_find_profiled_func(unsigned long ip)
> +{
> +	struct dyn_ftrace *rec;
> +	struct hlist_head *hhd;
> +	struct hlist_node *n;
> +	unsigned long flags;
> +	unsigned long key;
> +
> +	if (!ftrace_profile_hash)
> +		return NULL;
> +
> +	key = hash_long(ip, ftrace_profile_bits);
> +	hhd = &ftrace_profile_hash[key];
> +
> +	if (hlist_empty(hhd))
> +		return NULL;
> +
> +	local_irq_save(flags);
> +	hlist_for_each_entry_rcu(rec, n, hhd, node) {
> +		if (rec->ip == ip)
> +			goto out;
> +	}
> +	rec = NULL;
> + out:
> +	local_irq_restore(flags);
> +
> +	return rec;
> +}

actually, this hash is a per function attribute, in disguise, 
indexed by addressed. Dont we have such a hash already: 
ftrace_func_hash?

Wouldnt it be better to refactor all this into a clean per function 
attribute hash thing. The profile counter can be allocated via 
percpu_alloc(), to keep it scalable.

> +
> +static void
> +function_profile_call(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct dyn_ftrace *rec;
> +	unsigned long flags;
> +
> +	if (!ftrace_profile_enabled)
> +		return;
> +
> +	local_irq_save(flags);
> +	rec = ftrace_find_profiled_func(ip);
> +	if (!rec)
> +		goto out;
> +
> +	rec->counter++;
> + out:
> +	local_irq_restore(flags);
> +}
> +
> +static struct ftrace_ops ftrace_profile_ops __read_mostly =
> +{
> +	.func = function_profile_call,

(use vertical space consistent with other places in this file 
please.)

> +};
> +
> +static ssize_t
> +ftrace_profile_write(struct file *filp, const char __user *ubuf,
> +		     size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val;
> +	char buf[64];
> +	int ret;
> +
> +	if (!ftrace_profile_hash) {
> +		pr_info("Can not enable hash due to earlier problems\n");
> +		return -ENODEV;
> +	}
> +
> +	if (cnt >= sizeof(buf))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&buf, ubuf, cnt))
> +		return -EFAULT;

cnt is always <= 64 ?

> +
> +	buf[cnt] = 0;
> +
> +	ret = strict_strtoul(buf, 10, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	val = !!val;
> +
> +	mutex_lock(&ftrace_profile_lock);
> +	if (ftrace_profile_enabled ^ val) {
> +		if (val) {
> +			ftrace_profile_reset();
> +			register_ftrace_function(&ftrace_profile_ops);
> +			ftrace_profile_enabled = 1;
> +		} else {
> +			ftrace_profile_enabled = 0;
> +			unregister_ftrace_function(&ftrace_profile_ops);
> +		}
> +	}
> +	mutex_unlock(&ftrace_profile_lock);
> +
> +	filp->f_pos += cnt;
> +
> +	return cnt;

these patterns of code are familar too. We do a lot of 
integer-option twiddlig. Helper-function potential?

> +}
> +
> +static const struct file_operations ftrace_profile_fops = {
> +	.open		= tracing_open_generic,
> +	.read		= ftrace_profile_read,
> +	.write		= ftrace_profile_write,
> +};
> +
> +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> +{
> +	struct dentry *entry;
> +	int ret;
> +
> +	ret = register_stat_tracer(&function_stats);
> +	if (ret) {
> +		pr_warning("Warning: could not register "
> +			   "function stats\n");
> +		return;
> +	}
> +
> +	entry = debugfs_create_file("function_profile_enabled", 0644,
> +				    d_tracer, NULL, &ftrace_profile_fops);
> +	if (!entry)
> +		pr_warning("Could not create debugfs "
> +			   "'function_profile_enabled' entry\n");

this is a reoccuring pattern. Add tracing_create_file() perhaps, 
with an implicit pr_warning()? Would clean up quite a lot of code.

> +}
> +
> +static void ftrace_add_profile(struct dyn_ftrace *rec)
> +{
> +	unsigned long key;
> +
> +	if (!ftrace_profile_hash)
> +		return;
> +
> +	key = hash_long(rec->ip, ftrace_profile_bits);
> +	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
> +}
> +
> +static void ftrace_profile_release(struct dyn_ftrace *rec)
> +{
> +	mutex_lock(&ftrace_profile_lock);
> +	hlist_del(&rec->node);
> +	mutex_unlock(&ftrace_profile_lock);
> +}
> +
> +#else /* CONFIG_FUNCTION_PROFILER */
> +static void ftrace_profile_init(int nr_funcs)
> +{
> +}
> +static void ftrace_add_profile(struct dyn_ftrace *rec)
> +{
> +}
> +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> +{
> +}
> +static void ftrace_profile_release(struct dyn_ftrace *rec)
> +{
> +}
> +#endif /* CONFIG_FUNCTION_PROFILER */
> +
>  #ifdef CONFIG_KPROBES
>  
>  static int frozen_record_count;
> @@ -359,8 +660,10 @@ void ftrace_release(void *start, unsigned long size)
>  	mutex_lock(&ftrace_lock);
>  	do_for_each_ftrace_rec(pg, rec) {
>  		if ((rec->ip >= s) && (rec->ip < e) &&
> -		    !(rec->flags & FTRACE_FL_FREE))
> +		    !(rec->flags & FTRACE_FL_FREE)) {
>  			ftrace_free_rec(rec);
> +			ftrace_profile_release(rec);
> +		}
>  	} while_for_each_ftrace_rec();
>  	mutex_unlock(&ftrace_lock);
>  }
> @@ -414,6 +717,8 @@ ftrace_record_ip(unsigned long ip)
>  	rec->newlist = ftrace_new_addrs;
>  	ftrace_new_addrs = rec;
>  
> +	ftrace_add_profile(rec);
> +
>  	return rec;
>  }
>  
> @@ -2157,6 +2462,8 @@ static __init int ftrace_init_dyn_debugfs(struct dentry *d_tracer)
>  			   "'set_graph_function' entry\n");
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>  
> +	ftrace_profile_debugfs(d_tracer);
> +
>  	return 0;
>  }
>  
> @@ -2225,6 +2532,8 @@ void __init ftrace_init(void)
>  	if (ret)
>  		goto failed;
>  
> +	ftrace_profile_init(count);
> +
>  	last_ftrace_enabled = ftrace_enabled = 1;
>  
>  	ret = ftrace_convert_nops(NULL,

Neat stuff ...

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25  3:56 ` [PATCH 2/5][RFC] tracing: move function profiler data out of function struct Steven Rostedt
@ 2009-03-25 10:07   ` Ingo Molnar
  2009-03-25 15:43     ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 10:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> +/* Interrupts must be disabled calling this */
> +static struct ftrace_profile *
> +ftrace_profile_alloc(unsigned long ip, bool alloc_safe)
> +{
> +	struct ftrace_profile *rec = NULL;
> +
> +	/* prevent recursion */
> +	if (atomic_inc_return(&__get_cpu_var(ftrace_profile_disable)) != 1)
> +		goto out;
> +
> +	__raw_spin_lock(&ftrace_profile_rec_lock);
> +
> +	/* Try to always keep another page available */
> +	if (!profile_pages->next && alloc_safe)
> +		profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);

this does not seem to be NMI safe.

This all would be solved much more robustly by the function 
attributes hash approach i suggested in the previous mail. If 
percpu_alloc() is done for 20,000 functions the memory allocation 
overhead is no big deal.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not
  2009-03-25  3:56 ` [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not Steven Rostedt
@ 2009-03-25 10:11   ` Ingo Molnar
  2009-03-25 17:35     ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 10:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -104,9 +105,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret)
>  	trace->calltime = current->ret_stack[index].calltime;
>  	trace->overrun = atomic_read(&current->trace_overrun);
>  	trace->depth = index;
> -	barrier();
> -	current->curr_ret_stack--;
> -
>  }
>  
>  /*
> @@ -121,6 +119,8 @@ unsigned long ftrace_return_to_handler(void)
>  	ftrace_pop_return_trace(&trace, &ret);
>  	trace.rettime = trace_clock_local();
>  	ftrace_graph_return(&trace);
> +	barrier();
> +	current->curr_ret_stack--;

this will modify the regular graph tracer too. No discussion of this 
change in the changelog - is it a fix?

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5][RFC] tracing: add function profiler
  2009-03-25 10:02   ` Ingo Molnar
@ 2009-03-25 15:42     ` Steven Rostedt
  2009-03-25 17:38       ` [PATCH][GIT PULL] tracing: clean up tracing profiler Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 15:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt




On Wed, 25 Mar 2009, Ingo Molnar wrote:

> 
> (a few minor nits - all delta-fixable.)
> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +#ifdef CONFIG_FUNCTION_PROFILER
> > +static struct hlist_head *ftrace_profile_hash;
> > +static int ftrace_profile_bits;
> > +static int ftrace_profile_enabled;
> 
> should be __read_mostly.

OK, agreed.

> 
> > +static DEFINE_MUTEX(ftrace_profile_lock);
> 
> should have a comment explaining its rules.

OK, will fix.

> 
> > +
> > +static void *
> > +function_stat_next(void *v, int idx)
> > +{
> > +	struct dyn_ftrace *rec = v;
> > +	struct ftrace_page *pg;
> > +
> > +	pg = (struct ftrace_page *)((unsigned long)rec & PAGE_MASK);
> > +
> > + again:
> > +	rec++;
> > +	if ((void *)rec >= (void *)&pg->records[pg->index]) {
> > +		pg = pg->next;
> > +		if (!pg)
> > +			return NULL;
> > +		rec = &pg->records[0];
> > +	}
> > +
> > +	if (rec->flags & FTRACE_FL_FREE ||
> > +	    rec->flags & FTRACE_FL_FAILED ||
> > +	    !(rec->flags & FTRACE_FL_CONVERTED) ||
> > +	    /* ignore non hit functions */
> > +	    !rec->counter)
> > +		goto again;
> > +
> > +	return rec;
> > +}
> > +
> > +static void *function_stat_start(struct tracer_stat *trace)
> > +{
> > +	return function_stat_next(&ftrace_pages_start->records[0], 0);
> > +}
> > +
> > +static int function_stat_cmp(void *p1, void *p2)
> > +{
> > +	struct dyn_ftrace *a = p1;
> > +	struct dyn_ftrace *b = p2;
> > +
> > +	if (a->counter < b->counter)
> > +		return -1;
> > +	if (a->counter > b->counter)
> > +		return 1;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static int function_stat_headers(struct seq_file *m)
> > +{
> > +	seq_printf(m, "  Function                               Hit\n"
> > +		      "  --------                               ---\n");
> > +	return 0;
> > +}
> > +
> > +static int function_stat_show(struct seq_file *m, void *v)
> > +{
> > +	struct dyn_ftrace *rec = v;
> > +	char str[KSYM_SYMBOL_LEN];
> > +
> > +	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
> > +
> > +	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
> > +	return 0;
> > +}
> > +
> > +static struct tracer_stat function_stats = {
> > +	.name = "functions",
> > +	.stat_start = function_stat_start,
> > +	.stat_next = function_stat_next,
> > +	.stat_cmp = function_stat_cmp,
> > +	.stat_headers = function_stat_headers,
> > +	.stat_show = function_stat_show
> > +};
> 
> vertical spaces please.

Oops, cut and paste error. I really like to copy sloppiness, don't I ;-)

> 
> > +
> > +static void ftrace_profile_init(int nr_funcs)
> > +{
> > +	unsigned long addr;
> > +	int order;
> > +	int size;
> > +
> > +	/*
> > +	 * We are profiling all functions, lets make it 1/4th of the
> > +	 * number of functions that are in core kernel. So we have to
> > +	 * iterate 4 times.
> > +	 */
> > +	order = (sizeof(struct hlist_head) * nr_funcs) / 4;
> > +	order = get_order(order);
> > +	size = 1 << (PAGE_SHIFT + order);
> > +
> > +	pr_info("Allocating %d KB for profiler hash\n", size >> 10);
> > +
> > +	addr = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> > +	if (!addr) {
> > +		pr_warning("Could not allocate function profiler hash\n");
> > +		return;
> > +	}
> > +
> > +	ftrace_profile_hash = (void *)addr;
> > +
> > +	/*
> > +	 * struct hlist_head should be a pointer of 4 or 8 bytes.
> > +	 * And a simple bit manipulation can be done, but if for
> > +	 * some reason struct hlist_head is not a mulitple of 2,
> > +	 * then we play it safe, and simply count. This function
> > +	 * is done once at boot up, so it is not that critical in
> > +	 * performance.
> > +	 */
> > +
> > +	size--;
> > +	size /= sizeof(struct hlist_head);
> > +
> > +	for (; size; size >>= 1)
> > +		ftrace_profile_bits++;
> > +
> > +	pr_info("Function profiler has %d hash buckets\n",
> > +		1 << ftrace_profile_bits);
> > +
> > +	return;
> > +}
> 
> stale return.

Hmm, I wonder if I originaly returned a value, and just cut it off when
I changed it. I'll fix it if it still exists later in the patches.

> 
> > +
> > +static ssize_t
> > +ftrace_profile_read(struct file *filp, char __user *ubuf,
> > +		     size_t cnt, loff_t *ppos)
> > +{
> > +	char buf[64];
> 
> Magic constant.
> 
> > +	int r;
> > +
> > +	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
> > +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> 
> What guarantees that 'cnt' cannot be larger than ~44?

Well, r returns the size of "%u\n", where I do not see it ever being 
bigger than 64. The 'r' is what keeps the buffer from reading more than 
64.

I can comment that "64" is large enough to hold a number?

> 
> > +}
> > +
> > +static void ftrace_profile_reset(void)
> > +{
> > +	struct dyn_ftrace *rec;
> > +	struct ftrace_page *pg;
> > +
> > +	do_for_each_ftrace_rec(pg, rec) {
> > +		rec->counter = 0;
> > +	} while_for_each_ftrace_rec();
> > +}
> > +
> > +static struct dyn_ftrace *ftrace_find_profiled_func(unsigned long ip)
> > +{
> > +	struct dyn_ftrace *rec;
> > +	struct hlist_head *hhd;
> > +	struct hlist_node *n;
> > +	unsigned long flags;
> > +	unsigned long key;
> > +
> > +	if (!ftrace_profile_hash)
> > +		return NULL;
> > +
> > +	key = hash_long(ip, ftrace_profile_bits);
> > +	hhd = &ftrace_profile_hash[key];
> > +
> > +	if (hlist_empty(hhd))
> > +		return NULL;
> > +
> > +	local_irq_save(flags);
> > +	hlist_for_each_entry_rcu(rec, n, hhd, node) {
> > +		if (rec->ip == ip)
> > +			goto out;
> > +	}
> > +	rec = NULL;
> > + out:
> > +	local_irq_restore(flags);
> > +
> > +	return rec;
> > +}
> 
> actually, this hash is a per function attribute, in disguise, 
> indexed by addressed. Dont we have such a hash already: 
> ftrace_func_hash?

We do have a smaller hash with the probe funcs. But it is much smaller
since it is based on users adding attributes, where as this hash tries
to save all functions being executed. I can look at seeing if the two
could be consolidated.

> 
> Wouldnt it be better to refactor all this into a clean per function 
> attribute hash thing. The profile counter can be allocated via 
> percpu_alloc(), to keep it scalable.

Later patches makes it per cpu.

> 
> > +
> > +static void
> > +function_profile_call(unsigned long ip, unsigned long parent_ip)
> > +{
> > +	struct dyn_ftrace *rec;
> > +	unsigned long flags;
> > +
> > +	if (!ftrace_profile_enabled)
> > +		return;
> > +
> > +	local_irq_save(flags);
> > +	rec = ftrace_find_profiled_func(ip);
> > +	if (!rec)
> > +		goto out;
> > +
> > +	rec->counter++;
> > + out:
> > +	local_irq_restore(flags);
> > +}
> > +
> > +static struct ftrace_ops ftrace_profile_ops __read_mostly =
> > +{
> > +	.func = function_profile_call,
> 
> (use vertical space consistent with other places in this file 
> please.)

OK.

> 
> > +};
> > +
> > +static ssize_t
> > +ftrace_profile_write(struct file *filp, const char __user *ubuf,
> > +		     size_t cnt, loff_t *ppos)
> > +{
> > +	unsigned long val;
> > +	char buf[64];
> > +	int ret;
> > +
> > +	if (!ftrace_profile_hash) {
> > +		pr_info("Can not enable hash due to earlier problems\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	if (cnt >= sizeof(buf))
> > +		return -EINVAL;
> > +
> > +	if (copy_from_user(&buf, ubuf, cnt))
> > +		return -EFAULT;
> 
> cnt is always <= 64 ?

The 'if (cnt >= sizeof(buf))' above makes that so.


> 
> > +
> > +	buf[cnt] = 0;
> > +
> > +	ret = strict_strtoul(buf, 10, &val);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	val = !!val;
> > +
> > +	mutex_lock(&ftrace_profile_lock);
> > +	if (ftrace_profile_enabled ^ val) {
> > +		if (val) {
> > +			ftrace_profile_reset();
> > +			register_ftrace_function(&ftrace_profile_ops);
> > +			ftrace_profile_enabled = 1;
> > +		} else {
> > +			ftrace_profile_enabled = 0;
> > +			unregister_ftrace_function(&ftrace_profile_ops);
> > +		}
> > +	}
> > +	mutex_unlock(&ftrace_profile_lock);
> > +
> > +	filp->f_pos += cnt;
> > +
> > +	return cnt;
> 
> these patterns of code are familar too. We do a lot of 
> integer-option twiddlig. Helper-function potential?

I'll have to look into that too.

> 
> > +}
> > +
> > +static const struct file_operations ftrace_profile_fops = {
> > +	.open		= tracing_open_generic,
> > +	.read		= ftrace_profile_read,
> > +	.write		= ftrace_profile_write,
> > +};
> > +
> > +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> > +{
> > +	struct dentry *entry;
> > +	int ret;
> > +
> > +	ret = register_stat_tracer(&function_stats);
> > +	if (ret) {
> > +		pr_warning("Warning: could not register "
> > +			   "function stats\n");
> > +		return;
> > +	}
> > +
> > +	entry = debugfs_create_file("function_profile_enabled", 0644,
> > +				    d_tracer, NULL, &ftrace_profile_fops);
> > +	if (!entry)
> > +		pr_warning("Could not create debugfs "
> > +			   "'function_profile_enabled' entry\n");
> 
> this is a reoccuring pattern. Add tracing_create_file() perhaps, 
> with an implicit pr_warning()? Would clean up quite a lot of code.

I had a patch to do that a while ago. It some how got lost in the shuffle.
I can write up a new one.

> 
> > +}
> > +
> > +static void ftrace_add_profile(struct dyn_ftrace *rec)
> > +{
> > +	unsigned long key;
> > +
> > +	if (!ftrace_profile_hash)
> > +		return;
> > +
> > +	key = hash_long(rec->ip, ftrace_profile_bits);
> > +	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
> > +}
> > +
> > +static void ftrace_profile_release(struct dyn_ftrace *rec)
> > +{
> > +	mutex_lock(&ftrace_profile_lock);
> > +	hlist_del(&rec->node);
> > +	mutex_unlock(&ftrace_profile_lock);
> > +}
> > +
> > +#else /* CONFIG_FUNCTION_PROFILER */
> > +static void ftrace_profile_init(int nr_funcs)
> > +{
> > +}
> > +static void ftrace_add_profile(struct dyn_ftrace *rec)
> > +{
> > +}
> > +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> > +{
> > +}
> > +static void ftrace_profile_release(struct dyn_ftrace *rec)
> > +{
> > +}
> > +#endif /* CONFIG_FUNCTION_PROFILER */
> > +
> >  #ifdef CONFIG_KPROBES
> >  
> >  static int frozen_record_count;
> > @@ -359,8 +660,10 @@ void ftrace_release(void *start, unsigned long size)
> >  	mutex_lock(&ftrace_lock);
> >  	do_for_each_ftrace_rec(pg, rec) {
> >  		if ((rec->ip >= s) && (rec->ip < e) &&
> > -		    !(rec->flags & FTRACE_FL_FREE))
> > +		    !(rec->flags & FTRACE_FL_FREE)) {
> >  			ftrace_free_rec(rec);
> > +			ftrace_profile_release(rec);
> > +		}
> >  	} while_for_each_ftrace_rec();
> >  	mutex_unlock(&ftrace_lock);
> >  }
> > @@ -414,6 +717,8 @@ ftrace_record_ip(unsigned long ip)
> >  	rec->newlist = ftrace_new_addrs;
> >  	ftrace_new_addrs = rec;
> >  
> > +	ftrace_add_profile(rec);
> > +
> >  	return rec;
> >  }
> >  
> > @@ -2157,6 +2462,8 @@ static __init int ftrace_init_dyn_debugfs(struct dentry *d_tracer)
> >  			   "'set_graph_function' entry\n");
> >  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> >  
> > +	ftrace_profile_debugfs(d_tracer);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -2225,6 +2532,8 @@ void __init ftrace_init(void)
> >  	if (ret)
> >  		goto failed;
> >  
> > +	ftrace_profile_init(count);
> > +
> >  	last_ftrace_enabled = ftrace_enabled = 1;
> >  
> >  	ret = ftrace_convert_nops(NULL,
> 
> Neat stuff ...

Thanks,

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 10:07   ` Ingo Molnar
@ 2009-03-25 15:43     ` Steven Rostedt
  2009-03-25 17:44       ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 15:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


On Wed, 25 Mar 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > +/* Interrupts must be disabled calling this */
> > +static struct ftrace_profile *
> > +ftrace_profile_alloc(unsigned long ip, bool alloc_safe)
> > +{
> > +	struct ftrace_profile *rec = NULL;
> > +
> > +	/* prevent recursion */
> > +	if (atomic_inc_return(&__get_cpu_var(ftrace_profile_disable)) != 1)
> > +		goto out;
> > +
> > +	__raw_spin_lock(&ftrace_profile_rec_lock);
> > +
> > +	/* Try to always keep another page available */
> > +	if (!profile_pages->next && alloc_safe)
> > +		profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
> 
> this does not seem to be NMI safe.

The atomic_inc_return protects against NMIs, since this is the only place 
the lock is taken.

> 
> This all would be solved much more robustly by the function 
> attributes hash approach i suggested in the previous mail. If 
> percpu_alloc() is done for 20,000 functions the memory allocation 
> overhead is no big deal.

Later patches create a per-cpu buffers and removes the lock.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not
  2009-03-25 10:11   ` Ingo Molnar
@ 2009-03-25 17:35     ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 17:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

On Wed, 25 Mar 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > --- a/kernel/trace/trace_functions_graph.c
> > +++ b/kernel/trace/trace_functions_graph.c
> > @@ -104,9 +105,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret)
> >  	trace->calltime = current->ret_stack[index].calltime;
> >  	trace->overrun = atomic_read(&current->trace_overrun);
> >  	trace->depth = index;
> > -	barrier();
> > -	current->curr_ret_stack--;
> > -
> >  }
> >  
> >  /*
> > @@ -121,6 +119,8 @@ unsigned long ftrace_return_to_handler(void)
> >  	ftrace_pop_return_trace(&trace, &ret);
> >  	trace.rettime = trace_clock_local();
> >  	ftrace_graph_return(&trace);
> > +	barrier();
> > +	current->curr_ret_stack--;
> 
> this will modify the regular graph tracer too. No discussion of this 
> change in the changelog - is it a fix?

Ah, sorry, I should have commented that.

This is to prevent a race in the profiler code. This is not a problem with 
current function graph code.

What happens is that the profiler references the ret_stack from the 
ftrace_graph_return function. If we update the curr_ret_stack and we take 
an interrupt then we run the risk of reading incorrect data. Once we 
decrement the curr_ret_stack, the ret_stack for the function is fair game 
to be used again by an interrupt.

Perhaps I should pull this change out and add it for core. This will let 
any function graph user safely reference the ret_stack.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH][GIT PULL] tracing: clean up tracing profiler
  2009-03-25 15:42     ` Steven Rostedt
@ 2009-03-25 17:38       ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 17:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


Ingo,

Please pull the latest rfc/tip/tracing/function-profiler tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
rfc/tip/tracing/function-profiler


Steven Rostedt (1):
      tracing: clean up tracing profiler

----
 kernel/trace/ftrace.c |   28 +++++++++++++++-------------
 1 files changed, 15 insertions(+), 13 deletions(-)
---------------------------
commit fb9fb015e92123fa3a8e0c2e2fff491d4a56b470
Author: Steven Rostedt <srostedt@redhat.com>
Date:   Wed Mar 25 13:26:41 2009 -0400

    tracing: clean up tracing profiler
    
    Ingo Molnar suggested clean ups for the profiling code. This patch
    makes those updates.
    
    Reported-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Steven Rostedt <srostedt@redhat.com>

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 71e5fae..a141d84 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -69,7 +69,7 @@ static DEFINE_MUTEX(ftrace_lock);
 
 static struct ftrace_ops ftrace_list_end __read_mostly =
 {
-	.func = ftrace_stub,
+	.func		= ftrace_stub,
 };
 
 static struct ftrace_ops *ftrace_list __read_mostly = &ftrace_list_end;
@@ -271,8 +271,10 @@ struct ftrace_profile_stat {
 #define PROFILES_PER_PAGE					\
 	(PROFILE_RECORDS_SIZE / sizeof(struct ftrace_profile))
 
-static int ftrace_profile_bits;
-static int ftrace_profile_enabled;
+static int ftrace_profile_bits __read_mostly;
+static int ftrace_profile_enabled __read_mostly;
+
+/* ftrace_profile_lock - synchronize the enable and disable of the profiler */
 static DEFINE_MUTEX(ftrace_profile_lock);
 
 static DEFINE_PER_CPU(struct ftrace_profile_stat, ftrace_profile_stats);
@@ -651,7 +653,7 @@ static void unregister_ftrace_profiler(void)
 #else
 static struct ftrace_ops ftrace_profile_ops __read_mostly =
 {
-	.func = function_profile_call,
+	.func		= function_profile_call,
 };
 
 static int register_ftrace_profiler(void)
@@ -670,7 +672,7 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
 	unsigned long val;
-	char buf[64];
+	char buf[64];		/* big enough to hold a number */
 	int ret;
 
 	if (cnt >= sizeof(buf))
@@ -719,7 +721,7 @@ static ssize_t
 ftrace_profile_read(struct file *filp, char __user *ubuf,
 		     size_t cnt, loff_t *ppos)
 {
-	char buf[64];
+	char buf[64];		/* big enough to hold a number */
 	int r;
 
 	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
@@ -734,12 +736,12 @@ static const struct file_operations ftrace_profile_fops = {
 
 /* used to initialize the real stat files */
 static struct tracer_stat function_stats __initdata = {
-	.name = "functions",
-	.stat_start = function_stat_start,
-	.stat_next = function_stat_next,
-	.stat_cmp = function_stat_cmp,
-	.stat_headers = function_stat_headers,
-	.stat_show = function_stat_show
+	.name		= "functions",
+	.stat_start	= function_stat_start,
+	.stat_next	= function_stat_next,
+	.stat_cmp	= function_stat_cmp,
+	.stat_headers	= function_stat_headers,
+	.stat_show	= function_stat_show
 };
 
 static void ftrace_profile_debugfs(struct dentry *d_tracer)
@@ -1954,7 +1956,7 @@ function_trace_probe_call(unsigned long ip, unsigned long parent_ip)
 
 static struct ftrace_ops trace_probe_ops __read_mostly =
 {
-	.func = function_trace_probe_call,
+	.func		= function_trace_probe_call,
 };
 
 static int ftrace_probe_registered;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 15:43     ` Steven Rostedt
@ 2009-03-25 17:44       ` Ingo Molnar
  2009-03-25 17:57         ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 17:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> On Wed, 25 Mar 2009, Ingo Molnar wrote:
> 
> > 
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > +/* Interrupts must be disabled calling this */
> > > +static struct ftrace_profile *
> > > +ftrace_profile_alloc(unsigned long ip, bool alloc_safe)
> > > +{
> > > +	struct ftrace_profile *rec = NULL;
> > > +
> > > +	/* prevent recursion */
> > > +	if (atomic_inc_return(&__get_cpu_var(ftrace_profile_disable)) != 1)
> > > +		goto out;
> > > +
> > > +	__raw_spin_lock(&ftrace_profile_rec_lock);
> > > +
> > > +	/* Try to always keep another page available */
> > > +	if (!profile_pages->next && alloc_safe)
> > > +		profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
> > 
> > this does not seem to be NMI safe.
> 
> The atomic_inc_return protects against NMIs, since this is the only place 
> the lock is taken.

i mean, if this code executes _ni_ an NMI. Or that cannot happen? We 
trace nmis too, dont we?

> > This all would be solved much more robustly by the function 
> > attributes hash approach i suggested in the previous mail. If 
> > percpu_alloc() is done for 20,000 functions the memory 
> > allocation overhead is no big deal.
> 
> Later patches create a per-cpu buffers and removes the lock.

ok :)

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 17:44       ` Ingo Molnar
@ 2009-03-25 17:57         ` Steven Rostedt
  2009-03-25 18:02           ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 17:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


On Wed, 25 Mar 2009, Ingo Molnar wrote:

> > The atomic_inc_return protects against NMIs, since this is the only place 
> > the lock is taken.
> 
> i mean, if this code executes _ni_ an NMI. Or that cannot happen? We 
> trace nmis too, dont we?

Yes, it can execute in an NMI, but we have no issues there. If the lock 
was taken the atomic inc is set. If an NMI comes in, it will fail the
atomic inc counter test and never try to take the lock. 

But as stated below, this is just beating the dead horse, because we can 
bend the lock. But it is impossible to bend the lock. What we must realize 
is... "There is no lock". ;-)

-- Steve


> 
> > > This all would be solved much more robustly by the function 
> > > attributes hash approach i suggested in the previous mail. If 
> > > percpu_alloc() is done for 20,000 functions the memory 
> > > allocation overhead is no big deal.
> > 
> > Later patches create a per-cpu buffers and removes the lock.
> 
> ok :)
> 
> 	Ingo
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 17:57         ` Steven Rostedt
@ 2009-03-25 18:02           ` Ingo Molnar
  2009-03-25 18:16             ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 18:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> On Wed, 25 Mar 2009, Ingo Molnar wrote:
> 
> > > The atomic_inc_return protects against NMIs, since this is the only place 
> > > the lock is taken.
> > 
> > i mean, if this code executes _ni_ an NMI. Or that cannot happen? We 
> > trace nmis too, dont we?
> 
> Yes, it can execute in an NMI, but we have no issues there. If the 
> lock was taken the atomic inc is set. If an NMI comes in, it will 
> fail the atomic inc counter test and never try to take the lock.

You are misunderstanding me. Can this bit:

> > +   if (!profile_pages->next && alloc_safe)
> > +           profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);

run in NMI context?

or is alloc_safe false in that case? Even if it's safe it looks 
really messy ...

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 18:02           ` Ingo Molnar
@ 2009-03-25 18:16             ` Steven Rostedt
  2009-03-25 18:35               ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 18:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt

On Wed, 25 Mar 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > 
> > On Wed, 25 Mar 2009, Ingo Molnar wrote:
> > 
> > > > The atomic_inc_return protects against NMIs, since this is the only place 
> > > > the lock is taken.
> > > 
> > > i mean, if this code executes _ni_ an NMI. Or that cannot happen? We 
> > > trace nmis too, dont we?
> > 
> > Yes, it can execute in an NMI, but we have no issues there. If the 
> > lock was taken the atomic inc is set. If an NMI comes in, it will 
> > fail the atomic inc counter test and never try to take the lock.
> 
> You are misunderstanding me. Can this bit:

Ah thanks,

> 
> > > +   if (!profile_pages->next && alloc_safe)
> > > +           profile_pages->next = (void *)get_zeroed_page(GFP_ATOMIC);
> 
> run in NMI context?
> 
> or is alloc_safe false in that case? Even if it's safe it looks 
> really messy ...

alloc_safe is true only if interrupts were enabled, preemption is enabled 
and we are not in an interrupt or softirq.

So, it should fail when in NMI.

But I agree, it is messy, and I never liked it. I was trying to save 
space, but the space saving is probably not worth the messy code.

For every recorded function we have:

struct ftrace_profile {
	struct hlist_node	*node;
	unsigned long		ip;
	unsigned long		hit;
	unsigned long long	calltime;
};

on x86_64 that is 40 bytes. (node is 16, ip is 8, hit is 8, calltime is 8)

Thus, a normal distro kernel has 20,000 functions, and if we were to 
allocate for every function per online cpu, we have:

 20K * 40 = 800K per online CPU.

But we allocate only when we use it. Though it must be warned, because 
when we use it we must always be able to reference it, that memory stays 
allocated. If this becomes an issue, I can add a way to clean the results 
and free the memory.

I guess 800K per online CPU is not worth the messy code. I'll write up a 
patch to fix that.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5][RFC] tracing: adding function timings to function profiler
  2009-03-25  3:56 ` [PATCH 3/5][RFC] tracing: adding function timings to function profiler Steven Rostedt
@ 2009-03-25 18:30   ` Frederic Weisbecker
  0 siblings, 0 replies; 25+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 18:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Tim Bird, Steven Rostedt

On Tue, Mar 24, 2009 at 11:56:49PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> If the function graph trace is enabled, the function profiler will
> use it to take the timing of the functions.
> 
>  cat /debug/tracing/trace_stat/functions
> 
>   Function                               Hit    Time
>   --------                               ---    ----
>   mwait_idle                             127    183028.4 us
>   schedule                                26    151997.7 us
>   __schedule                              31    151975.1 us
>   sys_wait4                                2    74080.53 us
>   do_wait                                  2    74077.80 us
>   sys_newlstat                           138    39929.16 us
>   do_path_lookup                         179    39845.79 us
>   vfs_lstat_fd                           138    39761.97 us
>   user_path_at                           153    39469.58 us
>   path_walk                              179    39435.76 us
>   __link_path_walk                       189    39143.73 us
> [...]



Wow, that's great!


 
> Note the times are skewed due to the function graph tracer not taking
> into account schedules.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  kernel/trace/ftrace.c                |   93 ++++++++++++++++++++++++++++++++--
>  kernel/trace/trace.c                 |   11 ----
>  kernel/trace/trace.h                 |    3 +-
>  kernel/trace/trace_functions_graph.c |   17 +++++-
>  kernel/trace/trace_output.c          |   10 ++++
>  kernel/trace/trace_output.h          |    2 +
>  6 files changed, 117 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 24dac44..a9ccd71 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -33,7 +33,7 @@
>  
>  #include <asm/ftrace.h>
>  
> -#include "trace.h"
> +#include "trace_output.h"
>  #include "trace_stat.h"
>  
>  #define FTRACE_WARN_ON(cond)			\
> @@ -246,6 +246,9 @@ struct ftrace_profile {
>  	struct hlist_node		node;
>  	unsigned long			ip;
>  	unsigned long			counter;
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	unsigned long long		time;
> +#endif
>  };
>  
>  struct ftrace_profile_page {
> @@ -303,6 +306,22 @@ static void *function_stat_start(struct tracer_stat *trace)
>  	return function_stat_next(&profile_pages_start->records[0], 0);
>  }
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +/* function graph compares on total time */
> +static int function_stat_cmp(void *p1, void *p2)
> +{
> +	struct ftrace_profile *a = p1;
> +	struct ftrace_profile *b = p2;
> +
> +	if (a->time < b->time)
> +		return -1;
> +	if (a->time > b->time)
> +		return 1;
> +	else
> +		return 0;
> +}
> +#else
> +/* not function graph compares against hits */
>  static int function_stat_cmp(void *p1, void *p2)
>  {
>  	struct ftrace_profile *a = p1;
> @@ -315,11 +334,17 @@ static int function_stat_cmp(void *p1, void *p2)
>  	else
>  		return 0;
>  }
> +#endif
>  
>  static int function_stat_headers(struct seq_file *m)
>  {
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	seq_printf(m, "  Function                               Hit    Time\n"
> +		      "  --------                               ---    ----\n");
> +#else
>  	seq_printf(m, "  Function                               Hit\n"
>  		      "  --------                               ---\n");
> +#endif
>  	return 0;
>  }
>  
> @@ -327,10 +352,25 @@ static int function_stat_show(struct seq_file *m, void *v)
>  {
>  	struct ftrace_profile *rec = v;
>  	char str[KSYM_SYMBOL_LEN];
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	static struct trace_seq s;
> +	static DEFINE_MUTEX(mutex);


I must confess I have some troubles to find what is protecting this mutex.
Usually I find them self-explained by their name if they are global or
by their situation inside a struture. But there, it's a bit difficult.

After a small delay reading: Ah, the static seq, sorry :-)


Thanks,

Frederic.


> +
> +	mutex_lock(&mutex);
> +	trace_seq_init(&s);
> +	trace_print_graph_duration(rec->time, &s);
> +#endif
>  
>  	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
> +	seq_printf(m, "  %-30.30s  %10lu", str, rec->counter);
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +	seq_printf(m, "    ");
> +	trace_print_seq(m, &s);
> +	mutex_unlock(&mutex);
> +#endif
> +	seq_putc(m, '\n');
>  
> -	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
>  	return 0;
>  }
>  
> @@ -534,11 +574,52 @@ function_profile_call(unsigned long ip, unsigned long parent_ip)
>  	local_irq_restore(flags);
>  }
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +static int profile_graph_entry(struct ftrace_graph_ent *trace)
> +{
> +	function_profile_call(trace->func, 0);
> +	return 1;
> +}
> +
> +static void profile_graph_return(struct ftrace_graph_ret *trace)
> +{
> +	unsigned long flags;
> +	struct ftrace_profile *rec;
> +
> +	local_irq_save(flags);
> +	rec = ftrace_find_profiled_func(trace->func);
> +	if (rec)
> +		rec->time += trace->rettime - trace->calltime;
> +	local_irq_restore(flags);
> +}
> +
> +static int register_ftrace_profiler(void)
> +{
> +	return register_ftrace_graph(&profile_graph_return,
> +				     &profile_graph_entry);
> +}
> +
> +static void unregister_ftrace_profiler(void)
> +{
> +	unregister_ftrace_graph();
> +}
> +#else
>  static struct ftrace_ops ftrace_profile_ops __read_mostly =
>  {
>  	.func = function_profile_call,
>  };
>  
> +static int register_ftrace_profiler(void)
> +{
> +	return register_ftrace_function(&ftrace_profile_ops);
> +}
> +
> +static void unregister_ftrace_profiler(void)
> +{
> +	unregister_ftrace_function(&ftrace_profile_ops);
> +}
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
>  static ssize_t
>  ftrace_profile_write(struct file *filp, const char __user *ubuf,
>  		     size_t cnt, loff_t *ppos)
> @@ -570,11 +651,15 @@ ftrace_profile_write(struct file *filp, const char __user *ubuf,
>  				goto out;
>  			}
>  
> -			register_ftrace_function(&ftrace_profile_ops);
> +			ret = register_ftrace_profiler();
> +			if (ret < 0) {
> +				cnt = ret;
> +				goto out;
> +			}
>  			ftrace_profile_enabled = 1;
>  		} else {
>  			ftrace_profile_enabled = 0;
> -			unregister_ftrace_function(&ftrace_profile_ops);
> +			unregister_ftrace_profiler();
>  		}
>  	}
>   out:
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 67c6a21..821bf49 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -402,17 +402,6 @@ static ssize_t trace_seq_to_buffer(struct trace_seq *s, void *buf, size_t cnt)
>  	return cnt;
>  }
>  
> -static void
> -trace_print_seq(struct seq_file *m, struct trace_seq *s)
> -{
> -	int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
> -
> -	s->buffer[len] = 0;
> -	seq_puts(m, s->buffer);
> -
> -	trace_seq_init(s);
> -}
> -
>  /**
>   * update_max_tr - snapshot all trace buffers from global_trace to max_tr
>   * @tr: tracer
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index d7410bb..c66ca3b 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -605,6 +605,8 @@ extern unsigned long trace_flags;
>  /* Standard output formatting function used for function return traces */
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  extern enum print_line_t print_graph_function(struct trace_iterator *iter);
> +extern enum print_line_t
> +trace_print_graph_duration(unsigned long long duration, struct trace_seq *s);
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  /* TODO: make this variable */
> @@ -636,7 +638,6 @@ static inline int ftrace_graph_addr(unsigned long addr)
>  	return 1;
>  }
>  #endif /* CONFIG_DYNAMIC_FTRACE */
> -
>  #else /* CONFIG_FUNCTION_GRAPH_TRACER */
>  static inline enum print_line_t
>  print_graph_function(struct trace_iterator *iter)
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index d28687e..85bba0f 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -426,8 +426,8 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
>  	return TRACE_TYPE_HANDLED;
>  }
>  
> -static enum print_line_t
> -print_graph_duration(unsigned long long duration, struct trace_seq *s)
> +enum print_line_t
> +trace_print_graph_duration(unsigned long long duration, struct trace_seq *s)
>  {
>  	unsigned long nsecs_rem = do_div(duration, 1000);
>  	/* log10(ULONG_MAX) + '\0' */
> @@ -464,12 +464,23 @@ print_graph_duration(unsigned long long duration, struct trace_seq *s)
>  		if (!ret)
>  			return TRACE_TYPE_PARTIAL_LINE;
>  	}
> +	return TRACE_TYPE_HANDLED;
> +}
> +
> +static enum print_line_t
> +print_graph_duration(unsigned long long duration, struct trace_seq *s)
> +{
> +	int ret;
> +
> +	ret = trace_print_graph_duration(duration, s);
> +	if (ret != TRACE_TYPE_HANDLED)
> +		return ret;
>  
>  	ret = trace_seq_printf(s, "|  ");
>  	if (!ret)
>  		return TRACE_TYPE_PARTIAL_LINE;
> -	return TRACE_TYPE_HANDLED;
>  
> +	return TRACE_TYPE_HANDLED;
>  }
>  
>  /* Case of a leaf function on its call entry */
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 19261fd..a3b6e3f 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -19,6 +19,16 @@ static struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly;
>  
>  static int next_event_type = __TRACE_LAST_TYPE + 1;
>  
> +void trace_print_seq(struct seq_file *m, struct trace_seq *s)
> +{
> +	int len = s->len >= PAGE_SIZE ? PAGE_SIZE - 1 : s->len;
> +
> +	s->buffer[len] = 0;
> +	seq_puts(m, s->buffer);
> +
> +	trace_seq_init(s);
> +}
> +
>  enum print_line_t trace_print_bprintk_msg_only(struct trace_iterator *iter)
>  {
>  	struct trace_seq *s = &iter->seq;
> diff --git a/kernel/trace/trace_output.h b/kernel/trace/trace_output.h
> index 35c422f..1eac297 100644
> --- a/kernel/trace/trace_output.h
> +++ b/kernel/trace/trace_output.h
> @@ -20,6 +20,8 @@ trace_print_bprintk_msg_only(struct trace_iterator *iter);
>  extern enum print_line_t
>  trace_print_printk_msg_only(struct trace_iterator *iter);
>  
> +extern void trace_print_seq(struct seq_file *m, struct trace_seq *s);
> +
>  extern int trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
>  	__attribute__ ((format (printf, 2, 3)));
>  extern int
> -- 
> 1.6.2
> 
> -- 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 18:16             ` Steven Rostedt
@ 2009-03-25 18:35               ` Ingo Molnar
  2009-03-25 18:40                 ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2009-03-25 18:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


* Steven Rostedt <rostedt@goodmis.org> wrote:

> struct ftrace_profile {
> 	struct hlist_node	*node;
> 	unsigned long		ip;
> 	unsigned long		hit;
> 	unsigned long long	calltime;
> };
> 
> on x86_64 that is 40 bytes. (node is 16, ip is 8, hit is 8, calltime is 8)

node is a pointer so it's 8 bytes - so the whole thing is 32 bytes. 
Acceptable IMO - and could be compressed further if needed.

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 18:35               ` Ingo Molnar
@ 2009-03-25 18:40                 ` Steven Rostedt
  2009-03-25 18:46                   ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 18:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


On Wed, 25 Mar 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > struct ftrace_profile {
> > 	struct hlist_node	*node;
> > 	unsigned long		ip;
> > 	unsigned long		hit;
> > 	unsigned long long	calltime;
> > };
> > 
> > on x86_64 that is 40 bytes. (node is 16, ip is 8, hit is 8, calltime is 8)
> 
> node is a pointer so it's 8 bytes - so the whole thing is 32 bytes. 
> Acceptable IMO - and could be compressed further if needed.

that was me doing the struct from memory. Because I had already switched 
out of that branch :-/

It is really a struct hlist_node node; not a pointer. My mistake.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5][RFC] tracing: move function profiler data out of function struct
  2009-03-25 18:40                 ` Steven Rostedt
@ 2009-03-25 18:46                   ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 18:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Frederic Weisbecker, Tim Bird, Steven Rostedt


On Wed, 25 Mar 2009, Steven Rostedt wrote:
> 
> that was me doing the struct from memory. Because I had already switched 
> out of that branch :-/

Note to self (and others)...

 git show <branch>:<file>

works for things like this.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5][RFC] tracing: add function profiler
  2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
  2009-03-25 10:02   ` Ingo Molnar
@ 2009-03-25 19:11   ` Frederic Weisbecker
  2009-03-25 19:31     ` Steven Rostedt
  1 sibling, 1 reply; 25+ messages in thread
From: Frederic Weisbecker @ 2009-03-25 19:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Tim Bird, Steven Rostedt

On Tue, Mar 24, 2009 at 11:56:47PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> Impact: new profiling feature
> 
> This patch adds a function profiler. In debugfs/tracing/ two new
> files are created.
> 
>   function_profile_enabled  - to enable or disable profiling
> 
>   trace_stat/functions   - the profiled functions.
> 
> For example:
> 
>   echo 1 > /debugfs/tracing/function_profile_enabled
>   ./hackbench 50
>   echo 0 > /debugfs/tracing/function_profile_enabled
> 
> yields:
> 
>   cat /debugfs/tracing/trace_stat/functions
> 
>   Function                               Hit
>   --------                               ---
>   _spin_lock                        10106442
>   _spin_unlock                      10097492
>   kfree                              6013704
>   _spin_unlock_irqrestore            4423941
>   _spin_lock_irqsave                 4406825
>   __phys_addr                        4181686
>   __slab_free                        4038222
>   dput                               4030130
>   path_put                           4023387
>   unroll_tree_refs                   4019532
> [...]
> 
> The most hit functions are listed first. Functions that are not
> hit are not listed.
> 
> This feature depends on and uses dynamic function tracing. When the
> function profiling is disabled, no overhead occurs. But it still
> takes up around 300KB to hold the data, thus it is not recomended
> to keep it enabled for systems low on memory.
> 
> When a '1' is echoed into the function_profile_enabled file, the
> counters for is function is reset back to zero. Thus you can see what
> functions are hit most by different programs.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  include/linux/ftrace.h |    4 +
>  kernel/trace/Kconfig   |   19 +++
>  kernel/trace/ftrace.c  |  313 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 334 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 015a3d2..0456c3a 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -153,6 +153,10 @@ struct dyn_ftrace {
>  		unsigned long		flags;
>  		struct dyn_ftrace	*newlist;
>  	};
> +#ifdef CONFIG_FUNCTION_PROFILER
> +	unsigned long			counter;
> +	struct hlist_node		node;
> +#endif
>  	struct dyn_arch_ftrace		arch;
>  };
>  
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 8a4d729..95e9ad5 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -105,6 +105,7 @@ config FUNCTION_GRAPH_TRACER
>  	  This is done by setting the current return address on the current
>  	  task structure into a stack of calls.
>  
> +
>  config IRQSOFF_TRACER
>  	bool "Interrupts-off Latency Tracer"
>  	default n
> @@ -376,6 +377,24 @@ config DYNAMIC_FTRACE
>  	 were made. If so, it runs stop_machine (stops all CPUS)
>  	 and modifies the code to jump over the call to ftrace.
>  
> +config FUNCTION_PROFILER
> +	bool "Kernel function profiler"
> +	depends on DYNAMIC_FTRACE
> +	default n
> +	help
> +	 This option enables the kernel function profiler. When the dynamic
> +	 function tracing is enabled, a counter is added into the function
> +	 records used by the dynamic function tracer. A file is created in
> +	 debugfs called function_profile_enabled which defaults to zero.
> +	 When a 1 is echoed into this file profiling begins, and when a
> +	 zero is entered, profiling stops. A file in the trace_stats
> +	 directory called functions, that show the list of functions that
> +	 have been hit and their counters.
> +
> +	 This takes up around 320K more memory.
> +
> +	 If in doubt, say N
> +
>  config FTRACE_MCOUNT_RECORD
>  	def_bool y
>  	depends on DYNAMIC_FTRACE
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 7b8722b..11f364c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -34,6 +34,7 @@
>  #include <asm/ftrace.h>
>  
>  #include "trace.h"
> +#include "trace_stat.h"
>  
>  #define FTRACE_WARN_ON(cond)			\
>  	do {					\
> @@ -261,7 +262,6 @@ struct ftrace_func_probe {
>  	struct rcu_head		rcu;
>  };
>  
> -
>  enum {
>  	FTRACE_ENABLE_CALLS		= (1 << 0),
>  	FTRACE_DISABLE_CALLS		= (1 << 1),
> @@ -309,6 +309,307 @@ static struct dyn_ftrace *ftrace_free_records;
>  		}				\
>  	}
>  
> +#ifdef CONFIG_FUNCTION_PROFILER
> +static struct hlist_head *ftrace_profile_hash;
> +static int ftrace_profile_bits;
> +static int ftrace_profile_enabled;
> +static DEFINE_MUTEX(ftrace_profile_lock);
> +
> +static void *
> +function_stat_next(void *v, int idx)
> +{
> +	struct dyn_ftrace *rec = v;
> +	struct ftrace_page *pg;
> +
> +	pg = (struct ftrace_page *)((unsigned long)rec & PAGE_MASK);
> +
> + again:
> +	rec++;
> +	if ((void *)rec >= (void *)&pg->records[pg->index]) {
> +		pg = pg->next;
> +		if (!pg)
> +			return NULL;
> +		rec = &pg->records[0];
> +	}
> +
> +	if (rec->flags & FTRACE_FL_FREE ||
> +	    rec->flags & FTRACE_FL_FAILED ||
> +	    !(rec->flags & FTRACE_FL_CONVERTED) ||
> +	    /* ignore non hit functions */
> +	    !rec->counter)
> +		goto again;
> +
> +	return rec;
> +}
> +
> +static void *function_stat_start(struct tracer_stat *trace)
> +{
> +	return function_stat_next(&ftrace_pages_start->records[0], 0);
> +}
> +
> +static int function_stat_cmp(void *p1, void *p2)
> +{
> +	struct dyn_ftrace *a = p1;
> +	struct dyn_ftrace *b = p2;
> +
> +	if (a->counter < b->counter)
> +		return -1;
> +	if (a->counter > b->counter)
> +		return 1;
> +	else
> +		return 0;
> +}
> +
> +static int function_stat_headers(struct seq_file *m)
> +{
> +	seq_printf(m, "  Function                               Hit\n"
> +		      "  --------                               ---\n");
> +	return 0;
> +}
> +
> +static int function_stat_show(struct seq_file *m, void *v)
> +{
> +	struct dyn_ftrace *rec = v;
> +	char str[KSYM_SYMBOL_LEN];
> +
> +	kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
> +
> +	seq_printf(m, "  %-30.30s  %10lu\n", str, rec->counter);
> +	return 0;
> +}
> +
> +static struct tracer_stat function_stats = {
> +	.name = "functions",
> +	.stat_start = function_stat_start,
> +	.stat_next = function_stat_next,
> +	.stat_cmp = function_stat_cmp,
> +	.stat_headers = function_stat_headers,
> +	.stat_show = function_stat_show
> +};
> +
> +static void ftrace_profile_init(int nr_funcs)
> +{
> +	unsigned long addr;
> +	int order;
> +	int size;
> +
> +	/*
> +	 * We are profiling all functions, lets make it 1/4th of the
> +	 * number of functions that are in core kernel. So we have to
> +	 * iterate 4 times.
> +	 */
> +	order = (sizeof(struct hlist_head) * nr_funcs) / 4;
> +	order = get_order(order);
> +	size = 1 << (PAGE_SHIFT + order);
> +
> +	pr_info("Allocating %d KB for profiler hash\n", size >> 10);
> +
> +	addr = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	if (!addr) {
> +		pr_warning("Could not allocate function profiler hash\n");
> +		return;
> +	}
> +
> +	ftrace_profile_hash = (void *)addr;
> +
> +	/*
> +	 * struct hlist_head should be a pointer of 4 or 8 bytes.
> +	 * And a simple bit manipulation can be done, but if for
> +	 * some reason struct hlist_head is not a mulitple of 2,
> +	 * then we play it safe, and simply count. This function
> +	 * is done once at boot up, so it is not that critical in
> +	 * performance.
> +	 */
> +
> +	size--;
> +	size /= sizeof(struct hlist_head);
> +
> +	for (; size; size >>= 1)
> +		ftrace_profile_bits++;
> +
> +	pr_info("Function profiler has %d hash buckets\n",
> +		1 << ftrace_profile_bits);
> +
> +	return;
> +}
> +
> +static ssize_t
> +ftrace_profile_read(struct file *filp, char __user *ubuf,
> +		     size_t cnt, loff_t *ppos)
> +{
> +	char buf[64];
> +	int r;
> +
> +	r = sprintf(buf, "%u\n", ftrace_profile_enabled);
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +}
> +
> +static void ftrace_profile_reset(void)
> +{
> +	struct dyn_ftrace *rec;
> +	struct ftrace_page *pg;
> +
> +	do_for_each_ftrace_rec(pg, rec) {
> +		rec->counter = 0;
> +	} while_for_each_ftrace_rec();
> +}
> +
> +static struct dyn_ftrace *ftrace_find_profiled_func(unsigned long ip)
> +{
> +	struct dyn_ftrace *rec;
> +	struct hlist_head *hhd;
> +	struct hlist_node *n;
> +	unsigned long flags;
> +	unsigned long key;
> +
> +	if (!ftrace_profile_hash)
> +		return NULL;
> +
> +	key = hash_long(ip, ftrace_profile_bits);
> +	hhd = &ftrace_profile_hash[key];
> +
> +	if (hlist_empty(hhd))
> +		return NULL;
> +
> +	local_irq_save(flags);
> +	hlist_for_each_entry_rcu(rec, n, hhd, node) {
> +		if (rec->ip == ip)
> +			goto out;
> +	}
> +	rec = NULL;
> + out:
> +	local_irq_restore(flags);
> +
> +	return rec;
> +}
> +
> +static void
> +function_profile_call(unsigned long ip, unsigned long parent_ip)
> +{
> +	struct dyn_ftrace *rec;
> +	unsigned long flags;
> +
> +	if (!ftrace_profile_enabled)
> +		return;
> +
> +	local_irq_save(flags);
> +	rec = ftrace_find_profiled_func(ip);
> +	if (!rec)
> +		goto out;
> +
> +	rec->counter++;
> + out:
> +	local_irq_restore(flags);
> +}
> +
> +static struct ftrace_ops ftrace_profile_ops __read_mostly =
> +{
> +	.func = function_profile_call,
> +};
> +
> +static ssize_t
> +ftrace_profile_write(struct file *filp, const char __user *ubuf,
> +		     size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val;
> +	char buf[64];
> +	int ret;
> +
> +	if (!ftrace_profile_hash) {
> +		pr_info("Can not enable hash due to earlier problems\n");
> +		return -ENODEV;
> +	}
> +
> +	if (cnt >= sizeof(buf))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&buf, ubuf, cnt))
> +		return -EFAULT;
> +
> +	buf[cnt] = 0;
> +
> +	ret = strict_strtoul(buf, 10, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	val = !!val;
> +
> +	mutex_lock(&ftrace_profile_lock);
> +	if (ftrace_profile_enabled ^ val) {
> +		if (val) {
> +			ftrace_profile_reset();
> +			register_ftrace_function(&ftrace_profile_ops);
> +			ftrace_profile_enabled = 1;
> +		} else {
> +			ftrace_profile_enabled = 0;
> +			unregister_ftrace_function(&ftrace_profile_ops);
> +		}
> +	}
> +	mutex_unlock(&ftrace_profile_lock);
> +
> +	filp->f_pos += cnt;
> +
> +	return cnt;
> +}
> +
> +static const struct file_operations ftrace_profile_fops = {
> +	.open		= tracing_open_generic,
> +	.read		= ftrace_profile_read,
> +	.write		= ftrace_profile_write,
> +};


I should really provide such helper per stat tracer.

Say, something like:

/debug/tracing/trace_stat/name/enable

Thanks,
Frederic.


> +
> +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> +{
> +	struct dentry *entry;
> +	int ret;
> +
> +	ret = register_stat_tracer(&function_stats);
> +	if (ret) {
> +		pr_warning("Warning: could not register "
> +			   "function stats\n");
> +		return;
> +	}
> +
> +	entry = debugfs_create_file("function_profile_enabled", 0644,
> +				    d_tracer, NULL, &ftrace_profile_fops);
> +	if (!entry)
> +		pr_warning("Could not create debugfs "
> +			   "'function_profile_enabled' entry\n");
> +}
> +
> +static void ftrace_add_profile(struct dyn_ftrace *rec)
> +{
> +	unsigned long key;
> +
> +	if (!ftrace_profile_hash)
> +		return;
> +
> +	key = hash_long(rec->ip, ftrace_profile_bits);
> +	hlist_add_head_rcu(&rec->node, &ftrace_profile_hash[key]);
> +}
> +
> +static void ftrace_profile_release(struct dyn_ftrace *rec)
> +{
> +	mutex_lock(&ftrace_profile_lock);
> +	hlist_del(&rec->node);
> +	mutex_unlock(&ftrace_profile_lock);
> +}
> +
> +#else /* CONFIG_FUNCTION_PROFILER */
> +static void ftrace_profile_init(int nr_funcs)
> +{
> +}
> +static void ftrace_add_profile(struct dyn_ftrace *rec)
> +{
> +}
> +static void ftrace_profile_debugfs(struct dentry *d_tracer)
> +{
> +}
> +static void ftrace_profile_release(struct dyn_ftrace *rec)
> +{
> +}
> +#endif /* CONFIG_FUNCTION_PROFILER */
> +
>  #ifdef CONFIG_KPROBES
>  
>  static int frozen_record_count;
> @@ -359,8 +660,10 @@ void ftrace_release(void *start, unsigned long size)
>  	mutex_lock(&ftrace_lock);
>  	do_for_each_ftrace_rec(pg, rec) {
>  		if ((rec->ip >= s) && (rec->ip < e) &&
> -		    !(rec->flags & FTRACE_FL_FREE))
> +		    !(rec->flags & FTRACE_FL_FREE)) {
>  			ftrace_free_rec(rec);
> +			ftrace_profile_release(rec);
> +		}
>  	} while_for_each_ftrace_rec();
>  	mutex_unlock(&ftrace_lock);
>  }
> @@ -414,6 +717,8 @@ ftrace_record_ip(unsigned long ip)
>  	rec->newlist = ftrace_new_addrs;
>  	ftrace_new_addrs = rec;
>  
> +	ftrace_add_profile(rec);
> +
>  	return rec;
>  }
>  
> @@ -2157,6 +2462,8 @@ static __init int ftrace_init_dyn_debugfs(struct dentry *d_tracer)
>  			   "'set_graph_function' entry\n");
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
>  
> +	ftrace_profile_debugfs(d_tracer);
> +
>  	return 0;
>  }
>  
> @@ -2225,6 +2532,8 @@ void __init ftrace_init(void)
>  	if (ret)
>  		goto failed;
>  
> +	ftrace_profile_init(count);
> +
>  	last_ftrace_enabled = ftrace_enabled = 1;
>  
>  	ret = ftrace_convert_nops(NULL,
> -- 
> 1.6.2
> 
> -- 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5][RFC] tracing: add function profiler
  2009-03-25 19:11   ` [PATCH 1/5][RFC] tracing: add function profiler Frederic Weisbecker
@ 2009-03-25 19:31     ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2009-03-25 19:31 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Tim Bird, Steven Rostedt


On Wed, 25 Mar 2009, Frederic Weisbecker wrote:
> > +
> > +static const struct file_operations ftrace_profile_fops = {
> > +	.open		= tracing_open_generic,
> > +	.read		= ftrace_profile_read,
> > +	.write		= ftrace_profile_write,
> > +};
> 
> 
> I should really provide such helper per stat tracer.
> 
> Say, something like:
> 
> /debug/tracing/trace_stat/name/enable
> 

No, because if you notice in the later patches, I add a per cpu tracing. 
This is standard operations, and adding too much will unnecessarily limit 
what we can do.

-- Steve


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2009-03-25 19:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-25  3:56 [PATCH 0/5][RFC] [RFC] function profiler Steven Rostedt
2009-03-25  3:56 ` [PATCH 1/5][RFC] tracing: add " Steven Rostedt
2009-03-25 10:02   ` Ingo Molnar
2009-03-25 15:42     ` Steven Rostedt
2009-03-25 17:38       ` [PATCH][GIT PULL] tracing: clean up tracing profiler Steven Rostedt
2009-03-25 19:11   ` [PATCH 1/5][RFC] tracing: add function profiler Frederic Weisbecker
2009-03-25 19:31     ` Steven Rostedt
2009-03-25  3:56 ` [PATCH 2/5][RFC] tracing: move function profiler data out of function struct Steven Rostedt
2009-03-25 10:07   ` Ingo Molnar
2009-03-25 15:43     ` Steven Rostedt
2009-03-25 17:44       ` Ingo Molnar
2009-03-25 17:57         ` Steven Rostedt
2009-03-25 18:02           ` Ingo Molnar
2009-03-25 18:16             ` Steven Rostedt
2009-03-25 18:35               ` Ingo Molnar
2009-03-25 18:40                 ` Steven Rostedt
2009-03-25 18:46                   ` Steven Rostedt
2009-03-25  3:56 ` [PATCH 3/5][RFC] tracing: adding function timings to function profiler Steven Rostedt
2009-03-25 18:30   ` Frederic Weisbecker
2009-03-25  3:56 ` [PATCH 4/5][RFC] tracing: make the function profiler per cpu Steven Rostedt
2009-03-25  3:56 ` [PATCH 5/5][RFC] function-graph: add option to calculate graph time or not Steven Rostedt
2009-03-25 10:11   ` Ingo Molnar
2009-03-25 17:35     ` Steven Rostedt
2009-03-25  9:50 ` [PATCH 0/5][RFC] [RFC] function profiler Ingo Molnar
2009-03-25  9:52 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox