[PATCH] ftrace: add documentation for function graph tracer

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tim Bird <tim.bird@am.sony.com>
To: Steven Rostedt <rostedt@goodmis.org>,
	linux kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: [PATCH] ftrace: add documentation for function graph tracer
Date: Fri, 22 May 2009 13:02:01 -0700	[thread overview]
Message-ID: <4A1704B9.3010200@am.sony.com> (raw)

Add some documentation specifically for the function graph tracer.

I realized about half-way into this that this is redundant with
material already in Document/trace/ftrace.txt.  However, I
think there is value in having in a separate document.  Also,
I have a little bit of stuff here which is not in the other
document.

Let me know what you think.

(ftd source to follow next week)

Signed-off-by: Tim Bird <tim.bird@am.sony.com>
---
 Documentation/trace/func-graph.txt |  237 +++++++++++++++++++++++++++++++++++++
 1 files changed, 237 insertions(+)

--- /dev/null
+++ b/Documentation/trace/func-graph.txt
@@ -0,0 +1,237 @@
+		Function Graph Tracing
+
+	   Documentation written by Tim Bird
+
+
+Function graph tracing is a variant of function tracing, which traces
+both the entry and exit of each kernel function.
+
+Preparation
+-----------
+
+The function graph tracing feature is compiled in using the
+CONFIG_FUNCTION_GRAPH_TRACER option. Tracing is disabled by default, so
+it is safe to have this set to yes. On platforms without dynamic tracing
+capability (e.g. ARM in 2.6.30), function tracing (and function graph tracing)
+add significant overhead to function execution in the Linux kernel).  On
+these platforms it would be unwise to leave function tracing turned on
+for in production environments.
+
+Note that function graph tracing is supported on SMP systems.
+
+Usage Quick Reference
+---------------------
+
+ $ mount -t debugfs debugfs /debug
+ $ echo function_graph > /debug/tracing/current_tracer
+ $ echo funcgraph-abstime >/debug/tracing/trace_options
+ $ echo 1 >/debug/tracing/tracing_enabled
+ $ <do something> ; echo 0 >/debug/tracing/tracing_enabled
+ $ cat /debug/tracing/trace > /tmp/trace-something.txt
+ $ echo nop > /debug/tracing/current_tracer
+
+Usage
+-----
+
+Make sure debugfs is mounted to /debug. If not, (requires root privileges)
+ $ mount -t debugfs debugfs /debug
+
+Activate function graph tracing (requires root privileges):
+ $ echo function_graph > /debug/tracing/current_tracer
+
+Enable tracing (if not already enabled)
+ $ echo 1 >/debug/tracing/tracing_enabled
+
+Do something, and quickly disable tracing, to avoid overrunning the
+related events in the trace log.  Note that the trace log uses a ring
+buffer, which continually overwrites older events in the log, until
+tracing is disabled.
+
+ $ <do something> ; echo 0 >/debug/tracing/tracing_enabled
+
+Store the trace:
+$ cat /debug/tracing/trace > /tmp/trace-something.txt
+
+Extra Tip:
+During tracing you can place comments (markers) into the trace by
+
+ $ echo "foo happened" > /debug/tracing/trace_marker
+
+This makes it easier to see which part of the (huge) trace corresponds to
+which action. It is recommended to place descriptive markers about what you
+do.  (I'm not sure how effective this is for function graph tracing.  The
+trace buffer fills so quickly that any comment made in "human time" will
+likely get overrun in the trace buffer before a human has a chance to stop
+the trace. - this tip was copied from mmiotrace.txt)
+
+Shut down function graph tracing (requires root privileges):
+ $ echo nop > /debug/tracing/current_tracer
+
+If it doesn't look like sufficient data was captured, you can enlarge the
+buffers and try again. Buffers are enlarged by first seeing how large the
+current buffers are:
+
+ $ cat /debug/tracing/buffer_size_kb
+
+gives you a number. Approximately double this number and write it back.
+
+For instance:
+ $ echo 128000 > /debug/tracing/buffer_size_kb
+
+Then start again from the top.
+
+How function graph tracing works
+--------------------------------
+The function tracer (upon which the function graph tracer is dependent) works
+by having the compiler put a call to a special instrumentation routine (called
+'mcount') in the prologue of every kernel function.  (Well, almost every
+kernel function - some functions can not be traced for one reason or another.)
+The function graph tracer records the entry time for each function traced.
+Then, it arranges to capture the function exit, by setting up a return
+trampoline.  This is done by recording the real return address, and
+substituting the trampoline address for the return address in the process
+stack.  When the function returns to the tracer, the time is recorded again,
+and the real caller is returned to.
+
+Trace Log Format
+----------------
+
+The log format for the function graph tracer is text and is unfortunately NOT
+easily filtered with e.g. grep and awk.  The output is easily read by a human,
+and is useful for showing how functions nest within other functions.  But
+since the duration of each function is printed at the end of the routine (on a
+line with no name, just a closing brace), you have to scan through the log to
+match up the duration times with the matching functions.
+
+The function graph tracer consists of a header showing the tracer name, and
+the fields that are configured for display on each line.  Then lines are shown
+for function entry and exit events.  Also, when a process change occurs (that
+is, a schedule occurs in the kernel), a special sequence is shown.
+
+Here is a sample showing the log header and a few sample trace lines.
+
+# tracer: function_graph
+#
+#      TIME       CPU  DURATION                  FUNCTION CALLS
+#       |         |     |   |                     |   |   |   |
+ 2105.963678 |   0)               |      mutex_unlock() {
+ 2105.963682 |   0)   5.715 us    |        __mutex_unlock_slowpath();
+ 2105.963693 |   0) + 14.700 us   |      }
+ 2105.963698 |   0)               |      dnotify_parent() {
+
+A context switch line shows the process name and pid for the outgoing and
+incoming tasks.  The outgoing task is shown on the left, followed by '=>'
+then the incoming task.
+
+Here is a sample of a context switch:
+ ------------------------------------------
+ 0)     ls-595     =>    sed-596
+ ------------------------------------------
+
+You can configure the items displayed for each trace element, by setting
+/debug/tracing/trace_options.  (See Trace Display Options below)
+
+The following elements are available for display:
+
+TIME - this is the (absolute) time since the machine started in seconds, with
+a decimal portion showing resolution to microseconds.  The actual resolution
+of the time (and of the tracer timings in general) will depend on the specific
+clock used by ftrace. This option is off by default.
+
+CPU - indicates the CPU on which the function was executed
+
+TASK/PID - shows the task name and PID (process ID) for each trace entry.  The
+entry has the format <task>-<pid> (e.g. "sh-443").  This option is off by
+default.
+
+OVERHEAD (not labeled) - is a flag indicator, showing functions whose
+duration exceeded certain threshholds:
+   space (or nothing) - the function executed in less than 10 microseconds
+   + - the function lasted longer than 10 microseconds
+   ! - the function lasted longer than 100 microseconds
+
+DURATION - this is printed for function exit trace events, and shows
+ the time in microseconds that the function executed.  This is "wall time"
+for the function, and includes the time spent in any sub-routines or
+with the CPU scheduled away from this routine (this routine or a child
+waiting for a something.)
+
+FUNCTION CALLS - this shows the name of the function for function entry events,
+and a closing brace '}' for function exit events. For function entry (only,
+not a leaf) the line consists of a function name and an open brace.  For
+function exit, the line consists of a closing brace.  Note that the function
+name and closing brace are indented to indicate their depth on the call stack.
+This allows you to match up function exits with their matching functions, and
+to easily identify the call tree (graph) for a particular function.
+
+Trace Display Options
+---------------------
+
+The following display options are available for customizing the function
+graph trace output:
+
+   abstime - show TIME
+   cpu - show CPU
+   overhead - show OVERHEAD indicator
+   proc - show TASK and PID
+   duration - show DURATION
+   overrun - shows if the trace had an overrun (used for debugging the tracer)
+
+To set an option echo a string to /debug/tracing/trace_options, using the format:
+"funcgraph-<opt_name>".  To unset a particular option, use the format:
+"nofuncgraph-<opt_name>".
+
+ $ echo funcgraph-abstime >/debug/tracing/trace_options
+ $ echo nofuncgraph-cpu >/debug/tracing/trace_options
+ $ cat trace
+
+Tools for Developers
+--------------------
+
+The user space tool 'ftd' includes capabilities for:
+ * sorting the information by time spent in each routine
+ * filtering the information by cpu, process, call count, and execution time
+ * showing minimum and maximum time spent in a routine
+
+= Tutorial/Samples =
+For an interesting first trace, try the following:
+(make sure no other processes are active, if you can)
+
+ $ mount -t debugfs debugfs /debug
+ $ cd /debug/tracing
+ $ echo function_graph > current_tracer
+ $ echo funcgraph-abstime > trace_options
+ $ echo 10000 > buffer_size_kb
+ $ echo 1 > tracing_enabled ; ls /bin | sed s/[aeiou]/z/g ; \
+	echo "marker test!" ; echo 0 >tracing_enabled
+ $ cat trace > /tmp/trace1.txt
+
+You might need to change the buffer size to 20000 (20 M) if
+you don't capture the entire trace (if the top of the buffer
+starts in the middle of ls or sed execution).
+
+Now examine the data:
+ $ ls -l /tmp/trace1.txt
+ $ vi /tmp/trace1.txt
+
+Note that the trace data is quite large, but probably only covers a
+second or two of execution.
+
+Here are some things to look for in the trace:
+
+ * Watch as the shell forks and execs ls and sed.
+ * Watch for timer interrupts which cause context switches between the three
+ * processes (sh, ls and sed).
+ * Watch for page faults during ls and sed startup, as the processes are paged
+   into memory.
+ * Look for routines starting with "sys_" which are at the top level of the
+   call graphs. These are system calls.
+ * Look for the duration value for the system calls, on the lines with the
+   closing braces at the same indentation level.  You can see how long
+   different system calls took to execute, and what sub-routines were called
+   internally in the kernel.  Also, you can determine if a long-running
+   system call was interrupted (and the process scheduled out, before
+   returning)
+ * Look for the comment "marker test!" near the end of the trace.
+
+

                 reply	other threads:[~2009-05-22 20:02 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A1704B9.3010200@am.sony.com \
    --to=tim.bird@am.sony.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.