* [PATCH 0/7] [GIT PULL] tracing: updates
@ 2010-04-01 3:17 Steven Rostedt
2010-04-01 3:17 ` [PATCH 1/7] tracing: Convert some signal events to DEFINE_TRACE Steven Rostedt
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker
Ingo,
Please pull the latest tip/tracing/core tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/core
Li Zefan (3):
tracing: Convert some signal events to DEFINE_TRACE
tracing: Update comments
tracing: Remove side effect from module tracepoints that caused a GPF
Steven Rostedt (4):
tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set
ring-buffer: Add place holder recording of dropped events
tracing: Show the lost events in the trace_pipe output
ring-buffer: Add lost event count to end of sub buffer
----
drivers/oprofile/cpu_buffer.c | 4 +-
include/linux/ftrace_event.h | 1 +
include/linux/module.h | 6 +-
include/linux/ring_buffer.h | 6 +-
include/trace/events/module.h | 18 ++++--
include/trace/events/signal.h | 52 +++++++----------
include/trace/ftrace.h | 33 ++++++-----
kernel/module.c | 8 +-
kernel/trace/ring_buffer.c | 101 ++++++++++++++++++++++++++++++++--
kernel/trace/ring_buffer_benchmark.c | 2 +-
kernel/trace/trace.c | 30 +++++++---
kernel/trace/trace_functions_graph.c | 5 +-
kernel/trace/trace_selftest.c | 2 +-
13 files changed, 187 insertions(+), 81 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/7] tracing: Convert some signal events to DEFINE_TRACE
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 2/7] tracing: Update comments Steven Rostedt
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Masami Hiramatsu,
Li Zefan
[-- Attachment #1: 0001-tracing-Convert-some-signal-events-to-DEFINE_TRACE.patch --]
[-- Type: text/plain, Size: 3123 bytes --]
From: Li Zefan <lizf@cn.fujitsu.com>
Use DECLARE_EVENT_CLASS to remove duplicate code:
text data bss dec hex filename
23639 6084 8 29731 7423 kernel/signal.o.orig
22727 6084 8 28819 7093 kernel/signal.o
Two events are converted to the new event class signal_queue_overflow:
signal_overflow_fail and signal_lose_info.
No functional change.
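The mechanics of the conversion can be sketched in plain C. This is a simplified, hypothetical model, not the real TRACE_EVENT machinery in include/trace/ftrace.h (which generates structs, probes, and registration code, not just a print function): the class emits the shared body once, and each DEFINE_EVENT becomes a thin wrapper, which is where the text savings in kernel/signal.o come from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Record layout shared by both events (illustrative). */
struct signal_queue_overflow_entry {
	int sig;
	int group;
	int error;
	int code;
};

/* Shared "class" body, analogous to DECLARE_EVENT_CLASS: the
 * formatting logic is compiled exactly once. */
static int signal_queue_overflow_print(const struct signal_queue_overflow_entry *e,
				       char *buf, size_t len)
{
	return snprintf(buf, len, "sig=%d group=%d errno=%d code=%d",
			e->sig, e->group, e->error, e->code);
}

/* Each DEFINE_EVENT just names an instance of the class. */
#define DEFINE_SIMPLE_EVENT(name)					\
static int name##_print(const struct signal_queue_overflow_entry *e,	\
			char *buf, size_t len)				\
{									\
	return signal_queue_overflow_print(e, buf, len);		\
}

DEFINE_SIMPLE_EVENT(signal_overflow_fail)
DEFINE_SIMPLE_EVENT(signal_lose_info)
```

Both generated wrappers delegate to the single class body, mirroring how the two signal events now share one TP_STRUCT__entry/TP_fast_assign/TP_printk definition.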
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FBD.8070703@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/events/signal.h | 52 +++++++++++++++++-----------------------
1 files changed, 22 insertions(+), 30 deletions(-)
diff --git a/include/trace/events/signal.h b/include/trace/events/signal.h
index a510b75..814566c 100644
--- a/include/trace/events/signal.h
+++ b/include/trace/events/signal.h
@@ -100,18 +100,7 @@ TRACE_EVENT(signal_deliver,
__entry->sa_handler, __entry->sa_flags)
);
-/**
- * signal_overflow_fail - called when signal queue is overflow
- * @sig: signal number
- * @group: signal to process group or not (bool)
- * @info: pointer to struct siginfo
- *
- * Kernel fails to generate 'sig' signal with 'info' siginfo, because
- * siginfo queue is overflow, and the signal is dropped.
- * 'group' is not 0 if the signal will be sent to a process group.
- * 'sig' is always one of RT signals.
- */
-TRACE_EVENT(signal_overflow_fail,
+DECLARE_EVENT_CLASS(signal_queue_overflow,
TP_PROTO(int sig, int group, struct siginfo *info),
@@ -135,6 +124,24 @@ TRACE_EVENT(signal_overflow_fail,
);
/**
+ * signal_overflow_fail - called when signal queue is overflow
+ * @sig: signal number
+ * @group: signal to process group or not (bool)
+ * @info: pointer to struct siginfo
+ *
+ * Kernel fails to generate 'sig' signal with 'info' siginfo, because
+ * siginfo queue is overflow, and the signal is dropped.
+ * 'group' is not 0 if the signal will be sent to a process group.
+ * 'sig' is always one of RT signals.
+ */
+DEFINE_EVENT(signal_queue_overflow, signal_overflow_fail,
+
+ TP_PROTO(int sig, int group, struct siginfo *info),
+
+ TP_ARGS(sig, group, info)
+);
+
+/**
* signal_lose_info - called when siginfo is lost
* @sig: signal number
* @group: signal to process group or not (bool)
@@ -145,28 +152,13 @@ TRACE_EVENT(signal_overflow_fail,
* 'group' is not 0 if the signal will be sent to a process group.
* 'sig' is always one of non-RT signals.
*/
-TRACE_EVENT(signal_lose_info,
+DEFINE_EVENT(signal_queue_overflow, signal_lose_info,
TP_PROTO(int sig, int group, struct siginfo *info),
- TP_ARGS(sig, group, info),
-
- TP_STRUCT__entry(
- __field( int, sig )
- __field( int, group )
- __field( int, errno )
- __field( int, code )
- ),
-
- TP_fast_assign(
- __entry->sig = sig;
- __entry->group = group;
- TP_STORE_SIGINFO(__entry, info);
- ),
-
- TP_printk("sig=%d group=%d errno=%d code=%d",
- __entry->sig, __entry->group, __entry->errno, __entry->code)
+ TP_ARGS(sig, group, info)
);
+
#endif /* _TRACE_SIGNAL_H */
/* This part must be outside protection */
--
1.7.0
* [PATCH 2/7] tracing: Update comments
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
2010-04-01 3:17 ` [PATCH 1/7] tracing: Convert some signal events to DEFINE_TRACE Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 3/7] tracing: Remove side effect from module tracepoints that caused a GPF Steven Rostedt
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Li Zefan
[-- Attachment #1: 0002-tracing-Update-comments.patch --]
[-- Type: text/plain, Size: 3438 bytes --]
From: Li Zefan <lizf@cn.fujitsu.com>
Make some comments consistent with the code.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FD0.7090202@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/ftrace.h | 33 +++++++++++++++++++--------------
1 files changed, 19 insertions(+), 14 deletions(-)
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index ea6f9d4..75dd778 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -154,9 +154,11 @@
*
* field = (typeof(field))entry;
*
- * p = get_cpu_var(ftrace_event_seq);
+ * p = &get_cpu_var(ftrace_event_seq);
* trace_seq_init(p);
- * ret = trace_seq_printf(s, <TP_printk> "\n");
+ * ret = trace_seq_printf(s, "%s: ", <call>);
+ * if (ret)
+ * ret = trace_seq_printf(s, <TP_printk> "\n");
* put_cpu();
* if (!ret)
* return TRACE_TYPE_PARTIAL_LINE;
@@ -450,38 +452,38 @@ perf_trace_disable_##name(struct ftrace_event_call *unused) \
*
* static void ftrace_raw_event_<call>(proto)
* {
+ * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
* struct ring_buffer_event *event;
* struct ftrace_raw_<call> *entry; <-- defined in stage 1
* struct ring_buffer *buffer;
* unsigned long irq_flags;
+ * int __data_size;
* int pc;
*
* local_save_flags(irq_flags);
* pc = preempt_count();
*
+ * __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
+ *
* event = trace_current_buffer_lock_reserve(&buffer,
* event_<call>.id,
- * sizeof(struct ftrace_raw_<call>),
+ * sizeof(*entry) + __data_size,
* irq_flags, pc);
* if (!event)
* return;
* entry = ring_buffer_event_data(event);
*
- * <assign>; <-- Here we assign the entries by the __field and
- * __array macros.
+ * { <assign>; } <-- Here we assign the entries by the __field and
+ * __array macros.
*
- * trace_current_buffer_unlock_commit(buffer, event, irq_flags, pc);
+ * if (!filter_current_check_discard(buffer, event_call, entry, event))
+ * trace_current_buffer_unlock_commit(buffer,
+ * event, irq_flags, pc);
* }
*
* static int ftrace_raw_reg_event_<call>(struct ftrace_event_call *unused)
* {
- * int ret;
- *
- * ret = register_trace_<call>(ftrace_raw_event_<call>);
- * if (!ret)
- * pr_info("event trace: Could not activate trace point "
- * "probe to <call>");
- * return ret;
+ * return register_trace_<call>(ftrace_raw_event_<call>);
* }
*
* static void ftrace_unreg_event_<call>(struct ftrace_event_call *unused)
@@ -493,6 +495,8 @@ perf_trace_disable_##name(struct ftrace_event_call *unused) \
* .trace = ftrace_raw_output_<call>, <-- stage 2
* };
*
+ * static const char print_fmt_<call>[] = <TP_printk>;
+ *
* static struct ftrace_event_call __used
* __attribute__((__aligned__(4)))
* __attribute__((section("_ftrace_events"))) event_<call> = {
@@ -501,6 +505,8 @@ perf_trace_disable_##name(struct ftrace_event_call *unused) \
* .raw_init = trace_event_raw_init,
* .regfunc = ftrace_reg_event_<call>,
* .unregfunc = ftrace_unreg_event_<call>,
+ * .print_fmt = print_fmt_<call>,
+ * .define_fields = ftrace_define_fields_<call>,
* }
*
*/
@@ -569,7 +575,6 @@ ftrace_raw_event_id_##call(struct ftrace_event_call *event_call, \
return; \
entry = ring_buffer_event_data(event); \
\
- \
tstruct \
\
{ assign; } \
--
1.7.0
* [PATCH 3/7] tracing: Remove side effect from module tracepoints that caused a GPF
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
2010-04-01 3:17 ` [PATCH 1/7] tracing: Convert some signal events to DEFINE_TRACE Steven Rostedt
2010-04-01 3:17 ` [PATCH 2/7] tracing: Update comments Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 4/7] tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set Steven Rostedt
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker,
Mathieu Desnoyers, Li Zefan
[-- Attachment #1: 0003-tracing-Remove-side-effect-from-module-tracepoints-t.patch --]
[-- Type: text/plain, Size: 3468 bytes --]
From: Li Zefan <lizf@cn.fujitsu.com>
Remove the @refcnt argument: it has side effects, and arguments with
side effects are not skipped by the jump over disabled instrumentation,
so they are executed even when the tracepoint is disabled.
This was also causing a GPF as found by Randy Dunlap:
Subject: [PATCH 3/7] 2.6.33 GP fault only when built with tracing
LKML-Reference: <4BA2B69D.3000309@oracle.com>
Note, the current 2.6.34-rc has a fix for the actual cause of the GPF,
but this fixes one of its triggers.
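The class of bug being removed can be modeled in a few lines of plain C (all names below are illustrative stand-ins, not the real tracepoint implementation): when the caller computes the refcnt as a tracepoint argument, that side-effecting read of per-cpu state happens at every call site; moving the read into the event's TP_fast_assign means it only happens when the tracepoint actually fires.

```c
#include <assert.h>

static int tracing_enabled;
static int refcnt;
static int refcnt_reads;		/* counts side-effecting reads */
static int last_traced_refcnt = -1;

static int read_refcnt(void)
{
	refcnt_reads++;			/* the side effect */
	return refcnt;
}

/* Old style: the value is computed by the caller and passed in. */
static void trace_module_get_old(int cnt)
{
	if (tracing_enabled)
		last_traced_refcnt = cnt;
}

/* New style: the event reads the refcnt itself, only when enabled. */
static void trace_module_get_new(void)
{
	if (tracing_enabled)
		last_traced_refcnt = read_refcnt();
}

static void module_get_old(void)
{
	refcnt++;
	trace_module_get_old(read_refcnt());	/* argument evaluated regardless */
}

static void module_get_new(void)
{
	refcnt++;
	trace_module_get_new();			/* no read unless tracing is on */
}
```

In the old style the refcnt read runs on every module_get(), tracing or not; in the new style a disabled tracepoint costs nothing.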
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/module.h | 6 ++----
include/trace/events/module.h | 14 +++++++-------
kernel/module.c | 3 +--
3 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index 5e869ff..393ec39 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -460,8 +460,7 @@ static inline void __module_get(struct module *module)
if (module) {
preempt_disable();
__this_cpu_inc(module->refptr->count);
- trace_module_get(module, _THIS_IP_,
- __this_cpu_read(module->refptr->count));
+ trace_module_get(module, _THIS_IP_);
preempt_enable();
}
}
@@ -475,8 +474,7 @@ static inline int try_module_get(struct module *module)
if (likely(module_is_live(module))) {
__this_cpu_inc(module->refptr->count);
- trace_module_get(module, _THIS_IP_,
- __this_cpu_read(module->refptr->count));
+ trace_module_get(module, _THIS_IP_);
}
else
ret = 0;
diff --git a/include/trace/events/module.h b/include/trace/events/module.h
index 4b0f48b..a585f81 100644
--- a/include/trace/events/module.h
+++ b/include/trace/events/module.h
@@ -53,9 +53,9 @@ TRACE_EVENT(module_free,
DECLARE_EVENT_CLASS(module_refcnt,
- TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
+ TP_PROTO(struct module *mod, unsigned long ip),
- TP_ARGS(mod, ip, refcnt),
+ TP_ARGS(mod, ip),
TP_STRUCT__entry(
__field( unsigned long, ip )
@@ -65,7 +65,7 @@ DECLARE_EVENT_CLASS(module_refcnt,
TP_fast_assign(
__entry->ip = ip;
- __entry->refcnt = refcnt;
+ __entry->refcnt = __this_cpu_read(mod->refptr->count);
__assign_str(name, mod->name);
),
@@ -75,16 +75,16 @@ DECLARE_EVENT_CLASS(module_refcnt,
DEFINE_EVENT(module_refcnt, module_get,
- TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
+ TP_PROTO(struct module *mod, unsigned long ip),
- TP_ARGS(mod, ip, refcnt)
+ TP_ARGS(mod, ip)
);
DEFINE_EVENT(module_refcnt, module_put,
- TP_PROTO(struct module *mod, unsigned long ip, int refcnt),
+ TP_PROTO(struct module *mod, unsigned long ip),
- TP_ARGS(mod, ip, refcnt)
+ TP_ARGS(mod, ip)
);
TRACE_EVENT(module_request,
diff --git a/kernel/module.c b/kernel/module.c
index c968d36..21591ad 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -800,8 +800,7 @@ void module_put(struct module *module)
preempt_disable();
__this_cpu_dec(module->refptr->count);
- trace_module_put(module, _RET_IP_,
- __this_cpu_read(module->refptr->count));
+ trace_module_put(module, _RET_IP_);
/* Maybe they're waiting for us to drop reference? */
if (unlikely(!module_is_live(module)))
wake_up_process(module->waiter);
--
1.7.0
* [PATCH 4/7] tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
` (2 preceding siblings ...)
2010-04-01 3:17 ` [PATCH 3/7] tracing: Remove side effect from module tracepoints that caused a GPF Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 5/7] ring-buffer: Add place holder recording of dropped events Steven Rostedt
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker
[-- Attachment #1: 0004-tracing-Fix-compile-error-in-module-tracepoints-when.patch --]
[-- Type: text/plain, Size: 1773 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
If modules are configured in the build but unloading of modules is not,
then the refcnt field is not defined. Place the get/put module tracepoints
under CONFIG_MODULE_UNLOAD, since they reference this field in the module
structure.
As a side-effect, this patch also reduces the code when MODULE_UNLOAD
is not set, because these unused tracepoints are not created.
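The guard can be sketched as follows (simplified; the struct and helper names are illustrative, not the kernel's). Without CONFIG_MODULE_UNLOAD the refcnt storage does not exist, so any event referencing it must be compiled out along with the field, which is exactly what wrapping both in the same #ifdef achieves.

```c
#include <assert.h>

#define CONFIG_MODULE_UNLOAD 1		/* comment out to compile the events away */

struct module_sketch {
	const char *name;
#ifdef CONFIG_MODULE_UNLOAD
	int refcnt;			/* only exists when unloading is enabled */
#endif
};

#ifdef CONFIG_MODULE_UNLOAD
/* Guarded like the module_refcnt event class in the patch: this code
 * only compiles in configs where the refcnt field exists. */
static int trace_module_get_sketch(const struct module_sketch *mod)
{
	return mod->refcnt;
}
#endif
```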
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/events/module.h | 4 ++++
kernel/module.c | 5 +++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/module.h b/include/trace/events/module.h
index a585f81..f07b44a 100644
--- a/include/trace/events/module.h
+++ b/include/trace/events/module.h
@@ -51,6 +51,9 @@ TRACE_EVENT(module_free,
TP_printk("%s", __get_str(name))
);
+#ifdef CONFIG_MODULE_UNLOAD
+/* trace_module_get/put are only used if CONFIG_MODULE_UNLOAD is defined */
+
DECLARE_EVENT_CLASS(module_refcnt,
TP_PROTO(struct module *mod, unsigned long ip),
@@ -86,6 +89,7 @@ DEFINE_EVENT(module_refcnt, module_put,
TP_ARGS(mod, ip)
);
+#endif /* CONFIG_MODULE_UNLOAD */
TRACE_EVENT(module_request,
diff --git a/kernel/module.c b/kernel/module.c
index 21591ad..d9e2379 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -59,8 +59,6 @@
#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
-EXPORT_TRACEPOINT_SYMBOL(module_get);
-
#if 0
#define DEBUGP printk
#else
@@ -467,6 +465,9 @@ MODINFO_ATTR(srcversion);
static char last_unloaded_module[MODULE_NAME_LEN+1];
#ifdef CONFIG_MODULE_UNLOAD
+
+EXPORT_TRACEPOINT_SYMBOL(module_get);
+
/* Init the unload section of the module. */
static void module_unload_init(struct module *mod)
{
--
1.7.0
* [PATCH 5/7] ring-buffer: Add place holder recording of dropped events
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
` (3 preceding siblings ...)
2010-04-01 3:17 ` [PATCH 4/7] tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 6/7] tracing: Show the lost events in the trace_pipe output Steven Rostedt
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker, Robert Richter,
Andi Kleen, Li Zefan, Arnaldo Carvalho de Melo,
Luis Claudio R. Goncalves
[-- Attachment #1: 0005-ring-buffer-Add-place-holder-recording-of-dropped-ev.patch --]
[-- Type: text/plain, Size: 14322 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not insert any
placeholder where the event was dropped.
This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.
In a producer/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.
But in overwrite mode, any new write can overwrite events. A
placeholder cannot be inserted into the ring buffer since there may
never be room. A reader could also come in at any time and miss the
placeholder.
Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many. Every time a write
takes place, if it overwrites the header page (the next read) it
updates an "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perform the swap, and then check to
see if the number changed, taking the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.
Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, and then performs the swap,
and the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.
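The reader-side bookkeeping described above can be sketched like this (variable names follow the patch, but the locking, the cmpxchg-based swap, and the retry loop are omitted, so this is a model rather than the real rb_get_reader_page()):

```c
#include <assert.h>

static unsigned long overrun;		/* bumped by the writer */
static unsigned long last_overrun;	/* reader's previous snapshot */
static unsigned long lost_events;	/* handed back to consumers */

/* The writer increments "overrun" each time it overwrites a page of
 * unread events; those events are gone for good. */
static void writer_overwrites(unsigned long nr_events)
{
	overrun += nr_events;
}

/* The reader snapshots "overrun" around the page swap and reports
 * the delta. In the kernel an smp_mb() orders this read against the
 * writer's cmpxchg, and a failed swap simply retries. */
static unsigned long reader_swap_page(void)
{
	unsigned long overwrite = overrun;

	if (overwrite != last_overrun) {
		lost_events = overwrite - last_overrun;
		last_overrun = overwrite;
	}
	return lost_events;
}
```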
For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits need to be used for the actual size (fewer, actually).
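Concretely, a binary reader tests bit 31 of the sub-buffer's commit/size word and masks it off to recover the true data size. The flag value is from the patch; the helper names below are illustrative, not kernel API:

```c
#include <assert.h>

/* Flag when events were overwritten (bit 31 of the commit word). */
#define RB_MISSED_EVENTS (1UL << 31)

/* Mask off the flag to get the real data size of the sub buffer. */
static unsigned long rb_page_size(unsigned long commit)
{
	return commit & ~RB_MISSED_EVENTS;
}

/* Nonzero if events were dropped before this sub buffer was read. */
static int rb_page_missed_events(unsigned long commit)
{
	return (commit & RB_MISSED_EVENTS) != 0;
}
```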
We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.
Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.
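From the caller's side, the API change is an optional out-parameter: consume-style readers that care pass a pointer to receive the dropped-event count, and the rest pass NULL (as the oprofile and selftest call sites do after this patch). A stand-in for the shape of the call, not the real ring_buffer_consume():

```c
#include <assert.h>
#include <stddef.h>

static unsigned long pending_lost_events;

/* Returns a (pretend) event and optionally reports how many events
 * were dropped since the last read; the count is reported once and
 * then cleared, matching the consuming-read semantics. */
static int consume_event(unsigned long *lost_events)
{
	if (lost_events)
		*lost_events = pending_lost_events;
	pending_lost_events = 0;
	return 1;
}
```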
Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
drivers/oprofile/cpu_buffer.c | 4 +-
include/linux/ring_buffer.h | 6 ++-
kernel/trace/ring_buffer.c | 72 +++++++++++++++++++++++++++++++---
kernel/trace/ring_buffer_benchmark.c | 2 +-
kernel/trace/trace.c | 4 +-
kernel/trace/trace_functions_graph.c | 5 +-
kernel/trace/trace_selftest.c | 2 +-
7 files changed, 79 insertions(+), 16 deletions(-)
diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
index 166b67e..7581dbe 100644
--- a/drivers/oprofile/cpu_buffer.c
+++ b/drivers/oprofile/cpu_buffer.c
@@ -186,14 +186,14 @@ int op_cpu_buffer_write_commit(struct op_entry *entry)
struct op_sample *op_cpu_buffer_read_entry(struct op_entry *entry, int cpu)
{
struct ring_buffer_event *e;
- e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+ e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL, NULL);
if (e)
goto event;
if (ring_buffer_swap_cpu(op_ring_buffer_read,
op_ring_buffer_write,
cpu))
return NULL;
- e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL);
+ e = ring_buffer_consume(op_ring_buffer_read, cpu, NULL, NULL);
if (e)
goto event;
return NULL;
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 5fcc31e..c829776 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -120,9 +120,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
unsigned long length, void *data);
struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts);
+ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
+ unsigned long *lost_events);
struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts);
+ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
+ unsigned long *lost_events);
struct ring_buffer_iter *
ring_buffer_read_start(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d1187ef..8295650 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -318,6 +318,9 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
#define TS_MASK ((1ULL << TS_SHIFT) - 1)
#define TS_DELTA_TEST (~TS_MASK)
+/* Flag when events were overwritten */
+#define RB_MISSED_EVENTS (1 << 31)
+
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
local_t commit; /* write committed index */
@@ -416,6 +419,12 @@ int ring_buffer_print_page_header(struct trace_seq *s)
(unsigned int)sizeof(field.commit),
(unsigned int)is_signed_type(long));
+ ret = trace_seq_printf(s, "\tfield: int overwrite;\t"
+ "offset:%u;\tsize:%u;\tsigned:%u;\n",
+ (unsigned int)offsetof(typeof(field), commit),
+ 1,
+ (unsigned int)is_signed_type(long));
+
ret = trace_seq_printf(s, "\tfield: char data;\t"
"offset:%u;\tsize:%u;\tsigned:%u;\n",
(unsigned int)offsetof(typeof(field), data),
@@ -439,6 +448,8 @@ struct ring_buffer_per_cpu {
struct buffer_page *tail_page; /* write to tail */
struct buffer_page *commit_page; /* committed pages */
struct buffer_page *reader_page;
+ unsigned long lost_events;
+ unsigned long last_overrun;
local_t commit_overrun;
local_t overrun;
local_t entries;
@@ -2835,6 +2846,7 @@ static struct buffer_page *
rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
{
struct buffer_page *reader = NULL;
+ unsigned long overwrite;
unsigned long flags;
int nr_loops = 0;
int ret;
@@ -2896,6 +2908,18 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
rb_set_list_to_head(cpu_buffer, &cpu_buffer->reader_page->list);
/*
+ * We want to make sure we read the overruns after we set up our
+ * pointers to the next object. The writer side does a
+ * cmpxchg to cross pages which acts as the mb on the writer
+ * side. Note, the reader will constantly fail the swap
+ * while the writer is updating the pointers, so this
+ * guarantees that the overwrite recorded here is the one we
+ * want to compare with the last_overrun.
+ */
+ smp_mb();
+ overwrite = local_read(&(cpu_buffer->overrun));
+
+ /*
* Here's the tricky part.
*
* We need to move the pointer past the header page.
@@ -2926,6 +2950,11 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->reader_page = reader;
rb_reset_reader_page(cpu_buffer);
+ if (overwrite != cpu_buffer->last_overrun) {
+ cpu_buffer->lost_events = overwrite - cpu_buffer->last_overrun;
+ cpu_buffer->last_overrun = overwrite;
+ }
+
goto again;
out:
@@ -3002,8 +3031,14 @@ static void rb_advance_iter(struct ring_buffer_iter *iter)
rb_advance_iter(iter);
}
+static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ return cpu_buffer->lost_events;
+}
+
static struct ring_buffer_event *
-rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts)
+rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts,
+ unsigned long *lost_events)
{
struct ring_buffer_event *event;
struct buffer_page *reader;
@@ -3055,6 +3090,8 @@ rb_buffer_peek(struct ring_buffer_per_cpu *cpu_buffer, u64 *ts)
ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
cpu_buffer->cpu, ts);
}
+ if (lost_events)
+ *lost_events = rb_lost_events(cpu_buffer);
return event;
default:
@@ -3165,12 +3202,14 @@ static inline int rb_ok_to_lock(void)
* @buffer: The ring buffer to read
* @cpu: The cpu to peak at
* @ts: The timestamp counter of this event.
+ * @lost_events: a variable to store if events were lost (may be NULL)
*
* This will return the event that will be read next, but does
* not consume the data.
*/
struct ring_buffer_event *
-ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
+ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts,
+ unsigned long *lost_events)
{
struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
struct ring_buffer_event *event;
@@ -3185,7 +3224,7 @@ ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
local_irq_save(flags);
if (dolock)
spin_lock(&cpu_buffer->reader_lock);
- event = rb_buffer_peek(cpu_buffer, ts);
+ event = rb_buffer_peek(cpu_buffer, ts, lost_events);
if (event && event->type_len == RINGBUF_TYPE_PADDING)
rb_advance_reader(cpu_buffer);
if (dolock)
@@ -3227,13 +3266,17 @@ ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
/**
* ring_buffer_consume - return an event and consume it
* @buffer: The ring buffer to get the next event from
+ * @cpu: the cpu to read the buffer from
+ * @ts: a variable to store the timestamp (may be NULL)
+ * @lost_events: a variable to store if events were lost (may be NULL)
*
* Returns the next event in the ring buffer, and that event is consumed.
* Meaning, that sequential reads will keep returning a different event,
* and eventually empty the ring buffer if the producer is slower.
*/
struct ring_buffer_event *
-ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
+ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts,
+ unsigned long *lost_events)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event = NULL;
@@ -3254,9 +3297,11 @@ ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
if (dolock)
spin_lock(&cpu_buffer->reader_lock);
- event = rb_buffer_peek(cpu_buffer, ts);
- if (event)
+ event = rb_buffer_peek(cpu_buffer, ts, lost_events);
+ if (event) {
+ cpu_buffer->lost_events = 0;
rb_advance_reader(cpu_buffer);
+ }
if (dolock)
spin_unlock(&cpu_buffer->reader_lock);
@@ -3405,6 +3450,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->write_stamp = 0;
cpu_buffer->read_stamp = 0;
+ cpu_buffer->lost_events = 0;
+ cpu_buffer->last_overrun = 0;
+
rb_head_page_activate(cpu_buffer);
}
@@ -3684,6 +3732,7 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
unsigned int commit;
unsigned int read;
u64 save_timestamp;
+ int missed_events = 0;
int ret = -1;
if (!cpumask_test_cpu(cpu, buffer->cpumask))
@@ -3716,6 +3765,10 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
read = reader->read;
commit = rb_page_commit(reader);
+ /* Check if any events were dropped */
+ if (cpu_buffer->lost_events)
+ missed_events = 1;
+
/*
* If this page has been partially read or
* if len is not big enough to read the rest of the page or
@@ -3779,6 +3832,13 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
}
ret = read;
+ cpu_buffer->lost_events = 0;
+ /*
+ * Set a flag in the commit field if we lost events
+ */
+ if (missed_events)
+ local_add(RB_MISSED_EVENTS, &bpage->commit);
+
out_unlock:
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index df74c79..dc56556 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -81,7 +81,7 @@ static enum event_status read_event(int cpu)
int *entry;
u64 ts;
- event = ring_buffer_consume(buffer, cpu, &ts);
+ event = ring_buffer_consume(buffer, cpu, &ts, NULL);
if (!event)
return EVENT_DROPPED;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 3ec2ee6..fabb003 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1556,7 +1556,7 @@ peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts)
if (buf_iter)
event = ring_buffer_iter_peek(buf_iter, ts);
else
- event = ring_buffer_peek(iter->tr->buffer, cpu, ts);
+ event = ring_buffer_peek(iter->tr->buffer, cpu, ts, NULL);
ftrace_enable_cpu();
@@ -1635,7 +1635,7 @@ static void trace_consume(struct trace_iterator *iter)
{
/* Don't allow ftrace to trace into the ring buffers */
ftrace_disable_cpu();
- ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts);
+ ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts, NULL);
ftrace_enable_cpu();
}
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index e6989d9..a7f75fb 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -489,9 +489,10 @@ get_return_for_leaf(struct trace_iterator *iter,
* We need to consume the current entry to see
* the next one.
*/
- ring_buffer_consume(iter->tr->buffer, iter->cpu, NULL);
+ ring_buffer_consume(iter->tr->buffer, iter->cpu,
+ NULL, NULL);
event = ring_buffer_peek(iter->tr->buffer, iter->cpu,
- NULL);
+ NULL, NULL);
}
if (!event)
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 280fea4..e501808 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -29,7 +29,7 @@ static int trace_test_buffer_cpu(struct trace_array *tr, int cpu)
struct trace_entry *entry;
unsigned int loops = 0;
- while ((event = ring_buffer_consume(tr->buffer, cpu, NULL))) {
+ while ((event = ring_buffer_consume(tr->buffer, cpu, NULL, NULL))) {
entry = ring_buffer_event_data(event);
/*
--
1.7.0
* [PATCH 6/7] tracing: Show the lost events in the trace_pipe output
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
` (4 preceding siblings ...)
2010-04-01 3:17 ` [PATCH 5/7] ring-buffer: Add place holder recording of dropped events Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 3:17 ` [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer Steven Rostedt
2010-04-02 16:39 ` [PATCH 0/7] [GIT PULL] tracing: updates Ingo Molnar
7 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker
[-- Attachment #1: 0006-tracing-Show-the-lost-events-in-the-trace_pipe-outpu.patch --]
[-- Type: text/plain, Size: 6051 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Now that the ring buffer can keep track of where events are lost, use
this information in the output of trace_pipe:
hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
CPU:1 [LOST 673 EVENTS]
hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem
It even works with the function graph tracer:
2) ! 170.098 us | }
2) 4.036 us | rcu_irq_exit();
2) 3.657 us | idle_cpu();
2) ! 190.301 us | }
CPU:2 [LOST 2196 EVENTS]
2) 0.853 us | } /* cancel_dirty_page */
2) | remove_from_page_cache() {
2) 1.578 us | _raw_spin_lock_irq();
2) | __remove_from_page_cache() {
Note, this does not work with the iterator "trace" file, since
determining how many events were lost requires consuming the page from
the ring buffer, which the iterator does not do.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace.c | 30 ++++++++++++++++++++++--------
2 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index c0f4b36..39e71b0 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -58,6 +58,7 @@ struct trace_iterator {
/* The below is zeroed out in pipe_read */
struct trace_seq seq;
struct trace_entry *ent;
+ unsigned long lost_events;
int leftover;
int cpu;
u64 ts;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index fabb003..0498beb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1545,7 +1545,8 @@ static void trace_iterator_increment(struct trace_iterator *iter)
}
static struct trace_entry *
-peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts)
+peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts,
+ unsigned long *lost_events)
{
struct ring_buffer_event *event;
struct ring_buffer_iter *buf_iter = iter->buffer_iter[cpu];
@@ -1556,7 +1557,8 @@ peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts)
if (buf_iter)
event = ring_buffer_iter_peek(buf_iter, ts);
else
- event = ring_buffer_peek(iter->tr->buffer, cpu, ts, NULL);
+ event = ring_buffer_peek(iter->tr->buffer, cpu, ts,
+ lost_events);
ftrace_enable_cpu();
@@ -1564,10 +1566,12 @@ peek_next_entry(struct trace_iterator *iter, int cpu, u64 *ts)
}
static struct trace_entry *
-__find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
+__find_next_entry(struct trace_iterator *iter, int *ent_cpu,
+ unsigned long *missing_events, u64 *ent_ts)
{
struct ring_buffer *buffer = iter->tr->buffer;
struct trace_entry *ent, *next = NULL;
+ unsigned long lost_events, next_lost = 0;
int cpu_file = iter->cpu_file;
u64 next_ts = 0, ts;
int next_cpu = -1;
@@ -1580,7 +1584,7 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
if (cpu_file > TRACE_PIPE_ALL_CPU) {
if (ring_buffer_empty_cpu(buffer, cpu_file))
return NULL;
- ent = peek_next_entry(iter, cpu_file, ent_ts);
+ ent = peek_next_entry(iter, cpu_file, ent_ts, missing_events);
if (ent_cpu)
*ent_cpu = cpu_file;
@@ -1592,7 +1596,7 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
if (ring_buffer_empty_cpu(buffer, cpu))
continue;
- ent = peek_next_entry(iter, cpu, &ts);
+ ent = peek_next_entry(iter, cpu, &ts, &lost_events);
/*
* Pick the entry with the smallest timestamp:
@@ -1601,6 +1605,7 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
next = ent;
next_cpu = cpu;
next_ts = ts;
+ next_lost = lost_events;
}
}
@@ -1610,6 +1615,9 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
if (ent_ts)
*ent_ts = next_ts;
+ if (missing_events)
+ *missing_events = next_lost;
+
return next;
}
@@ -1617,13 +1625,14 @@ __find_next_entry(struct trace_iterator *iter, int *ent_cpu, u64 *ent_ts)
struct trace_entry *trace_find_next_entry(struct trace_iterator *iter,
int *ent_cpu, u64 *ent_ts)
{
- return __find_next_entry(iter, ent_cpu, ent_ts);
+ return __find_next_entry(iter, ent_cpu, NULL, ent_ts);
}
/* Find the next real entry, and increment the iterator to the next entry */
static void *find_next_entry_inc(struct trace_iterator *iter)
{
- iter->ent = __find_next_entry(iter, &iter->cpu, &iter->ts);
+ iter->ent = __find_next_entry(iter, &iter->cpu,
+ &iter->lost_events, &iter->ts);
if (iter->ent)
trace_iterator_increment(iter);
@@ -1635,7 +1644,8 @@ static void trace_consume(struct trace_iterator *iter)
{
/* Don't allow ftrace to trace into the ring buffers */
ftrace_disable_cpu();
- ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts, NULL);
+ ring_buffer_consume(iter->tr->buffer, iter->cpu, &iter->ts,
+ &iter->lost_events);
ftrace_enable_cpu();
}
@@ -2030,6 +2040,10 @@ static enum print_line_t print_trace_line(struct trace_iterator *iter)
{
enum print_line_t ret;
+ if (iter->lost_events)
+ trace_seq_printf(&iter->seq, "CPU:%d [LOST %lu EVENTS]\n",
+ iter->cpu, iter->lost_events);
+
if (iter->trace && iter->trace->print_line) {
ret = iter->trace->print_line(iter);
if (ret != TRACE_TYPE_UNHANDLED)
--
1.7.0
* [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
` (5 preceding siblings ...)
2010-04-01 3:17 ` [PATCH 6/7] tracing: Show the lost events in the trace_pipe output Steven Rostedt
@ 2010-04-01 3:17 ` Steven Rostedt
2010-04-01 6:13 ` Frederic Weisbecker
2010-04-02 16:39 ` [PATCH 0/7] [GIT PULL] tracing: updates Ingo Molnar
7 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 3:17 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker
[-- Attachment #1: 0007-ring-buffer-Add-lost-event-count-to-end-of-sub-buffe.patch --]
[-- Type: text/plain, Size: 4032 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Currently, binary readers of the ring buffer only know where events were
lost, but not how many events were lost at that location.
This information is available, but it would require adding another
field to the sub buffer header to include it.
But when an event cannot fit at the end of a sub buffer, it is written
to the next sub buffer. This means there is a good chance that the
sub buffer has room to hold this counter. If it does, write
the counter at the end of the sub buffer and set another flag
in the data size field to indicate that this information exists.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 37 +++++++++++++++++++++++++++++++++----
1 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8295650..dc6d563 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -320,6 +320,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
/* Flag when events were overwritten */
#define RB_MISSED_EVENTS (1 << 31)
+/* Missed count stored at end */
+#define RB_MISSED_STORED (1 << 30)
struct buffer_data_page {
u64 time_stamp; /* page time stamp */
@@ -340,6 +342,7 @@ struct buffer_page {
local_t write; /* index for next write */
unsigned read; /* index for next read */
local_t entries; /* entries on this page */
+ unsigned long real_end; /* real end of data */
struct buffer_data_page *page; /* Actual data page */
};
@@ -1770,6 +1773,13 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
kmemcheck_annotate_bitfield(event, bitfield);
/*
+ * Save the original length to the meta data.
+ * This will be used by the reader to add lost event
+ * counter.
+ */
+ tail_page->real_end = tail;
+
+ /*
* If this event is bigger than the minimum size, then
* we need to be careful that we don't subtract the
* write counter enough to allow another writer to slip
@@ -2888,6 +2898,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
local_set(&cpu_buffer->reader_page->write, 0);
local_set(&cpu_buffer->reader_page->entries, 0);
local_set(&cpu_buffer->reader_page->page->commit, 0);
+ cpu_buffer->reader_page->real_end = 0;
spin:
/*
@@ -3728,11 +3739,11 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
struct ring_buffer_event *event;
struct buffer_data_page *bpage;
struct buffer_page *reader;
+ unsigned long missed_events;
unsigned long flags;
unsigned int commit;
unsigned int read;
u64 save_timestamp;
- int missed_events = 0;
int ret = -1;
if (!cpumask_test_cpu(cpu, buffer->cpumask))
@@ -3766,8 +3777,7 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
commit = rb_page_commit(reader);
/* Check if any events were dropped */
- if (cpu_buffer->lost_events)
- missed_events = 1;
+ missed_events = cpu_buffer->lost_events;
/*
* If this page has been partially read or
@@ -3829,6 +3839,14 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
local_set(&reader->entries, 0);
reader->read = 0;
*data_page = bpage;
+
+ /*
+ * Use the real_end for the data size,
+ * This gives us a chance to store the lost events
+ * on the page.
+ */
+ if (reader->real_end)
+ local_set(&bpage->commit, reader->real_end);
}
ret = read;
@@ -3836,8 +3854,19 @@ int ring_buffer_read_page(struct ring_buffer *buffer,
/*
* Set a flag in the commit field if we lost events
*/
- if (missed_events)
+ if (missed_events) {
+ commit = local_read(&bpage->commit);
+
+ /* If there is room at the end of the page to save the
+ * missed events, then record it there.
+ */
+ if (BUF_PAGE_SIZE - commit >= sizeof(missed_events)) {
+ memcpy(&bpage->data[commit], &missed_events,
+ sizeof(missed_events));
+ local_add(RB_MISSED_STORED, &bpage->commit);
+ }
local_add(RB_MISSED_EVENTS, &bpage->commit);
+ }
out_unlock:
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
--
1.7.0
* Re: [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer
2010-04-01 3:17 ` [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer Steven Rostedt
@ 2010-04-01 6:13 ` Frederic Weisbecker
2010-04-01 12:23 ` Steven Rostedt
0 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2010-04-01 6:13 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton
On Wed, Mar 31, 2010 at 11:17:42PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> Currently, binary readers of the ring buffer only know where events were
> lost, but not how many events were lost at that location.
> This information is available, but it would require adding another
> field to the sub buffer header to include it.
>
> But when an event cannot fit at the end of a sub buffer, it is written
> to the next sub buffer. This means there is a good chance that the
> sub buffer has room to hold this counter. If it does, write
> the counter at the end of the sub buffer and set another flag
> in the data size field to indicate that this information exists.
Since you cannot ensure you have room for that,
is there a way to delay the update of cpu_buffer->last_overrun
until you find an end of page large enough for it?
I mean, you do this in rb_get_reader_page():
if (overwrite != cpu_buffer->last_overrun) {
cpu_buffer->lost_events = overwrite - cpu_buffer->last_overrun;
cpu_buffer->last_overrun = overwrite;
}
If you don't have enough room to put lost_events at the end of
the buffer, you could leave cpu_buffer->last_overrun as is,
so the next time you swap the reader page, the new delta
will be added to the previous one that you couldn't transmit.
That way you ensure the user always knows when events are lost.
* Re: [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer
2010-04-01 6:13 ` Frederic Weisbecker
@ 2010-04-01 12:23 ` Steven Rostedt
2010-04-08 23:42 ` Frederic Weisbecker
0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2010-04-01 12:23 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: linux-kernel, Ingo Molnar, Andrew Morton
On Thu, 2010-04-01 at 08:13 +0200, Frederic Weisbecker wrote:
> On Wed, Mar 31, 2010 at 11:17:42PM -0400, Steven Rostedt wrote:
> If you don't have enough room to put lost_events at the end of
> the buffer, you could leave cpu_buffer->last_overrun as is,
> so the next time you swap the reader page, the new delta
> will be added to the previous one that you couldn't transmit.
>
> That way you ensure the user always knows when events are lost.
"When" is not the issue, since we already set the RB_MISSED_EVENTS flag.
It's how many were lost at that point that we do not store.
I actually found in most cases there is room, and trace-cmd can report
it. When there is no room, it just says events were dropped without
reporting the actual number.
This is what it shows when the number can be stored:
trace-cmd-4980 [002] 32784.303699: lock_acquire: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
trace-cmd-4980 [002] 32784.303700: lock_release: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
CPU:1 [415 EVENTS DROPPED]
trace-cmd-4981 [001] 32784.303701: lock_release: 0xffff88000c49d7d0 &ei->truncate_mutex
trace-cmd-4981 [001] 32784.303704: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
trace-cmd-4980 [002] 32784.303705: lock_acquire: 0xffff88003c0719d8 &(&bgl->locks[i].lock)->rlock
This is what it shows when the number cannot be stored:
<idle>-0 [002] 32784.304459: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
<idle>-0 [002] 32784.304464: lock_release: 0xffffffff816591e0 rcu_read_lock
CPU:1 [EVENTS DROPPED]
ls-4984 [001] 32784.304471: lock_acquire: 0xffff88003f8b8018 &(&base->lock)->rlock
ls-4984 [001] 32784.304471: lock_release: 0xffff88003f8b8018 &(&base->lock)->rlock
ls-4984 [001] 32784.304471: softirq_exit: vec=1 [action=TIMER]
The problem with storing it later is that there is no guarantee that it
will be stored at all. How far ahead should the output look for that
number? By delaying when to print it, the output would need to search
for it.
What happens if events were lost again before we found the number? Do we
add them together? Then we won't know how many events were lost between
the two spots where the events were lost. That is, if 448 events were
lost but we could not store the number, and then 318 more events were
lost, and now we can store the number, do we add them together and
report 766 events lost?
The end of the report still gives the total number lost, as it has
always done. Now I'm trying to report where they are lost.
Also note, in practice, the majority of the time we can record the
number:
$ trace-cmd report | grep EVENTS
CPU:1 [82 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:2 [75 EVENTS DROPPED]
CPU:1 [86 EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:1 [177 EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:0 [73 EVENTS DROPPED]
CPU:1 [163 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:2 [89 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [77 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:1 [87 EVENTS DROPPED]
CPU:2 [84 EVENTS DROPPED]
CPU:2 [82 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:1 [89 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:2 [164 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:0 [89 EVENTS DROPPED]
If we really want this number every time, then we need to add it to the
sub buffer header, and not hope there was room to store it.
-- Steve
* Re: [PATCH 0/7] [GIT PULL] tracing: updates
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
` (6 preceding siblings ...)
2010-04-01 3:17 ` [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer Steven Rostedt
@ 2010-04-02 16:39 ` Ingo Molnar
7 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2010-04-02 16:39 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Andrew Morton, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Ingo,
>
> Please pull the latest tip/tracing/core tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/core
>
>
> Li Zefan (3):
> tracing: Convert some signal events to DEFINE_TRACE
> tracing: Update comments
> tracing: Remove side effect from module tracepoints that caused a GPF
>
> Steven Rostedt (4):
> tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set
> ring-buffer: Add place holder recording of dropped events
> tracing: Show the lost events in the trace_pipe output
> ring-buffer: Add lost event count to end of sub buffer
>
> ----
> drivers/oprofile/cpu_buffer.c | 4 +-
> include/linux/ftrace_event.h | 1 +
> include/linux/module.h | 6 +-
> include/linux/ring_buffer.h | 6 +-
> include/trace/events/module.h | 18 ++++--
> include/trace/events/signal.h | 52 +++++++----------
> include/trace/ftrace.h | 33 ++++++-----
> kernel/module.c | 8 +-
> kernel/trace/ring_buffer.c | 101 ++++++++++++++++++++++++++++++++--
> kernel/trace/ring_buffer_benchmark.c | 2 +-
> kernel/trace/trace.c | 30 +++++++---
> kernel/trace/trace_functions_graph.c | 5 +-
> kernel/trace/trace_selftest.c | 2 +-
> 13 files changed, 187 insertions(+), 81 deletions(-)
Pulled, thanks a lot Steve!
Ingo
* Re: [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer
2010-04-01 12:23 ` Steven Rostedt
@ 2010-04-08 23:42 ` Frederic Weisbecker
0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2010-04-08 23:42 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton
On Thu, Apr 01, 2010 at 08:23:36AM -0400, Steven Rostedt wrote:
> On Thu, 2010-04-01 at 08:13 +0200, Frederic Weisbecker wrote:
> > On Wed, Mar 31, 2010 at 11:17:42PM -0400, Steven Rostedt wrote:
>
> > If you don't have enough room to put lost_events at the end of
> > the buffer, you could leave cpu_buffer->last_overrun as is,
> > so the next time you swap the reader page, the new delta
> > will be added to the previous one that you couldn't transmit.
> >
> > That way you ensure the user always knows when events are lost.
>
> "When" is not the issue, since we already set the RB_MISSED_EVENTS flag.
> It's how many were lost at that point that we do not store.
Ah ok.
> I actually found in most cases there is room, and trace-cmd can report
> it. When there is no room, it just says events were dropped without
> reporting the actual number.
>
>
> This is what it shows when the number can be stored:
>
> trace-cmd-4980 [002] 32784.303699: lock_acquire: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
> trace-cmd-4980 [002] 32784.303700: lock_release: 0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
> CPU:1 [415 EVENTS DROPPED]
> trace-cmd-4981 [001] 32784.303701: lock_release: 0xffff88000c49d7d0 &ei->truncate_mutex
> trace-cmd-4981 [001] 32784.303704: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
> trace-cmd-4980 [002] 32784.303705: lock_acquire: 0xffff88003c0719d8 &(&bgl->locks[i].lock)->rlock
>
> This is what it shows when the number can not:
>
> <idle>-0 [002] 32784.304459: lock_acquire: 0xffffffff816591e0 read rcu_read_lock
> <idle>-0 [002] 32784.304464: lock_release: 0xffffffff816591e0 rcu_read_lock
> CPU:1 [EVENTS DROPPED]
> ls-4984 [001] 32784.304471: lock_acquire: 0xffff88003f8b8018 &(&base->lock)->rlock
> ls-4984 [001] 32784.304471: lock_release: 0xffff88003f8b8018 &(&base->lock)->rlock
> ls-4984 [001] 32784.304471: softirq_exit: vec=1 [action=TIMER]
>
>
> The problem with storing it later is that there is no guarantee that it
> will be stored at all. How far ahead should the output look for that
> number? By delaying when to print it, the output would need to search
> for it.
>
> What happens if events were lost again before we found the number? Do we
> add them together? Then we won't know how many events were lost between
> the two spots where the events were lost. That is, if 448 events were
> lost but we could not store the number, and then 318 more events were
> lost, and now we can store the number, do we add them together and
> report 766 events lost?
This is what I thought about yeah, adding them.
> The end of the report still gives the total number lost, as it has
> always done. Now I'm trying to report where they are lost.
Ok.
> Also note, in practice, the majority of the time we can record the
> number:
>
> $ trace-cmd report | grep EVENTS
> CPU:1 [82 EVENTS DROPPED]
> CPU:0 [79 EVENTS DROPPED]
> CPU:2 [75 EVENTS DROPPED]
> CPU:1 [86 EVENTS DROPPED]
> CPU:0 [87 EVENTS DROPPED]
> CPU:2 [EVENTS DROPPED]
> CPU:0 [87 EVENTS DROPPED]
> CPU:0 [79 EVENTS DROPPED]
> CPU:1 [177 EVENTS DROPPED]
> CPU:0 [78 EVENTS DROPPED]
> CPU:0 [73 EVENTS DROPPED]
> CPU:1 [163 EVENTS DROPPED]
> CPU:2 [86 EVENTS DROPPED]
> CPU:0 [75 EVENTS DROPPED]
> CPU:2 [89 EVENTS DROPPED]
> CPU:0 [EVENTS DROPPED]
> CPU:0 [75 EVENTS DROPPED]
> CPU:1 [EVENTS DROPPED]
> CPU:2 [EVENTS DROPPED]
> CPU:0 [77 EVENTS DROPPED]
> CPU:0 [EVENTS DROPPED]
> CPU:0 [78 EVENTS DROPPED]
> CPU:1 [90 EVENTS DROPPED]
> CPU:2 [86 EVENTS DROPPED]
> CPU:1 [87 EVENTS DROPPED]
> CPU:2 [84 EVENTS DROPPED]
> CPU:2 [82 EVENTS DROPPED]
> CPU:1 [90 EVENTS DROPPED]
> CPU:1 [89 EVENTS DROPPED]
> CPU:2 [EVENTS DROPPED]
> CPU:0 [EVENTS DROPPED]
> CPU:2 [164 EVENTS DROPPED]
> CPU:1 [EVENTS DROPPED]
> CPU:0 [79 EVENTS DROPPED]
> CPU:0 [89 EVENTS DROPPED]
>
>
> If we really want this number every time, then we need to add it to the
> sub buffer header, and not hope there was room to store it.
Yeah. My biggest worry was that the user might not be informed of some
lost events. But as long as we have the total and also a flag that at
least reports we missed events (whatever the number), then
it's fine.
Thanks.
end of thread, other threads:[~2010-04-08 23:43 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-01 3:17 [PATCH 0/7] [GIT PULL] tracing: updates Steven Rostedt
2010-04-01 3:17 ` [PATCH 1/7] tracing: Convert some signal events to DEFINE_TRACE Steven Rostedt
2010-04-01 3:17 ` [PATCH 2/7] tracing: Update comments Steven Rostedt
2010-04-01 3:17 ` [PATCH 3/7] tracing: Remove side effect from module tracepoints that caused a GPF Steven Rostedt
2010-04-01 3:17 ` [PATCH 4/7] tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set Steven Rostedt
2010-04-01 3:17 ` [PATCH 5/7] ring-buffer: Add place holder recording of dropped events Steven Rostedt
2010-04-01 3:17 ` [PATCH 6/7] tracing: Show the lost events in the trace_pipe output Steven Rostedt
2010-04-01 3:17 ` [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer Steven Rostedt
2010-04-01 6:13 ` Frederic Weisbecker
2010-04-01 12:23 ` Steven Rostedt
2010-04-08 23:42 ` Frederic Weisbecker
2010-04-02 16:39 ` [PATCH 0/7] [GIT PULL] tracing: updates Ingo Molnar
This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).