* [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
@ 2009-09-04 23:55 Steven Rostedt
2009-09-04 23:55 ` [PATCH 01/18] ring-buffer: do not reset while in a commit Steven Rostedt
` (19 more replies)
0 siblings, 20 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
Thomas,
I can port these over to -rt if you want, or you could just
add them yourself. I know the spinlocks need to be converted to atomic.
Just let me know if you want me to do it.
Ingo,
This patch series started out fixing a single bug that Arnaldo ran into
with the -rt patch series. I've hit it too and wrote a patch to solve it.
Unfortunately, the patch was racy and error-prone and would crash easily
in Arnaldo's setup.
The problem was with the wakeup latency tracer. When it was developed,
it controlled what went into the ring buffer. But now we have events,
which can write into the ring buffer at any time unless they are
disabled. When a max latency is encountered, the buffers are swapped
and the running ring buffer is reset. If a commit is happening
at that time, the buffer will be corrupted.
Luckily the ring buffer is robust enough to detect this and instead of
crashing the kernel, it safely disables the ring buffer and outputs
a back trace. But this does not help the developer that wants the trace.
Talking with Thomas Gleixner, we both agreed that resetting the ring
buffer on the fly was dangerous. Instead we should insert a marker and
use that as the start of the trace.
I thought this would be an easy change, but after a week of debugging,
fixing bugs just to create two new ones for every one I fixed, this
was not as easy as I thought.
The problem was that you cannot write a marker into a buffer that is
not on the same CPU as the writer. I tried to add flags to make the
ring buffer reset itself when a new write happens on that CPU, but
that just proved to be even more racy.
Finally (something I should have realized from day 1, not 4 days later)
I realized I already had a marker. The latency tracers always record
the timestamp of when the trace began. I can simply ignore any trace
event that happened before that timestamp. This ended up working out
very well. The code is quite simple and solid.
But this journey through the marker bits was not all in vain. I actually
came across several outstanding bugs in both the tracer and the ring buffer.
Nothing major, but enough to be fixed.
It was not just the latency tracers that could cause corruption;
resetting the ring buffers when switching plugins could also cause it.
All resets must synchronize the disabling of the ring buffers with any
current writers.
Next I found that the swapping of the buffers by the wakeup tracer
was causing issues. The events would use the trace_array variable *tr
to access the buffers: tr->buffer to reserve space on the buffer
and then tr->buffer again to commit it. The problem is that the wakeup
tracer swaps tr->buffer with max_tr.buffer, so the commit can
happen on a different buffer than the one it started with. This again
would be detected by the ring buffer, which would shut itself down. The
solution here was to change the ftrace write API to pass in
the buffer directly instead of the trace_array. The events now pass
the same buffer to both the reserve and the commit. It is still OK
to update the max_tr in the wakeup tracer.
The irqsoff tracer posed a different problem. It does not swap the
buffers; it swaps a per-cpu buffer within the buffer. This
can be done because the irqsoff tracer only records per cpu and does
not record the events of other cpus. But if a latency is detected
in the middle of a commit (commits can be preempted by interrupts), then
we cannot swap the buffer. This is a tougher problem, and I currently
do not have a solution, since it would require a copy of one buffer to the
other. Perhaps in the future. But instead of just failing, I added a
print into the buffer to notify users that this has occurred.
They will now see:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.31-rc5
# --------------------------------------------------------------------
# latency: 151 us, #1/1, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -4328 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: save_args
# => ended at: __do_softirq
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
bash-4328 3d.s5 167us : update_max_tr_single: Failed to swap buffers due to commit in progress
This is better than a wrong trace. It still shows the max latency and the
start and stop points. Note, this run had lockdep enabled with locking
events on, which spews a lot of events after a latency has been hit
(hence the 167us entry with respect to the 151us max). This should not be
too big a deal, since it took a while loop constantly resetting the max
latency, and grepping for this failure, to hit it.
Because the irqsoff tracer swaps the internal ring buffer cpus,
a check needs to be added in the recording to handle this. Because this
check is only needed for swapping the internal buffers, and the irqsoff
(and preemptoff) tracer is the only user, I only do the check if those
tracers are configured (to keep the overhead down).
Please pull the latest tip/tracing/core tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/core
Steven Rostedt (18):
ring-buffer: do not reset while in a commit
ring-buffer: do not swap buffers during a commit
ring-buffer: remove unnecessary cpu_relax
ring-buffer: fix ring_buffer_read crossing pages
ring-buffer: remove ring_buffer_event_discard
ring-buffer: do not count discarded events
ring-buffer: disable all cpu buffers when one finds a problem
tracing: print out start and stop in latency traces
tracing: disable update max tracer while reading trace
tracing: disable buffers and synchronize_sched before resetting
tracing: remove users of tracing_reset
tracing: use timestamp to determine start of latency traces
tracing: make tracing_reset safe for external use
tracing: pass around ring buffer instead of tracer
tracing: add trace_array_printk for internal tracers to use
tracing: report error in trace if we fail to swap latency buffer
ring-buffer: check for swapped buffers in start of committing
ring-buffer: only enable ring_buffer_swap_cpu when needed
----
include/linux/ftrace_event.h | 15 ++-
include/linux/ring_buffer.h | 23 +--
include/trace/ftrace.h | 15 ++-
kernel/trace/Kconfig | 8 +
kernel/trace/blktrace.c | 12 +-
kernel/trace/kmemtrace.c | 4 +-
kernel/trace/ring_buffer.c | 172 +++++++++++++-------
kernel/trace/trace.c | 297 ++++++++++++++++++++++++----------
kernel/trace/trace.h | 26 ++--
kernel/trace/trace_boot.c | 16 +-
kernel/trace/trace_events.c | 6 +-
kernel/trace/trace_functions_graph.c | 14 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_mmiotrace.c | 10 +-
kernel/trace/trace_power.c | 22 ++-
kernel/trace/trace_sched_switch.c | 18 ++-
kernel/trace/trace_sched_wakeup.c | 7 +-
kernel/trace/trace_syscalls.c | 18 ++-
18 files changed, 444 insertions(+), 242 deletions(-)
--
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 01/18] ring-buffer: do not reset while in a commit
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 02/18] ring-buffer: do not swap buffers during " Steven Rostedt
` (18 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0001-ring-buffer-do-not-reset-while-in-a-commit.patch --]
[-- Type: text/plain, Size: 968 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The callers of reset must ensure that no commit is taking place
at the time of the reset. If one is, we may corrupt the ring buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index da2c59d..79d6012 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3373,12 +3373,16 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
+ goto out;
+
__raw_spin_lock(&cpu_buffer->lock);
rb_reset_cpu(cpu_buffer);
__raw_spin_unlock(&cpu_buffer->lock);
+ out:
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
atomic_dec(&cpu_buffer->record_disabled);
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 02/18] ring-buffer: do not swap buffers during a commit
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
2009-09-04 23:55 ` [PATCH 01/18] ring-buffer: do not reset while in a commit Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 03/18] ring-buffer: remove unnecessary cpu_relax Steven Rostedt
` (17 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0002-ring-buffer-do-not-swap-buffers-during-a-commit.patch --]
[-- Type: text/plain, Size: 1129 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
If a commit is taking place on a CPU ring buffer, do not allow it to
be swapped. Return -EBUSY when this is detected instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 79d6012..2878bd4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3519,16 +3519,23 @@ int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
atomic_inc(&cpu_buffer_a->record_disabled);
atomic_inc(&cpu_buffer_b->record_disabled);
+ ret = -EBUSY;
+ if (local_read(&cpu_buffer_a->committing))
+ goto out_dec;
+ if (local_read(&cpu_buffer_b->committing))
+ goto out_dec;
+
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
+ ret = 0;
+
+out_dec:
atomic_dec(&cpu_buffer_a->record_disabled);
atomic_dec(&cpu_buffer_b->record_disabled);
-
- ret = 0;
out:
return ret;
}
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 03/18] ring-buffer: remove unnecessary cpu_relax
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
2009-09-04 23:55 ` [PATCH 01/18] ring-buffer: do not reset while in a commit Steven Rostedt
2009-09-04 23:55 ` [PATCH 02/18] ring-buffer: do not swap buffers during " Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 04/18] ring-buffer: fix ring_buffer_read crossing pages Steven Rostedt
` (16 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0003-ring-buffer-remove-unnecessary-cpu_relax.patch --]
[-- Type: text/plain, Size: 1890 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The loops in the ring buffer that use cpu_relax are not dependent on
other CPUs. They simply came across some padding in the ring buffer and
are skipping over it. These are normal loops and do not require a
cpu_relax.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 16 ++++------------
1 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2878bd4..a05541a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3132,10 +3132,8 @@ ring_buffer_peek(struct ring_buffer *buffer, int cpu, u64 *ts)
spin_unlock(&cpu_buffer->reader_lock);
local_irq_restore(flags);
- if (event && event->type_len == RINGBUF_TYPE_PADDING) {
- cpu_relax();
+ if (event && event->type_len == RINGBUF_TYPE_PADDING)
goto again;
- }
return event;
}
@@ -3160,10 +3158,8 @@ ring_buffer_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
event = rb_iter_peek(iter, ts);
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (event && event->type_len == RINGBUF_TYPE_PADDING) {
- cpu_relax();
+ if (event && event->type_len == RINGBUF_TYPE_PADDING)
goto again;
- }
return event;
}
@@ -3209,10 +3205,8 @@ ring_buffer_consume(struct ring_buffer *buffer, int cpu, u64 *ts)
out:
preempt_enable();
- if (event && event->type_len == RINGBUF_TYPE_PADDING) {
- cpu_relax();
+ if (event && event->type_len == RINGBUF_TYPE_PADDING)
goto again;
- }
return event;
}
@@ -3302,10 +3296,8 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts)
out:
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (event && event->type_len == RINGBUF_TYPE_PADDING) {
- cpu_relax();
+ if (event && event->type_len == RINGBUF_TYPE_PADDING)
goto again;
- }
return event;
}
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 04/18] ring-buffer: fix ring_buffer_read crossing pages
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (2 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 03/18] ring-buffer: remove unnecessary cpu_relax Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 05/18] ring-buffer: remove ring_buffer_event_discard Steven Rostedt
` (15 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0004-ring-buffer-fix-ring_buffer_read-crossing-pages.patch --]
[-- Type: text/plain, Size: 1988 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
When the ring buffer uses an iterator (static read mode, not on-the-fly
reading) and it crosses a page boundary, it will skip the first
entry on the next page. The reason is that the last entry of a page
is usually padding if the page is not full. The padding is not
returned to the user.
The problem arises in ring_buffer_read because it also increments the
iterator. Because both the read and the peek use the same rb_iter_peek,
rb_iter_peek will return the padding but also increment to the next
item. This is because ring_buffer_peek does not increment it
itself.
ring_buffer_read will increment it again and then call rb_iter_peek
again to get the next item. But that will be the second item, not the
first one on the page.
The reason this never showed up before is that the ftrace utility
always calls ring_buffer_peek first and only uses ring_buffer_read
to increment to the next item. ring_buffer_peek always keeps
the pointer at a valid item, never padding. This just hid the bug.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a05541a..9d939e7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3286,19 +3286,19 @@ ring_buffer_read(struct ring_buffer_iter *iter, u64 *ts)
struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
unsigned long flags;
- again:
spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ again:
event = rb_iter_peek(iter, ts);
if (!event)
goto out;
+ if (event->type_len == RINGBUF_TYPE_PADDING)
+ goto again;
+
rb_advance_iter(iter);
out:
spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (event && event->type_len == RINGBUF_TYPE_PADDING)
- goto again;
-
return event;
}
EXPORT_SYMBOL_GPL(ring_buffer_read);
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 05/18] ring-buffer: remove ring_buffer_event_discard
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (3 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 04/18] ring-buffer: fix ring_buffer_read crossing pages Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 06/18] ring-buffer: do not count discarded events Steven Rostedt
` (14 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0005-ring-buffer-remove-ring_buffer_event_discard.patch --]
[-- Type: text/plain, Size: 3597 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The function ring_buffer_event_discard can be used on any item in the
ring buffer, even after the item has been committed. This function
provides no safety nets and is very race-prone.
An item may be safely removed from the ring buffer before it is committed
with ring_buffer_discard_commit.
Since there are currently no users of ring_buffer_event_discard, and
because it is racy and error-prone, this patch removes it altogether.
Note, removing this function also allows the counters to ignore
all discarded events (patches will follow).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 14 --------------
kernel/trace/ring_buffer.c | 27 ++++++---------------------
2 files changed, 6 insertions(+), 35 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 7fca716..e061b4e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -75,20 +75,6 @@ ring_buffer_event_time_delta(struct ring_buffer_event *event)
}
/*
- * ring_buffer_event_discard can discard any event in the ring buffer.
- * it is up to the caller to protect against a reader from
- * consuming it or a writer from wrapping and replacing it.
- *
- * No external protection is needed if this is called before
- * the event is commited. But in that case it would be better to
- * use ring_buffer_discard_commit.
- *
- * Note, if an event that has not been committed is discarded
- * with ring_buffer_event_discard, it must still be committed.
- */
-void ring_buffer_event_discard(struct ring_buffer_event *event);
-
-/*
* ring_buffer_discard_commit will remove an event that has not
* been committed yet. If this is used, then ring_buffer_unlock_commit
* must not be called on the discarded event. This function
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9d939e7..092fe0c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2328,31 +2328,16 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
}
/**
- * ring_buffer_event_discard - discard any event in the ring buffer
- * @event: the event to discard
- *
- * Sometimes a event that is in the ring buffer needs to be ignored.
- * This function lets the user discard an event in the ring buffer
- * and then that event will not be read later.
- *
- * Note, it is up to the user to be careful with this, and protect
- * against races. If the user discards an event that has been consumed
- * it is possible that it could corrupt the ring buffer.
- */
-void ring_buffer_event_discard(struct ring_buffer_event *event)
-{
- rb_event_discard(event);
-}
-EXPORT_SYMBOL_GPL(ring_buffer_event_discard);
-
-/**
* ring_buffer_commit_discard - discard an event that has not been committed
* @buffer: the ring buffer
* @event: non committed event to discard
*
- * This is similar to ring_buffer_event_discard but must only be
- * performed on an event that has not been committed yet. The difference
- * is that this will also try to free the event from the ring buffer
+ * Sometimes an event that is in the ring buffer needs to be ignored.
+ * This function lets the user discard an event in the ring buffer
+ * and then that event will not be read later.
+ *
+ * This function only works if it is called before the item has been
+ * committed. It will try to free the event from the ring buffer
* if another event has not been added behind it.
*
* If another event has been added behind it, it will set the event
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 06/18] ring-buffer: do not count discarded events
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (4 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 05/18] ring-buffer: remove ring_buffer_event_discard Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 07/18] ring-buffer: disable all cpu buffers when one finds a problem Steven Rostedt
` (13 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0006-ring-buffer-do-not-count-discarded-events.patch --]
[-- Type: text/plain, Size: 4841 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The latency tracers report the number of items in the trace buffer,
using the ring buffer data to calculate it. Because discarded
events are also counted, the numbers do not match the number of items
that are printed. The ring buffer also adds a "padding" item to the
end of each buffer page, which also gets counted as a discarded item.
This patch decrements the entry counter of the page an event is on
when that event is discarded. This allows us to ignore discarded entries
while reading the buffer. Decrementing the counter is still safe, since
it can only happen while the committing flag is set.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 71 +++++++++++++++++++++++++++++++++----------
1 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 092fe0c..c8d2a66 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -218,17 +218,12 @@ enum {
static inline int rb_null_event(struct ring_buffer_event *event)
{
- return event->type_len == RINGBUF_TYPE_PADDING
- && event->time_delta == 0;
-}
-
-static inline int rb_discarded_event(struct ring_buffer_event *event)
-{
- return event->type_len == RINGBUF_TYPE_PADDING && event->time_delta;
+ return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
}
static void rb_event_set_padding(struct ring_buffer_event *event)
{
+ /* padding has a NULL time_delta */
event->type_len = RINGBUF_TYPE_PADDING;
event->time_delta = 0;
}
@@ -1778,9 +1773,6 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
event->type_len = RINGBUF_TYPE_PADDING;
/* time delta must be non zero */
event->time_delta = 1;
- /* Account for this as an entry */
- local_inc(&tail_page->entries);
- local_inc(&cpu_buffer->entries);
/* Set write to end of buffer */
length = (tail + length) - BUF_PAGE_SIZE;
@@ -2269,18 +2261,23 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
}
EXPORT_SYMBOL_GPL(ring_buffer_lock_reserve);
-static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
+static void
+rb_update_write_stamp(struct ring_buffer_per_cpu *cpu_buffer,
struct ring_buffer_event *event)
{
- local_inc(&cpu_buffer->entries);
-
/*
* The event first in the commit queue updates the
* time stamp.
*/
if (rb_event_is_commit(cpu_buffer, event))
cpu_buffer->write_stamp += event->time_delta;
+}
+static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
+ struct ring_buffer_event *event)
+{
+ local_inc(&cpu_buffer->entries);
+ rb_update_write_stamp(cpu_buffer, event);
rb_end_commit(cpu_buffer);
}
@@ -2327,6 +2324,46 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
event->time_delta = 1;
}
+/*
+ * Decrement the entries to the page that an event is on.
+ * The event does not even need to exist, only the pointer
+ * to the page it is on. This may only be called before the commit
+ * takes place.
+ */
+static inline void
+rb_decrement_entry(struct ring_buffer_per_cpu *cpu_buffer,
+ struct ring_buffer_event *event)
+{
+ unsigned long addr = (unsigned long)event;
+ struct buffer_page *bpage = cpu_buffer->commit_page;
+ struct buffer_page *start;
+
+ addr &= PAGE_MASK;
+
+ /* Do the likely case first */
+ if (likely(bpage->page == (void *)addr)) {
+ local_dec(&bpage->entries);
+ return;
+ }
+
+ /*
+ * Because the commit page may be on the reader page we
+ * start with the next page and check the end loop there.
+ */
+ rb_inc_page(cpu_buffer, &bpage);
+ start = bpage;
+ do {
+ if (bpage->page == (void *)addr) {
+ local_dec(&bpage->entries);
+ return;
+ }
+ rb_inc_page(cpu_buffer, &bpage);
+ } while (bpage != start);
+
+ /* commit not part of this buffer?? */
+ RB_WARN_ON(cpu_buffer, 1);
+}
+
/**
* ring_buffer_commit_discard - discard an event that has not been committed
* @buffer: the ring buffer
@@ -2365,14 +2402,15 @@ void ring_buffer_discard_commit(struct ring_buffer *buffer,
*/
RB_WARN_ON(buffer, !local_read(&cpu_buffer->committing));
+ rb_decrement_entry(cpu_buffer, event);
if (rb_try_to_discard(cpu_buffer, event))
goto out;
/*
* The commit is still visible by the reader, so we
- * must increment entries.
+ * must still update the timestamp.
*/
- local_inc(&cpu_buffer->entries);
+ rb_update_write_stamp(cpu_buffer, event);
out:
rb_end_commit(cpu_buffer);
@@ -2884,8 +2922,7 @@ static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
event = rb_reader_event(cpu_buffer);
- if (event->type_len <= RINGBUF_TYPE_DATA_TYPE_LEN_MAX
- || rb_discarded_event(event))
+ if (event->type_len <= RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
cpu_buffer->read++;
rb_update_read_stamp(cpu_buffer, event);
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 07/18] ring-buffer: disable all cpu buffers when one finds a problem
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (5 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 06/18] ring-buffer: do not count discarded events Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 08/18] tracing: print out start and stop in latency traces Steven Rostedt
` (12 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0007-ring-buffer-disable-all-cpu-buffers-when-one-finds-a.patch --]
[-- Type: text/plain, Size: 1679 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Currently, the way RB_WARN_ON works is to disable either the current
CPU buffer or all CPU buffers, depending on whether a ring_buffer or
ring_buffer_per_cpu struct was passed into the macro.
Most users of RB_WARN_ON pass in the CPU buffer, so only that one
CPU buffer gets disabled while the rest stay active. This may
confuse users, even though a warning is sent to the console.
This patch changes the macro to disable the entire buffer even if
only a CPU buffer is passed in.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 21 +++++++++++++--------
1 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c8d2a66..f83a42a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -467,14 +467,19 @@ struct ring_buffer_iter {
};
/* buffer may be either ring_buffer or ring_buffer_per_cpu */
-#define RB_WARN_ON(buffer, cond) \
- ({ \
- int _____ret = unlikely(cond); \
- if (_____ret) { \
- atomic_inc(&buffer->record_disabled); \
- WARN_ON(1); \
- } \
- _____ret; \
+#define RB_WARN_ON(b, cond) \
+ ({ \
+ int _____ret = unlikely(cond); \
+ if (_____ret) { \
+ if (__same_type(*(b), struct ring_buffer_per_cpu)) { \
+ struct ring_buffer_per_cpu *__b = \
+ (void *)b; \
+ atomic_inc(&__b->buffer->record_disabled); \
+ } else \
+ atomic_inc(&b->record_disabled); \
+ WARN_ON(1); \
+ } \
+ _____ret; \
})
/* Up this if you want to test the TIME_EXTENTS and normalization */
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 08/18] tracing: print out start and stop in latency traces
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (6 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 07/18] ring-buffer: disable all cpu buffers when one finds a problem Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 09/18] tracing: disable update max tracer while reading trace Steven Rostedt
` (11 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0008-tracing-print-out-start-and-stop-in-latency-traces.patch --]
[-- Type: text/plain, Size: 2687 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
During development of the tracer, we would copy information from
the live tracer to the max tracer with one memcpy. Since then we
added a generic ring buffer and we handle the copies differently now.
Unfortunately, we never copied the critical section information, and
we lost the output:
# => started at: kmem_cache_alloc
# => ended at: kmem_cache_alloc
This patch adds back the critical start and end copying as well as
removes the unused "trace_idx" and "overrun" fields of the
trace_array_cpu structure.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 19 +++++++++++--------
kernel/trace/trace.h | 3 ---
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0f08816..df2c9f7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -407,19 +407,22 @@ static void
__update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
{
struct trace_array_cpu *data = tr->data[cpu];
+ struct trace_array_cpu *max_data = tr->data[cpu];
max_tr.cpu = cpu;
max_tr.time_start = data->preempt_timestamp;
- data = max_tr.data[cpu];
- data->saved_latency = tracing_max_latency;
+ max_data = max_tr.data[cpu];
+ max_data->saved_latency = tracing_max_latency;
+ max_data->critical_start = data->critical_start;
+ max_data->critical_end = data->critical_end;
memcpy(data->comm, tsk->comm, TASK_COMM_LEN);
- data->pid = tsk->pid;
- data->uid = task_uid(tsk);
- data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
- data->policy = tsk->policy;
- data->rt_priority = tsk->rt_priority;
+ max_data->pid = tsk->pid;
+ max_data->uid = task_uid(tsk);
+ max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
+ max_data->policy = tsk->policy;
+ max_data->rt_priority = tsk->rt_priority;
/* record this tasks comm */
tracing_record_cmdline(tsk);
@@ -1501,7 +1504,7 @@ print_trace_header(struct seq_file *m, struct trace_iterator *iter)
seq_puts(m, "\n# => ended at: ");
seq_print_ip_sym(&iter->seq, data->critical_end, sym_flags);
trace_print_seq(m, &iter->seq);
- seq_puts(m, "#\n");
+ seq_puts(m, "\n#\n");
}
seq_puts(m, "#\n");
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e2c06b2..f2af713 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -234,9 +234,6 @@ struct trace_array_cpu {
atomic_t disabled;
void *buffer_page; /* ring buffer spare */
- /* these fields get copied into max-trace: */
- unsigned long trace_idx;
- unsigned long overrun;
unsigned long saved_latency;
unsigned long critical_start;
unsigned long critical_end;
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 09/18] tracing: disable update max tracer while reading trace
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (7 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 08/18] tracing: print out start and stop in latency traces Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 10/18] tracing: disable buffers and synchronize_sched before resetting Steven Rostedt
` (10 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0009-tracing-disable-update-max-tracer-while-reading-trac.patch --]
[-- Type: text/plain, Size: 1569 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
When reading the trace from the trace file, an update of the max latency
may corrupt the output. This patch disables updates to the max
latency while the trace file is being read.
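The guard this patch adds can be modeled in a few lines of userspace C
(this is a sketch, not the kernel code: the names mirror the patch, but
the int return value of update_max_tr() is added here only so the
behavior can be checked):

```c
/* Userspace model of the trace_stop_count guard: while a reader has
 * called tracing_stop(), update_max_tr() becomes a no-op, so the max
 * buffer cannot be swapped out from under the reader. */
#include <assert.h>

static int trace_stop_count;
static unsigned long max_latency;

static void tracing_stop(void)  { trace_stop_count++; }
static void tracing_start(void) { trace_stop_count--; }

/* Returns 1 if the max was updated, 0 if the guard rejected it. */
static int update_max_tr(unsigned long latency)
{
	if (trace_stop_count)
		return 0;
	if (latency > max_latency)
		max_latency = latency;
	return 1;
}
```

Incrementing a count rather than setting a flag lets tracing_stop()
and tracing_start() calls nest, which is why the real code keeps the
counter under tracing_start_lock.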
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index df2c9f7..e521f1e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -263,6 +263,9 @@ unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
TRACE_ITER_GRAPH_TIME;
+static int trace_stop_count;
+static DEFINE_SPINLOCK(tracing_start_lock);
+
/**
* trace_wake_up - wake up tasks waiting for trace input
*
@@ -442,6 +445,9 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
{
struct ring_buffer *buf = tr->buffer;
+ if (trace_stop_count)
+ return;
+
WARN_ON_ONCE(!irqs_disabled());
__raw_spin_lock(&ftrace_max_lock);
@@ -469,6 +475,9 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
{
int ret;
+ if (trace_stop_count)
+ return;
+
WARN_ON_ONCE(!irqs_disabled());
__raw_spin_lock(&ftrace_max_lock);
@@ -685,9 +694,6 @@ static void trace_init_cmdlines(void)
cmdline_idx = 0;
}
-static int trace_stop_count;
-static DEFINE_SPINLOCK(tracing_start_lock);
-
/**
* ftrace_off_permanent - disable all ftrace code permanently
*
--
1.6.3.3
--
* [PATCH 10/18] tracing: disable buffers and synchronize_sched before resetting
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (8 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 09/18] tracing: disable update max tracer while reading trace Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 11/18] tracing: remove users of tracing_reset Steven Rostedt
` (9 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0010-tracing-disable-buffers-and-synchronize_sched-before.patch --]
[-- Type: text/plain, Size: 1103 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Resetting the ring buffers while tracing is active can corrupt
the ring buffer and disable it (there is no kernel crash to worry about).
The safest approach is to disable the ring buffers, call synchronize_sched()
to wait for all current writers to finish, and then reset the buffer.
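The disable/wait/reset/enable sequence can be sketched as a userspace
model (an illustration only: synchronize_sched() is stubbed out, and the
ring buffer is reduced to a counter; in the kernel the disable is a
per-buffer atomic that writers check before committing):

```c
/* Userspace model of the safe-reset sequence: disable recording, wait
 * for in-flight commits, reset, then re-enable. */
#include <assert.h>

struct ring_buffer {
	int record_disabled;
	int entries;
};

static void ring_buffer_record_disable(struct ring_buffer *b) { b->record_disabled++; }
static void ring_buffer_record_enable(struct ring_buffer *b)  { b->record_disabled--; }

/* Stub: in the kernel, synchronize_sched() waits until every CPU has
 * scheduled, guaranteeing no writer is still inside a commit. */
static void synchronize_sched_stub(void) { }

/* Returns 1 if the write was recorded, 0 if recording is disabled. */
static int ring_buffer_write(struct ring_buffer *b)
{
	if (b->record_disabled)
		return 0;
	b->entries++;
	return 1;
}

static void tracing_reset_online_cpus(struct ring_buffer *b)
{
	ring_buffer_record_disable(b);
	synchronize_sched_stub();	/* wait for current writers to finish */
	b->entries = 0;			/* the actual reset */
	ring_buffer_record_enable(b);
}
```

The ordering is the whole point: writes attempted between the disable
and the enable are dropped rather than landing in a half-reset buffer.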
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e521f1e..9110329 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -658,12 +658,20 @@ void tracing_reset(struct trace_array *tr, int cpu)
void tracing_reset_online_cpus(struct trace_array *tr)
{
+ struct ring_buffer *buffer = tr->buffer;
int cpu;
+ ring_buffer_record_disable(buffer);
+
+ /* Make sure all commits have finished */
+ synchronize_sched();
+
tr->time_start = ftrace_now(tr->cpu);
for_each_online_cpu(cpu)
tracing_reset(tr, cpu);
+
+ ring_buffer_record_enable(buffer);
}
void tracing_reset_current(int cpu)
--
1.6.3.3
--
* [PATCH 11/18] tracing: remove users of tracing_reset
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (9 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 10/18] tracing: disable buffers and synchronize_sched before resetting Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 12/18] tracing: use timestamp to determine start of latency traces Steven Rostedt
` (8 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0011-tracing-remove-users-of-tracing_reset.patch --]
[-- Type: text/plain, Size: 3296 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The function tracing_reset is deprecated for use outside of trace.c.
The new function to reset the buffers is tracing_reset_online_cpus.
The reason for this is that resetting the buffers while the event
trace points are active can corrupt them, because they may be
written to at the time of reset. tracing_reset_online_cpus disables
writes and waits for current writers to finish.
This patch replaces all users of tracing_reset except for the latency
tracers. Those require more work and will be converted in the
following patches.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/kmemtrace.c | 4 +---
kernel/trace/trace.c | 7 ++-----
kernel/trace/trace_boot.c | 4 +---
kernel/trace/trace_power.c | 4 +---
4 files changed, 5 insertions(+), 14 deletions(-)
diff --git a/kernel/trace/kmemtrace.c b/kernel/trace/kmemtrace.c
index dda53cc..81b1645 100644
--- a/kernel/trace/kmemtrace.c
+++ b/kernel/trace/kmemtrace.c
@@ -183,11 +183,9 @@ static void kmemtrace_stop_probes(void)
static int kmem_trace_init(struct trace_array *tr)
{
- int cpu;
kmemtrace_array = tr;
- for_each_cpu(cpu, cpu_possible_mask)
- tracing_reset(tr, cpu);
+ tracing_reset_online_cpus(tr);
kmemtrace_start_probes();
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9110329..54517a8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -550,7 +550,6 @@ __acquires(kernel_lock)
if (type->selftest && !tracing_selftest_disabled) {
struct tracer *saved_tracer = current_trace;
struct trace_array *tr = &global_trace;
- int i;
/*
* Run a selftest on this tracer.
@@ -559,8 +558,7 @@ __acquires(kernel_lock)
* internal tracing to verify that everything is in order.
* If we fail, we do not register this tracer.
*/
- for_each_tracing_cpu(i)
- tracing_reset(tr, i);
+ tracing_reset_online_cpus(tr);
current_trace = type;
/* the test is responsible for initializing and enabling */
@@ -573,8 +571,7 @@ __acquires(kernel_lock)
goto out;
}
/* Only reset on passing, to avoid touching corrupted buffers */
- for_each_tracing_cpu(i)
- tracing_reset(tr, i);
+ tracing_reset_online_cpus(tr);
printk(KERN_CONT "PASSED\n");
}
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index a29ef23..8631393 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -41,14 +41,12 @@ void disable_boot_trace(void)
static int boot_trace_init(struct trace_array *tr)
{
- int cpu;
boot_trace = tr;
if (!tr)
return 0;
- for_each_cpu(cpu, cpu_possible_mask)
- tracing_reset(tr, cpu);
+ tracing_reset_online_cpus(tr);
tracing_sched_switch_assign_trace(tr);
return 0;
diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
index 8a30d98..a5d5a4f 100644
--- a/kernel/trace/trace_power.c
+++ b/kernel/trace/trace_power.c
@@ -144,14 +144,12 @@ static void power_trace_reset(struct trace_array *tr)
static int power_trace_init(struct trace_array *tr)
{
- int cpu;
power_trace = tr;
trace_power_enabled = 1;
tracing_power_register();
- for_each_cpu(cpu, cpu_possible_mask)
- tracing_reset(tr, cpu);
+ tracing_reset_online_cpus(tr);
return 0;
}
--
1.6.3.3
--
* [PATCH 12/18] tracing: use timestamp to determine start of latency traces
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (10 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 11/18] tracing: remove users of tracing_reset Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 13/18] tracing: make tracing_reset safe for external use Steven Rostedt
` (7 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0012-tracing-use-timestamp-to-determine-start-of-latency-.patch --]
[-- Type: text/plain, Size: 7755 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.
The bug does not crash the system, but it does prevent further tracing
after the bug is hit.
Instead of resetting the trace buffers, the timestamp of the start of
the trace is used. The buffers will still contain the previous
data, but the output will not include any data from before the
timestamp of the trace.
Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.
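The skipping that tracing_iter_reset() does can be modeled with a plain
array of timestamps (a sketch only: the real code peeks events off a
ring_buffer_iter one at a time; count_skipped() is a hypothetical helper
name, not a kernel function):

```c
/* Userspace model of the timestamp-based trace start: rather than
 * resetting the buffer, count the leading entries whose timestamp is
 * older than the trace's time_start and skip over them. */
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64;

/* Returns how many leading entries fall before time_start; the caller
 * records this as skipped_entries and starts reading at that offset. */
static size_t count_skipped(const u64 *ts, size_t n, u64 time_start)
{
	size_t skipped = 0;

	while (skipped < n && ts[skipped] < time_start)
		skipped++;
	return skipped;
}
```

The skipped count matters for the header math in print_trace_header():
entries from before time_start must be subtracted so the reported entry
count matches what is actually printed.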
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 80 +++++++++++++++++++++++++++++--------
kernel/trace/trace.h | 1 +
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 7 +--
4 files changed, 67 insertions(+), 24 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 54517a8..7daf372 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -454,10 +454,6 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
tr->buffer = max_tr.buffer;
max_tr.buffer = buf;
- ftrace_disable_cpu();
- ring_buffer_reset(tr->buffer);
- ftrace_enable_cpu();
-
__update_max_tr(tr, tsk, cpu);
__raw_spin_unlock(&ftrace_max_lock);
}
@@ -483,7 +479,6 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
ftrace_disable_cpu();
- ring_buffer_reset(max_tr.buffer);
ret = ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);
ftrace_enable_cpu();
@@ -1374,6 +1369,37 @@ static void *s_next(struct seq_file *m, void *v, loff_t *pos)
return ent;
}
+static void tracing_iter_reset(struct trace_iterator *iter, int cpu)
+{
+ struct trace_array *tr = iter->tr;
+ struct ring_buffer_event *event;
+ struct ring_buffer_iter *buf_iter;
+ unsigned long entries = 0;
+ u64 ts;
+
+ tr->data[cpu]->skipped_entries = 0;
+
+ if (!iter->buffer_iter[cpu])
+ return;
+
+ buf_iter = iter->buffer_iter[cpu];
+ ring_buffer_iter_reset(buf_iter);
+
+ /*
+ * We could have the case with the max latency tracers
+ * that a reset never took place on a cpu. This is evident
+ * by the timestamp being before the start of the buffer.
+ */
+ while ((event = ring_buffer_iter_peek(buf_iter, &ts))) {
+ if (ts >= iter->tr->time_start)
+ break;
+ entries++;
+ ring_buffer_read(buf_iter, NULL);
+ }
+
+ tr->data[cpu]->skipped_entries = entries;
+}
+
/*
* No necessary locking here. The worst thing which can
* happen is loosing events consumed at the same time
@@ -1412,10 +1438,9 @@ static void *s_start(struct seq_file *m, loff_t *pos)
if (cpu_file == TRACE_PIPE_ALL_CPU) {
for_each_tracing_cpu(cpu)
- ring_buffer_iter_reset(iter->buffer_iter[cpu]);
+ tracing_iter_reset(iter, cpu);
} else
- ring_buffer_iter_reset(iter->buffer_iter[cpu_file]);
-
+ tracing_iter_reset(iter, cpu_file);
ftrace_enable_cpu();
@@ -1464,16 +1489,32 @@ print_trace_header(struct seq_file *m, struct trace_iterator *iter)
struct trace_array *tr = iter->tr;
struct trace_array_cpu *data = tr->data[tr->cpu];
struct tracer *type = current_trace;
- unsigned long total;
- unsigned long entries;
+ unsigned long entries = 0;
+ unsigned long total = 0;
+ unsigned long count;
const char *name = "preemption";
+ int cpu;
if (type)
name = type->name;
- entries = ring_buffer_entries(iter->tr->buffer);
- total = entries +
- ring_buffer_overruns(iter->tr->buffer);
+
+ for_each_tracing_cpu(cpu) {
+ count = ring_buffer_entries_cpu(tr->buffer, cpu);
+ /*
+ * If this buffer has skipped entries, then we hold all
+ * entries for the trace and we need to ignore the
+ * ones before the time stamp.
+ */
+ if (tr->data[cpu]->skipped_entries) {
+ count -= tr->data[cpu]->skipped_entries;
+ /* total is the same as the entries */
+ total += count;
+ } else
+ total += count +
+ ring_buffer_overrun_cpu(tr->buffer, cpu);
+ entries += count;
+ }
seq_printf(m, "# %s latency trace v1.1.5 on %s\n",
name, UTS_RELEASE);
@@ -1534,6 +1575,9 @@ static void test_cpu_buff_start(struct trace_iterator *iter)
if (cpumask_test_cpu(iter->cpu, iter->started))
return;
+ if (iter->tr->data[iter->cpu]->skipped_entries)
+ return;
+
cpumask_set_cpu(iter->cpu, iter->started);
/* Don't print started cpu buffer for the first entry of the trace */
@@ -1796,19 +1840,23 @@ __tracing_open(struct inode *inode, struct file *file)
if (ring_buffer_overruns(iter->tr->buffer))
iter->iter_flags |= TRACE_FILE_ANNOTATE;
+ /* stop the trace while dumping */
+ tracing_stop();
+
if (iter->cpu_file == TRACE_PIPE_ALL_CPU) {
for_each_tracing_cpu(cpu) {
iter->buffer_iter[cpu] =
ring_buffer_read_start(iter->tr->buffer, cpu);
+ tracing_iter_reset(iter, cpu);
}
} else {
cpu = iter->cpu_file;
iter->buffer_iter[cpu] =
ring_buffer_read_start(iter->tr->buffer, cpu);
+ tracing_iter_reset(iter, cpu);
}
- /* TODO stop tracer */
ret = seq_open(file, &tracer_seq_ops);
if (ret < 0) {
fail_ret = ERR_PTR(ret);
@@ -1818,9 +1866,6 @@ __tracing_open(struct inode *inode, struct file *file)
m = file->private_data;
m->private = iter;
- /* stop the trace while dumping */
- tracing_stop();
-
mutex_unlock(&trace_types_lock);
return iter;
@@ -1831,6 +1876,7 @@ __tracing_open(struct inode *inode, struct file *file)
ring_buffer_read_finish(iter->buffer_iter[cpu]);
}
free_cpumask_var(iter->started);
+ tracing_start();
fail:
mutex_unlock(&trace_types_lock);
kfree(iter->trace);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f2af713..ca070de 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -241,6 +241,7 @@ struct trace_array_cpu {
unsigned long nice;
unsigned long policy;
unsigned long rt_priority;
+ unsigned long skipped_entries;
cycle_t preempt_timestamp;
pid_t pid;
uid_t uid;
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index b923d13..5555b75 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -178,7 +178,6 @@ out_unlock:
out:
data->critical_sequence = max_sequence;
data->preempt_timestamp = ftrace_now(cpu);
- tracing_reset(tr, cpu);
trace_function(tr, CALLER_ADDR0, parent_ip, flags, pc);
}
@@ -208,7 +207,6 @@ start_critical_timing(unsigned long ip, unsigned long parent_ip)
data->critical_sequence = max_sequence;
data->preempt_timestamp = ftrace_now(cpu);
data->critical_start = parent_ip ? : ip;
- tracing_reset(tr, cpu);
local_save_flags(flags);
@@ -379,6 +377,7 @@ static void __irqsoff_tracer_init(struct trace_array *tr)
irqsoff_trace = tr;
/* make sure that the tracer is visible */
smp_wmb();
+ tracing_reset_online_cpus(tr);
start_irqsoff_tracer(tr);
}
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index eacb272..ad69f10 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -186,11 +186,6 @@ out:
static void __wakeup_reset(struct trace_array *tr)
{
- int cpu;
-
- for_each_possible_cpu(cpu)
- tracing_reset(tr, cpu);
-
wakeup_cpu = -1;
wakeup_prio = -1;
@@ -204,6 +199,8 @@ static void wakeup_reset(struct trace_array *tr)
{
unsigned long flags;
+ tracing_reset_online_cpus(tr);
+
local_irq_save(flags);
__raw_spin_lock(&wakeup_lock);
__wakeup_reset(tr);
--
1.6.3.3
--
* [PATCH 13/18] tracing: make tracing_reset safe for external use
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (11 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 12/18] tracing: use timestamp to determine start of latency traces Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 14/18] tracing: pass around ring buffer instead of tracer Steven Rostedt
` (6 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0013-tracing-make-tracing_reset-safe-for-external-use.patch --]
[-- Type: text/plain, Size: 1671 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Resetting the trace buffer without first disabling it and waiting
for any writers to complete can corrupt the ring buffer.
This patch makes the external version of tracing_reset safe from
corruption by disabling the ring buffer and calling synchronize_sched.
This version can no longer be called from interrupt context, but all
such callers have been removed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7daf372..0418e26 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -641,13 +641,26 @@ void unregister_tracer(struct tracer *type)
mutex_unlock(&trace_types_lock);
}
-void tracing_reset(struct trace_array *tr, int cpu)
+static void __tracing_reset(struct trace_array *tr, int cpu)
{
ftrace_disable_cpu();
ring_buffer_reset_cpu(tr->buffer, cpu);
ftrace_enable_cpu();
}
+void tracing_reset(struct trace_array *tr, int cpu)
+{
+ struct ring_buffer *buffer = tr->buffer;
+
+ ring_buffer_record_disable(buffer);
+
+ /* Make sure all commits have finished */
+ synchronize_sched();
+ __tracing_reset(tr, cpu);
+
+ ring_buffer_record_enable(buffer);
+}
+
void tracing_reset_online_cpus(struct trace_array *tr)
{
struct ring_buffer *buffer = tr->buffer;
@@ -661,7 +674,7 @@ void tracing_reset_online_cpus(struct trace_array *tr)
tr->time_start = ftrace_now(tr->cpu);
for_each_online_cpu(cpu)
- tracing_reset(tr, cpu);
+ __tracing_reset(tr, cpu);
ring_buffer_record_enable(buffer);
}
--
1.6.3.3
--
* [PATCH 14/18] tracing: pass around ring buffer instead of tracer
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (12 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 13/18] tracing: make tracing_reset safe for external use Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 15/18] tracing: add trace_array_printk for internal tracers to use Steven Rostedt
` (5 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0014-tracing-pass-around-ring-buffer-instead-of-tracer.patch --]
[-- Type: text/plain, Size: 32098 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.
This patch changes the trace recording API to receive the buffer
that was used to reserve a commit. That buffer can then be passed
in to the commit.
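The race this closes can be shown with a small userspace model (an
illustration, not the kernel code: reserve() and commit() stand in for
trace_current_buffer_lock_reserve() and the unlock-commit helpers, and
the buffer is reduced to a commit counter):

```c
/* Userspace model of the API change: reserve hands back the buffer it
 * used via *current_rb, and commit takes that same buffer, so a swap
 * between reserve and commit can no longer send the commit to the
 * wrong buffer. */
#include <assert.h>

struct ring_buffer { int commits; };

static struct ring_buffer live_buf, max_buf;
static struct ring_buffer *global_buffer = &live_buf;

/* New-style reserve: records which buffer was actually used. */
static int reserve(struct ring_buffer **current_rb)
{
	*current_rb = global_buffer;
	return 0;
}

static void commit(struct ring_buffer *buffer)
{
	buffer->commits++;
}

/* The latency tracer swapping the live and max buffers. */
static void swap_buffers(void)
{
	global_buffer = (global_buffer == &live_buf) ? &max_buf : &live_buf;
}
```

Under the old API, commit() always used global_buffer, so a swap in the
window between reserve and commit sent the commit to the max buffer. By
threading the reserved buffer through, the commit always lands where the
data was reserved.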
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/ftrace_event.h | 15 +++--
include/trace/ftrace.h | 15 +++--
kernel/trace/blktrace.c | 12 ++-
kernel/trace/trace.c | 117 +++++++++++++++++++--------------
kernel/trace/trace.h | 17 +++--
kernel/trace/trace_boot.c | 12 ++-
kernel/trace/trace_events.c | 6 +-
kernel/trace/trace_functions_graph.c | 14 ++--
kernel/trace/trace_mmiotrace.c | 10 ++-
kernel/trace/trace_power.c | 18 ++++--
kernel/trace/trace_sched_switch.c | 18 +++--
kernel/trace/trace_syscalls.c | 18 +++--
12 files changed, 163 insertions(+), 109 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 7554804..23f7179 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -93,13 +93,17 @@ void tracing_generic_entry_update(struct trace_entry *entry,
unsigned long flags,
int pc);
struct ring_buffer_event *
-trace_current_buffer_lock_reserve(int type, unsigned long len,
+trace_current_buffer_lock_reserve(struct ring_buffer **current_buffer,
+ int type, unsigned long len,
unsigned long flags, int pc);
-void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
+void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
unsigned long flags, int pc);
-void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
+void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
unsigned long flags, int pc);
-void trace_current_buffer_discard_commit(struct ring_buffer_event *event);
+void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event);
void tracing_record_cmdline(struct task_struct *tsk);
@@ -135,7 +139,8 @@ struct ftrace_event_call {
extern void destroy_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct ftrace_event_call *call, void *rec);
-extern int filter_current_check_discard(struct ftrace_event_call *call,
+extern int filter_current_check_discard(struct ring_buffer *buffer,
+ struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index bfbc842..308bafd 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -460,13 +460,15 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* {
* struct ring_buffer_event *event;
* struct ftrace_raw_<call> *entry; <-- defined in stage 1
+ * struct ring_buffer *buffer;
* unsigned long irq_flags;
* int pc;
*
* local_save_flags(irq_flags);
* pc = preempt_count();
*
- * event = trace_current_buffer_lock_reserve(event_<call>.id,
+ * event = trace_current_buffer_lock_reserve(&buffer,
+ * event_<call>.id,
* sizeof(struct ftrace_raw_<call>),
* irq_flags, pc);
* if (!event)
@@ -476,7 +478,7 @@ static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
* <assign>; <-- Here we assign the entries by the __field and
* __array macros.
*
- * trace_current_buffer_unlock_commit(event, irq_flags, pc);
+ * trace_current_buffer_unlock_commit(buffer, event, irq_flags, pc);
* }
*
* static int ftrace_raw_reg_event_<call>(void)
@@ -568,6 +570,7 @@ static void ftrace_raw_event_##call(proto) \
struct ftrace_event_call *event_call = &event_##call; \
struct ring_buffer_event *event; \
struct ftrace_raw_##call *entry; \
+ struct ring_buffer *buffer; \
unsigned long irq_flags; \
int __data_size; \
int pc; \
@@ -577,7 +580,8 @@ static void ftrace_raw_event_##call(proto) \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
\
- event = trace_current_buffer_lock_reserve(event_##call.id, \
+ event = trace_current_buffer_lock_reserve(&buffer, \
+ event_##call.id, \
sizeof(*entry) + __data_size, \
irq_flags, pc); \
if (!event) \
@@ -589,8 +593,9 @@ static void ftrace_raw_event_##call(proto) \
\
{ assign; } \
\
- if (!filter_current_check_discard(event_call, entry, event)) \
- trace_nowake_buffer_unlock_commit(event, irq_flags, pc); \
+ if (!filter_current_check_discard(buffer, event_call, entry, event)) \
+ trace_nowake_buffer_unlock_commit(buffer, \
+ event, irq_flags, pc); \
} \
\
static int ftrace_raw_reg_event_##call(void *ptr) \
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 1090b0a..243bafc 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -65,13 +65,15 @@ static void trace_note(struct blk_trace *bt, pid_t pid, int action,
{
struct blk_io_trace *t;
struct ring_buffer_event *event = NULL;
+ struct ring_buffer *buffer = NULL;
int pc = 0;
int cpu = smp_processor_id();
bool blk_tracer = blk_tracer_enabled;
if (blk_tracer) {
+ buffer = blk_tr->buffer;
pc = preempt_count();
- event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
+ event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + len,
0, pc);
if (!event)
@@ -96,7 +98,7 @@ record_it:
memcpy((void *) t + sizeof(*t), data, len);
if (blk_tracer)
- trace_buffer_unlock_commit(blk_tr, event, 0, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, pc);
}
}
@@ -179,6 +181,7 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
{
struct task_struct *tsk = current;
struct ring_buffer_event *event = NULL;
+ struct ring_buffer *buffer = NULL;
struct blk_io_trace *t;
unsigned long flags = 0;
unsigned long *sequence;
@@ -204,8 +207,9 @@ static void __blk_add_trace(struct blk_trace *bt, sector_t sector, int bytes,
if (blk_tracer) {
tracing_record_cmdline(current);
+ buffer = blk_tr->buffer;
pc = preempt_count();
- event = trace_buffer_lock_reserve(blk_tr, TRACE_BLK,
+ event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
sizeof(*t) + pdu_len,
0, pc);
if (!event)
@@ -252,7 +256,7 @@ record_it:
memcpy((void *) t + sizeof(*t), pdu_data, pdu_len);
if (blk_tracer) {
- trace_buffer_unlock_commit(blk_tr, event, 0, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, pc);
return;
}
}
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0418e26..0c61836 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -169,10 +169,11 @@ static struct trace_array global_trace;
static DEFINE_PER_CPU(struct trace_array_cpu, global_trace_cpu);
-int filter_current_check_discard(struct ftrace_event_call *call, void *rec,
+int filter_current_check_discard(struct ring_buffer *buffer,
+ struct ftrace_event_call *call, void *rec,
struct ring_buffer_event *event)
{
- return filter_check_discard(call, rec, global_trace.buffer, event);
+ return filter_check_discard(call, rec, buffer, event);
}
EXPORT_SYMBOL_GPL(filter_current_check_discard);
@@ -887,14 +888,15 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,
}
EXPORT_SYMBOL_GPL(tracing_generic_entry_update);
-struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
- int type,
- unsigned long len,
- unsigned long flags, int pc)
+struct ring_buffer_event *
+trace_buffer_lock_reserve(struct ring_buffer *buffer,
+ int type,
+ unsigned long len,
+ unsigned long flags, int pc)
{
struct ring_buffer_event *event;
- event = ring_buffer_lock_reserve(tr->buffer, len);
+ event = ring_buffer_lock_reserve(buffer, len);
if (event != NULL) {
struct trace_entry *ent = ring_buffer_event_data(event);
@@ -905,53 +907,59 @@ struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
return event;
}
-static inline void __trace_buffer_unlock_commit(struct trace_array *tr,
- struct ring_buffer_event *event,
- unsigned long flags, int pc,
- int wake)
+static inline void
+__trace_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ unsigned long flags, int pc,
+ int wake)
{
- ring_buffer_unlock_commit(tr->buffer, event);
+ ring_buffer_unlock_commit(buffer, event);
- ftrace_trace_stack(tr, flags, 6, pc);
- ftrace_trace_userstack(tr, flags, pc);
+ ftrace_trace_stack(buffer, flags, 6, pc);
+ ftrace_trace_userstack(buffer, flags, pc);
if (wake)
trace_wake_up();
}
-void trace_buffer_unlock_commit(struct trace_array *tr,
- struct ring_buffer_event *event,
- unsigned long flags, int pc)
+void trace_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ unsigned long flags, int pc)
{
- __trace_buffer_unlock_commit(tr, event, flags, pc, 1);
+ __trace_buffer_unlock_commit(buffer, event, flags, pc, 1);
}
struct ring_buffer_event *
-trace_current_buffer_lock_reserve(int type, unsigned long len,
+trace_current_buffer_lock_reserve(struct ring_buffer **current_rb,
+ int type, unsigned long len,
unsigned long flags, int pc)
{
- return trace_buffer_lock_reserve(&global_trace,
+ *current_rb = global_trace.buffer;
+ return trace_buffer_lock_reserve(*current_rb,
type, len, flags, pc);
}
EXPORT_SYMBOL_GPL(trace_current_buffer_lock_reserve);
-void trace_current_buffer_unlock_commit(struct ring_buffer_event *event,
+void trace_current_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
unsigned long flags, int pc)
{
- __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 1);
+ __trace_buffer_unlock_commit(buffer, event, flags, pc, 1);
}
EXPORT_SYMBOL_GPL(trace_current_buffer_unlock_commit);
-void trace_nowake_buffer_unlock_commit(struct ring_buffer_event *event,
- unsigned long flags, int pc)
+void trace_nowake_buffer_unlock_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event,
+ unsigned long flags, int pc)
{
- __trace_buffer_unlock_commit(&global_trace, event, flags, pc, 0);
+ __trace_buffer_unlock_commit(buffer, event, flags, pc, 0);
}
EXPORT_SYMBOL_GPL(trace_nowake_buffer_unlock_commit);
-void trace_current_buffer_discard_commit(struct ring_buffer_event *event)
+void trace_current_buffer_discard_commit(struct ring_buffer *buffer,
+ struct ring_buffer_event *event)
{
- ring_buffer_discard_commit(global_trace.buffer, event);
+ ring_buffer_discard_commit(buffer, event);
}
EXPORT_SYMBOL_GPL(trace_current_buffer_discard_commit);
@@ -961,6 +969,7 @@ trace_function(struct trace_array *tr,
int pc)
{
struct ftrace_event_call *call = &event_function;
+ struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct ftrace_entry *entry;
@@ -968,7 +977,7 @@ trace_function(struct trace_array *tr,
if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
return;
- event = trace_buffer_lock_reserve(tr, TRACE_FN, sizeof(*entry),
+ event = trace_buffer_lock_reserve(buffer, TRACE_FN, sizeof(*entry),
flags, pc);
if (!event)
return;
@@ -976,8 +985,8 @@ trace_function(struct trace_array *tr,
entry->ip = ip;
entry->parent_ip = parent_ip;
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
}
void
@@ -990,7 +999,7 @@ ftrace(struct trace_array *tr, struct trace_array_cpu *data,
}
#ifdef CONFIG_STACKTRACE
-static void __ftrace_trace_stack(struct trace_array *tr,
+static void __ftrace_trace_stack(struct ring_buffer *buffer,
unsigned long flags,
int skip, int pc)
{
@@ -999,7 +1008,7 @@ static void __ftrace_trace_stack(struct trace_array *tr,
struct stack_entry *entry;
struct stack_trace trace;
- event = trace_buffer_lock_reserve(tr, TRACE_STACK,
+ event = trace_buffer_lock_reserve(buffer, TRACE_STACK,
sizeof(*entry), flags, pc);
if (!event)
return;
@@ -1012,26 +1021,27 @@ static void __ftrace_trace_stack(struct trace_array *tr,
trace.entries = entry->caller;
save_stack_trace(&trace);
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
}
-void ftrace_trace_stack(struct trace_array *tr, unsigned long flags, int skip,
- int pc)
+void ftrace_trace_stack(struct ring_buffer *buffer, unsigned long flags,
+ int skip, int pc)
{
if (!(trace_flags & TRACE_ITER_STACKTRACE))
return;
- __ftrace_trace_stack(tr, flags, skip, pc);
+ __ftrace_trace_stack(buffer, flags, skip, pc);
}
void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
int pc)
{
- __ftrace_trace_stack(tr, flags, skip, pc);
+ __ftrace_trace_stack(tr->buffer, flags, skip, pc);
}
-void ftrace_trace_userstack(struct trace_array *tr, unsigned long flags, int pc)
+void
+ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
{
struct ftrace_event_call *call = &event_user_stack;
struct ring_buffer_event *event;
@@ -1041,7 +1051,7 @@ void ftrace_trace_userstack(struct trace_array *tr, unsigned long flags, int pc)
if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
return;
- event = trace_buffer_lock_reserve(tr, TRACE_USER_STACK,
+ event = trace_buffer_lock_reserve(buffer, TRACE_USER_STACK,
sizeof(*entry), flags, pc);
if (!event)
return;
@@ -1055,8 +1065,8 @@ void ftrace_trace_userstack(struct trace_array *tr, unsigned long flags, int pc)
trace.entries = entry->caller;
save_stack_trace_user(&trace);
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
}
#ifdef UNUSED
@@ -1075,9 +1085,10 @@ ftrace_trace_special(void *__tr,
{
struct ring_buffer_event *event;
struct trace_array *tr = __tr;
+ struct ring_buffer *buffer = tr->buffer;
struct special_entry *entry;
- event = trace_buffer_lock_reserve(tr, TRACE_SPECIAL,
+ event = trace_buffer_lock_reserve(buffer, TRACE_SPECIAL,
sizeof(*entry), 0, pc);
if (!event)
return;
@@ -1085,7 +1096,7 @@ ftrace_trace_special(void *__tr,
entry->arg1 = arg1;
entry->arg2 = arg2;
entry->arg3 = arg3;
- trace_buffer_unlock_commit(tr, event, 0, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, pc);
}
void
@@ -1131,6 +1142,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
struct ftrace_event_call *call = &event_bprint;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
struct bprint_entry *entry;
@@ -1163,7 +1175,9 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
goto out_unlock;
size = sizeof(*entry) + sizeof(u32) * len;
- event = trace_buffer_lock_reserve(tr, TRACE_BPRINT, size, flags, pc);
+ buffer = tr->buffer;
+ event = trace_buffer_lock_reserve(buffer, TRACE_BPRINT, size,
+ flags, pc);
if (!event)
goto out_unlock;
entry = ring_buffer_event_data(event);
@@ -1171,8 +1185,8 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
entry->fmt = fmt;
memcpy(entry->buf, trace_buf, sizeof(u32) * len);
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
out_unlock:
__raw_spin_unlock(&trace_buf_lock);
@@ -1194,6 +1208,7 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
struct ftrace_event_call *call = &event_print;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
int cpu, len = 0, size, pc;
@@ -1222,7 +1237,9 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
trace_buf[len] = 0;
size = sizeof(*entry) + len + 1;
- event = trace_buffer_lock_reserve(tr, TRACE_PRINT, size, irq_flags, pc);
+ buffer = tr->buffer;
+ event = trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
+ irq_flags, pc);
if (!event)
goto out_unlock;
entry = ring_buffer_event_data(event);
@@ -1230,8 +1247,8 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
memcpy(&entry->buf, trace_buf, len);
entry->buf[len] = 0;
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
out_unlock:
__raw_spin_unlock(&trace_buf_lock);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index ca070de..4d30414 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -415,12 +415,13 @@ void init_tracer_sysprof_debugfs(struct dentry *d_tracer);
struct ring_buffer_event;
-struct ring_buffer_event *trace_buffer_lock_reserve(struct trace_array *tr,
- int type,
- unsigned long len,
- unsigned long flags,
- int pc);
-void trace_buffer_unlock_commit(struct trace_array *tr,
+struct ring_buffer_event *
+trace_buffer_lock_reserve(struct ring_buffer *buffer,
+ int type,
+ unsigned long len,
+ unsigned long flags,
+ int pc);
+void trace_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event,
unsigned long flags, int pc);
@@ -481,10 +482,10 @@ void update_max_tr_single(struct trace_array *tr,
#endif /* CONFIG_TRACER_MAX_TRACE */
#ifdef CONFIG_STACKTRACE
-void ftrace_trace_stack(struct trace_array *tr, unsigned long flags,
+void ftrace_trace_stack(struct ring_buffer *buffer, unsigned long flags,
int skip, int pc);
-void ftrace_trace_userstack(struct trace_array *tr, unsigned long flags,
+void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
int pc);
void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
diff --git a/kernel/trace/trace_boot.c b/kernel/trace/trace_boot.c
index 8631393..19bfc75 100644
--- a/kernel/trace/trace_boot.c
+++ b/kernel/trace/trace_boot.c
@@ -130,6 +130,7 @@ struct tracer boot_tracer __read_mostly =
void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
{
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_boot_call *entry;
struct trace_array *tr = boot_trace;
@@ -142,13 +143,14 @@ void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
- event = trace_buffer_lock_reserve(tr, TRACE_BOOT_CALL,
+ buffer = tr->buffer;
+ event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_CALL,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_call = *bt;
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
@@ -156,6 +158,7 @@ void trace_boot_call(struct boot_trace_call *bt, initcall_t fn)
void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
{
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_boot_ret *entry;
struct trace_array *tr = boot_trace;
@@ -165,13 +168,14 @@ void trace_boot_ret(struct boot_trace_ret *bt, initcall_t fn)
sprint_symbol(bt->func, (unsigned long)fn);
preempt_disable();
- event = trace_buffer_lock_reserve(tr, TRACE_BOOT_RET,
+ buffer = tr->buffer;
+ event = trace_buffer_lock_reserve(buffer, TRACE_BOOT_RET,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->boot_ret = *bt;
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index d33bcde..78b1ed2 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1438,6 +1438,7 @@ static void
function_test_events_call(unsigned long ip, unsigned long parent_ip)
{
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct ftrace_entry *entry;
unsigned long flags;
long disabled;
@@ -1455,7 +1456,8 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip)
local_save_flags(flags);
- event = trace_current_buffer_lock_reserve(TRACE_FN, sizeof(*entry),
+ event = trace_current_buffer_lock_reserve(&buffer,
+ TRACE_FN, sizeof(*entry),
flags, pc);
if (!event)
goto out;
@@ -1463,7 +1465,7 @@ function_test_events_call(unsigned long ip, unsigned long parent_ip)
entry->ip = ip;
entry->parent_ip = parent_ip;
- trace_nowake_buffer_unlock_commit(event, flags, pc);
+ trace_nowake_buffer_unlock_commit(buffer, event, flags, pc);
out:
atomic_dec(&per_cpu(test_event_disable, cpu));
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 3f4a251..b3749a2 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -173,19 +173,20 @@ static int __trace_graph_entry(struct trace_array *tr,
{
struct ftrace_event_call *call = &event_funcgraph_entry;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer = tr->buffer;
struct ftrace_graph_ent_entry *entry;
if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
return 0;
- event = trace_buffer_lock_reserve(tr, TRACE_GRAPH_ENT,
+ event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_ENT,
sizeof(*entry), flags, pc);
if (!event)
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
- if (!filter_current_check_discard(call, entry, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_current_check_discard(buffer, call, entry, event))
+ ring_buffer_unlock_commit(buffer, event);
return 1;
}
@@ -236,19 +237,20 @@ static void __trace_graph_return(struct trace_array *tr,
{
struct ftrace_event_call *call = &event_funcgraph_exit;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer = tr->buffer;
struct ftrace_graph_ret_entry *entry;
if (unlikely(local_read(&__get_cpu_var(ftrace_cpu_disabled))))
return;
- event = trace_buffer_lock_reserve(tr, TRACE_GRAPH_RET,
+ event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RET,
sizeof(*entry), flags, pc);
if (!event)
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
- if (!filter_current_check_discard(call, entry, event))
- ring_buffer_unlock_commit(tr->buffer, event);
+ if (!filter_current_check_discard(buffer, call, entry, event))
+ ring_buffer_unlock_commit(buffer, event);
}
void trace_graph_return(struct ftrace_graph_ret *trace)
diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
index d53b45e..c4c9bbd 100644
--- a/kernel/trace/trace_mmiotrace.c
+++ b/kernel/trace/trace_mmiotrace.c
@@ -307,11 +307,12 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
struct trace_array_cpu *data,
struct mmiotrace_rw *rw)
{
+ struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct trace_mmiotrace_rw *entry;
int pc = preempt_count();
- event = trace_buffer_lock_reserve(tr, TRACE_MMIO_RW,
+ event = trace_buffer_lock_reserve(buffer, TRACE_MMIO_RW,
sizeof(*entry), 0, pc);
if (!event) {
atomic_inc(&dropped_count);
@@ -319,7 +320,7 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
}
entry = ring_buffer_event_data(event);
entry->rw = *rw;
- trace_buffer_unlock_commit(tr, event, 0, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, pc);
}
void mmio_trace_rw(struct mmiotrace_rw *rw)
@@ -333,11 +334,12 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
struct trace_array_cpu *data,
struct mmiotrace_map *map)
{
+ struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct trace_mmiotrace_map *entry;
int pc = preempt_count();
- event = trace_buffer_lock_reserve(tr, TRACE_MMIO_MAP,
+ event = trace_buffer_lock_reserve(buffer, TRACE_MMIO_MAP,
sizeof(*entry), 0, pc);
if (!event) {
atomic_inc(&dropped_count);
@@ -345,7 +347,7 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
}
entry = ring_buffer_event_data(event);
entry->map = *map;
- trace_buffer_unlock_commit(tr, event, 0, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, pc);
}
void mmio_trace_mapping(struct mmiotrace_map *map)
diff --git a/kernel/trace/trace_power.c b/kernel/trace/trace_power.c
index a5d5a4f..fe1a00f 100644
--- a/kernel/trace/trace_power.c
+++ b/kernel/trace/trace_power.c
@@ -38,6 +38,7 @@ static void probe_power_end(struct power_trace *it)
{
struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_power *entry;
struct trace_array_cpu *data;
struct trace_array *tr = power_trace;
@@ -45,18 +46,20 @@ static void probe_power_end(struct power_trace *it)
if (!trace_power_enabled)
return;
+ buffer = tr->buffer;
+
preempt_disable();
it->end = ktime_get();
data = tr->data[smp_processor_id()];
- event = trace_buffer_lock_reserve(tr, TRACE_POWER,
+ event = trace_buffer_lock_reserve(buffer, TRACE_POWER,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
- if (!filter_check_discard(call, entry, tr->buffer, event))
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(call, entry, buffer, event))
+ trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
@@ -66,6 +69,7 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
{
struct ftrace_event_call *call = &event_power;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
struct trace_power *entry;
struct trace_array_cpu *data;
struct trace_array *tr = power_trace;
@@ -73,6 +77,8 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
if (!trace_power_enabled)
return;
+ buffer = tr->buffer;
+
memset(it, 0, sizeof(struct power_trace));
it->state = level;
it->type = type;
@@ -81,14 +87,14 @@ static void probe_power_mark(struct power_trace *it, unsigned int type,
it->end = it->stamp;
data = tr->data[smp_processor_id()];
- event = trace_buffer_lock_reserve(tr, TRACE_POWER,
+ event = trace_buffer_lock_reserve(buffer, TRACE_POWER,
sizeof(*entry), 0, 0);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
entry->state_data = *it;
- if (!filter_check_discard(call, entry, tr->buffer, event))
- trace_buffer_unlock_commit(tr, event, 0, 0);
+ if (!filter_check_discard(call, entry, buffer, event))
+ trace_buffer_unlock_commit(buffer, event, 0, 0);
out:
preempt_enable();
}
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index e1285d7..5fca0f5 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -28,10 +28,11 @@ tracing_sched_switch_trace(struct trace_array *tr,
unsigned long flags, int pc)
{
struct ftrace_event_call *call = &event_context_switch;
+ struct ring_buffer *buffer = tr->buffer;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
- event = trace_buffer_lock_reserve(tr, TRACE_CTX,
+ event = trace_buffer_lock_reserve(buffer, TRACE_CTX,
sizeof(*entry), flags, pc);
if (!event)
return;
@@ -44,8 +45,8 @@ tracing_sched_switch_trace(struct trace_array *tr,
entry->next_state = next->state;
entry->next_cpu = task_cpu(next);
- if (!filter_check_discard(call, entry, tr->buffer, event))
- trace_buffer_unlock_commit(tr, event, flags, pc);
+ if (!filter_check_discard(call, entry, buffer, event))
+ trace_buffer_unlock_commit(buffer, event, flags, pc);
}
static void
@@ -86,8 +87,9 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
struct ftrace_event_call *call = &event_wakeup;
struct ring_buffer_event *event;
struct ctx_switch_entry *entry;
+ struct ring_buffer *buffer = tr->buffer;
- event = trace_buffer_lock_reserve(tr, TRACE_WAKE,
+ event = trace_buffer_lock_reserve(buffer, TRACE_WAKE,
sizeof(*entry), flags, pc);
if (!event)
return;
@@ -100,10 +102,10 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_state = wakee->state;
entry->next_cpu = task_cpu(wakee);
- if (!filter_check_discard(call, entry, tr->buffer, event))
- ring_buffer_unlock_commit(tr->buffer, event);
- ftrace_trace_stack(tr, flags, 6, pc);
- ftrace_trace_userstack(tr, flags, pc);
+ if (!filter_check_discard(call, entry, buffer, event))
+ ring_buffer_unlock_commit(buffer, event);
+ ftrace_trace_stack(tr->buffer, flags, 6, pc);
+ ftrace_trace_userstack(tr->buffer, flags, pc);
}
static void
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4f5fae6..8712ce3 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -223,6 +223,7 @@ void ftrace_syscall_enter(struct pt_regs *regs, long id)
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
int size;
int syscall_nr;
@@ -238,8 +239,8 @@ void ftrace_syscall_enter(struct pt_regs *regs, long id)
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
- event = trace_current_buffer_lock_reserve(sys_data->enter_id, size,
- 0, 0);
+ event = trace_current_buffer_lock_reserve(&buffer, sys_data->enter_id,
+ size, 0, 0);
if (!event)
return;
@@ -247,8 +248,9 @@ void ftrace_syscall_enter(struct pt_regs *regs, long id)
entry->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
- if (!filter_current_check_discard(sys_data->enter_event, entry, event))
- trace_current_buffer_unlock_commit(event, 0, 0);
+ if (!filter_current_check_discard(buffer, sys_data->enter_event,
+ entry, event))
+ trace_current_buffer_unlock_commit(buffer, event, 0, 0);
}
void ftrace_syscall_exit(struct pt_regs *regs, long ret)
@@ -256,6 +258,7 @@ void ftrace_syscall_exit(struct pt_regs *regs, long ret)
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
+ struct ring_buffer *buffer;
int syscall_nr;
syscall_nr = syscall_get_nr(current, regs);
@@ -268,7 +271,7 @@ void ftrace_syscall_exit(struct pt_regs *regs, long ret)
if (!sys_data)
return;
- event = trace_current_buffer_lock_reserve(sys_data->exit_id,
+ event = trace_current_buffer_lock_reserve(&buffer, sys_data->exit_id,
sizeof(*entry), 0, 0);
if (!event)
return;
@@ -277,8 +280,9 @@ void ftrace_syscall_exit(struct pt_regs *regs, long ret)
entry->nr = syscall_nr;
entry->ret = syscall_get_return_value(current, regs);
- if (!filter_current_check_discard(sys_data->exit_event, entry, event))
- trace_current_buffer_unlock_commit(event, 0, 0);
+ if (!filter_current_check_discard(buffer, sys_data->exit_event,
+ entry, event))
+ trace_current_buffer_unlock_commit(buffer, event, 0, 0);
}
int reg_event_syscall_enter(void *ptr)
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 15/18] tracing: add trace_array_printk for internal tracers to use
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (13 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 14/18] tracing: pass around ring buffer instead of tracer Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 16/18] tracing: report error in trace if we fail to swap latency buffer Steven Rostedt
` (4 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0015-tracing-add-trace_array_printk-for-internal-tracers-.patch --]
[-- Type: text/plain, Size: 2347 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
This patch adds trace_array_printk() to allow a tracer to use
trace_printk() on its own trace array.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 24 ++++++++++++++++++++++--
kernel/trace/trace.h | 5 +++++
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 0c61836..ef08328 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1201,7 +1201,23 @@ out:
}
EXPORT_SYMBOL_GPL(trace_vbprintk);
-int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
+int trace_array_printk(struct trace_array *tr,
+ unsigned long ip, const char *fmt, ...)
+{
+ int ret;
+ va_list ap;
+
+ if (!(trace_flags & TRACE_ITER_PRINTK))
+ return 0;
+
+ va_start(ap, fmt);
+ ret = trace_array_vprintk(tr, ip, fmt, ap);
+ va_end(ap);
+ return ret;
+}
+
+int trace_array_vprintk(struct trace_array *tr,
+ unsigned long ip, const char *fmt, va_list args)
{
static raw_spinlock_t trace_buf_lock = __RAW_SPIN_LOCK_UNLOCKED;
static char trace_buf[TRACE_BUF_SIZE];
@@ -1209,7 +1225,6 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
struct ftrace_event_call *call = &event_print;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
- struct trace_array *tr = &global_trace;
struct trace_array_cpu *data;
int cpu, len = 0, size, pc;
struct print_entry *entry;
@@ -1260,6 +1275,11 @@ int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
return len;
}
+
+int trace_vprintk(unsigned long ip, const char *fmt, va_list args)
+{
+ return trace_array_vprintk(&global_trace, ip, fmt, args);
+}
EXPORT_SYMBOL_GPL(trace_vprintk);
enum trace_file_type {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 4d30414..fa1dccb 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -566,6 +566,11 @@ extern int
trace_vbprintk(unsigned long ip, const char *fmt, va_list args);
extern int
trace_vprintk(unsigned long ip, const char *fmt, va_list args);
+extern int
+trace_array_vprintk(struct trace_array *tr,
+ unsigned long ip, const char *fmt, va_list args);
+int trace_array_printk(struct trace_array *tr,
+ unsigned long ip, const char *fmt, ...);
extern unsigned long trace_flags;
--
1.6.3.3
* [PATCH 16/18] tracing: report error in trace if we fail to swap latency buffer
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (14 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 15/18] tracing: add trace_array_printk for internal tracers to use Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 17/18] ring-buffer: check for swapped buffers in start of committing Steven Rostedt
` (3 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0016-tracing-report-error-in-trace-if-we-fail-to-swap-lat.patch --]
[-- Type: text/plain, Size: 2273 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The irqsoff tracer will fail to swap the CPU buffer with the max
buffer if it preempts a commit. Instead of silently ignoring the failure,
this patch makes the tracer report in the trace itself that the last max
latency was not recorded because it preempted a commit in progress.
The output of the latency tracer will look like this:
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.31-rc5
# --------------------------------------------------------------------
# latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: save_args
# => ended at: __do_softirq
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress
Note that the latency time and the functions that disabled irqs or preemption
are still listed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/trace.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ef08328..6df9861 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -482,9 +482,20 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
ret = ring_buffer_swap_cpu(max_tr.buffer, tr->buffer, cpu);
+ if (ret == -EBUSY) {
+ /*
+ * We failed to swap the buffer due to a commit taking
+ * place on this CPU. We fail to record, but we reset
+ * the max trace buffer (no one writes directly to it)
+ * and flag that it failed.
+ */
+ trace_array_printk(&max_tr, _THIS_IP_,
+ "Failed to swap buffers due to commit in progress\n");
+ }
+
ftrace_enable_cpu();
- WARN_ON_ONCE(ret && ret != -EAGAIN);
+ WARN_ON_ONCE(ret && ret != -EAGAIN && ret != -EBUSY);
__update_max_tr(tr, tsk, cpu);
__raw_spin_unlock(&ftrace_max_lock);
--
1.6.3.3
* [PATCH 17/18] ring-buffer: check for swapped buffers in start of committing
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (15 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 16/18] tracing: report error in trace if we fail to swap latency buffer Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-04 23:55 ` [PATCH 18/18] ring-buffer: only enable ring_buffer_swap_cpu when needed Steven Rostedt
` (2 subsequent siblings)
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0017-ring-buffer-check-for-swapped-buffers-in-start-of-co.patch --]
[-- Type: text/plain, Size: 2169 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Because the irqsoff tracer can swap an internal CPU buffer, a swap can
happen between the start of a write and the setting of the committing bit
(the committing bit disables swapping).
This patch adds a check for this and will fail the write if it detects it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 20 +++++++++++++++++---
1 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f83a42a..1766c0e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2073,7 +2073,8 @@ static void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer)
}
static struct ring_buffer_event *
-rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
+rb_reserve_next_event(struct ring_buffer *buffer,
+ struct ring_buffer_per_cpu *cpu_buffer,
unsigned long length)
{
struct ring_buffer_event *event;
@@ -2083,6 +2084,19 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
rb_start_commit(cpu_buffer);
+ /*
+ * Due to the ability to swap a cpu buffer from a buffer
+ * it is possible it was swapped before we committed.
+ * (committing stops a swap). We check for it here and
+ * if it happened, we have to fail the write.
+ */
+ barrier();
+ if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
+ local_dec(&cpu_buffer->committing);
+ local_dec(&cpu_buffer->commits);
+ return NULL;
+ }
+
length = rb_calculate_event_length(length);
again:
/*
@@ -2243,7 +2257,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
if (length > BUF_MAX_DATA_SIZE)
goto out;
- event = rb_reserve_next_event(cpu_buffer, length);
+ event = rb_reserve_next_event(buffer, cpu_buffer, length);
if (!event)
goto out;
@@ -2476,7 +2490,7 @@ int ring_buffer_write(struct ring_buffer *buffer,
if (length > BUF_MAX_DATA_SIZE)
goto out;
- event = rb_reserve_next_event(cpu_buffer, length);
+ event = rb_reserve_next_event(buffer, cpu_buffer, length);
if (!event)
goto out;
--
1.6.3.3
* [PATCH 18/18] ring-buffer: only enable ring_buffer_swap_cpu when needed
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (16 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 17/18] ring-buffer: check for swapped buffers in start of committing Steven Rostedt
@ 2009-09-04 23:55 ` Steven Rostedt
2009-09-05 15:06 ` [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Arnaldo Carvalho de Melo
2009-09-06 4:19 ` Ingo Molnar
19 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-04 23:55 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
[-- Attachment #1: 0018-ring-buffer-only-enable-ring_buffer_swap_cpu-when-ne.patch --]
[-- Type: text/plain, Size: 3646 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.
Only the irqsoff and preemptoff tracers use this feature, and neither is
recommended for production kernels. This patch compiles the swap code out
when neither irqsoff nor preemptoff is configured.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/ring_buffer.h | 9 +++++++++
kernel/trace/Kconfig | 8 ++++++++
kernel/trace/ring_buffer.c | 4 ++++
3 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index e061b4e..5fcc31e 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -140,8 +140,17 @@ unsigned long ring_buffer_size(struct ring_buffer *buffer);
void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_reset(struct ring_buffer *buffer);
+#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
int ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
struct ring_buffer *buffer_b, int cpu);
+#else
+static inline int
+ring_buffer_swap_cpu(struct ring_buffer *buffer_a,
+ struct ring_buffer *buffer_b, int cpu)
+{
+ return -ENODEV;
+}
+#endif
int ring_buffer_empty(struct ring_buffer *buffer);
int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 163fbfc..1ea0d12 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -62,6 +62,12 @@ config EVENT_TRACING
config CONTEXT_SWITCH_TRACER
bool
+config RING_BUFFER_ALLOW_SWAP
+ bool
+ help
+ Allow the use of ring_buffer_swap_cpu.
+ Adds a very slight overhead to tracing when enabled.
+
# All tracer options should select GENERIC_TRACER. For those options that are
# enabled by all tracers (context switch and event tracer) they select TRACING.
# This allows those options to appear when no other tracer is selected. But the
@@ -146,6 +152,7 @@ config IRQSOFF_TRACER
select TRACE_IRQFLAGS
select GENERIC_TRACER
select TRACER_MAX_TRACE
+ select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in irqs-off critical
sections, with microsecond accuracy.
@@ -167,6 +174,7 @@ config PREEMPT_TRACER
depends on PREEMPT
select GENERIC_TRACER
select TRACER_MAX_TRACE
+ select RING_BUFFER_ALLOW_SWAP
help
This option measures the time spent in preemption off critical
sections, with microsecond accuracy.
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 1766c0e..454e74e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2084,6 +2084,7 @@ rb_reserve_next_event(struct ring_buffer *buffer,
rb_start_commit(cpu_buffer);
+#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
/*
* Due to the ability to swap a cpu buffer from a buffer
* it is possible it was swapped before we committed.
@@ -2096,6 +2097,7 @@ rb_reserve_next_event(struct ring_buffer *buffer,
local_dec(&cpu_buffer->commits);
return NULL;
}
+#endif
length = rb_calculate_event_length(length);
again:
@@ -3498,6 +3500,7 @@ int ring_buffer_empty_cpu(struct ring_buffer *buffer, int cpu)
}
EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
+#ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
/**
* ring_buffer_swap_cpu - swap a CPU buffer between two ring buffers
* @buffer_a: One buffer to swap with
@@ -3573,6 +3576,7 @@ out:
return ret;
}
EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
+#endif /* CONFIG_RING_BUFFER_ALLOW_SWAP */
/**
* ring_buffer_alloc_read_page - allocate a page to read from buffer
--
1.6.3.3
--
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (17 preceding siblings ...)
2009-09-04 23:55 ` [PATCH 18/18] ring-buffer: only enable ring_buffer_swap_cpu when needed Steven Rostedt
@ 2009-09-05 15:06 ` Arnaldo Carvalho de Melo
2009-09-07 10:20 ` Jan Blunck
2009-09-06 4:19 ` Ingo Molnar
19 siblings, 1 reply; 27+ messages in thread
From: Arnaldo Carvalho de Melo @ 2009-09-05 15:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Clark Williams, linux-rt-users,
Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
Em Fri, Sep 04, 2009 at 07:55:27PM -0400, Steven Rostedt escreveu:
> Thomas,
>
> I can port these over to -rt if you want, or you could just
> add them yourself. I know the spinlocks need to be converted to atomic.
> Just let me know if you want me to do it.
I did a quick port to tip/rt/head and couldn't reproduce the problems I
was experiencing, thanks!
I've put the ported patch series at:
http://userweb.kernel.org/~acme/rostedt-rb-rt
Tested using:
echo 1 > /sys/kernel/debug/tracing/events/irq/enable
echo 1 > /sys/kernel/debug/tracing/events/sched/enable
echo wakeup > /sys/kernel/debug/tracing/current_tracer
Then a make -j64 allyesconfig + cyclictest -p95 ...
[root@hs21xm-1 ~]# uname -a
Linux hs21xm-1 2.6.31-rc8-rt9 #2 SMP PREEMPT RT Sat Sep 5 09:42:03 CDT
2009 x86_64 x86_64 x86_64 GNU/Linux
[root@hs21xm-1 ~]# cat /sys/kernel/debug/tracing/trace
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.31-rc8-rt9
# --------------------------------------------------------------------
# latency: 220 us, #43/43, CPU#4 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
# -----------------
# | task: -2924 (uid:0 nice:-5 policy:1 rt_prio:85)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
sirq-rcu-52 3d..3 0us : sched_switch: task sirq-rcu/3:52 [29] (D) ==> cc1:13267 [120]
<idle>-0 5d.h3 0us : 0:140:R + [004] 2924: 14:R irq/60-eth2
<idle>-0 5d.h3 1us+: wake_up_process <-handle_IRQ_event
irq/60-e-2924 4d..2 4us+: sched_wakeup: task sirq-net-rx/4:60 [24] success=1
<...>-13267 3d..3 8us : sched_switch: task cc1:13267 [120] (T|0x100) ==> cc1:13085 [120]
irq/60-e-2924 4d.h2 8us+: sched_wakeup: task sirq-timer/4:58 [29] success=1
irq/60-e-2924 4d.h2 10us+: sched_wakeup: task sirq-rcu/4:65 [29] success=1
irq/60-e-2924 4d..3 15us+: sched_switch: task irq/60-eth2:2924 [14] (D) ==> sirq-net-rx/4:60 [24]
<...>-12172 0d..4 26us+: sched_wakeup: task cc1:13566 [120] success=0
<...>-13566 7d..4 36us+: sched_wakeup: task cc1:12172 [120] success=0
<...>-60 4d..2 39us+: sched_wakeup: task cc1:12985 [120] success=1
<...>-13300 2d..4 44us+: sched_wakeup: task cc1:12172 [120] success=0
<...>-13566 7d..3 46us+: sched_switch: task cc1:13566 [120] (R) ==> cc1:12985 [120]
<...>-60 4d..2 55us+: sched_wakeup: task cc1:13149 [120] success=1
<...>-60 4d..4 59us+: sched_wakeup: task cc1:12985 [120] success=0
<...>-12985 7d..2 65us : sched_migrate_task: task cc1:13274 [120] from: 4 to: 7
<idle>-0 5d.h3 67us+: sched_wakeup: task sirq-timer/5:71 [29] success=1
<...>-12985 7d..2 68us+: sched_wakeup: task cc1:13274 [120] success=1
<idle>-0 5d..3 70us+: sched_switch: task swapper:0 [140] (R) ==> sirq-timer/5:71 [29]
sirq-tim-71 5d..3 74us+: sched_switch: task sirq-timer/5:71 [29] (D) ==> swapper:0 [140]
<...>-60 4d..2 82us+: sched_wakeup: task cc1:12683 [120] success=1
<...>-60 4d..3 89us+: sched_switch: task sirq-net-rx/4:60 [24] (D) ==> sirq-timer/4:58 [29]
sirq-tim-58 4d..3 93us+: sched_switch: task sirq-timer/4:58 [29] (D) ==> sirq-rcu/4:65 [29]
sirq-rcu-65 4d..3 97us+: sched_switch: task sirq-rcu/4:65 [29] (D) ==> cc1:13355 [120]
<...>-13210 6d..3 103us+: sched_switch: task cc1:13210 [120] (T|0x100) ==> fixdep:13407 [120]
<...>-13407 6d..2 127us+: sched_wakeup: task cc1:13127 [120] success=1
<...>-13407 6d.h3 134us+: sched_wakeup: task sirq-timer/6:84 [29] success=1
<...>-13407 6dNh3 137us+: sched_wakeup: task sirq-rcu/6:91 [29] success=1
<...>-13407 6d..3 141us+: sched_switch: task fixdep:13407 [120] (R) ==> sirq-timer/6:84 [29]
<...>-13300 2d..4 145us : sched_wakeup: task cc1:12172 [120] success=0
sirq-tim-84 6d..3 146us+: sched_switch: task sirq-timer/6:84 [29] (D) ==> sirq-rcu/6:91 [29]
sirq-rcu-91 6d..3 153us+: sched_switch: task sirq-rcu/6:91 [29] (D) ==> genksyms:13194 [120]
<...>-12985 7d.h2 194us+: sched_wakeup: task sirq-timer/7:97 [29] success=1
<...>-13358 1d..3 199us : sched_switch: task genksyms:13358 [120] (D) ==> as:12659 [120]
<...>-12985 7dNh2 199us+: sched_wakeup: task sirq-rcu/7:104 [29] success=1
<...>-12985 7d..3 204us+: sched_switch: task cc1:12985 [120] (R) ==> sirq-timer/7:97 [29]
sirq-tim-97 7d.h1 207us : irq_handler_entry: irq=60 handler=eth2
sirq-tim-97 7d.h1 208us+: irq_handler_exit: irq=60 return=handled
sirq-tim-97 7d.h2 211us+: sched_wakeup: task irq/60-eth2:2924 [14] success=1
sirq-tim-97 7d..3 216us+: sched_switch: task sirq-timer/7:97 [29] (D) ==> sirq-rcu/7:104 [29]
<...>-13355 4d..3 219us : sched_switch: task cc1:13355 [120] (R) ==> irq/60-eth2:2924 [14]
<...>-13355 4d..3 219us : __schedule <-schedule
<...>-13355 4d..3 220us : 13355:120:R ==> [004] 2924: 14:R irq/60-eth2
- Arnaldo
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-04 23:55 [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Steven Rostedt
` (18 preceding siblings ...)
2009-09-05 15:06 ` [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Arnaldo Carvalho de Melo
@ 2009-09-06 4:19 ` Ingo Molnar
19 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2009-09-06 4:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker, Arnaldo Carvalho de Melo
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Finally (something I should have realized from day 1, not 4 days
> later) I realized I already had a marker. The latency tracers
> always record the timestamp of when the trace began. I can simply
> ignore any trace event that happened before that timestamp. This
> ended up working out very well. The code is quite simple and
> solid.
Yes - that trick came from the original latency tracer and i added
it for that purpose.
The flip side is that if there's anything wrong with timestamps
(and this does happen frequently as well) we suddenly have an empty
trace and no explanation about what happened.
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/core
>
>
> Steven Rostedt (18):
> ring-buffer: do not reset while in a commit
> ring-buffer: do not swap buffers during a commit
> ring-buffer: remove unnecessary cpu_relax
> ring-buffer: fix ring_buffer_read crossing pages
> ring-buffer: remove ring_buffer_event_discard
> ring-buffer: do not count discarded events
> ring-buffer: disable all cpu buffers when one finds a problem
> tracing: print out start and stop in latency traces
> tracing: disable update max tracer while reading trace
> tracing: disable buffers and synchronize_sched before resetting
> tracing: remove users of tracing_reset
> tracing: use timestamp to determine start of latency traces
> tracing: make tracing_reset safe for external use
> tracing: pass around ring buffer instead of tracer
> tracing: add trace_array_printk for internal tracers to use
> tracing: report error in trace if we fail to swap latency buffer
> ring-buffer: check for swapped buffers in start of committing
> ring-buffer: only enable ring_buffer_swap_cpu when needed
>
> ----
> include/linux/ftrace_event.h | 15 ++-
> include/linux/ring_buffer.h | 23 +--
> include/trace/ftrace.h | 15 ++-
> kernel/trace/Kconfig | 8 +
> kernel/trace/blktrace.c | 12 +-
> kernel/trace/kmemtrace.c | 4 +-
> kernel/trace/ring_buffer.c | 172 +++++++++++++-------
> kernel/trace/trace.c | 297 ++++++++++++++++++++++++----------
> kernel/trace/trace.h | 26 ++--
> kernel/trace/trace_boot.c | 16 +-
> kernel/trace/trace_events.c | 6 +-
> kernel/trace/trace_functions_graph.c | 14 +-
> kernel/trace/trace_irqsoff.c | 3 +-
> kernel/trace/trace_mmiotrace.c | 10 +-
> kernel/trace/trace_power.c | 22 ++-
> kernel/trace/trace_sched_switch.c | 18 ++-
> kernel/trace/trace_sched_wakeup.c | 7 +-
> kernel/trace/trace_syscalls.c | 18 ++-
> 18 files changed, 444 insertions(+), 242 deletions(-)
Pulled, thanks a lot Steve!
We should also think about how to expose the function tracer, and
in particular latency tracing functionality, via perf.
The most natural approach would be to expose it as a 'generic
event' - and simply stream function tracing events as they happen.
The latency tracer itself could be exposed either as a generic
event, or as a software counter. The difference there is that it
needs a per counter 'max latency' state that controls whether a new
trace is sent down the pipe.
This would allow very flexible tooling: for example one could do a
dump of the maximum latency a particular workflow experiences [in a
scheduling invariant manner] - without having to trace the full
system. I.e. individual apps could be validated separately and
latency impact on individual apps could be tracked more accurately
as well.
(beyond the other usecases such as tracking cache-miss rates in
addition to latencies - this would allow the observation of
worst-case cachemisses correlating with large latencies, etc.)
Steve, Frederic, do either of you have the interest (and time ;-) to
look into that? It should probably have its own tooling variant,
not be part of 'perf trace' per se (which is the generic tracing
tool).
'perf latencytrace' or so - because latency tracing has its own
special needs (for the same reason do we have a separate latency
tracer ftrace plugin).
Ingo
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-05 15:06 ` [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32] Arnaldo Carvalho de Melo
@ 2009-09-07 10:20 ` Jan Blunck
2009-09-07 10:28 ` Frederic Weisbecker
2009-09-08 14:42 ` Jan Blunck
0 siblings, 2 replies; 27+ messages in thread
From: Jan Blunck @ 2009-09-07 10:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Clark Williams,
linux-rt-users, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker
On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<acme@redhat.com> wrote:
>
> I did a quick port to tip/rt/head and couldn't reproduce the problems I
> was experiencing, thanks!
>
> I've put the ported patch series at:
>
> http://userweb.kernel.org/~acme/rostedt-rb-rt
>
Nice, this also seems to fix my problem with the hanging wakeup
selftest. At least the system continues to boot. However, I still get
this:
[ 13.739115] Testing tracer sched_switch: PASSED
[ 14.002071] Testing tracer function: PASSED
[ 14.158816] Testing dynamic ftrace: PASSED
[ 14.512524] Testing tracer irqsoff: PASSED
[ 14.693519] Testing tracer preemptoff: PASSED
[ 14.853519] Testing tracer preemptirqsoff: PASSED
[ 15.013506] Testing tracer wakeup:
[ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to 27508383 ns
[ 15.134010] hrtimer: interrupt too slow, forcing clock min delta to 27544629 ns
[ 15.134016] hrtimer: interrupt too slow, forcing clock min delta to 27543582 ns
[ 15.134022] hrtimer: interrupt too slow, forcing clock min delta to 27543372 ns
[ 15.134028] hrtimer: interrupt too slow, forcing clock min delta to 27543162 ns
[ 15.134034] hrtimer: interrupt too slow, forcing clock min delta to 27542325 ns
[ 15.134040] hrtimer: interrupt too slow, forcing clock min delta to 27542952 ns
[ 15.134046] hrtimer: interrupt too slow, forcing clock min delta to 27543579 ns
[ 15.134052] hrtimer: interrupt too slow, forcing clock min delta to 27543582 ns
[ 15.134058] hrtimer: interrupt too slow, forcing clock min delta to 27543789 ns
[ 15.134063] hrtimer: interrupt too slow, forcing clock min delta to 27543372 ns
[ 15.134070] hrtimer: interrupt too slow, forcing clock min delta to 27542532 ns
[ 15.134076] hrtimer: interrupt too slow, forcing clock min delta to 27542745 ns
[ 15.134082] hrtimer: interrupt too slow, forcing clock min delta to 27543372 ns
[ 15.134088] hrtimer: interrupt too slow, forcing clock min delta to 27542742 ns
[ 15.134094] hrtimer: interrupt too slow, forcing clock min delta to 27542325 ns
[ 15.134101] hrtimer: interrupt too slow, forcing clock min delta to 27544419 ns
[ 15.134107] hrtimer: interrupt too slow, forcing clock min delta to 27544419 ns
[ 15.134113] hrtimer: interrupt too slow, forcing clock min delta to 27544209 ns
[ 15.134119] hrtimer: interrupt too slow, forcing clock min delta to 27544419 ns
[ 15.134126] hrtimer: interrupt too slow, forcing clock min delta to 27544419 ns
[ 15.134132] hrtimer: interrupt too slow, forcing clock min delta to 27544422 ns
[ 15.134161] hrtimer: interrupt too slow, forcing clock min delta to 27492249 ns
[ 15.133993] hrtimer: interrupt too slow, forcing clock min delta to 28730742 ns
[ 15.810670] PASSED
[ 15.865758] Testing tracer wakeup_rt: PASSED
And this time I have lockdep really disabled.
Jan
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-07 10:20 ` Jan Blunck
@ 2009-09-07 10:28 ` Frederic Weisbecker
2009-09-07 10:41 ` Jan Blunck
2009-09-08 14:42 ` Jan Blunck
1 sibling, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2009-09-07 10:28 UTC (permalink / raw)
To: Jan Blunck, Ulrich Lukas
Cc: Arnaldo Carvalho de Melo, Steven Rostedt, linux-kernel,
Ingo Molnar, Clark Williams, linux-rt-users, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Mon, Sep 07, 2009 at 12:20:50PM +0200, Jan Blunck wrote:
> On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<acme@redhat.com> wrote:
> >
> > I did a quick port to tip/rt/head and couldn't reproduce the problems I
> > was experiencing, thanks!
> >
> > I've put the ported patch series at:
> >
> > http://userweb.kernel.org/~acme/rostedt-rb-rt
> >
>
> Nice, this also seems to fix my problem with the hanging wakeup
> selftest. At least the system continues to boot. However, I still get
> this:
>
> [ 13.739115] Testing tracer sched_switch: PASSED
> [ 14.002071] Testing tracer function: PASSED
> [ 14.158816] Testing dynamic ftrace: PASSED
> [ 14.512524] Testing tracer irqsoff: PASSED
> [ 14.693519] Testing tracer preemptoff: PASSED
> [ 14.853519] Testing tracer preemptirqsoff: PASSED
> [ 15.013506] Testing tracer wakeup:
> [ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to 27508383 ns
Ah, Ulrich Lukas also reported such warnings to me. I would like to
track down the cause of these messages (the hrtimer hang detection
seems to warn too early).
Could you please send me your config? I hope I can reproduce it.
Thanks!
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-07 10:28 ` Frederic Weisbecker
@ 2009-09-07 10:41 ` Jan Blunck
2009-09-07 11:22 ` Frederic Weisbecker
0 siblings, 1 reply; 27+ messages in thread
From: Jan Blunck @ 2009-09-07 10:41 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Ulrich Lukas, Arnaldo Carvalho de Melo, Steven Rostedt,
linux-kernel, Ingo Molnar, Clark Williams, linux-rt-users,
Andrew Morton, Thomas Gleixner, Peter Zijlstra
On Mon, Sep 07, Frederic Weisbecker wrote:
> On Mon, Sep 07, 2009 at 12:20:50PM +0200, Jan Blunck wrote:
> > On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<acme@redhat.com> wrote:
> > >
> > > I did a quick port to tip/rt/head and couldn't reproduce the problems I
> > > was experiencing, thanks!
> > >
> > > I've put the ported patch series at:
> > >
> > > http://userweb.kernel.org/~acme/rostedt-rb-rt
> > >
> >
> > Nice, this also seems to fix my problem with the hanging wakeup
> > selftest. At least the system continues to boot. However, I still get
> > this:
> >
> > [ 13.739115] Testing tracer sched_switch: PASSED
> > [ 14.002071] Testing tracer function: PASSED
> > [ 14.158816] Testing dynamic ftrace: PASSED
> > [ 14.512524] Testing tracer irqsoff: PASSED
> > [ 14.693519] Testing tracer preemptoff: PASSED
> > [ 14.853519] Testing tracer preemptirqsoff: PASSED
> > [ 15.013506] Testing tracer wakeup:
> > [ 15.134004] hrtimer: interrupt too slow, forcing clock min delta to 27508383 ns
>
>
>
> Ah, Ulrich Lukas also reported me such warnings. I would like to
> track the reason of these messages (the hrtimer hanging detection
> seems to warn too early).
>
> Could you please send me your config, I hope I could reproduce it.
>
I only see this on one of my systems, a 32-core Opteron, and not on my
8-core box. So it seems that size does matter ...
[-- Attachment #2: hrtimer-slow.config --]
[-- Type: application/x-config, Size: 69312 bytes --]
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-07 10:41 ` Jan Blunck
@ 2009-09-07 11:22 ` Frederic Weisbecker
0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2009-09-07 11:22 UTC (permalink / raw)
To: Jan Blunck
Cc: Ulrich Lukas, Arnaldo Carvalho de Melo, Steven Rostedt,
linux-kernel, Ingo Molnar, Clark Williams, linux-rt-users,
Andrew Morton, Thomas Gleixner, Peter Zijlstra
On Mon, Sep 07, 2009 at 12:41:55PM +0200, Jan Blunck wrote:
> On Mon, Sep 07, Frederic Weisbecker wrote:
> > Ah, Ulrich Lukas also reported me such warnings. I would like to
> > track the reason of these messages (the hrtimer hanging detection
> > seems to warn too early).
> >
> > Could you please send me your config, I hope I could reproduce it.
> >
>
> I only see that on one of my systems, a 32 core Opteron and not on my 8
> core. So it seems that size does matter ...
Ah, then I have little chance of reproducing it with my boxes :-)
I'll try anyway, thanks!
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-07 10:20 ` Jan Blunck
2009-09-07 10:28 ` Frederic Weisbecker
@ 2009-09-08 14:42 ` Jan Blunck
2009-09-08 15:39 ` Steven Rostedt
1 sibling, 1 reply; 27+ messages in thread
From: Jan Blunck @ 2009-09-08 14:42 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Clark Williams,
linux-rt-users, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Frederic Weisbecker
On Mon, Sep 7, 2009 at 12:20 PM, Jan Blunck<jblunck@suse.de> wrote:
> On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<acme@redhat.com> wrote:
>>
>> I did a quick port to tip/rt/head and couldn't reproduce the problems I
>> was experiencing, thanks!
>>
>> I've put the ported patch series at:
>>
>> http://userweb.kernel.org/~acme/rostedt-rb-rt
>>
Besides the problem with the wakeup tracer, I see traces with
preemptirqsoff that I think are invalid:
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 2.6.31-rc8-rt9-rt_trace
# --------------------------------------------------------------------
# latency: 332 us, #4/4, CPU#24 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:32)
# -----------------
# | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: start_secondary
# => ended at: start_secondary
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 24.N.1 0us : cpu_idle <-start_secondary
<idle>-0 24dN.1 1us!: tick_nohz_stop_idle <-tick_nohz_restart_sched_tick
<idle>-0 24dN.1 332us+: cpu_idle <-start_secondary
<idle>-0 24dN.1 336us : trace_preempt_on <-start_secondary
I have set kernel.ftrace_enabled = 1 but still not a single function
is recorded in tick_nohz_stop_idle().
Any ideas?
* Re: [PATCH 00/18] tracing/ring-buffer: fixes for latency tracer [2.6.32]
2009-09-08 14:42 ` Jan Blunck
@ 2009-09-08 15:39 ` Steven Rostedt
0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2009-09-08 15:39 UTC (permalink / raw)
To: Jan Blunck
Cc: Arnaldo Carvalho de Melo, linux-kernel, Ingo Molnar,
Clark Williams, linux-rt-users, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Frederic Weisbecker
On Tue, 2009-09-08 at 16:42 +0200, Jan Blunck wrote:
> On Mon, Sep 7, 2009 at 12:20 PM, Jan Blunck<jblunck@suse.de> wrote:
> > On Sat, Sep 5, 2009 at 5:06 PM, Arnaldo Carvalho de Melo<acme@redhat.com> wrote:
> >>
> >> I did a quick port to tip/rt/head and couldn't reproduce the problems I
> >> was experiencing, thanks!
> >>
> >> I've put the ported patch series at:
> >>
> >> http://userweb.kernel.org/~acme/rostedt-rb-rt
> >>
>
> Besides the problem with the wakeup tracer I see traces with
> preemptirqsoff that I think are invalid:
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 2.6.31-rc8-rt9-rt_trace
> # --------------------------------------------------------------------
> # latency: 332 us, #4/4, CPU#24 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:32)
> # -----------------
> # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
> # -----------------
> # => started at: start_secondary
> # => ended at: start_secondary
> #
> #
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| /
> # ||||| delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> <idle>-0 24.N.1 0us : cpu_idle <-start_secondary
> <idle>-0 24dN.1 1us!: tick_nohz_stop_idle <-tick_nohz_restart_sched_tick
> <idle>-0 24dN.1 332us+: cpu_idle <-start_secondary
> <idle>-0 24dN.1 336us : trace_preempt_on <-start_secondary
>
> I have set kernel.ftrace_enabled = 1 but still not a single function
> is recorded in tick_nohz_stop_idle().
>
> Any ideas?
Which arch is this? When the system goes into idle, it usually disables
preemption and possibly interrupts. But the idle call in assembly
re-enables interrupts, and the CPU comes back out of idle when an
interrupt happens.
There should be callbacks in cpu_idle to disable latency tracing. These
functions are called stop_critical_timings and start_critical_timings.
You'll see these calls in both process_64.c and process_32.c in the x86
arch.
Is your setup not calling cpu_idle?
-- Steve