* [PATCH 0/8] git pull request for tip/tracing/core
@ 2009-02-08 5:49 Steven Rostedt
2009-02-08 5:49 ` [PATCH 1/8] trace: remove deprecated entry->cpu Steven Rostedt
` (8 more replies)
0 siblings, 9 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:49 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
Ingo,
I rebased the previous changes onto tip/tracing/core. The first patch
is new: it eliminates entry->cpu.
I did not remove that field when it became obsolete, and thus we had
other developers using it, thinking that it was the way to
get the current CPU number. Unfortunately, entry->cpu was
just taking up space and was never initialized.
I have posted the rest of the patches before, but this time I ported
them over to tip/tracing/core.
The following patches are in:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
branch: tip/tracing/core/devel
Steven Rostedt (7):
trace: remove deprecated entry->cpu
ring-buffer: add NMI protection for spinlocks
ring-buffer: allow tracing_off to be used in core kernel code
ftrace, x86: rename in_nmi variable
nmi: add generic nmi tracking state
ftrace: change function graph tracer to use new in_nmi
ring-buffer: use generic version of in_nmi
Wenji Huang (1):
trace: trivial fixes in comment typos.
----
arch/x86/Kconfig | 1 +
arch/x86/kernel/ftrace.c | 35 ++++++++---------------------------
include/linux/ftrace.h | 2 +-
include/linux/ftrace_irq.h | 2 +-
include/linux/hardirq.h | 15 +++++++++++++++
include/linux/ring_buffer.h | 9 +++++++++
kernel/trace/Kconfig | 8 ++++++++
kernel/trace/ftrace.c | 6 +++---
kernel/trace/ring_buffer.c | 31 +++++++++++++++++++++++++++++--
kernel/trace/trace.c | 2 +-
kernel/trace/trace.h | 7 +++----
kernel/trace/trace_hw_branches.c | 3 +--
kernel/trace/trace_output.c | 6 +++---
13 files changed, 83 insertions(+), 44 deletions(-)
--
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/8] trace: remove deprecated entry->cpu
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
@ 2009-02-08 5:49 ` Steven Rostedt
2009-02-08 12:29 ` Frederic Weisbecker
2009-02-08 5:49 ` [PATCH 2/8] ring-buffer: add NMI protection for spinlocks Steven Rostedt
` (7 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:49 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0001-trace-remove-deprecated-entry-cpu.patch --]
[-- Type: text/plain, Size: 4221 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Impact: fix to prevent developers from using entry->cpu
With the new ring buffer infrastructure, the CPU for an entry is
implicit in which per-CPU buffer it is on.
The original code used to record the current CPU in the generic
entry header, where it could be retrieved via entry->cpu. When the
ring buffer was introduced, the users were converted to use the
CPU number of the ring buffer in use (this was passed
to the tracers by the iterator: iter->cpu).
Unfortunately, the cpu item in the entry structure was never removed.
This allowed developers to use it instead of the proper iter->cpu,
unknowingly reading an uninitialized variable. This was not the fault
of the developers, since it seemed like the logical place to
retrieve the CPU identifier.
This patch removes the cpu item from the entry structure and fixes
all the users that should have been using iter->cpu.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/trace/trace.c | 2 +-
kernel/trace/trace.h | 1 -
kernel/trace/trace_hw_branches.c | 3 +--
kernel/trace/trace_output.c | 6 +++---
4 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index fd51cf0..bd4d9f8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1531,7 +1531,7 @@ static enum print_line_t print_bin_fmt(struct trace_iterator *iter)
if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
SEQ_PUT_FIELD_RET(s, entry->pid);
- SEQ_PUT_FIELD_RET(s, entry->cpu);
+ SEQ_PUT_FIELD_RET(s, iter->cpu);
SEQ_PUT_FIELD_RET(s, iter->ts);
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f0c7a0f..5efc4c7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -45,7 +45,6 @@ enum trace_type {
*/
struct trace_entry {
unsigned char type;
- unsigned char cpu;
unsigned char flags;
unsigned char preempt_count;
int pid;
diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
index fff3545..549238a 100644
--- a/kernel/trace/trace_hw_branches.c
+++ b/kernel/trace/trace_hw_branches.c
@@ -159,7 +159,7 @@ static enum print_line_t bts_trace_print_line(struct trace_iterator *iter)
trace_assign_type(it, entry);
if (entry->type == TRACE_HW_BRANCHES) {
- if (trace_seq_printf(seq, "%4d ", entry->cpu) &&
+ if (trace_seq_printf(seq, "%4d ", iter->cpu) &&
seq_print_ip_sym(seq, it->to, symflags) &&
trace_seq_printf(seq, "\t <- ") &&
seq_print_ip_sym(seq, it->from, symflags) &&
@@ -195,7 +195,6 @@ void trace_hw_branch(u64 from, u64 to)
entry = ring_buffer_event_data(event);
tracing_generic_entry_update(&entry->ent, 0, from);
entry->ent.type = TRACE_HW_BRANCHES;
- entry->ent.cpu = cpu;
entry->from = from;
entry->to = to;
ring_buffer_unlock_commit(tr->buffer, event, irq2);
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index b7380ee..463a310 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -333,7 +333,7 @@ int trace_print_context(struct trace_iterator *iter)
unsigned long secs = (unsigned long)t;
return trace_seq_printf(s, "%16s-%-5d [%03d] %5lu.%06lu: ",
- comm, entry->pid, entry->cpu, secs, usec_rem);
+ comm, entry->pid, iter->cpu, secs, usec_rem);
}
int trace_print_lat_context(struct trace_iterator *iter)
@@ -356,7 +356,7 @@ int trace_print_lat_context(struct trace_iterator *iter)
char *comm = trace_find_cmdline(entry->pid);
ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08lx]"
" %ld.%03ldms (+%ld.%03ldms): ", comm,
- entry->pid, entry->cpu, entry->flags,
+ entry->pid, iter->cpu, entry->flags,
entry->preempt_count, iter->idx,
ns2usecs(iter->ts),
abs_usecs / USEC_PER_MSEC,
@@ -364,7 +364,7 @@ int trace_print_lat_context(struct trace_iterator *iter)
rel_usecs / USEC_PER_MSEC,
rel_usecs % USEC_PER_MSEC);
} else {
- ret = lat_print_generic(s, entry, entry->cpu);
+ ret = lat_print_generic(s, entry, iter->cpu);
if (ret)
ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
}
--
1.5.6.5
* [PATCH 2/8] ring-buffer: add NMI protection for spinlocks
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
2009-02-08 5:49 ` [PATCH 1/8] trace: remove deprecated entry->cpu Steven Rostedt
@ 2009-02-08 5:49 ` Steven Rostedt
2009-02-08 5:49 ` [PATCH 3/8] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
` (6 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:49 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0002-ring-buffer-add-NMI-protection-for-spinlocks.patch --]
[-- Type: text/plain, Size: 7151 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Impact: prevent deadlock in NMI
The ring buffers are not yet totally lockless for writes to
the buffer. When a writer crosses a page, it grabs a per-CPU spinlock
to protect against a reader. The spinlocks taken by a writer are not
there to protect against other writers, since a writer can only write to
its own per-CPU buffer. The spinlocks protect against readers, which
can touch any CPU buffer. The writers are made reentrant
by disabling interrupts while the spinlock is held.
The problem arises when an NMI writes to the buffer and that write
crosses a page boundary. If it grabs the spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant and protect against this issue. But if a
user of the ring buffer becomes reentrant (which the ring
buffers do allow), and the NMI also writes to the ring buffer, then
we risk a deadlock.
This patch moves the ftrace_nmi_enter() called by nmi_enter() into the
ring buffer code. It renames the current ftrace_nmi_enter() that is
used by arch-specific code to arch_ftrace_nmi_enter() and updates
the Kconfig to handle it.
When an NMI triggers, it sets a per-CPU variable in the ring buffer
code and clears it when the NMI exits. If a write to the ring buffer
crosses a page boundary inside an NMI, a trylock is used on the
spinlock instead. If the spinlock cannot be acquired, the entry
is discarded.
This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.
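The trylock fallback described above can be sketched in plain
user-space C. This is a simulation, not the kernel code: lock_held and
in_nmi_flag are simple ints standing in for cpu_buffer->lock and the
per-CPU rb_in_nmi counter, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int lock_held;   /* stands in for cpu_buffer->lock */
static int in_nmi_flag; /* stands in for the per-CPU rb_in_nmi counter */

static bool try_lock(void)
{
	if (lock_held)
		return false;
	lock_held = 1;
	return true;
}

/*
 * Returns true if the page-crossing write may proceed, false if the
 * entry must be discarded.
 */
static bool reserve_cross_page(void)
{
	if (in_nmi_flag) {
		/*
		 * In NMI: never spin, since the lock owner may be the
		 * very context this NMI interrupted.
		 */
		return try_lock();
	}
	/* Normal context: spinning is safe (interrupts are disabled). */
	while (!try_lock())
		;
	return true;
}
```

The key design point is that the NMI path can only ever fail fast:
losing one entry is preferable to spinning on a lock whose owner
cannot run until the NMI returns.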
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/x86/Kconfig | 1 +
arch/x86/kernel/ftrace.c | 8 +++---
include/linux/ftrace_irq.h | 10 ++++++++-
kernel/trace/Kconfig | 8 +++++++
kernel/trace/ring_buffer.c | 48 ++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 68 insertions(+), 7 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 73f7fe8..a6be725 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
+ select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || FUNCTION_GRAPH_TRACER
select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
select HAVE_ARCH_KGDB if !X86_VOYAGER
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 4d33224..4c68358 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
MCOUNT_INSN_SIZE);
}
-void ftrace_nmi_enter(void)
+void arch_ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
/* Must have in_nmi seen before reading write flag */
@@ -124,7 +124,7 @@ void ftrace_nmi_enter(void)
}
}
-void ftrace_nmi_exit(void)
+void arch_ftrace_nmi_exit(void)
{
/* Finish all executions before clearing in_nmi */
smp_wmb();
@@ -376,12 +376,12 @@ int ftrace_disable_ftrace_graph_caller(void)
*/
static atomic_t in_nmi;
-void ftrace_nmi_enter(void)
+void arch_ftrace_nmi_enter(void)
{
atomic_inc(&in_nmi);
}
-void ftrace_nmi_exit(void)
+void arch_ftrace_nmi_exit(void)
{
atomic_dec(&in_nmi);
}
diff --git a/include/linux/ftrace_irq.h b/include/linux/ftrace_irq.h
index 366a054..29de677 100644
--- a/include/linux/ftrace_irq.h
+++ b/include/linux/ftrace_irq.h
@@ -2,7 +2,15 @@
#define _LINUX_FTRACE_IRQ_H
-#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_FUNCTION_GRAPH_TRACER)
+#ifdef CONFIG_FTRACE_NMI_ENTER
+extern void arch_ftrace_nmi_enter(void);
+extern void arch_ftrace_nmi_exit(void);
+#else
+static inline void arch_ftrace_nmi_enter(void) { }
+static inline void arch_ftrace_nmi_exit(void) { }
+#endif
+
+#ifdef CONFIG_RING_BUFFER
extern void ftrace_nmi_enter(void);
extern void ftrace_nmi_exit(void);
#else
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 28f2644..25131a5 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -9,6 +9,9 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool
+config HAVE_FTRACE_NMI_ENTER
+ bool
+
config HAVE_FUNCTION_TRACER
bool
@@ -37,6 +40,11 @@ config TRACER_MAX_TRACE
config RING_BUFFER
bool
+config FTRACE_NMI_ENTER
+ bool
+ depends on HAVE_FTRACE_NMI_ENTER
+ default y
+
config TRACING
bool
select DEBUG_FS
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b36d737..a60a6a8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4,6 +4,7 @@
* Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
*/
#include <linux/ring_buffer.h>
+#include <linux/ftrace_irq.h>
#include <linux/spinlock.h>
#include <linux/debugfs.h>
#include <linux/uaccess.h>
@@ -19,6 +20,35 @@
#include "trace.h"
/*
+ * Since the write to the buffer is still not fully lockless,
+ * we must be careful with NMIs. The locks in the writers
+ * are taken when a write crosses to a new page. The locks
+ * protect against races with the readers (this will soon
+ * be fixed with a lockless solution).
+ *
+ * Because we can not protect against NMIs, and we want to
+ * keep traces reentrant, we need to manage what happens
+ * when we are in an NMI.
+ */
+static DEFINE_PER_CPU(int, rb_in_nmi);
+
+void ftrace_nmi_enter(void)
+{
+ __get_cpu_var(rb_in_nmi)++;
+ /* call arch specific handler too */
+ arch_ftrace_nmi_enter();
+}
+
+void ftrace_nmi_exit(void)
+{
+ arch_ftrace_nmi_exit();
+ __get_cpu_var(rb_in_nmi)--;
+ /* NMIs are not recursive */
+ WARN_ON_ONCE(__get_cpu_var(rb_in_nmi));
+}
+
+
+/*
* A fast way to enable or disable all ring buffers is to
* call tracing_on or tracing_off. Turning off the ring buffers
* prevents all ring buffers from being recorded to.
@@ -982,6 +1012,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
struct ring_buffer *buffer = cpu_buffer->buffer;
struct ring_buffer_event *event;
unsigned long flags;
+ bool lock_taken = false;
commit_page = cpu_buffer->commit_page;
/* we just need to protect against interrupts */
@@ -995,7 +1026,19 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
struct buffer_page *next_page = tail_page;
local_irq_save(flags);
- __raw_spin_lock(&cpu_buffer->lock);
+ /*
+ * NMIs can happen after we take the lock.
+ * If we are in an NMI, only take the lock
+ * if it is not already taken. Otherwise
+ * simply fail.
+ */
+ if (unlikely(__get_cpu_var(rb_in_nmi))) {
+ if (!__raw_spin_trylock(&cpu_buffer->lock))
+ goto out_unlock;
+ } else
+ __raw_spin_lock(&cpu_buffer->lock);
+
+ lock_taken = true;
rb_inc_page(cpu_buffer, &next_page);
@@ -1097,7 +1140,8 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
if (tail <= BUF_PAGE_SIZE)
local_set(&tail_page->write, tail);
- __raw_spin_unlock(&cpu_buffer->lock);
+ if (likely(lock_taken))
+ __raw_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
return NULL;
}
--
1.5.6.5
* [PATCH 3/8] ring-buffer: allow tracing_off to be used in core kernel code
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
2009-02-08 5:49 ` [PATCH 1/8] trace: remove deprecated entry->cpu Steven Rostedt
2009-02-08 5:49 ` [PATCH 2/8] ring-buffer: add NMI protection for spinlocks Steven Rostedt
@ 2009-02-08 5:49 ` Steven Rostedt
2009-02-08 5:49 ` [PATCH 4/8] ftrace, x86: rename in_nmi variable Steven Rostedt
` (5 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:49 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0003-ring-buffer-allow-tracing_off-to-be-used-in-core-ke.patch --]
[-- Type: text/plain, Size: 1337 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
tracing_off() is the fastest way to stop recording to the ring buffers.
It may be used in places like panic and die, just before
ftrace_dump() is called.
This patch adds the appropriate CPP conditionals to make it a stub
function when the ring buffer is not configured in.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
include/linux/ring_buffer.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b3b3596..ac94c06 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -124,9 +124,18 @@ unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
u64 ring_buffer_time_stamp(int cpu);
void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
+/*
+ * The below functions are fine to use outside the tracing facility.
+ */
+#ifdef CONFIG_RING_BUFFER
void tracing_on(void);
void tracing_off(void);
void tracing_off_permanent(void);
+#else
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline void tracing_off_permanent(void) { }
+#endif
void *ring_buffer_alloc_read_page(struct ring_buffer *buffer);
void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data);
--
1.5.6.5
* [PATCH 4/8] ftrace, x86: rename in_nmi variable
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (2 preceding siblings ...)
2009-02-08 5:49 ` [PATCH 3/8] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
@ 2009-02-08 5:49 ` Steven Rostedt
2009-02-08 5:50 ` [PATCH 5/8] nmi: add generic nmi tracking state Steven Rostedt
` (4 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:49 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0004-ftrace-x86-rename-in_nmi-variable.patch --]
[-- Type: text/plain, Size: 2698 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
The in_nmi variable in the x86 arch ftrace.c is a misnomer.
Andrew Morton pointed out that the in_nmi variable is incremented
by all CPUs; it can be set when another CPU is running an NMI.
Since this is actually intentional, the fix is to rename it to
what it really is: "nmi_running".
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/x86/kernel/ftrace.c | 22 +++++++++++-----------
1 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 4c68358..e3fad2e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -82,7 +82,7 @@ static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
* are the same as what exists.
*/
-static atomic_t in_nmi = ATOMIC_INIT(0);
+static atomic_t nmi_running = ATOMIC_INIT(0);
static int mod_code_status; /* holds return value of text write */
static int mod_code_write; /* set when NMI should do the write */
static void *mod_code_ip; /* holds the IP to write to */
@@ -115,8 +115,8 @@ static void ftrace_mod_code(void)
void arch_ftrace_nmi_enter(void)
{
- atomic_inc(&in_nmi);
- /* Must have in_nmi seen before reading write flag */
+ atomic_inc(&nmi_running);
+ /* Must have nmi_running seen before reading write flag */
smp_mb();
if (mod_code_write) {
ftrace_mod_code();
@@ -126,19 +126,19 @@ void arch_ftrace_nmi_enter(void)
void arch_ftrace_nmi_exit(void)
{
- /* Finish all executions before clearing in_nmi */
+ /* Finish all executions before clearing nmi_running */
smp_wmb();
- atomic_dec(&in_nmi);
+ atomic_dec(&nmi_running);
}
static void wait_for_nmi(void)
{
- if (!atomic_read(&in_nmi))
+ if (!atomic_read(&nmi_running))
return;
do {
cpu_relax();
- } while(atomic_read(&in_nmi));
+ } while (atomic_read(&nmi_running));
nmi_wait_count++;
}
@@ -374,16 +374,16 @@ int ftrace_disable_ftrace_graph_caller(void)
* this page for dynamic ftrace. They have been
* simplified to ignore all traces in NMI context.
*/
-static atomic_t in_nmi;
+static atomic_t nmi_running;
void arch_ftrace_nmi_enter(void)
{
- atomic_inc(&in_nmi);
+ atomic_inc(&nmi_running);
}
void arch_ftrace_nmi_exit(void)
{
- atomic_dec(&in_nmi);
+ atomic_dec(&nmi_running);
}
#endif /* !CONFIG_DYNAMIC_FTRACE */
@@ -475,7 +475,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
&return_to_handler;
/* Nmi's are currently unsupported */
- if (unlikely(atomic_read(&in_nmi)))
+ if (unlikely(atomic_read(&nmi_running)))
return;
if (unlikely(atomic_read(¤t->tracing_graph_pause)))
--
1.5.6.5
* [PATCH 5/8] nmi: add generic nmi tracking state
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (3 preceding siblings ...)
2009-02-08 5:49 ` [PATCH 4/8] ftrace, x86: rename in_nmi variable Steven Rostedt
@ 2009-02-08 5:50 ` Steven Rostedt
2009-02-08 5:50 ` [PATCH 6/8] ftrace: change function graph tracer to use new in_nmi Steven Rostedt
` (3 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:50 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0005-nmi-add-generic-nmi-tracking-state.patch --]
[-- Type: text/plain, Size: 1890 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
This code adds an in_nmi() macro that uses the current task's preempt
count to track when it is in NMI context. Other parts of the kernel can
use it to determine whether the current context is NMI context.
This code was inspired by the -rt patch in_nmi version that was
written by Peter Zijlstra, who borrowed that code from
Mathieu Desnoyers.
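The mechanism can be sketched with plain C mirroring the patch. This is
a user-space simulation: the PREEMPT_ACTIVE value below is illustrative
(the real one is architecture-dependent), preempt_count is an ordinary
global rather than a per-task field, and sim_nmi_enter/sim_nmi_exit are
hypothetical stand-ins for the nmi_enter()/nmi_exit() macros.

```c
#include <assert.h>

/* Illustrative value; the real PREEMPT_ACTIVE is arch-dependent. */
#define PREEMPT_ACTIVE	0x10000000
#define NMI_OFFSET	(PREEMPT_ACTIVE << 1)

static int preempt_count;	/* stands in for the per-task count */

/* Are we in NMI context? */
#define in_nmi()	(preempt_count & NMI_OFFSET)

static void sim_nmi_enter(void) { preempt_count += NMI_OFFSET; }
static void sim_nmi_exit(void)  { preempt_count -= NMI_OFFSET; }
```

Because NMI_OFFSET occupies a bit above all the hardirq, softirq, and
PREEMPT_ACTIVE bits, testing that single bit is enough: no other
preempt-count manipulation can set or clear it.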
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
include/linux/hardirq.h | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index f832883..f3cf86e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -61,6 +61,12 @@
#error PREEMPT_ACTIVE is too low!
#endif
+#define NMI_OFFSET (PREEMPT_ACTIVE << 1)
+
+#if NMI_OFFSET >= 0x80000000
+#error PREEMPT_ACTIVE too high!
+#endif
+
#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))
@@ -73,6 +79,11 @@
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
+/*
+ * Are we in NMI context?
+ */
+#define in_nmi() (preempt_count() & NMI_OFFSET)
+
#if defined(CONFIG_PREEMPT)
# define PREEMPT_INATOMIC_BASE kernel_locked()
# define PREEMPT_CHECK_OFFSET 1
@@ -167,6 +178,8 @@ extern void irq_exit(void);
#define nmi_enter() \
do { \
ftrace_nmi_enter(); \
+ BUG_ON(in_nmi()); \
+ add_preempt_count(NMI_OFFSET); \
lockdep_off(); \
rcu_nmi_enter(); \
__irq_enter(); \
@@ -177,6 +190,8 @@ extern void irq_exit(void);
__irq_exit(); \
rcu_nmi_exit(); \
lockdep_on(); \
+ BUG_ON(!in_nmi()); \
+ sub_preempt_count(NMI_OFFSET); \
ftrace_nmi_exit(); \
} while (0)
--
1.5.6.5
* [PATCH 6/8] ftrace: change function graph tracer to use new in_nmi
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (4 preceding siblings ...)
2009-02-08 5:50 ` [PATCH 5/8] nmi: add generic nmi tracking state Steven Rostedt
@ 2009-02-08 5:50 ` Steven Rostedt
2009-02-08 5:50 ` [PATCH 7/8] ring-buffer: use generic version of in_nmi Steven Rostedt
` (2 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:50 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0006-ftrace-change-function-graph-tracer-to-use-new-in_n.patch --]
[-- Type: text/plain, Size: 2405 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The function graph tracer piggybacked onto the dynamic ftrace code
to use its in_nmi custom code for dynamic tracing. The problem
was (as Andrew Morton pointed out) that it really only wanted to bail
out if the current CPU's context was in NMI context. But the
dynamic ftrace in_nmi custom code was true if _any_ CPU happened
to be in NMI context.
Now that we have a generic in_nmi() interface, this patch changes
the function graph code to use it instead of the dynamic ftrace
custom code.
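The distinction can be illustrated in user-space C. This is a
simulation: the two "CPUs" are modelled as array slots, and the global
counter versus per-CPU flag are hypothetical stand-ins for the
nmi_running atomic and the preempt-count-based in_nmi() state.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

static int nmi_running;         /* global counter: any CPU in NMI */
static int cpu_in_nmi[NR_CPUS]; /* per-CPU state: this CPU in NMI */

static void sim_nmi_enter(int cpu)
{
	nmi_running++;
	cpu_in_nmi[cpu] = 1;
}

/* Old check: bails out on every CPU while any CPU handles an NMI. */
static bool old_should_bail(int cpu)
{
	(void)cpu;
	return nmi_running != 0;
}

/* New check: bails out only on the CPU that is actually in NMI. */
static bool new_should_bail(int cpu)
{
	return cpu_in_nmi[cpu] != 0;
}
```

With the global counter, an NMI on CPU 0 needlessly suppresses graph
tracing on CPU 1; the per-context check only suppresses it where the
NMI is actually running.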
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/x86/Kconfig | 2 +-
arch/x86/kernel/ftrace.c | 21 +--------------------
2 files changed, 2 insertions(+), 21 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a6be725..2cf7bbc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,7 +34,7 @@ config X86
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
- select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || FUNCTION_GRAPH_TRACER
+ select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
select HAVE_ARCH_KGDB if !X86_VOYAGER
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index e3fad2e..918073c 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -367,25 +367,6 @@ int ftrace_disable_ftrace_graph_caller(void)
return ftrace_mod_jmp(ip, old_offset, new_offset);
}
-#else /* CONFIG_DYNAMIC_FTRACE */
-
-/*
- * These functions are picked from those used on
- * this page for dynamic ftrace. They have been
- * simplified to ignore all traces in NMI context.
- */
-static atomic_t nmi_running;
-
-void arch_ftrace_nmi_enter(void)
-{
- atomic_inc(&nmi_running);
-}
-
-void arch_ftrace_nmi_exit(void)
-{
- atomic_dec(&nmi_running);
-}
-
#endif /* !CONFIG_DYNAMIC_FTRACE */
/* Add a function return address to the trace stack on thread info.*/
@@ -475,7 +456,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
&return_to_handler;
/* Nmi's are currently unsupported */
- if (unlikely(atomic_read(&nmi_running)))
+ if (unlikely(in_nmi()))
return;
if (unlikely(atomic_read(¤t->tracing_graph_pause)))
--
1.5.6.5
* [PATCH 7/8] ring-buffer: use generic version of in_nmi
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (5 preceding siblings ...)
2009-02-08 5:50 ` [PATCH 6/8] ftrace: change function graph tracer to use new in_nmi Steven Rostedt
@ 2009-02-08 5:50 ` Steven Rostedt
2009-02-08 5:50 ` [PATCH 8/8] trace: trivial fixes in comment typos Steven Rostedt
2009-02-09 9:37 ` [PATCH 0/8] git pull request for tip/tracing/core Ingo Molnar
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:50 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Steven Rostedt
[-- Attachment #1: 0007-ring-buffer-use-generic-version-of-in_nmi.patch --]
[-- Type: text/plain, Size: 4035 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
Impact: clean up
Now that a generic in_nmi() is available, this patch removes the
special code in the ring buffer and uses the generic in_nmi()
version instead.
With this change, I was also able to rename arch_ftrace_nmi_enter()
back to ftrace_nmi_enter() and remove the extra code from the ring
buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
arch/x86/kernel/ftrace.c | 4 ++--
include/linux/ftrace_irq.h | 8 --------
kernel/trace/ring_buffer.c | 43 +++++++++++++------------------------------
3 files changed, 15 insertions(+), 40 deletions(-)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 918073c..d74d75e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
MCOUNT_INSN_SIZE);
}
-void arch_ftrace_nmi_enter(void)
+void ftrace_nmi_enter(void)
{
atomic_inc(&nmi_running);
/* Must have nmi_running seen before reading write flag */
@@ -124,7 +124,7 @@ void arch_ftrace_nmi_enter(void)
}
}
-void arch_ftrace_nmi_exit(void)
+void ftrace_nmi_exit(void)
{
/* Finish all executions before clearing nmi_running */
smp_wmb();
diff --git a/include/linux/ftrace_irq.h b/include/linux/ftrace_irq.h
index 29de677..dca7bf8 100644
--- a/include/linux/ftrace_irq.h
+++ b/include/linux/ftrace_irq.h
@@ -3,14 +3,6 @@
#ifdef CONFIG_FTRACE_NMI_ENTER
-extern void arch_ftrace_nmi_enter(void);
-extern void arch_ftrace_nmi_exit(void);
-#else
-static inline void arch_ftrace_nmi_enter(void) { }
-static inline void arch_ftrace_nmi_exit(void) { }
-#endif
-
-#ifdef CONFIG_RING_BUFFER
extern void ftrace_nmi_enter(void);
extern void ftrace_nmi_exit(void);
#else
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a60a6a8..5ee3444 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -8,6 +8,7 @@
#include <linux/spinlock.h>
#include <linux/debugfs.h>
#include <linux/uaccess.h>
+#include <linux/hardirq.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/mutex.h>
@@ -20,35 +21,6 @@
#include "trace.h"
/*
- * Since the write to the buffer is still not fully lockless,
- * we must be careful with NMIs. The locks in the writers
- * are taken when a write crosses to a new page. The locks
- * protect against races with the readers (this will soon
- * be fixed with a lockless solution).
- *
- * Because we can not protect against NMIs, and we want to
- * keep traces reentrant, we need to manage what happens
- * when we are in an NMI.
- */
-static DEFINE_PER_CPU(int, rb_in_nmi);
-
-void ftrace_nmi_enter(void)
-{
- __get_cpu_var(rb_in_nmi)++;
- /* call arch specific handler too */
- arch_ftrace_nmi_enter();
-}
-
-void ftrace_nmi_exit(void)
-{
- arch_ftrace_nmi_exit();
- __get_cpu_var(rb_in_nmi)--;
- /* NMIs are not recursive */
- WARN_ON_ONCE(__get_cpu_var(rb_in_nmi));
-}
-
-
-/*
* A fast way to enable or disable all ring buffers is to
* call tracing_on or tracing_off. Turning off the ring buffers
* prevents all ring buffers from being recorded to.
@@ -1027,12 +999,23 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
local_irq_save(flags);
/*
+ * Since the write to the buffer is still not
+ * fully lockless, we must be careful with NMIs.
+ * The locks in the writers are taken when a write
+ * crosses to a new page. The locks protect against
+ * races with the readers (this will soon be fixed
+ * with a lockless solution).
+ *
+ * Because we can not protect against NMIs, and we
+ * want to keep traces reentrant, we need to manage
+ * what happens when we are in an NMI.
+ *
* NMIs can happen after we take the lock.
* If we are in an NMI, only take the lock
* if it is not already taken. Otherwise
* simply fail.
*/
- if (unlikely(__get_cpu_var(rb_in_nmi))) {
+ if (unlikely(in_nmi())) {
if (!__raw_spin_trylock(&cpu_buffer->lock))
goto out_unlock;
} else
--
1.5.6.5
* [PATCH 8/8] trace: trivial fixes in comment typos.
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (6 preceding siblings ...)
2009-02-08 5:50 ` [PATCH 7/8] ring-buffer: use generic version of in_nmi Steven Rostedt
@ 2009-02-08 5:50 ` Steven Rostedt
2009-02-09 9:37 ` [PATCH 0/8] git pull request for tip/tracing/core Ingo Molnar
8 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-08 5:50 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker, Wenji Huang,
Steven Rostedt
[-- Attachment #1: 0008-trace-trivial-fixes-in-comment-typos.patch --]
[-- Type: text/plain, Size: 3303 bytes --]
From: Wenji Huang <wenji.huang@oracle.com>
Impact: clean up
Fixed several typos in the comments.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
include/linux/ftrace.h | 2 +-
kernel/trace/ftrace.c | 6 +++---
kernel/trace/trace.h | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7840e71..5e302d6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -140,7 +140,7 @@ static inline int ftrace_disable_ftrace_graph_caller(void) { return 0; }
#endif
/**
- * ftrace_make_nop - convert code into top
+ * ftrace_make_nop - convert code into nop
* @mod: module structure if called by module load initialization
* @rec: the mcount call site record
* @addr: the address that the call site should be calling
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6861003..1796e01 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -465,7 +465,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
* it is not enabled then do nothing.
*
* If this record is not to be traced and
- * it is enabled then disabled it.
+ * it is enabled then disable it.
*
*/
if (rec->flags & FTRACE_FL_NOTRACE) {
@@ -485,7 +485,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
if (fl == (FTRACE_FL_FILTER | FTRACE_FL_ENABLED))
return 0;
- /* Record is not filtered and is not enabled do nothing */
+ /* Record is not filtered or enabled, do nothing */
if (!fl)
return 0;
@@ -507,7 +507,7 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
} else {
- /* if record is not enabled do nothing */
+ /* if record is not enabled, do nothing */
if (!(rec->flags & FTRACE_FL_ENABLED))
return 0;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5efc4c7..f92aba5 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -616,12 +616,12 @@ extern struct tracer nop_trace;
* preempt_enable (after a disable), a schedule might take place
* causing an infinite recursion.
*
- * To prevent this, we read the need_recshed flag before
+ * To prevent this, we read the need_resched flag before
* disabling preemption. When we want to enable preemption we
* check the flag, if it is set, then we call preempt_enable_no_resched.
* Otherwise, we call preempt_enable.
*
- * The rational for doing the above is that if need resched is set
+ * The rational for doing the above is that if need_resched is set
* and we have yet to reschedule, we are either in an atomic location
* (where we do not need to check for scheduling) or we are inside
* the scheduler and do not want to resched.
@@ -642,7 +642,7 @@ static inline int ftrace_preempt_disable(void)
*
* This is a scheduler safe way to enable preemption and not miss
* any preemption checks. The disabled saved the state of preemption.
- * If resched is set, then we were either inside an atomic or
+ * If resched is set, then we are either inside an atomic or
* are inside the scheduler (we would have already scheduled
* otherwise). In this case, we do not want to call normal
* preempt_enable, but preempt_enable_no_resched instead.
--
1.5.6.5
--
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/8] trace: remove deprecated entry->cpu
2009-02-08 5:49 ` [PATCH 1/8] trace: remove deprecated entry->cpu Steven Rostedt
@ 2009-02-08 12:29 ` Frederic Weisbecker
0 siblings, 0 replies; 30+ messages in thread
From: Frederic Weisbecker @ 2009-02-08 12:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Arnaldo Carvalho de Melo, Steven Rostedt
On Sun, Feb 08, 2009 at 12:49:56AM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> Impact: fix to prevent developers from using entry->cpu
>
> With the new ring buffer infrastructure, the cpu for the entry is
> implicit with which CPU buffer it is on.
>
> The original code used to record the current cpu into the generic
> entry header, which could be retrieved by entry->cpu. When the
> ring buffer was introduced, the users were converted to use the
> cpu number of whichever cpu ring buffer was in use (this was passed
> to the tracers by the iterator: iter->cpu).
>
> Unfortunately, the cpu item in the entry structure was never removed.
> This allowed for developers to use it instead of the proper iter->cpu,
> unknowingly, using an uninitialized variable. This was not the fault
> of the developers, since it would seem like the logical place to
> retrieve the cpu identifier.
>
> This patch removes the cpu item from the entry structure and fixes
> all the users that should have been using iter->cpu.
>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Heh, we sent the same patch within a few hours of each other...
Funny :-)
> ---
> kernel/trace/trace.c | 2 +-
> kernel/trace/trace.h | 1 -
> kernel/trace/trace_hw_branches.c | 3 +--
> kernel/trace/trace_output.c | 6 +++---
> 4 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index fd51cf0..bd4d9f8 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1531,7 +1531,7 @@ static enum print_line_t print_bin_fmt(struct trace_iterator *iter)
>
> if (trace_flags & TRACE_ITER_CONTEXT_INFO) {
> SEQ_PUT_FIELD_RET(s, entry->pid);
> - SEQ_PUT_FIELD_RET(s, entry->cpu);
> + SEQ_PUT_FIELD_RET(s, iter->cpu);
> SEQ_PUT_FIELD_RET(s, iter->ts);
> }
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index f0c7a0f..5efc4c7 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -45,7 +45,6 @@ enum trace_type {
> */
> struct trace_entry {
> unsigned char type;
> - unsigned char cpu;
> unsigned char flags;
> unsigned char preempt_count;
> int pid;
> diff --git a/kernel/trace/trace_hw_branches.c b/kernel/trace/trace_hw_branches.c
> index fff3545..549238a 100644
> --- a/kernel/trace/trace_hw_branches.c
> +++ b/kernel/trace/trace_hw_branches.c
> @@ -159,7 +159,7 @@ static enum print_line_t bts_trace_print_line(struct trace_iterator *iter)
> trace_assign_type(it, entry);
>
> if (entry->type == TRACE_HW_BRANCHES) {
> - if (trace_seq_printf(seq, "%4d ", entry->cpu) &&
> + if (trace_seq_printf(seq, "%4d ", iter->cpu) &&
> seq_print_ip_sym(seq, it->to, symflags) &&
> trace_seq_printf(seq, "\t <- ") &&
> seq_print_ip_sym(seq, it->from, symflags) &&
> @@ -195,7 +195,6 @@ void trace_hw_branch(u64 from, u64 to)
> entry = ring_buffer_event_data(event);
> tracing_generic_entry_update(&entry->ent, 0, from);
> entry->ent.type = TRACE_HW_BRANCHES;
> - entry->ent.cpu = cpu;
> entry->from = from;
> entry->to = to;
> ring_buffer_unlock_commit(tr->buffer, event, irq2);
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index b7380ee..463a310 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -333,7 +333,7 @@ int trace_print_context(struct trace_iterator *iter)
> unsigned long secs = (unsigned long)t;
>
> return trace_seq_printf(s, "%16s-%-5d [%03d] %5lu.%06lu: ",
> - comm, entry->pid, entry->cpu, secs, usec_rem);
> + comm, entry->pid, iter->cpu, secs, usec_rem);
> }
>
> int trace_print_lat_context(struct trace_iterator *iter)
> @@ -356,7 +356,7 @@ int trace_print_lat_context(struct trace_iterator *iter)
> char *comm = trace_find_cmdline(entry->pid);
> ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08lx]"
> " %ld.%03ldms (+%ld.%03ldms): ", comm,
> - entry->pid, entry->cpu, entry->flags,
> + entry->pid, iter->cpu, entry->flags,
> entry->preempt_count, iter->idx,
> ns2usecs(iter->ts),
> abs_usecs / USEC_PER_MSEC,
> @@ -364,7 +364,7 @@ int trace_print_lat_context(struct trace_iterator *iter)
> rel_usecs / USEC_PER_MSEC,
> rel_usecs % USEC_PER_MSEC);
> } else {
> - ret = lat_print_generic(s, entry, entry->cpu);
> + ret = lat_print_generic(s, entry, iter->cpu);
> if (ret)
> ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
> }
> --
> 1.5.6.5
>
> --
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
` (7 preceding siblings ...)
2009-02-08 5:50 ` [PATCH 8/8] trace: trivial fixes in comment typos Steven Rostedt
@ 2009-02-09 9:37 ` Ingo Molnar
2009-02-11 15:36 ` Ingo Molnar
8 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-02-09 9:37 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Ingo,
>
> I rebased the previous changes onto tip/tracing/core. The first patch
> is new, which is the elimination of the entry->cpu.
>
> I did not remove it when it became obsolete, and thus we had
> other developers using it, thinking that it was the way to
> get the current cpu number. Unfortunately, the entry->cpu was
> just taking up space, and never was initialized.
>
> The rest of the patches, I've posted before, but this time I ported
> them over to tip/tracing/core.
>
>
> The following patches are in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
>
> branch: tip/tracing/core/devel
>
>
> Steven Rostedt (7):
> trace: remove deprecated entry->cpu
> ring-buffer: add NMI protection for spinlocks
> ring-buffer: allow tracing_off to be used in core kernel code
> ftrace, x86: rename in_nmi variable
> nmi: add generic nmi tracking state
> ftrace: change function graph tracer to use new in_nmi
> ring-buffer: use generic version of in_nmi
>
> Wenji Huang (1):
> trace: trivial fixes in comment typos.
>
> ----
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/ftrace.c | 35 ++++++++---------------------------
> include/linux/ftrace.h | 2 +-
> include/linux/ftrace_irq.h | 2 +-
> include/linux/hardirq.h | 15 +++++++++++++++
> include/linux/ring_buffer.h | 9 +++++++++
> kernel/trace/Kconfig | 8 ++++++++
> kernel/trace/ftrace.c | 6 +++---
> kernel/trace/ring_buffer.c | 31 +++++++++++++++++++++++++++++--
> kernel/trace/trace.c | 2 +-
> kernel/trace/trace.h | 7 +++----
> kernel/trace/trace_hw_branches.c | 3 +--
> kernel/trace/trace_output.c | 6 +++---
> 13 files changed, 83 insertions(+), 44 deletions(-)
Pulled into tip/tracing/ftrace, thanks Steve!
There was a conflict in kernel/trace/trace_hw_branches.c - i resolved it,
please double-check the end result.
One small detail, for future pull requests to me, could you please use the
customary "tracing: " tags for generic commits, instead the "trace: " tag
you started using recently? Thanks,
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-09 9:37 ` [PATCH 0/8] git pull request for tip/tracing/core Ingo Molnar
@ 2009-02-11 15:36 ` Ingo Molnar
2009-02-11 15:46 ` Steven Rostedt
2009-02-11 16:49 ` Steven Rostedt
0 siblings, 2 replies; 30+ messages in thread
From: Ingo Molnar @ 2009-02-11 15:36 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
> > Steven Rostedt (7):
> > ring-buffer: add NMI protection for spinlocks
> > nmi: add generic nmi tracking state
hm, this commit breaks a number of non-x86 architectures the following way:
In file included from /home/mingo/tip/include/linux/interrupt.h:12,
from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
/home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
Alpha and ARM are broken too. (and possibly others)
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 15:36 ` Ingo Molnar
@ 2009-02-11 15:46 ` Steven Rostedt
2009-02-11 16:25 ` Ingo Molnar
2009-02-11 16:49 ` Steven Rostedt
1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 15:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
On Wed, 11 Feb 2009, Ingo Molnar wrote:
>
> > > Steven Rostedt (7):
> > > ring-buffer: add NMI protection for spinlocks
> > > nmi: add generic nmi tracking state
>
> hm, this commit breaks a number of non-x86 architectures the following way:
>
> In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
>
> Alpha and ARM are broken too. (and possibly others)
I'm glad we had that check.
OK, revert it and I'll take a look to see what other options we have.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 15:46 ` Steven Rostedt
@ 2009-02-11 16:25 ` Ingo Molnar
2009-02-11 16:33 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-02-11 16:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 11 Feb 2009, Ingo Molnar wrote:
>
> >
> > > > Steven Rostedt (7):
> > > > ring-buffer: add NMI protection for spinlocks
> > > > nmi: add generic nmi tracking state
> >
> > hm, this commit breaks a number of non-x86 architectures the following way:
> >
> > In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> > from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> > from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> > /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> > make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
> >
> > Alpha and ARM are broken too. (and possibly others)
>
> I'm glad we had that check.
>
> OK, revert it and I'll take a look to see what other options we have.
Reverting it is neither simple nor desired: 1) it's now embedded in other
tracing commits 2) my test-boxes stopped doing those rare lockups with the
NMI fix in place ...
Any better approach to fix those platforms? I reported this ftrace NMI problem
to you about a year ago ...
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 16:25 ` Ingo Molnar
@ 2009-02-11 16:33 ` Steven Rostedt
0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 16:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
On Wed, 11 Feb 2009, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > On Wed, 11 Feb 2009, Ingo Molnar wrote:
> >
> > >
> > > > > Steven Rostedt (7):
> > > > > ring-buffer: add NMI protection for spinlocks
> > > > > nmi: add generic nmi tracking state
> > >
> > > hm, this commit breaks a number of non-x86 architectures the following way:
> > >
> > > In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> > > from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> > > from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> > > /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> > > make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
> > >
> > > Alpha and ARM are broken too. (and possibly others)
> >
> > I'm glad we had that check.
> >
> > OK, revert it and I'll take a look to see what other options we have.
>
> Reverting it is neither simple nor desired: 1) it's now embedded in other
> tracing commits 2) my test-boxes stopped doing those rare lockups with the
> NMI fix in place ...
OK, I was thinking about coming up with another fix.
I'll have to look at the other archs and see what we can do to make this
work for all.
>
> Any better approach to fix those platforms? I reported this ftrace NMI problem
> to you about a year ago ...
Yeah, and I just found the bug ;-)
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 15:36 ` Ingo Molnar
2009-02-11 15:46 ` Steven Rostedt
@ 2009-02-11 16:49 ` Steven Rostedt
2009-02-11 16:59 ` Steven Rostedt
1 sibling, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 16:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
On Wed, 11 Feb 2009, Ingo Molnar wrote:
>
> > > Steven Rostedt (7):
> > > ring-buffer: add NMI protection for spinlocks
> > > nmi: add generic nmi tracking state
>
> hm, this commit breaks a number of non-x86 architectures the following way:
>
> In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
>
> Alpha and ARM are broken too. (and possibly others)
Looking at this, it doesn't seem like it should be a problem. Perhaps the
CPP is treating the number as signed. Does this patch fix the compile?
-- Steve
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index f3cf86e..9bd4302 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -63,7 +63,7 @@
#define NMI_OFFSET (PREEMPT_ACTIVE << 1)
-#if NMI_OFFSET >= 0x80000000
+#if NMI_OFFSET >= 0x80000000UL
#error PREEMPT_ACTIVE too high!
#endif
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 16:49 ` Steven Rostedt
@ 2009-02-11 16:59 ` Steven Rostedt
2009-02-11 17:16 ` Ingo Molnar
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 16:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
On Wed, 11 Feb 2009, Steven Rostedt wrote:
>
> On Wed, 11 Feb 2009, Ingo Molnar wrote:
>
> >
> > > > Steven Rostedt (7):
> > > > ring-buffer: add NMI protection for spinlocks
> > > > nmi: add generic nmi tracking state
> >
> > hm, this commit breaks a number of non-x86 architectures the following way:
> >
> > In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> > from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> > from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> > /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> > make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
I just looked at alpha and arm, I'm now looking at ia64. And this could
be an issue, since it has 14 bits for hard irqs.
What would be the impact to make the preempt count a long?
-- Steve
> >
> > Alpha and ARM are broken too. (and possibly others)
>
> Looking at this, it doesn't seem like it should be a problem. Perhaps the
> CPP is treating the number as signed. Does this patch fix the compile?
>
> -- Steve
>
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
>
> diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> index f3cf86e..9bd4302 100644
> --- a/include/linux/hardirq.h
> +++ b/include/linux/hardirq.h
> @@ -63,7 +63,7 @@
>
> #define NMI_OFFSET (PREEMPT_ACTIVE << 1)
>
> -#if NMI_OFFSET >= 0x80000000
> +#if NMI_OFFSET >= 0x80000000UL
> #error PREEMPT_ACTIVE too high!
> #endif
>
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 16:59 ` Steven Rostedt
@ 2009-02-11 17:16 ` Ingo Molnar
2009-02-11 17:30 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-02-11 17:16 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 11 Feb 2009, Steven Rostedt wrote:
>
> >
> > On Wed, 11 Feb 2009, Ingo Molnar wrote:
> >
> > >
> > > > > Steven Rostedt (7):
> > > > > ring-buffer: add NMI protection for spinlocks
> > > > > nmi: add generic nmi tracking state
> > >
> > > hm, this commit breaks a number of non-x86 architectures the following way:
> > >
> > > In file included from /home/mingo/tip/include/linux/interrupt.h:12,
> > > from /home/mingo/tip/arch/ia64/include/asm/mca.h:16,
> > > from /home/mingo/tip/arch/ia64/kernel/asm-offsets.c:17:
> > > /home/mingo/tip/include/linux/hardirq.h:67:2: #error PREEMPT_ACTIVE too high!
> > > make[2]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
>
> I just looked at alpha and arm, I'm now looking at ia64. And this could
> be an issue, since it has 14 bits for hard irqs.
>
> What would be the impact to make the preempt count a long?
Assembly code has to be audited i guess, whether it's treated as an int anywhere.
Should work i guess.
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:16 ` Ingo Molnar
@ 2009-02-11 17:30 ` Steven Rostedt
2009-02-11 17:31 ` Ingo Molnar
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 17:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
On Wed, 11 Feb 2009, Ingo Molnar wrote:
> >
> > I just looked at alpha and arm, I'm now looking at ia64. And this could
> > be an issue, since it has 14 bits for hard irqs.
> >
> > What would be the impact to make the preempt count a long?
>
> Assembly code has to be audited i guess, whether it's treated as an int anywhere.
> Should work i guess.
Before we go and make the change, Peter brought up a good point on IRC. Is
there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
Perhaps the better solution would be (if possible) to simply lower the
number of bits.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:30 ` Steven Rostedt
@ 2009-02-11 17:31 ` Ingo Molnar
2009-02-11 17:57 ` Luck, Tony
0 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2009-02-11 17:31 UTC (permalink / raw)
To: Steven Rostedt, Tony Luck, Mike Travis
Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 11 Feb 2009, Ingo Molnar wrote:
> > >
> > > I just looked at alpha and arm, I'm now looking at ia64. And this could
> > > be an issue, since it has 14 bits for hard irqs.
> > >
> > > What would be the impact to make the preempt count a long?
> >
> > Assembly code has to be audited i guess, whether it's treated as an int anywhere.
> > Should work i guess.
>
> Before we go and make the change, Peter brought up a good point on IRC. Is
> there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
>
> Perhaps the better solution would be (if possible) to simply lower the
> number of bits.
i'm the wrong person to be asked about that. (Cc:-ed the right people)
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:31 ` Ingo Molnar
@ 2009-02-11 17:57 ` Luck, Tony
2009-02-11 18:23 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Luck, Tony @ 2009-02-11 17:57 UTC (permalink / raw)
To: Ingo Molnar, Steven Rostedt, Mike Travis
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com, kaneshige.kenji@jp.fujitsu.com
> > Before we go and make the change, Peter brought up a good point on IRC. Is
> > there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
> >
> > Perhaps the better solution would be (if possible) to simply lower the
> > number of bits.
>
> i'm the wrong person to be asked about that. (Cc:-ed the right people)
People build some pretty big systems on ia64. SGI's largest has 4096
cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
many to me.
Fujitsu added the vector domain support for ia64 to get around the shortage
of IRQs for large machines. Added them to the Cc: list to see if they have
comments on how many IRQs are needed.
-Tony
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:57 ` Luck, Tony
@ 2009-02-11 18:23 ` Steven Rostedt
2009-02-11 18:34 ` Luck, Tony
2009-02-11 20:39 ` Jack Steiner
2009-02-12 2:39 ` Kenji Kaneshige
2 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 18:23 UTC (permalink / raw)
To: Luck, Tony
Cc: Ingo Molnar, Mike Travis, linux-kernel@vger.kernel.org,
Thomas Gleixner, Peter Zijlstra, Arnaldo Carvalho de Melo,
Frederic Weisbecker, isimatu.yasuaki@jp.fujitsu.com,
kaneshige.kenji@jp.fujitsu.com
On Wed, 11 Feb 2009, Luck, Tony wrote:
> > > Before we go and make the change, Peter brought up a good point on IRC. Is
> > > there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
> > >
> > > Perhaps the better solution would be (if possible) to simply lower the
> > > number of bits.
> >
> > i'm the wrong person to be asked about that. (Cc:-ed the right people)
>
> People build some pretty big systems on ia64. SGI's largest has 4096
> cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
> many to me.
>
> Fujitsu added the vector domain support for ia64 to get around the shortage
> of IRQs for large machines. Added them to the Cc: list to see if they have
> comments on how many IRQs are needed.
The bits in question are really the number of possible nested interrupts
that can happen. We take the paranoid approach that we can have a max
nesting of NR_IRQS. Perhaps this can be changed, and just do a max of
1<<10 nesting? And have a big warn on if it happens to be bigger, or fall
back to another counter if it is bigger.
1000 nested IRQs seems a bit extreme :-/
Just throwing out some ideas.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 18:23 ` Steven Rostedt
@ 2009-02-11 18:34 ` Luck, Tony
2009-02-11 18:42 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Luck, Tony @ 2009-02-11 18:34 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Mike Travis, linux-kernel@vger.kernel.org,
Thomas Gleixner, Peter Zijlstra, Arnaldo Carvalho de Melo,
Frederic Weisbecker, isimatu.yasuaki@jp.fujitsu.com,
kaneshige.kenji@jp.fujitsu.com
> The bits in question is really the number of possible nested interrupts
> that can happen. We take the paranoid approach that we can have a max
> nesting of NR_IRQS. Perhaps this can be changed, and just do a max of
> 1<<10 nesting? And have a big warn on if it happens to be bigger, or fall
> to another counter if it is bigger.
>
> 1000 nested IRQs seems a bit extreme :-/
Ah, I see. Then the answer is very different. The number of nested
interrupts possible on a cpu is limited by the number of priority
classes for interrupts (See Table 5-8 on page 2:112 of the Itanium
software developers manual). Effectively the max nesting depth is
16.
1000 nested interrupts would be certain to run us out of stack.
-Tony
^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 18:34 ` Luck, Tony
@ 2009-02-11 18:42 ` Steven Rostedt
2009-02-11 20:20 ` Ingo Molnar
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-11 18:42 UTC (permalink / raw)
To: Luck, Tony
Cc: Ingo Molnar, Mike Travis, linux-kernel@vger.kernel.org,
Thomas Gleixner, Peter Zijlstra, Arnaldo Carvalho de Melo,
Frederic Weisbecker, isimatu.yasuaki@jp.fujitsu.com,
kaneshige.kenji@jp.fujitsu.com
On Wed, 11 Feb 2009, Luck, Tony wrote:
> > The bits in question is really the number of possible nested interrupts
> > that can happen. We take the paranoid approach that we can have a max
> > nesting of NR_IRQS. Perhaps this can be changed, and just do a max of
> > 1<<10 nesting? And have a big warn on if it happens to be bigger, or fall
> > to another counter if it is bigger.
> >
> > 1000 nested IRQs seems a bit extreme :-/
>
> Ah, I see. Then the answer is very different. The number of nested
> interrupts possible on a cpu is limited by the number of priority
> classes for interrupts (See Table 5-8 on page 2:112 of the Itanium
> software developers manual). Effectively the max nesting depth is
> 16.
>
> 1000 nested interrupts would be certain to run us out of stack.
Ingo,
Do you think this is a good assumption to make, that no arch will have
over 1<<10 nested interrupts? We can add a WARN_ON if it happens. Then we
can make all archs have a 10 bit offset. Smaller may also be sufficient.
-- Steve
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 18:42 ` Steven Rostedt
@ 2009-02-11 20:20 ` Ingo Molnar
0 siblings, 0 replies; 30+ messages in thread
From: Ingo Molnar @ 2009-02-11 20:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: Luck, Tony, Mike Travis, linux-kernel@vger.kernel.org,
Thomas Gleixner, Peter Zijlstra, Arnaldo Carvalho de Melo,
Frederic Weisbecker, isimatu.yasuaki@jp.fujitsu.com,
kaneshige.kenji@jp.fujitsu.com
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 11 Feb 2009, Luck, Tony wrote:
>
> > > The bits in question is really the number of possible nested interrupts
> > > that can happen. We take the paranoid approach that we can have a max
> > > nesting of NR_IRQS. Perhaps this can be changed, and just do a max of
> > > 1<<10 nesting? And have a big warn on if it happens to be bigger, or fall
> > > to another counter if it is bigger.
> > >
> > > 1000 nested IRQs seems a bit extreme :-/
> >
> > Ah, I see. Then the answer is very different. The number of nested
> > interrupts possible on a cpu is limited by the number of priority
> > classes for interrupts (See Table 5-8 on page 2:112 of the Itanium
> > software developers manual). Effectively the max nesting depth is
> > 16.
> >
> > 1000 nested interrupts would be certain to run us out of stack.
>
> Ingo,
>
> Do you think this is a good assumption to make, that no arch will have
> over 1<<10 nested interrupts. We can add a WARN_ON if it happens. Then we
> can make all archs have a 10 bit offset. Smaller may also be sufficient.
That's a fair assumption, yes. No need for a WARN_ON() (it slows down a critical
path) - things will get very colorful much sooner than that, due to kernel stack
overflow.
Ingo
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:57 ` Luck, Tony
2009-02-11 18:23 ` Steven Rostedt
@ 2009-02-11 20:39 ` Jack Steiner
2009-02-12 2:39 ` Kenji Kaneshige
2 siblings, 0 replies; 30+ messages in thread
From: Jack Steiner @ 2009-02-11 20:39 UTC (permalink / raw)
To: Luck, Tony
Cc: Ingo Molnar, Steven Rostedt, Mike Travis,
linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com, kaneshige.kenji@jp.fujitsu.com
On Wed, Feb 11, 2009 at 09:57:00AM -0800, Luck, Tony wrote:
> > > Before we go and make the change, Peter brought up a good point on IRC. Is
> > > there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
> > >
> > > Perhaps the better solution would be (if possible) to simply lower the
> > > number of bits.
> >
> > i'm the wrong person to be asked about that. (Cc:-ed the right people)
>
> People build some pretty big systems on ia64. SGI's largest has 4096
> cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
> many to me.
As long as per-cpu interrupts (e.g., timer, cpe_hndlr, cmc_hndlr, etc.) are not counted
in the IRQ count, SGI is ok with 256 IRQs. We had problems overflowing the old
limit of 256 but have worked around it.
If per-cpu interrupts ARE counted in the total IRQs, then we currently use 5
per-cpu interrupts for each cpu. A 4k cpu config will need 20k IRQs + additional
IRQs for real devices.
I know the IRQ infrastructure has changed recently so this may no longer apply.
--- jack
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-11 17:57 ` Luck, Tony
2009-02-11 18:23 ` Steven Rostedt
2009-02-11 20:39 ` Jack Steiner
@ 2009-02-12 2:39 ` Kenji Kaneshige
2009-02-12 2:43 ` Steven Rostedt
2 siblings, 1 reply; 30+ messages in thread
From: Kenji Kaneshige @ 2009-02-12 2:39 UTC (permalink / raw)
To: Luck, Tony
Cc: Ingo Molnar, Steven Rostedt, Mike Travis,
linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com
Luck, Tony wrote:
>>> Before we go and make the change, Peter brought up a good point on IRC. Is
>>> there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
>>>
>>> Perhaps the better solution would be (if possible), to simply lower the
>>> number of bits.
>> i'm the wrong person to be asked about that. (Cc:-ed the right people)
>
> People build some pretty big systems on ia64. SGI's largest has 4096
> cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
> many to me.
>
> Fujitsu added the vector domain support for ia64 to get around the shortage
> of IRQs for large machines. Added them to the Cc: list to see if they have
> comments on how many IRQs are needed.
>
The 1024 IRQs are enough for GSIs on our maximum configuration. But
if the devices are MSI/MSI-X capable, it could be more than 1024.
Thanks,
Kenji Kaneshige
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-12 2:39 ` Kenji Kaneshige
@ 2009-02-12 2:43 ` Steven Rostedt
2009-02-12 3:15 ` Kenji Kaneshige
0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2009-02-12 2:43 UTC (permalink / raw)
To: Kenji Kaneshige
Cc: Luck, Tony, Ingo Molnar, Mike Travis,
linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com
On Thu, 12 Feb 2009, Kenji Kaneshige wrote:
> Luck, Tony wrote:
> >>> Before we go and make the change, Peter brought up a good point on IRC. Is
> >>> there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
> >>>
> >>> Perhaps the better solution would be (if possible), to simply lower the
> >>> number of bits.
> >> i'm the wrong person to be asked about that. (Cc:-ed the right people)
> >
> > People build some pretty big systems on ia64. SGI's largest has 4096
> > cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
> > many to me.
> >
> > Fujitsu added the vector domain support for ia64 to get around the shortage
> > of IRQs for large machines. Added them to the Cc: list to see if they have
> > comments on how many IRQs are needed.
> >
>
> The 1024 IRQs are enough for GSIs on our maximum configuration. But
> if the devices are MSI/MSI-X capable, it could be more than 1024.
But you would never expect more than 1024 nested interrupts all on the
same CPU? That is, to have over 1024 interrupts interrupting each other?
-- Steve
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-12 2:43 ` Steven Rostedt
@ 2009-02-12 3:15 ` Kenji Kaneshige
2009-02-12 3:22 ` Steven Rostedt
0 siblings, 1 reply; 30+ messages in thread
From: Kenji Kaneshige @ 2009-02-12 3:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: Luck, Tony, Ingo Molnar, Mike Travis,
linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com
Steven Rostedt wrote:
> On Thu, 12 Feb 2009, Kenji Kaneshige wrote:
>> Luck, Tony wrote:
>>>>> Before we go and make the change, Peter brought up a good point on IRC. Is
>>>>> there any reason that ia64 needs 1 << 14 IRQs? That's 16384!
>>>>>
>>>>> Perhaps the better solution would be (if possible), to simply lower the
>>>>> number of bits.
>>>> i'm the wrong person to be asked about that. (Cc:-ed the right people)
>>> People build some pretty big systems on ia64. SGI's largest has 4096
>>> cpus ... so 16384 IRQs is only 4 per cpu. That doesn't sound like very
>>> many to me.
>>>
>>> Fujitsu added the vector domain support for ia64 to get around the shortage
>>> of IRQs for large machines. Added them to the Cc: list to see if they have
>>> comments on how many IRQs are needed.
>>>
>> The 1024 IRQs are enough for GSIs on our maximum configuration. But
>> if the devices are MSI/MSI-X capable, it could be more than 1024.
>
> But you would never expect more than 1024 nested interrupts all on the
> same CPU? That is, to have over 1024 interrupts interrupting each other?
>
The Itanium processor has 240 vectors for external interrupts and
a mechanism to classify them into 16 priority classes. The
ia64 Linux kernel limits the maximum nested-interrupt depth to 16 using
this mechanism.
Thanks,
Kenji Kaneshige
* Re: [PATCH 0/8] git pull request for tip/tracing/core
2009-02-12 3:15 ` Kenji Kaneshige
@ 2009-02-12 3:22 ` Steven Rostedt
0 siblings, 0 replies; 30+ messages in thread
From: Steven Rostedt @ 2009-02-12 3:22 UTC (permalink / raw)
To: Kenji Kaneshige
Cc: Luck, Tony, Ingo Molnar, Mike Travis,
linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra,
Arnaldo Carvalho de Melo, Frederic Weisbecker,
isimatu.yasuaki@jp.fujitsu.com
On Thu, 12 Feb 2009, Kenji Kaneshige wrote:
>
> The Itanium processor has 240 vectors for external interrupts and
> a mechanism to classify them into 16 priority classes. The
> ia64 Linux kernel limits the maximum nested-interrupt depth to 16 using
> this mechanism.
Then 1000 should be more than enough ;-)
Thanks for the info,
-- Steve
Thread overview: 30+ messages
-- links below jump to the message on this page --
2009-02-08 5:49 [PATCH 0/8] git pull request for tip/tracing/core Steven Rostedt
2009-02-08 5:49 ` [PATCH 1/8] trace: remove deprecated entry->cpu Steven Rostedt
2009-02-08 12:29 ` Frederic Weisbecker
2009-02-08 5:49 ` [PATCH 2/8] ring-buffer: add NMI protection for spinlocks Steven Rostedt
2009-02-08 5:49 ` [PATCH 3/8] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
2009-02-08 5:49 ` [PATCH 4/8] ftrace, x86: rename in_nmi variable Steven Rostedt
2009-02-08 5:50 ` [PATCH 5/8] nmi: add generic nmi tracking state Steven Rostedt
2009-02-08 5:50 ` [PATCH 6/8] ftrace: change function graph tracer to use new in_nmi Steven Rostedt
2009-02-08 5:50 ` [PATCH 7/8] ring-buffer: use generic version of in_nmi Steven Rostedt
2009-02-08 5:50 ` [PATCH 8/8] trace: trivial fixes in comment typos Steven Rostedt
2009-02-09 9:37 ` [PATCH 0/8] git pull request for tip/tracing/core Ingo Molnar
2009-02-11 15:36 ` Ingo Molnar
2009-02-11 15:46 ` Steven Rostedt
2009-02-11 16:25 ` Ingo Molnar
2009-02-11 16:33 ` Steven Rostedt
2009-02-11 16:49 ` Steven Rostedt
2009-02-11 16:59 ` Steven Rostedt
2009-02-11 17:16 ` Ingo Molnar
2009-02-11 17:30 ` Steven Rostedt
2009-02-11 17:31 ` Ingo Molnar
2009-02-11 17:57 ` Luck, Tony
2009-02-11 18:23 ` Steven Rostedt
2009-02-11 18:34 ` Luck, Tony
2009-02-11 18:42 ` Steven Rostedt
2009-02-11 20:20 ` Ingo Molnar
2009-02-11 20:39 ` Jack Steiner
2009-02-12 2:39 ` Kenji Kaneshige
2009-02-12 2:43 ` Steven Rostedt
2009-02-12 3:15 ` Kenji Kaneshige
2009-02-12 3:22 ` Steven Rostedt