public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] ftrace: updates for tip
@ 2009-01-22 20:20 Steven Rostedt
  2009-01-23 10:09 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2009-01-22 20:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker

Ingo,

The following patches are in:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

    branch: tip/devel


Steven Rostedt (2):
      trace: remove internal irqsoff disabling for trace output
      trace: fix logic to start/stop counting

----
 kernel/trace/trace.c         |   13 ++++++-------
 kernel/trace/trace_irqsoff.c |   34 ++--------------------------------
 2 files changed, 8 insertions(+), 39 deletions(-)
-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/2] ftrace: updates for tip
  2009-01-22 20:20 [PATCH 0/2] ftrace: updates for tip Steven Rostedt
@ 2009-01-23 10:09 ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2009-01-23 10:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Andrew Morton, Frederic Weisbecker


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> The following patches are in:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> 
>     branch: tip/devel
> 
> 
> Steven Rostedt (2):
>       trace: remove internal irqsoff disabling for trace output
>       trace: fix logic to start/stop counting
> 
> ----
>  kernel/trace/trace.c         |   13 ++++++-------
>  kernel/trace/trace_irqsoff.c |   34 ++--------------------------------
>  2 files changed, 8 insertions(+), 39 deletions(-)

Pulled into tip/tracing/ftrace, thanks Steve!

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 0/2] ftrace: updates for tip
@ 2009-02-06  1:13 Steven Rostedt
  2009-02-06  1:13 ` [PATCH 1/2] ring-buffer: add NMI protection for spinlocks Steven Rostedt
  2009-02-06  1:13 ` [PATCH 2/2] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Steven Rostedt @ 2009-02-06  1:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Arnaldo Carvalho de Melo


Ingo,

The following patches are in:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

    branch: tip/devel


Steven Rostedt (2):
      ring-buffer: add NMI protection for spinlocks
      ring-buffer: allow tracing_off to be used in core kernel code

----
 arch/x86/Kconfig            |    1 +
 arch/x86/kernel/ftrace.c    |    8 +++---
 include/linux/ftrace_irq.h  |   10 ++++++++-
 include/linux/ring_buffer.h |    9 ++++++++
 kernel/trace/Kconfig        |    8 +++++++
 kernel/trace/ring_buffer.c  |   48 +++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 77 insertions(+), 7 deletions(-)

-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] ring-buffer: add NMI protection for spinlocks
  2009-02-06  1:13 [PATCH 0/2] ftrace: updates for tip Steven Rostedt
@ 2009-02-06  1:13 ` Steven Rostedt
  2009-02-06  1:53   ` Andrew Morton
  2009-02-06  1:13 ` [PATCH 2/2] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2009-02-06  1:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Steven Rostedt

[-- Attachment #1: 0001-ring-buffer-add-NMI-protection-for-spinlocks.patch --]
[-- Type: text/plain, Size: 7067 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Impact: prevent deadlock in NMI

The ring buffers are not yet totally lockless with writing to
the buffer. When a writer crosses a page, it grabs a per cpu spinlock
to protect against a reader. The spinlocks taken by a writer are not
to protect against other writers, since a writer can only write to
its own per cpu buffer. The spinlocks protect against readers that
can touch any cpu buffer. The writers are made safe against
reentrancy by having the spinlocks disable interrupts.

The problem arises when an NMI writes to the buffer, and that write
crosses a page boundary. If it grabs a spinlock, it can be racing
with another writer (since disabling interrupts does not protect
against NMIs) or with a reader on the same CPU. Luckily, most of the
users are not reentrant and so protect against this issue. But if a
user of the ring buffer becomes reentrant (which the ring buffers do
allow) and the NMI also writes to the ring buffer, then we risk a
deadlock.

This patch moves the ftrace_nmi_enter called by nmi_enter() into the
ring buffer code. It renames the ftrace_nmi_enter used by the arch
specific code to arch_ftrace_nmi_enter and updates the Kconfig to
handle it.

When an NMI triggers, it will set a per cpu variable in the ring buffer
code and will clear it when the NMI exits. If a write to the ring buffer
crosses a page boundary inside an NMI, a trylock is used on the spin
lock instead. If the trylock fails, the entry is discarded.

This bug appeared in the ftrace work in the RT tree, where event tracing
is reentrant. This workaround solved the deadlocks that appeared there.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 arch/x86/Kconfig           |    1 +
 arch/x86/kernel/ftrace.c   |    8 +++---
 include/linux/ftrace_irq.h |   10 ++++++++-
 kernel/trace/Kconfig       |    8 +++++++
 kernel/trace/ring_buffer.c |   48 ++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4277640..3da7121 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -34,6 +34,7 @@ config X86
 	select HAVE_FUNCTION_TRACER
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
+	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || FUNCTION_GRAPH_TRACER
 	select HAVE_KVM
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 4d33224..4c68358 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
 					     MCOUNT_INSN_SIZE);
 }
 
-void ftrace_nmi_enter(void)
+void arch_ftrace_nmi_enter(void)
 {
 	atomic_inc(&in_nmi);
 	/* Must have in_nmi seen before reading write flag */
@@ -124,7 +124,7 @@ void ftrace_nmi_enter(void)
 	}
 }
 
-void ftrace_nmi_exit(void)
+void arch_ftrace_nmi_exit(void)
 {
 	/* Finish all executions before clearing in_nmi */
 	smp_wmb();
@@ -376,12 +376,12 @@ int ftrace_disable_ftrace_graph_caller(void)
  */
 static atomic_t in_nmi;
 
-void ftrace_nmi_enter(void)
+void arch_ftrace_nmi_enter(void)
 {
 	atomic_inc(&in_nmi);
 }
 
-void ftrace_nmi_exit(void)
+void arch_ftrace_nmi_exit(void)
 {
 	atomic_dec(&in_nmi);
 }
diff --git a/include/linux/ftrace_irq.h b/include/linux/ftrace_irq.h
index 366a054..29de677 100644
--- a/include/linux/ftrace_irq.h
+++ b/include/linux/ftrace_irq.h
@@ -2,7 +2,15 @@
 #define _LINUX_FTRACE_IRQ_H
 
 
-#if defined(CONFIG_DYNAMIC_FTRACE) || defined(CONFIG_FUNCTION_GRAPH_TRACER)
+#ifdef CONFIG_FTRACE_NMI_ENTER
+extern void arch_ftrace_nmi_enter(void);
+extern void arch_ftrace_nmi_exit(void);
+#else
+static inline void arch_ftrace_nmi_enter(void) { }
+static inline void arch_ftrace_nmi_exit(void) { }
+#endif
+
+#ifdef CONFIG_RING_BUFFER
 extern void ftrace_nmi_enter(void);
 extern void ftrace_nmi_exit(void);
 #else
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 28f2644..25131a5 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -9,6 +9,9 @@ config USER_STACKTRACE_SUPPORT
 config NOP_TRACER
 	bool
 
+config HAVE_FTRACE_NMI_ENTER
+	bool
+
 config HAVE_FUNCTION_TRACER
 	bool
 
@@ -37,6 +40,11 @@ config TRACER_MAX_TRACE
 config RING_BUFFER
 	bool
 
+config FTRACE_NMI_ENTER
+       bool
+       depends on HAVE_FTRACE_NMI_ENTER
+       default y
+
 config TRACING
 	bool
 	select DEBUG_FS
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b36d737..a60a6a8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com>
  */
 #include <linux/ring_buffer.h>
+#include <linux/ftrace_irq.h>
 #include <linux/spinlock.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
@@ -19,6 +20,35 @@
 #include "trace.h"
 
 /*
+ * Since the write to the buffer is still not fully lockless,
+ * we must be careful with NMIs. The locks in the writers
+ * are taken when a write crosses to a new page. The locks
+ * protect against races with the readers (this will soon
+ * be fixed with a lockless solution).
+ *
+ * Because we can not protect against NMIs, and we want to
+ * keep traces reentrant, we need to manage what happens
+ * when we are in an NMI.
+ */
+static DEFINE_PER_CPU(int, rb_in_nmi);
+
+void ftrace_nmi_enter(void)
+{
+	__get_cpu_var(rb_in_nmi)++;
+	/* call arch specific handler too */
+	arch_ftrace_nmi_enter();
+}
+
+void ftrace_nmi_exit(void)
+{
+	arch_ftrace_nmi_exit();
+	__get_cpu_var(rb_in_nmi)--;
+	/* NMIs are not recursive */
+	WARN_ON_ONCE(__get_cpu_var(rb_in_nmi));
+}
+
+
+/*
  * A fast way to enable or disable all ring buffers is to
  * call tracing_on or tracing_off. Turning off the ring buffers
  * prevents all ring buffers from being recorded to.
@@ -982,6 +1012,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	struct ring_buffer *buffer = cpu_buffer->buffer;
 	struct ring_buffer_event *event;
 	unsigned long flags;
+	bool lock_taken = false;
 
 	commit_page = cpu_buffer->commit_page;
 	/* we just need to protect against interrupts */
@@ -995,7 +1026,19 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 		struct buffer_page *next_page = tail_page;
 
 		local_irq_save(flags);
-		__raw_spin_lock(&cpu_buffer->lock);
+		/*
+		 * NMIs can happen after we take the lock.
+		 * If we are in an NMI, only take the lock
+		 * if it is not already taken. Otherwise
+		 * simply fail.
+		 */
+		if (unlikely(__get_cpu_var(rb_in_nmi))) {
+			if (!__raw_spin_trylock(&cpu_buffer->lock))
+				goto out_unlock;
+		} else
+			__raw_spin_lock(&cpu_buffer->lock);
+
+		lock_taken = true;
 
 		rb_inc_page(cpu_buffer, &next_page);
 
@@ -1097,7 +1140,8 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
 	if (tail <= BUF_PAGE_SIZE)
 		local_set(&tail_page->write, tail);
 
-	__raw_spin_unlock(&cpu_buffer->lock);
+	if (likely(lock_taken))
+		__raw_spin_unlock(&cpu_buffer->lock);
 	local_irq_restore(flags);
 	return NULL;
 }
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] ring-buffer: allow tracing_off to be used in core kernel code
  2009-02-06  1:13 [PATCH 0/2] ftrace: updates for tip Steven Rostedt
  2009-02-06  1:13 ` [PATCH 1/2] ring-buffer: add NMI protection for spinlocks Steven Rostedt
@ 2009-02-06  1:13 ` Steven Rostedt
  1 sibling, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2009-02-06  1:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Steven Rostedt

[-- Attachment #1: 0002-ring-buffer-allow-tracing_off-to-be-used-in-core-ke.patch --]
[-- Type: text/plain, Size: 1337 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

tracing_off() is the fastest way to stop recording to the ring buffers.
This may be used in places like panic and die, just before the
ftrace_dump is called.

This patch adds the appropriate CPP conditionals to make it a stub
function when the ring buffer is not configured in.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
 include/linux/ring_buffer.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b3b3596..ac94c06 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -124,9 +124,18 @@ unsigned long ring_buffer_overrun_cpu(struct ring_buffer *buffer, int cpu);
 u64 ring_buffer_time_stamp(int cpu);
 void ring_buffer_normalize_time_stamp(int cpu, u64 *ts);
 
+/*
+ * The below functions are fine to use outside the tracing facility.
+ */
+#ifdef CONFIG_RING_BUFFER
 void tracing_on(void);
 void tracing_off(void);
 void tracing_off_permanent(void);
+#else
+static inline void tracing_on(void) { }
+static inline void tracing_off(void) { }
+static inline void tracing_off_permanent(void) { }
+#endif
 
 void *ring_buffer_alloc_read_page(struct ring_buffer *buffer);
 void ring_buffer_free_read_page(struct ring_buffer *buffer, void *data);
-- 
1.5.6.5

-- 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ring-buffer: add NMI protection for spinlocks
  2009-02-06  1:13 ` [PATCH 1/2] ring-buffer: add NMI protection for spinlocks Steven Rostedt
@ 2009-02-06  1:53   ` Andrew Morton
  2009-02-06  2:25     ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2009-02-06  1:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Steven Rostedt

On Thu, 05 Feb 2009 20:13:16 -0500 Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <srostedt@redhat.com>
> 
> Impact: prevent deadlock in NMI
> 
> The ring buffers are not yet totally lockless with writing to
> the buffer. When a writer crosses a page, it grabs a per cpu spinlock
> to protect against a reader. The spinlocks taken by a writer are not
> to protect against other writers, since a writer can only write to
> its own per cpu buffer. The spinlocks protect against readers that
> can touch any cpu buffer. The writers are made safe against
> reentrancy by having the spinlocks disable interrupts.
> 
> The problem arises when an NMI writes to the buffer, and that write
> crosses a page boundary. If it grabs a spinlock, it can be racing
> with another writer (since disabling interrupts does not protect
> against NMIs) or with a reader on the same CPU. Luckily, most of the
> users are not reentrant and so protect against this issue. But if a
> user of the ring buffer becomes reentrant (which the ring buffers do
> allow) and the NMI also writes to the ring buffer, then we risk a
> deadlock.
> 
> This patch moves the ftrace_nmi_enter called by nmi_enter() into the
> ring buffer code. It renames the ftrace_nmi_enter used by the arch
> specific code to arch_ftrace_nmi_enter and updates the Kconfig to
> handle it.
> 
> When an NMI triggers, it will set a per cpu variable in the ring buffer
> code and will clear it when the NMI exits. If a write to the ring buffer
> crosses a page boundary inside an NMI, a trylock is used on the spin
> lock instead. If the trylock fails, the entry is discarded.
> 
> This bug appeared in the ftrace work in the RT tree, where event tracing
> is reentrant. This workaround solved the deadlocks that appeared there.
> 
> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
> ---
>  arch/x86/Kconfig           |    1 +
>  arch/x86/kernel/ftrace.c   |    8 +++---
>  include/linux/ftrace_irq.h |   10 ++++++++-
>  kernel/trace/Kconfig       |    8 +++++++
>  kernel/trace/ring_buffer.c |   48 ++++++++++++++++++++++++++++++++++++++++++-
>  5 files changed, 68 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4277640..3da7121 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -34,6 +34,7 @@ config X86
>  	select HAVE_FUNCTION_TRACER
>  	select HAVE_FUNCTION_GRAPH_TRACER
>  	select HAVE_FUNCTION_TRACE_MCOUNT_TEST
> +	select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE || FUNCTION_GRAPH_TRACER
>  	select HAVE_KVM
>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_TRACEHOOK
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 4d33224..4c68358 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
>  					     MCOUNT_INSN_SIZE);
>  }
>  
> -void ftrace_nmi_enter(void)
> +void arch_ftrace_nmi_enter(void)
>  {
>  	atomic_inc(&in_nmi);

This in_nmi thing looks a bit awkward.  It's not very clear from the
code what it does or why it exists, but it looks like I'm going to have
to get used to that...

Surely it is buggy?  It's incremented by _any_ cpu, so code which reads
it (eg, prepare_ftrace_return()) will bale out if some other CPU
happened to be in the middle of taking an NMI.

Would it not be better to make in_nmi a per-cpu thing, make it a piece
of first-class kernel infrastructure, add a nice interface to it, hoist
it up into generic code, call that code from arch NMI handlers, call
that interface from ftrace, etc?  Or perhaps it could be implemented in
task_struct, alongside in_irq(), in_softirq(), etc.  There are three bits
left in ->preempt_count.

This approach would remove the duplication which this patch is
attempting to add.

Plus there's already quite a bit of duplication in
arch/x86/kernel/ftrace.c between the
CONFIG_DYNAMIC_FTRACE/!CONFIG_DYNAMIC_FTRACE hunks.  Two definitions of
in_nmi, duplicated inc/dec thereof?




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ring-buffer: add NMI protection for spinlocks
  2009-02-06  1:53   ` Andrew Morton
@ 2009-02-06  2:25     ` Steven Rostedt
  2009-02-06  2:32       ` Steven Rostedt
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2009-02-06  2:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Ingo Molnar, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Steven Rostedt, Arjan van de Ven


[ added Arjan ]

On Thu, 5 Feb 2009, Andrew Morton wrote:
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 4d33224..4c68358 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
> >  					     MCOUNT_INSN_SIZE);
> >  }
> >  
> > -void ftrace_nmi_enter(void)
> > +void arch_ftrace_nmi_enter(void)
> >  {
> >  	atomic_inc(&in_nmi);
> 
> This in_nmi thing looks a bit awkward.  It's not very clear from the
> code what it does or why it exists, but it looks like I'm going to have
> to get used to that...
> 
> Surely it is buggy?  It's incremented by _any_ cpu, so code which reads
> it (eg, prepare_ftrace_return()) will bale out if some other CPU
> happened to be in the middle of taking an NMI.

And that is exactly what we want it to do ;-)

This is code modification. If you modify code that another cpu is 
executing, the result is undefined. In my tests that provoke this type of 
error, I usually get a general protection fault.

To modify code safely, you must act like a uniprocessor. That is, the code 
you modify must not be executing on another CPU. We use stop_machine to 
help us out here, but under NMI stress tests, we found that we can crash, 
because the NMI will execute code being modified. Stop_machine does not 
disable NMIs on other CPUs.

To solve this, we sync up the modification with the NMI with a couple of 
variables. That's what all those memory barriers are for.

When we go to modify a line of code, we set a variable that we are doing 
so, wait for all NMIs to finish, then modify the code, wait for NMIs
again, and repeat on the next line to modify.

If the NMI goes off on any CPU, it increments the in_nmi counter. If the 
mod code flag is set, the NMI modifies the code.

The trick here is that the crash only occurs if the code is modified as it 
is being executed on. But it will not crash if the code is written to with 
the same data as it was. In other words, it is fine to write the code and 
execute it, if what you are writing is what is already there. No 
modification being done.

All NMIs that go off will modify the same code. Even if the code being 
modified is in an NMI code path, only the first NMI that goes off will 
actually modify it, the rest will just be writing the same data to the 
location. Even the main code will be writing the same data. No races.



> 
> Would it not be better to make in_nmi a per-cpu thing, make it a piece
> of first-class kernel infrastructure, add a nice interface to it, hoist
> it up into generic code, call that code from arch NMI handlers, call
> that interface from ftrace, etc?  Or perhaps it could be implemented in
> task_struct, alongside in_irq(), in_softirq(), etc.  There are three bits
> left in ->preempt_count.

I agree, we should have per cpu NMI code. I can write that up and use
it for the ring buffer. But the callback for the ftrace code that I 
explained here must still be made. We need a "global" NMI counter for it 
to work.

-- Steve

> 
> This approach would remove the duplication which this patch is
> attempting to add.
> 
> Plus there's already quite a bit of duplication in
> arch/x86/kernel/ftrace.c between the
> CONFIG_DYNAMIC_FTRACE/!CONFIG_DYNAMIC_FTRACE hunks.  Two definitions of
> in_nmi, duplicated inc/dec thereof?
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] ring-buffer: add NMI protection for spinlocks
  2009-02-06  2:25     ` Steven Rostedt
@ 2009-02-06  2:32       ` Steven Rostedt
  0 siblings, 0 replies; 8+ messages in thread
From: Steven Rostedt @ 2009-02-06  2:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, Ingo Molnar, Frederic Weisbecker, Arnaldo Carvalho de Melo,
	Steven Rostedt, Arjan van de Ven


On Thu, 5 Feb 2009, Steven Rostedt wrote:

> 
> [ added Arjan ]
> 
> On Thu, 5 Feb 2009, Andrew Morton wrote:
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 4d33224..4c68358 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -113,7 +113,7 @@ static void ftrace_mod_code(void)
> > >  					     MCOUNT_INSN_SIZE);
> > >  }
> > >  
> > > -void ftrace_nmi_enter(void)
> > > +void arch_ftrace_nmi_enter(void)
> > >  {
> > >  	atomic_inc(&in_nmi);
> > 
> > This in_nmi thing looks a bit awkward.  It's not very clear from the
> > code what it does or why it exists, but it looks like I'm going to have
> > to get used to that...
> > 
> > Surely it is buggy?  It's incremented by _any_ cpu, so code which reads
> > it (eg, prepare_ftrace_return()) will bale out if some other CPU
> > happened to be in the middle of taking an NMI.

Arg! Completely ignore this. I did not look at the function you mentioned.
I was thinking about the code modification.

Yes! It is buggy :-p

The prepare_ftrace_return only wants to bail if it is in an NMI on the 
current CPU.

/me goes to write a patch.

-- Steve

> 
> And that is exactly what we want it to do ;-)
> 
> This is code modification. If you modify code that another cpu is 
> executing, the result is undefined. In my tests that provoke this type of 
> error, I usually get a general protection fault.
> 
> To modify code safely, you must act like a uniprocessor. That is, the code 
> you modify must not be executing on another CPU. We use stop_machine to 
> help us out here, but under NMI stress tests, we found that we can crash, 
> because the NMI will execute code being modified. Stop_machine does not 
> disable NMIs on other CPUs.
> 
> To solve this, we sync up the modification with the NMI with a couple of 
> variables. That's what all those memory barriers are for.
> 
> When we go to modify a line of code, we set a variable that we are doing 
> so, wait for all NMIs to finish, then modify the code, wait for NMIs
> again, and repeat on the next line to modify.
> 
> If the NMI goes off on any CPU, it increments the in_nmi counter. If the 
> mod code flag is set, the NMI modifies the code.
> 
> The trick here is that the crash only occurs if the code is modified as it 
> is being executed on. But it will not crash if the code is written to with 
> the same data as it was. In other words, it is fine to write the code and 
> execute it, if what you are writing is what is already there. No 
> modification being done.
> 
> All NMIs that go off will modify the same code. Even if the code being 
> modified is in an NMI code path, only the first NMI that goes off will 
> actually modify it, the rest will just be writing the same data to the 
> location. Even the main code will be writing the same data. No races.
> 
> 
> 
> > 
> > Would it not be better to make in_nmi a per-cpu thing, make it a piece
> > of first-class kernel infrastructure, add a nice interface to it, hoist
> > it up into generic code, call that code from arch NMI handlers, call
> > that interface from ftrace, etc?  Or perhaps it could be implemented in
> > task_struct, alongside in_irq(), in_softirq(), etc.  There are three bits
> > left in ->preempt_count.
> 
> I agree, we should have per cpu NMI code. I can write that up and use
> it for the ring buffer. But the callback for the ftrace code that I 
> explained here must still be made. We need a "global" NMI counter for it 
> to work.
> 
> -- Steve
> 
> > 
> > This approach would remove the duplication which this patch is
> > attempting to add.
> > 
> > Plus there's already quite a bit of duplication in
> > arch/x86/kernel/ftrace.c between the
> > CONFIG_DYNAMIC_FTRACE/!CONFIG_DYNAMIC_FTRACE hunks.  Two definitions of
> > in_nmi, duplicated inc/dec thereof?
> > 
> > 
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-02-06  2:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-06  1:13 [PATCH 0/2] ftrace: updates for tip Steven Rostedt
2009-02-06  1:13 ` [PATCH 1/2] ring-buffer: add NMI protection for spinlocks Steven Rostedt
2009-02-06  1:53   ` Andrew Morton
2009-02-06  2:25     ` Steven Rostedt
2009-02-06  2:32       ` Steven Rostedt
2009-02-06  1:13 ` [PATCH 2/2] ring-buffer: allow tracing_off to be used in core kernel code Steven Rostedt
  -- strict thread matches above, loose matches on Subject: below --
2009-01-22 20:20 [PATCH 0/2] ftrace: updates for tip Steven Rostedt
2009-01-23 10:09 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox