[RFC][PATCH] tracing: Have stack tracer force RCU to be watching

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
@ 2015-10-20 16:10 Steven Rostedt
  2015-10-20 20:25 ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-10-20 16:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Peter Zijlstra, LKML

Paul,

I've spent a couple of days debugging this, and finally found that my
stack tracer was calling the stack trace code, which calls
__module_address() which asserts the below.

Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
everywhere (with interrupts always disabled)? This patch appears to fix
the bug.

Peter,

I'm going to be sending a second patch that converts that from a
WARN_ON() to an open coded WARN_ON_ONCE(), because WARN_ON() also calls
the stack trace code which calls __module_address() and we end up with
an infinite warning about it. This prevented me from seeing where the
bug actually was, and crashed the box.

-- Steve

>From a2d7629048322ae62bff57f34f5f995e25ed234c Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Tue, 20 Oct 2015 11:38:08 -0400
Subject: [PATCH] tracing: Have stack tracer force RCU to be watching

The stack tracer was triggering the WARN_ON() in module.c:

 static void module_assert_mutex_or_preempt(void)
 {
 #ifdef CONFIG_LOCKDEP
	if (unlikely(!debug_locks))
		return;

	WARN_ON(!rcu_read_lock_sched_held() &&
		!lockdep_is_held(&module_mutex));
 #endif
 }

The reason is that the stack tracer traces all function calls, and some of
those calls happen while exiting or entering user space and idle. Some of
these functions are called after RCU had already stopped watching, as RCU
does not watch userspace or idle CPUs.

If a max stack is hit, then the save_stack_trace() is called, which will
check module addresses and call module_assert_mutex_or_preempt(), and then
trigger the warning. Sad part is, the warning itself will also do a stack
trace and tigger the same warning. That probably should be fixed.

The warning was added by 0be964be0d45 "module: Sanitize RCU usage and
locking" but this bug has probably been around longer. But it's unlikely to
cause much harm, but the new warning causes the system to lock up.

Cc: stable@vger.kernel.org # 4.2+
Cc: Peter Zijlstra <peterz@infradead.org>
Cc:"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_stack.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index b746399ab59c..5f29402bff0f 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -88,6 +88,12 @@ check_stack(unsigned long ip, unsigned long *stack)
 	local_irq_save(flags);
 	arch_spin_lock(&max_stack_lock);

+	/*
+	 * RCU may not be watching, make it see us.
+	 * The stack trace code uses rcu_sched.
+	 */
+	rcu_irq_enter();
+
 	/* In case another CPU set the tracer_frame on us */
 	if (unlikely(!frame_size))
 		this_size -= tracer_frame;
@@ -169,6 +175,7 @@ check_stack(unsigned long ip, unsigned long *stack)
 	}

  out:
+	rcu_irq_exit();
 	arch_spin_unlock(&max_stack_lock);
 	local_irq_restore(flags);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 16:10 [RFC][PATCH] tracing: Have stack tracer force RCU to be watching Steven Rostedt
@ 2015-10-20 20:25 ` Paul E. McKenney
  2015-10-20 21:32   ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2015-10-20 20:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, LKML

On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> 
> Paul,
> 
> I've spent a couple of days debugging this, and finally found that my
> stack tracer was calling the stack trace code, which calls
> __module_address() which asserts the below.
> 
> Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> everywhere (with interrupts always disabled)? This patch appears to fix
> the bug.

Yep!  Just don't call it from an NMI handler.  And don't call it with
interrupts enabled.  The patch looks to have interrupts always disabled,
and the surrounding code doesn't look like NMI-safe code anyway, so
should be OK.

							Thanx, Paul

> Peter,
> 
> I'm going to be sending a second patch that converts that from a
> WARN_ON() to an open coded WARN_ON_ONCE(), because WARN_ON() also calls
> the stack trace code which calls __module_address() and we end up with
> an infinite warning about it. This prevented me from seeing where the
> bug actually was, and crashed the box.
> 
> -- Steve
> 
> 
> 
> From a2d7629048322ae62bff57f34f5f995e25ed234c Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> Date: Tue, 20 Oct 2015 11:38:08 -0400
> Subject: [PATCH] tracing: Have stack tracer force RCU to be watching
> 
> The stack tracer was triggering the WARN_ON() in module.c:
> 
>  static void module_assert_mutex_or_preempt(void)
>  {
>  #ifdef CONFIG_LOCKDEP
> 	if (unlikely(!debug_locks))
> 		return;
> 
> 	WARN_ON(!rcu_read_lock_sched_held() &&
> 		!lockdep_is_held(&module_mutex));
>  #endif
>  }
> 
> The reason is that the stack tracer traces all function calls, and some of
> those calls happen while exiting or entering user space and idle. Some of
> these functions are called after RCU had already stopped watching, as RCU
> does not watch userspace or idle CPUs.
> 
> If a max stack is hit, then the save_stack_trace() is called, which will
> check module addresses and call module_assert_mutex_or_preempt(), and then
> trigger the warning. Sad part is, the warning itself will also do a stack
> trace and tigger the same warning. That probably should be fixed.
> 
> The warning was added by 0be964be0d45 "module: Sanitize RCU usage and
> locking" but this bug has probably been around longer. But it's unlikely to
> cause much harm, but the new warning causes the system to lock up.
> 
> Cc: stable@vger.kernel.org # 4.2+
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc:"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_stack.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index b746399ab59c..5f29402bff0f 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -88,6 +88,12 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	local_irq_save(flags);
>  	arch_spin_lock(&max_stack_lock);
>  
> +	/*
> +	 * RCU may not be watching, make it see us.
> +	 * The stack trace code uses rcu_sched.
> +	 */
> +	rcu_irq_enter();
> +
>  	/* In case another CPU set the tracer_frame on us */
>  	if (unlikely(!frame_size))
>  		this_size -= tracer_frame;
> @@ -169,6 +175,7 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	}
>  
>   out:
> +	rcu_irq_exit();
>  	arch_spin_unlock(&max_stack_lock);
>  	local_irq_restore(flags);
>  }
> -- 
> 1.8.3.1
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 20:25 ` Paul E. McKenney
@ 2015-10-20 21:32   ` Steven Rostedt
  2015-10-20 23:48     ` Paul E. McKenney
  2015-10-21  8:01     ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Steven Rostedt @ 2015-10-20 21:32 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Peter Zijlstra, LKML

On Tue, 20 Oct 2015 13:25:28 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> > 
> > Paul,
> > 
> > I've spent a couple of days debugging this, and finally found that my
> > stack tracer was calling the stack trace code, which calls
> > __module_address() which asserts the below.
> > 
> > Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> > everywhere (with interrupts always disabled)? This patch appears to fix
> > the bug.
> 
> Yep!  Just don't call it from an NMI handler.  And don't call it with
> interrupts enabled.  The patch looks to have interrupts always disabled,
> and the surrounding code doesn't look like NMI-safe code anyway, so
> should be OK.
> 
> 							Thanx, Paul
>

Hmm, good point about NMI handler. Right now I think the only thing
protecting this from getting in the critical section while in NMI is
the check that we are using the task struct stack. But that may not be
enough in 32 bit.

I should probably add a "if (in_nmi()) return" somewhere.

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 21:32   ` Steven Rostedt
@ 2015-10-20 23:48     ` Paul E. McKenney
  2015-10-20 23:55       ` Steven Rostedt
  2015-10-21  8:01     ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2015-10-20 23:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, LKML

On Tue, Oct 20, 2015 at 05:32:28PM -0400, Steven Rostedt wrote:
> On Tue, 20 Oct 2015 13:25:28 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> > > 
> > > Paul,
> > > 
> > > I've spent a couple of days debugging this, and finally found that my
> > > stack tracer was calling the stack trace code, which calls
> > > __module_address() which asserts the below.
> > > 
> > > Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> > > everywhere (with interrupts always disabled)? This patch appears to fix
> > > the bug.
> > 
> > Yep!  Just don't call it from an NMI handler.  And don't call it with
> > interrupts enabled.  The patch looks to have interrupts always disabled,
> > and the surrounding code doesn't look like NMI-safe code anyway, so
> > should be OK.
> > 
> > 							Thanx, Paul
> >
> 
> Hmm, good point about NMI handler. Right now I think the only thing
> protecting this from getting in the critical section while in NMI is
> the check that we are using the task struct stack. But that may not be
> enough in 32 bit.
> 
> I should probably add a "if (in_nmi()) return" somewhere.

Please!  ;-) ;-) ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 23:48     ` Paul E. McKenney
@ 2015-10-20 23:55       ` Steven Rostedt
  2015-10-21 17:46         ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-10-20 23:55 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Peter Zijlstra, LKML

On Tue, 20 Oct 2015 16:48:30 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Tue, Oct 20, 2015 at 05:32:28PM -0400, Steven Rostedt wrote:
> > On Tue, 20 Oct 2015 13:25:28 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> > > > 
> > > > Paul,
> > > > 
> > > > I've spent a couple of days debugging this, and finally found that my
> > > > stack tracer was calling the stack trace code, which calls
> > > > __module_address() which asserts the below.
> > > > 
> > > > Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> > > > everywhere (with interrupts always disabled)? This patch appears to fix
> > > > the bug.
> > > 
> > > Yep!  Just don't call it from an NMI handler.  And don't call it with
> > > interrupts enabled.  The patch looks to have interrupts always disabled,
> > > and the surrounding code doesn't look like NMI-safe code anyway, so
> > > should be OK.
> > > 
> > > 							Thanx, Paul
> > >
> > 
> > Hmm, good point about NMI handler. Right now I think the only thing
> > protecting this from getting in the critical section while in NMI is
> > the check that we are using the task struct stack. But that may not be
> > enough in 32 bit.
> > 
> > I should probably add a "if (in_nmi()) return" somewhere.
> 
> Please!  ;-) ;-) ;-)
> 
>

That's a separate fix, as it will break elsewhere than just this.

As for my patch, Can I have an Acked-by?

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 21:32   ` Steven Rostedt
  2015-10-20 23:48     ` Paul E. McKenney
@ 2015-10-21  8:01     ` Peter Zijlstra
  2015-10-21 11:39       ` Steven Rostedt
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2015-10-21  8:01 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, LKML

On Tue, Oct 20, 2015 at 05:32:28PM -0400, Steven Rostedt wrote:
> On Tue, 20 Oct 2015 13:25:28 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> > > 
> > > Paul,
> > > 
> > > I've spent a couple of days debugging this, and finally found that my
> > > stack tracer was calling the stack trace code, which calls
> > > __module_address() which asserts the below.
> > > 
> > > Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> > > everywhere (with interrupts always disabled)? This patch appears to fix
> > > the bug.
> > 
> > Yep!  Just don't call it from an NMI handler.  And don't call it with
> > interrupts enabled.  The patch looks to have interrupts always disabled,
> > and the surrounding code doesn't look like NMI-safe code anyway, so
> > should be OK.
> > 
> > 							Thanx, Paul
> >
> 
> Hmm, good point about NMI handler. Right now I think the only thing
> protecting this from getting in the critical section while in NMI is
> the check that we are using the task struct stack. But that may not be
> enough in 32 bit.
> 
> I should probably add a "if (in_nmi()) return" somewhere.

But if there's an arch that doesn't use a separate NMI stack, the NMI
might cause the largest stack, which would then remain invisible from
the stack-tracer.

Should we not instead fix the NMI-safety of this tracer?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-21  8:01     ` Peter Zijlstra
@ 2015-10-21 11:39       ` Steven Rostedt
  2015-10-21 11:52         ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2015-10-21 11:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Paul E. McKenney, LKML

On Wed, 21 Oct 2015 10:01:42 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > I should probably add a "if (in_nmi()) return" somewhere.
> 
> But if there's an arch that doesn't use a separate NMI stack, the NMI
> might cause the largest stack, which would then remain invisible from
> the stack-tracer.
> 
> Should we not instead fix the NMI-safety of this tracer?

We could, but that should be a separate project, as that would require
doing everything lockless, which would require a redesign. Is that
worth it?

For now, the safe thing to do is the if (in_nmi()), but certainly, if
someone gets time to make it NMI safe, we can do that too.

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-21 11:39       ` Steven Rostedt
@ 2015-10-21 11:52         ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2015-10-21 11:52 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Paul E. McKenney, LKML

On Wed, Oct 21, 2015 at 07:39:22AM -0400, Steven Rostedt wrote:
> On Wed, 21 Oct 2015 10:01:42 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > I should probably add a "if (in_nmi()) return" somewhere.
> > 
> > But if there's an arch that doesn't use a separate NMI stack, the NMI
> > might cause the largest stack, which would then remain invisible from
> > the stack-tracer.
> > 
> > Should we not instead fix the NMI-safety of this tracer?
> 
> We could, but that should be a separate project, as that would require
> doing everything lockless, which would require a redesign. Is that
> worth it?

I've no idea on either, not on how hard it would be to fix, nor on if
its worth the effort.

I suppose auditing which archs do not have dedicated NMI stacks might be
a good first stab at things. x86 still uses an IST for NMIs, right?
Andy's been changing things a lot lately, I'm not sure I'm up to date on
this.

> For now, the safe thing to do is the if (in_nmi()), but certainly, if
> someone gets time to make it NMI safe, we can do that too.

Sure, fix current holes first.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC][PATCH] tracing: Have stack tracer force RCU to be watching
  2015-10-20 23:55       ` Steven Rostedt
@ 2015-10-21 17:46         ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2015-10-21 17:46 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Peter Zijlstra, LKML

On Tue, 20 Oct 2015 19:55:29 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 20 Oct 2015 16:48:30 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Tue, Oct 20, 2015 at 05:32:28PM -0400, Steven Rostedt wrote:
> > > On Tue, 20 Oct 2015 13:25:28 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Tue, Oct 20, 2015 at 12:10:31PM -0400, Steven Rostedt wrote:
> > > > > 
> > > > > Paul,
> > > > > 
> > > > > I've spent a couple of days debugging this, and finally found that my
> > > > > stack tracer was calling the stack trace code, which calls
> > > > > __module_address() which asserts the below.
> > > > > 
> > > > > Is just calling rcu_irq_enter() and rcu_irq_exit() safe to do
> > > > > everywhere (with interrupts always disabled)? This patch appears to fix
> > > > > the bug.
> > > > 
> > > > Yep!  Just don't call it from an NMI handler.  And don't call it with
> > > > interrupts enabled.  The patch looks to have interrupts always disabled,
> > > > and the surrounding code doesn't look like NMI-safe code anyway, so
> > > > should be OK.
> > > > 
> > > > 							Thanx, Paul
> > > >
> > > 
> > > Hmm, good point about NMI handler. Right now I think the only thing
> > > protecting this from getting in the critical section while in NMI is
> > > the check that we are using the task struct stack. But that may not be
> > > enough in 32 bit.
> > > 
> > > I should probably add a "if (in_nmi()) return" somewhere.
> > 
> > Please!  ;-) ;-) ;-)
> > 
> >
> 
> That's a separate fix, as it will break elsewhere than just this.
> 
> As for my patch, Can I have an Acked-by?

Ping?

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-10-21 17:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-20 16:10 [RFC][PATCH] tracing: Have stack tracer force RCU to be watching Steven Rostedt
2015-10-20 20:25 ` Paul E. McKenney
2015-10-20 21:32   ` Steven Rostedt
2015-10-20 23:48     ` Paul E. McKenney
2015-10-20 23:55       ` Steven Rostedt
2015-10-21 17:46         ` Steven Rostedt
2015-10-21  8:01     ` Peter Zijlstra
2015-10-21 11:39       ` Steven Rostedt
2015-10-21 11:52         ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).