public inbox for linux-kernel@vger.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Li Zefan <lizf@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	stable@kernel.org
Subject: Re: [PATCH 5/5] tracing: Do not record user stack trace from NMI context
Date: Sun, 14 Mar 2010 11:27:53 +0100
Message-ID: <20100314102747.GB5140@nowhere>
In-Reply-To: <20100313025855.495916344@goodmis.org>

On Fri, Mar 12, 2010 at 09:57:00PM -0500, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> A bug was found with Li Zefan's ftrace_stress_test that caused applications
> to segfault during the test.
> 
> Placing a tracing_off() in the segfault code, and examining several
> traces, I found that the following was always the case. The lock tracer
> was enabled (lockdep being required) and userstack was enabled. Testing
> this out, I just enabled the two, but that was not good enough. I needed
> to run something else that could trigger it. Running a load like hackbench
> did not work, but executing a new program would. The following would
> trigger the segfault within seconds:
> 
>   # echo 1 > /debug/tracing/options/userstacktrace
>   # echo 1 > /debug/tracing/events/lock/enable
>   # while :; do ls > /dev/null ; done
> 
> Enabling the function graph tracer and looking at what was happening
> I finally noticed that all crashes happened just after an NMI.
> 
>  1)               |    copy_user_handle_tail() {
>  1)               |      bad_area_nosemaphore() {
>  1)               |        __bad_area_nosemaphore() {
>  1)               |          no_context() {
>  1)               |            fixup_exception() {
>  1)   0.319 us    |              search_exception_tables();
>  1)   0.873 us    |            }
> [...]
>  1)   0.314 us    |  __rcu_read_unlock();
>  1)   0.325 us    |    native_apic_mem_write();
>  1)   0.943 us    |  }
>  1)   0.304 us    |  rcu_nmi_exit();
> [...]
>  1)   0.479 us    |  find_vma();
>  1)               |  bad_area() {
>  1)               |    __bad_area() {
> 
> After capturing several traces of failures, all of them happened
> after an NMI. Curious about this, I added a trace_printk() to the NMI
> handler to read the regs->ip to see where the NMI happened. I found
> that it was here:
> 
> ffffffff8135b660 <page_fault>:
> ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
> ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>
> 
> What was happening is that the NMI would happen at the place that a page
> fault occurred. It would call rcu_read_lock() which was traced by
> the lock events, and the user_stack_trace would run. This would trigger
> a page fault inside the NMI. I do not see where the CR2 register is
> saved or restored in NMI handling. This means that it would corrupt
> the page fault handling that the NMI interrupted.
> 
> The reason the while loop of ls helped trigger the bug, was that
> each execution of ls would cause lots of pages to be faulted in, and
> increase the chances of the race happening.
> 
> The simple solution is to not allow user stack traces in NMI context.
> After this patch, I ran the above "ls" test for a couple of hours
> without any issues. Without this patch, the bug would trigger in less
> than a minute.
> 
> Cc: stable@kernel.org
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>



Wow, that's a race :)

In perf this is dealt with by a special copy_from_user_nmi()
(see arch/x86/kernel/cpu/perf_event.c).

Maybe save_stack_trace_user() should use that instead
of a __copy_from_user_inatomic() based thing, just to
cover this NMI corner case.
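
To make the race concrete, here is a rough userspace model of what the
changelog describes. It is only a sketch, not kernel code: every sim_*
name is made up for illustration, CR2 is modelled as a plain struct
field, and the "fault" in the stack walk is assumed to always happen.
The guard flag stands in for the in_nmi() check added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one CPU; these are not kernel structures. */
struct sim_cpu {
	uintptr_t cr2;	/* models the faulting-address register */
	bool in_nmi;	/* models what in_nmi() would report */
};

/*
 * Models a user stack walk that touches an unmapped stack page:
 * the resulting fault overwrites CR2 with the new address.
 */
static void sim_user_stack_walk(struct sim_cpu *cpu, uintptr_t user_sp)
{
	cpu->cr2 = user_sp;
}

/* Models ftrace_trace_userstack(); guard toggles the in_nmi() check. */
static void sim_trace_userstack(struct sim_cpu *cpu, uintptr_t user_sp,
				bool guard)
{
	if (guard && cpu->in_nmi)
		return;
	sim_user_stack_walk(cpu, user_sp);
}

/* Models an NMI whose traced lock event records a user stack trace. */
static void sim_nmi(struct sim_cpu *cpu, uintptr_t user_sp, bool guard)
{
	assert(!cpu->in_nmi);	/* this model does not nest NMIs */
	cpu->in_nmi = true;
	sim_trace_userstack(cpu, user_sp, guard);
	cpu->in_nmi = false;
}

/*
 * Returns the address the interrupted page fault handler would see
 * when the NMI lands on the first instruction of page_fault, before
 * the handler has read CR2.
 */
static uintptr_t sim_page_fault(uintptr_t fault_addr, uintptr_t user_sp,
				bool guard)
{
	struct sim_cpu cpu = { .cr2 = fault_addr, .in_nmi = false };

	sim_nmi(&cpu, user_sp, guard);
	return cpu.cr2;
}
```

Calling sim_page_fault(addr, sp, false) hands back sp rather than addr,
which is the corruption seen by the interrupted handler; with the guard
set to true the original addr survives.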



> ---
>  kernel/trace/trace.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 484337d..e52683f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1284,6 +1284,13 @@ ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
>  	if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
>  		return;
>  
> +	/*
> +	 * NMIs can not handle page faults, even with fix ups.
> +	 * The save user stack can (and often does) fault.
> +	 */
> +	if (unlikely(in_nmi()))
> +		return;
> +
>  	event = trace_buffer_lock_reserve(buffer, TRACE_USER_STACK,
>  					  sizeof(*entry), flags, pc);
>  	if (!event)
> -- 
> 1.7.0
> 
> 

