Re: Panic and page fault in loop during handling NMI backtrace handler

* Re: Panic and page fault in loop during handling NMI backtrace handler
       [not found] <27240C0AC20F114CBF8149A2696CBE4A01B60835@SHSMSX101.ccr.corp.intel.com>
@ 2013-10-15 12:18 ` Frederic Weisbecker
  2013-10-15 12:37   ` Peter Zijlstra
  2013-10-15 16:40 ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2013-10-15 12:18 UTC
  To: Liu, Chuansheng, Steven Rostedt
  Cc: Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
	Peter Zijlstra (peterz@infradead.org), x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> 
> Anyone can give some help? Thanks.

Looks like we re-enter the fault several times. On x86-32, NMIs can
fault if they dereference a vmalloc'ed area. I wonder if the module data
we look up in the NMI is stored in some vmalloc'ed area.

Also, maybe we enter the fault and trigger the WARN_ON_ONCE(in_nmi()) warning,
which in turn dumps the stack, looks up that module address, refaults, etc...


* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:18 ` Panic and page fault in loop during handling NMI backtrace handler Frederic Weisbecker
@ 2013-10-15 12:37   ` Peter Zijlstra
  2013-10-15 12:48     ` Paul E. McKenney
  2013-10-15 12:54     ` Frederic Weisbecker
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2013-10-15 12:37 UTC
  To: Frederic Weisbecker
  Cc: Liu, Chuansheng, Steven Rostedt, Ingo Molnar (mingo@kernel.org),
	hpa@zytor.com, akpm@linux-foundation.org,
	paulmck@linux.vnet.ibm.com, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > 
> > Anyone can give some help? Thanks.
> 
> Looks like we re-enter the fault several times. On x86-32, NMIs can
> fault if they dereference vmalloc'ed area. I wonder if the module thing
> we lookup in the NMI is stored on some vmalloc'ed area.

IIRC modules are indeed allocated using vmalloc. See module_alloc()
using vmalloc_exec()


* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:37   ` Peter Zijlstra
@ 2013-10-15 12:48     ` Paul E. McKenney
  2013-10-15 12:59       ` Frederic Weisbecker
  2013-10-15 12:54     ` Frederic Weisbecker
  1 sibling, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2013-10-15 12:48 UTC
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Liu, Chuansheng, Steven Rostedt,
	Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > 
> > > Anyone can give some help? Thanks.
> > 
> > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > we lookup in the NMI is stored on some vmalloc'ed area.
> 
> IIRC modules are indeed allocated using vmalloc. See module_alloc()
> using vmalloc_exec()

This might then be a module that uses call_rcu(), but which does not have
the needed rcu_barrier() in the module-exit function.

							Thanx, Paul



* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:48     ` Paul E. McKenney
@ 2013-10-15 12:59       ` Frederic Weisbecker
  2013-10-15 13:06         ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2013-10-15 12:59 UTC
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Liu, Chuansheng, Steven Rostedt,
	Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 05:48:37AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > > 
> > > > Anyone can give some help? Thanks.
> > > 
> > > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > > we lookup in the NMI is stored on some vmalloc'ed area.
> > 
> > IIRC modules are indeed allocated using vmalloc. See module_alloc()
> > using vmalloc_exec()
> 
> This might then be a module that uses call_rcu(), but which does not have
> the needed rcu_barrier() in the module-exit function.

I rather believe it's due to the lazy paging of the vmalloc area on x86-32. We have had
issues like that in the past. For example, that's the reason why we do an ad-hoc per-cpu
allocation for callchain buffers in perf rather than using alloc_percpu(), which might
use vmalloc.
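
To make the lazy paging idea concrete, here is a minimal userspace sketch in C.
It is only a toy model and not the kernel's code: the names toy_mm,
toy_vmalloc_map(), toy_vmalloc_fault() and toy_access(), and the sizes NR_PGD
and VMALLOC_FIRST, are all made up for illustration. The assumption it encodes
is that a new vmalloc mapping is recorded only in a reference top-level table,
and each task's own table picks the entry up the first time it faults on it.

```c
#include <stddef.h>

#define NR_PGD        8	/* hypothetical size of the toy top-level table */
#define VMALLOC_FIRST 4	/* hypothetical: slots 4..7 cover the "vmalloc area" */

/* Toy per-task top-level page table. */
typedef struct {
	unsigned long pgd[NR_PGD];
} toy_mm;

/* Plays the role of the 'reference' page table. */
static toy_mm reference_mm;

/* Creating a mapping only touches the reference table (the lazy part). */
void toy_vmalloc_map(size_t idx, unsigned long entry)
{
	reference_mm.pgd[idx] = entry;
}

/*
 * Fault path: copy the missing entry from the reference table.
 * Returns 0 if the task's table was synchronized, -1 if the slot is
 * outside the toy vmalloc range or is not mapped in the reference table.
 */
int toy_vmalloc_fault(toy_mm *mm, size_t idx)
{
	if (idx < VMALLOC_FIRST || idx >= NR_PGD)
		return -1;
	if (!reference_mm.pgd[idx])
		return -1;
	mm->pgd[idx] = reference_mm.pgd[idx];
	return 0;
}

/*
 * Access a slot through a task's table.
 * Returns the number of faults taken (0 or 1), or -1 for a bad access.
 */
int toy_access(toy_mm *mm, size_t idx)
{
	if (idx < NR_PGD && mm->pgd[idx])
		return 0;		/* already synchronized */
	if (toy_vmalloc_fault(mm, idx) < 0)
		return -1;		/* genuine bad access */
	return 1;			/* one fault, then the entry is present */
}
```

In this model the first access to a freshly mapped slot costs one fault and
later accesses cost none. The point for this thread is that the first access
can come from any context, including an NMI handler.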


* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:59       ` Frederic Weisbecker
@ 2013-10-15 13:06         ` Paul E. McKenney
  2013-10-15 13:17           ` Frederic Weisbecker
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2013-10-15 13:06 UTC
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Liu, Chuansheng, Steven Rostedt,
	Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 02:59:15PM +0200, Frederic Weisbecker wrote:
> On Tue, Oct 15, 2013 at 05:48:37AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > > > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > > > 
> > > > > Anyone can give some help? Thanks.
> > > > 
> > > > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > > > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > > > we lookup in the NMI is stored on some vmalloc'ed area.
> > > 
> > > IIRC modules are indeed allocated using vmalloc. See module_alloc()
> > > using vmalloc_exec()
> > 
> > This might then be a module that uses call_rcu(), but which does not have
> > the needed rcu_barrier() in the module-exit function.
> 
> I rather believe it's due to the lazy paging of vmalloc area in x86-32. We had issues
> like that in the past. For example that's the reason why we do an ad-hoc per-cpu
> allocation on callchain buffers in perf rather than using alloc_percpu() which might
> use vmalloc.

I must defer to your greater experience with this type of bug.

							Thanx, Paul



* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 13:06         ` Paul E. McKenney
@ 2013-10-15 13:17           ` Frederic Weisbecker
  2013-10-15 13:27             ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2013-10-15 13:17 UTC
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Liu, Chuansheng, Steven Rostedt,
	Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 06:06:59AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 15, 2013 at 02:59:15PM +0200, Frederic Weisbecker wrote:
> > On Tue, Oct 15, 2013 at 05:48:37AM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > > > > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > > > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > > > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > > > > 
> > > > > > Anyone can give some help? Thanks.
> > > > > 
> > > > > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > > > > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > > > > we lookup in the NMI is stored on some vmalloc'ed area.
> > > > 
> > > > IIRC modules are indeed allocated using vmalloc. See module_alloc()
> > > > using vmalloc_exec()
> > > 
> > > This might then be a module that uses call_rcu(), but which does not have
> > > the needed rcu_barrier() in the module-exit function.
> > 
> > I rather believe it's due to the lazy paging of vmalloc area in x86-32. We had issues
> > like that in the past. For example that's the reason why we do an ad-hoc per-cpu
> > allocation on callchain buffers in perf rather than using alloc_percpu() which might
> > use vmalloc.
> 
> I must defer to your greater experience with this type of bug.

With some luck I'll be proved wrong. I hope so, because that issue is not easily fixed.


* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 13:17           ` Frederic Weisbecker
@ 2013-10-15 13:27             ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2013-10-15 13:27 UTC
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Liu, Chuansheng, Steven Rostedt,
	Ingo Molnar (mingo@kernel.org), hpa@zytor.com,
	akpm@linux-foundation.org, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 03:17:59PM +0200, Frederic Weisbecker wrote:
> On Tue, Oct 15, 2013 at 06:06:59AM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 15, 2013 at 02:59:15PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Oct 15, 2013 at 05:48:37AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> > > > > On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > > > > > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > > > > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > > > > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > > > > > 
> > > > > > > Anyone can give some help? Thanks.
> > > > > > 
> > > > > > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > > > > > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > > > > > we lookup in the NMI is stored on some vmalloc'ed area.
> > > > > 
> > > > > IIRC modules are indeed allocated using vmalloc. See module_alloc()
> > > > > using vmalloc_exec()
> > > > 
> > > > This might then be a module that uses call_rcu(), but which does not have
> > > > the needed rcu_barrier() in the module-exit function.
> > > 
> > > I rather believe it's due to the lazy paging of vmalloc area in x86-32. We had issues
> > > like that in the past. For example that's the reason why we do an ad-hoc per-cpu
> > > allocation on callchain buffers in perf rather than using alloc_percpu() which might
> > > use vmalloc.
> > 
> > I must defer to your greater experience with this type of bug.
> 
> With some chances I'll be proved wrong. I hope, because that issue is not easily fixed.

Fair enough!

								Thanx, Paul



* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:37   ` Peter Zijlstra
  2013-10-15 12:48     ` Paul E. McKenney
@ 2013-10-15 12:54     ` Frederic Weisbecker
  2013-10-15 14:19       ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2013-10-15 12:54 UTC
  To: Peter Zijlstra
  Cc: Liu, Chuansheng, Steven Rostedt, Ingo Molnar (mingo@kernel.org),
	hpa@zytor.com, akpm@linux-foundation.org,
	paulmck@linux.vnet.ibm.com, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > 
> > > Anyone can give some help? Thanks.
> > 
> > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > we lookup in the NMI is stored on some vmalloc'ed area.
> 
> IIRC modules are indeed allocated using vmalloc. See module_alloc()
> using vmalloc_exec()

Right. At least the module text. Now I'm not sure that module_address_lookup()
dereferences that. But there are many other objects allocated in module.c that use
alloc_percpu(), which in turn can use vmalloc.

IIRC Steve made the NMIs safely faultable. So maybe we can remove the WARN_ON in
do_page_fault(). It may not be a good idea to allow faults in NMIs though. Steve?


* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 12:54     ` Frederic Weisbecker
@ 2013-10-15 14:19       ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2013-10-15 14:19 UTC
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Liu, Chuansheng, Ingo Molnar (mingo@kernel.org),
	hpa@zytor.com, akpm@linux-foundation.org,
	paulmck@linux.vnet.ibm.com, x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Tue, 15 Oct 2013 14:54:11 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Oct 15, 2013 at 02:37:17PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 15, 2013 at 02:18:53PM +0200, Frederic Weisbecker wrote:
> > > On Tue, Oct 15, 2013 at 02:01:04AM +0000, Liu, Chuansheng wrote:
> > > > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > > > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> > > > 
> > > > Anyone can give some help? Thanks.
> > > 
> > > Looks like we re-enter the fault several times. On x86-32, NMIs can
> > > fault if they dereference vmalloc'ed area. I wonder if the module thing
> > > we lookup in the NMI is stored on some vmalloc'ed area.
> > 
> > IIRC modules are indeed allocated using vmalloc. See module_alloc()
> > using vmalloc_exec()
> 
> Right. At least the module text. Now I'm not sure the module_address_lookup()
> dereference that. But there many other objects allocated in module.c that use
> alloc_percpu(), which in turn can use vmalloc.
> 
> IIRC Steve made the NMIs safely faultable. So may be we can remove the WARN_ON in
> do_page_fault(). It may not be a good idea to allow fault in NMIs though. Steve?

NMIs should be safe to fault after my patches went in. My main concern
was with x86-64 as the NMI vector uses IST which resets the stack, but
x86-32 keeps using the same stack if we are in ring0 (switches if we
are in ring3 just like any other interrupt).

What's the original panic? I don't see the original post?

-- Steve


* Re: Panic and page fault in loop during handling NMI backtrace handler
       [not found] <27240C0AC20F114CBF8149A2696CBE4A01B60835@SHSMSX101.ccr.corp.intel.com>
  2013-10-15 12:18 ` Panic and page fault in loop during handling NMI backtrace handler Frederic Weisbecker
@ 2013-10-15 16:40 ` Steven Rostedt
  2013-10-16  1:54   ` Liu, Chuansheng
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2013-10-15 16:40 UTC
  To: Liu, Chuansheng
  Cc: Ingo Molnar (mingo@kernel.org), hpa@zytor.com, fweisbec@gmail.com,
	akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
	Peter Zijlstra (peterz@infradead.org), x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi


BTW, please do not send out HTML email, as that gets blocked from going
to LKML.

On Tue, 15 Oct 2013 02:01:04 +0000
"Liu, Chuansheng" <chuansheng.liu@intel.com> wrote:

> We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> 
> Anyone can give some help? Thanks.
> 
> 
> Panic log as below:
> ===============
> [   15.069144] BUG: unable to handle kernel [   15.073635] paging request at 1649736d
> [   15.076379] IP: [<c200402a>] print_context_stack+0x4a/0xa0
> [   15.082529] *pde = 00000000
> [   15.085758] Thread overran stack, or stack corrupted
> [   15.091303] Oops: 0000 [#1] SMP
> [   15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm43241 kct_daemon(O)
> [   15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: G        W  O 3.10.1+ #1

I'm curious, what "Out-of-tree" module was loaded?

Read the rest from the bottom up, as that's how I wrote it :-)


> [   15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
> [   15.125116] EIP: 0060:[<c200402a>] EFLAGS: 00210087 CPU: 2
> [   15.131255] EIP is at print_context_stack+0x4a/0xa0
> [   15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
> [   15.143722] ESI: 00000000 EDI: ffffe000 EBP: f0c4220c ESP: f0c421ec
> [   15.150732]  DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> [   15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4: 001007d0
> [   15.163781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [   15.170789] DR6: ffff0ff0 DR7: 00000400
> [   15.175076] Stack:
> [   15.177324]  16497ffc 16496000 986736d8 ffffe000 986736d8 1649736d c282c148 16496000
> [   15.186067]  f0c4223c c20033b0 c282c148 c29ceecf 00000000 f0c4222c 986736d8 f0c4222c
> [   15.194810]  00000000 c29ceecf 00000000 00000000 f0c42260 c20041a7 f0c4229c c282c148
> [   15.203549] Call Trace:
> [   15.206295]  [<c20033b0>] dump_trace+0x70/0xf0
> [   15.211274]  [<c20041a7>] show_trace_log_lvl+0x47/0x60
> [   15.217028]  [<c2003482>] show_stack_log_lvl+0x52/0xd0
> [   15.222782]  [<c2004201>] show_stack+0x21/0x50
> [   15.227762]  [<c281b38b>] dump_stack+0x16/0x18
> [   15.232742]  [<c2037cff>] warn_slowpath_common+0x5f/0x80
> [   15.238693]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   15.244156]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   15.249621]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   15.255472]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [   15.261228]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> [   15.266497]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [   15.272154]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [   15.278106]  [<c2039c94>] ? console_unlock+0x314/0x440
> ... //
> [   16.885364]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   16.891217]  [<c2825b08>] do_page_fault+0x8/0x10
> [   16.896387]  [<c2823066>] error_code+0x5a/0x60
> [   16.901367]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   16.907219]  [<c208d6a0>] ? print_modules+0x20/0x90
> [   16.912685]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [   16.918634]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   16.924097]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   16.929562]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   16.935415]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [   16.941169]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> [   16.946437]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [   16.952095]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [   16.958046]  [<c2039c94>] ? console_unlock+0x314/0x440
> [   16.963800]  [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> [   16.969362]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   16.975215]  [<c2825b08>] do_page_fault+0x8/0x10
> [   16.980386]  [<c2823066>] error_code+0x5a/0x60
> [   16.985366]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   16.991215]  [<c208d6a0>] ? print_modules+0x20/0x90
> [   16.996673]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [   17.002622]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   17.008086]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   17.013550]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.019403]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [   17.025159]  [<c282553a>] vmalloc_fault+0x5a/0xcf

Oh look, we are constantly warning about this same fault! There's your
infinite loop.

Note the WARN_ON_ONCE() does the WARN_ON() first and then updates
__warned = true. Thus, if the WARN_ON() itself faults, then we are in
an infinite loop.
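
The ordering problem can be reproduced with a small userspace model in C. This
is a sketch under stated assumptions, not the kernel macro: warn_body(),
fault_flag_after(), fault_flag_before(), the two count_warnings_*() helpers and
MAX_DEPTH are hypothetical names, and the re-fault is simulated by a plain
recursive call that is capped where a real stack would overflow.

```c
#include <stdbool.h>

#define MAX_DEPTH 8	/* hypothetical stand-in for the finite kernel stack */

static int depth;	/* current nesting of the simulated fault handler */
static int warnings;	/* how many times the warning body ran */
static bool warned;	/* the "once" flag */

/* Simulated warning body: emitting the warning faults again. */
static void warn_body(void (*refault)(void))
{
	warnings++;
	if (depth < MAX_DEPTH)	/* stop where a real stack would overflow */
		refault();
}

/* Order described above: warn first, set the flag afterwards. */
static void fault_flag_after(void)
{
	depth++;
	if (!warned) {
		warn_body(fault_flag_after);
		warned = true;	/* too late: the nested faults already warned */
	}
	depth--;
}

/* Alternative order: set the flag first, then warn. */
static void fault_flag_before(void)
{
	depth++;
	if (!warned) {
		warned = true;
		warn_body(fault_flag_before);
	}
	depth--;
}

/* Reset the model and count warnings with the flag set after the warning. */
int count_warnings_flag_after(void)
{
	depth = 0;
	warnings = 0;
	warned = false;
	fault_flag_after();
	return warnings;
}

/* Reset the model and count warnings with the flag set before the warning. */
int count_warnings_flag_before(void)
{
	depth = 0;
	warnings = 0;
	warned = false;
	fault_flag_before();
	return warnings;
}
```

In this model count_warnings_flag_after() returns MAX_DEPTH, because every
nested fault warns again until the depth cap is reached, while
count_warnings_flag_before() returns 1. The cap only exists to keep the demo
finite; it plays the part of the stack overflow in the log.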

> [   17.030428]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [   17.036085]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [   17.042037]  [<c2039c94>] ? console_unlock+0x314/0x440
> [   17.047790]  [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> [   17.053352]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.059205]  [<c2825b08>] do_page_fault+0x8/0x10
> [   17.064375]  [<c2823066>] error_code+0x5a/0x60
> [   17.069354]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.075204]  [<c208d6a0>] ? print_modules+0x20/0x90
> [   17.080669]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> [   17.086619]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   17.092082]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> [   17.097546]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.103399]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> [   17.109154]  [<c282553a>] vmalloc_fault+0x5a/0xcf

Yep, the WARN_ON() triggered in vmalloc_fault(). We shouldn't worry
about warning in_nmi() for vmalloc faults anymore.


> [   17.114422]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> [   17.120080]  [<c206b93d>] ? update_group_power+0x1fd/0x240
> [   17.126224]  [<c227827b>] ? number.isra.2+0x32b/0x330
> [   17.131880]  [<c20679bc>] ? update_curr+0xac/0x190
> [   17.137247]  [<c227827b>] ? number.isra.2+0x32b/0x330
> [   17.142905]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.148755]  [<c2825b08>] do_page_fault+0x8/0x10
> [   17.153926]  [<c2823066>] error_code+0x5a/0x60
> [   17.158905]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> [   17.164760]  [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> [   17.170999]  [<c208dddb>] kallsyms_lookup+0x9b/0xb0

Looks like kallsyms_lookup() faulted?

> [   17.176462]  [<c208de1d>] __sprint_symbol+0x2d/0xd0
> [   17.181926]  [<c22790cc>] ? sprintf+0x1c/0x20
> [   17.186804]  [<c208def4>] sprint_symbol+0x14/0x20
> [   17.192063]  [<c208df1e>] __print_symbol+0x1e/0x40
> [   17.197430]  [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> [   17.202895]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> [   17.208845]  [<c205bdf5>] ? up+0x25/0x40
> [   17.213242]  [<c2039cb7>] ? console_unlock+0x337/0x440
> [   17.218998]  [<c2818236>] ? printk+0x38/0x3a
> [   17.223782]  [<c20006d0>] __show_regs+0x70/0x190
> [   17.228954]  [<c200353a>] show_regs+0x3a/0x1b0
> [   17.233931]  [<c2818236>] ? printk+0x38/0x3a
> [   17.238717]  [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> [   17.246413]  [<c2823919>] nmi_handle.isra.0+0x39/0x60
> [   17.252071]  [<c2823a29>] do_nmi+0xe9/0x3f0

Start here and read upward.

Can you try this patch:

From 794197cf3f563d36e5ee5b29cbf8e941163f9bc9 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Tue, 15 Oct 2013 12:34:56 -0400
Subject: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault

Since the NMI iretq nesting has been fixed, there's no reason that
an NMI handler can not take a page fault for vmalloc'd code. No locks
are taken in that code path, and the software now handles nested NMIs
when the fault re-enables NMIs on iretq.

Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
warn on triggers a vmalloc fault for some reason, then we can go into
an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
the variable to make it happen "once").

Reported-by: "Liu, Chuansheng" <chuansheng.liu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/mm/fault.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 3aaeffc..78926c6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
 	if (!(address >= VMALLOC_START && address < VMALLOC_END))
 		return -1;
 
-	WARN_ON_ONCE(in_nmi());
-
 	/*
 	 * Synchronize this task's top level page-table
 	 * with the 'reference' page table.
-- 
1.8.1.4



* RE: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-15 16:40 ` Steven Rostedt
@ 2013-10-16  1:54   ` Liu, Chuansheng
  2013-10-16  2:07     ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Liu, Chuansheng @ 2013-10-16  1:54 UTC
  To: Steven Rostedt
  Cc: Ingo Molnar (mingo@kernel.org), hpa@zytor.com, fweisbec@gmail.com,
	akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
	Peter Zijlstra (peterz@infradead.org), x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

Hello Steven,

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> Sent: Wednesday, October 16, 2013 12:40 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar (mingo@kernel.org); hpa@zytor.com; fweisbec@gmail.com;
> akpm@linux-foundation.org; paulmck@linux.vnet.ibm.com; Peter Zijlstra
> (peterz@infradead.org); x86@kernel.org; 'linux-kernel@vger.kernel.org'
> (linux-kernel@vger.kernel.org); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace handler
> 
> 
> BTW, please do not send out HTML email, as that gets blocked from going
> to LKML.
Thanks for the reminder; I forgot to convert it to a plain-text email.

> 
> On Tue, 15 Oct 2013 02:01:04 +0000
> "Liu, Chuansheng" <chuansheng.liu@intel.com> wrote:
> 
> > We meet one issue that during trigger all CPU backtrace, but during in the NMI handler arch_trigger_all_cpu_backtrace_handler,
> > It hit the PAGE fault, then PAGE fault is in loop, at last the thread stack overflow, and system panic.
> >
> > Anyone can give some help? Thanks.
> >
> >
> > Panic log as below:
> > ===============
> > [   15.069144] BUG: unable to handle kernel [   15.073635] paging request at 1649736d
> > [   15.076379] IP: [<c200402a>] print_context_stack+0x4a/0xa0
> > [   15.082529] *pde = 00000000
> > [   15.085758] Thread overran stack, or stack corrupted
> > [   15.091303] Oops: 0000 [#1] SMP
> > [   15.094932] Modules linked in: atomisp_css2400b0_v2(+) lm3554 ov2722 imx1x5 atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm43241 kct_daemon(O)
> > [   15.111093] CPU: 2 PID: 2443 Comm: Compiler Tainted: G        W  O 3.10.1+ #1
> 
> I'm curious, what "Out-of-tree" module was loaded?
We do indeed have some modules that are not upstream :)

> 
> Read the rest from the bottom up, as that's how I wrote it :-)
> 
> 
> > [   15.119075] task: f213f980 ti: f0c42000 task.ti: f0c42000
> > [   15.125116] EIP: 0060:[<c200402a>] EFLAGS: 00210087 CPU: 2
> > [   15.131255] EIP is at print_context_stack+0x4a/0xa0
> > [   15.136712] EAX: 16497ffc EBX: 1649736d ECX: 986736d8 EDX: 1649736d
> > [   15.143722] ESI: 00000000 EDI: ffffe000 EBP: f0c4220c ESP: f0c421ec
> > [   15.150732]  DS: 007b ES: 007b FS: 00d8 GS: 003b SS: 0068
> > [   15.156771] CR0: 80050033 CR2: 1649736d CR3: 31245000 CR4: 001007d0
> > [   15.163781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > [   15.170789] DR6: ffff0ff0 DR7: 00000400
> > [   15.175076] Stack:
> > [   15.177324]  16497ffc 16496000 986736d8 ffffe000 986736d8 1649736d c282c148 16496000
> > [   15.186067]  f0c4223c c20033b0 c282c148 c29ceecf 00000000 f0c4222c 986736d8 f0c4222c
> > [   15.194810]  00000000 c29ceecf 00000000 00000000 f0c42260 c20041a7 f0c4229c c282c148
> > [   15.203549] Call Trace:
> > [   15.206295]  [<c20033b0>] dump_trace+0x70/0xf0
> > [   15.211274]  [<c20041a7>] show_trace_log_lvl+0x47/0x60
> > [   15.217028]  [<c2003482>] show_stack_log_lvl+0x52/0xd0
> > [   15.222782]  [<c2004201>] show_stack+0x21/0x50
> > [   15.227762]  [<c281b38b>] dump_stack+0x16/0x18
> > [   15.232742]  [<c2037cff>] warn_slowpath_common+0x5f/0x80
> > [   15.238693]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   15.244156]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   15.249621]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   15.255472]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [   15.261228]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> > [   15.266497]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [   15.272154]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [   15.278106]  [<c2039c94>] ? console_unlock+0x314/0x440
> > ... //
> > [   16.885364]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   16.891217]  [<c2825b08>] do_page_fault+0x8/0x10
> > [   16.896387]  [<c2823066>] error_code+0x5a/0x60
> > [   16.901367]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   16.907219]  [<c208d6a0>] ? print_modules+0x20/0x90
> > [   16.912685]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [   16.918634]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   16.924097]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   16.929562]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   16.935415]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [   16.941169]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> > [   16.946437]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [   16.952095]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [   16.958046]  [<c2039c94>] ? console_unlock+0x314/0x440
> > [   16.963800]  [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> > [   16.969362]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   16.975215]  [<c2825b08>] do_page_fault+0x8/0x10
> > [   16.980386]  [<c2823066>] error_code+0x5a/0x60
> > [   16.985366]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   16.991215]  [<c208d6a0>] ? print_modules+0x20/0x90
> > [   16.996673]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [   17.002622]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   17.008086]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   17.013550]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.019403]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [   17.025159]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> 
> Oh look, we are constantly warning about this same fault! There's your
> infinite loop.
Yes, this is indeed the WARN_ON infinite loop.

> 
> Note the WARN_ON_ONCE() does the WARN_ON() first and then updates
> __warned = true. Thus, if the WARN_ON() itself faults, then we are in
> an infinite loop.
> 
> > [   17.030428]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [   17.036085]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [   17.042037]  [<c2039c94>] ? console_unlock+0x314/0x440
> > [   17.047790]  [<c2003e62>] ? sys_modify_ldt+0x2/0x160
> > [   17.053352]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.059205]  [<c2825b08>] do_page_fault+0x8/0x10
> > [   17.064375]  [<c2823066>] error_code+0x5a/0x60
> > [   17.069354]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.075204]  [<c208d6a0>] ? print_modules+0x20/0x90
> > [   17.080669]  [<c2037cfa>] warn_slowpath_common+0x5a/0x80
> > [   17.086619]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   17.092082]  [<c282553a>] ? vmalloc_fault+0x5a/0xcf
> > [   17.097546]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.103399]  [<c2037d3d>] warn_slowpath_null+0x1d/0x20
> > [   17.109154]  [<c282553a>] vmalloc_fault+0x5a/0xcf
> 
> Yep, the WARN_ON() triggered in vmalloc_fault(). We shouldn't worry
> about warning in_nmi() for vmalloc faults anymore.
Got it.

> 
> 
> > [   17.114422]  [<c282592f>] __do_page_fault+0x2cf/0x4a0
> > [   17.120080]  [<c206b93d>] ? update_group_power+0x1fd/0x240
> > [   17.126224]  [<c227827b>] ? number.isra.2+0x32b/0x330
> > [   17.131880]  [<c20679bc>] ? update_curr+0xac/0x190
> > [   17.137247]  [<c227827b>] ? number.isra.2+0x32b/0x330
> > [   17.142905]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.148755]  [<c2825b08>] do_page_fault+0x8/0x10
> > [   17.153926]  [<c2823066>] error_code+0x5a/0x60
> > [   17.158905]  [<c2825b00>] ? __do_page_fault+0x4a0/0x4a0
> > [   17.164760]  [<c208d1a9>] ? module_address_lookup+0x29/0xb0
> > [   17.170999]  [<c208dddb>] kallsyms_lookup+0x9b/0xb0
> 
> Looks like kallsyms_lookup() faulted?
> 
> > [   17.176462]  [<c208de1d>] __sprint_symbol+0x2d/0xd0
> > [   17.181926]  [<c22790cc>] ? sprintf+0x1c/0x20
> > [   17.186804]  [<c208def4>] sprint_symbol+0x14/0x20
> > [   17.192063]  [<c208df1e>] __print_symbol+0x1e/0x40
> > [   17.197430]  [<c25e00d7>] ? ashmem_shrink+0x77/0xf0
> > [   17.202895]  [<c25e13e0>] ? logger_aio_write+0x230/0x230
> > [   17.208845]  [<c205bdf5>] ? up+0x25/0x40
> > [   17.213242]  [<c2039cb7>] ? console_unlock+0x337/0x440
> > [   17.218998]  [<c2818236>] ? printk+0x38/0x3a
> > [   17.223782]  [<c20006d0>] __show_regs+0x70/0x190
> > [   17.228954]  [<c200353a>] show_regs+0x3a/0x1b0
> > [   17.233931]  [<c2818236>] ? printk+0x38/0x3a
> > [   17.238717]  [<c2824182>] arch_trigger_all_cpu_backtrace_handler+0x62/0x80
> > [   17.246413]  [<c2823919>] nmi_handle.isra.0+0x39/0x60
> > [   17.252071]  [<c2823a29>] do_nmi+0xe9/0x3f0
> 
> Start here and read upward.
> 
> Can you try this patch:
> 
> From 794197cf3f563d36e5ee5b29cbf8e941163f9bc9 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> Date: Tue, 15 Oct 2013 12:34:56 -0400
> Subject: [PATCH] x86: Remove WARN_ON(in_nmi()) from vmalloc_fault
> 
> Since the NMI iretq nesting has been fixed, there's no reason that
I think your patch fixes the infinite loop; we will test it soon.
BTW, we are using 3.10. Could you point out which NMI iretq nesting patch you mean?
Thanks.

> an NMI handler can not take a page fault for vmalloc'd code. No locks
> are taken in that code path, and the software now handles nested NMIs
> when the fault re-enables NMIs on iretq.
> 
> Not only that, if the vmalloc_fault() WARN_ON_ONCE() is hit, and that
> warn on triggers a vmalloc fault for some reason, then we can go into
> an infinite loop (the WARN_ON_ONCE() does the WARN() before updating
> the variable to make it happen "once").
> 
> Reported-by: "Liu, Chuansheng" <chuansheng.liu@intel.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  arch/x86/mm/fault.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 3aaeffc..78926c6 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -268,8 +268,6 @@ static noinline __kprobes int vmalloc_fault(unsigned long address)
>  	if (!(address >= VMALLOC_START && address < VMALLOC_END))
>  		return -1;
> 
> -	WARN_ON_ONCE(in_nmi());
> -
>  	/*
>  	 * Synchronize this task's top level page-table
>  	 * with the 'reference' page table.
> --
> 1.8.1.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-16  1:54   ` Liu, Chuansheng
@ 2013-10-16  2:07     ` Steven Rostedt
  2013-10-16  2:09       ` Liu, Chuansheng
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2013-10-16  2:07 UTC (permalink / raw)
  To: Liu, Chuansheng
  Cc: Ingo Molnar (mingo@kernel.org), hpa@zytor.com, fweisbec@gmail.com,
	akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
	Peter Zijlstra (peterz@infradead.org), x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

On Wed, 16 Oct 2013 01:54:51 +0000
"Liu, Chuansheng" <chuansheng.liu@intel.com> wrote:


> > Since the NMI iretq nesting has been fixed, there's no reason that
> I think your patch fixes the infinite loop; we will test it soon.
> BTW, we are using 3.10. Could you point out which NMI iretq nesting patch you mean?

There were many. You can read about what was done here:

 https://lwn.net/Articles/484932/

The original is here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517

But more were added.

But that is back in 3.3, so 3.10 has all the required updates.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Panic and page fault in loop during handling NMI backtrace handler
  2013-10-16  2:07     ` Steven Rostedt
@ 2013-10-16  2:09       ` Liu, Chuansheng
  0 siblings, 0 replies; 13+ messages in thread
From: Liu, Chuansheng @ 2013-10-16  2:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar (mingo@kernel.org), hpa@zytor.com, fweisbec@gmail.com,
	akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
	Peter Zijlstra (peterz@infradead.org), x86@kernel.org,
	'linux-kernel@vger.kernel.org' (linux-kernel@vger.kernel.org),
	Wang, Xiaoming, Li, Zhuangzhi

Hello Steven,

> -----Original Message-----
> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> Sent: Wednesday, October 16, 2013 10:08 AM
> To: Liu, Chuansheng
> Cc: Ingo Molnar (mingo@kernel.org); hpa@zytor.com; fweisbec@gmail.com;
> akpm@linux-foundation.org; paulmck@linux.vnet.ibm.com; Peter Zijlstra
> (peterz@infradead.org); x86@kernel.org; 'linux-kernel@vger.kernel.org'
> (linux-kernel@vger.kernel.org); Wang, Xiaoming; Li, Zhuangzhi
> Subject: Re: Panic and page fault in loop during handling NMI backtrace handler
> 
> On Wed, 16 Oct 2013 01:54:51 +0000
> "Liu, Chuansheng" <chuansheng.liu@intel.com> wrote:
> 
> 
> > > Since the NMI iretq nesting has been fixed, there's no reason that
> > I think your patch fixes the infinite loop; we will test it soon.
> > BTW, we are using 3.10. Could you point out which NMI iretq nesting patch you mean?
> 
> There were many. You can read about what was done here:
> 
>  https://lwn.net/Articles/484932/
> 
> The original is here:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ccd49c2391773ffbf52bb80d75c4a92b16972517
> 
> But more were added.
> 
> But that is back in 3.3, so 3.10 has all the required updates.
Thanks for the info; we are trying your patch now.

> 
> -- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-10-16  2:09 UTC | newest]

Thread overview: 13+ messages
     [not found] <27240C0AC20F114CBF8149A2696CBE4A01B60835@SHSMSX101.ccr.corp.intel.com>
2013-10-15 12:18 ` Panic and page fault in loop during handling NMI backtrace handler Frederic Weisbecker
2013-10-15 12:37   ` Peter Zijlstra
2013-10-15 12:48     ` Paul E. McKenney
2013-10-15 12:59       ` Frederic Weisbecker
2013-10-15 13:06         ` Paul E. McKenney
2013-10-15 13:17           ` Frederic Weisbecker
2013-10-15 13:27             ` Paul E. McKenney
2013-10-15 12:54     ` Frederic Weisbecker
2013-10-15 14:19       ` Steven Rostedt
2013-10-15 16:40 ` Steven Rostedt
2013-10-16  1:54   ` Liu, Chuansheng
2013-10-16  2:07     ` Steven Rostedt
2013-10-16  2:09       ` Liu, Chuansheng
