public inbox for linux-kernel@vger.kernel.org
* [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
@ 2015-05-13 14:14 王龙
  2015-05-13 14:22 ` Steven Rostedt
  2015-05-13 14:26 ` Jiri Kosina
  0 siblings, 2 replies; 5+ messages in thread
From: 王龙 @ 2015-05-13 14:14 UTC
  To: rostedt, jkosina, paulmck, pmladek, dzickus
  Cc: johannes, koct9i, tglx, mingo, hpa, x86, atomlin, akpm,
	sasha.levin, linux-kernel, peifeiyue, long.wanglong, morgan.wang


Hi all,

In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
it triggers an NMI on each CPU, and each handler calls show_regs(). But this
can lead to a hard lockup if the NMI interrupts another printk() in progress.
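
To make the failure mode concrete, here is a minimal user-space sketch in C. It is only a model: printk_lock, printk_trylock(), nmi_printk_would_deadlock() and nmi_during_printk() are hypothetical names, and a C11 atomic flag stands in for whatever lock the real printk() holds. The real NMI path would spin on the lock; the sketch uses a trylock so that it terminates and can report the condition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock that printk() takes internally. */
static atomic_flag printk_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock once; returns true on success. */
static bool printk_trylock(void)
{
    return !atomic_flag_test_and_set(&printk_lock);
}

static void printk_unlock(void)
{
    atomic_flag_clear(&printk_lock);
}

/*
 * Called from the modelled NMI handler. Returns true if a printk() issued
 * right now would spin forever, i.e. the lock is already held.
 */
static bool nmi_printk_would_deadlock(void)
{
    if (printk_trylock()) {
        printk_unlock();
        return false;
    }
    return true;
}

/* Model an NMI landing on a CPU that is in the middle of printk(). */
static bool nmi_during_printk(void)
{
    bool got, stuck;

    got = printk_trylock();              /* interrupted context: inside printk() */
    assert(got);
    (void)got;

    stuck = nmi_printk_would_deadlock(); /* NMI handler: show_regs() -> printk() */

    printk_unlock();                     /* reachable only because the model does not spin */
    return stuck;
}
```

nmi_during_printk() returns true: the lock owner is the interrupted context on the same CPU, which cannot run again until the NMI handler returns, so a spinning handler would never make progress.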

Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
triggers, it switches the printk routine for that CPU to an NMI-safe printk
function that records the output in a per_cpu seq_buf descriptor. After all
NMIs have finished recording their data, the seq_bufs are printed from a safe
context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
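
As a rough illustration of that mechanism, here is a user-space model in C. It is a sketch under assumptions, not the code from the commit: a fixed-size struct nmi_buf stands in for the per_cpu seq_buf, printk_override[] stands in for the per-CPU printk routine switch, a string named console stands in for the real console, and the "NMIs" run one after another instead of concurrently. All identifiers below are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS      4    /* hypothetical CPU count for the model */
#define NMI_BUF_SIZE 256  /* hypothetical per-CPU buffer size */

/* Stand-in for the per_cpu seq_buf descriptor. */
struct nmi_buf {
    char   data[NMI_BUF_SIZE];
    size_t len;
};

typedef int (*printk_fn)(int cpu, const char *fmt, va_list ap);

static struct nmi_buf nmi_bufs[NR_CPUS];
static printk_fn printk_override[NR_CPUS];    /* NULL means "use the normal routine" */
static char console[NR_CPUS * NMI_BUF_SIZE];  /* stand-in for the real console */
static size_t console_len;

/* Append formatted text to dst, truncating if the buffer is full. */
static int append(char *dst, size_t cap, size_t *len, const char *fmt, va_list ap)
{
    size_t room = cap - *len;  /* always >= 1, reserved for the NUL */
    int n = vsnprintf(dst + *len, room, fmt, ap);

    if (n < 0)
        return n;
    if ((size_t)n >= room)
        n = (int)(room - 1);   /* output was truncated */
    *len += (size_t)n;
    return n;
}

/* Normal routine: in the kernel this path takes locks, so it is not NMI-safe. */
static int default_vprintk(int cpu, const char *fmt, va_list ap)
{
    (void)cpu;
    return append(console, sizeof(console), &console_len, fmt, ap);
}

/* NMI-safe routine: only touches the calling CPU's private buffer. */
static int nmi_vprintk(int cpu, const char *fmt, va_list ap)
{
    struct nmi_buf *b = &nmi_bufs[cpu];

    return append(b->data, sizeof(b->data), &b->len, fmt, ap);
}

/* Dispatch to whichever routine is currently installed for this CPU. */
static int model_printk(int cpu, const char *fmt, ...)
{
    va_list ap;
    printk_fn fn;
    int n;

    assert(cpu >= 0 && cpu < NR_CPUS);
    fn = printk_override[cpu] ? printk_override[cpu] : default_vprintk;
    va_start(ap, fmt);
    n = fn(cpu, fmt, ap);
    va_end(ap);
    return n;
}

/* What each CPU does inside its modelled NMI handler. */
static void nmi_backtrace_handler(int cpu)
{
    printk_override[cpu] = nmi_vprintk;                    /* switch routine */
    model_printk(cpu, "NMI backtrace for cpu %d\n", cpu);  /* stands in for show_regs() */
    printk_override[cpu] = NULL;                           /* restore */
}

/* Runs in a safe (non-NMI) context after every CPU has finished recording. */
static void flush_nmi_bufs(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        model_printk(cpu, "%s", nmi_bufs[cpu].data);
        memset(&nmi_bufs[cpu], 0, sizeof(nmi_bufs[cpu]));
    }
}

static void trigger_all_cpu_backtrace_model(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        nmi_backtrace_handler(cpu);
    flush_nmi_bufs();
}
```

Calling trigger_all_cpu_backtrace_model() from a main() leaves one line per CPU in console, and every one of those lines was written by the normal routine from the safe context rather than from inside the handler.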

Could anyone give me some ideas?

Best Regards
Wang Long


* Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
  2015-05-13 14:14 [RFC] how to perform a safe NMI stack trace on all CPUs on x86? 王龙
@ 2015-05-13 14:22 ` Steven Rostedt
  2015-05-14 11:15   ` long.wanglong
  2015-05-13 14:26 ` Jiri Kosina
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2015-05-13 14:22 UTC
  To: 王龙
  Cc: jkosina, paulmck, pmladek, dzickus, johannes, koct9i, tglx, mingo,
	hpa, x86, atomlin, akpm, sasha.levin, linux-kernel, peifeiyue,
	long.wanglong, morgan.wang

On Wed, 13 May 2015 22:14:54 +0800
"王龙" <wanglong@laoqinren.net> wrote:


> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
> 
> Could anyone give me some ideas?
> 

Backport the necessary patches.

-- Steve


* Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
  2015-05-13 14:14 [RFC] how to perform a safe NMI stack trace on all CPUs on x86? 王龙
  2015-05-13 14:22 ` Steven Rostedt
@ 2015-05-13 14:26 ` Jiri Kosina
  2015-05-14 11:20   ` long.wanglong
  1 sibling, 1 reply; 5+ messages in thread
From: Jiri Kosina @ 2015-05-13 14:26 UTC
  To: 王龙
  Cc: rostedt, paulmck, pmladek, dzickus, johannes, koct9i, tglx, mingo,
	hpa, x86, atomlin, akpm, sasha.levin, linux-kernel, peifeiyue,
	long.wanglong, morgan.wang

On Wed, 13 May 2015, 王龙 wrote:

> Hi all,
> 
> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
> it triggers an NMI on each CPU, and each handler calls show_regs(). But this
> can lead to a hard lockup if the NMI interrupts another printk() in progress.
> 
> Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
> triggers, it switches the printk routine for that CPU to an NMI-safe printk
> function that records the output in a per_cpu seq_buf descriptor. After all
> NMIs have finished recording their data, the seq_bufs are printed from a safe
> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
> 
> Could anyone give me some ideas?

Either you backport the seq_buf-based approach to the older kernel, or, if you
are working on a 3.4 kernel or earlier (basically any kernel preceding the
printk() revamp that happened in 7ff9554bb57 and after), you can use a
slightly simpler approach.

It's the approach we used initially when we first found the issue, and it
is proven to work as well (but it's not applicable after Kay added all the
complexity to printk()).

You can see it in our SLE11 kernel tree, available on
	
	http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

for example.

It's up to you to judge which is the least painful way :)

-- 
Jiri Kosina
SUSE Labs


* Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
  2015-05-13 14:22 ` Steven Rostedt
@ 2015-05-14 11:15   ` long.wanglong
  0 siblings, 0 replies; 5+ messages in thread
From: long.wanglong @ 2015-05-14 11:15 UTC
  To: Steven Rostedt
  Cc: 王龙, jkosina, paulmck, pmladek, dzickus, johannes,
	koct9i, tglx, mingo, hpa, x86, atomlin, akpm, sasha.levin,
	linux-kernel, peifeiyue, morgan.wang

On 2015/5/13 22:22, Steven Rostedt wrote:
> On Wed, 13 May 2015 22:14:54 +0800
> "王龙" <wanglong@laoqinren.net> wrote:
> 
> 
>> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
>> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
>>
>> Could anyone give me some ideas?
>>
> 
> Backport the necessary patches.
> 
> -- Steve
> 
Hi Steve,

Thank you for your reply. I will backport the necessary patches to 3.10 stable.
You are welcome to review my backport patches.

Best Regards
Wang Long




* Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?
  2015-05-13 14:26 ` Jiri Kosina
@ 2015-05-14 11:20   ` long.wanglong
  0 siblings, 0 replies; 5+ messages in thread
From: long.wanglong @ 2015-05-14 11:20 UTC
  To: Jiri Kosina
  Cc: 王龙, rostedt, paulmck, pmladek, dzickus, johannes,
	koct9i, tglx, mingo, hpa, x86, atomlin, akpm, sasha.levin,
	linux-kernel, peifeiyue, morgan.wang

On 2015/5/13 22:26, Jiri Kosina wrote:
> On Wed, 13 May 2015, 王龙 wrote:
> 
>> Hi all,
>>
>> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
>> it triggers an NMI on each CPU, and each handler calls show_regs(). But this
>> can lead to a hard lockup if the NMI interrupts another printk() in progress.
>>
>> Commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
>> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
>> triggers, it switches the printk routine for that CPU to an NMI-safe printk
>> function that records the output in a per_cpu seq_buf descriptor. After all
>> NMIs have finished recording their data, the seq_bufs are printed from a safe
>> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
>> 3.10 stable has neither the "switch printk routine" nor the "seq_buf" infrastructure.
>>
>> Could anyone give me some ideas?
> 
> Either you backport the seq_buf-based approach to the older kernel, or, if you
> are working on a 3.4 kernel or earlier (basically any kernel preceding the
> printk() revamp that happened in 7ff9554bb57 and after), you can use a
> slightly simpler approach.
> 
> It's the approach we used initially when we first found the issue, and it
> is proven to work as well (but it's not applicable after Kay added all the
> complexity to printk()).
> 
> You can see it in our SLE11 kernel tree, available on
> 	
> 	http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> for example.
> 
> It's up to you to judge which is the least painful way :)
> 

Hi Jiri Kosina,

For 3.10 stable, the only way to solve this problem is to backport the
seq_buf-based approach.

I will backport the necessary patches to 3.10 stable. You are welcome to
review my backport patches.

Best Regards
Wang Long




