All of lore.kernel.org
 help / color / mirror / Atom feed
From: Xunlei Pang <xpang@redhat.com>
To: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>,
	"'xlpang@redhat.com'" <xlpang@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Dave Young" <dyoung@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Baoquan He" <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>, Toshi Kani <toshi.kani@hpe.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Minfei Huang <mnfhuang@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Daniel Walker <dwalker@fifo99.com>,
	Ingo Molnar <mingo@kernel.org>,
	Takao Indoh <indou.takao@jp.fujitsu.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"Lee, Chun-Yi" <joeyli.kernel@gmail.com>,
	Borislav Petkov <bp@suse.de>, Vivek Goyal <vgoyal@redhat.com>,
	Petr Mladek <pmladek@suse.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>,
	Masami Hiramatsu <mhiramat@kernel.org>, Tejun Heo <tj@kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>
Subject: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version
Date: Tue, 12 Jul 2016 15:26:06 +0800	[thread overview]
Message-ID: <57849B8E.2070007@redhat.com> (raw)
In-Reply-To: <04EAB7311EE43145B2D3536183D1A84454C707D6@GSjpTKYDCembx31.service.hitachi.net>

On 2016/07/12 at 15:12, 河合英宏 / KAWAI,HIDEHIRO wrote:
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Xunlei Pang
>> Sent: Tuesday, July 12, 2016 3:57 PM
>> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
>>> Hi Xunlei,
>>>
>>> Thanks for the review.
>>>
>>>> From: Xunlei Pang [mailto:xpang@redhat.com]
>>>> Sent: Tuesday, July 12, 2016 12:12 PM
>>>> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
>>>>> This patch fixes one of the problems reported by Daniel Walker
>>>>> (https://lkml.org/lkml/2015/6/24/44).
>>>>>
>>>>> If crash_kexec_post_notifiers boot option is specified, other CPUs
>>>>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
>>>>> in crash_kexec() path.  This behavior change leads two problems.
>>>>>
>>>>>  Problem 1:
>>>>>  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
>>>>>  still online and try to stop their watchdog timer.  If
>>>>>  smp_send_stop() is called before octeon_generic_shutdown(), stopping
>>>>>  watchdog timer will fail because other CPUs have been offlined by
>>>>>  smp_send_stop().
>>>>>
>>>>>    panic()
>>>>>      if crash_kexec_post_notifiers == 1
>>>>>        smp_send_stop()
>>>>>        atomic_notifier_call_chain()
>>>>>        kmsg_dump()
>>>>>      crash_kexec()
>>>>>        machine_crash_shutdown()
>>>>>          octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
>>>>>
>>>>>  Problem 2:
>>>>>  Most of architectures stop other CPUs in machine_crash_shutdown()
>>>>>  path, and they also do something needed for kdump.  For example,
>>>>>  they save registers, disable virtualization extensions, and so on.
>>>>>  However, if smp_send_stop() stops other CPUs before
>>>>>  machine_crash_shutdown(), we miss those operations.
>>>>>
>>>>> How do we fix these problems?  In the first place, we should stop
>>>>> other CPUs as soon as possible when panic() was called, otherwise
>>>>> other CPUs may wipe out a clue to the cause of the failure.  So, we
>>>>> replace smp_send_stop() with more suitable one for kdump.
>>>>>
>>>>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
>>>>> with panic_smp_send_stop().  This is a weak function which calls
>>>>> smp_send_stop(), and architecture dependent code may override this
>>>>> with appropriate one.  This patch only provides x86-specific version.
>>>>>
>>>>> Changes in V3:
>>>>> - Revise comments, description, and symbol names
>>>>>
>>>>> Changes in V2:
>>>>> - Replace smp_send_stop() call with crash_kexec version which
>>>>>   saves cpu states and cleans up VMX/SVM
>>>>> - Drop a fix for Problem 1 at this moment
>>>>>
>>>>> Reported-by: Daniel Walker <dwalker@fifo99.com>
>>>>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
>>>>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>>>>> Cc: Dave Young <dyoung@redhat.com>
>>>>> Cc: Baoquan He <bhe@redhat.com>
>>>>> Cc: Vivek Goyal <vgoyal@redhat.com>
>>>>> Cc: Eric Biederman <ebiederm@xmission.com>
>>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Borislav Petkov <bp@suse.de>
>>>>> Cc: Toshi Kani <toshi.kani@hpe.com>
>>>>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>>>>> Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
>>>>> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
>>>>> Cc: Minfei Huang <mnfhuang@gmail.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>> Cc: Petr Mladek <pmladek@suse.com>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
>>>>> ---
>>>>>  arch/x86/kernel/crash.c |   14 ++++++++++----
>>>>>  kernel/panic.c          |   43 ++++++++++++++++++++++++++++++++-----------
>>>>>  2 files changed, 42 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>>>> index 9ef978d..3305433 100644
>>>>> --- a/arch/x86/kernel/crash.c
>>>>> +++ b/arch/x86/kernel/crash.c
>>>>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>>>>>  	disable_local_APIC();
>>>>>  }
>>>>>
>>>>> -static void kdump_nmi_shootdown_cpus(void)
>>>>> +/* Override the weak function in kernel/panic.c */
>>>>> +void panic_smp_send_stop(void)
>>>>>  {
>>>>> -	nmi_shootdown_cpus(kdump_nmi_callback);
>>>>> +	static int cpus_stopped;
>>>> Should be atomic_t type?
>>> panic_smp_send_stop() can be called by only one panicking CPU
>>> (but can be called twice). It is sufficient to be normal variable.
>> There are other call sites of __crash_kexec() for oops cases, which can
>> call panic_smp_send_stop() concurrently on a different cpu.
> In oops cases, crash_kexec() is called first, then __crash_kexec() is
> called.  crash_kexec() excludes concurrent execution of panic() and
> crash_kexec() via panic_cpu, so panic_smp_send_stop() shouldn't be
> called concurrently.

Right, that's why oops calls crash_kexec() not __crash_kexec().
I have no problem on this. Thanks!

Regards,
Xunlei

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: Xunlei Pang <xpang@redhat.com>
To: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>,
	"'xlpang@redhat.com'" <xlpang@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Dave Young" <dyoung@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Baoquan He" <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>, Toshi Kani <toshi.kani@hpe.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Minfei Huang <mnfhuang@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Daniel Walker <dwalker@fifo99.com>,
	Ingo Molnar <mingo@kernel.org>,
	Takao Indoh <indou.takao@jp.fujitsu.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"Lee, Chun-Yi" <joeyli.kernel@gmail.com>,
	Borislav Petkov <bp@suse.de>, Vivek Goyal <vgoyal@redhat.com>,
	Petr Mladek <pmladek@suse.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>,
	Masami Hiramatsu <mhiramat@kernel.org>, Tejun Heo <tj@kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>
Subject: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version
Date: Tue, 12 Jul 2016 15:26:06 +0800	[thread overview]
Message-ID: <57849B8E.2070007@redhat.com> (raw)
In-Reply-To: <04EAB7311EE43145B2D3536183D1A84454C707D6@GSjpTKYDCembx31.service.hitachi.net>

On 2016/07/12 at 15:12, 河合英宏 / KAWAI,HIDEHIRO wrote:
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Xunlei Pang
>> Sent: Tuesday, July 12, 2016 3:57 PM
>> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
>>> Hi Xunlei,
>>>
>>> Thanks for the review.
>>>
>>>> From: Xunlei Pang [mailto:xpang@redhat.com]
>>>> Sent: Tuesday, July 12, 2016 12:12 PM
>>>> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
>>>>> This patch fixes one of the problems reported by Daniel Walker
>>>>> (https://lkml.org/lkml/2015/6/24/44).
>>>>>
>>>>> If crash_kexec_post_notifiers boot option is specified, other CPUs
>>>>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
>>>>> in crash_kexec() path.  This behavior change leads two problems.
>>>>>
>>>>>  Problem 1:
>>>>>  octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
>>>>>  still online and try to stop their watchdog timer.  If
>>>>>  smp_send_stop() is called before octeon_generic_shutdown(), stopping
>>>>>  watchdog timer will fail because other CPUs have been offlined by
>>>>>  smp_send_stop().
>>>>>
>>>>>    panic()
>>>>>      if crash_kexec_post_notifiers == 1
>>>>>        smp_send_stop()
>>>>>        atomic_notifier_call_chain()
>>>>>        kmsg_dump()
>>>>>      crash_kexec()
>>>>>        machine_crash_shutdown()
>>>>>          octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
>>>>>
>>>>>  Problem 2:
>>>>>  Most of architectures stop other CPUs in machine_crash_shutdown()
>>>>>  path, and they also do something needed for kdump.  For example,
>>>>>  they save registers, disable virtualization extensions, and so on.
>>>>>  However, if smp_send_stop() stops other CPUs before
>>>>>  machine_crash_shutdown(), we miss those operations.
>>>>>
>>>>> How do we fix these problems?  In the first place, we should stop
>>>>> other CPUs as soon as possible when panic() was called, otherwise
>>>>> other CPUs may wipe out a clue to the cause of the failure.  So, we
>>>>> replace smp_send_stop() with more suitable one for kdump.
>>>>>
>>>>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
>>>>> with panic_smp_send_stop().  This is a weak function which calls
>>>>> smp_send_stop(), and architecture dependent code may override this
>>>>> with appropriate one.  This patch only provides x86-specific version.
>>>>>
>>>>> Changes in V3:
>>>>> - Revise comments, description, and symbol names
>>>>>
>>>>> Changes in V2:
>>>>> - Replace smp_send_stop() call with crash_kexec version which
>>>>>   saves cpu states and cleans up VMX/SVM
>>>>> - Drop a fix for Problem 1 at this moment
>>>>>
>>>>> Reported-by: Daniel Walker <dwalker@fifo99.com>
>>>>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
>>>>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>>>>> Cc: Dave Young <dyoung@redhat.com>
>>>>> Cc: Baoquan He <bhe@redhat.com>
>>>>> Cc: Vivek Goyal <vgoyal@redhat.com>
>>>>> Cc: Eric Biederman <ebiederm@xmission.com>
>>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Borislav Petkov <bp@suse.de>
>>>>> Cc: Toshi Kani <toshi.kani@hpe.com>
>>>>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>>>>> Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
>>>>> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
>>>>> Cc: Minfei Huang <mnfhuang@gmail.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>> Cc: Petr Mladek <pmladek@suse.com>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
>>>>> ---
>>>>>  arch/x86/kernel/crash.c |   14 ++++++++++----
>>>>>  kernel/panic.c          |   43 ++++++++++++++++++++++++++++++++-----------
>>>>>  2 files changed, 42 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>>>> index 9ef978d..3305433 100644
>>>>> --- a/arch/x86/kernel/crash.c
>>>>> +++ b/arch/x86/kernel/crash.c
>>>>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>>>>>  	disable_local_APIC();
>>>>>  }
>>>>>
>>>>> -static void kdump_nmi_shootdown_cpus(void)
>>>>> +/* Override the weak function in kernel/panic.c */
>>>>> +void panic_smp_send_stop(void)
>>>>>  {
>>>>> -	nmi_shootdown_cpus(kdump_nmi_callback);
>>>>> +	static int cpus_stopped;
>>>> Should be atomic_t type?
>>> panic_smp_send_stop() can be called by only one panicking CPU
>>> (but can be called twice). It is sufficient to be normal variable.
>> There are other call sites of __crash_kexec() for oops cases, which can
>> call panic_smp_send_stop() concurrently on a different cpu.
> In oops cases, crash_kexec() is called first, then __crash_kexec() is
> called.  crash_kexec() excludes concurrent execution of panic() and
> crash_kexec() via panic_cpu, so panic_smp_send_stop() shouldn't be
> called concurrently.

Right, that's why oops calls crash_kexec() not __crash_kexec().
I have no problem on this. Thanks!

Regards,
Xunlei

  reply	other threads:[~2016-07-12  7:26 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-05 11:33 [V3 PATCH 0/2] kexec: crash_kexec_post_notifiers boot option related fixes Hidehiro Kawai
2016-07-05 11:33 ` Hidehiro Kawai
2016-07-05 11:33 ` [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version Hidehiro Kawai
2016-07-05 11:33   ` Hidehiro Kawai
2016-07-11  8:34   ` Dave Young
2016-07-11  8:34     ` Dave Young
2016-07-12  2:49     ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  2:49       ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-13  2:03       ` 'Dave Young'
2016-07-15 11:50         ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-15 11:50           ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-18  9:02           ` 'Dave Young'
2016-07-18  9:02             ` 'Dave Young'
2016-07-19  5:51             ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-19  5:51               ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-19  6:52               ` 'Dave Young'
2016-07-19  6:52                 ` 'Dave Young'
2016-07-19 11:23                 ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-19 11:23                   ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  3:11   ` Xunlei Pang
2016-07-12  3:11     ` Xunlei Pang
2016-07-12  3:56     ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  3:56       ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  6:57       ` Xunlei Pang
2016-07-12  6:57         ` Xunlei Pang
2016-07-12  7:12         ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  7:12           ` 河合英宏 / KAWAI,HIDEHIRO
2016-07-12  7:26           ` Xunlei Pang [this message]
2016-07-12  7:26             ` Xunlei Pang
2016-07-05 11:33 ` [V3 PATCH 2/2] kexec: Use core_param for crash_kexec_post_notifiers boot option Hidehiro Kawai
2016-07-05 11:33   ` Hidehiro Kawai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57849B8E.2070007@redhat.com \
    --to=xpang@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=bp@suse.de \
    --cc=d.hatayama@jp.fujitsu.com \
    --cc=dwalker@fifo99.com \
    --cc=dyoung@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=hpa@zytor.com \
    --cc=indou.takao@jp.fujitsu.com \
    --cc=joeyli.kernel@gmail.com \
    --cc=jpoimboe@redhat.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mnfhuang@gmail.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=toshi.kani@hpe.com \
    --cc=vgoyal@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.