From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
To: Jonathan Corbet <corbet@lwn.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>,
	linux-doc@vger.kernel.org, x86@kernel.org,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Michal Hocko <mhocko@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Subject: [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec
Date: Thu, 10 Dec 2015 10:46:24 +0900
Message-ID: <20151210014624.25437.50028.stgit@softrs>

When HA clustering software or an administrator detects that a host is
unresponsive, they issue an NMI to the host to completely stop its
current work and take a crash dump.  If the kernel has already panicked
or is capturing a crash dump at that time, a further NMI can cause
the crash dump to fail.

Also, crash_kexec() called from oops context can race with panic().

To solve these issues, this patch set does the following:

- Don't call panic() on NMI if the kernel has already panicked
- Extend the exclusion control currently done by panic_lock to
  crash_kexec()
- Introduce the "apic_extnmi=none" boot option, which masks external
  NMIs at boot time
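
The first two items above can be illustrated with a minimal userspace
sketch.  It is not the code from this series: it uses C11 atomics in
place of the kernel's atomic_cmpxchg(), and claim_panic_path(),
release_panic_path() and the claim_result values are hypothetical names
chosen for illustration.  Only panic_cpu and the compare-and-exchange
approach come from the change logs below.

```c
#include <stdatomic.h>

/* No CPU owns the panic/crash path yet. */
#define PANIC_CPU_INVALID (-1)

/* Hypothetical result codes, for illustration only. */
enum claim_result {
	CLAIM_WON,        /* this CPU is the first to enter the path */
	CLAIM_REENTERED,  /* this CPU already owns the path (e.g. NMI during panic) */
	CLAIM_LOST        /* another CPU owns the path; caller should stop itself */
};

/* Userspace stand-in for the kernel's panic_cpu variable. */
static atomic_int panic_cpu = PANIC_CPU_INVALID;

/*
 * Atomically try to record this_cpu as the owner of the panic/crash path.
 * Unlike a trylock, a failed exchange also tells us who the owner is, so
 * re-entrance on the same CPU can be told apart from a race with another.
 */
static enum claim_result claim_panic_path(int this_cpu)
{
	int expected = PANIC_CPU_INVALID;

	if (atomic_compare_exchange_strong(&panic_cpu, &expected, this_cpu))
		return CLAIM_WON;

	/* On failure, 'expected' now holds the current owner. */
	return expected == this_cpu ? CLAIM_REENTERED : CLAIM_LOST;
}

/* Give the path back, e.g. when no crash kernel is loaded. */
static void release_panic_path(void)
{
	atomic_store(&panic_cpu, PANIC_CPU_INVALID);
}
```

In this sketch, a caller on the NMI path would call panic() only on
CLAIM_WON and would stop itself on CLAIM_LOST, while a crash_kexec()-like
caller would proceed only on CLAIM_WON and release the path afterwards.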

Additionally, "apic_extnmi=all" is provided.  This option unmasks
external NMIs for all CPUs.  This helps trigger a kernel panic even if
CPU 0 can't handle an external NMI because it is hung up in NMI context
or because the NMI is consumed by other NMI handlers.
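
As a rough illustration of the option's semantics, the following
userspace sketch parses the option value and decides whether a given CPU
would have external NMIs unmasked.  It is not the code from PATCH 4/6:
parse_apic_extnmi(), extnmi_unmasked() and the enum are hypothetical
names, and the "bsp" value (external NMIs delivered to CPU 0 only) is an
assumption based on the kernel parameter documentation rather than on
this cover letter.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes mirroring the option values. */
enum extnmi_mode {
	EXTNMI_BSP,   /* assumed default: only CPU 0 receives external NMIs */
	EXTNMI_ALL,   /* "apic_extnmi=all": unmasked on every CPU */
	EXTNMI_NONE   /* "apic_extnmi=none": masked on every CPU */
};

/* Parse the option value; returns 0 on success, -1 on a bad value. */
static int parse_apic_extnmi(const char *arg, enum extnmi_mode *mode)
{
	if (!arg)
		return -1;

	if (strcmp(arg, "all") == 0)
		*mode = EXTNMI_ALL;
	else if (strcmp(arg, "none") == 0)
		*mode = EXTNMI_NONE;
	else if (strcmp(arg, "bsp") == 0)
		*mode = EXTNMI_BSP;
	else
		return -1;	/* unknown value: leave *mode untouched */

	return 0;
}

/* Would external NMIs be unmasked on this CPU under the given mode? */
static bool extnmi_unmasked(enum extnmi_mode mode, int cpu)
{
	switch (mode) {
	case EXTNMI_ALL:
		return true;
	case EXTNMI_NONE:
		return false;
	default:
		return cpu == 0;
	}
}
```

With "all", every CPU can take the external NMI, so a hung CPU 0 no
longer prevents the panic; with "none", no CPU takes it, which protects
a crash dump already in progress.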

This patch set can be applied to the current -tip tree.

V6:
- Update comments and patch descriptions all over the patch series
- Add documentation for kernel.panic_on_io_nmi sysctl (PATCH 6/6)
- Separate PATCH 5/6 from PATCH 2/6 because that portion is actually
  needed for the "apic_extnmi=all" case introduced by PATCH 4/6
- ...and various fixes (please see the change logs in each patch
  description for details)

V5: https://lkml.org/lkml/2015/11/20/228
- Use WRITE_ONCE() for crash_ipi_done to keep the instruction order
  (PATCH 2/4)
- Address concurrent unknown/external NMI case, too (PATCH 2/4)
- Fix build errors (PATCH 3/4)
- Rename the "noextnmi" boot option to "apic_extnmi" and extend its
  functionality (PATCH 4/4)

V4: https://lkml.org/lkml/2015/9/25/193
- Improve comments and descriptions (PATCH 1/4 to 3/4)
- Use the new __crash_kexec(), a version of crash_kexec() without the
  exclusion check, instead of checking whether panic_cpu is the
  current CPU (PATCH 3/4)

V3: https://lkml.org/lkml/2015/8/6/39
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another CPU has already panicked

V2: https://lkml.org/lkml/2015/7/27/31
- Use atomic_cmpxchg() instead of the current spin_trylock() to exclude
  concurrent accesses to panic() and crash_kexec()
- Don't introduce no-lock versions of panic() and crash_kexec()

V1: https://lkml.org/lkml/2015/7/22/81

---

Hidehiro Kawai (6):
      panic/x86: Fix re-entrance problem due to panic on NMI
      panic/x86: Allow CPUs to save registers even if they are looping in NMI context
      kexec: Fix race between panic() and crash_kexec() called directly
      x86/apic: Introduce apic_extnmi boot option
      x86/nmi: Fix to save registers for crash dump on external NMI broadcast
      Documentation: Add documentation for kernel.panic_on_io_nmi sysctl


 Documentation/kernel-parameters.txt |    9 +++++++++
 Documentation/sysctl/kernel.txt     |   15 +++++++++++++++
 arch/x86/include/asm/apic.h         |    5 +++++
 arch/x86/include/asm/reboot.h       |    1 +
 arch/x86/kernel/apic/apic.c         |   35 +++++++++++++++++++++++++++++++++--
 arch/x86/kernel/nmi.c               |   27 ++++++++++++++++++++++-----
 arch/x86/kernel/reboot.c            |   28 ++++++++++++++++++++++++++++
 include/linux/kernel.h              |   29 +++++++++++++++++++++++++++++
 include/linux/kexec.h               |    2 ++
 kernel/kexec_core.c                 |   30 +++++++++++++++++++++++++++++-
 kernel/panic.c                      |   29 ++++++++++++++++++++++++-----
 kernel/watchdog.c                   |    2 +-
 12 files changed, 198 insertions(+), 14 deletions(-)


-- 
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group




