stable.vger.kernel.org archive mirror
From: <gregkh@linuxfoundation.org>
To: sumit.semwal@linaro.org, alexander.levin@verizon.com,
	gregkh@linuxfoundation.org, haiyangz@microsoft.com,
	kys@microsoft.com, mingo@kernel.org, tglx@linutronix.de,
	vkuznets@redhat.com
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic" has been added to the 4.4-stable tree
Date: Tue, 28 Mar 2017 14:13:40 +0200
Message-ID: <149070322019214@kroah.com>
In-Reply-To: <1490458699-24484-5-git-send-email-sumit.semwal@linaro.org>


This is a note to let you know that I've just added the patch titled

    x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     x86-hyperv-handle-unknown-nmis-on-one-cpu-when-unknown_nmi_panic.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


>From foo@baz Tue Mar 28 13:59:27 CEST 2017
From: Sumit Semwal <sumit.semwal@linaro.org>
Date: Sat, 25 Mar 2017 21:48:04 +0530
Subject: x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
To: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>, devel@linuxdriverproject.org, Haiyang Zhang <haiyangz@microsoft.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@kernel.org>, Sasha Levin <alexander.levin@verizon.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Sumit Semwal <sumit.semwal@linaro.org>
Message-ID: <1490458699-24484-5-git-send-email-sumit.semwal@linaro.org>

From: Sumit Semwal <sumit.semwal@linaro.org>


From: Vitaly Kuznetsov <vkuznets@redhat.com>

[ Upstream commit 59107e2f48831daedc46973ce4988605ab066de3 ]

There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt')
which injects an NMI into the guest. We may want to crash the guest and do
kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we
need to allow the kdump kernel to re-establish the VMBus connection so it
will see VMBus devices (storage, network, ...).

To properly unload VMBus making it possible to start over during kdump we
need to do the following:

 - Send an 'unload' message to the hypervisor. This can be done on any CPU
   so we do this on the crashing CPU.

 - Receive the 'unload finished' reply message. WS2012R2 delivers this
   message to the CPU which was used to establish VMBus connection during
   module load and this CPU may differ from the CPU sending 'unload'.

Receiving a VMBus message means the following:

 - There is a per-CPU slot in memory for one message. This slot can in
   theory be accessed by any CPU.

 - We get an interrupt on the CPU when a message was placed into the slot.

 - When we read the message we need to clear the slot and signal the fact
   to the hypervisor. In case there are more messages to this CPU pending
   the hypervisor will deliver the next message. The signaling is done by
   writing to an MSR so this can only be done on the appropriate CPU.

To avoid doing cross-CPU work on crash we have the vmbus_wait_for_unload()
function, which checks the message slots of all CPUs in a loop, waiting for
the 'unload finished' message. However, there is an issue which arises when
these conditions are met:

 - We're crashing on a CPU which is different from the one which was used
   to initially contact the hypervisor.

 - The CPU which was used for the initial contact is blocked with interrupts
   disabled and there is a message pending in the message slot.

In this case we won't be able to read the 'unload finished' message on the
crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs
simultaneously: the first CPU entering panic() will proceed to crash and
all other CPUs will stop themselves with interrupts disabled.
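
The failure mode above can be sketched in plain user-space C. This is only
an illustration, not the driver code: every sketch_* name, the slot layout
and the bounded round count are assumptions made for the example (the real
vmbus_wait_for_unload() works on hypervisor-provided message pages). It
shows that a poll over all per-CPU slots finds the reply wherever it lands,
but never finds it while an unread message still occupies the slot of a
CPU that cannot service it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types; the real VMBus values differ. */
enum sketch_msgtype {
	SKETCH_MSG_NONE = 0,
	SKETCH_MSG_OTHER = 1,
	SKETCH_MSG_UNLOAD_RESPONSE = 2,
};

/* One message slot per CPU, readable from any CPU. */
struct sketch_slot {
	volatile int msgtype;
};

/*
 * Poll every CPU's slot for the 'unload finished' reply.
 * Returns the index of the CPU whose slot held it, or -1 if it was not
 * seen within max_rounds passes (the bound is an assumption so that the
 * sketch terminates). Signaling end-of-message to the hypervisor, which
 * requires an MSR write on the owning CPU, is deliberately left out.
 */
static int sketch_wait_for_unload(struct sketch_slot *slots, size_t ncpus,
				  unsigned int max_rounds)
{
	assert(slots != NULL && ncpus > 0);

	for (unsigned int round = 0; round < max_rounds; round++) {
		for (size_t cpu = 0; cpu < ncpus; cpu++) {
			if (slots[cpu].msgtype == SKETCH_MSG_UNLOAD_RESPONSE) {
				/* Consume the reply by clearing the slot. */
				slots[cpu].msgtype = SKETCH_MSG_NONE;
				return (int)cpu;
			}
		}
	}
	return -1;
}
```

With slots { NONE, UNLOAD_RESPONSE } the poll returns 1. With slots
{ NONE, OTHER } it returns -1: the pending message is never consumed, so
the reply can never be placed behind it.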

The suggested solution is to handle unknown NMIs for Hyper-V guests only on
the first CPU which gets them. This will allow us to rely on the VMBus
interrupt handler being able to receive the 'unload finished' message in
case it is delivered to a different CPU.
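
The "first CPU wins" idea can be tried outside the kernel with C11 atomics.
This is a minimal sketch under stated assumptions, not the kernel handler:
atomic_compare_exchange_strong() stands in for the kernel's
atomic_cmpxchg(), the CPU number and the panic flag are passed in as
parameters, and the SKETCH_NMI_* constants and the function name are made
up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-ins for the kernel's NMI_DONE / NMI_HANDLED return codes. */
enum { SKETCH_NMI_DONE = 0, SKETCH_NMI_HANDLED = 1 };

/*
 * *owner starts at -1, meaning no CPU has claimed the NMI yet.
 * The first caller swaps in its CPU number and returns SKETCH_NMI_DONE,
 * so processing continues (towards the panic path in the kernel case).
 * Every later caller loses the compare-and-swap and returns
 * SKETCH_NMI_HANDLED, i.e. it swallows the NMI and keeps running.
 */
static int hv_nmi_unknown_sketch(atomic_int *owner, int panic_enabled, int cpu)
{
	int expected = -1;

	assert(cpu >= 0);	/* -1 is reserved as the "unclaimed" marker */

	if (!panic_enabled)
		return SKETCH_NMI_DONE;

	if (!atomic_compare_exchange_strong(owner, &expected, cpu))
		return SKETCH_NMI_HANDLED;

	return SKETCH_NMI_DONE;
}
```

Starting from an owner of -1 with the flag set, a call for CPU 2 returns
SKETCH_NMI_DONE and a following call for CPU 0 returns SKETCH_NMI_HANDLED,
leaving 2 recorded as the owner. Because the losers return normally
instead of stopping with interrupts disabled, they can still take the
VMBus interrupt.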

The issue is not reproducible on WS2016, as Debug-VM delivers the NMI to the
boot CPU only; WS2012R2 and earlier Hyper-V versions are affected.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: devel@linuxdriverproject.org
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/mshyperv.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -30,6 +30,7 @@
 #include <asm/apic.h>
 #include <asm/timer.h>
 #include <asm/reboot.h>
+#include <asm/nmi.h>
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -157,6 +158,26 @@ static unsigned char hv_get_nmi_reason(v
 	return 0;
 }
 
+#ifdef CONFIG_X86_LOCAL_APIC
+/*
+ * Prior to WS2016 Debug-VM sends NMIs to all CPUs which makes
+ * it difficult to process CHANNELMSG_UNLOAD in case of crash. Handle
+ * unknown NMI on the first CPU which gets it.
+ */
+static int hv_nmi_unknown(unsigned int val, struct pt_regs *regs)
+{
+	static atomic_t nmi_cpu = ATOMIC_INIT(-1);
+
+	if (!unknown_nmi_panic)
+		return NMI_DONE;
+
+	if (atomic_cmpxchg(&nmi_cpu, -1, raw_smp_processor_id()) != -1)
+		return NMI_HANDLED;
+
+	return NMI_DONE;
+}
+#endif
+
 static void __init ms_hyperv_init_platform(void)
 {
 	/*
@@ -182,6 +203,9 @@ static void __init ms_hyperv_init_platfo
 		printk(KERN_INFO "HyperV: LAPIC Timer Frequency: %#x\n",
 				lapic_timer_frequency);
 	}
+
+	register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
+			     "hv_nmi_unknown");
 #endif
 
 	if (ms_hyperv.features & HV_X64_MSR_TIME_REF_COUNT_AVAILABLE)


Patches currently in stable-queue which might be from sumit.semwal@linaro.org are

queue-4.4/pci-add-comments-about-rom-bar-updating.patch
queue-4.4/acpi-blacklist-make-dell-latitude-3350-ethernet-work.patch
queue-4.4/s390-zcrypt-introduce-cex6-toleration.patch
queue-4.4/block-allow-write_same-commands-with-the-sg_io-ioctl.patch
queue-4.4/pci-do-any-vf-bar-updates-before-enabling-the-bars.patch
queue-4.4/x86-hyperv-handle-unknown-nmis-on-one-cpu-when-unknown_nmi_panic.patch
queue-4.4/serial-8250_pci-detach-low-level-driver-during-pci-error-recovery.patch
queue-4.4/xen-do-not-re-use-pirq-number-cached-in-pci-device-msi-msg-data.patch
queue-4.4/pci-separate-vf-bar-updates-from-standard-bar-updates.patch
queue-4.4/pci-ignore-bar-updates-on-virtual-functions.patch
queue-4.4/pci-update-bars-using-property-bits-appropriate-for-type.patch
queue-4.4/vfio-spapr-postpone-allocation-of-userspace-version-of-tce-table.patch
queue-4.4/pci-don-t-update-vf-bars-while-vf-memory-space-is-enabled.patch
queue-4.4/igb-workaround-for-igb-i210-firmware-issue.patch
queue-4.4/pci-remove-pci_resource_bar-and-pci_iov_resource_bar.patch
queue-4.4/pci-decouple-ioresource_rom_enable-and-pci_rom_address_enable.patch
queue-4.4/acpi-blacklist-add-_rev-quirks-for-dell-precision-5520-and-3520.patch
queue-4.4/igb-add-i211-to-i210-phy-workaround.patch
queue-4.4/uvcvideo-uvc_scan_fallback-for-webcams-with-broken-chain.patch
