stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>
Subject: [PATCH 4.16 081/113] powerpc/mce: Fix a bug where mce loops on memory UE.
Date: Mon, 30 Apr 2018 12:24:52 -0700	[thread overview]
Message-ID: <20180430184018.603869231@linuxfoundation.org> (raw)
In-Reply-To: <20180430184015.043892819@linuxfoundation.org>

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

commit 75ecfb49516c53da00c57b9efe48fa3f5504a791 upstream.

The current code extracts the physical address for UE errors and then
hooks it up into memory failure infrastructure. On successful
extraction of physical address it wrongly sets "handled = 1" which
means this UE error has been recovered. Since MCE handler gets return
value as handled = 1, it assumes that error has been recovered and
goes back to same NIP. This causes MCE interrupt again and again in a
loop leading to hard lockup.

Also, initialize phys_addr to ULONG_MAX so that we don't end up
queuing undesired page to hwpoison.

Without this patch we see:
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  ...
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Severe Machine check interrupt [Recovered]
    NIP: [000000001002588c] PID: 7109 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffd2755940
      Physical address:  000020181a080000
  Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  Memory failure: 0x20181a08: already hardware poisoned
  ...
  Watchdog CPU:38 Hard LOCKUP

After this patch we see:

  Severe Machine check interrupt [Not recovered]
    NIP: [00007fffaae585f4] PID: 7168 Comm: find
    Initiator: CPU
    Error type: UE [Load/Store]
      Effective address: 00007fffaafe28ac
      Physical address:  00002017c0bd0000
  find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
  Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered

Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors")
Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/powerpc/kernel/mce_power.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -441,7 +441,6 @@ static int mce_handle_ierror(struct pt_r
 					if (pfn != ULONG_MAX) {
 						*phys_addr =
 							(pfn << PAGE_SHIFT);
-						handled = 1;
 					}
 				}
 			}
@@ -532,9 +531,7 @@ static int mce_handle_derror(struct pt_r
 			 * kernel/exception-64s.h
 			 */
 			if (get_paca()->in_mce < MAX_MCE_DEPTH)
-				if (!mce_find_instr_ea_and_pfn(regs, addr,
-								phys_addr))
-					handled = 1;
+				mce_find_instr_ea_and_pfn(regs, addr, phys_addr);
 		}
 		found = 1;
 	}
@@ -572,7 +569,7 @@ static long mce_handle_error(struct pt_r
 		const struct mce_ierror_table itable[])
 {
 	struct mce_error_info mce_err = { 0 };
-	uint64_t addr, phys_addr;
+	uint64_t addr, phys_addr = ULONG_MAX;
 	uint64_t srr1 = regs->msr;
 	long handled;
 

  parent reply	other threads:[~2018-04-30 19:28 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-30 19:23 [PATCH 4.16 000/113] 4.16.7-stable review Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 001/113] ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 002/113] ext4: set h_journal if there is a failure starting a reserved handle Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 004/113] ext4: add validity checks for bitmap block numbers Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 005/113] ext4: fix bitmap position validation Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 006/113] random: set up the NUMA crng instances after the CRNG is fully initialized Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 007/113] random: fix possible sleeping allocation from irq context Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 008/113] random: rate limit unseeded randomness warnings Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 009/113] usbip: usbip_event: fix to not print kernel pointer address Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 010/113] usbip: usbip_host: fix to hold parent lock for device_attach() calls Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 011/113] usbip: vhci_hcd: Fix usb device and sockfd leaks Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 012/113] usbip: vhci_hcd: check rhport before using in vhci_hub_control() Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 013/113] Revert "xhci: plat: Register shutdown for xhci_plat" Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 014/113] xhci: Fix Kernel oops in xhci dbgtty Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 015/113] xhci: Fix USB ports for Dell Inspiron 5775 Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 016/113] USB: serial: simple: add libtransistor console Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 017/113] USB: serial: ftdi_sio: use jtag quirk for Arrow USB Blaster Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 018/113] USB: serial: cp210x: add ID for NI USB serial console Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 019/113] serial: mvebu-uart: Fix local flags handling on termios update Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 020/113] usb: typec: ucsi: Increase command completion timeout value Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 021/113] usb: core: Add quirk for HP v222w 16GB Mini Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 022/113] USB: Increment wakeup count on remote wakeup Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 023/113] ALSA: usb-audio: Skip broken EU on Dell dock USB-audio Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 024/113] virtio: add ability to iterate over vqs Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 025/113] virtio_console: dont tie bufs to a vq Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 027/113] virtio_console: drop custom control queue cleanup Greg Kroah-Hartman
2018-04-30 19:23 ` [PATCH 4.16 028/113] virtio_console: move removal code Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 029/113] virtio_console: reset on out of memory Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 030/113] drm/virtio: fix vq wait_event condition Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 031/113] tty: Dont call panic() at tty_ldisc_init() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 032/113] tty: n_gsm: Fix long delays with control frame timeouts in ADM mode Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 033/113] tty: n_gsm: Fix DLCI handling for ADM mode if debug & 2 is not set Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 034/113] tty: Avoid possible error pointer dereference at tty_ldisc_restore() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 035/113] tty: Use __GFP_NOFAIL for tty_ldisc_get() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 036/113] cifs: smbd: Avoid allocating iov on the stack Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 037/113] cifs: smbd: Dont use RDMA read/write when signing is used Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 038/113] ALSA: dice: fix OUI for TC group Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 039/113] ALSA: dice: fix error path to destroy initialized stream data Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 040/113] ALSA: hda - Skip jack and others for non-existing PCM streams Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 041/113] ALSA: opl3: Hardening for potential Spectre v1 Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 042/113] ALSA: asihpi: " Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 043/113] ALSA: hdspm: " Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 044/113] ALSA: rme9652: " Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 045/113] ALSA: control: " Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 046/113] ALSA: pcm: Return negative delays from SNDRV_PCM_IOCTL_DELAY Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 047/113] ALSA: core: Report audio_tstamp in snd_pcm_sync_ptr Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 048/113] ALSA: seq: oss: Fix unbalanced use lock for synth MIDI device Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 049/113] ALSA: seq: oss: Hardening for potential Spectre v1 Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 050/113] ALSA: hda: " Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 051/113] ALSA: hda/realtek - Add some fixes for ALC233 Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 052/113] ALSA: hda/realtek - Update ALC255 depop optimize Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 053/113] ALSA: hda/realtek - change the location for one of two front mics Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 054/113] mtd: spi-nor: cadence-quadspi: Fix page fault kernel panic Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 055/113] mtd: cfi: cmdset_0001: Do not allow read/write to suspend erase block Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 056/113] mtd: cfi: cmdset_0001: Workaround Micron Erase suspend bug Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 057/113] mtd: cfi: cmdset_0002: Do not allow read/write to suspend erase block Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 058/113] mtd: rawnand: tango: Fix struct clk memory leak Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 059/113] mtd: rawnand: marvell: fix the chip-select DT parsing logic Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 060/113] kobject: dont use WARN for registration failures Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 061/113] scsi: sd_zbc: Avoid that resetting a zone fails sporadically Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 062/113] scsi: sd: Defer spinning up drive while SANITIZE is in progress Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 063/113] blk-mq: start request gstate with gen 1 Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 064/113] bfq-iosched: ensure to clear bic/bfqq pointers when preparing request Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 065/113] block: do not use interruptible wait anywhere Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 066/113] vfio: ccw: process ssch with interrupts disabled Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 067/113] SMB311: Fix reconnect Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 068/113] ANDROID: binder: prevent transactions into own process Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 069/113] PCI: aardvark: Fix logic in advk_pcie_{rd,wr}_conf() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 070/113] PCI: aardvark: Set PIO_ADDR_LS correctly in advk_pcie_rd_conf() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 071/113] PCI: aardvark: Use ISR1 instead of ISR0 interrupt in legacy irq mode Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 072/113] PCI: aardvark: Fix PCIe Max Read Request Size setting Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 073/113] ARM: amba: Make driver_override output consistent with other buses Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 074/113] ARM: amba: Fix race condition with driver_override Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 075/113] ARM: amba: Dont read past the end of sysfs "driver_override" buffer Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 076/113] ARM: dts: Fix NAS4220B pin config Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 077/113] ARM: socfpga_defconfig: Remove QSPI Sector 4K size force Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 078/113] KVM: arm/arm64: Close VMID generation race Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 080/113] powerpc/mm: Flush cache on memory hot(un)plug Greg Kroah-Hartman
2018-04-30 19:24 ` Greg Kroah-Hartman [this message]
2018-04-30 19:24 ` [PATCH 4.16 082/113] powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large address range Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 083/113] crypto: drbg - set freed buffers to NULL Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 084/113] ASoC: dmic: Fix clock parenting Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 085/113] ASoC: fsl_esai: Fix divisor calculation failure at lower ratio Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 086/113] libceph: un-backoff on tick when we have a authenticated session Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 087/113] libceph: reschedule a tick in finish_hunting() Greg Kroah-Hartman
2018-04-30 19:24 ` [PATCH 4.16 088/113] libceph: validate con->state at the top of try_write() Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 089/113] PCI / PM: Do not clear state_saved in pci_pm_freeze() when smart suspend is set Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 090/113] virt: vbox: Move declarations of vboxguest private functions to private header Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 091/113] virt: vbox: Add vbg_req_free() helper function Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 092/113] virt: vbox: Use __get_free_pages instead of kmalloc for DMA32 memory Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 093/113] fpga-manager: altera-ps-spi: preserve nCONFIG state Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 094/113] module: Fix display of wrong module .text address Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 095/113] earlycon: Use a pointer table to fix __earlycon_table stride Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 096/113] cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 097/113] rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 103/113] drm/amd/display: Fix deadlock when flushing irq Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 104/113] drm/amd/display: Dont read EDID in atomic_check Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 107/113] x86/ipc: Fix x32 version of shmid64_ds and msqid64_ds Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 108/113] x86/smpboot: Dont use mwait_play_dead() on AMD systems Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 109/113] x86/microcode/intel: Save microcode patch unconditionally Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 110/113] x86/microcode: Do not exit early from __reload_late() Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 111/113] tick/sched: Do not mess with an enqueued hrtimer Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 112/113] crypto: ccp - add check to get PSP master only when PSP is detected Greg Kroah-Hartman
2018-04-30 19:25 ` [PATCH 4.16 113/113] arm/arm64: KVM: Add PSCI version selection API Greg Kroah-Hartman
2018-05-01 13:22 ` [PATCH 4.16 000/113] 4.16.7-stable review Guenter Roeck
2018-05-01 13:36 ` Dan Rue
2018-05-01 15:02   ` Greg Kroah-Hartman
2018-05-01 19:05 ` Shuah Khan
2018-05-01 19:26   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180430184018.603869231@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bsingharora@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mahesh@linux.vnet.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).