stable.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ying Huang <ying.huang@intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Chen Yu <yu.c.chen@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Len Brown <len.brown@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Rui Zhang <rui.zhang@intel.com>
Subject: [PATCH 4.9 047/125] PCI/PM: Restore the status of PCI devices across hibernation
Date: Tue, 25 Jul 2017 12:19:22 -0700
Message-ID: <20170725192016.966303313@linuxfoundation.org>
In-Reply-To: <20170725192014.314851996@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chen Yu <yu.c.chen@intel.com>

commit e60514bd4485c0c7c5a7cf779b200ce0b95c70d6 upstream.

We saw a lot of "No irq handler" errors during hibernation, which eventually
caused the system to hang:

  ata4.00: qc timeout (cmd 0xec)
  ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  ata4.00: revalidation failed (errno=-5)
  ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
  do_IRQ: 31.151 No irq handler for vector

According to the logs above, an interrupt was triggered and dispatched to
CPU31 with vector number 151, but there is no handler for it.  The IRQ
therefore never gets acked, which causes an IRQ flood that kills the
system.  More specifically, 31.151 is an interrupt from the AHCI host
controller.

After some investigation, we found that the issue is triggered because the
thaw_noirq() phase (pci_pm_thaw_noirq()) does not restore the MSI/MSI-X
settings across hibernation.

The scenario is illustrated below:

  1. Before hibernation, IRQ 34 is used by the AHCI device and is bound to
     CPU31.

  2. Hibernation starts and the AHCI device is put into a low power state.

  3. All the nonboot CPUs are taken offline, so IRQ 34 has to be migrated
     to the last online CPU, CPU0.

  4. After the snapshot has been created, all the nonboot CPUs are brought
     up again; IRQ 34 remains bound to CPU0.

  5. AHCI devices are put into D0.

  6. The snapshot is written to the disk.

The issue is triggered in step 6.  The AHCI interrupt should be delivered
to CPU0; however, it is delivered to the original CPU31 instead, which
causes the "No irq handler" error.

Ying Huang provided a clue: in step 3, writing to the MSI registers might
not take effect because the PCI devices have already been suspended.

In step 3, the IRQ 34 affinity should be changed from CPU31 to CPU0, but
in fact the hardware is not updated.  In __pci_write_msi_msg(), if the
device is already in a low power state, the hardware MSI message entry is
not updated; the new message is only cached.  During the device restore
process after a normal suspend/resume, pci_restore_msi_state() writes the
cached MSI message back to the hardware.

But this is not the case for hibernation.  pci_restore_msi_state() is not
currently called in pci_pm_thaw_noirq(), although pci_save_state() has
saved the necessary PCI cached information in pci_pm_freeze_noirq().

Restore the PCI state of the device in pci_pm_thaw_noirq().  Otherwise the
state (for example, MSI, MSI-X, ATS, ACS, and IOV settings) might be lost
across hibernation, which can cause problems such as the IRQ flood above.

Suggested-by: Ying Huang <ying.huang@intel.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/pci/pci-driver.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -954,6 +954,7 @@ static int pci_pm_thaw_noirq(struct devi
 		return pci_legacy_resume_early(dev);
 
 	pci_update_current_state(pci_dev, PCI_D0);
+	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)
 		error = drv->pm->thaw_noirq(dev);


Thread overview: 123+ messages
2017-07-25 19:18 [PATCH 4.9 000/125] 4.9.40-stable review Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 001/125] disable new gcc-7.1.1 warnings for now Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 002/125] [media] ir-core: fix gcc-7 warning on bool arithmetic Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 003/125] dm mpath: cleanup -Wbool-operation warning in choose_pgpath() Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 004/125] [media] s5p-jpeg: don't return a random width/height Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 005/125] thermal: max77620: fix device-node reference imbalance Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 006/125] thermal: cpu_cooling: Avoid accessing potentially freed structures Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 007/125] ath9k: fix tx99 use after free Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 008/125] ath9k: fix tx99 bus error Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 009/125] ath9k: fix an invalid pointer dereference in ath9k_rng_stop() Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 010/125] NFC: fix broken device allocation Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 011/125] NFC: nfcmrvl_uart: add missing tty-device sanity check Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 012/125] NFC: nfcmrvl: do not use device-managed resources Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 013/125] NFC: nfcmrvl: use nfc-device for firmware download Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 014/125] NFC: nfcmrvl: fix firmware-management initialisation Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 015/125] nfc: Ensure presence of required attributes in the activate_target handler Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 016/125] nfc: Fix the sockaddr length sanitization in llcp_sock_connect Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 017/125] NFC: Add sockaddr length checks before accessing sa_family in bind handlers Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 018/125] perf intel-pt: Move decoder error setting into one condition Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 019/125] perf intel-pt: Improve sample timestamp Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 020/125] perf intel-pt: Fix missing stack clear Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 021/125] perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 022/125] perf intel-pt: Fix last_ip usage Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 023/125] perf intel-pt: Ensure never to set last_ip when packet count is zero Greg Kroah-Hartman
2017-07-25 19:18 ` [PATCH 4.9 024/125] perf intel-pt: Use FUP always when scanning for an IP Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 025/125] perf intel-pt: Clear FUP flag on error Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 026/125] Bluetooth: use constant time memory comparison for secret values Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 027/125] wlcore: fix 64K page support Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 028/125] btrfs: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 029/125] igb: Explicitly select page 0 at initialization Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 030/125] ASoC: compress: Derive substream from stream based on direction Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 031/125] PM / Domains: Fix unsafe iteration over modified list of device links Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 032/125] PM / Domains: Fix unsafe iteration over modified list of domain providers Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 033/125] PM / Domains: Fix unsafe iteration over modified list of domains Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 034/125] scsi: ses: do not add a device to an enclosure if enclosure_add_links() fails Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 035/125] scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 036/125] iscsi-target: Add login_keys_workaround attribute for non RFC initiators Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 037/125] xen/scsiback: Fix a TMR related use-after-free Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 038/125] powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp() Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 039/125] powerpc/64: Fix atomic64_inc_not_zero() to return an int Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 040/125] powerpc: Fix emulation of mcrf in emulate_step() Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 041/125] powerpc: Fix emulation of mfocrf " Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 042/125] powerpc/asm: Mark cr0 as clobbered in mftb() Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 043/125] powerpc/mm/radix: Properly clear process table entry Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 044/125] af_key: Fix sadb_x_ipsecrequest parsing Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 045/125] PCI: Work around poweroff & suspend-to-RAM issue on Macbook Pro 11 Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 046/125] PCI: rockchip: Use normal register bank for config accessors Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 047/125] PCI/PM: Restore the status of PCI devices across hibernation Greg Kroah-Hartman [this message]
2017-07-25 19:19 ` [PATCH 4.9 048/125] ipvs: SNAT packet replies only for NATed connections Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 049/125] xhci: fix 20000ms port resume timeout Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 050/125] xhci: Fix NULL pointer dereference when cleaning up streams for removed host Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 051/125] xhci: Bad Ethernet performance plugged in ASM1042A host Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 052/125] [media] mxl111sf: Fix driver to use heap allocate buffers for USB messages Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 053/125] usb: storage: return on error to avoid a null pointer dereference Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 054/125] USB: cdc-acm: add device-id for quirky printer Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 055/125] usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 056/125] usb: renesas_usbhs: gadget: disable all eps when the driver stops Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 057/125] md: don't use flush_signals in userspace processes Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 059/125] [media] cx88: Fix regression in initial video standard setting Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 060/125] libnvdimm, btt: fix btt_rw_page not returning errors Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 061/125] libnvdimm: fix badblock range handling of ARS range Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 062/125] ext2: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 063/125] Raid5 should update rdev->sectors after reshape Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 064/125] s390/syscalls: Fix out of bounds arguments access Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 068/125] ipmi: use rcu lock around call to intf->handlers->sender() Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 069/125] ipmi:ssif: Add missing unlock in error branch Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 070/125] xfs: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 071/125] f2fs: sanity check size of nat and sit cache Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 072/125] f2fs: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 074/125] ovl: drop CAP_SYS_RESOURCE from saved mounter's credentials Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 075/125] vfio: Fix group release deadlock Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 076/125] vfio: New external user group/file match Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 077/125] nvme-rdma: remove race conditions from IB signalling Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 078/125] ftrace: Fix uninitialized variable in match_records() Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 079/125] MIPS: Fix mips_atomic_set() retry condition Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 080/125] MIPS: Fix mips_atomic_set() with EVA Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 081/125] MIPS: Negate error syscall return in trace Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 082/125] ubifs: Don't leak kernel memory to the MTD Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 083/125] ACPI / EC: Drop EC noirq hooks to fix a regression Greg Kroah-Hartman
2017-07-25 19:19 ` [PATCH 4.9 084/125] Revert "ACPI / EC: Enable event freeze mode..." " Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 085/125] x86/acpi: Prevent out of bound access caused by broken ACPI tables Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 086/125] x86/ioapic: Pass the correct data to unmask_ioapic_irq() Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 087/125] MIPS: Fix MIPS I ISA /proc/cpuinfo reporting Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 088/125] MIPS: Save static registers before sysmips Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 089/125] MIPS: Actually decode JALX in `__compute_return_epc_for_insn' Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 090/125] MIPS: Fix unaligned PC interpretation in `compute_return_epc' Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 091/125] MIPS: math-emu: Prevent wrong ISA mode instruction emulation Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 092/125] MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn' Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 093/125] MIPS: Rename `sigill_r6' to `sigill_r2r6' " Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 094/125] MIPS: Send SIGILL for linked branches " Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 095/125] MIPS: Send SIGILL for R6 " Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 096/125] MIPS: Fix a typo: s/preset/present/ in r2-to-r6 emulation error message Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 097/125] Input: i8042 - fix crash at boot time Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 098/125] IB/iser: Fix connection teardown race condition Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 099/125] IB/core: Namespace is mandatory input for address resolution Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 100/125] sunrpc: use constant time memory comparison for mac Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 101/125] NFS: only invalidate dentrys that are clearly invalid Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 102/125] udf: Fix deadlock between writeback and udf_setsize() Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 103/125] target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 104/125] iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 106/125] Revert "perf/core: Drop kernel samples even though :u is specified" Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 107/125] staging: rtl8188eu: add TL-WN722N v2 support Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 109/125] staging: sm750fb: avoid conflicting vesafb Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 110/125] staging: lustre: ko2iblnd: check copy_from_iter/copy_to_iter return code Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 111/125] ceph: fix race in concurrent readdir Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 112/125] RDMA/core: Initialize port_num in qp_attr Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 113/125] drm/mst: Fix error handling during MST sideband message reception Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 114/125] drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req() Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 115/125] drm/mst: Avoid processing partially received up/down message transactions Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 116/125] mlx5: Avoid that mlx5_ib_sg_to_klms() overflows the klms[] array Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 117/125] hfsplus: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 118/125] ovl: fix random return value on mount Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 119/125] acpi/nfit: Fix memory corruption/Unregister mce decoder on failure Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 120/125] of: device: Export of_device_{get_modalias, uvent_modalias} to modules Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 121/125] spmi: Include OF based modalias in device uevent Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 122/125] reiserfs: Don't clear SGID when inheriting ACLs Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 123/125] PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 124/125] tracing: Fix kmemleak in instance_rmdir Greg Kroah-Hartman
2017-07-25 19:20 ` [PATCH 4.9 125/125] alarmtimer: don't rate limit one-shot timers Greg Kroah-Hartman
2017-07-26  2:56 ` [PATCH 4.9 000/125] 4.9.40-stable review Guenter Roeck
2017-07-26 14:12 ` Sumit Semwal
2017-07-26 19:56   ` Greg Kroah-Hartman
2017-07-26 14:24 ` Shuah Khan
