* [PATCH] PCI / PM: Restore the status of PCI devices across hibernation @ 2017-05-25 8:49 Chen Yu 2017-06-01 23:22 ` Rafael J. Wysocki 2017-06-16 19:45 ` Bjorn Helgaas 0 siblings, 2 replies; 5+ messages in thread From: Chen Yu @ 2017-05-25 8:49 UTC (permalink / raw) To: Yu Chen Cc: Rafael J . Wysocki, Bjorn Helgaas, Len Brown, Dan Williams, Rui Zhang, Ying Huang, linux-pci, linux-pm, linux-kernel Currently we saw a lot of "No irq handler" errors during hibernation, which caused the system hang finally: [ 710.141581] ata4.00: qc timeout (cmd 0xec) [ 710.147135] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) [ 710.154593] ata4.00: revalidation failed (errno=-5) [ 710.468124] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 710.477746] do_IRQ: 31.151 No irq handler for vector According to above logs, there is an interrupt triggered and it is dispatched to CPU31 with a vector number 151, but there is no handler for it, thus this irq will not get acked and caused irq flood which kill the system. To be more specific, the 31.151 is an interrupt from the ahci host controller. After some investigation, the reason why this issue is triggered is because the thaw_noirq() function does not restore the MSI/MSIX settings across hibernation. The scenario is illustrated below: 1. Before the hibernation starts, the irq 34 is the handler for the ahci device, which is binded on cpu31. 2. Hibernation starts, the ahci device is put into low power state. 3. All the nonboot CPUs are put offline, so the irq 34 has to be migrated to the last alive one - CPU0. 4. After the snapshot has been created, all the nonboot CPUs are brought up again, the CPU affinity for IRQ 34 remains to be 0. 5. ahci device are put into D0. 6. The snapshot is written to the disk. The issue is triggered in step 6, in theory the ahci interrupt should be delivered to CPU0, however the actually result is that this interrupt is delivered to the original CPU31 instead, which cause the "No irq handler" issue. Ying Huang has has provided a clue that, in step 3 it is possible that the writing to the register might not take effect as the PCI devices have been put suspended. Actually it is true: In step 3, the irq 34 affinity is supposed to be modified from 31 to 0, but actually it did not. In __pci_write_msi_msg(), if the device is already in low power state, the low level msi message entry will not be updated but cached. So in theory during the device restore process, the cached msi modification information should be written back to the hardware, and this is what pci_restore_msi_state() do during normal suspend-resume. But this is not the case for hibernation, pci_restore_msi_state() is not invoked currently, to be more specific, pci_restore_state() is not invoked in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary pci cached information in pci_pm_freeze_noirq(). This patch tries to restore the pci status for the device during hibernation, otherwise the status might be lost across hibernation(for example, the MSI/MSIX message settings), which might cause problems during hibernation. Suggested-by: Ying Huang <ying.huang@intel.com> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Len Brown <len.brown@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Ying Huang <ying.huang@intel.com> Cc: linux-pci@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- drivers/pci/pci-driver.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 192e7b6..b399fa3 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -964,6 +964,7 @@ static int pci_pm_thaw_noirq(struct device *dev) return pci_legacy_resume_early(dev); pci_update_current_state(pci_dev, PCI_D0); + pci_restore_state(pci_dev); if (drv && drv->pm && drv->pm->thaw_noirq) error = drv->pm->thaw_noirq(dev); -- 2.7.4 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI / PM: Restore the status of PCI devices across hibernation 2017-05-25 8:49 [PATCH] PCI / PM: Restore the status of PCI devices across hibernation Chen Yu @ 2017-06-01 23:22 ` Rafael J. Wysocki 2017-06-02 5:51 ` Chen Yu 2017-06-16 19:45 ` Bjorn Helgaas 1 sibling, 1 reply; 5+ messages in thread From: Rafael J. Wysocki @ 2017-06-01 23:22 UTC (permalink / raw) To: Chen Yu, Bjorn Helgaas Cc: Rafael J . Wysocki, Len Brown, Dan Williams, Rui Zhang, Ying Huang, Linux PCI, Linux PM, Linux Kernel Mailing List On Thu, May 25, 2017 at 10:49 AM, Chen Yu <yu.c.chen@intel.com> wrote: > Currently we saw a lot of "No irq handler" errors during hibernation, > which caused the system hang finally: > > [ 710.141581] ata4.00: qc timeout (cmd 0xec) > [ 710.147135] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) > [ 710.154593] ata4.00: revalidation failed (errno=-5) > [ 710.468124] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 710.477746] do_IRQ: 31.151 No irq handler for vector > > According to above logs, there is an interrupt triggered and it is > dispatched to CPU31 with a vector number 151, but there is no handler > for it, thus this irq will not get acked and caused irq flood which kill > the system. To be more specific, the 31.151 is an interrupt from the ahci > host controller. > > After some investigation, the reason why this issue is triggered is > because the thaw_noirq() function does not restore the MSI/MSIX settings > across hibernation. > > The scenario is illustrated below: > > 1. Before the hibernation starts, the irq 34 is the handler for the ahci device, > which is binded on cpu31. > 2. Hibernation starts, the ahci device is put into low power state. > 3. All the nonboot CPUs are put offline, so the irq 34 has to be migrated to > the last alive one - CPU0. > 4. After the snapshot has been created, all the nonboot CPUs are brought up again, > the CPU affinity for IRQ 34 remains to be 0. > 5. ahci device are put into D0. > 6. The snapshot is written to the disk. > > The issue is triggered in step 6, in theory the ahci interrupt should be > delivered to CPU0, however the actually result is that this interrupt is > delivered to the original CPU31 instead, which cause the "No irq handler" issue. > > Ying Huang has has provided a clue that, in step 3 it is possible that the writing > to the register might not take effect as the PCI devices have been put suspended. > Actually it is true: > In step 3, the irq 34 affinity is supposed to be modified from 31 to 0, > but actually it did not. In __pci_write_msi_msg(), if the device is already > in low power state, the low level msi message entry will not be updated > but cached. So in theory during the device restore process, the cached msi > modification information should be written back to the hardware, and this > is what pci_restore_msi_state() do during normal suspend-resume. > But this is not the case for hibernation, pci_restore_msi_state() is not > invoked currently, to be more specific, pci_restore_state() is not invoked > in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary > pci cached information in pci_pm_freeze_noirq(). > > This patch tries to restore the pci status for the device during hibernation, > otherwise the status might be lost across hibernation(for example, the MSI/MSIX > message settings), which might cause problems during hibernation. > > Suggested-by: Ying Huang <ying.huang@intel.com> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Rui Zhang <rui.zhang@intel.com> > Cc: Ying Huang <ying.huang@intel.com> > Cc: linux-pci@vger.kernel.org > Cc: linux-pm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Chen Yu <yu.c.chen@intel.com> Thanks for the detailed description of what's going on! The fix is correct IMO, so Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Bjorn, if you want me to take this one, please let me know. > --- > drivers/pci/pci-driver.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 192e7b6..b399fa3 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -964,6 +964,7 @@ static int pci_pm_thaw_noirq(struct device *dev) > return pci_legacy_resume_early(dev); > > pci_update_current_state(pci_dev, PCI_D0); > + pci_restore_state(pci_dev); > > if (drv && drv->pm && drv->pm->thaw_noirq) > error = drv->pm->thaw_noirq(dev); > -- > 2.7.4 > Thanks, Rafael ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI / PM: Restore the status of PCI devices across hibernation 2017-06-01 23:22 ` Rafael J. Wysocki @ 2017-06-02 5:51 ` Chen Yu 0 siblings, 0 replies; 5+ messages in thread From: Chen Yu @ 2017-06-02 5:51 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Bjorn Helgaas, Rafael J . Wysocki, Len Brown, Dan Williams, Rui Zhang, Ying Huang, Linux PCI, Linux PM, Linux Kernel Mailing List On Fri, Jun 02, 2017 at 01:22:28AM +0200, Rafael J. Wysocki wrote: > On Thu, May 25, 2017 at 10:49 AM, Chen Yu <yu.c.chen@intel.com> wrote: > > Currently we saw a lot of "No irq handler" errors during hibernation, > > which caused the system hang finally: > > > > [ 710.141581] ata4.00: qc timeout (cmd 0xec) > > [ 710.147135] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) > > [ 710.154593] ata4.00: revalidation failed (errno=-5) > > [ 710.468124] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > > [ 710.477746] do_IRQ: 31.151 No irq handler for vector > > > > According to above logs, there is an interrupt triggered and it is > > dispatched to CPU31 with a vector number 151, but there is no handler > > for it, thus this irq will not get acked and caused irq flood which kill > > the system. To be more specific, the 31.151 is an interrupt from the ahci > > host controller. > > > > After some investigation, the reason why this issue is triggered is > > because the thaw_noirq() function does not restore the MSI/MSIX settings > > across hibernation. > > > > The scenario is illustrated below: > > > > 1. Before the hibernation starts, the irq 34 is the handler for the ahci device, > > which is binded on cpu31. > > 2. Hibernation starts, the ahci device is put into low power state. > > 3. All the nonboot CPUs are put offline, so the irq 34 has to be migrated to > > the last alive one - CPU0. > > 4. After the snapshot has been created, all the nonboot CPUs are brought up again, > > the CPU affinity for IRQ 34 remains to be 0. > > 5. ahci device are put into D0. > > 6. The snapshot is written to the disk. > > > > The issue is triggered in step 6, in theory the ahci interrupt should be > > delivered to CPU0, however the actually result is that this interrupt is > > delivered to the original CPU31 instead, which cause the "No irq handler" issue. > > > > Ying Huang has has provided a clue that, in step 3 it is possible that the writing > > to the register might not take effect as the PCI devices have been put suspended. > > Actually it is true: > > In step 3, the irq 34 affinity is supposed to be modified from 31 to 0, > > but actually it did not. In __pci_write_msi_msg(), if the device is already > > in low power state, the low level msi message entry will not be updated > > but cached. So in theory during the device restore process, the cached msi > > modification information should be written back to the hardware, and this > > is what pci_restore_msi_state() do during normal suspend-resume. > > But this is not the case for hibernation, pci_restore_msi_state() is not > > invoked currently, to be more specific, pci_restore_state() is not invoked > > in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary > > pci cached information in pci_pm_freeze_noirq(). > > > > This patch tries to restore the pci status for the device during hibernation, > > otherwise the status might be lost across hibernation(for example, the MSI/MSIX > > message settings), which might cause problems during hibernation. > > > > Suggested-by: Ying Huang <ying.huang@intel.com> > > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Cc: Bjorn Helgaas <bhelgaas@google.com> > > Cc: Len Brown <len.brown@intel.com> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: Rui Zhang <rui.zhang@intel.com> > > Cc: Ying Huang <ying.huang@intel.com> > > Cc: linux-pci@vger.kernel.org > > Cc: linux-pm@vger.kernel.org > > Cc: linux-kernel@vger.kernel.org > > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > > Thanks for the detailed description of what's going on! > > The fix is correct IMO, so > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Thanks for the review! Please also help add a tag for stable: Cc: stable <stable@vger.kernel.org> thanks, Yu ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI / PM: Restore the status of PCI devices across hibernation 2017-05-25 8:49 [PATCH] PCI / PM: Restore the status of PCI devices across hibernation Chen Yu 2017-06-01 23:22 ` Rafael J. Wysocki @ 2017-06-16 19:45 ` Bjorn Helgaas 2017-06-16 22:23 ` Rafael J. Wysocki 1 sibling, 1 reply; 5+ messages in thread From: Bjorn Helgaas @ 2017-06-16 19:45 UTC (permalink / raw) To: Chen Yu Cc: Rafael J . Wysocki, Bjorn Helgaas, Len Brown, Dan Williams, Rui Zhang, Ying Huang, linux-pci, linux-pm, linux-kernel On Thu, May 25, 2017 at 04:49:07PM +0800, Chen Yu wrote: > Currently we saw a lot of "No irq handler" errors during hibernation, > which caused the system hang finally: > > [ 710.141581] ata4.00: qc timeout (cmd 0xec) > [ 710.147135] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) > [ 710.154593] ata4.00: revalidation failed (errno=-5) > [ 710.468124] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 710.477746] do_IRQ: 31.151 No irq handler for vector > > According to above logs, there is an interrupt triggered and it is > dispatched to CPU31 with a vector number 151, but there is no handler > for it, thus this irq will not get acked and caused irq flood which kill > the system. To be more specific, the 31.151 is an interrupt from the ahci > host controller. > > After some investigation, the reason why this issue is triggered is > because the thaw_noirq() function does not restore the MSI/MSIX settings > across hibernation. > > The scenario is illustrated below: > > 1. Before the hibernation starts, the irq 34 is the handler for the ahci device, > which is binded on cpu31. > 2. Hibernation starts, the ahci device is put into low power state. > 3. All the nonboot CPUs are put offline, so the irq 34 has to be migrated to > the last alive one - CPU0. > 4. After the snapshot has been created, all the nonboot CPUs are brought up again, > the CPU affinity for IRQ 34 remains to be 0. > 5. ahci device are put into D0. > 6. The snapshot is written to the disk. > > The issue is triggered in step 6, in theory the ahci interrupt should be > delivered to CPU0, however the actually result is that this interrupt is > delivered to the original CPU31 instead, which cause the "No irq handler" issue. > > Ying Huang has has provided a clue that, in step 3 it is possible that the writing > to the register might not take effect as the PCI devices have been put suspended. > Actually it is true: > In step 3, the irq 34 affinity is supposed to be modified from 31 to 0, > but actually it did not. In __pci_write_msi_msg(), if the device is already > in low power state, the low level msi message entry will not be updated > but cached. So in theory during the device restore process, the cached msi > modification information should be written back to the hardware, and this > is what pci_restore_msi_state() do during normal suspend-resume. > But this is not the case for hibernation, pci_restore_msi_state() is not > invoked currently, to be more specific, pci_restore_state() is not invoked > in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary > pci cached information in pci_pm_freeze_noirq(). > > This patch tries to restore the pci status for the device during hibernation, > otherwise the status might be lost across hibernation(for example, the MSI/MSIX > message settings), which might cause problems during hibernation. > > Suggested-by: Ying Huang <ying.huang@intel.com> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Rui Zhang <rui.zhang@intel.com> > Cc: Ying Huang <ying.huang@intel.com> > Cc: linux-pci@vger.kernel.org > Cc: linux-pm@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Chen Yu <yu.c.chen@intel.com> Added a stable tag and applied with Rafael's reviewed-by to pci/pm for v4.13, thanks! pci_restore_state() restores a lot of stuff besides MSI/MSI-X: PCIe device, link, slot control, ATS, VC, BARs, ACS, IOV. I guess I'm a little surprised that we haven't noticed more issues if all these things were broken. > --- > drivers/pci/pci-driver.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 192e7b6..b399fa3 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -964,6 +964,7 @@ static int pci_pm_thaw_noirq(struct device *dev) > return pci_legacy_resume_early(dev); > > pci_update_current_state(pci_dev, PCI_D0); > + pci_restore_state(pci_dev); > > if (drv && drv->pm && drv->pm->thaw_noirq) > error = drv->pm->thaw_noirq(dev); > -- > 2.7.4 > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PCI / PM: Restore the status of PCI devices across hibernation 2017-06-16 19:45 ` Bjorn Helgaas @ 2017-06-16 22:23 ` Rafael J. Wysocki 0 siblings, 0 replies; 5+ messages in thread From: Rafael J. Wysocki @ 2017-06-16 22:23 UTC (permalink / raw) To: Bjorn Helgaas Cc: Chen Yu, Rafael J . Wysocki, Bjorn Helgaas, Len Brown, Dan Williams, Rui Zhang, Ying Huang, Linux PCI, Linux PM, Linux Kernel Mailing List On Fri, Jun 16, 2017 at 9:45 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Thu, May 25, 2017 at 04:49:07PM +0800, Chen Yu wrote: >> Currently we saw a lot of "No irq handler" errors during hibernation, >> which caused the system hang finally: >> >> [ 710.141581] ata4.00: qc timeout (cmd 0xec) >> [ 710.147135] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) >> [ 710.154593] ata4.00: revalidation failed (errno=-5) >> [ 710.468124] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300) >> [ 710.477746] do_IRQ: 31.151 No irq handler for vector >> >> According to above logs, there is an interrupt triggered and it is >> dispatched to CPU31 with a vector number 151, but there is no handler >> for it, thus this irq will not get acked and caused irq flood which kill >> the system. To be more specific, the 31.151 is an interrupt from the ahci >> host controller. >> >> After some investigation, the reason why this issue is triggered is >> because the thaw_noirq() function does not restore the MSI/MSIX settings >> across hibernation. >> >> The scenario is illustrated below: >> >> 1. Before the hibernation starts, the irq 34 is the handler for the ahci device, >> which is binded on cpu31. >> 2. Hibernation starts, the ahci device is put into low power state. >> 3. All the nonboot CPUs are put offline, so the irq 34 has to be migrated to >> the last alive one - CPU0. >> 4. After the snapshot has been created, all the nonboot CPUs are brought up again, >> the CPU affinity for IRQ 34 remains to be 0. >> 5. ahci device are put into D0. >> 6. The snapshot is written to the disk. >> >> The issue is triggered in step 6, in theory the ahci interrupt should be >> delivered to CPU0, however the actually result is that this interrupt is >> delivered to the original CPU31 instead, which cause the "No irq handler" issue. >> >> Ying Huang has has provided a clue that, in step 3 it is possible that the writing >> to the register might not take effect as the PCI devices have been put suspended. >> Actually it is true: >> In step 3, the irq 34 affinity is supposed to be modified from 31 to 0, >> but actually it did not. In __pci_write_msi_msg(), if the device is already >> in low power state, the low level msi message entry will not be updated >> but cached. So in theory during the device restore process, the cached msi >> modification information should be written back to the hardware, and this >> is what pci_restore_msi_state() do during normal suspend-resume. >> But this is not the case for hibernation, pci_restore_msi_state() is not >> invoked currently, to be more specific, pci_restore_state() is not invoked >> in pci_pm_thaw_noirq(), although pci_save_state() has saved the necessary >> pci cached information in pci_pm_freeze_noirq(). >> >> This patch tries to restore the pci status for the device during hibernation, >> otherwise the status might be lost across hibernation(for example, the MSI/MSIX >> message settings), which might cause problems during hibernation. >> >> Suggested-by: Ying Huang <ying.huang@intel.com> >> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: Len Brown <len.brown@intel.com> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Rui Zhang <rui.zhang@intel.com> >> Cc: Ying Huang <ying.huang@intel.com> >> Cc: linux-pci@vger.kernel.org >> Cc: linux-pm@vger.kernel.org >> Cc: linux-kernel@vger.kernel.org >> Signed-off-by: Chen Yu <yu.c.chen@intel.com> > > Added a stable tag and applied with Rafael's reviewed-by to pci/pm for > v4.13, thanks! > > pci_restore_state() restores a lot of stuff besides MSI/MSI-X: PCIe > device, link, slot control, ATS, VC, BARs, ACS, IOV. I guess I'm a > little surprised that we haven't noticed more issues if all these > things were broken. That's because they weren't broken. :-) None of them is expected to change over the image creation, which is why pci_pm_thaw_noirq() didn't call pci_restore_state(), but we overlooked the fact that taking nonboot CPUs offline changed the configuration of interrupts that needed to be restored afterward. So this one is really exceptional. Thanks, Rafael ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-06-16 22:23 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-05-25 8:49 [PATCH] PCI / PM: Restore the status of PCI devices across hibernation Chen Yu 2017-06-01 23:22 ` Rafael J. Wysocki 2017-06-02 5:51 ` Chen Yu 2017-06-16 19:45 ` Bjorn Helgaas 2017-06-16 22:23 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).