From mboxrd@z Thu Jan 1 00:00:00 1970 From: Laurent Pinchart Subject: Re: [PATCH] PCI: Freeze PME scan before suspending devices Date: Wed, 19 Apr 2017 01:05:20 +0300 Message-ID: <4107359.5uIJCz7KyJ@avalon> References: <03aaf3220cccbabb7db86aa4eb10d1147add5c34.1492536769.git.lukas@wunner.de> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <03aaf3220cccbabb7db86aa4eb10d1147add5c34.1492536769.git.lukas@wunner.de> Sender: linux-pci-owner@vger.kernel.org To: Lukas Wunner Cc: Bjorn Helgaas , Laurent Pinchart , Geert Uytterhoeven , "Rafael J. Wysocki" , Mika Westerberg , Niklas Soderlund , Simon Horman , Yinghai Lu , Matthew Garrett , linux-pci@vger.kernel.org, linux-pm@vger.kernel.org, linux-renesas-soc@vger.kernel.org List-Id: linux-pm@vger.kernel.org Hi Lukas, Thank you for the patch. On Tuesday 18 Apr 2017 20:44:30 Lukas Wunner wrote: > Laurent Pinchart reported that the Renesas R-Car H2 Lager board > (r8a7790) crashes during suspend tests. Geert Uytterhoeven managed t= o > reproduce the issue on an M2-W Koelsch board (r8a7791): >=20 > It occurs when the PME scan runs, once per second. During PME scan, = the > PCI host bridge (rcar-pci) registers are accessed while its module cl= ock > has already been disabled, leading to the crash. >=20 > One reproducer is to configure s2ram to use "s2idle" instead of "deep= " > suspend: >=20 > # echo 0 > /sys/module/printk/parameters/console_suspend > # echo s2idle > /sys/power/mem_sleep > # echo mem > /sys/power/state >=20 > Another reproducer is to write either "platform" or "processors" to > /sys/power/pm_test. It does not (or is less likely) to happen during= > full system suspend ("core" or "none") because system suspend also > disables timers, and thus the workqueue handling PME scans no longer > runs. Geert believes the issue may still happen in the small window > between disabling module clocks and disabling timers: >=20 > # echo 0 > /sys/module/printk/parameters/console_suspend > # echo platform > /sys/power/pm_test # Or "processors" > # echo mem > /sys/power/state >=20 > (Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are > enabled.) >=20 > Rafael Wysocki agrees that PME scans should be suspended before the h= ost > bridge registers become inaccessible. To that end, queue the task on= a > workqueue that gets frozen before devices suspend. >=20 > Rafael notes however that as a result, some wakeup events may be miss= ed > if they are delivered via PME from a device without working IRQ (whic= h > hence must be polled) and occur after the workqueue has been frozen. > If that turns out to be an issue in practice, it may be possible to > solve it by calling pci_pme_list_scan() once directly from one of the= > host bridge's pm_ops callbacks. >=20 > Stacktrace for posterity: >=20 > PM: Syncing filesystems ... [ 38.566237] done. > PM: Preparing system for sleep (mem) > Freezing user space processes ... [ 38.579813] (elapsed 0.001 secon= ds) > done. Freezing remaining freezable tasks ... (elapsed 0.001 seconds) = done. > PM: Suspending system (mem) > PM: suspend of devices complete after 152.456 msecs > PM: late suspend of devices complete after 2.809 msecs > PM: noirq suspend of devices complete after 29.863 msecs > suspend debug: Waiting for 5 second(s). > Unhandled fault: asynchronous external abort (0x1211) at 0x00000000 > pgd =3D c0003000 > [00000000] *pgd=3D80000040004003, *pmd=3D00000000 > Internal error: : 1211 [#1] SMP ARM > Modules linked in: > CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted > 4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383 > Hardware name: Generic R8A7791 (Flattened Device Tree) > Workqueue: events pci_pme_list_scan > task: eb56e140 task.stack: eb58e000 > PC is at pci_generic_config_read+0x64/0x6c > LR is at rcar_pci_cfg_base+0x64/0x84 > pc : [] lr : [] psr: 600d0093 > sp : eb58fe98 ip : c041d750 fp : 00000008 > r10: c0e2283c r9 : 00000000 r8 : 600d0013 > r7 : 00000008 r6 : eb58fed6 r5 : 00000002 r4 : eb58feb4 > r3 : 00000000 r2 : 00000044 r1 : 00000008 r0 : 00000000 > Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user > Control: 30c5387d Table: 6a9f6c80 DAC: 55555555 > Process kworker/1:1 (pid: 20, stack limit =3D 0xeb58e210) > Stack: (0xeb58fe98 to 0xeb590000) > fe80: 00000002 > 00000044 fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 > eb78a000 eb78a000 fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 > 00000000 eb78a000 c0e22830 fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 > eb55f380 c0e22838 eb9a9800 c0235fbc ff00: eb55f380 c0e22838 eb55f380 > eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100 ff20: eb55f398 c02366c4 > eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000 ff40: 00000000 > 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380 ff60: > 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 000000= 00 > ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac > eb5631c0 ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 > 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 > 00000013 00000000 3a81336c 10ccd1dd [] (pci_generic_config_= read) > from [] > (pci_bus_read_config_word+0x58/0x80) > [] (pci_bus_read_config_word) from [] > (pci_check_pme_status+0x34/0x78) > [] (pci_check_pme_status) from [] > (pci_pme_wakeup+0x28/0x54) [] (pci_pme_wakeup) from [] > (pci_pme_list_scan+0x58/0xb4) [] (pci_pme_list_scan) from > [] > (process_one_work+0x1bc/0x308) > [] (process_one_work) from [] > (worker_thread+0x2a8/0x3e0) [] (worker_thread) from [] > (kthread+0xe4/0xfc) [] (kthread) from [] > (ret_from_fork+0x14/0x2c) Code: ea000000 e5903000 f57ff04f e3a00000 > (e5843000) > ---[ end trace 667d43ba3aa9e589 ]--- >=20 > Reported-by: Laurent Pinchart My Lager board now fails to resume (but that's also the case without th= is=20 patch, I'll have to investigate that separately) but the PCI crash is g= one, so Reviewed-by: Laurent Pinchart > Reported-and-tested-by: Geert Uytterhoeven > Acked-by: Rafael J. Wysocki > Cc: Mika Westerberg > Cc: Niklas S=F6derlund > Cc: Simon Horman > Cc: Yinghai Lu > Cc: Matthew Garrett > Cc: stable@vger.kernel.org # 2.6.37+ > Fixes: df17e62e5bff ("PCI: Add support for polling PME state on suspe= nded > legacy PCI devices") Signed-off-by: Lukas Wunner > --- > drivers/pci/pci.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) >=20 > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index aa55501d2179..c561a9e4916a 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1784,8 +1784,8 @@ static void pci_pme_list_scan(struct work_struc= t > *work) } > =09} > =09if (!list_empty(&pci_pme_list)) > -=09=09schedule_delayed_work(&pci_pme_work, > -=09=09=09=09 msecs_to_jiffies(PME_TIMEOUT)); > +=09=09queue_delayed_work(system_freezable_wq, &pci_pme_work, > +=09=09=09=09 msecs_to_jiffies(PME_TIMEOUT)); > =09mutex_unlock(&pci_pme_list_mutex); > } >=20 > @@ -1850,8 +1850,9 @@ void pci_pme_active(struct pci_dev *dev, bool e= nable) > =09=09=09mutex_lock(&pci_pme_list_mutex); > =09=09=09list_add(&pme_dev->list, &pci_pme_list); > =09=09=09if (list_is_singular(&pci_pme_list)) > -=09=09=09=09schedule_delayed_work(&pci_pme_work, > -=09=09=09=09=09=09 =20 msecs_to_jiffies(PME_TIMEOUT)); > +=09=09=09=09queue_delayed_work(system_freezable_wq, > +=09=09=09=09=09=09 &pci_pme_work, > +=09=09=09=09=09=09 =20 msecs_to_jiffies(PME_TIMEOUT)); > =09=09=09mutex_unlock(&pci_pme_list_mutex); > =09=09} else { > =09=09=09mutex_lock(&pci_pme_list_mutex); --=20 Regards, Laurent Pinchart