From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f43.google.com ([74.125.82.43]:32852 "EHLO mail-wm0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752238AbeCOVeR (ORCPT ); Thu, 15 Mar 2018 17:34:17 -0400 Received: by mail-wm0-f43.google.com with SMTP id s206so129290wme.0 for ; Thu, 15 Mar 2018 14:34:16 -0700 (PDT) Subject: Re: [PATCH RFC 0/7] net: phy: patch series aiming to improve few aspects of phylib To: Geert Uytterhoeven Cc: Florian Fainelli , Andrew Lunn , Geert Uytterhoeven , "netdev@vger.kernel.org" References: <421476ed-03c7-63a0-44a4-c6d9c7702647@gmail.com> From: Heiner Kallweit Message-ID: Date: Thu, 15 Mar 2018 22:34:04 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org List-ID: Am 15.03.2018 um 11:07 schrieb Geert Uytterhoeven: > Hi Heiner, > > On Wed, Mar 14, 2018 at 9:10 PM, Heiner Kallweit wrote: >> This patch series aims to tackle few issues with phylib: >> >> - address issues with patch series [1] (smsc911x + phylib changes) >> - make phy_stop synchronous >> - get rid of phy_start/stop_machine and handle it in phy_start/phy_stop >> - in mdio_suspend consider runtime pm state of mdio bus parent >> - consider more WOL conditions when deciding whether PHY is allowed to >> suspend >> - only resume phy after system suspend if needed >> >> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html >> >> It works fine here but other NIC drivers may use phylib differently. >> Therefore I'd appreciate feedback and more testing. >> >> I could think of some subsequent patches, e.g. phy_error() could be >> reduced to calling phy_stop() and printing an error message >> (today it silently sets the PHY state to PHY_HALTED). >> >> Heiner Kallweit (7): >> net: phy: unconditionally resume and re-enable interrupts in phy_start >> net: phy: improve checking for when PHY is allowed to suspend >> net: phy: resume PHY only if needed in mdio_bus_phy_suspend >> net: phy: remove phy_start_machine >> net: phy: make phy_stop synchronous >> net: phy: use new function phy_stop_suspending in mdio_bus_phy_suspend >> net: phy: remove phy_stop_machine > > Thanks for your series! > > I've gave this a try on a few machines, incl. r8a73a4/ape6evm and > sh73a0/kzm9g, which have smsc911x Ethernet chips on a power-managed bus. > > On both machines it crashes during system suspend, which means the smsc911c's > registers are accessed while the device is suspended: > > PM: suspend entry (deep) > PM: Syncing filesystems ... done. > Freezing user space processes ... (elapsed 0.001 seconds) done. > OOM killer disabled. > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > PM: suspend devices took 0.130 seconds > Disabling non-boot CPUs ... > Unhandled fault: imprecise external abort (0x1406) at 0x000ce408 > pgd = f4465d7b > [000ce408] *pgd=00000000 > Internal error: : 1406 [#1] SMP ARM > Modules linked in: > CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted > 4.16.0-rc5-kzm9g-00470-g319cfb3643965f46-dirty #1030 > Hardware name: Generic SH73A0 (Flattened Device Tree) > Workqueue: events linkwatch_event > PC is at __smsc911x_reg_read+0x1c/0x60 > LR is at smsc911x_tx_get_txstatus+0x2c/0x7c > pc : [] lr : [] psr: 20010093 > sp : df51bd38 ip : df51bce0 fp : 00000000 > r10: 00000000 r9 : 00000000 r8 : c0909b58 > r7 : a0010013 r6 : df636e08 r5 : df636dc0 r4 : df636800 > r3 : e0903000 r2 : 00000001 r1 : e0903080 r0 : 00000000 > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none > Control: 10c5387d Table: 5ee4004a DAC: 00000051 > Process kworker/1:1 (pid: 20, stack limit = 0x1e2af6bb) > Stack: (0xdf51bd38 to 0xdf51c000) > bd20: c03efb14 df636800 > bd40: df636dc0 c063c198 df51bdb0 c03efa80 c03efb14 df636800 df636800 c03efb20 > bd60: c03efb14 dec5e8f4 df636800 c063c198 df51bdb0 c04b4494 dec5e8f0 dec4ea80 > bd80: df636800 c04d7c28 dec4ea80 df636800 dec5e800 c04d3d68 0000002a 00000000 > bda0: c04d3990 c020af0c df400a80 00000000 00000000 00000000 00000000 00000000 > bdc0: 00000000 00000000 00000050 00000000 df51be03 c04a5828 00000580 c04a5758 > bde0: dec4ea80 000004db 014000c0 c0908448 00000001 c04a58a0 df51be03 c04d14e0 > be00: 00000000 3cef0b86 c04d13bc dec4ea80 df636800 00000010 00000000 00000000 > be20: df636800 00000000 c0931b44 c04d73c0 00000000 00000000 00000000 00000000 > be40: 00000000 00000000 00000000 ffffffff 014000c0 df636800 c0931ad8 df51bed4 > be60: c0931ad8 c04d7468 014000c0 00000000 00000000 c014404c c0908448 c0908448 > be80: df636800 c04d7534 014000c0 00000000 00000000 014000c0 c0908448 c04b9d8c > bea0: df636800 00000000 00000000 3cef0b86 c0931b44 df636800 c0931b44 c04d8854 > bec0: df636aac c04d8b10 df51bf2c c0908448 00000000 df51bed4 df51bed4 3cef0b86 > bee0: df51bf2c df50dc80 c0931ad8 dfbdaac0 df51bf2c dfbddd00 00000000 00000001 > bf00: 00000000 c04d8b98 c04d8b74 c013cc8c 00000001 00000000 c013cc14 c013d214 > bf20: c0908448 00000000 00000004 c0931ad8 00000000 00000000 c075f7d9 3cef0b86 > bf40: c0905900 df50dc80 dfbdaac0 dfbdaac0 df51a000 dfbdaaf4 c0905900 df50dc98 > bf60: 00000008 c013d4b0 df518540 df50de80 df5110c0 00000000 df491eb0 df50dc80 > bf80: c013d1e4 df50deb8 00000000 c014293c df5110c0 c014281c 00000000 00000000 > bfa0: 00000000 00000000 00000000 c01010b4 00000000 00000000 00000000 00000000 > bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 7fdfffff fff7fdff > [] (__smsc911x_reg_read) from [] > (smsc911x_tx_get_txstatus+0x2c/0x7c) > [] (smsc911x_tx_get_txstatus) from [] > (smsc911x_tx_update_txcounters+0x14/0xa8) > [] (smsc911x_tx_update_txcounters) from [] > (smsc911x_get_stats+0xc/0x58) > [] (smsc911x_get_stats) from [] (dev_get_stats+0x58/0xa8) > [] (dev_get_stats) from [] (rtnl_fill_stats+0x38/0x118) > [] (rtnl_fill_stats) from [] > (rtnl_fill_ifinfo.constprop.12+0x6a8/0x105c) > [] (rtnl_fill_ifinfo.constprop.12) from [] > (rtmsg_ifinfo_build_skb+0x7c/0xd0) > [] (rtmsg_ifinfo_build_skb) from [] > (rtmsg_ifinfo_event.part.6+0x28/0x4c) > [] (rtmsg_ifinfo_event.part.6) from [] > (rtmsg_ifinfo+0x24/0x2c) > [] (rtmsg_ifinfo) from [] (netdev_state_change+0x5c/0x80) > [] (netdev_state_change) from [] > (linkwatch_do_dev+0x50/0x74) > [] (linkwatch_do_dev) from [] > (__linkwatch_run_queue+0x138/0x19c) > [] (__linkwatch_run_queue) from [] > (linkwatch_event+0x24/0x34) > [] (linkwatch_event) from [] (process_one_work+0x250/0x41c) > [] (process_one_work) from [] (worker_thread+0x2cc/0x40c) > [] (worker_thread) from [] (kthread+0x120/0x140) > [] (kthread) from [] (ret_from_fork+0x14/0x20) > Exception stack(0xdf51bfb0 to 0xdf51bff8) > bfa0: 00000000 00000000 00000000 00000000 > bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 > Code: e5903000 e0831001 e5910000 f57ff04f (e12fff1e) > > I've bisected this to "net: phy: use new function phy_stop_suspending in, > mdio_bus_phy_suspend". > Thanks a lot for testing and the quick feedback! Reason for the problem seems to be that mdio_bus_phy_suspend() now includes a call to phy_link_down() which eventually fires an asynchronous linkwatch event. And this asynchronous event (accessing the chip registers) is processed only after the chip has been powered down. Maybe it's sufficient if I set the do_carrier parameter of phy_link_down() to false if suspending. Have to spend few more thoughts on this. Regards, Heiner > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds >