From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <netdev-owner@vger.kernel.org>
Received: from mail-wm0-f43.google.com ([74.125.82.43]:32852 "EHLO
        mail-wm0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752238AbeCOVeR (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 15 Mar 2018 17:34:17 -0400
Received: by mail-wm0-f43.google.com with SMTP id s206so129290wme.0
        for <netdev@vger.kernel.org>; Thu, 15 Mar 2018 14:34:16 -0700 (PDT)
Subject: Re: [PATCH RFC 0/7] net: phy: patch series aiming to improve few
 aspects of phylib
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
        Andrew Lunn <andrew@lunn.ch>,
        Geert Uytterhoeven <geert+renesas@glider.be>,
        "netdev@vger.kernel.org" <netdev@vger.kernel.org>
References: <421476ed-03c7-63a0-44a4-c6d9c7702647@gmail.com>
 <CAMuHMdUR8U5hEnbsPOrWpMHaLYMs=1dsNJLgAO6_TrzpBjFW0Q@mail.gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Message-ID: <b1d876d9-dbed-e191-ad00-05689f82780e@gmail.com>
Date: Thu, 15 Mar 2018 22:34:04 +0100
MIME-Version: 1.0
In-Reply-To: <CAMuHMdUR8U5hEnbsPOrWpMHaLYMs=1dsNJLgAO6_TrzpBjFW0Q@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

Am 15.03.2018 um 11:07 schrieb Geert Uytterhoeven:
> Hi Heiner,
> 
> On Wed, Mar 14, 2018 at 9:10 PM, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> This patch series aims to tackle few issues with phylib:
>>
>> - address issues with patch series [1] (smsc911x + phylib changes)
>> - make phy_stop synchronous
>> - get rid of phy_start/stop_machine and handle it in phy_start/phy_stop
>> - in mdio_suspend consider runtime pm state of mdio bus parent
>> - consider more WOL conditions when deciding whether PHY is allowed to
>>   suspend
>> - only resume phy after system suspend if needed
>>
>> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
>>
>> It works fine here but other NIC drivers may use phylib differently.
>> Therefore I'd appreciate feedback and more testing.
>>
>> I could think of some subsequent patches, e.g. phy_error() could be
>> reduced to calling phy_stop() and printing an error message
>> (today it silently sets the PHY state to PHY_HALTED).
>>
>> Heiner Kallweit (7):
>>   net: phy: unconditionally resume and re-enable interrupts in phy_start
>>   net: phy: improve checking for when PHY is allowed to suspend
>>   net: phy: resume PHY only if needed in mdio_bus_phy_suspend
>>   net: phy: remove phy_start_machine
>>   net: phy: make phy_stop synchronous
>>   net: phy: use new function phy_stop_suspending in mdio_bus_phy_suspend
>>   net: phy: remove phy_stop_machine
> 
> Thanks for your series!
> 
> I've gave this a try on a few machines, incl. r8a73a4/ape6evm and
> sh73a0/kzm9g, which have smsc911x Ethernet chips on a power-managed bus.
> 
> On both machines it crashes during system suspend, which means the smsc911c's
> registers are accessed while the device is suspended:
> 
> PM: suspend entry (deep)
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.001 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> PM: suspend devices took 0.130 seconds
> Disabling non-boot CPUs ...
> Unhandled fault: imprecise external abort (0x1406) at 0x000ce408
> pgd = f4465d7b
> [000ce408] *pgd=00000000
> Internal error: : 1406 [#1] SMP ARM
> Modules linked in:
> CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
> 4.16.0-rc5-kzm9g-00470-g319cfb3643965f46-dirty #1030
> Hardware name: Generic SH73A0 (Flattened Device Tree)
> Workqueue: events linkwatch_event
> PC is at __smsc911x_reg_read+0x1c/0x60
> LR is at smsc911x_tx_get_txstatus+0x2c/0x7c
> pc : [<c03eefd4>]    lr : [<c03ef820>]    psr: 20010093
> sp : df51bd38  ip : df51bce0  fp : 00000000
> r10: 00000000  r9 : 00000000  r8 : c0909b58
> r7 : a0010013  r6 : df636e08  r5 : df636dc0  r4 : df636800
> r3 : e0903000  r2 : 00000001  r1 : e0903080  r0 : 00000000
> Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 5ee4004a  DAC: 00000051
> Process kworker/1:1 (pid: 20, stack limit = 0x1e2af6bb)
> Stack: (0xdf51bd38 to 0xdf51c000)
> bd20:                                                       c03efb14 df636800
> bd40: df636dc0 c063c198 df51bdb0 c03efa80 c03efb14 df636800 df636800 c03efb20
> bd60: c03efb14 dec5e8f4 df636800 c063c198 df51bdb0 c04b4494 dec5e8f0 dec4ea80
> bd80: df636800 c04d7c28 dec4ea80 df636800 dec5e800 c04d3d68 0000002a 00000000
> bda0: c04d3990 c020af0c df400a80 00000000 00000000 00000000 00000000 00000000
> bdc0: 00000000 00000000 00000050 00000000 df51be03 c04a5828 00000580 c04a5758
> bde0: dec4ea80 000004db 014000c0 c0908448 00000001 c04a58a0 df51be03 c04d14e0
> be00: 00000000 3cef0b86 c04d13bc dec4ea80 df636800 00000010 00000000 00000000
> be20: df636800 00000000 c0931b44 c04d73c0 00000000 00000000 00000000 00000000
> be40: 00000000 00000000 00000000 ffffffff 014000c0 df636800 c0931ad8 df51bed4
> be60: c0931ad8 c04d7468 014000c0 00000000 00000000 c014404c c0908448 c0908448
> be80: df636800 c04d7534 014000c0 00000000 00000000 014000c0 c0908448 c04b9d8c
> bea0: df636800 00000000 00000000 3cef0b86 c0931b44 df636800 c0931b44 c04d8854
> bec0: df636aac c04d8b10 df51bf2c c0908448 00000000 df51bed4 df51bed4 3cef0b86
> bee0: df51bf2c df50dc80 c0931ad8 dfbdaac0 df51bf2c dfbddd00 00000000 00000001
> bf00: 00000000 c04d8b98 c04d8b74 c013cc8c 00000001 00000000 c013cc14 c013d214
> bf20: c0908448 00000000 00000004 c0931ad8 00000000 00000000 c075f7d9 3cef0b86
> bf40: c0905900 df50dc80 dfbdaac0 dfbdaac0 df51a000 dfbdaaf4 c0905900 df50dc98
> bf60: 00000008 c013d4b0 df518540 df50de80 df5110c0 00000000 df491eb0 df50dc80
> bf80: c013d1e4 df50deb8 00000000 c014293c df5110c0 c014281c 00000000 00000000
> bfa0: 00000000 00000000 00000000 c01010b4 00000000 00000000 00000000 00000000
> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 7fdfffff fff7fdff
> [<c03eefd4>] (__smsc911x_reg_read) from [<c03ef820>]
> (smsc911x_tx_get_txstatus+0x2c/0x7c)
> [<c03ef820>] (smsc911x_tx_get_txstatus) from [<c03efa80>]
> (smsc911x_tx_update_txcounters+0x14/0xa8)
> [<c03efa80>] (smsc911x_tx_update_txcounters) from [<c03efb20>]
> (smsc911x_get_stats+0xc/0x58)
> [<c03efb20>] (smsc911x_get_stats) from [<c04b4494>] (dev_get_stats+0x58/0xa8)
> [<c04b4494>] (dev_get_stats) from [<c04d7c28>] (rtnl_fill_stats+0x38/0x118)
> [<c04d7c28>] (rtnl_fill_stats) from [<c04d3d68>]
> (rtnl_fill_ifinfo.constprop.12+0x6a8/0x105c)
> [<c04d3d68>] (rtnl_fill_ifinfo.constprop.12) from [<c04d73c0>]
> (rtmsg_ifinfo_build_skb+0x7c/0xd0)
> [<c04d73c0>] (rtmsg_ifinfo_build_skb) from [<c04d7468>]
> (rtmsg_ifinfo_event.part.6+0x28/0x4c)
> [<c04d7468>] (rtmsg_ifinfo_event.part.6) from [<c04d7534>]
> (rtmsg_ifinfo+0x24/0x2c)
> [<c04d7534>] (rtmsg_ifinfo) from [<c04b9d8c>] (netdev_state_change+0x5c/0x80)
> [<c04b9d8c>] (netdev_state_change) from [<c04d8854>]
> (linkwatch_do_dev+0x50/0x74)
> [<c04d8854>] (linkwatch_do_dev) from [<c04d8b10>]
> (__linkwatch_run_queue+0x138/0x19c)
> [<c04d8b10>] (__linkwatch_run_queue) from [<c04d8b98>]
> (linkwatch_event+0x24/0x34)
> [<c04d8b98>] (linkwatch_event) from [<c013cc8c>] (process_one_work+0x250/0x41c)
> [<c013cc8c>] (process_one_work) from [<c013d4b0>] (worker_thread+0x2cc/0x40c)
> [<c013d4b0>] (worker_thread) from [<c014293c>] (kthread+0x120/0x140)
> [<c014293c>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> Exception stack(0xdf51bfb0 to 0xdf51bff8)
> bfa0:                                     00000000 00000000 00000000 00000000
> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e5903000 e0831001 e5910000 f57ff04f (e12fff1e)
> 
> I've bisected this to  "net: phy: use new function phy_stop_suspending in,
> mdio_bus_phy_suspend".
> 
Thanks a lot for testing and the quick feedback!
Reason for the problem seems to be that mdio_bus_phy_suspend() now includes
a call to phy_link_down() which eventually fires an asynchronous linkwatch
event. And this asynchronous event (accessing the chip registers) is
processed only after the chip has been powered down.

Maybe it's sufficient if I set the do_carrier parameter of phy_link_down()
to false if suspending. Have to spend few more thoughts on this.

Regards, Heiner


> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>