* i.MX28 based system losing eth0 on boot @ 2014-05-06 16:44 Brian Lilly 2014-05-06 18:11 ` Uwe Kleine-König 2014-05-07 3:17 ` Fabio Estevam 0 siblings, 2 replies; 15+ messages in thread From: Brian Lilly @ 2014-05-06 16:44 UTC (permalink / raw) To: Uwe Kleine-König Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel Uwe: With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0 come up, then brought right back down with an MDIO rx timeout moments after. Adding back in the removed code keeps the interface alive and it's working afterward without trouble. I've tested the re-inserted code in 3.12, 3.14 without issue on our boards. Is there something else that can be done to prevent the MDIO timeouts? We are using basically the same schematic for networking as the imx28evk. Any thoughts on how to resolve this? Thanks, Brian Lilly ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly @ 2014-05-06 18:11 ` Uwe Kleine-König 2014-05-06 18:39 ` Florian Fainelli 2014-05-06 19:12 ` Brian Lilly 2014-05-07 3:17 ` Fabio Estevam 1 sibling, 2 replies; 15+ messages in thread From: Uwe Kleine-König @ 2014-05-06 18:11 UTC (permalink / raw) To: Brian Lilly Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel Hello Brian, On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote: > With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0 > come up, then brought right back down with an MDIO rx timeout moments > after. Adding back in the removed code keeps the interface alive and > it's working afterward without trouble. I've tested the re-inserted > code in 3.12, 3.14 without issue on our boards. So you can reliably trigger that problem? You're just doing ifconfig eth0 1.2.3.4 up (or equivalent) and the interface goes down without further interference with the above mentioned commit? The exact error you're seeing is MDIO read timeout (with some prefix saying something about fec and eth0 I think)? This error is also present with a264b981f2 reverted, just doesn't affect eth0 being functional? Does the timeout always happen, or only on specific addresses? This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT? > Is there something else that can be done to prevent the MDIO timeouts? > We are using basically the same schematic for networking as the > imx28evk. Hard to say, but assuming it works just fine on the imx28evk for you, too, there seems to be some hardware difference that makes your machine fail. (That doesn't mean it's not fixable in software.) I don't know if an MDIO read error is intended to make the device go down, maybe one of the netdev guys can answer that. 
Assuming that it's not intended, instrument the code, find out how that timeout makes your device go down and find the wrong branch. I'd start with adding stackdumps when the mdio timeout happens and when fec_enet_start_xmit is called with fep->link == 0. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | http://www.pengutronix.de/ | ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 18:11 ` Uwe Kleine-König @ 2014-05-06 18:39 ` Florian Fainelli 2014-05-06 19:12 ` Brian Lilly 1 sibling, 0 replies; 15+ messages in thread From: Florian Fainelli @ 2014-05-06 18:39 UTC (permalink / raw) To: Uwe Kleine-König Cc: Brian Lilly, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel 2014-05-06 11:11 GMT-07:00 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>: > Hello Brian, > > On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote: >> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0 >> come up, then brought right back down with an MDIO rx timeout moments >> after. Adding back in the removed code keeps the interface alive and >> it's working afterward without trouble. I've tested the re-inserted >> code in 3.12, 3.14 without issue on our boards. > So you can reliably trigger that problem? You're just doing > > ifconfig eth0 1.2.3.4 up > > (or equivalent) and the interface goes down without further > interference with the above mentioned commit? The exact error you're > seeing is > > MDIO read timeout > > (with some prefix saying something about fec and eth0 I think)? > > This error is also present with a264b981f2 reverted, just doesn't affect > eth0 being functional? Does the timeout always happen, or only on > specific addresses? > > This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT? > >> Is there something else that can be done to prevent the MDIO timeouts? >> We are using basically the same schematic for networking as the >> imx28evk. > Hard to say, but assuming it works just fine on the imx28evk for you, > too, there seems to be some hardware difference that makes your machine > fail. (That doesn't mean it's not fixable in software.) > > I don't know if a mdio read error is intended to make the device go > down, maybe one of the netdev guys can answer that. 
What is likely happening is that you are failing auto-negotiation (phy_read_status() returns < 0) because of the MDIO timeout, so we never call netif_carrier_on(), and so the link is not UP. The reason for that could be a genuine MDIO read timeout from the bus, or your PHY might be slightly bogus and need more time to complete auto-negotiation, or anything that resembles that. There is some special MDIO timeout logic in the FEC driver that I would seriously audit as it seems to be bogus, or it seems at the very least that the MDIO timeouts are known and need to be worked around. > Assuming that it's not intended, instrument the code, find out how that > timeout makes your device go down and find the wrong branch. I'd start > with adding stackdumps when the mdio timeout happens and when > fec_enet_start_xmit is called with fep->link == 0. I would also double check fec_enet_adjust_link() which seems to handle a case where we have an MDIO bus timeout, and tries to do something that looks incorrect to me. PHY_HALTED basically corresponds to phy_stop() being called, which means that you won't be running the adjust_link callback, so I wonder how this situation is actually happening. > > Best regards > Uwe > > -- > Pengutronix e.K. | Uwe Kleine-König | > Industrial Linux Solutions | http://www.pengutronix.de/ | -- Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
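The failure mode described in the message above can be illustrated with a small user-space model. This is only a sketch and not kernel code: ToyPhy, MdioTimeout and poll_link are hypothetical names, and the real PHY library state machine has many more states. The model also assumes that a failed status read halts polling and leaves the carrier off, which is a simplification of the PHY_HALTED remark above.

```python
class MdioTimeout(Exception):
    """Stands in for a negative return value from an MDIO status read."""


class ToyPhy:
    """Hypothetical PHY whose n-th status read (1-based) times out."""

    def __init__(self, fail_at=None):
        self.fail_at = fail_at
        self.reads = 0

    def read_status(self):
        self.reads += 1
        if self.reads == self.fail_at:
            raise MdioTimeout("MDIO read timeout")
        return True  # link partner detected


def poll_link(phy, polls):
    """Poll the PHY up to `polls` times; return (state, carrier_on)."""
    carrier_on = False
    for _ in range(polls):
        try:
            link_up = phy.read_status()
        except MdioTimeout:
            # Assumption of this sketch: an MDIO error halts polling and
            # the interface is left without carrier.
            return "halted", False
        if link_up:
            carrier_on = True  # models the point where netif_carrier_on() runs
    return "running", carrier_on
```

In this model a PHY that never times out ends up "running" with the carrier on, while a single timeout on any poll leaves the interface "halted" with no carrier, which matches the symptom of eth0 coming up and then going away again.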
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 18:11 ` Uwe Kleine-König 2014-05-06 18:39 ` Florian Fainelli @ 2014-05-06 19:12 ` Brian Lilly 2014-05-06 19:24 ` Florian Fainelli 1 sibling, 1 reply; 15+ messages in thread From: Brian Lilly @ 2014-05-06 19:12 UTC (permalink / raw) To: Uwe Kleine-König Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel It is happening during boot up: <snip, kernel 3.12 > Configuring network interfaces... [ 35.117114] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready udhcpc (v1.21.1) started Sending discover... [ 37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full [ 37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Sending discover... Sending select for 10.10.10.217... Lease of 10.10.10.217 obtained, lease time 86400 /etc/udhcpc.d/50default: Adding DNS 10.10.10.13 [ 39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready done. Starting rpcbind daemon...done. net.ipv4.conf.default.rp_filter = 1 net.ipv4.conf.all.rp_filter = 1 Mon Apr 14 22:40:00 UTC 2014 INIT: Entering runlevel: 5 Starting Xserver Starting system message bus: dbus. 
Starting Connection Manager Starting wpa_supplicant Successfully initialized wpa_supplicant Starting Dropbear SSH server [ 44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 45.781364] fec 800f0000.ethernet eth0: MDIO read timeout [ 46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 47.811385] fec 800f0000.ethernet eth0: MDIO read timeout With a different kernel (3.14): [ 28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full [ 37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 38.398346] fec 800f0000.ethernet eth0: MDIO read timeout [ 39.438412] fec 800f0000.ethernet eth0: MDIO read timeout [ 39.468419] fec 800f0000.ethernet eth0: MDIO write timeout [ 40.498848] fec 800f0000.ethernet eth0: MDIO read timeout Afterward I have to ifdown eth0, ifup eth0 and then it functions normally, without reverting the commit. root@cfa100xx:~# ifdown eth0 [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) root@cfa100xx:~# ifup eth0 udhcpc (v1.21.1) started Sending discover... [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full Sending discover... Sending select for 10.10.10.217... Lease of 10.10.10.217 obtained, lease time 86400 ip: RTNETLINK answers: File exists -- Brian On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote: > Hello Brian, > > On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote: >> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0 >> come up, then brought right back down with an MDIO rx timeout moments >> after. 
> <snip> ^ permalink raw reply [flat|nested] 15+ messages in thread
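The timestamps in the boot log above can be checked mechanically to see how long after the second PHY attach the timeouts begin. The following is a minimal sketch, assuming the log is available as plain text in the usual "[ seconds] message" dmesg layout; the function name timeouts_after_attach is made up for this example, and the sample lines are copied from the 3.12 excerpt in this message.

```python
import re

# Sample lines copied from the 3.12 boot log quoted in this thread.
LOG = """\
[ 44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[ 45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
[ 46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
"""

# "[ 44.754915] message" -> (timestamp, message)
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def timeouts_after_attach(log_text):
    """Return the delay in seconds from the latest PHY attach to each MDIO timeout."""
    attach = None
    delays = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, msg = float(match.group(1)), match.group(2)
        if "FEC PHY driver" in msg:
            attach = stamp  # remember the most recent attach
        elif "MDIO" in msg and "timeout" in msg and attach is not None:
            delays.append(round(stamp - attach, 6))
    return delays
```

For the excerpt above this yields delays of roughly 1.03 s and 3.06 s after the second attach. The first attach at 35.1 s, which is followed by "Link is Up", produces no timeouts at all.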
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 19:12 ` Brian Lilly @ 2014-05-06 19:24 ` Florian Fainelli 2014-05-06 21:40 ` Brian Lilly 0 siblings, 1 reply; 15+ messages in thread From: Florian Fainelli @ 2014-05-06 19:24 UTC (permalink / raw) To: Brian Lilly Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>: > It is happening during boot up: > > <snip, kernel 3.12 > > > Configuring network interfaces... [ 35.117114] fec 800f0000.ethernet > eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] Note that the SMSC PHY driver is picked up here, and that specific driver implements a different phy_read_status() callback due to how the PHY operates. The PHY driver also overrides the config_init() callback to perform some PHY-specific initialization. See below for more. > (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > udhcpc (v1.21.1) started > > Sending discover... > > [ 37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full > [ 37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > Sending discover... > > Sending select for 10.10.10.217... > Lease of 10.10.10.217 obtained, lease time 86400 > /etc/udhcpc.d/50default: Adding DNS 10.10.10.13 > [ 39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready > done. > Starting rpcbind daemon...done. > net.ipv4.conf.default.rp_filter = 1 > net.ipv4.conf.all.rp_filter = 1 > Mon Apr 14 22:40:00 UTC 2014 > INIT: Entering runlevel: 5 > Starting Xserver > Starting system message bus: dbus. 
> Starting Connection Manager > Starting wpa_supplicant > Successfully initialized wpa_supplicant > Starting Dropbear SSH server > [ 44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) The correct PHY driver is selected here... > [ 45.781364] fec 800f0000.ethernet eth0: MDIO read timeout > [ 46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 47.811385] fec 800f0000.ethernet eth0: MDIO read timeout But we are still seeing MDIO read timeouts, which is not great. > > With a different kernel (3.14): > > [ 28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full > [ 37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) Here, the Generic PHY driver has been selected, which will use the MII_BMSR register contents to determine the Link status and parameters. You might want to make sure that your board selects the appropriate PHY driver, such that we are not chasing two issues here. > [ 38.398346] fec 800f0000.ethernet eth0: MDIO read timeout > [ 39.438412] fec 800f0000.ethernet eth0: MDIO read timeout > [ 39.468419] fec 800f0000.ethernet eth0: MDIO write timeout > [ 40.498848] fec 800f0000.ethernet eth0: MDIO read timeout It would also be helpful to print the registers that were accessed, such that you could correlate this with the exact steps in the PHY library state machine. Please also retry the experiment with the SMSC PHY driver enabled, as it does some PHY-specific initialization that seems to be relevant. Then we are hopefully left with only the MDIO timeout issue and not the PHY mis-configuration + MDIO timeout. > > Afterward I have to ifdown eth0, ifup eth0 and then it functions > normally, without reverting the commit. 
> <snip> -- Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
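The suggestion above to print the registers that were accessed can be sketched as a tracing wrapper. In the driver itself this would presumably be a debug print in the MDIO read and write paths; the Python below only models the idea in user space. TracingBus and FakeBus are hypothetical names invented for this example, and only the register numbers 0x00 to 0x05 are the standard MII management registers.

```python
# Standard MII management register numbers (IEEE 802.3 clause 22).
MII_REG_NAMES = {
    0x00: "MII_BMCR",
    0x01: "MII_BMSR",
    0x02: "MII_PHYSID1",
    0x03: "MII_PHYSID2",
    0x04: "MII_ADVERTISE",
    0x05: "MII_LPA",
}


class FakeBus:
    """Hypothetical bus: reads of any register in `bad_regs` time out."""

    def __init__(self, bad_regs=()):
        self.bad_regs = set(bad_regs)

    def read(self, phy_addr, reg):
        if reg in self.bad_regs:
            raise TimeoutError("MDIO read timeout")
        return 0xFFFF


class TracingBus:
    """Wraps a bus object and records every read, including failed ones."""

    def __init__(self, bus):
        self.bus = bus
        self.trace = []

    def read(self, phy_addr, reg):
        name = MII_REG_NAMES.get(reg, "reg 0x%02x" % reg)
        try:
            value = self.bus.read(phy_addr, reg)
        except TimeoutError:
            # Record which register was being read when the timeout hit,
            # then let the error propagate unchanged.
            self.trace.append(("read", phy_addr, name, "timeout"))
            raise
        self.trace.append(("read", phy_addr, name, "0x%04x" % value))
        return value
```

With a trace like this, each "MDIO read timeout" line can be matched to the register being read at that moment, and from there to the step of the PHY state machine that issued the read.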
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 19:24 ` Florian Fainelli @ 2014-05-06 21:40 ` Brian Lilly 2014-05-06 22:06 ` Florian Fainelli 0 siblings, 1 reply; 15+ messages in thread From: Brian Lilly @ 2014-05-06 21:40 UTC (permalink / raw) To: Florian Fainelli Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel The PHY on board is the SMSC LAN8720 With the generic PHY driver selected: http://pastebin.com/A4MH4Ptw [ 28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full [ 30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready [ 37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 38.345047] fec 800f0000.ethernet eth0: MDIO read timeout [ 39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 40.374961] fec 800f0000.ethernet eth0: MDIO read timeout With the SMSC PHY driver selected: http://pastebin.com/DhdDyrMv [ 28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full [ 30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready [ 37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) [ 38.270611] fec 800f0000.ethernet eth0: MDIO read timeout [ 39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 40.300454] fec 800f0000.ethernet eth0: MDIO read timeout On Tue, 
May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote: > <snip> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: i.MX28 based system losing eth0 on boot 2014-05-06 21:40 ` Brian Lilly @ 2014-05-06 22:06 ` Florian Fainelli 2014-05-06 22:27 ` Brian Lilly 0 siblings, 1 reply; 15+ messages in thread From: Florian Fainelli @ 2014-05-06 22:06 UTC (permalink / raw) To: Brian Lilly Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>: > The PHY on board is the SMSC LAN8720 > > With the generic PHY driver selected: http://pastebin.com/A4MH4Ptw > > [ 28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full > [ 30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready > [ 37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 38.345047] fec 800f0000.ethernet eth0: MDIO read timeout > [ 39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 40.374961] fec 800f0000.ethernet eth0: MDIO read timeout > > With the SMSC PHY driver selected: http://pastebin.com/DhdDyrMv > > [ 28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full > [ 30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready > [ 37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver > [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1) > [ 38.270611] fec 800f0000.ethernet eth0: MDIO read timeout > [ 39.415256] 
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 40.300454] fec 800f0000.ethernet eth0: MDIO read timeout Thanks for trying this, at least this is consistent no matter which PHY driver we are using. Just to rule out a potential PHY power-down issue, could you try to revert the following commit be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev when going to HALTED") and see if that works better for you? Thanks! -- Florian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: i.MX28 based system losing eth0 on boot
From: Brian Lilly @ 2014-05-06 22:27 UTC
To: Florian Fainelli
Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel

It would appear that I don't have that commit. I could move to 3.14
to see if it makes a difference, but the last couple of responses have
been on 3.12.18 -- or perhaps I'm missing something else.

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated
12412 East Saltese Road
Spokane Valley, WA 99216
brian@crystalfontz.com http://www.crystalfontz.com
Twitter: @Crystalfontz
US toll-free (888) 206-9720 voice (509) 892-1200

On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Thanks for trying this, at least this is consistent no matter which
> PHY driver we are using. Just to rule out a potential PHY power-down
> issue, could you try to revert the following commit
> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
> when going to HALTED") and see if that works better for you?
>
> Thanks!
* Re: i.MX28 based system losing eth0 on boot
From: Florian Fainelli @ 2014-05-07 3:07 UTC
To: Brian Lilly
Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel

2014-05-06 15:27 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> It would appear that I don't have that commit. I could move to 3.14
> to see if it makes a difference, but the last couple of responses have
> been on 3.12.18 -- or perhaps I'm missing something else.

I did miss that you were also seeing the problem in 3.12. At that
point, I believe that either the driver was working around a potential
PHY bug that is not covered by the SMSC PHY driver, or the MDIO
timeout is simply not long enough, or your MDIO interrupts fire
much later than the timeout allows, or these interrupts are
not reliable.

You could probably try to ignore the timeout and see if you get
sensible data out of the MDIO bus regardless.
--
Florian
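One way to judge whether the MDIO bus returns "sensible data" once the timeout is ignored is to look at the first four PHY registers and check that they resemble a responding LAN87xx part. The following is a minimal sketch under stated assumptions, not driver code: the helper names are hypothetical, and the PHY ID and mask are what I recall for the LAN8710/LAN8720 entry in drivers/net/phy/smsc.c, so please verify them against your own tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register indices from the standard MII register map (IEEE 802.3 clause 22). */
enum { REG_BMCR = 0, REG_BMSR = 1, REG_PHYSID1 = 2, REG_PHYSID2 = 3 };

/* Assumed ID and mask for the LAN8710/LAN8720 family; verify before use. */
#define LAN87XX_PHY_ID      0x0007c0f0u
#define LAN87XX_PHY_ID_MASK 0xfffffff0u

/*
 * An unanswered MDIO read typically shows up as all ones because the bus
 * is pulled up. All zeros is just as suspicious for BMSR, since its
 * capability bits are fixed by the hardware.
 */
int mdio_value_plausible(uint16_t v)
{
	return v != 0xffffu && v != 0x0000u;
}

/* Combine PHYSID1/PHYSID2 into the 32-bit ID that the PHY library matches on. */
uint32_t phy_id_from_regs(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/*
 * Hypothetical helper: returns 1 if a dump of registers 0..3 looks like it
 * came from a responding LAN87xx PHY, 0 otherwise.
 */
int phy_regs_look_sane(const uint16_t regs[4])
{
	assert(regs != NULL);

	if (!mdio_value_plausible(regs[REG_BMSR]))
		return 0;

	return (phy_id_from_regs(regs[REG_PHYSID1], regs[REG_PHYSID2]) &
		LAN87XX_PHY_ID_MASK) == LAN87XX_PHY_ID;
}
```

If the values logged after a "timed out" read pass this kind of check, that would point toward a late or missed completion interrupt rather than a PHY that is not answering at all.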
* Re: i.MX28 based system losing eth0 on boot
From: Brian Lilly @ 2014-05-07 19:16 UTC
To: Florian Fainelli
Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel

Also, in 3.14, commenting out both "return -ETIMEDOUT" instances in
fec_main.c results in a working interface.

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated

On Tue, May 6, 2014 at 8:07 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> You could probably try to ignore the timeout and see if you get
> sensible data out of the MDIO bus regardless.
* Re: i.MX28 based system losing eth0 on boot
  2014-05-07 19:16 ` Brian Lilly
@ 2014-05-07 19:34 ` Florian Fainelli
  2014-05-07 19:51   ` Brian Lilly
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2014-05-07 19:34 UTC
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel

2014-05-07 12:16 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> Also, in 3.14, commenting out both "return -ETIMEDOUT" instances in
> fec_main.c results in a working interface.
> Please let me know if you have any questions.

At this point, you could probably instrument the interrupt handler and
see if you get FEC_MDIO interrupt causes at all?

> [...]

--
Florian
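The possibilities raised in this exchange (timeout too short, interrupt arriving late, interrupt never arriving) can be told apart with a small user-space model. The following is only a sketch, not driver code: it assumes the driver starts an MDIO transfer and then waits, with a deadline, for a completion that the "MII done" interrupt signals, which is what the "MDIO read timeout" message suggests. The names `mdio_read`, `irq_delay_s` and `timeout_s` are hypothetical, and the 30 ms figure is an assumption taken from the roughly 30 ms spacing of the timeout messages in the 3.15.0-rc4 log elsewhere in this thread.

```python
import threading

# Assumed deadline for one MDIO transaction (30 ms). This value is inferred
# from the spacing of the timeout messages in the thread, not from the source.
ASSUMED_TIMEOUT_S = 0.030


def mdio_read(irq_delay_s, timeout_s=ASSUMED_TIMEOUT_S):
    """Model one MDIO read and report whether it completed in time.

    irq_delay_s: seconds until the simulated "MII done" interrupt fires,
                 or None if the interrupt never fires.
    timeout_s:   how long the caller is willing to wait.
    Returns True when the completion was signalled before the deadline,
    False when the wait timed out (the driver would log a timeout here).
    """
    done = threading.Event()
    timer = None
    if irq_delay_s is not None:
        # The timer thread plays the role of the interrupt handler.
        timer = threading.Timer(irq_delay_s, done.set)
        timer.daemon = True
        timer.start()
    completed = done.wait(timeout_s)
    if timer is not None:
        timer.cancel()
    return completed
```

In this model, `mdio_read(0.045, 0.030)` stands for an interrupt that arrives after the deadline, `mdio_read(0.045, 0.060)` for the same hardware with a doubled timeout, and `mdio_read(None, 0.030)` for an interrupt that never fires. Only the last case is unaffected by a longer timeout, which is why counting interrupt causes in the handler helps separate the hypotheses.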
* Re: i.MX28 based system losing eth0 on boot
  2014-05-07 19:34 ` Florian Fainelli
@ 2014-05-07 19:51 ` Brian Lilly
  2014-05-08 1:47   ` fugang.duan
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-07 19:51 UTC
  To: Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel@vger.kernel.org, kernel

Florian:

Thank you for your help.

After doubling the timeout length it worked.

I managed to get my hands on an imx28evk board and compared our
component load versus theirs, to find they have a 1.5k pull-up on
ENET_MDIO to +3.3 V which wasn't present on our board. Adding a 1.5k
pull-up resistor on ENET_MDIO solves the problem, and the board boots
as expected without patching anything.

Sorry for the trouble on this.

Apparently our EE had some question as to whether or not the pull-up
was necessary, and put it in the schematic, and the footprint on the
board, but marked it as a DNP (do not populate), which of course left
it off the board and out of the BOM.

<facepalm>

On Wed, May 7, 2014 at 12:34 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> [...]
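As rough arithmetic on why a missing pull-up could matter, the sketch below estimates how long an undriven MDIO line takes to rise to a logic-high threshold through a pull-up resistor. Everything except the 1.5k value and the 3.3 V rail is an assumption made for illustration: the 50 pF bus capacitance, the 2.0 V input-high threshold, and the 100k figure standing in for "only a weak internal pull-up" are hypothetical, and the 400 ns figure corresponds to a 2.5 MHz MDC clock. It is not a diagnosis of the board in this thread.

```python
import math

VDD = 3.3             # supply rail in volts (from the thread)
V_IH = 2.0            # assumed input-high threshold in volts
C_BUS = 50e-12        # assumed MDIO line capacitance in farads
MDC_PERIOD_S = 400e-9 # one MDC clock period at an assumed 2.5 MHz


def rise_time_s(r_pullup_ohms, c_farads=C_BUS, vdd=VDD, v_ih=V_IH):
    """Time for an RC-charged line to rise from 0 V to v_ih.

    Uses v(t) = vdd * (1 - exp(-t / (R * C))) solved for t.
    """
    return -r_pullup_ohms * c_farads * math.log(1.0 - v_ih / vdd)


if __name__ == "__main__":
    for label, r in (("1.5k external pull-up", 1.5e3),
                     ("100k weak pull-up (hypothetical)", 100e3)):
        t = rise_time_s(r)
        print(f"{label}: {t * 1e9:.0f} ns "
              f"({t / MDC_PERIOD_S:.2f} MDC periods)")
```

Under these assumptions the 1.5k resistor brings the line high well within one clock period, while a weak pull-up alone takes several periods, so bits sampled while the line is released could be read incorrectly.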
* RE: i.MX28 based system losing eth0 on boot
  2014-05-07 19:51 ` Brian Lilly
@ 2014-05-08 1:47 ` fugang.duan
  0 siblings, 0 replies; 15+ messages in thread
From: fugang.duan @ 2014-05-08 1:47 UTC
  To: Brian Lilly, Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio.Estevam@freescale.com, Jim Baxter, Frank.Li@freescale.com, netdev, linux-kernel@vger.kernel.org, kernel

From: Brian Lilly <brian@crystalfontz.com>
Date: Thursday, May 08, 2014 3:52 AM
>To: Florian Fainelli
>Cc: Uwe Kleine-König; David S. Miller; Estevam Fabio-R49496; Jim Baxter; Li Frank-
>B20596; Duan Fugang-B38611; netdev; linux-kernel@vger.kernel.org; kernel
>Subject: Re: i.MX28 based system losing eth0 on boot
>
>Florian:
>
>Thank you for your help.
>
>After doubling the timeout length it worked.
>
>I managed to get my hands on an imx28evk board and compared our component load
>versus theirs, to find they have a 1.5k pull-up on ENET_MDIO to +3.3 V which wasn't
>present on our board. Adding a 1.5k pull-up resistor on ENET_MDIO solves the
>problem, and boots as expected without patching anything.
>
>Sorry for the trouble on this.
>
>Apparently our EE had some question as to whether or not the pull-up was necessary,
>and put it in the schematic, and the footprint on the board, but marked it as a
>DNP, which of course left it off the board and out of the BOM.

[...]

Yes, a 1.5K pull-up on MDIO is necessary. Otherwise the data written to
or read from the PHY registers is not correct, because the drive
strength is not sufficient.

Thanks,
Andy
* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly
  2014-05-06 18:11 ` Uwe Kleine-König
@ 2014-05-07 3:17 ` Fabio Estevam
  2014-05-07 19:00   ` Brian Lilly
  1 sibling, 1 reply; 15+ messages in thread
From: Fabio Estevam @ 2014-05-07 3:17 UTC
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev@vger.kernel.org, linux-kernel

Brian,

On Tue, May 6, 2014 at 1:44 PM, Brian Lilly <brian@crystalfontz.com> wrote:
> Uwe:
>
> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
> come up, then brought right back down with an MDIO rx timeout moments
> after. Adding back in the removed code keeps the interface alive and
> it's working afterward without trouble. I've tested the re-inserted
> code in 3.12, 3.14 without issue on our boards.
>
> Is there something else that can be done to prevent the MDIO timeouts?
> We are using basically the same schematic for networking as the
> imx28evk.
>
> Any thoughts on how to resolve this?

Could you try Russell's latest FEC patches? They are available at:
http://ftp.arm.linux.org.uk/cgit/linux-arm.git/log/?h=fec-testing

In particular this one could help with your "MDIO timeout" issue:
http://ftp.arm.linux.org.uk/cgit/linux-arm.git/commit/?h=fec-testing&id=ec1fac3de70b16c69d3edc9f223e91d56b1915de
* Re: i.MX28 based system losing eth0 on boot
  2014-05-07 3:17 ` Fabio Estevam
@ 2014-05-07 19:00 ` Brian Lilly
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Lilly @ 2014-05-07 19:00 UTC
  To: Fabio Estevam
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam, Jim Baxter, Frank Li, Fugang Duan, netdev@vger.kernel.org, linux-kernel

Moving forward to 3.15.0-rc4 merged with Russell's FEC patches makes
it much more noisy (http://pastebin.com/17TyyMPn):

Populating dev cache
Configuring network interfaces... [ 26.268156] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
ip: SIOCSIFFLAGS: Connection timed out
Starting rpcbind daemon...rpcbind: cannot create socket for udp6
rpcbind: cannot create socket for tcp6
done.
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
INIT: Entering runlevel: 5
Starting Xserver
Starting system message bus: dbus.
Starting Connection Manager
Starting wpa_supplicant
Successfully initialized wpa_supplicant
Starting Dropbear SSH server: dropbear.
starting Busybox UDHCP Server: u[ 31.129045] fec 800f0000.ethernet eth0: MDIO write timeout
dhcpd... [ 31.158388] fec 800f0000.ethernet eth0: MDIO read timeout
[ 31.188437] fec 800f0000.ethernet eth0: MDIO write timeout
done.
[ 31.218260] fec 800f0000.ethernet eth0: MDIO read timeout
[ 31.248256] fec 800f0000.ethernet eth0: MDIO write timeout
[ 31.253830] fec 800f0000.ethernet eth0: could not attach to PHY
Starting syslogd/klogd: done

from dmesg:

[ 26.268156] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
[ 31.129045] fec 800f0000.ethernet eth0: MDIO write timeout
[ 31.158388] fec 800f0000.ethernet eth0: MDIO read timeout
[ 31.188437] fec 800f0000.ethernet eth0: MDIO write timeout
[ 31.218260] fec 800f0000.ethernet eth0: MDIO read timeout
[ 31.248256] fec 800f0000.ethernet eth0: MDIO write timeout
[ 31.253830] fec 800f0000.ethernet eth0: could not attach to PHY

I can go back and cull the timeout bits in 3.12 or 3.14 and report
back if you think that it'd be helpful ...

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated
12412 East Saltese Road
Spokane Valley, WA 99216
brian@crystalfontz.com http://www.crystalfontz.com
Twitter: @Crystalfontz
US toll-free (888) 206-9720 voice (509) 892-1200

On Tue, May 6, 2014 at 8:17 PM, Fabio Estevam <festevam@gmail.com> wrote:
> [...]
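The timestamps in the 3.15.0-rc4 log give a hint about the timeout in use. The sketch below is a minimal illustration, not a tool from the thread: it extracts the timestamps of the MDIO timeout messages from the first burst quoted in this message and prints the gaps between consecutive ones. The function name `timeout_intervals_ms` and the regular expression are hypothetical helpers written for this example.

```python
import re

# First burst of timeout messages, copied from the dmesg output in this message.
LOG = """\
[ 26.268156] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[ 26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[ 26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
"""

# Matches only the "MDIO read/write timeout" lines and captures the timestamp.
TIMEOUT_LINE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+fec \S+ eth0: MDIO (?:read|write) timeout"
)


def timeout_intervals_ms(log_text):
    """Return the gaps, in milliseconds, between consecutive MDIO timeouts."""
    stamps = [float(m.group(1)) for m in TIMEOUT_LINE.finditer(log_text)]
    return [round((later - earlier) * 1000.0, 1)
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(timeout_intervals_ms(LOG))
```

The gaps all come out close to 30 ms, which is consistent with each MDIO transaction waiting out a fixed deadline of about that length before the next one is attempted. That reading is an inference from the log, not something stated in the thread.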
Thread overview: 15+ messages
2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly
2014-05-06 18:11 ` Uwe Kleine-König
2014-05-06 18:39   ` Florian Fainelli
2014-05-06 19:12   ` Brian Lilly
2014-05-06 19:24     ` Florian Fainelli
2014-05-06 21:40       ` Brian Lilly
2014-05-06 22:06         ` Florian Fainelli
2014-05-06 22:27           ` Brian Lilly
2014-05-07 3:07             ` Florian Fainelli
2014-05-07 19:16               ` Brian Lilly
2014-05-07 19:34                 ` Florian Fainelli
2014-05-07 19:51                   ` Brian Lilly
2014-05-08 1:47                     ` fugang.duan
2014-05-07 3:17 ` Fabio Estevam
2014-05-07 19:00   ` Brian Lilly