From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: linux-arm-kernel@lists.infradead.org,
Fabio Estevam <festevam@gmail.com>,
netdev@vger.kernel.org, Andrew Lunn <andrew@lunn.ch>,
Florian Fainelli <f.fainelli@gmail.com>,
Heiner Kallweit <hkallweit1@gmail.com>
Subject: [BUG] fec mdio times out under system stress
Date: Sun, 11 Aug 2019 14:37:07 +0100
Message-ID: <20190811133707.GC13294@shell.armlinux.org.uk>
Hi Fabio,
When I woke up this morning, I found that one of the Hummingboards
had gone offline (as in, lost network link) during the night.
Investigating, I found that the system had gone into OOM, and at
that time, triggered a seemingly unrelated MDIO read timeout:
[4111697.698776] fec 2188000.ethernet eth0: MDIO read timeout
[4111697.712996] MII_DATA: 0x6006796d
[4111697.729415] MII_SPEED: 0x0000001a
[4111697.745232] IEVENT: 0x00000000
[4111697.745242] IMASK: 0x0a8000aa
[4111698.002233] Atheros 8035 ethernet 2188000.ethernet-1:00: PHY state change RUNNING -> HALTED
[4111698.009882] fec 2188000.ethernet eth0: Link is Down
This is on a dual-core iMX6.
It looks like the read actually completed (since MII_DATA contains
the register data) but we somehow lost the interrupt (or maybe
received the interrupt after wait_for_completion_timeout() timed
out).
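
To make the "read actually completed" inference concrete, the logged
MII_DATA value can be split into its management frame fields. This is
a minimal userspace sketch, assuming the clause-22 layout that the
driver's FEC_MMFR_* macros encode; the names mii_frame, mii_decode()
and mii_encode_read() are made up for illustration and are not part of
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Clause-22 management frame fields as packed into FEC MII_DATA. */
struct mii_frame {
    unsigned st;    /* bits 31:30, start of frame, 1 for clause 22 */
    unsigned op;    /* bits 29:28, 2 = read, 1 = write */
    unsigned pa;    /* bits 27:23, PHY address */
    unsigned ra;    /* bits 22:18, register address */
    unsigned ta;    /* bits 17:16, turnaround, 2 */
    unsigned data;  /* bits 15:0, register contents */
};

/* Split a raw MII_DATA value into its fields. */
struct mii_frame mii_decode(uint32_t v)
{
    struct mii_frame f = {
        .st   = (v >> 30) & 0x3,
        .op   = (v >> 28) & 0x3,
        .pa   = (v >> 23) & 0x1f,
        .ra   = (v >> 18) & 0x1f,
        .ta   = (v >> 16) & 0x3,
        .data = v & 0xffff,
    };
    return f;
}

/* Build the command word written to MII_DATA to start a read. */
uint32_t mii_encode_read(unsigned pa, unsigned ra)
{
    assert(pa < 32 && ra < 32);
    return (1u << 30) | (2u << 28) | ((uint32_t)pa << 23) |
           ((uint32_t)ra << 18) | (2u << 16);
}
```

Decoding 0x6006796d this way gives a read (op 2) of register 1, the
basic status register MII_BMSR, on PHY address 0, with data 0x796d.
Bits 2 and 5 of that data are set, which in BMSR mean link up and
autonegotiation complete. The upper half matches what
mii_encode_read(0, 1) produces, so the lower half is plausibly data
returned by the PHY.
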
From what I can see, the OOM events happened on CPU1, CPU1 was
allocated the FEC interrupt, and the PHY polling that suffered the
MDIO timeout was on CPU0.
Given that IEVENT is zero, it seems that CPU1 had serviced the
interrupt, but it is not clear how far through processing it
was - it may be that fec_enet_interrupt() had been delayed by the
OOM condition.
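
The reasoning from the IEVENT and IMASK values can be written down as
a small check. This is only a sketch: FEC_ENET_MII uses the value from
the driver's fec.h (0x00800000), while the enum and the
mii_irq_classify() helper are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* MII interrupt bit in IEVENT/IMASK, value as in the driver's fec.h. */
#define FEC_ENET_MII 0x00800000u

enum mii_irq_state {
    MII_IRQ_MASKED,   /* MII interrupt not enabled in IMASK */
    MII_IRQ_PENDING,  /* event latched, not yet acknowledged */
    MII_IRQ_CLEAR     /* enabled, nothing latched: acked or never raised */
};

/* Classify the MII interrupt state from raw register values. */
enum mii_irq_state mii_irq_classify(uint32_t ievent, uint32_t imask)
{
    if (!(imask & FEC_ENET_MII))
        return MII_IRQ_MASKED;
    if (ievent & FEC_ENET_MII)
        return MII_IRQ_PENDING;
    return MII_IRQ_CLEAR;
}

/* The values from the log: interrupt enabled, nothing pending. */
void mii_irq_logged_example(void)
{
    assert(mii_irq_classify(0x00000000u, 0x0a8000aau) == MII_IRQ_CLEAR);
}
```

With the logged values the MII interrupt is enabled and no event is
latched. Together with the valid-looking MII_DATA, that fits the event
having been acknowledged rather than never raised, but the registers
alone cannot tell whether the completion was signalled late or not at
all.
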
This seems rather fragile - a slowdown caused by OOM makes the
network collapse completely, because phylib takes the PHY offline,
leaving the system inaccessible except through the console.
In my case, even serial console wasn't operational (except for
magic sysrq). Not sure what agetty was playing at... so the only
way I could recover any information from the system was to connect
the HDMI and plug in a USB keyboard.
Any thoughts on how FEC MDIO accesses could be made more robust?
Maybe phylib should retry a number of times - but with read-sensitive
registers, if the read has already completed successfully, and it's
just a problem with the FEC MDIO hardware, that could cause issues.
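
The retry trade-off can be illustrated with a userspace model. This is
a hypothetical sketch, not phylib or FEC driver code: struct
mdio_model, mdio_read_retry(), the flaky_phy fake and the policy
function are all invented here, and treating BMSR as the only
read-sensitive register is an assumption made for the example (real
PHYs differ).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus model: read() returns a 16-bit value or -ETIMEDOUT. */
struct mdio_model {
    int (*read)(void *ctx, int pa, int ra);
    bool (*reread_safe)(int ra);  /* policy: may this register be read twice? */
    void *ctx;
};

/* Retry timed-out reads, but only where a second read is harmless. */
int mdio_read_retry(const struct mdio_model *bus, int pa, int ra, int max_tries)
{
    int ret = -ETIMEDOUT;

    assert(bus && bus->read && bus->reread_safe && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        ret = bus->read(bus->ctx, pa, ra);
        if (ret != -ETIMEDOUT)
            return ret;
        /* The timed-out read may still have reached the PHY, so only
         * reissue it when reading again cannot destroy state. */
        if (!bus->reread_safe(ra))
            break;
    }
    return ret;
}

/* Fake PHY for the example: the first fail_first reads time out. */
struct flaky_phy {
    int fail_first;
    int calls;
    int value;
};

int flaky_read(void *ctx, int pa, int ra)
{
    struct flaky_phy *phy = ctx;

    (void)pa;
    (void)ra;
    phy->calls++;
    return phy->calls <= phy->fail_first ? -ETIMEDOUT : phy->value;
}

/* Example policy (an assumption): BMSR, register 1, has latched bits. */
bool all_but_bmsr_safe(int ra)
{
    return ra != 1;
}
```

Under this example policy the read that failed in the log, a BMSR read,
would not be retried at all, which is exactly the awkward case: the
registers polled most often are the ones where a blind retry is least
obviously safe.
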
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up