From: Alexander Holler <holler@ahsoftware.de>
To: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Michal Simek <michal.simek@xilinx.com>,
	David Miller <davem@davemloft.net>
Subject: Bug(s) with netconsole (using mv643xx_eth on Kirkwood)
Date: Thu, 03 Apr 2014 19:58:03 +0200
Message-ID: <533DA12B.8090904@ahsoftware.de>
In-Reply-To: <533D8518.1010306@ahsoftware.de>

(I've changed the topic and removed stable@ from the cc-list to reflect 
the current status)

(Long mail, but hopefully a good problem description)

I have known about problems with netconsole and mv643xx_eth for
4 years, but didn't care much because everything else worked flawlessly;
I had even forgotten that I had enabled netconsole. (The bug I
experienced 4 years ago, seeing no messages remotely from netconsole,
seems to have disappeared.)

But now, using 3.14, I hit a bug which killed the ethernet with a 100%
success rate, and, after digging a bit, I've come to the conclusion
that netconsole (together with a possibly broken initialization of the
PHY) is the source of the problem.

The kernel is 3.14 (mainline) with one reverted patch (7cd1463). This
patch changed the initialization of the PHY in such a way that the
ethernet dies 100% reproducibly on a Kirkwood 88F6281 based machine.
Reverting that patch gives me a one-line bug-enabler:

------
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index e891b48..246f065 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -2095,7 +2095,8 @@ static void port_start(struct mv643xx_eth_private *mp)
                 struct ethtool_cmd cmd;

                 mv643xx_eth_get_settings(mp->dev, &cmd);
-               phy_reset(mp);
+               //phy_reset(mp);
+               phy_init_hw(mp->phy);
                 mv643xx_eth_set_settings(mp->dev, &cmd);
                 phy_start(mp->phy);
         }
------

First, what happens at boot:

- Bootloader (U-Boot) enables (somehow) the network such that it is
usable as a console for the bootloader,
- Kernel is loaded and started with netconsole enabled through the 
kernel command line (netconsole=...),
- eth driver probe => PHY reset
- netconsole initializes the network (netpoll_setup) => PHY reset,
- userland starts,
- userland configures network (ip addr add fixedIP ..., a hack used for
a very early ntpdate before the rootfs becomes rw); I'm not sure whether
that ends up in another PHY reset,
- userland starts network by using dhcpcd => PHY reset

Now several use cases:

Case 1:
Using plain 3.14 the last step fails with no carrier, because the PHY 
ends up in a never ending reset (BMCR_RESET always set) in 
m88e1111_config_init() called by phy_init_hw() in port_start() in 
mv643xx_eth.

Case 2:
Without enabling netconsole through the kernel command line, I see no 
problems.

Case 3:
If I enable the old phy_reset() in mv643xx_eth, I see no problems.

Case 4:
If I reduce the time the newly used reset in phy_init_hw() takes, from
calling mdelay(500) twice down to a few milliseconds, by polling in
m88e1111_config_init for a cleared BMCR_RESET, I see no problems.

Case 5:
If I disable the initialization of the network in the bootloader, 
netconsole even worked 4 years ago. But I haven't looked into that case 
further, because I always want to use the network as a console for the 
bootloader.


Current assumption:

So, after having spent too much time diagnosing the above (so I was
right to ignore the non-working netconsole for 4 years), I've come to
the conclusion that some synchronization between netconsole/netpoll and
the normal network stack or mv643xx_eth is missing. That would explain
why the PHY ends up in a never-ending reset and why this only happens
reproducibly if the PHY reset needs a whole second because of the two
mdelay(500) calls (during which execution likely switches to
netconsole). It might be a hw problem too (I haven't read the datasheet
or looked for any errata).

I hope everyone who was missing information is happy now; otherwise
I have (again) wasted time typing a problem description (not to mention
the time already spent trying to diagnose the problem).

So go on and try to take the almost low-hanging fruit. I'm not sure if I
will spend more time on this topic, as I already have a working
patch/workaround and the discussion has become a bit tiresome. Sorry.

Regards,

Alexander Holler
