Re: [PATCH net] net: phy: make phy_error() report which PHY has failed - Russell King

netdev.vger.kernel.org archive mirror

From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net] net: phy: make phy_error() report which PHY has failed
Date: Wed, 18 Dec 2019 22:09:08 +0000
Message-ID: <20191218220908.GX25745@shell.armlinux.org.uk>
In-Reply-To: <61f23d43-1c4d-a11e-a798-c938a896ddb3@gmail.com>

On Wed, Dec 18, 2019 at 09:54:32PM +0100, Heiner Kallweit wrote:
> On 18.12.2019 00:34, Russell King - ARM Linux admin wrote:
> > On Tue, Dec 17, 2019 at 10:41:34PM +0100, Heiner Kallweit wrote:
> >> On 17.12.2019 13:53, Russell King wrote:
> >>> phy_error() is called from phy_interrupt() or phy_state_machine(), and
> >>> uses WARN_ON() to print a backtrace. The backtrace is not useful when
> >>> reporting a PHY error.
> >>>
> >>> However, a system may contain multiple ethernet PHYs, and phy_error()
> >>> gives no clue which one caused the problem.
> >>>
> >>> Replace WARN_ON() with a call to phydev_err() so that we can see which
> >>> PHY had an error, and also inform the user that we are halting the PHY.
> >>>
> >>> Fixes: fa7b28c11bbf ("net: phy: print stack trace in phy_error")
> >>> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >>> ---
> >>> There is another related problem in this area. If an error is detected
> >>> while the PHY is running, phy_error() moves to PHY_HALTED state. If we
> >>> try to take the network device down, then:
> >>>
> >>> void phy_stop(struct phy_device *phydev)
> >>> {
> >>>         if (!phy_is_started(phydev)) {
> >>>                 WARN(1, "called from state %s\n",
> >>>                      phy_state_to_str(phydev->state));
> >>>                 return;
> >>>         }
> >>>
> >>> triggers, and we never do any of the phy_stop() cleanup. I'm not sure
> >>> what the best way to solve this is - introducing a PHY_ERROR state may
> >>> be a solution, but I think we want some phy_is_started() sites to
> >>> return true for it and others to return false.
> >>>
> >>> Heiner - you introduced the above warning, could you look at improving
> >>> this case so we don't print a warning and taint the kernel when taking
> >>> a network device down after phy_error() please?
> >>>
> >> I think we need both types of information:
> >> - the affected PHY device
> >> - the stack trace to see where the issue was triggered
> > 
> > Can you please explain why the stack trace is useful.  For the paths
> > that are reachable, all it tells you is whether it was reached via
> > the interrupt or the workqueue.
> > 
> > If it's via the interrupt, the rest of the backtrace beyond that is
> > irrelevant.  If it's the workqueue, the backtrace doesn't go back
> > very far, and doesn't tell you what operation triggered it.
> > 
> > If it's important to see where or why phy_error() was called, there
> > are much better ways of doing that, notably passing a string into
> > phy_error() to describe the actual error itself.  That would convey
> > way more useful information than the backtrace does.
> > 
> > I have been faced with these backtraces, and they have not been at
> > all useful for diagnosing the problem.
> > 
> "The problem" comes in two flavors:
> 1. The problem that caused the PHY error
> 2. The problem caused by the PHY error (if we decide to not
>    always switch to HALTED state)
> 
> We can't do much for case 1, maybe we could add an errno argument
> to phy_error(). To facilitate analyzing case 2 we'd need to change
> code pieces like the following.
> 
> case a:
> err = f1();
> case b:
> err = f2();
> 
> if (err)
> 	phy_error()
> 
> For my understanding: What caused the PHY error in your case(s)?
> Which info would have been useful for analyzing the error?

Errors reading/writing from the PHY.
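
Because these were register read/write failures, each one comes with an
error code. Heiner's idea of an errno argument can be combined with the
patch's idea of naming the PHY instead of printing a backtrace. The
following is a minimal userspace model, not kernel code. `struct
phy_model`, `model_phy_error()`, `last_report` and the `PHY_MODEL_*`
states are hypothetical stand-ins for struct phy_device, phy_error(), the
kernel log and the PHY state enum, and fprintf() stands in for
phydev_err().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's PHY states. */
enum phy_model_state { PHY_MODEL_RUNNING, PHY_MODEL_HALTED };

/* Hypothetical stand-in for struct phy_device: a name and a state. */
struct phy_model {
	const char *name;		/* identifies which PHY failed */
	enum phy_model_state state;
};

/* Last message "logged", kept so the behaviour can be checked. */
static char last_report[128];

/*
 * Model of phy_error() with an errno argument: say which PHY failed
 * and with what error (in place of a WARN_ON() backtrace), then halt it.
 */
static void model_phy_error(struct phy_model *phydev, int err)
{
	assert(phydev != NULL);
	snprintf(last_report, sizeof(last_report),
		 "%s: error %d, halting PHY", phydev->name, err);
	fprintf(stderr, "%s\n", last_report);	/* stands in for phydev_err() */
	phydev->state = PHY_MODEL_HALTED;
}
```

On a system with several PHYs, the message alone identifies the failing
device and the error code, which is the information a backtrace from
WARN_ON() does not provide.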

The problem with a backtrace from phy_error() is that it doesn't tell
you where the error actually occurred; it only tells you where the
error is reported, which is one of two different paths at the moment.
That information can be conveyed far more simply and elegantly by
passing a string into phy_error() that describes the call site, if
that's even relevant.
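
Passing a string that names the reporting call site needs only a small
change to a phy_error()-style function. This is again a hypothetical
userspace sketch: `struct phy_model`, `model_phy_error()`,
`model_interrupt_path()`, `model_state_machine_path()` and `last_report`
are illustrative names, not kernel API. The two callers merely mirror
the phy_interrupt() and phy_state_machine() paths mentioned in the
thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum phy_model_state { PHY_MODEL_RUNNING, PHY_MODEL_HALTED };

/* Hypothetical stand-in for struct phy_device. */
struct phy_model {
	const char *name;
	enum phy_model_state state;
};

static char last_report[128];

/* Hypothetical phy_error() variant: the caller says where it is. */
static void model_phy_error(struct phy_model *phydev, const char *site)
{
	assert(phydev != NULL && site != NULL);
	snprintf(last_report, sizeof(last_report),
		 "%s: error reported from %s, halting PHY", phydev->name, site);
	fprintf(stderr, "%s\n", last_report);
	phydev->state = PHY_MODEL_HALTED;
}

/* The two reporting paths, modelled as plain functions. */
static void model_interrupt_path(struct phy_model *phydev, int err)
{
	if (err < 0)
		model_phy_error(phydev, "interrupt handler");
}

static void model_state_machine_path(struct phy_model *phydev, int err)
{
	if (err < 0)
		model_phy_error(phydev, "state machine");
}
```

A fixed string distinguishes the two paths at no runtime cost, but like
the backtrace it still says nothing about which register access failed.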

I would say, however, that knowing where the error occurred would be
far better information.
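
One way to capture where the error occurred, rather than where it is
reported, is to record the function name and line at each register
access that fails. The sketch below assumes a hypothetical wrapper macro
`MODEL_READ()` around a simulated bus read. None of `struct phy_model`,
`model_bus_read()`, `model_read_at()`, `model_update_link()` or
`model_phy_error()` exist in phylib; they only illustrate the technique,
using the standard C `__func__` and `__LINE__`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Where the first failing register access happened (hypothetical). */
struct phy_err_origin {
	const char *func;
	int line;
	int err;
};

/* Hypothetical stand-in for struct phy_device. */
struct phy_model {
	const char *name;
	int fail_reg;			/* register whose reads fail; -1 for none */
	struct phy_err_origin origin;	/* zeroed until the first failure */
};

static char last_report[128];

/* Simulated MDIO read: a register value, or -5 (-EIO) on failure. */
static int model_bus_read(struct phy_model *phydev, int reg)
{
	return reg == phydev->fail_reg ? -5 : 0x1234;
}

/* Perform the read and, on the first failure only, record the caller. */
static int model_read_at(struct phy_model *phydev, int reg,
			 const char *func, int line)
{
	int ret;

	assert(phydev != NULL);
	ret = model_bus_read(phydev, reg);
	if (ret < 0 && phydev->origin.err == 0) {
		phydev->origin.func = func;
		phydev->origin.line = line;
		phydev->origin.err = ret;
	}
	return ret;
}

/* Expands at each call site, so __func__/__LINE__ name the caller. */
#define MODEL_READ(phydev, reg) \
	model_read_at((phydev), (reg), __func__, __LINE__)

/* A hypothetical link update issuing two reads, either of which may fail. */
static int model_update_link(struct phy_model *phydev)
{
	int ret = MODEL_READ(phydev, 1);

	if (ret < 0)
		return ret;
	return MODEL_READ(phydev, 5);
}

/* The report can now name the failing site, not just the PHY. */
static void model_phy_error(struct phy_model *phydev)
{
	const char *func = phydev->origin.func ? phydev->origin.func : "unknown";

	snprintf(last_report, sizeof(last_report),
		 "%s: error %d at %s:%d, halting PHY", phydev->name,
		 phydev->origin.err, func, phydev->origin.line);
	fprintf(stderr, "%s\n", last_report);
}
```

Recording only the first failure keeps the original cause rather than
any knock-on errors that follow it; whether that is the right policy for
phylib would be a separate design decision.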

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Thread overview: 14+ messages
2019-12-17 12:53 [PATCH net] net: phy: make phy_error() report which PHY has failed Russell King
2019-12-17 21:41 ` Heiner Kallweit
2019-12-17 23:34   ` Russell King - ARM Linux admin
2019-12-18 20:54     ` Heiner Kallweit
2019-12-18 22:09       ` Russell King - ARM Linux admin [this message]
2019-12-19  7:10         ` Heiner Kallweit
2019-12-19 17:06           ` Russell King - ARM Linux admin
2019-12-20 18:46             ` Florian Fainelli
2019-12-20 22:28               ` Heiner Kallweit
2019-12-19 20:50 ` David Miller
2019-12-19 21:05   ` Andrew Lunn
2019-12-19 22:14     ` David Miller
2019-12-19 21:45   ` Russell King - ARM Linux admin
2019-12-20  9:18     ` Andrew Lunn
