netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Lunn <andrew@lunn.ch>
To: Michael Walle <michael@walle.cc>
Cc: Heiner Kallweit <hkallweit1@gmail.com>,
	Russell King <linux@armlinux.org.uk>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	netdev@vger.kernel.org, Xu Liang <lxu@maxlinear.com>
Subject: Re: GPY215 PHY interrupt issue
Date: Thu, 1 Dec 2022 23:06:38 +0100	[thread overview]
Message-ID: <Y4klbgDIuxHXaWrC@lunn.ch> (raw)
In-Reply-To: <158870dd20a5e30cda9f17009aa0c6c8@walle.cc>

On Thu, Dec 01, 2022 at 10:31:19PM +0100, Michael Walle wrote:
> Am 2022-12-01 16:54, schrieb Andrew Lunn:
> > > So, switching the line to GPIO input doesn't help here, which also
> > > means the interrupt line will be stuck the whole time.
> > 
> > Sounds like they totally messed up the design somehow.
> > 
> > Since we are into horrible hack territory.....
> > 
> > I assume you are using the Link state change interrupt? LSTC?
> 
> Yes, but recently I've found it that it also happens with
> the speed change interrupt (during link-up). By pure luck (or
> bad luck really?) I discovered that when I reduce the MDIO
> frequency I get a similar behavior for the interrupt line
> at link-up with the LSPC interrupt. I don't think it has
> something to do with the frequency but with changed timing.

So at a wild guess, there is some sort of race condition between the
2.5MHz ish MDIO clock and the rest of the system which is probably
clocked at 25Mhz.

We have to hope this is limited to just interrupts! Not any MDIO bus
transaction.

> +	/* The PHY might leave the interrupt line asserted even after PHY_ISTAT
> +	* is read. To avoid interrupt storms, delay the interrupt handling as
> +	* long as the PHY drives the interrupt line. An internal bus read will
> +	* stall as long as the interrupt line is asserted, thus just read a
> +	* random register here.
> +	* Because we cannot access the internal bus at all while the interrupt
> +	* is driven by the PHY, there is no way to make the interrupt line
> +	* unstuck (e.g. by changing the pinmux to GPIO input) during that time
> +	* frame. Therefore, polling is the best we can do and won't do any more
> +	* harm.
> +	* It was observed that this bug happens on link state and link speed
> +	* changes on a GPY215B and GYP215C independent of the firmware version
> +	* (which doesn't mean that this list is exhaustive).
> +	*/
> +	if (needs_mdint_wa && (reg & (PHY_IMASK_LSTC | PHY_IMASK_LSPC)))
> +		gpy_mbox_read(phydev, REG_GPIO0_OUT);
> +
>  	phy_trigger_machine(phydev);
> 
>  	return IRQ_HANDLED;

So this delayed exiting the interrupt handler until the line actually
return to normal. And during that time, the interrupt is disabled. And
hence the storm is avoided.

I'm assuming gpy_mbox_read() has a timeout, so if the PHY completely
jams, it does exit? If that happens, maybe call phy_error() to
indicate the PHY is dead. And don't return IRQ_HANDLED, but
IRQ_NONE. I think the IRQ core should notice the storm for an
interrupt which nobody cares about and disable the interrupt.
Probably not much you can do after that, but at least the machine
won't be totally dead.

> It seems like at least these two are :/ So with the code above
> we could avoid the interrupt storm but we can't do anything about
> the blocked interrupt line. I'm unsure if that is acceptable or
> if we'd have to disable interrupts on this PHY altogether and
> fallback to polling.

It probably depends on your system design. If this is the only PHY and
the storm can be avoided, it is probably O.K. If you have other PHYs
sharing the line, and those PHYs are doing time sensitive stuff like
PTP, maybe polling would be better.

Maybe add a kconfig symbol CONFIG_MAXLINEAR_GPHY_BROKEN_INTERRUPTS and
make all the interrupt code conditional on this? Hopefully distros
won't enable it, but the option is there to buy into the behaviour if
you want?

      Andrew

  reply	other threads:[~2022-12-01 22:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-25 14:44 GPY215 PHY interrupt issue Michael Walle
2022-11-25 15:17 ` Andrew Lunn
2022-11-27 19:01   ` Michael Walle
2022-11-28  7:41   ` Michael Walle
2022-11-28 13:30     ` Andrew Lunn
2022-12-01 10:24       ` Michael Walle
2022-12-01 15:54         ` Andrew Lunn
2022-12-01 21:31           ` Michael Walle
2022-12-01 22:06             ` Andrew Lunn [this message]
2022-12-01 23:24               ` Michael Walle
2022-12-02  0:05                 ` Andrew Lunn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y4klbgDIuxHXaWrC@lunn.ch \
    --to=andrew@lunn.ch \
    --cc=hkallweit1@gmail.com \
    --cc=horatiu.vultur@microchip.com \
    --cc=linux@armlinux.org.uk \
    --cc=lxu@maxlinear.com \
    --cc=michael@walle.cc \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).