All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
@ 2023-12-26 14:19 Asmaa Mnebhi
  2023-12-27 22:19 ` Marek Mojík
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Asmaa Mnebhi @ 2023-12-26 14:19 UTC (permalink / raw)
  To: davem, linux, netdev; +Cc: Asmaa Mnebhi, davthompson

Very rarely, the KSZ9031 fails to complete autonegotiation although it was
initiated via phy_start(). As a result, the link stays down. Restarting
autonegotiation when in this state solves the issue.

Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
---
 drivers/net/phy/micrel.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 08e3915001c3..de8140c5907f 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1475,6 +1475,7 @@ static int ksz9031_get_features(struct phy_device *phydev)
 
 static int ksz9031_read_status(struct phy_device *phydev)
 {
+	u8 timeout = 10;
 	int err;
 	int regval;
 
@@ -1494,6 +1495,22 @@ static int ksz9031_read_status(struct phy_device *phydev)
 		return genphy_config_aneg(phydev);
 	}
 
+	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
+	 * Occasionally it fails to complete autonegotiation. The workaround is
+	 * to restart it.
+	 */
+        if (phydev->autoneg == AUTONEG_ENABLE) {
+		while (timeout) {
+			if (phy_aneg_done(phydev))
+				break;
+			mdelay(1000);
+			timeout--;
+		};
+
+		if (timeout == 0)
+			phy_restart_aneg(phydev);
+	}
+
 	return 0;
 }
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2023-12-26 14:19 [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation Asmaa Mnebhi
@ 2023-12-27 22:19 ` Marek Mojík
  2024-01-02 18:45 ` Russell King (Oracle)
  2024-01-19 23:14 ` Florian Fainelli
  2 siblings, 0 replies; 12+ messages in thread
From: Marek Mojík @ 2023-12-27 22:19 UTC (permalink / raw)
  To: Asmaa Mnebhi, davem, linux, netdev; +Cc: davthompson



On 12/26/23 15:19, Asmaa Mnebhi wrote:
> Very rarely, the KSZ9031 fails to complete autonegotiation although it was
> initiated via phy_start(). As a result, the link stays down. Restarting
> autonegotiation when in this state solves the issue.
> 
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---
>   drivers/net/phy/micrel.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 08e3915001c3..de8140c5907f 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1475,6 +1475,7 @@ static int ksz9031_get_features(struct phy_device *phydev)
>   
>   static int ksz9031_read_status(struct phy_device *phydev)
>   {
> +	u8 timeout = 10;
>   	int err;
>   	int regval;
>   
> @@ -1494,6 +1495,22 @@ static int ksz9031_read_status(struct phy_device *phydev)
>   		return genphy_config_aneg(phydev);
>   	}
>   
> +	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
> +	 * Occasionally it fails to complete autonegotiation. The workaround is
> +	 * to restart it.
> +	 */
> +        if (phydev->autoneg == AUTONEG_ENABLE) {
> +		while (timeout) {
> +			if (phy_aneg_done(phydev))
> +				break;
> +			mdelay(1000);
> +			timeout--;
> +		};
> +
> +		if (timeout == 0)
> +			phy_restart_aneg(phydev);
> +	}
> +
>   	return 0;
>   }


Hi Asmaa, mdelay is busy-wait, so you're unnecessarily blocking cpu core
for 10 seconds, msleep should be used here as explained in the docs 
https://kernel.org/doc/Documentation/timers/timers-howto.txt

Marek

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2023-12-26 14:19 [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation Asmaa Mnebhi
  2023-12-27 22:19 ` Marek Mojík
@ 2024-01-02 18:45 ` Russell King (Oracle)
  2024-01-03  1:27   ` Andrew Lunn
  2024-01-19 23:14 ` Florian Fainelli
  2 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2024-01-02 18:45 UTC (permalink / raw)
  To: Asmaa Mnebhi; +Cc: davem, netdev, davthompson

On Tue, Dec 26, 2023 at 09:19:03AM -0500, Asmaa Mnebhi wrote:
> +	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
> +	 * Occasionally it fails to complete autonegotiation. The workaround is
> +	 * to restart it.
> +	 */
> +        if (phydev->autoneg == AUTONEG_ENABLE) {
> +		while (timeout) {
> +			if (phy_aneg_done(phydev))
> +				break;
> +			mdelay(1000);
> +			timeout--;
> +		};

Extra needless ;

Also.. ouch! This means we end up holding phydev->lock for up to ten
seconds, which prevents anything else happening with phylib during
that time. Not sure I can see a good way around that though. Andrew?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-02 18:45 ` Russell King (Oracle)
@ 2024-01-03  1:27   ` Andrew Lunn
  2024-01-19 18:11     ` Asmaa Mnebhi
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2024-01-03  1:27 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: Asmaa Mnebhi, davem, netdev, davthompson

On Tue, Jan 02, 2024 at 06:45:17PM +0000, Russell King (Oracle) wrote:
> On Tue, Dec 26, 2023 at 09:19:03AM -0500, Asmaa Mnebhi wrote:
> > +	/* KSZ9031's autonegotiation takes normally 4-5 seconds to complete.
> > +	 * Occasionally it fails to complete autonegotiation. The workaround is
> > +	 * to restart it.
> > +	 */
> > +        if (phydev->autoneg == AUTONEG_ENABLE) {
> > +		while (timeout) {
> > +			if (phy_aneg_done(phydev))
> > +				break;
> > +			mdelay(1000);
> > +			timeout--;
> > +		};
> 
> Extra needless ;
> 
> Also.. ouch! This means we end up holding phydev->lock for up to ten
> seconds, which prevents anything else happening with phylib during
> that time. Not sure I can see a good way around that though. Andrew?

Is there any status registers which indicate energy detection? No
point doing retries if there is no sign of a link partner.

I would also suggest moving the timeout into a driver private data
structure, and rely on phylib polling the PHY once per second and
restart autoneg from that. That will avoid holding the lock for a long
time.

	Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-03  1:27   ` Andrew Lunn
@ 2024-01-19 18:11     ` Asmaa Mnebhi
  2024-01-19 19:44       ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: Asmaa Mnebhi @ 2024-01-19 18:11 UTC (permalink / raw)
  To: Andrew Lunn, Russell King (Oracle)
  Cc: davem@davemloft.net, netdev@vger.kernel.org, David Thompson

 
> Is there any status registers which indicate energy detection? No point doing
> retries if there is no sign of a link partner.
> 
> I would also suggest moving the timeout into a driver private data structure,
> and rely on phylib polling the PHY once per second and restart autoneg from
> that. That will avoid holding the lock for a long time.
> 
Hi Andrew, 

Thank you for your feedback.

There is no status register indicating energy detection on the KSZ9031 PHY. This issue reproduces during a 2000 reboot test of the BlueField chip. The link partner is always up during the test and provides network to other entities such as servers and BMC.

We use this PHY driver with the mlxbf-gige driver. The PHY irq is used rather than polling in phy_connect_direct. if we don't want to make changes to the micrel.c driver, we could move this logic to mlxbf_gige_open() function right after calling phy_start()? This would ensure that autonegotiation is completed. Please let me know what you think.

Thanks.
Asmaa 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 18:11     ` Asmaa Mnebhi
@ 2024-01-19 19:44       ` Andrew Lunn
  2024-01-19 19:56         ` Andrew Lunn
  2024-01-19 20:18         ` Asmaa Mnebhi
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew Lunn @ 2024-01-19 19:44 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

On Fri, Jan 19, 2024 at 06:11:56PM +0000, Asmaa Mnebhi wrote:
>  
> > Is there any status registers which indicate energy detection? No point doing
> > retries if there is no sign of a link partner.
> > 
> > I would also suggest moving the timeout into a driver private data structure,
> > and rely on phylib polling the PHY once per second and restart autoneg from
> > that. That will avoid holding the lock for a long time.
> > 
> Hi Andrew, 
> 
> Thank you for your feedback.

Lets try to figure out some more about the situation when it fails to
link up.

What is the value of BMSR when it fails to report complete? You say
you are using interrupts, so i just want to make sure its not an
interrupt problem, you are using edge interrupts instead of level,
etc.  Maybe i'm remembering wrong, but i though i made a comment about
this once when reviewing one of your drivers. What about the contents
of registers 0x1b and 0x1f?

Thanks
   Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 19:44       ` Andrew Lunn
@ 2024-01-19 19:56         ` Andrew Lunn
  2024-01-19 20:18         ` Asmaa Mnebhi
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2024-01-19 19:56 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

> What is the value of BMSR when it fails to report complete? You say
> you are using interrupts, so i just want to make sure its not an
> interrupt problem, you are using edge interrupts instead of level,
> etc.

Please could you check that /proc/interrupts do show level interrupts
are being used.

       Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 19:44       ` Andrew Lunn
  2024-01-19 19:56         ` Andrew Lunn
@ 2024-01-19 20:18         ` Asmaa Mnebhi
  2024-01-19 22:38           ` Andrew Lunn
  1 sibling, 1 reply; 12+ messages in thread
From: Asmaa Mnebhi @ 2024-01-19 20:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

> > >
> > Hi Andrew,
> >
> > Thank you for your feedback.
> 
> Lets try to figure out some more about the situation when it fails to link up.
> 
> What is the value of BMSR when it fails to report complete? You say you are
> using interrupts, so i just want to make sure its not an interrupt problem, you
> are using edge interrupts instead of level, etc.  Maybe i'm remembering wrong,
> but i though i made a comment about this once when reviewing one of your
> drivers. What about the contents of registers 0x1b and 0x1f?
> 
Yes I dumped all PHY registers and didn't see anything suspicious besides the autonegotiation not completing:
root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x0
0x1140

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x1
0x7949 //aneg didnt complete and link failed

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x9
0x0200 // correct advertisement from PHY

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0xA
0000 // nothing detected on link partner 

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0xF
0x3000 // correct ability advertised

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x15
0000 // no errors

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x1B
0x0500 // no pending interrupts

root@localhost:~/phytool# ./phytool/phytool read oob_net0/0x3/0x1F
0x0300 


I also added the following debug prints. Please see comment next to them if they were printed or not during the reproduction.

--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
static int ksz9031_read_status(struct phy_device *phydev)
        int regval; 
        err = genphy_read_status(phydev);
+        if (err) {
+            printk("ksz9031 genphy_read_status failed"); //not printed
....
          regval = phy_read(phydev, MII_STAT1000);
+        if ((regval & 0xFF) == 0xFF) {
+          printk("ksz9031 err"); //not printed


--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
      void phy_state_machine(struct work_struct *work)
        if (needs_aneg) {
+        printk("needs_aneg"); //printed
          err = phy_start_aneg(phydev);


--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1718,6 +1718,8 @@ int __genphy_config_aneg(struct phy_device *phydev, bool changed)
                        changed = true; /* do restart aneg */
        }
 
+       printk("restart = %d", restart); // prints "restart = 1" so autonegotiation is restarted by the line below.

        /* Only restart aneg if we are advertising something different
         * than we were before.
   

The above prints proved that the micrel PHY started autonegotiation but the result is that it failed to complete it. I also noticed that the KSZ9031 PHY takes ~5 full seconds to complete aneg which is much longer than other PHYs like VSC8221 (which we use with BlueField-3 systems).

Regarding this comment in your next email:

"Please could you check that /proc/interrupts do show level interrupts are being used."
I don't think the problem is the interrupt. The interrupt for link up is issued only when autonegotiation is completed. If autonegotiation is not completed the link just stays down as indicated in PHY register 1, and no interrupt is issued.

Thanks.
Asmaa



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 20:18         ` Asmaa Mnebhi
@ 2024-01-19 22:38           ` Andrew Lunn
  2024-01-19 22:57             ` Asmaa Mnebhi
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2024-01-19 22:38 UTC (permalink / raw)
  To: Asmaa Mnebhi
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

> The above prints proved that the micrel PHY started autonegotiation
> but the result is that it failed to complete it. I also noticed that
> the KSZ9031 PHY takes ~5 full seconds to complete aneg which is much
> longer than other PHYs like VSC8221 (which we use with BlueField-3
> systems).

What is the link partner? From the datasheet

MMD Address 1h, Register 5Ah – 1000BASE-T Link-Up Time Control

When the link partner is another KSZ9031 device,
the 1000BASE-T link-up time can be long. These
three bits provide an optional setting to reduce the
1000BASE-T link-up time.
100 = Default power-up setting
011 = Optional setting to reduce link-up time when
the link partner is a KSZ9031 device.

Might be worth setting it and see what happens.

Have you tried playing with the prefer master/prefer slave options? If
you have identical PHYs on each end, it could be they are generating
the same 'random' number used to determine who should be master and
who should be slave. If they both pick the same number, they are
supposed to pick a different random number and try again. There have
been some PHYs which are broken in this respect. prefer master/prefer
slave should influence the random number, biasing it higher/lower.

auto-neg should typically take a little over 1 second. 5 seconds is
way too long, something is not correct. You might want to sniff the
fast link pulses, try to decode the values and see what is going on.

I would not be surprised if you find out this 5 second complete time
is somehow related to it not completing at all.

	Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 22:38           ` Andrew Lunn
@ 2024-01-19 22:57             ` Asmaa Mnebhi
  2024-01-24 22:18               ` Asmaa Mnebhi
  0 siblings, 1 reply; 12+ messages in thread
From: Asmaa Mnebhi @ 2024-01-19 22:57 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

> 
> > The above prints proved that the micrel PHY started autonegotiation
> > but the result is that it failed to complete it. I also noticed that
> > the KSZ9031 PHY takes ~5 full seconds to complete aneg which is much
> > longer than other PHYs like VSC8221 (which we use with BlueField-3
> > systems).
> 
> What is the link partner? From the datasheet
> 
> MMD Address 1h, Register 5Ah – 1000BASE-T Link-Up Time Control
> 
> When the link partner is another KSZ9031 device, the 1000BASE-T link-up time
> can be long. These three bits provide an optional setting to reduce the
> 1000BASE-T link-up time.
> 100 = Default power-up setting
> 011 = Optional setting to reduce link-up time when the link partner is a KSZ9031
> device.
> 
> Might be worth setting it and see what happens.
> 
> Have you tried playing with the prefer master/prefer slave options? If you have
> identical PHYs on each end, it could be they are generating the same 'random'
> number used to determine who should be master and who should be slave. If
> they both pick the same number, they are supposed to pick a different random
> number and try again. There have been some PHYs which are broken in this
> respect. prefer master/prefer slave should influence the random number,
> biasing it higher/lower.
> 
> auto-neg should typically take a little over 1 second. 5 seconds is way too long,
> something is not correct. You might want to sniff the fast link pulses, try to
> decode the values and see what is going on.
> 
> I would not be surprised if you find out this 5 second complete time is somehow
> related to it not completing at all.
>

The link partner is a switch (KSZ9893R) so I am not sure setting the 5Ah register to 011 would help. 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2023-12-26 14:19 [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation Asmaa Mnebhi
  2023-12-27 22:19 ` Marek Mojík
  2024-01-02 18:45 ` Russell King (Oracle)
@ 2024-01-19 23:14 ` Florian Fainelli
  2 siblings, 0 replies; 12+ messages in thread
From: Florian Fainelli @ 2024-01-19 23:14 UTC (permalink / raw)
  To: Asmaa Mnebhi, davem, linux, netdev; +Cc: davthompson

On 12/26/23 06:19, Asmaa Mnebhi wrote:
> Very rarely, the KSZ9031 fails to complete autonegotiation although it was
> initiated via phy_start(). As a result, the link stays down. Restarting
> autonegotiation when in this state solves the issue.
> 
> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>

We used to have a link_timeout as well as the PHY_HAS_MAGICANEG logic in 
the PHY library a long time ago:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=00db8189d984d6c51226dafbbe4a667ce9b7d5da

maybe you can schedule a work queue in case you are using interrupts to 
re-check the link status periodically?
-- 
Florian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation
  2024-01-19 22:57             ` Asmaa Mnebhi
@ 2024-01-24 22:18               ` Asmaa Mnebhi
  0 siblings, 0 replies; 12+ messages in thread
From: Asmaa Mnebhi @ 2024-01-24 22:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle), davem@davemloft.net,
	netdev@vger.kernel.org, David Thompson

> > What is the link partner? From the datasheet
> >
> > MMD Address 1h, Register 5Ah – 1000BASE-T Link-Up Time Control
> >
> > When the link partner is another KSZ9031 device, the 1000BASE-T
> > link-up time can be long. These three bits provide an optional setting
> > to reduce the 1000BASE-T link-up time.
> > 100 = Default power-up setting
> > 011 = Optional setting to reduce link-up time when the link partner is
> > a KSZ9031 device.
> >
> > Might be worth setting it and see what happens.
> >
> > Have you tried playing with the prefer master/prefer slave options? If
> > you have identical PHYs on each end, it could be they are generating the same
> 'random'
> > number used to determine who should be master and who should be slave.
> > If they both pick the same number, they are supposed to pick a
> > different random number and try again. There have been some PHYs which
> > are broken in this respect. prefer master/prefer slave should
> > influence the random number, biasing it higher/lower.
> >
> > auto-neg should typically take a little over 1 second. 5 seconds is
> > way too long, something is not correct. You might want to sniff the
> > fast link pulses, try to decode the values and see what is going on.
> >
> > I would not be surprised if you find out this 5 second complete time
> > is somehow related to it not completing at all.
> >
> 
> The link partner is a switch (KSZ9893R) so I am not sure setting the 5Ah register
> to 011 would help.

I set the 5Ah register to 011 and that didn’t help. I also am consulting the vendor and the hardware team regarding why autonegotiation takes so long with the KSZ9031. Will report back when they get back to me. 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-01-24 22:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-26 14:19 [PATCH v1 1/1] net: phy: micrel: Add workaround for incomplete autonegotiation Asmaa Mnebhi
2023-12-27 22:19 ` Marek Mojík
2024-01-02 18:45 ` Russell King (Oracle)
2024-01-03  1:27   ` Andrew Lunn
2024-01-19 18:11     ` Asmaa Mnebhi
2024-01-19 19:44       ` Andrew Lunn
2024-01-19 19:56         ` Andrew Lunn
2024-01-19 20:18         ` Asmaa Mnebhi
2024-01-19 22:38           ` Andrew Lunn
2024-01-19 22:57             ` Asmaa Mnebhi
2024-01-24 22:18               ` Asmaa Mnebhi
2024-01-19 23:14 ` Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.