* [PATCH v2 2/2] e1000e: fix link fluctuations problem
@ 2024-05-03 10:18 Ricky Wu
2024-05-07 12:31 ` Andrew Lunn
0 siblings, 1 reply; 9+ messages in thread
From: Ricky Wu @ 2024-05-03 10:18 UTC (permalink / raw)
To: jesse.brandeburg
Cc: anthony.l.nguyen, intel-wired-lan, davem, edumazet, kuba, pabeni,
netdev, linux-kernel, rickywu0421, en-wei.wu
As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642,
Intel I219-LM reports link up -> link down -> link up after hot-plugging
the Ethernet cable.
The problem is because the unstable behavior of Link Status bit in
PHY Status Register of some e1000e NIC. When we re-plug the cable,
the e1000e_phy_has_link_generic() (called after the Link-Status-Changed
interrupt) has read this bit with 1->0->1 (1=link up, 0=link down)
and e1000e reports it to net device layer respectively.
This patch solves the problem by passing polling delays on
e1000e_phy_has_link_generic() so that it will not get the unstable
states of Link Status bit.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218642
Fixes: 7d3cabbcc86 ("e1000e: disable K1 at 1000Mbps for 82577/82578")
Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
---
In v2:
* Split the sleep codes part into PATCHSET [1/2]
---
drivers/net/ethernet/intel/e1000e/ich8lan.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index f9e94be36e97..68f5698a22b0 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1428,7 +1428,17 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
* link. If so, then we want to get the current speed/duplex
* of the PHY.
*/
- ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
+ /* We've seen that I219-LM sometimes has link fluctuations
+ * (link up -> link down -> link up) after hot-plugging the cable.
+ * The problem is caused by the instability of the Link Status bit
+ * (BMSR_LSTATUS) in MII Status Register. The average time between
+ * the first link up and link down is between 3~4 ms.
+ * Increasing the iteration times and setting up the delay to
+ * 100ms (which is safe) solves the problem.
+ * This behavior hasn't been seen on other NICs and also not being
+ * documented in datasheet/errata.
+ */
+ ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, 100000, &link);
if (ret_val)
goto out;
--
2.40.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-03 10:18 [PATCH v2 2/2] e1000e: fix link fluctuations problem Ricky Wu @ 2024-05-07 12:31 ` Andrew Lunn 2024-05-08 5:05 ` [Intel-wired-lan] " Sasha Neftin 0 siblings, 1 reply; 9+ messages in thread From: Andrew Lunn @ 2024-05-07 12:31 UTC (permalink / raw) To: Ricky Wu Cc: jesse.brandeburg, anthony.l.nguyen, intel-wired-lan, davem, edumazet, kuba, pabeni, netdev, linux-kernel, rickywu0421 On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: > As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, > Intel I219-LM reports link up -> link down -> link up after hot-plugging > the Ethernet cable. Please could you quote some parts of 802.3 which state this is a problem. How is this breaking the standard. Andrew ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-07 12:31 ` Andrew Lunn @ 2024-05-08 5:05 ` Sasha Neftin 2024-05-09 9:13 ` Ruinskiy, Dima 0 siblings, 1 reply; 9+ messages in thread From: Sasha Neftin @ 2024-05-08 5:05 UTC (permalink / raw) To: Andrew Lunn, Ricky Wu Cc: netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Ruinskiy, Dima, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Nguyen, Anthony L, Keller, Jacob E On 07/05/2024 15:31, Andrew Lunn wrote: > On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: >> As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, >> Intel I219-LM reports link up -> link down -> link up after hot-plugging >> the Ethernet cable. > > Please could you quote some parts of 802.3 which state this is a > problem. How is this breaking the standard. > > Andrew In I219-* parts used LSI PHY. This PHY is compliant with the 802.3 IEEE standard if I recall correctly. Auto-negotiation and link establishment are processed following the IEEE standard and could vary from platform to platform but are not violent to the IEEE standard. En-Wei, My recommendation is not to accept these patches. If you think there is a HW/PHY problem - open a ticket on Intel PAE. Sasha ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-08 5:05 ` [Intel-wired-lan] " Sasha Neftin @ 2024-05-09 9:13 ` Ruinskiy, Dima 2024-05-09 13:46 ` Andrew Lunn 0 siblings, 1 reply; 9+ messages in thread From: Ruinskiy, Dima @ 2024-05-09 9:13 UTC (permalink / raw) To: Sasha Neftin, Andrew Lunn, Ricky Wu Cc: netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E On 08/05/2024 8:05, Sasha Neftin wrote: > On 07/05/2024 15:31, Andrew Lunn wrote: >> On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: >>> As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, >>> Intel I219-LM reports link up -> link down -> link up after hot-plugging >>> the Ethernet cable. >> >> Please could you quote some parts of 802.3 which state this is a >> problem. How is this breaking the standard. >> >> Andrew > > In I219-* parts used LSI PHY. This PHY is compliant with the 802.3 IEEE > standard if I recall correctly. Auto-negotiation and link establishment > are processed following the IEEE standard and could vary from platform > to platform but are not violent to the IEEE standard. > > En-Wei, My recommendation is not to accept these patches. If you think > there is a HW/PHY problem - open a ticket on Intel PAE. > > Sasha I concur. I am wary of changing the behavior of some driver fundamentals, to satisfy a particular validation/certification flow, if there is no real functionality problem. It can open a big Pandora box. Checking the Bugzilla report again, I am not sure we understand the issue fully: [ 143.141006] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None [ 143.144878] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down [ 146.838980] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None This looks like a very quick link "flap", following by proper link establishment ~3.7 seconds later. These ~3.7 seconds are in line of what link auto-negotiation would take (auto-negotiation is the default mode for this driver). The first print (1000 Mbps Half Duplex) actually makes no sense - it cannot be real link status since 1000/Half is not a supported speed. So it seems to me that actually the first "link up" is an incorrect/incomplete/premature reading, not the "link down". --Dima ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-09 9:13 ` Ruinskiy, Dima @ 2024-05-09 13:46 ` Andrew Lunn 2024-05-09 17:40 ` En-Wei WU 2024-05-10 9:55 ` Sasha Neftin 0 siblings, 2 replies; 9+ messages in thread From: Andrew Lunn @ 2024-05-09 13:46 UTC (permalink / raw) To: Ruinskiy, Dima Cc: Sasha Neftin, Ricky Wu, netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E On Thu, May 09, 2024 at 12:13:27PM +0300, Ruinskiy, Dima wrote: > On 08/05/2024 8:05, Sasha Neftin wrote: > > On 07/05/2024 15:31, Andrew Lunn wrote: > > > On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: > > > > As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, > > > > Intel I219-LM reports link up -> link down -> link up after hot-plugging > > > > the Ethernet cable. > > > > > > Please could you quote some parts of 802.3 which state this is a > > > problem. How is this breaking the standard. > > > > > > Andrew > > > > In I219-* parts used LSI PHY. This PHY is compliant with the 802.3 IEEE > > standard if I recall correctly. Auto-negotiation and link establishment > > are processed following the IEEE standard and could vary from platform > > to platform but are not violent to the IEEE standard. > > > > En-Wei, My recommendation is not to accept these patches. If you think > > there is a HW/PHY problem - open a ticket on Intel PAE. > > > > Sasha > > I concur. I am wary of changing the behavior of some driver fundamentals, to > satisfy a particular validation/certification flow, if there is no real > functionality problem. It can open a big Pandora box. > > Checking the Bugzilla report again, I am not sure we understand the issue > fully: > > [ 143.141006] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half > Duplex, Flow Control: None > [ 143.144878] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down > [ 146.838980] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full > Duplex, Flow Control: None > > This looks like a very quick link "flap", following by proper link > establishment ~3.7 seconds later. These ~3.7 seconds are in line of what > link auto-negotiation would take (auto-negotiation is the default mode for > this driver). That actually seems slow. It is normally a little over 1 second. I forget the exact number. But is the PHY being polled once a second, rather than being interrupt driven? > The first print (1000 Mbps Half Duplex) actually makes no > sense - it cannot be real link status since 1000/Half is not a supported > speed. It would be interesting to see what the link partner sees. What does it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? But why would auto-neg resolve to that if 1000BaseT_Full is available? > So it seems to me that actually the first "link up" is an > incorrect/incomplete/premature reading, not the "link down". Agreed. Root cause this, which looks like a real problem, rather than apply a band-aid for a test system. Andrew ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-09 13:46 ` Andrew Lunn @ 2024-05-09 17:40 ` En-Wei WU 2024-05-10 9:55 ` Sasha Neftin 1 sibling, 0 replies; 9+ messages in thread From: En-Wei WU @ 2024-05-09 17:40 UTC (permalink / raw) To: Andrew Lunn Cc: Ruinskiy, Dima, Sasha Neftin, netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E > En-Wei, My recommendation is not to accept these patches. If you think > there is a HW/PHY problem - open a ticket on Intel PAE. > I concur. I am wary of changing the behavior of some driver > fundamentals, to satisfy a particular validation/certification flow, if > there is no real functionality problem. It can open a big Pandora box. OK. Thanks for your help. I think we can end this patchset now. > It is normally a little over 1 second. I > forget the exact number. But is the PHY being polled once a second, > rather than being interrupt driven? If I read the code correctly, the PHY is polled every 2 seconds by the e1000e watchdog. But if an interrupt occurs and it's a link-status-change interrupt, the watchdog will be called immediately and the PHY is polled. > What does it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? > But why would auto-neg resolve to that if 1000BaseT_Full is available? I'm also interested in it. I'll do some checking later to see what's advertising by us and the link partner. > Agreed. Root cause this, which looks like a real problem, rather than > apply a band-aid for a test system. OK. I think there is a clue which is related to auto-negotiation. I'll work on it later. Thank all of you for your help, I really appreciate it. On Thu, 9 May 2024 at 15:46, Andrew Lunn <andrew@lunn.ch> wrote: > > On Thu, May 09, 2024 at 12:13:27PM +0300, Ruinskiy, Dima wrote: > > On 08/05/2024 8:05, Sasha Neftin wrote: > > > On 07/05/2024 15:31, Andrew Lunn wrote: > > > > On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: > > > > > As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, > > > > > Intel I219-LM reports link up -> link down -> link up after hot-plugging > > > > > the Ethernet cable. > > > > > > > > Please could you quote some parts of 802.3 which state this is a > > > > problem. How is this breaking the standard. > > > > > > > > Andrew > > > > > > In I219-* parts used LSI PHY. This PHY is compliant with the 802.3 IEEE > > > standard if I recall correctly. Auto-negotiation and link establishment > > > are processed following the IEEE standard and could vary from platform > > > to platform but are not violent to the IEEE standard. > > > > > > En-Wei, My recommendation is not to accept these patches. If you think > > > there is a HW/PHY problem - open a ticket on Intel PAE. > > > > > > Sasha > > > > I concur. I am wary of changing the behavior of some driver fundamentals, to > > satisfy a particular validation/certification flow, if there is no real > > functionality problem. It can open a big Pandora box. > > > > Checking the Bugzilla report again, I am not sure we understand the issue > > fully: > > > > [ 143.141006] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half > > Duplex, Flow Control: None > > [ 143.144878] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down > > [ 146.838980] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full > > Duplex, Flow Control: None > > > > This looks like a very quick link "flap", following by proper link > > establishment ~3.7 seconds later. These ~3.7 seconds are in line of what > > link auto-negotiation would take (auto-negotiation is the default mode for > > this driver). > > That actually seems slow. It is normally a little over 1 second. I > forget the exact number. But is the PHY being polled once a second, > rather than being interrupt driven? > > > The first print (1000 Mbps Half Duplex) actually makes no > > sense - it cannot be real link status since 1000/Half is not a supported > > speed. > > It would be interesting to see what the link partner sees. What does > it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? > But why would auto-neg resolve to that if 1000BaseT_Full is available? > > > So it seems to me that actually the first "link up" is an > > incorrect/incomplete/premature reading, not the "link down". > > Agreed. Root cause this, which looks like a real problem, rather than > apply a band-aid for a test system. > > Andrew ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-09 13:46 ` Andrew Lunn 2024-05-09 17:40 ` En-Wei WU @ 2024-05-10 9:55 ` Sasha Neftin 2024-05-10 12:32 ` Andrew Lunn 1 sibling, 1 reply; 9+ messages in thread From: Sasha Neftin @ 2024-05-10 9:55 UTC (permalink / raw) To: Andrew Lunn, Ruinskiy, Dima Cc: Ricky Wu, netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E On 09/05/2024 16:46, Andrew Lunn wrote: > On Thu, May 09, 2024 at 12:13:27PM +0300, Ruinskiy, Dima wrote: >> On 08/05/2024 8:05, Sasha Neftin wrote: >>> On 07/05/2024 15:31, Andrew Lunn wrote: >>>> On Fri, May 03, 2024 at 06:18:36PM +0800, Ricky Wu wrote: >>>>> As described in https://bugzilla.kernel.org/show_bug.cgi?id=218642, >>>>> Intel I219-LM reports link up -> link down -> link up after hot-plugging >>>>> the Ethernet cable. >>>> >>>> Please could you quote some parts of 802.3 which state this is a >>>> problem. How is this breaking the standard. >>>> >>>> Andrew >>> >>> In I219-* parts used LSI PHY. This PHY is compliant with the 802.3 IEEE >>> standard if I recall correctly. Auto-negotiation and link establishment >>> are processed following the IEEE standard and could vary from platform >>> to platform but are not violent to the IEEE standard. >>> >>> En-Wei, My recommendation is not to accept these patches. If you think >>> there is a HW/PHY problem - open a ticket on Intel PAE. >>> >>> Sasha >> >> I concur. I am wary of changing the behavior of some driver fundamentals, to >> satisfy a particular validation/certification flow, if there is no real >> functionality problem. It can open a big Pandora box. >> >> Checking the Bugzilla report again, I am not sure we understand the issue >> fully: >> >> [ 143.141006] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half >> Duplex, Flow Control: None >> [ 143.144878] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down >> [ 146.838980] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full >> Duplex, Flow Control: None >> >> This looks like a very quick link "flap", following by proper link >> establishment ~3.7 seconds later. These ~3.7 seconds are in line of what >> link auto-negotiation would take (auto-negotiation is the default mode for >> this driver). > > That actually seems slow. It is normally a little over 1 second. I > forget the exact number. But is the PHY being polled once a second, > rather than being interrupt driven? > >> The first print (1000 Mbps Half Duplex) actually makes no >> sense - it cannot be real link status since 1000/Half is not a supported >> speed. > > It would be interesting to see what the link partner sees. What does > it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? i219 parts come with LSI PHY. 1000BASE-T half-duplex is not supported. 1000BASET half-duplex not advertised in IEEE 1000BASE-T Control Register 9. > But why would auto-neg resolve to that if 1000BaseT_Full is available? > >> So it seems to me that actually the first "link up" is an >> incorrect/incomplete/premature reading, not the "link down". > > Agreed. Root cause this, which looks like a real problem, rather than > apply a band-aid for a test system. > > Andrew ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-10 9:55 ` Sasha Neftin @ 2024-05-10 12:32 ` Andrew Lunn 2024-05-13 6:04 ` Sasha Neftin 0 siblings, 1 reply; 9+ messages in thread From: Andrew Lunn @ 2024-05-10 12:32 UTC (permalink / raw) To: Sasha Neftin Cc: Ruinskiy, Dima, Ricky Wu, netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E > > It would be interesting to see what the link partner sees. What does > > it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? > > i219 parts come with LSI PHY. 1000BASE-T half-duplex is not supported. > 1000BASET half-duplex not advertised in IEEE 1000BASE-T Control Register 9. That is the theory. But in practice? What does the link partner really see? I've come across systems which get advertisement wrong. However, in that case, i suspect it is the software above the PHY, not the PHY itself which was wrong. Andrew ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-wired-lan] [PATCH v2 2/2] e1000e: fix link fluctuations problem 2024-05-10 12:32 ` Andrew Lunn @ 2024-05-13 6:04 ` Sasha Neftin 0 siblings, 0 replies; 9+ messages in thread From: Sasha Neftin @ 2024-05-13 6:04 UTC (permalink / raw) To: Andrew Lunn Cc: Ruinskiy, Dima, Ricky Wu, netdev, rickywu0421, linux-kernel, edumazet, intel-wired-lan, kuba, anthony.l.nguyen, pabeni, davem, Lifshits, Vitaly, naamax.meir, Avivi, Amir, Keller, Jacob E, Lifshits, Vitaly On 10/05/2024 15:32, Andrew Lunn wrote: >>> It would be interesting to see what the link partner sees. What does >>> it think the I219-LM is advertising? Is it advertising 1000BaseT_Half? >> >> i219 parts come with LSI PHY. 1000BASE-T half-duplex is not supported. >> 1000BASET half-duplex not advertised in IEEE 1000BASE-T Control Register 9. > > That is the theory. But in practice? What does the link partner really > see? I've come across systems which get advertisement wrong. However, > in that case, i suspect it is the software above the PHY, not the PHY > itself which was wrong. > > Andrew Not only in the theory. On the link partner IEEE 1000BASE-T Status Register - Address 10, Link Partner 1000BASE-T Half-Duplex Capability is 0. (1000BAS-T HD not advertised). I believe here is some false positive indication during the auto-negotiation process.(this is not related to the link fluctuation) Sasha ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-05-13 6:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-03 10:18 [PATCH v2 2/2] e1000e: fix link fluctuations problem Ricky Wu 2024-05-07 12:31 ` Andrew Lunn 2024-05-08 5:05 ` [Intel-wired-lan] " Sasha Neftin 2024-05-09 9:13 ` Ruinskiy, Dima 2024-05-09 13:46 ` Andrew Lunn 2024-05-09 17:40 ` En-Wei WU 2024-05-10 9:55 ` Sasha Neftin 2024-05-10 12:32 ` Andrew Lunn 2024-05-13 6:04 ` Sasha Neftin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).