* [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-02 23:34 UTC
To: Linus Torvalds, Jeff Garzik; +Cc: netdev, linux-kernel

Turn flow control off for sky2. When flow control is on, the transmitter
may get randomly stuck. Perhaps there is a hardware problem, but until
Marvell provides errata information for a workaround, it should default to off.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
---
 drivers/net/sky2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 822dd0b..a31dea5 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -3263,7 +3263,7 @@ #endif
 
 	/* Auto speed and flow control */
 	sky2->autoneg = AUTONEG_ENABLE;
-	sky2->flow_mode = FC_BOTH;
+	sky2->flow_mode = FC_NONE;
 	sky2->duplex = -1;
 	sky2->speed = -1;
 
-- 
1.4.1
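The patch changes only the mode the driver requests. As a rough illustration of why FC_NONE keeps pause off on the link no matter what the switch advertises, here is a minimal C sketch of IEEE 802.3 full-duplex pause negotiation. The enum is a hypothetical stand-in for the driver's FC_* values, and the two masks correspond to the PAUSE and ASM_DIR advertisement bits (ADVERTISE_PAUSE_CAP and ADVERTISE_PAUSE_ASYM in <linux/mii.h>). The truth tables follow Linux's mii_advertise_flowctrl() and mii_resolve_flowctrl_fdx() helpers; this is not necessarily the exact code path sky2 uses.

```c
/*
 * Sketch: requested flow-control mode -> pause advertisement bits, and
 * resolution against the link partner. Names here are hypothetical.
 */
#include <stdint.h>

enum fc_mode { FC_NONE = 0, FC_TX = 1, FC_RX = 2, FC_BOTH = 3 };

#define ADV_PAUSE_CAP  0x0400u  /* symmetric PAUSE capability */
#define ADV_PAUSE_ASYM 0x0800u  /* asymmetric pause direction (ASM_DIR) */

/* Advertisement bits for a requested mode. */
uint16_t fc_advertise(enum fc_mode want)
{
	uint16_t adv = 0;

	if (want & FC_RX)
		adv = ADV_PAUSE_CAP | ADV_PAUSE_ASYM;
	if (want & FC_TX)
		adv ^= ADV_PAUSE_ASYM;  /* TX only -> ASYM; TX+RX -> CAP */
	return adv;
}

/* Resolved full-duplex mode from local and remote advertisements. */
enum fc_mode fc_resolve(uint16_t local, uint16_t remote)
{
	if (local & remote & ADV_PAUSE_CAP)
		return FC_BOTH;
	if (local & remote & ADV_PAUSE_ASYM) {
		if (local & ADV_PAUSE_CAP)
			return FC_RX;   /* we honor PAUSE; partner may send it */
		if (remote & ADV_PAUSE_CAP)
			return FC_TX;   /* we may send PAUSE; partner honors it */
	}
	return FC_NONE;
}
```

Because fc_advertise(FC_NONE) is 0, fc_resolve() yields FC_NONE for any partner, so the port neither sends nor acts on PAUSE frames.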
* Re: [PATCH] sky2: flow control off
From: Willy Tarreau @ 2007-02-03 22:32 UTC
To: Stephen Hemminger; +Cc: Linus Torvalds, Jeff Garzik, netdev, linux-kernel

Hi Stephen,

On Fri, Feb 02, 2007 at 03:34:25PM -0800, Stephen Hemminger wrote:
> Turn flow control off for sky2. When flow control is on, the transmitter
> may get randomly stuck. Perhaps there is a hardware problem, but until
> Marvell provides errata information for a workaround, it should default to off.

Are you aware of any way to reproduce this problem, or at least to increase
the chances of it occurring? It looks very much like a problem I've been
experiencing with Marvell's driver and an 88E8053 chip on 2.4. Running
"ethtool -r" proved useful for resetting the transceiver in production.

When trying to reproduce the problem, I noticed corrupted and duplicated
frames in tcpdump captures when packets between 147 and 496 bytes were
received and then re-routed via the same NIC towards a 100 Mbps machine on
the same switch, a gigabit D-Link doing flow control. Yes, I know, there
are a lot of variables! But changing either the target machine or the
packet sizes was enough to stop the corruption. I could not get the
transmitter stuck as sometimes happens in production, but I think that
both problems are related.

The fact that a comparable problem is encountered with your driver really
makes me think of a hardware bug :-/

BTW, after switching to your driver (sky2-1.5), I could no longer reproduce
the corruption problem, but I have not yet put it in production to check
whether it fixes the transceiver problem.

Anyway, it might be worth having users who encounter this problem try an
old version of the driver, just in case they notice a difference.

regards,
Willy
* Re: [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-05 17:22 UTC
To: Willy Tarreau; +Cc: netdev

Here is what I saw.

The transmitter on the Marvell Yukon II (88e8053) hangs when doing transmit flow
control under load. There appears to be a bug or race condition that
causes the MAC to stop transmitting data.

There are two drivers for the Yukon II device on Linux. SysKonnect/Marvell
has one called sk98lin, which is downloadable from syskonnect.de, and I wrote
one called sky2 that is part of the standard Linux kernel. This problem
is reproducible with the sky2 driver only; the sk98lin driver has a watchdog
routine that resets the hardware periodically, so it masks the problem.

The failure occurs only after several minutes of sustained activity, in a
situation where PAUSE frames would be received. In my testing I used

    server == 1000mbit ===> switch --- 100mbit ---> client

The server was a Mac Mini (88E8053) running Linux 2.6.20-rc7 and the client
was a Sony Vaio (88e8036) laptop. The server was running the in-kernel NFS
server and the client was doing a large copy. The server was using UDP to
provoke large numbers of 802.3x PAUSE frames. The problem is not as
reproducible with TCP tests because TCP congestion control avoids
overrunning the switch.

When the failure occurs:
 * packets continue to be received and passed up the stack

 * the GMAC status register indicates the pause state
 * transmit packets continue to be transferred by DMA into the RAM buffer
 * when the RAM buffer fills, no more packets are DMA'd
 * when the transmit queue in the driver fills, it gets a watchdog timeout

 * the switch appears to get confused and other ports hang as well.

During development of the sky2 driver a similar problem was observed on
receive if the receive DMA buffer was not 8-byte aligned. For performance
reasons, Linux drivers usually offset the Rx buffer by 2 bytes so that
the TCP/IP headers are aligned for faster CPU access. If the sky2 Rx
buffer was offset, the receive DMA would occasionally hang. The
workaround for receive was to align the receive buffer on a quad-word
boundary.

This problem appears to be flow control related because after disabling
flow control, no errors occurred in a 48-hour test run.

There are probably other related races and hangs. I don't
consider all the hangs eliminated yet.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>
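Since the hang appears only when PAUSE frames arrive, it may help to see what such a frame contains. The following minimal C sketch builds an IEEE 802.3x PAUSE frame by hand (reserved multicast destination 01:80:C2:00:00:01, MAC Control EtherType 0x8808, opcode 0x0001, then a 16-bit pause time) and converts the pause time to nanoseconds, one pause quantum being 512 bit times. The helper names are hypothetical, and in the setup above it is the switch, not the host, that generates these frames.

```c
/*
 * Sketch: an IEEE 802.3x PAUSE frame built by hand, plus the time it asks
 * the receiver to stop transmitting. Helper names are hypothetical.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAUSE_FRAME_LEN 60  /* minimum frame size, not counting the 4-byte FCS */

/* Writes PAUSE_FRAME_LEN bytes into buf and returns that length. */
size_t build_pause_frame(uint8_t *buf, const uint8_t src_mac[6], uint16_t quanta)
{
	static const uint8_t pause_dst[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x01 };

	memset(buf, 0, PAUSE_FRAME_LEN);      /* zero padding after pause_time */
	memcpy(buf, pause_dst, 6);            /* reserved MAC Control multicast */
	memcpy(buf + 6, src_mac, 6);          /* sender's MAC address */
	buf[12] = 0x88;                       /* EtherType 0x8808: MAC Control */
	buf[13] = 0x08;
	buf[14] = 0x00;                       /* opcode 0x0001: PAUSE */
	buf[15] = 0x01;
	buf[16] = (uint8_t)(quanta >> 8);     /* pause_time, big-endian */
	buf[17] = (uint8_t)(quanta & 0xff);
	return PAUSE_FRAME_LEN;
}

/* One quantum = 512 bit times; a bit time is 1000 / link_mbps nanoseconds. */
uint64_t pause_duration_ns(uint16_t quanta, uint64_t link_mbps)
{
	return (uint64_t)quanta * 512u * 1000u / link_mbps;
}
```

At 1000 Mbps one quantum is 512 ns, so the maximum request of 0xFFFF quanta stops the transmitter for about 33.6 ms, and a new PAUSE frame received before the timer expires restarts it.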
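The receive-side alignment conflict can be checked with simple arithmetic. The Ethernet header is 14 bytes, so a buffer that starts on an 8-byte boundary leaves the IP header 2 bytes off a 4-byte boundary, while the usual 2-byte offset (NET_IP_ALIGN in Linux) fixes the IP header but breaks the 8-byte DMA alignment the Yukon II receiver reportedly needs. This C sketch, with hypothetical helper names, only makes that trade-off concrete.

```c
/*
 * Sketch: 8-byte DMA alignment of the Rx buffer versus 4-byte alignment of
 * the IP header behind a 14-byte Ethernet header. Names are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>

#define ETH_HDR_BYTES 14  /* destination MAC + source MAC + EtherType */

/* True if the buffer start meets the 8-byte receive DMA alignment rule. */
bool rx_dma_aligned(uintptr_t buf)
{
	return buf % 8 == 0;
}

/* True if the IP header following the Ethernet header is 4-byte aligned. */
bool ip_header_aligned(uintptr_t buf)
{
	return (buf + ETH_HDR_BYTES) % 4 == 0;
}
```

For any 8-byte-aligned buffer the IP header sits at offset 14, which is 2 modulo 4, so no placement satisfies both checks; the driver has to favor one, and the workaround described above favors the DMA engine.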
* Re: [PATCH] sky2: flow control off
From: Willy Tarreau @ 2007-02-05 18:42 UTC
To: Stephen Hemminger; +Cc: netdev

Hi Stephen,

First, thanks for this detailed explanation.

On Mon, Feb 05, 2007 at 09:22:53AM -0800, Stephen Hemminger wrote:
> Here is what I saw.
> 
> The transmitter on the Marvell Yukon II (88e8053) hangs when doing transmit flow
> control under load. There appears to be a bug or race condition that
> causes the MAC to stop transmitting data.
> 
> There are two drivers for the Yukon II device on Linux. SysKonnect/Marvell
> has one called sk98lin, which is downloadable from syskonnect.de, and I wrote
> one called sky2 that is part of the standard Linux kernel. This problem
> is reproducible with the sky2 driver only; the sk98lin driver has a watchdog
> routine that resets the hardware periodically, so it masks the problem.
> 
> The failure occurs only after several minutes of sustained activity, in a
> situation where PAUSE frames would be received. In my testing I used
> 
>     server == 1000mbit ===> switch --- 100mbit ---> client
> 
> The server was a Mac Mini (88E8053) running Linux 2.6.20-rc7 and the client
> was a Sony Vaio (88e8036) laptop. The server was running the in-kernel NFS
> server and the client was doing a large copy. The server was using UDP to
> provoke large numbers of 802.3x PAUSE frames. The problem is not as
> reproducible with TCP tests because TCP congestion control avoids
> overrunning the switch.

I encountered *exactly* this problem with a one-leg firewall equipped with
an 88E8053 attached to a 1000 Mbps switch, itself hosting 100 Mbps stations,
but with sk98lin (2.4). Running tcpdump on the firewall, I noticed
duplicated and corrupted frames. In a lab setup I could reproduce only the
duplicated and corrupted frames, not the Tx hangs, by sending heavy UDP
traffic through that port to a 100 Mbps host. Sending to 1000 Mbps hosts
never triggered the problem, hence my conclusions about flow control too.

What I found interesting is that with a very old version of the sky2
driver which I had with me (sky2 v0.5), I could not trigger the problem
anymore. But right now, I realize that this version of the driver did not
support flow control yet, which would be consistent with your observations:

# ethtool -i eth0
driver: sky2
version: 0.5
firmware-version: N/A
bus-info: 01:00.0
# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             off
TX:             off

> When the failure occurs:
>  * packets continue to be received and passed up the stack
> 
>  * the GMAC status register indicates the pause state
>  * transmit packets continue to be transferred by DMA into the RAM buffer
>  * when the RAM buffer fills, no more packets are DMA'd
>  * when the transmit queue in the driver fills, it gets a watchdog timeout
> 
>  * the switch appears to get confused and other ports hang as well.
> 
> During development of the sky2 driver a similar problem was observed on
> receive if the receive DMA buffer was not 8-byte aligned. For performance
> reasons, Linux drivers usually offset the Rx buffer by 2 bytes so that
> the TCP/IP headers are aligned for faster CPU access. If the sky2 Rx
> buffer was offset, the receive DMA would occasionally hang. The
> workaround for receive was to align the receive buffer on a quad-word
> boundary.
> 
> This problem appears to be flow control related because after disabling
> flow control, no errors occurred in a 48-hour test run.

No problem here with the old driver without flow control either. I can try
to disable it on my setup with sk98lin and test again. I did not know that
sk98lin had a watchdog; that could explain why the system sometimes entered
a strange state (packets taking *seconds* to be forwarded).

Anyway, I'm more and more convinced that there are hardware bugs. It is
not normal at all that both the original SysKonnect driver and your fresh
new code show such similar problems!

> There are probably other related races and hangs. I don't
> consider all the hangs eliminated yet.

Well, at least you have a more maintainable driver than the previous one,
so you will eventually manage to fix all the problems ;-)

Best regards,
Willy
* Re: [PATCH] sky2: flow control off
From: Jeff Garzik @ 2007-02-07 0:18 UTC
To: Stephen Hemminger; +Cc: Linus Torvalds, netdev, linux-kernel, Andrew Morton

Stephen Hemminger wrote:
> Turn flow control off for sky2. When flow control is on, the transmitter
> may get randomly stuck. Perhaps there is a hardware problem, but until
> Marvell provides errata information for a workaround, it should default to off.
> 
> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> ---
>  drivers/net/sky2.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 822dd0b..a31dea5 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -3263,7 +3263,7 @@ #endif
>  
>  	/* Auto speed and flow control */
>  	sky2->autoneg = AUTONEG_ENABLE;
> -	sky2->flow_mode = FC_BOTH;
> +	sky2->flow_mode = FC_NONE;

I ACK the patch... conditional on some -mm style testing and user ACKs.

Logic: if there were no downsides to disabling flow control globally,
the world's networks would have already done so. Flow control can be
quite helpful, so while I understand the errata argument, I also want
to understand the full effect of this tiny patch.

	Jeff
* Re: [PATCH] sky2: flow control off
From: Stephen Hemminger @ 2007-02-07 3:57 UTC
To: Jeff Garzik
Cc: Stephen Hemminger, Linus Torvalds, netdev, linux-kernel, Andrew Morton

On Tue, 06 Feb 2007 19:18:07 -0500
Jeff Garzik <jgarzik@pobox.com> wrote:

> Stephen Hemminger wrote:
> > Turn flow control off for sky2. When flow control is on, the transmitter
> > may get randomly stuck. Perhaps there is a hardware problem, but until
> > Marvell provides errata information for a workaround, it should default to off.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
> > ---
> >  drivers/net/sky2.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> > index 822dd0b..a31dea5 100644
> > --- a/drivers/net/sky2.c
> > +++ b/drivers/net/sky2.c
> > @@ -3263,7 +3263,7 @@ #endif
> >  
> >  	/* Auto speed and flow control */
> >  	sky2->autoneg = AUTONEG_ENABLE;
> > -	sky2->flow_mode = FC_BOTH;
> > +	sky2->flow_mode = FC_NONE;
> 
> I ACK the patch... conditional on some -mm style testing and user ACKs.
> 
> Logic: if there were no downsides to disabling flow control globally,
> the world's networks would have already done so. Flow control can be
> quite helpful, so while I understand the errata argument, I also want
> to understand the full effect of this tiny patch.
> 

Actually, the E1000 had it off until recently.

The downside shows up when a system is connected to a switch that steps
down from a gigabit port to a 100mbit port and uses a stupid protocol like
NFS over UDP: without flow control, the packet burst is sure to get
truncated, so the 8K fragmented UDP datagram never gets through.
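To put rough numbers on the NFS-over-UDP case: with a 1500-byte MTU, 8 KB of UDP payload (ignoring RPC headers) is split into 6 IPv4 fragments, and losing any one of them loses the whole datagram. The C sketch below computes the fragment count and, under the simplifying assumption that each fragment is dropped independently with probability p, the chance that the datagram survives. Real drops at a congested 1000-to-100 Mbit port come in bursts, so this is only an illustration; the function names are hypothetical.

```c
/*
 * Sketch: IPv4 fragment count for a UDP datagram, and the chance that all
 * fragments arrive when each is dropped independently with probability p.
 * Function names are hypothetical; independence is a simplification.
 */
#define IPV4_HDR_LEN 20u  /* no IP options */
#define UDP_HDR_LEN   8u

/* mtu must exceed IPV4_HDR_LEN + 8. */
unsigned ipv4_fragments(unsigned udp_payload, unsigned mtu)
{
	/* Every fragment except the last carries a multiple of 8 data bytes. */
	unsigned per_frag = ((mtu - IPV4_HDR_LEN) / 8u) * 8u;
	unsigned ip_data = udp_payload + UDP_HDR_LEN;

	return (ip_data + per_frag - 1u) / per_frag;
}

/* The datagram is reassembled only if every fragment gets through. */
double datagram_delivery(unsigned fragments, double p)
{
	double ok = 1.0;

	for (unsigned i = 0; i < fragments; i++)
		ok *= 1.0 - p;
	return ok;
}
```

For example, at a 10% per-fragment drop rate only about 53% of 8 KB datagrams arrive intact (0.9 to the 6th power is about 0.531), and each NFS retransmission sends all six fragments again.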