* Intel EEPro 100 with kernel drivers
@ 2001-10-29 1:13 Thomas Langås
2001-10-29 1:57 ` Jim Hull
` (3 more replies)
0 siblings, 4 replies; 26+ messages in thread
From: Thomas Langås @ 2001-10-29 1:13 UTC (permalink / raw)
To: linux-kernel
Hi!
We've got a lot of machines with the Intel eepro100 onboard, and when
we stress-test the network (running bonnie++ on an NFS-shared
directory on a machine), the network card logs "eth0: Card reports no
resources" to dmesg, and then the "line" appears dead for some time (one
minute or more). What can be done to remove this error? NFS times out with
this error (obviously)...
--
Thomas
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 1:13 Intel EEPro 100 with kernel drivers Thomas Langås
@ 2001-10-29 1:57 ` Jim Hull
  2001-10-29 3:43 ` J Sloan
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Jim Hull @ 2001-10-29 1:57 UTC (permalink / raw)
To: linux-kernel

I actually have the same issue, but I am not seeing any performance loss. I do
extensive NFS transfers, as this box also stores a software RAID array, and
aside from the kernel message I am unaffected.

From what I understand the problem is a hardware bug, and I believe I read
somewhere that forcing the network card to use its own IRQ, rather than
sharing one, will alleviate this problem.

Hope this helps ....

On a side note, I run this NIC on about 10 production web servers running
fbsd 3.5 receiving extensive traffic loads and have no problems with them at
all.

Jim

============================
They that give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.

--Benjamin Franklin, Historical Review of Pennsylvania, 1759

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 1:13 Intel EEPro 100 with kernel drivers Thomas Langås
  2001-10-29 1:57 ` Jim Hull
@ 2001-10-29 3:43 ` J Sloan
  2001-10-31 8:01 ` Thomas Langås
  2001-10-29 10:44 ` Alan Cox
  2001-10-30 8:36 ` Jarmo Järvenpää
  3 siblings, 1 reply; 26+ messages in thread
From: J Sloan @ 2001-10-29 3:43 UTC (permalink / raw)
To: linux-kernel

Thomas Langås wrote:

> Hi!
>
> We've got a lot of machines with the eepro 100 from intel onboard, and when
> we try to stress-test the network (running bonnie++ on a nfs-shared
> directory on a machine), the network-card says "eth0: Card reports no
> resources" to dmesg, and then the "line" appears dead for some time (one
> minute or more). What can be done to remove this error? NFS times out with
> this error (obviously)...

We found that using the intel e100 driver
instead of the eepro100 eliminates these
errors - YMMV of course -

cu

jjs
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 3:43 ` J Sloan
@ 2001-10-31 8:01 ` Thomas Langås
  2001-10-31 15:22 ` Andrey Savochkin
  2001-10-31 18:10 ` Juergen Hasch
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Langås @ 2001-10-31 8:01 UTC (permalink / raw)
To: J Sloan; +Cc: linux-kernel

J Sloan:
> We found that using the intel e100 driver
> instead of the eepro100 eliminates these
> errors - YMMV of course -

I've now tried the Intel driver; it didn't help, I still get the NFS timeouts
(the Intel driver doesn't output anything to dmesg, so there's no way of
telling whether the same thing occurs as with the stock-kernel eepro100
driver).

This is how I do the test:

  NFS share a filesystem
  NFS mount it on another box (not running intel e100 nic)
  Start bonnie++ on the box that has mounted the nfs share

After 10-20 minutes, the first NFS timeout comes (which means the card is out
of resources, and "halts" for a bit). When the card runs out of resources, it
seems to take a few minutes before it comes online again, no wonder why,
though.

Has anyone got any suggestions on how to start tracking down, and maybe
fixing, this problem? Or is this a hardware error? Or maybe a firmware
error? Should I start contacting Dell and tell them that there's a
possible error in their PowerEdge 2550 series?

--
Thomas
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 8:01 ` Thomas Langås
@ 2001-10-31 15:22 ` Andrey Savochkin
  2001-10-31 15:52 ` Kirill Ratkin
  2001-11-01 7:55 ` Thomas Langås
  2001-10-31 18:10 ` Juergen Hasch
  1 sibling, 2 replies; 26+ messages in thread
From: Andrey Savochkin @ 2001-10-31 15:22 UTC (permalink / raw)
To: Thomas Langås; +Cc: linux-kernel, J Sloan

Hi,

On Wed, Oct 31, 2001 at 09:01:25AM +0100, Thomas Langås wrote:
>
> I've now tried the Intel driver, no help, still get the NFS timeouts (the
> intel driver doesn't output anything to dmesg, so it's no way of telling if
> the same things occur as in the eepro100 stock-kernel driver).
>
> This is how I do the test:
>
> NFS share a filesystem
> NFS mount it on another box (not running intel e100 nic)
> Start bonnie++ on the box that has mounted the nfs share
>
> After 10-20mins, the first NFS timeout comes (which means the card is out of
> resources, and "halts" for a bit). When the card becomes out of resources,
> it seems like it uses a few minutes before it comes online again, no wonder
> why, tho.
>
> Has anyone got any suggestions on how to start tracking down, and maybe
> fixing this problem? Or, is this a hardware error? Or maybe a firmware

Well, with eepro100 the start may be the following:
1. When the card stalls, start ping from that host.
   This way you ensure that you have something in the transmit ring.
   If it's transmitting that stalls, you'll get a message from the netdev
   watchdog.
2. If ping works, then your problem appears to be a pure NFS one, i.e. the
   inability of NFS to recover from a network operation disruption.
3. If ping is able to transmit, but not receive (you may check it with
   tcpdump), then we have a receiver problem.
   We'll think about what to do then.

4. In any case, running eepro100-diag from scyld.com at the moment of the
   stall may give some useful information.
5. In any case, searching the eepro100 mailing list archive on scyld.com is a
   good idea; you may learn what other people observe/do.

	Andrey

> error? Should I start contacting Dell and tell them that's there's a
> possible error in their PowerEdge 2550-series?
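Steps 1-3 of Andrey's list amount to a small decision procedure. The following is only a minimal Python sketch of that reasoning: the function name `triage_stall` and its three boolean inputs are made up for illustration, and the observations themselves still have to be collected by hand with ping, dmesg and tcpdump while the card is stalled.

```python
def triage_stall(watchdog_message_seen, ping_transmits, ping_receives):
    """Map observations made during a stall onto steps 1-3 of the list.

    All three arguments are hypothetical names for manual observations:
      watchdog_message_seen - a netdev watchdog message appeared in dmesg
      ping_transmits        - tcpdump shows the echo requests leaving the host
      ping_receives         - ping gets its replies back
    """
    if watchdog_message_seen:
        # Step 1: the transmit ring is not draining.
        return "transmit stall"
    if ping_transmits and ping_receives:
        # Step 2: the card works, so NFS fails to recover from the disruption.
        return "NFS recovery problem"
    if ping_transmits and not ping_receives:
        # Step 3: packets go out but nothing comes back in.
        return "receiver problem"
    # Anything else is not covered by the list.
    return "inconclusive"


if __name__ == "__main__":
    print(triage_stall(False, True, False))
```

The "inconclusive" branch is an addition of this sketch; the list itself does not say what to conclude in that case.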
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 15:22 ` Andrey Savochkin
@ 2001-10-31 15:52 ` Kirill Ratkin
  2001-11-01 7:55 ` Thomas Langås
  1 sibling, 0 replies; 26+ messages in thread
From: Kirill Ratkin @ 2001-10-31 15:52 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Thomas Langås, linux-kernel, J Sloan

Guys. This is the Networking section of my config:

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
# CONFIG_NETLINK is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

I am working with this config (2.4.13) now, and my machine has eepro100.o
loaded. I am testing it now. This problem appears when some options of the IP
section are enabled. I can't say yet which of them (I think SYN or MROUTE,
but that's my assumption).
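One way to narrow down Kirill's guess would be to compare the config of a kernel that shows the stall with one that does not and list only the options that differ. Below is a rough Python sketch of that idea; `parse_config` and `changed_options` are hypothetical helper names, and the two sample configs are invented for the example rather than taken from anyone's machine.

```python
import re

# "CONFIG_FOO=y" style lines and "# CONFIG_FOO is not set" style lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' get the value 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def changed_options(good_text, bad_text):
    """Return {option: (value_in_good, value_in_bad)} for options that differ."""
    good, bad = parse_config(good_text), parse_config(bad_text)
    diff = {}
    for name in sorted(set(good) | set(bad)):
        before, after = good.get(name, "n"), bad.get(name, "n")
        if before != after:
            diff[name] = (before, after)
    return diff


if __name__ == "__main__":
    # Invented sample data: a working config and a variant with one option on.
    good = "CONFIG_INET=y\n# CONFIG_SYN_COOKIES is not set\n# CONFIG_IP_MROUTE is not set\n"
    bad = "CONFIG_INET=y\nCONFIG_SYN_COOKIES=y\n# CONFIG_IP_MROUTE is not set\n"
    print(changed_options(good, bad))
```

Treating a missing option the same as "is not set" is a simplification made here; it keeps the diff short when the two configs come from slightly different kernel versions.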
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 15:22 ` Andrey Savochkin
  2001-10-31 15:52 ` Kirill Ratkin
@ 2001-11-01 7:55 ` Thomas Langås
  2001-11-01 9:47 ` Andrey Savochkin
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Langås @ 2001-11-01 7:55 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: linux-kernel, J Sloan

Andrey Savochkin:
> Well, with eepro100 the start may be the following:
> 1. When the card stalls, start ping from that host.
> This way you ensure that you have something in transmit ring.
> If it's transmitting that stalls, you'll get a message from netdev watchdog.

From the server, or the client? I've already tried pinging from the server
when I get the error message in dmesg, but it's unresponsive to anything.
And I mean anything, network-wise. There seems to be a timeout somewhere,
because after some time, everything goes back to normal again.

> 4. In any case, running eepro100-diag from scyld.com at the moment of the
> stall may give some useful information.

OK, I'll do the test again, and run eepro100-diag. Any special options
you want me to specify?

> 5. In any case, searching eepro100 mailing list archive on scyld.com is a
> good idea, you may learn what other people observe/do.

OK, I'll search... :)

--
Thomas
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 7:55 ` Thomas Langås
@ 2001-11-01 9:47 ` Andrey Savochkin
  2001-11-01 10:00 ` Thomas Langås
  0 siblings, 1 reply; 26+ messages in thread
From: Andrey Savochkin @ 2001-11-01 9:47 UTC (permalink / raw)
To: Thomas Langås; +Cc: linux-kernel, J Sloan

On Thu, Nov 01, 2001 at 08:55:23AM +0100, Thomas Langås wrote:
> Andrey Savochkin:
> > Well, with eepro100 the start may be the following:
> > 1. When the card stalls, start ping from that host.
> > This way you ensure that you have something in transmit ring.
> > If it's transmitting that stalls, you'll get a message from netdev watchdog.
>
> From the server, or the client? I've already tried pinging from the server

From the computer where the network card hangs and where you see messages in
dmesg. The network card hangs on only one side, right?

> when I get the error-message in dmesg, but it's unresponsive to anything.
> And, I mean anything, network-wise. There seems to be a timeout somewhere,
> because after some time, everything resumes back to normal again.

If the operations stall for just a few seconds, it's perfectly ok.
If, after a stop of a few seconds, the card itself resumes normal operation
but NFS operations stay blocked for much longer, it's an NFS problem.
If the card itself stops operating for a long time, it needs to be fixed.

	Andrey
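The three cases Andrey distinguishes can be told apart by measuring how long the card itself is silent versus how long NFS stays blocked. A minimal Python sketch, under loudly stated assumptions: `longest_gap` and `classify` are hypothetical names, the reply timestamps would have to be logged by hand (for example from a ping left running), and the 5-second cut-off for "a few seconds" is an arbitrary value chosen only for this example.

```python
def longest_gap(reply_times):
    """Longest interval, in seconds, between consecutive successful ping replies."""
    ordered = sorted(reply_times)
    return max((later - earlier for earlier, later in zip(ordered, ordered[1:])),
               default=0.0)


def classify(card_gap_s, nfs_gap_s, few_seconds=5.0):
    """Apply the three cases from the message above.

    few_seconds is an assumed threshold; the thread never gives a number.
    """
    if card_gap_s > few_seconds:
        return "card stall, needs fixing"
    if nfs_gap_s > few_seconds:
        return "NFS problem"
    return "ok"


if __name__ == "__main__":
    # Invented timestamps: replies at 0, 1, 2 s, then nothing until 10 s.
    gap = longest_gap([0, 1, 2, 10, 11])
    print(gap, classify(gap, nfs_gap_s=60))
```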
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 9:47 ` Andrey Savochkin
@ 2001-11-01 10:00 ` Thomas Langås
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Langås @ 2001-11-01 10:00 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: linux-kernel, J Sloan

Andrey Savochkin:
> From the computer where the network card hangs and where you see messages in
> dmesg. The network card hangs on only one side, right?

Yes, and sorry, I meant that I tried pinging from the client side.

> If the operations stall just for few seconds, it's perfectly ok.
> If after a few second stop the card itself resumes to operate normally, but
> NFS operations are blocked for much longer time, it's NFS problem.
> If the card itself stops operation for a long time, it needs to be fixed.

OK, it seems like the stock kernel driver hangs much longer than the Intel
driver (the Intel driver only hung for a few seconds when I tried just now).

--
Thomas
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 8:01 ` Thomas Langås
  2001-10-31 15:22 ` Andrey Savochkin
@ 2001-10-31 18:10 ` Juergen Hasch
  2001-11-01 8:03 ` Thomas Langås
  ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Juergen Hasch @ 2001-10-31 18:10 UTC (permalink / raw)
To: linux-kernel, Thomas Langås, J Sloan; +Cc: linux-kernel

On Wednesday, 31 October 2001 09:01, Thomas Langås wrote:
> J Sloan:
> > We found that using the intel e100 driver
> > instead of the eepro100 eliminates these
> > errors - YMMV of course -
>
> I've now tried the Intel driver, no help, still get the NFS timeouts (the
> intel driver doesn't output anything to dmesg, so it's no way of telling if
> the same things occur as in the eepro100 stock-kernel driver).

I had some trouble with an Intel STL 2 board and the onboard EEPRO100.
Samba worked OK but it always got stuck on NFS transfers.

There was a bug in the older BMC firmware, so the eepro100 detected
some NFS frames as "TCO" packets.
(http://support.intel.com/support/motherboards/server/ta_353-1.htm)

If you use the e100 driver, you can look at
/proc/net/PRO_LAN_Adapters/eth0.info
If the "Tx_TCO_Packets" entry isn't zero after NFS times out,
this may be your problem.
With the eepro100 driver you will only see overruns with ifconfig.

If this is the case, you may want to check for a BMC
(board management controller) software update.

...Juergen
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 18:10 ` Juergen Hasch
@ 2001-11-01 8:03 ` Thomas Langås
  2001-11-01 8:48 ` Juergen Hasch
  2001-11-01 11:11 ` Andrey Savochkin
  2001-11-01 13:39 ` Henning P. Schmiedehausen
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Langås @ 2001-11-01 8:03 UTC (permalink / raw)
To: Juergen Hasch; +Cc: linux-kernel, J Sloan

Juergen Hasch:
> If you use the e100 driver, you can look at
> /proc/net/PRO_LAN_ADAPTERS/eth0.info
> If the "Tx_TCO_Packets" entry isn't zero after NFS times out,
> this may be your problem.
> With the eepro100 driver you will only see overruns with ifconfig.

Here's the full /proc/net/PRO_LAN_Adapters/eth0.info output (after NFS
timeouts):

gekko:~# cat /proc/net/PRO_LAN_Adapters/eth0.info
Description               Intel(R) 8255x-based Ethernet Adapter
Driver_Name               e100
Driver_Version            1.6.22
PCI_Vendor                0x8086
PCI_Device_ID             0x1229
PCI_Subsystem_Vendor      0x1028
PCI_Subsystem_ID          0x009b
PCI_Revision_ID           0x0008
PCI_Bus                   2
PCI_Slot                  4
IRQ                       16
System_Device_Name        eth0
Current_HWaddr            00:B0:D0:F0:8B:65
Permanent_HWaddr          00:B0:D0:F0:8B:65
Part_Number               07195d-000

Link                      up
Speed                     100
Duplex                    full
State                     up

Rx_Packets                27747043
Tx_Packets                25999146
Rx_Bytes                  1730389022
Tx_Bytes                  21884644
Rx_Errors                 0
Tx_Errors                 0
Rx_Dropped                0
Tx_Dropped                0
Multicast                 0
Collisions                0
Rx_Length_Errors          0
Rx_Over_Errors            0
Rx_CRC_Errors             0
Rx_Frame_Errors           0
Rx_FIFO_Errors            0
Rx_Missed_Errors          0
Tx_Aborted_Errors         0
Tx_Carrier_Errors         0
Tx_FIFO_Errors            0
Tx_Heartbeat_Errors       0
Tx_Window_Errors          0

Rx_TCP_Checksum_Good      0
Rx_TCP_Checksum_Bad       0
Tx_TCP_Checksum_Good      0
Tx_TCP_Checksum_Bad       0

Tx_Abort_Late_Coll        0
Tx_Deferred_Ok            0
Tx_Single_Coll_Ok         0
Tx_Multi_Coll_Ok          0
Rx_Long_Length_Errors     0
Rx_Align_Errors           0

Tx_Flow_Control_Pause     0
Rx_Flow_Control_Pause     0
Rx_Flow_Control_Unsup     0

Tx_TCO_Packets            0
Rx_TCO_Packets            1
scbp = 0xf89da000 bddp = 0xf77568c0

--
Thomas
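Checking the TCO counters by eye in a dump this long is error-prone, so here is a small Python sketch that pulls them out of the text. It assumes only what the output above shows, namely one "name value" pair per line separated by whitespace, plus a trailing `scbp = ...` line that is skipped; `parse_adapter_info` and `tco_counters` are hypothetical names, and the exact column layout of the real proc file is not known from this thread.

```python
def parse_adapter_info(text):
    """Split each 'Name value...' line into a dict entry; skip blank and '=' lines."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            # e.g. "scbp = 0xf89da000 bddp = 0xf77568c0", not a name/value row
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            info[parts[0]] = parts[1].strip()
    return info


def tco_counters(text):
    """Return the Tx/Rx TCO packet counters as integers, if present."""
    info = parse_adapter_info(text)
    return {name: int(info[name])
            for name in ("Tx_TCO_Packets", "Rx_TCO_Packets")
            if name in info}


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = (
        "Description Intel(R) 8255x-based Ethernet Adapter\n"
        "Driver_Name e100\n"
        "Rx_Packets 27747043\n"
        "\n"
        "Tx_TCO_Packets 0\n"
        "Rx_TCO_Packets 1\n"
        "scbp = 0xf89da000 bddp = 0xf77568c0\n"
    )
    print(tco_counters(sample))
```

On a live system the text would come from reading the file named in the message; that read is left out here so the sketch runs anywhere.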
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 8:03 ` Thomas Langås
@ 2001-11-01 8:48 ` Juergen Hasch
  2001-11-01 9:06 ` Thomas Langås
  0 siblings, 1 reply; 26+ messages in thread
From: Juergen Hasch @ 2001-11-01 8:48 UTC (permalink / raw)
To: linux-kernel, Thomas Langås; +Cc: linux-kernel, J Sloan

> Here's the full /proc/net/PRO_LAN_Adapters/eth0.info output (after NFS
> timeouts):
>
> gekko:~# cat /proc/net/PRO_LAN_Adapters/eth0.info
> Tx_TCO_Packets 0
> Rx_TCO_Packets 1
> scbp = 0xf89da000 bddp = 0xf77568c0

Well, this doesn't look exactly the same as on the system I had problems with.
But your Rx_TCO_Packets counter is 1, so this may be related
(I also got Rx overrun errors). It may be that your BMC receives the packet
and simply chooses to ignore it because it is not a valid server management
packet.

Could you run another test and take a look at eth0.info?
I could reproduce the problem when copying a large file over NFS, but not
when transferring it via ftp. Try this a few times.
If you can reproduce your network card getting stuck only when using NFS, with
Rx_TCO_Packets > 0 after it is stuck, this is it.
Then you either need to upgrade your BMC firmware or add another network card
which doesn't eat NFS packets.

...Juergen
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 8:48 ` Juergen Hasch
@ 2001-11-01 9:06 ` Thomas Langås
  2001-11-01 9:43 ` Juergen Hasch
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Langås @ 2001-11-01 9:06 UTC (permalink / raw)
To: Juergen Hasch; +Cc: linux-kernel, J Sloan

Juergen Hasch:
> But your Rx_TCO_Packets counter is 1, so this may be related
> (I also got Rx overrun errors). It may be that your BMC receives the packet
> and simply chooses to ignore it because it is no valid server management
> packet.
> Could you make another test and take a look at the eth0.info ?
> I could reproduce the problem when copying a large file over NFS, but not
> when transferring it via ftp. Try this a few times.
> If you can reproduce you network card being stuck only when using NFS and
> having Rx_TCO_Packets > 0 after it is stuck, this is it.
> Then you either need tu upgrade your BMC firmware or add another network card,
> which doesn't eat NFS packets.

I'm testing now; however, running eepro100-diag gave me some interesting
output:

  Sleep mode is enabled. This is not recommended.
  Under high load the card may not respond to
  PCI requests, and thus cause a master abort.

How do I disable sleep mode? I've never even enabled it.

--
Thomas
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 9:06 ` Thomas Langås
@ 2001-11-01 9:43 ` Juergen Hasch
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Hasch @ 2001-11-01 9:43 UTC (permalink / raw)
To: linux-kernel, Thomas Langås; +Cc: linux-kernel, J Sloan

On Thursday, 1 November 2001 10:06, Thomas Langås wrote:
> I'm testing now, however, running eepro100-diag gave me some interessting
> output:
>
> Sleep mode is enabled. This is not recommended. Under high load the card
> may not respond to PCI requests, and thus cause a master abort.
>
> How do I disable sleepmode? I've never even enabled it.

The sleep bit is sometimes enabled by default (it was for me).
You can clear it with eepro100-diag (I think it was the -Gww option).
The documentation for eepro100-diag is somewhat sparse, but clearing the sleep
bit was discussed on the eepro100 mailing list at scyld.com in great detail.
You might want to browse the archives there.

...Juergen
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 18:10 ` Juergen Hasch
  2001-11-01 8:03 ` Thomas Langås
@ 2001-11-01 11:11 ` Andrey Savochkin
  2001-11-01 12:00 ` Thomas Langås
  2001-11-01 13:39 ` Henning P. Schmiedehausen
  2 siblings, 1 reply; 26+ messages in thread
From: Andrey Savochkin @ 2001-11-01 11:11 UTC (permalink / raw)
To: Juergen Hasch; +Cc: linux-kernel, Thomas Langås, J Sloan

On Wed, Oct 31, 2001 at 07:10:49PM +0100, Juergen Hasch wrote:
>
> I had some trouble with an Intel STL 2 board and the onboard EEPRO100.
> Samba worked OK but it always got stuck on NFS transfers.
>
> There was a bug in the older BMC firmware, so the eepro100 detected
> some NFS frames as "TCO" packets.
> (http://support.intel.com/support/motherboards/server/ta_353-1.htm)
>
> If you use the e100 driver, you can look at
> /proc/net/PRO_LAN_ADAPTERS/eth0.info
> If the "Tx_TCO_Packets" entry isn't zero after NFS times out,
> this may be your problem.
> With the eepro100 driver you will only see overruns with ifconfig.

It should be Rx_TCO_Packets, not Tx.
The problem described in Intel's advisory is related to incorrect processing
of received packets.

	Andrey
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 11:11 ` Andrey Savochkin
@ 2001-11-01 12:00 ` Thomas Langås
  2001-11-01 12:15 ` Juergen Hasch
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Langås @ 2001-11-01 12:00 UTC (permalink / raw)
To: Andrey Savochkin; +Cc: Juergen Hasch, linux-kernel, J Sloan

Andrey Savochkin:
> It should be Rx_TCO_Packets, not Tx.
> The problem described in Intel's advisory is related to incorrect processing
> of receiving packets.

But if it's this bug that's triggered by NFS traffic, then the counter
should be increasing with every timeout, right? Not just one time. I get a
lot of timeouts and the counter is still just 1.

I'm going out to buy another NIC and run the tests a bit more
systematically, and will report back with the results afterwards.

--
Thomas
* Re: Intel EEPro 100 with kernel drivers
  2001-11-01 12:00 ` Thomas Langås
@ 2001-11-01 12:15 ` Juergen Hasch
  0 siblings, 0 replies; 26+ messages in thread
From: Juergen Hasch @ 2001-11-01 12:15 UTC (permalink / raw)
To: linux-kernel, Thomas Langås, Andrey Savochkin; +Cc: J Sloan

On Thursday, 1 November 2001 13:00, Thomas Langås wrote:
> Andrey Savochkin:
> > It should be Rx_TCO_Packets, not Tx.
> > The problem described in Intel's advisory is related to incorrect
> > processing of receiving packets.
>
> But if it's this bug that's triggered with NFS-traffic, then the counter
> should be increasing with every timeout, right? Not just one time. I get a
> lot of timeout and the counter is still just 1.
>
> I'm going out to buy me another NIC and try tests a bit systematically, and
> report back with the results afterwards.

The Rx_TCO_Packets counter should increase at each timeout you get, so this
looks like another problem.
I have got two servers with two different EEPRO100 network cards. One
works better with the eepro100 driver, the other one seems to favour the
e100 driver :-)
Both cards are working flawlessly now; however, I was close to buying new NICs
because of problems like command timeouts, "no resources" messages and
NFS timeouts.

...Juergen
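The check Thomas and Juergen agree on, that Rx_TCO_Packets should go up with every NFS timeout if the BMC bug is the cause, is easy to state as code. A minimal Python sketch, assuming the counter is noted down by hand once before the test and once after each timeout; `tco_tracks_timeouts` is a hypothetical name and the sample readings are invented.

```python
def tco_tracks_timeouts(samples):
    """True if the counter rose after every timeout.

    samples[0] is the reading before the test; each later entry is the
    reading taken right after one NFS timeout.
    """
    if len(samples) < 2:
        return False  # nothing to compare yet
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Counter climbs with each timeout: consistent with the BMC/TCO bug.
    print(tco_tracks_timeouts([0, 1, 2, 3]))
    # Counter sticks at 1 across several timeouts, as reported in this thread.
    print(tco_tracks_timeouts([0, 1, 1, 1]))
```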
* Re: Intel EEPro 100 with kernel drivers
  2001-10-31 18:10 ` Juergen Hasch
  2001-11-01 8:03 ` Thomas Langås
  2001-11-01 11:11 ` Andrey Savochkin
@ 2001-11-01 13:39 ` Henning P. Schmiedehausen
  2 siblings, 0 replies; 26+ messages in thread
From: Henning P. Schmiedehausen @ 2001-11-01 13:39 UTC (permalink / raw)
To: linux-kernel

Hasch@t-online.de (Juergen Hasch) writes:

>> I've now tried the Intel driver, no help, still get the NFS timeouts (the
>> intel driver doesn't output anything to dmesg, so it's no way of telling if
>> the same things occur as in the eepro100 stock-kernel driver).

>I had some trouble with an Intel STL 2 board and the onboard EEPRO100.
>Samba worked OK but it always got stuck on NFS transfers.

A datapoint that might be interesting: I run four of these buggers
with eepros as Internet interfaces for heavy traffic (30-80 MBit
sustained 24/7) under 2.2.19. Not a single glitch on any of these
boxes.

The machines have two PIII/1GHz each and a (custom built) SMP kernel
based off RH 2.2.19-6.2.7

boot message:

eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eepro100.c: VA Linux custom, Dragan Stancevic <visitor@valinux.com> 2000/11/15
eth0: Intel PCI EtherExpress Pro100 82557, 00:D0:B7:A8:67:EC, I/O at 0x2c00, IRQ 21.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).

So there may be a change between 2.2 and 2.4 that triggers the
problems.

	Regards
		Henning

--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH  hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 1:13 Intel EEPro 100 with kernel drivers Thomas Langås
  2001-10-29 1:57 ` Jim Hull
  2001-10-29 3:43 ` J Sloan
@ 2001-10-29 10:44 ` Alan Cox
  2001-10-29 11:40 ` Michael Rozhavsky
  2001-10-29 13:52 ` Thomas Langås
  2001-10-30 8:36 ` Jarmo Järvenpää
  3 siblings, 2 replies; 26+ messages in thread
From: Alan Cox @ 2001-10-29 10:44 UTC (permalink / raw)
To: linux-kernel

> directory on a machine), the network-card says "eth0: Card reports no
> resources" to dmesg, and then the "line" appear dead for some time (one
> minutte or more). What can be done to remove this error? NFS timesout with
> this error (obviously)...

Which kernel version, which eepro100 chip?
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 10:44 ` Alan Cox
@ 2001-10-29 11:40 ` Michael Rozhavsky
  2001-10-29 11:49 ` Alan Cox
  2001-10-29 13:52 ` Thomas Langås
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Rozhavsky @ 2001-10-29 11:40 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

Hi,

We have exactly the same problem with 2.4.9, 2.4.10 and 2.4.13, so
we had to switch to Intel's driver.

From 'cat /proc/pci':

  Bus 1, device 1, function 0:
    Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 8).
      IRQ 9.
      Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
      Non-prefetchable 32 bit memory at 0xff8fe000 [0xff8fefff].
      I/O at 0xdf00 [0xdf3f].
      Non-prefetchable 32 bit memory at 0xff600000 [0xff6fffff].

It is an Intel i810 motherboard with an onboard NIC.

But Intel's driver (e100-1.6.22) says on boot:
eth0: Intel(R) 82559 Fast Ethernet LAN on Motherboard

The chip is:
GD82559
L021LP51

We have this problem when the NIC is under high traffic.

Is there any other information that can help you to track down the problem?

P.S. I can reproduce this problem any time.

On Mon, Oct 29, 2001 at 10:44:41AM +0000, Alan Cox wrote:
> > directory on a machine), the network-card says "eth0: Card reports no
> > resources" to dmesg, and then the "line" appear dead for some time (one
> > minutte or more). What can be done to remove this error? NFS timesout with
> > this error (obviously)...
>
> Which kernel version, which eepro100 chip ?

Best regards.

--
Michael Rozhavsky                Tel: +972-4-9936248
mrozhavsky@opticalaccess.com     Fax: +972-4-9890564
Optical Access
Senior Software Engineer         www.opticalaccess.com
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 11:40 ` Michael Rozhavsky
@ 2001-10-29 11:49 ` Alan Cox
  2001-10-29 11:45 ` Michael Rozhavsky
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2001-10-29 11:49 UTC (permalink / raw)
To: Michael Rozhavsky; +Cc: Alan Cox, linux-kernel

> We have exactly the same problem with 2.4.9, 2.4.10 and 2.4.13, so
> We had to switch to Intel's driver.

10Mbit half duplex?
* Re: Intel EEPro 100 with kernel drivers
  2001-10-29 11:49 ` Alan Cox
@ 2001-10-29 11:45 ` Michael Rozhavsky
  0 siblings, 0 replies; 26+ messages in thread
From: Michael Rozhavsky @ 2001-10-29 11:45 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel

On Mon, Oct 29, 2001 at 11:49:14AM +0000, Alan Cox wrote:
> > We have exactly the same problem with 2.4.9, 2.4.10 and 2.4.13, so
> > We had to switch to Intel's driver.
>
> 10Mbit half duplex ?

10Mbit, but full duplex.

Best regards.

--
Michael Rozhavsky                Tel: +972-4-9936248
mrozhavsky@opticalaccess.com     Fax: +972-4-9890564
Optical Access
Senior Software Engineer         www.opticalaccess.com
* Re: Intel EEPro 100 with kernel drivers
2001-10-29 10:44 ` Alan Cox
2001-10-29 11:40 ` Michael Rozhavsky
@ 2001-10-29 13:52 ` Thomas Langås
1 sibling, 0 replies; 26+ messages in thread
From: Thomas Langås @ 2001-10-29 13:52 UTC
To: Alan Cox; +Cc: linux-kernel

Alan Cox:
> Which kernel version, which eepro100 chip ?

All kernels so far, starting with 2.4.0 (the first one we tested), and we've
now come to 2.4.13 and the error is still there.

Output from lspci -vvvxx:

02:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
    Subsystem: Dell Computer Corporation: Unknown device 009b
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 32 (2000ns min, 14000ns max), cache line size 08
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fe900000 (32-bit, non-prefetchable) [size=4K]
    Region 1: I/O ports at bcc0 [size=64]
    Region 2: Memory at fe500000 (32-bit, non-prefetchable) [size=1M]
    Expansion ROM at fe600000 [disabled] [size=1M]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 86 80 29 12 17 01 90 02 08 00 00 02 08 20 00 00
10: 00 00 90 fe c1 bc 00 00 00 00 50 fe 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 9b 00
30: 00 00 60 fe dc 00 00 00 00 00 00 00 05 01 08 38

Output from dmesg:

eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
eth0: Intel Corporation 82557 [Ethernet Pro 100], 00:B0:D0:F0:8B:65, IRQ 16.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 07195d-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
  Receiver lock-up workaround activated.

I'd gladly help you track down and fix this problem, and if you need any
more info (or testing of patches) just tell me :)

--
Thomas
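The four hex rows at the end of the lspci output in the message above are the first 64 bytes of the card's PCI configuration space, so they can be decoded by hand to cross-check the text lspci printed. Below is a minimal Python sketch of that decoding, assuming the standard type-0 PCI header layout. The function name `decode_pci_header` and the dictionary keys are made up for this illustration; they are not part of lspci or of the eepro100 driver.

```python
import struct

# Rows 00 and 30 of the `lspci -vvvxx` hex dump quoted in the message above.
ROW_00 = "86 80 29 12 17 01 90 02 08 00 00 02 08 20 00 00"
ROW_30 = "00 00 60 fe dc 00 00 00 00 00 00 00 05 01 08 38"


def decode_pci_header(row_00, row_30):
    """Decode a few fields of a standard (type 0) PCI configuration header.

    row_00 holds offsets 0x00-0x0f and row_30 holds offsets 0x30-0x3f.
    The name and return shape are hypothetical, chosen for this sketch.
    """
    head = bytes.fromhex(row_00)
    tail = bytes.fromhex(row_30)
    # Multi-byte PCI fields are stored little-endian.
    vendor, device, command, status = struct.unpack_from("<4H", head, 0)
    return {
        "vendor_id": vendor,
        "device_id": device,
        "command": command,
        "status": status,
        "revision": head[0x08],
        # Offsets 0x09-0x0b: programming interface, subclass, base class.
        "class_code": int.from_bytes(head[0x09:0x0C], "little"),
        "cache_line_size": head[0x0C],
        "latency_timer": head[0x0D],
        "interrupt_line": tail[0x0C],
        "interrupt_pin": tail[0x0D],
        # Min_Gnt and Max_Lat are expressed in units of 250 ns.
        "min_gnt_ns": tail[0x0E] * 250,
        "max_lat_ns": tail[0x0F] * 250,
    }


if __name__ == "__main__":
    for name, value in decode_pci_header(ROW_00, ROW_30).items():
        print(f"{name:16} {value} ({value:#x})")
```

Decoded this way, the dump gives vendor 0x8086 (Intel), device 0x1229, revision 8, class 0x020000 (Ethernet controller), latency timer 32, and 2000 ns / 14000 ns for Min_Gnt / Max_Lat, which agrees with the lspci text. The Interrupt Line register reads 5 while lspci reports IRQ 16; as far as I know that register only holds what the firmware wrote, whereas lspci shows the IRQ the kernel routed, so the difference is not necessarily a fault.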
* Re: Intel EEPro 100 with kernel drivers
2001-10-29 1:13 Intel EEPro 100 with kernel drivers Thomas Langås
` (2 preceding siblings ...)
2001-10-29 10:44 ` Alan Cox
@ 2001-10-30 8:36 ` Jarmo Järvenpää
2001-10-30 8:54 ` Dead2
3 siblings, 1 reply; 26+ messages in thread
From: Jarmo Järvenpää @ 2001-10-30 8:36 UTC
To: linux-kernel

Thomas Langås wrote:
>
> Hi!
>
> We've got a lot of machines with the eepro 100 from intel onboard, and when
> we try to stress-test the network (running bonnie++ on a nfs-shared
> directory on a machine), the network-card says "eth0: Card reports no
> resources" to dmesg, and then the "line" appear dead for some time (one
> minutte or more). What can be done to remove this error? NFS timesout with
> this error (obviously)...
>
> --
> Thomas

We have almost the same problem, except that it totally locks up the
computer. Light network utilization is OK, but heavy traffic triggers
the lock-up.
Nothing is reported to syslog, and even the keyboard LEDs won't light up
(numlock, etc). Rebooting helps for a while. We had to install another
network card as a workaround. I've tried kernels 2.4.10 and 2.4.12.

The network card is integrated on the motherboard.

dmesg:
----
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
PCI: Found IRQ 11 for device 01:08.0
eth1: Intel Corporation 82801BA(M) Ethernet, 00:03:47:A2:F8:81, IRQ 11.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
-----
[obelix:/root]> eepro100-diag -ee -f -vv
eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker@scyld.com)
 http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82562 Pro/100 V adapter at 0xde80.
i82557 chip registers at 0xde80:
  00000000 00000000 00000000 00080002 183f0000 00000000
  No interrupt sources are pending.
   The transmit unit state is 'Idle'.
   The receive unit state is 'Idle'.
  This status is unusual for an activated interface.
EEPROM contents, size 64x16:
    00: 0300 a247 81f8 1a03 0000 0201 4701 0000
  0x08: 0000 0000 49b0 3013 8086 007f ffff ffff
  0x10: ffff ffff ffff ffff ffff ffff ffff ffff
  0x18: ffff ffff ffff ffff ffff ffff ffff ffff
  0x20: ffff ffff ffff ffff ffff ffff ffff ffff
  0x28: ffff ffff ffff ffff ffff ffff ffff ffff
  0x30: 0000 ffff ffff ffff ffff ffff ffff ffff
  0x38: ffff ffff ffff 0000 ffff ffff ffff 35dd
 The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
  Station address 00:03:47:A2:F8:81.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
 MII PHY #1 transceiver registers:
  3100 7809 02a8 0330 05e1 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  2004 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0ce0 0000 0010 0000 0000 0000.
[obelix:/root]>
----
Oh, eepro100-diag reported 'Sleep mode is enabled', which could cause
something like this, so I disabled it, but with no positive effect.

Any similar problems?

Thanks,
Jarmo
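The EEPROM dump in the message above can be cross-checked against the two things eepro100-diag derives from it: the station address and the checksum verdict. The Python sketch below does that, under two assumptions that are mine rather than the poster's: that the MAC address occupies the first three 16-bit words with the low byte first, and that a valid image sums to 0xBABA modulo 2^16 (the value I recall the eepro100 driver testing for). The helper names are hypothetical.

```python
# The eight EEPROM rows printed by eepro100-diag in the message above
# (64 sixteen-bit words), with the row offsets removed.
EEPROM_DUMP = """
0300 a247 81f8 1a03 0000 0201 4701 0000
0000 0000 49b0 3013 8086 007f ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
0000 ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff 0000 ffff ffff ffff 35dd
"""

# Assumed expected sum of all words for a valid image (not stated in the thread).
EXPECTED_SUM = 0xBABA


def parse_words(dump):
    """Turn the whitespace-separated hex dump into a list of 16-bit integers."""
    return [int(token, 16) for token in dump.split()]


def station_address(words):
    """Build the MAC address from words 0-2, taking the low byte of each first."""
    octets = []
    for word in words[:3]:
        octets.append(word & 0xFF)
        octets.append(word >> 8)
    return ":".join(f"{octet:02X}" for octet in octets)


def checksum(words):
    """Sum every word, keeping only the low 16 bits."""
    return sum(words) & 0xFFFF


if __name__ == "__main__":
    words = parse_words(EEPROM_DUMP)
    print("station address:", station_address(words))
    print("checksum ok:", checksum(words) == EXPECTED_SUM)
```

With this dump the sketch reproduces the address the driver printed (00:03:47:A2:F8:81) and agrees with the tool's "checksum is correct" line, which suggests the EEPROM image itself is intact and the hang lies elsewhere.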
* Re: Intel EEPro 100 with kernel drivers
2001-10-30 8:36 ` Jarmo Järvenpää
@ 2001-10-30 8:54 ` Dead2
2001-10-30 9:02 ` Jarmo Järvenpää
0 siblings, 1 reply; 26+ messages in thread
From: Dead2 @ 2001-10-30 8:54 UTC
To: linux-kernel

>We have almost the same problem, except it totally locks up the
>computer. Light network utilization is ok, but heavy traffic does the
>effect.
>No syslog reports, even keyboards leds won't light up (numlock, etc).
>Rebooting helps for a while. We had to install another network card for
>a workaround. I've tried kernels 2.4.10 and 2.4.12.
>
>The network card is integrated at the motherboard.
>
>dmesg:
>----
>*snip*
>-----
>*snip*
>----
>Oh, eepro100-diag reported 'Sleep mode is enabled', which could do
>something like this -> I disabled it, but no positive effect.
>
>
>Any similar problems?

This sounds like it might be the same problem that we are experiencing here.
The NIC gets a high load of traffic immediately after it has booted up.
There are no messages indicating anything remotely wrong, even after
setting the highest debug level in the eepro100 driver.

-=Dead2=-

*my previous message to this list about this issue*
> Tested now with another motherboard with the same results.
> MSI 6321 Pro 1.0
>
> Both these motherboards use VIA dual-cpu chipsets.
> Same results with 2.4.13-Pre6 on both motherboards.
>
> > I have an Asus CUV266-d motherboard, and want to use my Intel NIC's..
> >
> > 2.4.10 & 2.4.12 hangs while "Setting up routing"
> > No error messages appear.
> >
> > 2.4.x(4 maybe?) has both 'e100' drivers and the 'eepro100' drivers.
> > When loading the 'eepro100', it hangs just like with todays kernels.
> > When loading the 'e100', everything works just fine for a short while..
> > 20-40seconds I guess.. Then the computer hangs.
> >
> > When not loading any NIC drivers, everything works just fine.
> >
> > The NIC's i've tried are named "Intel(R) PRO/100+ Dual Port Server Adapter"
> > Have also tried a "Intel(R) PRO/100+ Adapter"
> >
> > Any ideas of what to test?
> > I have the latest bios and have tried just about all bios settings.
> > 'noapic' doesn't help.
* Re: Intel EEPro 100 with kernel drivers
2001-10-30 8:54 ` Dead2
@ 2001-10-30 9:02 ` Jarmo Järvenpää
0 siblings, 0 replies; 26+ messages in thread
From: Jarmo Järvenpää @ 2001-10-30 9:02 UTC
To: Dead2; +Cc: linux-kernel

I searched a bit, and it seems some users have had the same kind of problems
on 10Mbit networks with a high number of collisions, just like ours.

Jarmo