* Re: TG3: very high CPU usage
From: Andreas Hartmann @ 2004-01-19 11:32 UTC
To: linux-kernel

Mark Williams (MWP) wrote:
[...]
> However, when using Apache or any FTP client/daemon, the TG3 driver appears to be VERY slow, maxing out CPU usage at 100% while only transferring at around 12MB/sec.
> This applies for both incoming or outgoing data.
[...]
> I've tried other NICs, etc. and confirmed that it is a problem with the TG3 driver.

I saw the same problem with the bcm driver (kernel 2.4.x) shipped with SuSE 9 / SLES 8. The test case was the initial mirror of a 10 GB partition on a RAID5 ServeRAID / xSeries 235 (2-way) to the same hardware on the remote machine, using the onboard NIC (Broadcom GBit Ethernet) on both sides, via drbd: 100% CPU usage, 12 MB/s, and the machine is nearly dead.

Regards,
Andreas Hartmann
* Re: TG3: very high CPU usage
From: Mark Williams (MWP) @ 2004-01-20 3:54 UTC
To: Andreas Hartmann
Cc: linux-kernel

> Mark Williams (MWP) wrote:
> [...]
> >However, when using Apache or any FTP client/daemon, the TG3 driver
> >appears to be VERY slow, maxing out CPU usage at 100% while only
> >transferring at around 12MB/sec.
> >This applies for both incoming or outgoing data.
>
> [...]
>
> >I've tried other NICs, etc. and confirmed that it is a problem with the TG3
> >driver.
>
> I saw the same problem with the bcm-driver (Kernel 2.4.x) shipped with
> SuSE 9 / SLES 8. Testcase was the initial mirror of a 10 GB partition on a
> raid5 serveraid / XSeries 235 (2 way) to the same hardware on the remote
> machine using both times the onboard NIC (Broadcom GBit Ethernet) via drbd:
> 100% CPU usage, 12 MB/s, machine is nearly dead.

Well, I'm glad someone else also has this problem.

Do any of the TG3 maintainers have an idea as to what's causing it? I'm handy with C, but nowhere near good enough to go hacking away at the driver.

I would be happy to help test new drivers if needed.

Thanks.
[parent not found: <fa.g9joqss.1nneajs@ifi.uio.no>]
[parent not found: <fa.e29fqcc.sick10@ifi.uio.no>]
* Re: TG3: very high CPU usage
From: Andreas Hartmann @ 2004-01-20 9:17 UTC
To: linux-kernel

Hi,

I searched for tg3 on lkml and found one more posting dealing with these problems (subject):

bcm5705 with tg3 driver and high rx load -> bad system responsiveness

There really seems to be a problem. Ronald Wahl pointed out that the driver from
http://www.broadcom.com/drivers/downloaddrivers.php does not have the problem. Maybe we could both look for drivers from the hardware vendor and test them? I will do it when I'm back at work in two weeks.

Regards,
Andreas Hartmann
* Re: TG3: very high CPU usage
From: Lincoln Dale @ 2004-01-20 9:44 UTC
To: Andreas Hartmann
Cc: linux-kernel

[you may want to use linux-net@vger.kernel.org instead of linux-kernel; it's possible that the tg3 folk lost your email in the flood]

At 08:17 PM 20/01/2004, Andreas Hartmann wrote:
>Hi,
>
>I searched for tg3 in lkml and found one more posting, dealing with these
>problems (subject):
>
>bcm5705 with tg3 driver and high rx load -> bad system responsiveness
>
>There really seems to be a problem. Ronald Wahl pointed out, that the
>driver from
>http://www.broadcom.com/drivers/downloaddrivers.php does not have the
>problem. Maybe, we could both look for drivers from the hardware producer
>and test them? I will do it when I'm back at work in two weeks.

how exactly are you "triggering" the high CPU load? i.e. what is the server doing? file-sharing? NFS? CIFS? something else?

i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which have Broadcom BCM 5700 (tg3) NICs.
i drive them all at wire-rate gig-e with iSCSI.

i'm yet to see any 'excessive' CPU load associated with tg3 relative to tigon2 (AceNIC2) and Intel e1000 NICs.

cheers,

lincoln.
* Re: TG3: very high CPU usage
From: Mark Williams (MWP) @ 2004-01-20 10:16 UTC
To: Lincoln Dale
Cc: linux-kernel

> [you may want to use linux-net@vger.kernel.org instead of linux-kernel; it's
> possible that the tg3 folk lost your email in the flood]
>
> At 08:17 PM 20/01/2004, Andreas Hartmann wrote:
> >Hi,
> >
> >I searched for tg3 in lkml and found one more posting, dealing with these
> >problems (subject):
> >
> >bcm5705 with tg3 driver and high rx load -> bad system responsiveness
> >
> >There really seems to be a problem. Ronald Wahl pointed out, that the
> >driver from
> >http://www.broadcom.com/drivers/downloaddrivers.php does not have the
> >problem. Maybe, we could both look for drivers from the hardware producer
> >and test them? I will do it when I'm back at work in two weeks.
>
> how exactly are you "triggering" the high CPU load? i.e. what is the
> server doing? file-sharing? NFS? CIFS? something else?

Any transfer (Apache, FTP, Samba) causes it.

> i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which
> have Broadcom BCM 5700 (tg3) NICs.
> i drive them all at wire-rate gig-e with iSCSI.
>
> i'm yet to see any 'excessive' CPU load associated with tg3 relative to
> tigon2 (AceNIC2) and Intel e1000 NICs.

It might not affect those cards.
I think the TG3 driver was changed to support the card I'm trying to use (Netgear GA302T) and similar.
* Re: TG3: very high CPU usage
From: Lincoln Dale @ 2004-01-20 12:09 UTC
To: Mark Williams (MWP)
Cc: linux-kernel

At 09:16 PM 20/01/2004, Mark Williams (MWP) wrote:
> > i have LOTS of IBM xSeries servers (IBM x335, x345, x440), all of which
> > have Broadcom BCM 5700 (tg3) NICs.
> > i drive them all at wire-rate gig-e with iSCSI.
> >
> > i'm yet to see any 'excessive' CPU load associated with tg3 relative to
> > tigon2 (AceNIC2) and Intel e1000 NICs.
>
>It might not effect those cards.
>I think the TG3 driver was changed to support the card im trying to use
>(Netgear GA302T) and similar.

curious.
i remember from the Tigon2 days, it didn't matter if you used a NetGear card, an Alteon card, or an Alteon card ripped out of the inside of an ACEDirector switch -- they were all the same reference design.

i don't believe that anyone using the bcm5700 would deviate significantly beyond the reference design - there wouldn't be any reason to.
(the only variants are probably due to dual-port versions ... of course, i'm sure the tg3 driver authors will now correct me on the differences. <grin>).

cheers,

lincoln.
* TG3: very high CPU usage
From: Mark Williams (MWP) @ 2004-01-19 3:35 UTC
To: linux-kernel

Greetings all,

Has the TG3 driver been well tested with the AC9100 and compatible gigabit NIC chipsets?

iperf, between a 2.6.0 box and a WinXP box (both running Netgear GA302Ts with the AC9100), shows a max throughput of 35MB/sec.

However, when using Apache or any FTP client/daemon, the TG3 driver appears to be VERY slow, maxing out CPU usage at 100% while only transferring at around 12MB/sec.
This applies for both incoming or outgoing data.

2.6.1 behaves worse, using 100% CPU to maintain rates of approx 9MB/sec.

I've tried other NICs, etc. and confirmed that it is a problem with the TG3 driver.

Is this a known problem?

Thanks,
Mark Williams.
* Re: TG3: very high CPU usage 2004-01-19 3:35 Mark Williams (MWP) @ 2004-01-20 12:33 ` JG 2004-01-20 23:13 ` Lincoln Dale 0 siblings, 1 reply; 21+ messages in thread From: JG @ 2004-01-20 12:33 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 2098 bytes --] hi, > iperf, between a 2.6.0 box and a WinXP box (both running Netgear GA302Ts with the AC9100), shows max throughput of 35MB/sec. i have also two boxes (one with 2.6.0, the other one 2.6.1-mm2) equipped with netgear ga302t cards (x-over cable). i don't see a very high cpu usage, but since upgrading to 2.6.x kernels i sometimes have really weird speed issues. i often only get transfer rates of about ~200-300 kilobytes/second...yes, and this over a gigabit interface, tested over ftp. i'm also running a nfs server on the 2.6.1-mm2 box, the 2.6.0 pc is the client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2 box (the other one is a server which can't be rebooted) it seems to be fine for some time. i didn't have such problems with 2.4.19 kernels on both pcs, there i got about 30-35MB/s over ftp without any problems, so i don't think it's hardware related. lspci -v 2.6.1-mm2: 00:09.0 Ethernet controller: Altima (nee Broadcom) AC9100 Gigabit Ethernet (rev 15) Subsystem: Netgear: Unknown device 302a Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 16 Memory at cffe0000 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- 2.6.0: same as above, only other interrupt this is also something i don't know how to debug, it is on the 2.6.0 box with an uptime of 7 days. 
ifconfig: eth1 Link encap:Ethernet HWaddr 00:09:5B:1F:1F:BC inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:217871027 errors:2769019 dropped:0 overruns:0 frame:2771160 TX packets:150029615 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2016894721 (1923.4 Mb) TX bytes:1073040436 (1023.3 Mb) Interrupt:11 how can i find out where these errors come from? thx, JG [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
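As a rough first step toward the question above, the ifconfig counters can be turned into an error ratio. The sketch below uses only the numbers posted for eth1; the helper name error_ratio is made up for illustration, and treating errors/packets as a ratio is an approximation, since it is not certain that errored frames are also counted under "RX packets". That the "frame" counter is almost as large as "errors" would usually suggest framing/CRC problems at the physical layer (cable, connectors, PHY) rather than in the protocol stack, though the counters alone cannot prove that.

```python
# Counters copied from the ifconfig output above (eth1 on the 2.6.0 box).
# error_ratio is a hypothetical helper, not part of any tool.
def error_ratio(packets: int, errors: int) -> float:
    """Return errors as a fraction of packets (0.0 if no packets were seen)."""
    return errors / packets if packets else 0.0


rx_ratio = error_ratio(217871027, 2769019)
tx_ratio = error_ratio(150029615, 0)

print(f"RX error ratio: {rx_ratio:.2%}")
print(f"TX error ratio: {tx_ratio:.2%}")
```

Errors in one direction only (RX here, none on TX) are consistent with a link-level fault on the receive path of this box.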
* Re: TG3: very high CPU usage 2004-01-20 12:33 ` JG @ 2004-01-20 23:13 ` Lincoln Dale 2004-01-21 3:19 ` Tom Sightler 0 siblings, 1 reply; 21+ messages in thread From: Lincoln Dale @ 2004-01-20 23:13 UTC (permalink / raw) To: JG, Andreas Hartmann; +Cc: linux-kernel At 11:33 PM 20/01/2004, JG wrote: >i have also two boxes (one with 2.6.0, the other one 2.6.1-mm2) equipped >with netgear ga302t cards (x-over cable). >i don't see a very high cpu usage, but since upgrading to 2.6.x kernels i >sometimes have really weird speed issues. i often only get transfer rates >of about ~200-300 kilobytes/second...yes, and this over a gigabit >interface, tested over ftp. >i'm also running a nfs server on the 2.6.1-mm2 box, the 2.6.0 pc is the >client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2 >box (the other one is a server which can't be rebooted) it seems to be >fine for some time. > >i didn't have such problems with 2.4.19 kernels on both pcs, there i got >about 30-35MB/s over ftp without any problems, so i don't think it's >hardware related. IBM x335 server (dual P4 Xeons @ 2.4GHz), BCM 5702 onboard 2 x 10/100/1000, connected via copper 1000baseT to a Cisco Catalyst 3750 ethernet switch. running ttcp between two hosts shows wire-rate @ 17% CPU. 
gig-e is not using jumbo frames: [root@mel-stglab-host31 root]# ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 10.67.16.91 ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91 ttcp-t: socket ttcp-t: sndbuf ttcp-t: nodelay ttcp-t: connect ttcp-t: 6553600000 bytes in 58.42 real seconds = 109558.82 KB/sec +++ ttcp-t: 6553600000 bytes in 10.38 CPU seconds = 616723.50 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.86 ttcp-t: 0.0user 10.3sys 0:58real 17% 0i+0d 0maxrss 0+16pf 79360+131csw ttcp-t: buffer address 0x8050000 [root@mel-stglab-host31 root]# uname -a Linux mel-stglab-host31 2.6.0-test9 #13 SMP Mon Nov 3 17:18:17 EST 2003 i686 i686 i386 GNU/Linux [root@mel-stglab-host31 root]# lspci -v [..] 03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02) Subsystem: IBM: Unknown device 026f Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 24 Memory at f87f0000 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device. Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- 03:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703 Gigabit Ethernet (rev 02) Subsystem: IBM: Unknown device 026f Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 25 Memory at f87e0000 (64-bit, non-prefetchable) [size=64K] Capabilities: [40] PCI-X non-bridge device. 
Capabilities: [48] Power Management version 2 Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- [root@mel-stglab-host31 asm]# ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 511 RX Mini: 0 RX Jumbo: 255 TX: 0 Current hardware settings: RX: 200 RX Mini: 0 RX Jumbo: 100 TX: 511 [root@mel-stglab-host31 asm]# ethtool -i eth0 driver: tg3 version: 2.2 firmware-version: bus-info: 0000:03:01.0 [root@mel-stglab-host31 asm]# ethtool eth0 Settings for eth0: Supported ports: [ MII ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Half 1000baseT/Full Advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on Supports Wake-on: g Wake-on: d Current message level: 0x000000ff (255) Link detected: yes [root@mel-stglab-host31 asm]# the only thing unusual about this kernel that i'm running is that i don't use HighMem; i fixup PAGE_OFFSET to 0x80000000 to avoid the performance overhead of PAE mode. cheers, lincoln. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: TG3: very high CPU usage
From: Tom Sightler @ 2004-01-21 3:19 UTC
To: Lincoln Dale
Cc: JG, Andreas Hartmann, Linux-Kernel

On Tue, 2004-01-20 at 18:13, Lincoln Dale wrote:
> At 11:33 PM 20/01/2004, JG wrote:
> >i have also two boxes (one with 2.6.0, the other one 2.6.1-mm2) equipped
> >with netgear ga302t cards (x-over cable).
> >i don't see a very high cpu usage, but since upgrading to 2.6.x kernels i
> >sometimes have really weird speed issues. i often only get transfer rates
> >of about ~200-300 kilobytes/second...yes, and this over a gigabit
> >interface, tested over ftp.
> >i'm also running a nfs server on the 2.6.1-mm2 box, the 2.6.0 pc is the
> >client, but again, sometimes it's *very* slow. if i reboot my 2.6.1-mm2
> >box (the other one is a server which can't be rebooted) it seems to be
> >fine for some time.
> >
> >i didn't have such problems with 2.4.19 kernels on both pcs, there i got
> >about 30-35MB/s over ftp without any problems, so i don't think it's
> >hardware related.

I'm curious if the people seeing this problem happen to have preempt enabled in their config. I've noticed that my laptop, which also happens to have a tg3-based 10/100/1000 card, uses tons of CPU during transfers, but only when preempt is enabled.

After looking into this, I found that my Aironet wireless card has exactly the same problem. When preempt is enabled, a simple scp transfer running at approximately maximum speed for 802.11b (7.5Mb/sec) uses almost 70% of the CPU. The tg3 driver doing the same scp at 40Mb/sec (100Mb ethernet) uses > 90% of the CPU.

However, with preempt turned off my system runs at approximately the same speed on wireless (7.5Mb/sec) but uses only about 5% CPU. The tg3 driver with preempt disabled allows the scp to run at near wire speed (95-100Mb/sec) and uses only a fraction of the CPU.

Just curious if this might be what others are seeing.

Later,
Tom
* Re: TG3: very high CPU usage 2004-01-21 3:19 ` Tom Sightler @ 2004-01-22 3:57 ` Lincoln Dale 2004-01-22 12:55 ` JG 2004-01-22 16:06 ` Tom Sightler 0 siblings, 2 replies; 21+ messages in thread From: Lincoln Dale @ 2004-01-22 3:57 UTC (permalink / raw) To: Tom Sightler; +Cc: JG, Andreas Hartmann, Linux-Kernel At 02:19 PM 21/01/2004, Tom Sightler wrote: >I'm curious is the people seeing this problem happen to have preempt >enabled in their config. I've noticed that my laptop, which also >happens to have a tg3 based 10/100/1000 card, uses tons of CPU during >trasfers, but only when preempt is enabled. nope. i didn't use PREEMPT=y in my previous test, but i have just done so now. the difference in CPU utilization when pushing wire-rate gig-e ttcp on this system (Dual P4 Xeon) with PREEMPT=y or PREEMPT=n is just noise. you should run oprofile and see where your cpu time is spent. with preempt enabled: ttcp-t: 6553600000 bytes in 58.29 real seconds = 109797.21 KB/sec +++ ttcp-t: 6553600000 bytes in 18.47 CPU seconds = 346485.49 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1715.58 ttcp-t: 0.1user 18.3sys 0:58real 31% 0i+0d 0maxrss 0+16pf 8038+1csw with preempt disabled: ttcp-t: 6553600000 bytes in 58.42 real seconds = 109543.94 KB/sec +++ ttcp-t: 6553600000 bytes in 18.82 CPU seconds = 340115.47 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.62 ttcp-t: 0.0user 18.7sys 0:58real 32% 0i+0d 0maxrss 0+16pf 7985+2csw -- with PREEMPT=y: [root@mel-stglab-host31 linux]# zcat /proc/config.gz |grep PREEM CONFIG_PREEMPT=y [root@mel-stglab-host31 linux]# sh -c 'opcontrol --start; opcontrol --reset; ttcp -t -l65536 -s -v -b2097152 -D -n100000 10.67.16.91; opcontrol --stop; opreport -l /usr/src/linux/vmlinux' 2>&1 | head -30 Profiler running. Signalling daemon... 
done ttcp-t: socket ttcp-t: sndbuf ttcp-t: nodelay ttcp-t: connect ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91 ttcp-t: 6553600000 bytes in 58.29 real seconds = 109797.21 KB/sec +++ ttcp-t: 6553600000 bytes in 18.47 CPU seconds = 346485.49 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1715.58 ttcp-t: 0.1user 18.3sys 0:58real 31% 0i+0d 0maxrss 0+16pf 8038+1csw ttcp-t: buffer address 0x8050000 Stopping profiling. CPU: P4 / Xeon with 2 hyper-threads, speed 2393.64 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000 samples % symbol name 198646 14.4698 tg3_enable_ints 166885 12.1562 __copy_from_user_ll 93473 6.8088 tg3_interrupt 57435 4.1837 default_idle 50924 3.7094 skb_clone 47592 3.4667 tg3_rx 45568 3.3193 tcp_sendmsg 36887 2.6869 qdisc_restart 35256 2.5681 tg3_poll 31588 2.3009 ip_queue_xmit 29742 2.1665 skb_release_data 29258 2.1312 tcp_write_xmit 27799 2.0249 irq_entries_start 24281 1.7687 alloc_skb -- with PREEMPT=n: [root@mel-stglab-host31 linux]# zcat /proc/config.gz |grep PREEM # CONFIG_PREEMPT is not set [root@mel-stglab-host31 linux]# sh -c 'opcontrol --start; opcontrol --reset; ttcp -t -l65536 -s -v -b2097152 -D -n100000 10.67.16.91; opcontrol --stop; opreport -l /usr/src/linux/vmlinux' 2>&1 | head -30 Profiler running. Signalling daemon... done ttcp-t: socket ttcp-t: sndbuf ttcp-t: nodelay ttcp-t: connect ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 10.67.16.91 ttcp-t: 6553600000 bytes in 58.42 real seconds = 109543.94 KB/sec +++ ttcp-t: 6553600000 bytes in 18.82 CPU seconds = 340115.47 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 0.60, calls/sec = 1711.62 ttcp-t: 0.0user 18.7sys 0:58real 32% 0i+0d 0maxrss 0+16pf 7985+2csw ttcp-t: buffer address 0x8050000 Stopping profiling. 
CPU: P4 / Xeon with 2 hyper-threads, speed 2393.76 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (count cycles when processor is active) count 100000 samples % symbol name 225502 18.1062 tg3_enable_ints 152548 12.2485 __copy_from_user_ll 85683 6.8797 tg3_interrupt 58248 4.6769 skb_clone 52288 4.1984 tcp_sendmsg 37893 3.0425 default_idle 35365 2.8396 tg3_rx 32555 2.6139 ip_queue_xmit 31778 2.5515 qdisc_restart 30089 2.4159 tg3_poll 29935 2.4036 tcp_v4_rcv 24818 1.9927 tcp_write_xmit 23781 1.9094 tcp_transmit_skb 23329 1.8732 skb_release_data cheers, lincoln. ^ permalink raw reply [flat|nested] 21+ messages in thread
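To compare the two profiles above at a glance, the per-symbol percentages can be summed per driver prefix. This is only a sketch: the rows are copied from the two opreport listings (tg3 symbols plus a few others for contrast), driver_share is a hypothetical helper, and since the listings were cut with head -30 the totals cover only the top symbols shown.

```python
# Rows copied from the opreport output above: samples, percent, symbol.
PROFILE_PREEMPT_Y = """\
198646 14.4698 tg3_enable_ints
166885 12.1562 __copy_from_user_ll
93473 6.8088 tg3_interrupt
57435 4.1837 default_idle
47592 3.4667 tg3_rx
35256 2.5681 tg3_poll
"""

PROFILE_PREEMPT_N = """\
225502 18.1062 tg3_enable_ints
152548 12.2485 __copy_from_user_ll
85683 6.8797 tg3_interrupt
37893 3.0425 default_idle
35365 2.8396 tg3_rx
30089 2.4159 tg3_poll
"""


def driver_share(report: str, prefix: str = "tg3_") -> float:
    """Sum the percent column for symbols whose name starts with prefix."""
    total = 0.0
    for line in report.splitlines():
        fields = line.split()
        # Expect exactly: samples, percent, symbol name.
        if len(fields) == 3 and fields[2].startswith(prefix):
            total += float(fields[1])
    return total


share_y = driver_share(PROFILE_PREEMPT_Y)
share_n = driver_share(PROFILE_PREEMPT_N)
print(f"tg3_* share with PREEMPT=y: {share_y:.2f}%")
print(f"tg3_* share with PREEMPT=n: {share_n:.2f}%")
```

On these numbers the driver's share of samples is in the same range either way, which fits the observation that PREEMPT makes no real difference on this machine.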
* Re: TG3: very high CPU usage
From: JG @ 2004-01-22 12:55 UTC
To: Lincoln Dale
Cc: Tom Sightler, Andreas Hartmann, Linux-Kernel

hi,

> nope.
> i didn't use PREEMPT=y in my previous test, but i have just done so now.

i have preempt enabled on both machines. at the moment i don't have time to recompile my kernel, but i'm going to test 2.6.2-rc1-mm1 soon on one of my machines where i'll disable it.

i'm also going to test my systems with ttcp, because at the moment i'm transferring my backup from the server to my machine with 105.48 kB/s over the gigabit line via ftp :( but cpu is normal on both machines.

JG
* Re: TG3: very high CPU usage 2004-01-22 12:55 ` JG @ 2004-01-24 13:43 ` JG 2004-01-25 0:03 ` Lincoln Dale 0 siblings, 1 reply; 21+ messages in thread From: JG @ 2004-01-24 13:43 UTC (permalink / raw) To: Lincoln Dale; +Cc: Linux-Kernel [-- Attachment #1: Type: text/plain, Size: 2429 bytes --] hi, > i'm also going to test my systems with ttcp, because at the moment i'm transferring my backup from the server to my machine with 105.48 kB/s over the gigabit line via ftp :( but cpu is normal on both machines. i did some tests now, here are the results. box1 = 2.6.0 (tg3 driver v2.3, nov5/03) box2 = 2.6.2-rc1-mm2 (tg3 v2.5, dec22/03) box1 was sending, box2 receiving: box1 # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.3 ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 192.168.0.3 ttcp-t: socket ttcp-t: sndbuf ttcp-t: nodelay ttcp-t: connect ttcp-t: -2036334592 bytes in 1247.57 real seconds = 1768.00 KB/sec +++ ttcp-t: -2036334592 bytes in 30.01 CPU seconds = 73492.73 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 12.78, calls/sec = 80.16 ttcp-t: 0.1user 29.8sys 20:47real 2% 0i+0d 0maxrss 1+16pf 67585+105csw ttcp-t: buffer address 0x807c000 ------------------------------------------ now the opposite, box2 was sending, box1 receiving: box2 ttcp # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.2 ttcp-t: buflen=65536, nbuf=100000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 192.168.0.2 ttcp-t: socket ttcp-t: sndbuf ttcp-t: nodelay ttcp-t: connect ttcp-t: -2036334592 bytes in 153.82 real seconds = 14339.52 KB/sec +++ ttcp-t: -2036334592 bytes in 28.61 CPU seconds = 77085.45 KB/cpu sec ttcp-t: 100000 I/O calls, msec/call = 1.58, calls/sec = 650.11 ttcp-t: 0.1user 28.4sys 2:33real 18% 0i+0d 0maxrss 0+17pf 63153+846csw ttcp-t: buffer address 0x807c000 i thought the cable could be defective because of the results, but i tested with another machine (windows xp, 100mbit card) and both up and download speed via 
ftp (from both boxes!) was at about 8-9MB/s. so no problem with the cable and it seems also no problem with 100mbit, but as soon as i connect the two tg3 cards together with 1000mbit, one direction is slow (cable is gbit certified and worked with 2.4 kernels without any problem).

as i already mentioned in a previous email, the errors on the tg3 cards are quite high, but only in RX:

box1:
RX packets:18585312 errors:102500 dropped:0 overruns:0 frame:102598
TX packets:12435471 errors:0 dropped:0 overruns:0 carrier:0

box2:
RX packets:6864695 errors:202162 dropped:0 overruns:0 frame:204652
TX packets:10049776 errors:0 dropped:0 overruns:0 carrier:0

cpu usage was also normal in every test (about 15-30%).

JG
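A side note on the ttcp figures in this mail: the negative byte count (-2036334592) looks like a signed 32-bit wraparound of 65536 * 100000 bytes, and the KB/sec figures appear to have been computed from the wrapped value. The sketch below checks that arithmetic. It assumes that all 100000 buffers of 65536 bytes were actually sent and that this ttcp build kept the count in a 32-bit integer; neither is confirmed in the thread, and wrap_int32 and kb_per_sec are hypothetical helpers.

```python
def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def kb_per_sec(nbytes: int, seconds: float) -> float:
    """Throughput in KB/sec, with 1 KB = 1024 bytes as ttcp prints it."""
    return nbytes / 1024 / seconds


BUFLEN, NBUF = 65536, 100000        # from "-l 65536 -n100000"
intended = BUFLEN * NBUF            # bytes ttcp was asked to send
wrapped = wrap_int32(intended)      # what a 32-bit signed counter would show
as_unsigned = wrapped & 0xFFFFFFFF  # same bits read as unsigned

print("wrapped byte count:", wrapped)
for label, seconds in (("box1 -> box2", 1247.57), ("box2 -> box1", 153.82)):
    shown = kb_per_sec(as_unsigned, seconds)
    full = kb_per_sec(intended, seconds)
    print(f"{label}: {shown:.2f} KB/sec from wrapped count, "
          f"{full:.2f} KB/sec if all {intended} bytes were sent")
```

If those assumptions hold, the real rates were roughly three times the printed ones. That is still far below wire rate, so it does not change the conclusion that one direction is badly degraded.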
* Re: TG3: very high CPU usage
From: Lincoln Dale @ 2004-01-25 0:03 UTC
To: JG
Cc: Linux-Kernel

Hi,

At 12:43 AM 25/01/2004, JG wrote:
>box1 was sending, box2 receiving:
>box1 # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.3
>ttcp-t: -2036334592 bytes in 1247.57 real seconds = 1768.00 KB/sec +++
>ttcp-t: -2036334592 bytes in 30.01 CPU seconds = 73492.73 KB/cpu sec

urgh, those are terrible numbers!

>now the opposite, box2 was sending, box1 receiving:
>box2 ttcp # ttcp -t -l 65536 -v -b 2097152 -s -D -n100000 192.168.0.2
>ttcp-t: -2036334592 bytes in 153.82 real seconds = 14339.52 KB/sec +++
>ttcp-t: -2036334592 bytes in 28.61 CPU seconds = 77085.45 KB/cpu sec

better, but still terrible.

even an old Pentium3 @ 500MHz here is capable of pushing GbE wire-rate (i just tested this using a Tigon2).

>i thought the cable could be defective because of the results, but i
>tested with another machine (windows xp, 100mbit card) and both up and
>download speed via ftp (from both boxes!) was at about 8-9MB/s. so no
>problem with the cable and it seems also no problem with 100mbit, but as
>soon as i connect the two tg3 cards together with 1000mbit, one direction
>is slow (cable is gbit certified and worked with 2.4 kernels without any
>problem).

actually, this isn't necessarily the case.
Fast Ethernet only uses two of the four pairs (one each for Tx and Rx; 4 wires), whereas copper GbE uses all four pairs (8 wires), each of them in both directions.

it may be the case that your cable has some bad connections on the pins only used for 1000baseT.

>as i already mentionend in a previous email, the errors on the tg3 cards
>are quite high, but only in RX:
>box1:
>RX packets:18585312 errors:102500 dropped:0 overruns:0 frame:102598
>TX packets:12435471 errors:0 dropped:0 overruns:0 carrier:0
>box2:
>RX packets:6864695 errors:202162 dropped:0 overruns:0 frame:204652
>TX packets:10049776 errors:0 dropped:0 overruns:0 carrier:0

on an x-over cable, you should NEVER have any errors.
if this is indeed simply an x-over cable, then i'd replace it and try again.
(note that for 1000baseT you don't need to worry about whether the cable is x-over or not; 1000baseT on most NICs/switches will auto-detect the crossover (auto MDI/MDI-X) anyway..).

Broadcom have a tool on their web site called "BACS" which can take advantage of some of the neat stuff in the PHY used on these boards.
one of the tests it can do is to check the quality of the cable and report any problems it sees; it can run a signal/noise test on each pair.

FYI, doing a "Cable Analysis" on a single port of a BCM5703 here connected to a switch (not x-over) with a ~1 metre patch cable shows:

Distance (m): ~1
Margin (dB): 5.132
Frequency Margin (MHz): 41.382

cheers,

lincoln.
* Re: TG3: very high CPU usage
From: JG @ 2004-01-25 12:31 UTC
To: Lincoln Dale
Cc: Linux-Kernel

hi,

> urgh, those are terrible numbers!

yes ;)

> even an old Pentium3 @ 500MHz here is capable of pushing GbE wire-rate (i
> just tested this using a Tigon2).

my machines are athlon xp 1700+ and 2400+ so they should be fast enough...

> Fast Ethernet only uses 1 pair of wires each for Tx/Rx (4 wires), whereas
> copper GbE uses 2 pairs each for Tx/Rx (8 wires).

oh, yes, i didn't think of that (my bad...) because i thought "it worked with 2.4 kernels".

> if this is indeed simply an x-over cable, then i'd replace it and try again.

yes, they are located in different rooms and connected via a 20m x-over cable through the wall (easier than affording a gbit switch ;))

> Broadcom have a tool on their web site called "BACS" which [...] check the quality of the cable and report any
> problems it sees; it can run a signal/noise test on each pair.

thank you for the info! i searched their site, but i only found a reference to BACS on their faq page and that this software should be on their driver cdrom (well, it is not on my netgear cdrom).
but i'll test my cable with a fluke networks cable tester tomorrow or on tuesday. i'll post the results if they are relevant.

i also tested with a knoppix cdrom on box2, which i can reboot, with 2.4.21 kernel and v1.5 tg3 driver, but the problem was also there so it really seems to be the cable...

thx,
JG
* Re: TG3: very high CPU usage
From: JG @ 2004-01-31 9:15 UTC
To: Lincoln Dale
Cc: Linux-Kernel

hi,

i'm replying to my own email.

> thank you for the info! i searched their site, but i only found a reference to BACS on their faq page and that this software should be on their driver cdrom (well, it is not on my netgear cdrom).
> but i'll test my cable with a fluke networks cable tester tomorrow or on tuesday. i'll post the results if they are relevant.

well, i did a thorough cable test with a DSP-4100 fluke networks cable tester and i had some bad values. i've been using 3 cables (24m) with adapters, all single cables were fine, so the adapters seemed to cause the problem.
but i'm now using a longer x-over cable (30m) where i also get those speed problems. it is a *bit* better, i get about 1-2MB/s in both directions, but i'm also experiencing a very high error rate over the x-over cable...(~40-50 errors per second)

do you have this BACS software and is it possible to test the NIC itself with it? maybe one of my NICs is causing this.

thx,
JG
* Re: TG3: very high CPU usage
From: Lincoln Dale @ 2004-02-01 0:20 UTC
To: JG
Cc: Linux-Kernel

At 08:15 PM 31/01/2004, JG wrote:
>well, i did a thorough cable test with a DSP-4100 fluke networks cable
>tester and i had some bad values. i've been using 3 cables (24m) with
>adapters, all single cables were fine, so the adapters seemed to cause the
>problem.
>but i'm now using a longer x-over cable (30m) where i also get those speed
>problems. it is a *bit* better, i get about 1-2MB/s in both directions,
>but i'm also experiencing a very high error rate over the x-over
>cable...(~40-50 errors per second)

if you get ANY errors, then it's bad; even 1 error per second basically means "one lost packet per second", which will severely limit your TCP throughput.

one thing you may want to do is to drop the link to 100mbit/s rather than gig-e; that will use fewer cable pairs and may avoid the problem.
100mbit/s without errors will likely be way way faster than 1000mbit/s with 50 errors/sec.

>do you have this BACS software and is it possible to test the NIC itself
>with it? maybe one of my NICs is causing this.

it seems there is only a Windows version of their diagnostics.

personally, i use IBM xSeries servers. their version of the BACS code is at <http://www-306.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-43815>.

i've seen other servers (e.g. Compaq DL360?) that also use the BCM57xx; their BACS tool is rebadged as being a HP tool.

cheers,

lincoln.

>thx,
>JG
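The claim that lost packets cap TCP throughput can be made roughly quantitative with the well-known Mathis et al. approximation, rate <= (MSS / RTT) * sqrt(3/2) / sqrt(p), where p is the packet loss probability. The sketch below is an order-of-magnitude illustration only: the loss ratio is taken from the box2 RX counters posted earlier in the thread, while the 1460-byte MSS and the 1 ms round-trip time are assumptions that nobody in the thread measured, and the model itself is only meant for moderate loss rates. mathis_rate is a hypothetical helper name.

```python
import math


def mathis_rate(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on steady-state TCP throughput in bytes/sec."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)


# Loss ratio from box2's counters: RX errors / RX packets.
loss = 202162 / 6864695

# ASSUMPTIONS, not measured in the thread: standard Ethernet MSS and a
# 1 ms effective round-trip time between the two hosts.
MSS_BYTES = 1460
RTT_SECONDS = 0.001

bound = mathis_rate(MSS_BYTES, RTT_SECONDS, loss)
print(f"loss ratio: {loss:.2%}")
print(f"estimated TCP ceiling: {bound / 1e6:.1f} MB/s")
```

Because the bound scales with 1/sqrt(p), halving the error rate raises the ceiling by only about 41%, which is why a clean 100mbit/s link can outperform a lossy gigabit one.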
* Re: TG3: very high CPU usage
  2004-02-01 0:20 ` Lincoln Dale
@ 2004-02-07 18:26 ` JG
  2004-02-08 0:00 ` Lincoln Dale
  0 siblings, 1 reply; 21+ messages in thread
From: JG @ 2004-02-07 18:26 UTC
To: Lincoln Dale; +Cc: Linux-Kernel

hi,

just wanted to report, that it wasn't the cable or the tg3 driver but a defective NIC. i switched to a card with a realtek 8169 chipset and now it looks muuuuch better ;) and the best thing, no errors in ifconfig.

thx for everything!

JG
* Re: TG3: very high CPU usage
  2004-02-07 18:26 ` JG
@ 2004-02-08 0:00 ` Lincoln Dale
  2004-02-09 10:13 ` JG
  0 siblings, 1 reply; 21+ messages in thread
From: Lincoln Dale @ 2004-02-08 0:00 UTC
To: JG; +Cc: Linux-Kernel

At 05:26 AM 8/02/2004, JG wrote:
>just wanted to report, that it wasn't the cable or the tg3 driver but a
>defective NIC. i switched to a card with a realtek 8169 chipset and now it
>looks muuuuch better ;) and the best thing, no errors in ifconfig.

good to hear.
what revision BCM5700 was it that you had?

i've heard reports that the newer BCM5705s have 'issues' whereas BCM5700-5703 are good.

can you post your 'lspci -vvv' output?

cheers,

lincoln.
* Re: TG3: very high CPU usage
  2004-02-08 0:00 ` Lincoln Dale
@ 2004-02-09 10:13 ` JG
  0 siblings, 0 replies; 21+ messages in thread
From: JG @ 2004-02-09 10:13 UTC
To: Lincoln Dale; +Cc: Linux-Kernel

hi,

> good to hear.
> what revision BCM5700 was it that you had?
>
> i've heard reports that the newer BCM5705s have 'issues' whereas
> BCM5700-5703 are good.
>
> can you post your 'lspci -vvv' output?

i'm sorry i can't tell the chip revision anymore (only card serial number) because i already RMA'ed the NIC.
i do have the same netgear card in another system where the serial number is nearly identical (differs only by 1 digit), here's the lspci -vvv output:

00:09.0 Ethernet controller: Altima (nee Broadcom) AC9100 Gigabit Ethernet (rev 15)
        Subsystem: Netgear: Unknown device 302a
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (16000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at cffe0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
                Address: fc879f41878ba220  Data: 3f06

JG
* Re: TG3: very high CPU usage
  2004-01-22 3:57 ` Lincoln Dale
  2004-01-22 12:55 ` JG
@ 2004-01-22 16:06 ` Tom Sightler
  1 sibling, 0 replies; 21+ messages in thread
From: Tom Sightler @ 2004-01-22 16:06 UTC
To: Lincoln Dale; +Cc: JG, Andreas Hartmann, Linux-Kernel

On Wed, 2004-01-21 at 22:57, Lincoln Dale wrote:
> At 02:19 PM 21/01/2004, Tom Sightler wrote:
> >I'm curious if the people seeing this problem happen to have preempt
> >enabled in their config. I've noticed that my laptop, which also
> >happens to have a tg3 based 10/100/1000 card, uses tons of CPU during
> >transfers, but only when preempt is enabled.
>
> nope.
> i didn't use PREEMPT=y in my previous test, but i have just done so now.
>
> the difference in CPU utilization when pushing wire-rate gig-e ttcp on this
> system (Dual P4 Xeon) with PREEMPT=y or PREEMPT=n is just noise.
>
> you should run oprofile and see where your cpu time is spent.

Well, it was just a thought. As it turns out, in my case the problem seems to be related to the combination of ACPI and PREEMPT (I still don't understand what exactly). Everything seems normal with ACPI without PREEMPT, or without ACPI with PREEMPT, but if I enable both ACPI and PREEMPT I get a ton of CPU usage. In Fedora Core top it shows up as IRQ time.

I haven't run oprofile yet, but it seems this problem has something to do with ACPI and PREEMPT on my machine (perhaps something to do with IRQ routing when ACPI is enabled). Sounds like that doesn't apply to any of the systems you guys are talking about. Sorry for the noise.

PREEMPT with ACPI is showing some other problems on my machine as well (for example, when PREEMPT is enabled my battery status applet fails after several hours of uptime, or even sooner if I stress the network). I can't reproduce this if I disable PREEMPT.

Anyway, good luck in finding a common issue for your problems.

Later,
Tom
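The IRQ time that top reports in the message above comes from the aggregate `cpu` line of `/proc/stat`. As a rough sketch, assuming a kernel that exposes the `irq` and `softirq` columns (older 2.4 kernels list only four fields), the share of CPU time spent in interrupt handling between two samples can be computed as follows. The function names are made up for illustration, and this is no substitute for the oprofile run suggested in the quoted text.

```python
def parse_cpu_line(text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def irq_share(before, after):
    """Fraction of elapsed CPU time spent in irq + softirq between two samples.

    Column order: user nice system idle iowait irq softirq steal ...
    Only the first eight columns are summed, because guest time is
    already included in user time on kernels that report it.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    if len(deltas) < 7:
        raise ValueError("this kernel does not report irq/softirq columns")
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return (deltas[5] + deltas[6]) / total
```

Read `/proc/stat` once before a transfer and once during it, then pass both parsed lists to `irq_share`. A large fraction would be consistent with the interrupt-routing suspicion raised above, though it would not by itself identify the cause.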
Thread overview: 21+ messages
[not found] <fa.eu7l1gd.ekqe1j@ifi.uio.no>
2004-01-19 11:32 ` TG3: very high CPU usage Andreas Hartmann
2004-01-20 3:54 ` Mark Williams (MWP)
[not found] <fa.g9joqss.1nneajs@ifi.uio.no>
[not found] ` <fa.e29fqcc.sick10@ifi.uio.no>
2004-01-20 9:17 ` Andreas Hartmann
2004-01-20 9:44 ` Lincoln Dale
2004-01-20 10:16 ` Mark Williams (MWP)
2004-01-20 12:09 ` Lincoln Dale
2004-01-19 3:35 Mark Williams (MWP)
2004-01-20 12:33 ` JG
2004-01-20 23:13 ` Lincoln Dale
2004-01-21 3:19 ` Tom Sightler
2004-01-22 3:57 ` Lincoln Dale
2004-01-22 12:55 ` JG
2004-01-24 13:43 ` JG
2004-01-25 0:03 ` Lincoln Dale
2004-01-25 12:31 ` JG
2004-01-31 9:15 ` JG
2004-02-01 0:20 ` Lincoln Dale
2004-02-07 18:26 ` JG
2004-02-08 0:00 ` Lincoln Dale
2004-02-09 10:13 ` JG
2004-01-22 16:06 ` Tom Sightler