* kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
@ 2006-04-12 21:42 Guenther Thomsen
2006-04-12 21:48 ` Stephen Hemminger
2006-05-16 19:11 ` kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Stephen Hemminger
From: Guenther Thomsen @ 2006-04-12 21:42 UTC
To: shemminger, John W. Linville; +Cc: netdev
I'm happy to report that the version of the sky2 driver in 2.6.17-rc1
yields line rate at low CPU utilization (as measured with ttcp).
Unfortunately, it's not quite bug-free yet ;-}

When enabling the second interface (of the same network controller), the
kernel panics (perhaps during DHCP discovery?):
--8<--
[root@penguin1 ~]# ifup eth1
Determining IP information for eth1...Unable to handle kernel paging
request at ffffc20000014000 RIP:
<ffffffff811a3329>{sky2_mac_init+522}
PGD 13fc49067 PUD 13fc4a067 PMD 13fc4b067 PTE 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in: autofs4 sr_mod cdrom dm_mod button usb_storage
uhci_hcd 11BladeRunner_sk98lin #1
RIP: 0010:[<ffffffff811a3329>] <ffffffff811a68 RCX: 000000000000001e
RDX: 0000000000004008 RSI: ffffc20000010000 11 0000000000000fe0 R12:
0000000000001000
R13: ffff81013fae61a8 R14:
S: 010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2: ffffc20000014000fe40)
Stack: 0000000000001000 ffff81013fae6000 ffff81013fae6500
00000f1013fae6000
Call Trace: <ffffffff811a3e0c>{sky2_up+334} <ffffffff81
<ffffffff81144cf4>{sprintf+144}
<ffffffff8123b616>{inet_ioctl{s_ioctl+44}
<ffffffff81085703>{sys_ioctl+107}
<ffffffff81009a2ac_init+522} RSP <ffff81013487fd28>
CR2: ffffc20000014000
<0>Kerne
-->8--
or (2nd try):
--8<--
[root@penguin1 ~]# Unable to handle kernel paging request at
ffffc20000014000 RIP:
<ffffffff811a3329>{sky2_mac_init+522}
PGD 13fc49067 PUD 13fc4a067 PMD 13fc4b067 PTE 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: autofs4 sr_mod cdrom dm_mod button usb_storage
uhci_hcd ehci_hcd e752x_edac edac_mc shpcR: 00:[<ffffffff811a3329>]
<ffffffff811a3329>{sky2_mac_init+522}
RDX: 0000000000004008 RSI: ffffc20000010000 RDI: 0000000000000000
R1 0000000001000
R13: ffff81013f0511a8 R14: 0000000000000001 R15: 0000S0000 CR0:
000000008005003b
CR2: ffffc20000014000 CR3: 000000013425b000000000001000 ffff81013f051000
ffff81013f051500 0000000000000000
Call Trace: <ffffffff811a3e0c>{sky2_up+334}
<ffffffff811fea04>{dev_op844cf4>{sprintf+144}
<ffffffff8123b616>{inet_ioctl+74}
<ffffffff81085703>{sys_ioctl+107}
<ffffffff81009ac8>{tracesys+209}
+} RSP <ffff81013534fd28>
CR2: ffffc20000014000
<0>Kernel panic - n
-->8--
The kernel is vanilla 2.6.17-rc1; the sky2 driver was compiled into the
kernel. The OS is Red Hat Fedora Core 4. The kernel was compiled using
gcc32.

The system is a blade in a Penguin Computing BladeRunner 4130. It
contains two Xeon CPUs (with HT enabled) and an on-board Marvell 8062
network controller (88E8062 is stamped on the chip).

The hardware seems to work fine under 2.6.15(.7) with SysKonnect's
sk98lin driver version 8.31 (skd.de).

Please let me know if I can provide further information or assist in
any other way.
best regards
Guenther

* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
From: Stephen Hemminger @ 2006-04-12 21:48 UTC
To: Guenther Thomsen; +Cc: John W. Linville, netdev

You need this patch, which Jeff hasn't applied yet.
-----
Subject: sky2: crash when bringing up second port

Sky2 driver will oops referencing bad memory if used on
a dual port card. The problem is accessing past end of
MIB counter space.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- test-2.6.orig/drivers/net/sky2.c
+++ test-2.6/drivers/net/sky2.c
@@ -579,8 +579,8 @@ static void sky2_mac_init(struct sky2_hw
 	reg = gma_read16(hw, port, GM_PHY_ADDR);
 	gma_write16(hw, port, GM_PHY_ADDR, reg | GM_PAR_MIB_CLR);
 
-	for (i = 0; i < GM_MIB_CNT_SIZE; i++)
-		gma_read16(hw, port, GM_MIB_CNT_BASE + 8 * i);
+	for (i = GM_MIB_CNT_BASE; i <= GM_MIB_CNT_END; i += 4)
+		gma_read16(hw, port, i);
 	gma_write16(hw, port, GM_PHY_ADDR, reg);
 
 	/* transmit control */
--- test-2.6.orig/drivers/net/sky2.h
+++ test-2.6/drivers/net/sky2.h
@@ -1375,7 +1375,7 @@ enum {
 	GM_PHY_ADDR = 0x0088, /* 16 bit r/w GPHY Address Register */
 /* MIB Counters */
 	GM_MIB_CNT_BASE = 0x0100, /* Base Address of MIB Counters */
-	GM_MIB_CNT_SIZE = 256,
+	GM_MIB_CNT_END = 0x025C, /* Last MIB counter */
 };
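The arithmetic behind this fix can be checked with a small hosted C sketch (no hardware access). GM_MIB_CNT_BASE, GM_MIB_CNT_SIZE and GM_MIB_CNT_END are copied from the patch. The per-port GMAC window offsets (0x2800 for port 0, 0x3800 for port 1) and the 16 KB size of the mapped register space are assumptions that do not appear in this thread, so the port-1 result is a plausible explanation, not a confirmed one. Under those assumptions the old loop reaches offset 0x4000 only on the second port, which would be consistent with the oops reporting CR2 = ffffc20000014000 while RSI = ffffc20000010000.

```c
#include <assert.h>

/* Constants copied from the patch (drivers/net/sky2.h). */
enum {
	GM_MIB_CNT_BASE = 0x0100, /* base address of MIB counters */
	GM_MIB_CNT_SIZE = 256,    /* old: used as an iteration count */
	GM_MIB_CNT_END  = 0x025C  /* new: last MIB counter */
};

/* ASSUMPTION (not stated in this thread): GMAC register window of each
 * port and the size of the mapped register space. */
enum {
	ASSUMED_GMA_PORT0 = 0x2800,
	ASSUMED_GMA_PORT1 = 0x3800,
	ASSUMED_MMIO_SIZE = 0x4000
};

/* Last offset read by the old loop: GM_MIB_CNT_BASE + 8 * i,
 * for i = 0 .. GM_MIB_CNT_SIZE - 1. */
static unsigned old_last_offset(void)
{
	return GM_MIB_CNT_BASE + 8u * (GM_MIB_CNT_SIZE - 1);
}

/* Number of reads done by the fixed loop, which walks the counter
 * space in 4-byte steps from GM_MIB_CNT_BASE up to GM_MIB_CNT_END. */
static unsigned new_read_count(void)
{
	unsigned n = 0;

	for (unsigned i = GM_MIB_CNT_BASE; i <= GM_MIB_CNT_END; i += 4)
		n++;
	return n;
}

/* First absolute offset read by the old loop that falls outside the
 * mapped register space, or 0 if every read stays inside it. */
static unsigned old_first_out_of_range(unsigned port_base, unsigned mmio_size)
{
	assert(port_base < mmio_size);
	for (unsigned i = 0; i < GM_MIB_CNT_SIZE; i++) {
		unsigned addr = port_base + GM_MIB_CNT_BASE + 8u * i;

		if (addr >= mmio_size)
			return addr;
	}
	return 0;
}
```

With these values the old loop reads up to offset 0x8F8 within the GMAC window, far beyond the last MIB counter at 0x25C, whereas the fixed loop performs 88 reads ending exactly at 0x25C. For the assumed port-0 window every old read stays below 0x4000; for port 1 the read at i = 224 lands on 0x4000, which would explain why only the second interface oopsed.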

* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
From: Guenther Thomsen @ 2006-04-12 22:26 UTC
To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Wednesday 12 April 2006 14:48, Stephen Hemminger wrote:
> You need this patch, which Jeff hasn't applied yet.
> -----
> Subject: sky2: crash when bringing up second port
[..]

Thanks for the very quick response. The patch does prevent the panic
when bringing up the second interface, but now the host no longer
receives any packets. It still sends packets (ARP requests, naturally).

If I add a static ARP entry for a second host on the test subject, ICMP
echo requests are sent, but sendmsg() soon fails because buffer space is
exhausted (?):
--8<--
[root@penguin1 ~]# arp -s 192.168.65.67 00:A0:D1:E1:F3:2C
[root@penguin1 ~]# ping 192.168.65.67
PING 192.168.65.67 (192.168.65.67) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
--- 192.168.65.67 ping statistics ---
19 packets transmitted, 0 received, 100% packet loss, time 37012ms
-->8--
There is no hint of a malfunction in the kernel's message buffer.

best regards
Guenther

* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
From: Stephen Hemminger @ 2006-04-17 18:18 UTC
To: Guenther Thomsen; +Cc: John W. Linville, netdev

I don't know what you are doing different, but my 2 port SysKonnect
card is working fine. Running SMP AMD64 and 2.6.17 latest.

Showing full speed on both ports.

* sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
From: Guenther Thomsen @ 2006-04-26 0:06 UTC
To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> I don't know what you are doing different, but my 2 port SysKonnect
> card is working fine. Running SMP AMD64 and 2.6.17 latest.
>
> Showing full speed on both ports.
I missed that e-mail, sorry.

I just gave it another try, this time with 2.6.16.11. One port works
fine (so far; I only did very limited testing with ttcp). The second
port does obtain an IP address via DHCP, but the packets it receives
seem to be garbled:

--8<--
	0x0000:  0000 6175 6469 7428 3131 3435 3939 3430  ..audit(11459940
	0x0010:  3031 2e39 3738 3a33 3829 3a20 7573 6572  01.978:38):.user
	0x0020:  2070 6964 3d33 3230 3920 7569 643d       .pid=3209.uid=
12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
	0x0000:  0000 6175 6469 7428 3131 3435 3939 3436  ..audit(11459946
	0x0010:  3031 2e33 3639 3a34 3729 3a20 7573 6572  01.369:47):.user
	0x0020:  2070 6964 3d33 3239 3820 7569 643d       .pid=3298.uid=
12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
	0x0000:  0000 d675 0d00 0000 0000 0200 0000 0000  ...u............
	0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
	0x0020:  0000 ffff ffff 0000 0000 1300 0000       ..............
12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
[..]
13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
-->8--

On a different host connected to the same switch, traffic looks more
like this:

--8<--
2:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
	0x0000:  0001 1164 ee9b 0000 0000 0000 0000 0000  ...d............
	0x0010:  0000 0000 0000 0000 0000 0000 2f6b 8c87  ............/k..
	0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
-->8--

I also noticed that the interrupt count is very low (the interrupt count
shown in /proc/interrupts is much higher):

--8<--
[root@penguin1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D8
          inet addr:192.168.65.65  Bcast:192.168.65.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4680823977 (4.3 GiB)  TX bytes:4332319475 (4.0 GiB)
          Interrupt:169

eth1      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D9
          inet addr:192.168.64.199  Bcast:192.168.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:180137 (175.9 KiB)  TX bytes:1856 (1.8 KiB)
          Interrupt:169
-->8--

I then tried 2.6.17-rc2-git6. At first it looked OK: the second Ethernet
device was configured properly and I got some traffic through. Once I
started copying large files over NFS from a (very) fast NFS server,
though, traffic received by eth1 became corrupted again (some 5GB had
been copied successfully by then):

--8<--
[root@penguin1 ~]# tcpdump -n -i eth1 -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
	0x0000:  0000 6175 6469 7428 3131 3436 3030 3030  ..audit(11460000
	0x0010:  3032 2e31 3836 3a36 3329 3a20 7573 6572  02.186:63):.user
	0x0020:  2070 6964 3d33 3336 3120 7569 643d 3020  .pid=3361.uid=0.
	0x0030:  6175 6964 3d34 3239 3439 3637 3239 3520  auid=4294967295.
	0x0040:  6d73 673d 2750 414d 2073 6574 6372 6564  msg='PAM.setcred
	0x0050:  3a20 7573                                :.us
14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
-->8--

The "audit ... PAM setcred" string is interesting. It is most likely not
traffic from the network but text from the host's RAM. Did some pointer
get mangled?

I recompiled the kernel, this time with RHFC4's gcc32. The result is
similar (the second interface goes bad only after some data has been
copied over NFS):

--8<--
[root@penguin1 ~]# tcpdump -n -s 0 -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804

12 packets captured
12 packets received by filter
0 packets dropped by kernel
-->8--

No suspicious text and no zero-filled packets this time, only truncated
ones, but that's bad enough to stall NFS and cause heavy packet loss:

--8<--
64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms

--- 192.168.64.199 ping statistics ---
346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
-->8--

Considering the recent NFS changes, I tried to get the system into this
state using just ttcp. With some determination, three more hosts and
a few million packets, I succeeded. This time eth0 truncated packets
and traffic slowed to a crawl (~1 good packet every 2s).

Some progress has been made, but it's not quite solid yet.

best regards
Guenther
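Two clues in the report above can be checked mechanically. First, tcpdump's "truncated-ip - N bytes missing!" compares the IP total-length field with the IP bytes actually captured after the Ethernet header. For the 1460-byte NFS segment (IP total length 20 + 20 + 1460 = 1500) and the "icmp 64" echo request (IP total length 84), the reported shortfalls of 1454 and 38 bytes are both consistent with the driver handing up 60-byte frames; this is an inference from the numbers, not something confirmed in the thread. Second, the "audit(...)" payload is plain ASCII, which supports the guess that host memory rather than network data ended up in the receive buffer. The helpers below are hypothetical, and the 14-byte header assumes untagged Ethernet II frames.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ETH_HDR_LEN 14 /* assumption: untagged Ethernet II header */

/* Bytes tcpdump would report as missing: the IP total length minus the
 * IP bytes actually present after the Ethernet header of the frame. */
static int ip_bytes_missing(int ip_total_len, int frame_len)
{
	assert(frame_len >= ETH_HDR_LEN);
	int captured_ip = frame_len - ETH_HDR_LEN;

	return ip_total_len > captured_ip ? ip_total_len - captured_ip : 0;
}

/* Length of the longest run of printable ASCII bytes in buf. A long run
 * inside what should be a binary frame hints that host text (such as an
 * audit record) was copied into the receive buffer. */
static size_t longest_printable_run(const unsigned char *buf, size_t len)
{
	size_t best = 0, cur = 0;

	for (size_t i = 0; i < len; i++) {
		if (isprint(buf[i])) {
			if (++cur > best)
				best = cur;
		} else {
			cur = 0;
		}
	}
	return best;
}
```

Feeding the first 32 bytes of the first hex dump above into longest_printable_run() yields a 30-byte run ("audit(1145994001.978:38): user"; tcpdump prints the space as "."), which is hard to explain as Ethernet payload on a switch that otherwise carries ARP, STP and NTP.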

* Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
From: Stephen Hemminger @ 2006-04-26 16:44 UTC
To: Guenther Thomsen; +Cc: John W. Linville, netdev

On Tue, 25 Apr 2006 17:06:25 -0700
Guenther Thomsen <gthomsen@bluearc.com> wrote:

[..]
> Considering the recent NFS changes, I tried to get the system into this
> state using just ttcp. With some determination, three more hosts and
> a few million packets, I succeeded. This time eth0 truncated packets
> and traffic slowed to a crawl (~1 good packet every 2s).
>
> Some progress has been made, but it's not quite solid yet.
>

Are you saturating both ports on the card or only one?

* Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
From: Guenther Thomsen @ 2006-04-26 17:41 UTC
To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Wednesday 26 April 2006 09:44, Stephen Hemminger wrote:
> On Tue, 25 Apr 2006 17:06:25 -0700
>
> Guenther Thomsen <gthomsen@bluearc.com> wrote:
[..]
> > Considering the recent NFS changes, I tried to get the system into
> > this state using just ttcp. With some determination, three more
> > hosts and a few million packets, I succeeded. This time eth0
> > truncated packets and traffic slowed to a crawl (~1 good packet
> > every 2s).
> >
> > Some progress has been made, but it's not quite solid yet.
>
> Are you saturating both ports on the card or only one?

On the system under test I started four ttcp sessions: two senders and
two receivers (the second receiver on a non-standard port), one
sender/receiver pair for each port (device).

I'm not sure to what degree the device was saturated. It certainly
should have been, since the remote hosts are capable of line rate, but I
found the sending ttcp sessions on the system under test to be slow
while traffic was incoming.

best regards
Guenther
* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
2006-04-12 21:42 kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Guenther Thomsen
2006-04-12 21:48 ` Stephen Hemminger
@ 2006-05-16 19:11 ` Stephen Hemminger
1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-05-16 19:11 UTC
To: Guenther Thomsen; +Cc: John W. Linville, netdev

Could you try the 2.6.17-rc4 version with this patch? It turns out the
board seems to give out-of-order status responses.

Ignore the vendor sk98lin driver: when I try the stock version, it spends
its life resetting itself because it sets up the PCI bus wrong. If I fix
that, it spends its time getting confused because it can't handle
intermixed status reports properly (checksum et al. is per port, not per
board).

 drivers/net/sky2.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

792547bc5e8e4f7d5a1070a168056f429635c254
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index ffd267f..11e7914 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1020,8 +1020,27 @@ static int sky2_up(struct net_device *de
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
 	u32 ramsize, rxspace, imask;
-	int err = -ENOMEM;
+	int cap, err;
+	struct net_device *otherdev = hw->dev[sky2->port^1];
 
+	/*
+	 * Reduce split transactions (and turn off) rx checksums to
+	 * prevent problems with dual ports.
+	 */
+	if (otherdev && netif_running(otherdev) &&
+	    (cap = pci_find_capability(hw->pdev, PCI_CAP_ID_PCIX))) {
+		struct sky2_port *osky2 = netdev_priv(otherdev);
+		u16 cmd;
+
+		cmd = sky2_pci_read16(hw, cap + PCI_X_CMD);
+		cmd &= ~PCI_X_CMD_MAX_SPLIT;
+		sky2_pci_write16(hw, cap + PCI_X_CMD, cmd);
+
+		sky2->rx_csum = 0;
+		osky2->rx_csum = 0;
+	}
+
+	err = -ENOMEM;
 	if (netif_msg_ifup(sky2))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
@@ -3067,12 +3086,7 @@ static __devinit struct net_device *sky2
 	sky2->duplex = -1;
 	sky2->speed = -1;
 	sky2->advertising = sky2_supported_modes(hw);
-
-	/* Receive checksum disabled for Yukon XL
-	 * because of observed problems with incorrect
-	 * values when multiple packets are received in one interrupt
-	 */
-	sky2->rx_csum = (hw->chip_id != CHIP_ID_YUKON_XL);
+	sky2->rx_csum = 1;
 
 	spin_lock_init(&sky2->phy_lock);
 	sky2->tx_pending = TX_DEF_PENDING;
-- 
1.2.4
Thread overview: 8+ messages
2006-04-12 21:42 kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Guenther Thomsen
2006-04-12 21:48 ` Stephen Hemminger
2006-04-12 22:26 ` Guenther Thomsen
2006-04-17 18:18 ` Stephen Hemminger
2006-04-26  0:06 ` sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1) Guenther Thomsen
2006-04-26 16:44 ` Stephen Hemminger
2006-04-26 17:41 ` Guenther Thomsen
2006-05-16 19:11 ` kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Stephen Hemminger