kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1

netdev.vger.kernel.org archive mirror

* kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
@ 2006-04-12 21:42 Guenther Thomsen
  2006-04-12 21:48 ` Stephen Hemminger
  2006-05-16 19:11 ` kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Stephen Hemminger
  0 siblings, 2 replies; 8+ messages in thread
From: Guenther Thomsen @ 2006-04-12 21:42 UTC
  To: shemminger, John W. Linville; +Cc: netdev

I'm happy to report that the version of the sky2 driver in 2.6.17-rc1
yields line rate at low CPU utilization (as measured with ttcp).

Unfortunately, it's not quite bug-free yet ;-} 

When enabling the second interface (of the same network controller), the
kernel panics (perhaps during DHCP discovery?):

--8<--
[root@penguin1 ~]# ifup eth1

Determining IP information for eth1...Unable to handle kernel paging 
request at ffffc20000014000 RIP:
<ffffffff811a3329>{sky2_mac_init+522}
PGD 13fc49067 PUD 13fc4a067 PMD 13fc4b067 PTE 0
Oops: 0000 [1] SMP
CPU 3os linked in: autofs4 sr_mod cdrom dm_mod button usb_storage 
uhci_hcd 11BladeRunner_sk98lin #1
RIP: 0010:[<ffffffff811a3329>] <ffffffff811a68 RCX: 000000000000001e
RDX: 0000000000004008 RSI: ffffc20000010000 11 0000000000000fe0 R12: 
0000000000001000
R13: ffff81013fae61a8 R14:
                           S: 010 DS: 0000 ES: 0000 CR0: 
000000008005003b
CR2: ffffc20000014000fe40)
Stack: 0000000000001000 ffff81013fae6000 ffff81013fae6500 
00000f1013fae6000
Call Trace: <ffffffff811a3e0c>{sky2_up+334} <ffffffff81
       <ffffffff81144cf4>{sprintf+144} 
<ffffffff8123b616>{inet_ioctl{s_ioctl+44} 
<ffffffff81085703>{sys_ioctl+107}
       <ffffffff81009a2ac_init+522} RSP <ffff81013487fd28>
CR2: ffffc20000014000
 <0>Kerne
-->8--

or (2nd try): 

--8<--
[root@penguin1 ~]# Unable to handle kernel paging request at 
ffffc20000014000 RIP:
<ffffffff811a3329>{sky2_mac_init+522}
PGD 13fc49067 PUD 13fc4a067 PMD 13fc4b067 PTE 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: autofs4 sr_mod cdrom dm_mod button usb_storage 
uhci_hcd ehci_hcd e752x_edac edac_mc shpcR: 00:[<ffffffff811a3329>] 
<ffffffff811a3329>{sky2_mac_init+522}
RDX: 0000000000004008 RSI: ffffc20000010000 RDI: 0000000000000000
R1 0000000001000
R13: ffff81013f0511a8 R14: 0000000000000001 R15: 0000S0000 CR0: 
000000008005003b
CR2: ffffc20000014000 CR3: 000000013425b000000000001000 ffff81013f051000 
ffff81013f051500 0000000000000000
   Call Trace: <ffffffff811a3e0c>{sky2_up+334} 
<ffffffff811fea04>{dev_op844cf4>{sprintf+144} 
<ffffffff8123b616>{inet_ioctl+74}
       <fffffffff81085703>{ys_ioctl+107}
       <ffffffff81009ac8>{tracesys+209}
+} RSP <ffff81013534fd28>
CR2: ffffc20000014000
 <0>Kernel panic - n
-->8--

The kernel is vanilla 2.6.17-rc1, with the sky2 driver compiled into the
kernel. The OS is Red Hat Fedora Core 4. The kernel was compiled using
gcc32.

The system is a blade of a Penguin Computing BladeRunner 4130; it
contains two Xeon CPUs (HT enabled) and an on-board Marvell 8062 network
controller (88E8062 is stamped on the chip).

The hardware seems to work fine using 2.6.15(.7) with SysKonnect's
sk98lin driver version 8.31 (skd.de).

Please let me know if I can provide further information or assist in
any other way.

best regards
	Guenther


* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
  2006-04-12 21:42 kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Guenther Thomsen
@ 2006-04-12 21:48 ` Stephen Hemminger
  2006-04-12 22:26   ` Guenther Thomsen
  2006-05-16 19:11 ` kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Stephen Hemminger
  1 sibling, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2006-04-12 21:48 UTC
  To: Guenther Thomsen; +Cc: John W. Linville, netdev

You need this patch, which Jeff hasn't applied yet.
-----
Subject: sky2: crash when bringing up second port

Sky2 driver will oops referencing bad memory if used on
a dual port card.  The problem is accessing past end of
MIB counter space.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>


--- test-2.6.orig/drivers/net/sky2.c
+++ test-2.6/drivers/net/sky2.c
@@ -579,8 +579,8 @@ static void sky2_mac_init(struct sky2_hw
 	reg = gma_read16(hw, port, GM_PHY_ADDR);
 	gma_write16(hw, port, GM_PHY_ADDR, reg | GM_PAR_MIB_CLR);
 
-	for (i = 0; i < GM_MIB_CNT_SIZE; i++)
-		gma_read16(hw, port, GM_MIB_CNT_BASE + 8 * i);
+	for (i = GM_MIB_CNT_BASE; i <= GM_MIB_CNT_END; i += 4)
+		gma_read16(hw, port, i);
 	gma_write16(hw, port, GM_PHY_ADDR, reg);
 
 	/* transmit control */
--- test-2.6.orig/drivers/net/sky2.h
+++ test-2.6/drivers/net/sky2.h
@@ -1375,7 +1375,7 @@ enum {
 	GM_PHY_ADDR	= 0x0088,	/* 16 bit r/w	GPHY Address Register */
 /* MIB Counters */
 	GM_MIB_CNT_BASE	= 0x0100,	/* Base Address of MIB Counters */
-	GM_MIB_CNT_SIZE	= 256,
+	GM_MIB_CNT_END	= 0x025C,	/* Last MIB counter */
 };
 
 

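The arithmetic behind the fix can be checked with a small sketch. The
MIB constants come from the patch above; the GMAC block offsets
(BASE_GMAC_1, BASE_GMAC_2) and the 0x4000-byte register window are
assumptions, the latter inferred from the oops earlier in this thread,
where the faulting address (CR2) lies 0x4000 past the address in RSI.
The helper names are illustrative and are not part of the driver.

```c
#include <stdint.h>

/* Values marked "assumed" are not stated anywhere in this thread. */
enum {
	GM_MIB_CNT_BASE  = 0x0100,	/* from the patch */
	GM_MIB_CNT_END   = 0x025C,	/* from the patch */
	OLD_MIB_CNT_SIZE = 256,		/* old GM_MIB_CNT_SIZE, from the patch */
	BASE_GMAC_1      = 0x2800,	/* assumed: GMAC block of port 0 */
	BASE_GMAC_2      = 0x3800,	/* assumed: GMAC block of port 1 */
	REG_WINDOW       = 0x4000,	/* assumed: size of the mapped window */
};

/* Offset of a GMAC register inside the mapped window (assumed layout). */
static uint32_t gmac_reg(unsigned port, uint32_t reg)
{
	return BASE_GMAC_1 + port * (BASE_GMAC_2 - BASE_GMAC_1) + reg;
}

/*
 * First index i of the old loop whose read lands outside the window,
 * or -1 if every read stays inside it.
 */
static int old_loop_first_bad_index(unsigned port)
{
	for (int i = 0; i < OLD_MIB_CNT_SIZE; i++)
		if (gmac_reg(port, GM_MIB_CNT_BASE + 8 * i) >= REG_WINDOW)
			return i;
	return -1;
}

/* Highest window offset touched by the patched loop. */
static uint32_t new_loop_last_offset(unsigned port)
{
	uint32_t last = 0;

	for (uint32_t i = GM_MIB_CNT_BASE; i <= GM_MIB_CNT_END; i += 4)
		last = gmac_reg(port, i);
	return last;
}
```

Under these assumptions old_loop_first_bad_index(0) returns -1 and
old_loop_first_bad_index(1) returns 224, whose read falls at window
offset 0x4000. That would be consistent with the old loop faulting only
on the second port. The patched loop stops at 0x3A5C on port 1, well
inside the window.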


* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
  2006-04-12 21:48 ` Stephen Hemminger
@ 2006-04-12 22:26   ` Guenther Thomsen
  2006-04-17 18:18     ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Guenther Thomsen @ 2006-04-12 22:26 UTC
  To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Wednesday 12 April 2006 14:48, Stephen Hemminger wrote:
> You need this patch, which Jeff hasn't applied yet.
> -----
> Subject: sky2: crash when bringing up second port
>
> Sky2 driver will oops referencing bad memory if used on
> a dual port card.  The problem is accessing past end of
> MIB counter space.
>
> Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
>
>
> --- test-2.6.orig/drivers/net/sky2.c
> +++ test-2.6/drivers/net/sky2.c
> @@ -579,8 +579,8 @@ static void sky2_mac_init(struct sky2_hw
>  	reg = gma_read16(hw, port, GM_PHY_ADDR);
>  	gma_write16(hw, port, GM_PHY_ADDR, reg | GM_PAR_MIB_CLR);
>
> -	for (i = 0; i < GM_MIB_CNT_SIZE; i++)
> -		gma_read16(hw, port, GM_MIB_CNT_BASE + 8 * i);
> +	for (i = GM_MIB_CNT_BASE; i <= GM_MIB_CNT_END; i += 4)
> +		gma_read16(hw, port, i);
>  	gma_write16(hw, port, GM_PHY_ADDR, reg);
>
>  	/* transmit control */
> --- test-2.6.orig/drivers/net/sky2.h
> +++ test-2.6/drivers/net/sky2.h
> @@ -1375,7 +1375,7 @@ enum {
>  	GM_PHY_ADDR	= 0x0088,	/* 16 bit r/w	GPHY Address Register */
>  /* MIB Counters */
>  	GM_MIB_CNT_BASE	= 0x0100,	/* Base Address of MIB Counters */
> -	GM_MIB_CNT_SIZE	= 256,
> +	GM_MIB_CNT_END	= 0x025C,	/* Last MIB counter */
>  };

Thanks for the very quick response. The patch indeed prevents the panic
when bringing up the second interface, but now the host no longer
receives any packets. It still sends packets (ARP requests, naturally).
If I inject the Ethernet address of a second host into the ARP table of
the test subject, ICMP echo requests are sent, but then sendmsg's
buffer space is exhausted (?):
--8<--
[root@penguin1 ~]# arp -s 192.168.65.67 00:A0:D1:E1:F3:2C
[root@penguin1 ~]# ping 192.168.65.67
PING 192.168.65.67 (192.168.65.67) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

--- 192.168.65.67 ping statistics ---
19 packets transmitted, 0 received, 100% packet loss, time 37012ms
-->8--

There is no hint of a malfunction to be found in the kernel's message 
buffer.

best regards
	Guenther


* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
  2006-04-12 22:26   ` Guenther Thomsen
@ 2006-04-17 18:18     ` Stephen Hemminger
  2006-04-26  0:06       ` sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1) Guenther Thomsen
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2006-04-17 18:18 UTC
  To: Guenther Thomsen; +Cc: John W. Linville, netdev

I don't know what you are doing different, but my 2 port SysKonnect card
is working fine.  Running SMP AMD64 and 2.6.17 latest.

Showing full speed on both ports.


* sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
  2006-04-17 18:18     ` Stephen Hemminger
@ 2006-04-26  0:06       ` Guenther Thomsen
  2006-04-26 16:44         ` Stephen Hemminger
  0 siblings, 1 reply; 8+ messages in thread
From: Guenther Thomsen @ 2006-04-26  0:06 UTC
  To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> I don't know what you are doing different, but my 2 port SysKonnect
> card is working fine.  Running SMP AMD64 and 2.6.17 latest.
>
> Showing full speed on both ports.
I missed that e-mail, sorry.

I just gave it another try, this time with 2.6.16.11. One port works
fine (so far, I just did very limited testing with ttcp). The second port
does negotiate an IP address via DHCP, but the packets it receives
seem to be garbled:

--8<--
       0x0000:  0000 6175 6469 7428 3131 3435 3939 3430  ..audit(11459940
        0x0010:  3031 2e39 3738 3a33 3829 3a20 7573 6572  01.978:38):.user
        0x0020:  2070 6964 3d33 3230 3920 7569 643d       .pid=3209.uid=
12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
        0x0000:  0000 6175 6469 7428 3131 3435 3939 3436  ..audit(11459946
        0x0010:  3031 2e33 3639 3a34 3729 3a20 7573 6572  01.369:47):.user
        0x0020:  2070 6964 3d33 3239 3820 7569 643d       .pid=3298.uid=
12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
        0x0000:  0000 d675 0d00 0000 0000 0200 0000 0000  ...u............
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 ffff ffff 0000 0000 1300 0000       ..............
12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
[..]
13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
-->8--
On a different host connected to the same switch, traffic looks more like:
--8<--
12:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
        0x0000:  0001 1164 ee9b 0000 0000 0000 0000 0000  ...d............
        0x0010:  0000 0000 0000 0000 0000 0000 2f6b 8c87  ............/k..
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
-->8--

I noticed that the interrupt count is very low too (the interrupt count
as shown in /proc/interrupts is much higher):
--8<--
[root@penguin1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D8
          inet addr:192.168.65.65  Bcast:192.168.65.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4680823977 (4.3 GiB)  TX bytes:4332319475 (4.0 GiB)
          Interrupt:169

eth1      Link encap:Ethernet  HWaddr 00:A0:D1:E1:F2:D9
          inet addr:192.168.64.199  Bcast:192.168.64.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
          TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:180137 (175.9 KiB)  TX bytes:1856 (1.8 KiB)
          Interrupt:169
-->8--

I then tried 2.6.17-rc2-git6. At first it looked OK, the second ethernet 
device was configured properly and I got some traffic through. Once 
I started copying large files (some 5GB were successfully copied) over 
NFS using a (very) fast NFS server though, traffic received by eth1 got
corrupted again:

--8<--
 [root@penguin1 ~]# tcpdump -n -i eth1 -s 0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
        0x0000:  0000 6175 6469 7428 3131 3436 3030 3030  ..audit(11460000
        0x0010:  3032 2e31 3836 3a36 3329 3a20 7573 6572  02.186:63):.user
        0x0020:  2070 6964 3d33 3336 3120 7569 643d 3020  .pid=3361.uid=0.
        0x0030:  6175 6964 3d34 3239 3439 3637 3239 3520  auid=4294967295.
        0x0040:  6d73 673d 2750 414d 2073 6574 6372 6564  msg='PAM.setcred
        0x0050:  3a20 7573                                :.us
14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
-->8--

The ".audit ... PAM.setcred" string is interesting. This is most likely
not traffic from the net, but text from the host's RAM. Did some
pointer get mangled?
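
The hex column of that dump decodes to plain ASCII, which is why it
reads like an audit log record rather than network traffic. A minimal
sketch that reproduces the ASCII column from the hex words follows.
The helper names are illustrative, and printing a space as '.' is
chosen only to match what the dump above shows.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode a run of hex digits (spaces between groups are skipped) into
 * out as a NUL-terminated string. Bytes outside the visible ASCII
 * range, including space, are written as '.'. Decoding stops at the
 * first character that is neither a space nor part of a hex pair.
 * Returns the number of bytes decoded.
 */
static size_t hex_to_ascii(const char *hex, char *out, size_t outlen)
{
	size_t n = 0;

	if (outlen == 0)
		return 0;
	while (n + 1 < outlen) {
		int hi, lo;
		unsigned char b;

		while (*hex == ' ')
			hex++;
		hi = hex_val(hex[0]);
		if (hi < 0)
			break;
		lo = hex_val(hex[1]);
		if (lo < 0)
			break;
		b = (unsigned char)(hi * 16 + lo);
		out[n++] = (b > 0x20 && b < 0x7f) ? (char)b : '.';
		hex += 2;
	}
	out[n] = '\0';
	return n;
}
```

Feeding it the hex words of the line at offset 0x0040,
"6d73 673d 2750 414d 2073 6574 6372 6564", yields the 16 characters
msg='PAM.setcred as in the dump.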
 
I recompiled the kernel, now with RHFC4's gcc32. The result is similar
(the second interface goes bad only after some data has been copied
using NFS):
--8<--
[root@penguin1 ~]# tcpdump -n -s 0 -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804

12 packets captured
12 packets received by filter
0 packets dropped by kernel
-->8--
No suspect text and no zero-filled packets, only truncated ones now,
but that's bad enough to stop NFS and cause bad packet loss:
--8<--
64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms

--- 192.168.64.199 ping statistics ---
346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
-->8--

Considering the recent NFS changes, I tried to get the system into this
state using just ttcp. With some determination, three more hosts and 
a few million packets, I succeeded. This time eth0 truncated packets
and traffic slowed to a crawl (~1 good packet every 2s).

Some progress has been made, but it's not quite solid yet.

best regards
	Guenther


* Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
  2006-04-26  0:06       ` sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1) Guenther Thomsen
@ 2006-04-26 16:44         ` Stephen Hemminger
  2006-04-26 17:41           ` Guenther Thomsen
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2006-04-26 16:44 UTC
  To: Guenther Thomsen; +Cc: John W. Linville, netdev

On Tue, 25 Apr 2006 17:06:25 -0700
Guenther Thomsen <gthomsen@bluearc.com> wrote:

> On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> > I don't know what you are doing different, but my 2 port SysKonnect
> > card is working fine.  Running SMP AMD64 and 2.6.17 latest.
> >
> > Showing full speed on both ports.
> I missed that e-mail, sorry.
> [..]
> Considering the recent NFS changes, I tried to get the system into this
> state using just ttcp. With some determination, three more hosts and 
> a few million packets, I succeeded. This time eth0 truncated packets
> and traffic slowed to a crawl (~1 good packet every 2s).
> 
> Some progress has been made, but it's not quite solid yet.
> 

Are you saturating both ports on the card or only one?


* Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
  2006-04-26 16:44         ` Stephen Hemminger
@ 2006-04-26 17:41           ` Guenther Thomsen
  0 siblings, 0 replies; 8+ messages in thread
From: Guenther Thomsen @ 2006-04-26 17:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: John W. Linville, netdev

On Wednesday 26 April 2006 09:44, Stephen Hemminger wrote:
> On Tue, 25 Apr 2006 17:06:25 -0700
>
> Guenther Thomsen <gthomsen@bluearc.com> wrote:
[..]
> > Considering the recent NFS changes, I tried to get the system into
> > this state using just ttcp. With some determination, three more
> > hosts and a few million packets, I succeeded. This time eth0
> > truncated packets and traffic slowed to a crawl (~1 good packet
> > every 2s).
> >
> > Some progress has been made, but it's not quite solid yet.
>
> Are you saturating both ports on the card or only one?

On the system under test I started four ttcp sessions: two senders and 
two receivers (the second one on a non-standard port), one pair for 
each port (device). I'm not sure to what degree the device was 
saturated. It certainly should have been, since the remote hosts are 
capable of line rate, but I found the sending ttcp sessions on the 
system under test to be slow as long as traffic was incoming.

best regards
	Guenther


* Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1
  2006-04-12 21:42 kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Guenther Thomsen
  2006-04-12 21:48 ` Stephen Hemminger
@ 2006-05-16 19:11 ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2006-05-16 19:11 UTC (permalink / raw)
  To: Guenther Thomsen; +Cc: John W. Linville, netdev

Could you try the 2.6.17-rc4 version with this patch? It turns out the board
seems to give out-of-order status responses.

Ignore the vendor sk98lin driver. When I try the stock version, it spends its
life resetting itself because it sets up the PCI bus wrong. If I fix that, it
spends its time getting confused because it can't handle intermixed status
reports properly (checksum et al. is per port, not per board).


 drivers/net/sky2.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

792547bc5e8e4f7d5a1070a168056f429635c254
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index ffd267f..11e7914 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1020,8 +1020,27 @@ static int sky2_up(struct net_device *de
 	struct sky2_hw *hw = sky2->hw;
 	unsigned port = sky2->port;
 	u32 ramsize, rxspace, imask;
-	int err = -ENOMEM;
+	int cap, err;
+	struct net_device *otherdev = hw->dev[sky2->port^1];
 
+	/*
+	 * Reduce split transactions (and turn off) rx checksums to
+	 * prevent problems with dual ports.
+	 */
+	if (otherdev && netif_running(otherdev) &&
+	    (cap = pci_find_capability(hw->pdev, PCI_CAP_ID_PCIX))) {
+		struct sky2_port *osky2 = netdev_priv(otherdev);
+		u16 cmd;
+
+		cmd = sky2_pci_read16(hw, cap + PCI_X_CMD);
+		cmd &= ~PCI_X_CMD_MAX_SPLIT;
+		sky2_pci_write16(hw, cap + PCI_X_CMD, cmd);
+
+		sky2->rx_csum = 0;
+		osky2->rx_csum = 0;
+	}
+
+	err = -ENOMEM;
 	if (netif_msg_ifup(sky2))
 		printk(KERN_INFO PFX "%s: enabling interface\n", dev->name);
 
@@ -3067,12 +3086,7 @@ static __devinit struct net_device *sky2
 	sky2->duplex = -1;
 	sky2->speed = -1;
 	sky2->advertising = sky2_supported_modes(hw);
-
-	/* Receive checksum disabled for Yukon XL
-	 * because of observed problems with incorrect
-	 * values when multiple packets are received in one interrupt
-	 */
-	sky2->rx_csum = (hw->chip_id != CHIP_ID_YUKON_XL);
+	sky2->rx_csum = 1;
 
 	spin_lock_init(&sky2->phy_lock);
 	sky2->tx_pending = TX_DEF_PENDING;
-- 
1.2.4



Thread overview: 8+ messages
2006-04-12 21:42 kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Guenther Thomsen
2006-04-12 21:48 ` Stephen Hemminger
2006-04-12 22:26   ` Guenther Thomsen
2006-04-17 18:18     ` Stephen Hemminger
2006-04-26  0:06       ` sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1) Guenther Thomsen
2006-04-26 16:44         ` Stephen Hemminger
2006-04-26 17:41           ` Guenther Thomsen
2006-05-16 19:11 ` kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1 Stephen Hemminger
