From: Stephen Hemminger <shemminger@osdl.org>
To: Guenther Thomsen <gthomsen@bluearc.com>
Cc: "John W. Linville" <linville@redhat.com>, netdev@vger.kernel.org
Subject: Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
Date: Wed, 26 Apr 2006 09:44:45 -0700
Message-ID: <20060426094445.6b892761@localhost.localdomain>
In-Reply-To: <200604251706.25617.gthomsen@bluearc.com>
On Tue, 25 Apr 2006 17:06:25 -0700
Guenther Thomsen <gthomsen@bluearc.com> wrote:
> On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> > I don't know what you are doing different, but my 2 port SysKonnect
> > card is working fine. Running SMP AMD64 and 2.6.17 latest.
> >
> > Showing full speed on both ports.
> I missed that e-mail, sorry.
>
> I just gave it another try, this time with 2.6.16.11 . One port works
> fine (so far, I just did very limited testing with ttcp). The second port
> does negotiate an IP address via DHCP, but the packets it receives
> seem to be garbled:
>
> --8<--
> 0x0000: 0000 6175 6469 7428 3131 3435 3939 3430 ..audit(11459940
> 0x0010: 3031 2e39 3738 3a33 3829 3a20 7573 6572 01.978:38):.user
> 0x0020: 2070 6964 3d33 3230 3920 7569 643d .pid=3209.uid=
> 12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
> 12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
> 12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
> 0x0000: 0000 6175 6469 7428 3131 3435 3939 3436 ..audit(11459946
> 0x0010: 3031 2e33 3639 3a34 3729 3a20 7573 6572 01.369:47):.user
> 0x0020: 2070 6964 3d33 3239 3820 7569 643d .pid=3298.uid=
> 12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
> 12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
> 12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
> 0x0000: 0000 d675 0d00 0000 0000 0200 0000 0000 ...u............
> 0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0x0020: 0000 ffff ffff 0000 0000 1300 0000 ..............
> 12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> [..]
> 13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
> 13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
> 13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> -->8--
> On a different host connected to the same switch, traffic looks more like:
> --8<--
> 2:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
> 12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
> 12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
> 12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
> 0x0000: 0001 1164 ee9b 0000 0000 0000 0000 0000 ...d............
> 0x0010: 0000 0000 0000 0000 0000 0000 2f6b 8c87 ............/k..
> 0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
> 12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
> 12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
> 12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
> 12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> -->8--
>
> I noticed that the interrupt count is very low too (the interrupt count
> as shown in /proc/interrupts is much higher):
> --8<--
> [root@penguin1 ~]# ifconfig
> eth0 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D8
> inet addr:192.168.65.65 Bcast:192.168.65.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
> TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:4680823977 (4.3 GiB) TX bytes:4332319475 (4.0 GiB)
> Interrupt:169
>
> eth1 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D9
> inet addr:192.168.64.199 Bcast:192.168.64.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
> TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:180137 (175.9 KiB) TX bytes:1856 (1.8 KiB)
> Interrupt:169
> -->8--
>
> I then tried 2.6.17-rc2-git6. At first it looked OK, the second ethernet
> device was configured properly and I got some traffic through. Once
> I started copying large files (some 5GB were successfully copied) over
> NFS using a (very) fast NFS server though, traffic received by eth1 got
> corrupted again:
>
> --8<--
> [root@penguin1 ~]# tcpdump -n -i eth1 -s 0
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
> 14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
> 14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
> 0x0000: 0000 6175 6469 7428 3131 3436 3030 3030 ..audit(11460000
> 0x0010: 3032 2e31 3836 3a36 3329 3a20 7573 6572 02.186:63):.user
> 0x0020: 2070 6964 3d33 3336 3120 7569 643d 3020 .pid=3361.uid=0.
> 0x0030: 6175 6964 3d34 3239 3439 3637 3239 3520 auid=4294967295.
> 0x0040: 6d73 673d 2750 414d 2073 6574 6372 6564 msg='PAM.setcred
> 0x0050: 3a20 7573 :.us
> 14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
> 14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> 14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> 14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> -->8--
>
> The ".audit ... PAM.setcred" string is interesting. This is most likely
> not traffic from the net, but text from the host's RAM. Did some
> pointer get mangled?
>
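To make the stray payload easier to read, here is a minimal sketch that turns the quoted hex dump of the 98-byte frame back into bytes. The helper name decode_hex_dump and the max_groups parameter are made up for illustration; the only assumption about the input is the usual tcpdump layout of an offset, up to eight 2-byte hex groups, then an ASCII column.

```python
import re

# Hex-dump lines copied from the quoted capture of the 98-byte frame.
DUMP = """\
> 0x0000: 0000 6175 6469 7428 3131 3436 3030 3030 ..audit(11460000
> 0x0010: 3032 2e31 3836 3a36 3329 3a20 7573 6572 02.186:63):.user
> 0x0020: 2070 6964 3d33 3336 3120 7569 643d 3020 .pid=3361.uid=0.
> 0x0030: 6175 6964 3d34 3239 3439 3637 3239 3520 auid=4294967295.
> 0x0040: 6d73 673d 2750 414d 2073 6574 6372 6564 msg='PAM.setcred
> 0x0050: 3a20 7573 :.us
"""

# One dump group is one or two bytes written as hex digits.
HEX_GROUP = re.compile(r"(?:[0-9a-fA-F]{2}){1,2}")


def decode_hex_dump(dump, max_groups=8):
    """Rebuild the raw bytes from tcpdump-style hex-dump lines.

    Hypothetical helper: it assumes at most `max_groups` hex groups per
    line and treats the first non-hex token as the start of the ASCII
    column. A short last line whose ASCII column happens to look like hex
    would be misread, which is not the case for the dump above.
    """
    payload = bytearray()
    for line in dump.splitlines():
        # Skip mail quoting and anything else before the offset marker.
        match = re.search(r"0x[0-9a-fA-F]{4}:\s+(.*)", line)
        if match is None:
            continue
        for token in match.group(1).split()[:max_groups]:
            if HEX_GROUP.fullmatch(token) is None:
                break
            payload += bytes.fromhex(token)
    return bytes(payload)


payload = decode_hex_dump(DUMP)
print(len(payload))
print(payload[2:].decode("ascii"))
```

After the two leading zero bytes the payload is plain ASCII with real spaces where the dump shows dots. It decodes to 84 bytes, which together with the 14-byte Ethernet header matches the reported frame length of 98.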
> I recompiled the kernel, now with RHFC4's gcc32. The result is similar
> (only after some data was copied using NFS, the second interface goes
> bad):
> --8<--
> [root@penguin1 ~]# tcpdump -n -s 0 -i eth1
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
> 15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
> 15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
> 15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
> 15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
> 15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
> 15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
> 15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
> 15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804
>
> 12 packets captured
> 12 packets received by filter
> 0 packets dropped by kernel
> -->8--
> No suspect text and no zero filled packets, only truncated ones now,
> but that's bad enough to stop NFS and cause bad packet loss:
> --8<--
> 64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
> 64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
> 64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
> 64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
> 64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
> 64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
> 64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
> 64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
> 64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms
>
> --- 192.168.64.199 ping statistics ---
> 346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
> rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
> -->8--
>
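For comparing runs, a small sketch that pulls the counters out of the ping summary line quoted above and recomputes the loss figure. The function name parse_ping_summary and the returned dictionary keys are hypothetical, and the floor rounding of the percentage is an assumption about how ping prints it.

```python
import re

# Summary line copied from the quoted ping output.
SUMMARY = ("346 packets transmitted, 63 received, +3 errors, "
           "81% packet loss, time 345136ms")

SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received"
    r"(?:, \+(\d+) errors)?, (\d+)% packet loss"
)


def parse_ping_summary(line):
    """Extract the counters from a ping summary line (hypothetical helper)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    sent = int(match.group(1))
    received = int(match.group(2))
    return {
        "sent": sent,
        "received": received,
        "errors": int(match.group(3) or 0),
        "reported_loss_pct": int(match.group(4)),
        # Assumption: the printed percentage is rounded down.
        "computed_loss_pct": (sent - received) * 100 // sent,
    }


summary = parse_ping_summary(SUMMARY)
print(summary)
```

Note that the replies that do arrive carry round-trip times of 80 to 149 seconds, so the loss percentage alone understates how badly the link is behaving.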
> Considering the recent NFS changes, I tried to get the system into this
> state using just ttcp. With some determination, three more hosts and
> a few million packets, I succeeded. This time eth0 truncated packets
> and traffic slowed to a crawl (~1 good packet every 2s).
>
> Some progress has been made, but it's not quite solid yet.
>
Are you saturating both ports on the card or only one?