From: Stephen Hemminger <shemminger@osdl.org>
To: Guenther Thomsen <gthomsen@bluearc.com>
Cc: "John W. Linville" <linville@redhat.com>, netdev@vger.kernel.org
Subject: Re: sky2 driver problems in 2.6.17-rc2-git6 (was: Re: kernel panic (on DHCP discover?) in sky2 driver of 2.6.17-rc1)
Date: Wed, 26 Apr 2006 09:44:45 -0700
Message-ID: <20060426094445.6b892761@localhost.localdomain>
In-Reply-To: <200604251706.25617.gthomsen@bluearc.com>

On Tue, 25 Apr 2006 17:06:25 -0700
Guenther Thomsen <gthomsen@bluearc.com> wrote:
> On Monday 17 April 2006 11:18, Stephen Hemminger wrote:
> > I don't know what you are doing different, but my 2 port SysKonnect
> > card is working fine. Running SMP AMD64 and 2.6.17 latest.
> >
> > Showing full speed on both ports.
> I missed that e-mail, sorry.
>
> I just gave it another try, this time with 2.6.16.11. One port works
> fine (so far, I just did very limited testing with ttcp). The second port
> does negotiate an IP address via DHCP, but the packets it receives
> seem to be garbled:
>
> --8<--
> 0x0000: 0000 6175 6469 7428 3131 3435 3939 3430 ..audit(11459940
> 0x0010: 3031 2e39 3738 3a33 3829 3a20 7573 6572 01.978:38):.user
> 0x0020: 2070 6964 3d33 3230 3920 7569 643d .pid=3209.uid=
> 12:56:23.725090 00:00:00:00:00:00 > 30:6e:6d:00:00:00 null I (s=32,r=55,P) len=42
> 12:56:24.603274 00:00:21:00:00:00 > 00:00:00:00:00:00 null disc/C len=43
> 12:56:26.619326 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:28.635346 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:29.734046 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:29.865239 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:30.651371 00:00:00:00:00:00 > a6:00:00:00:4d:04, ethertype Unknown (0xe20c), length 60:
> 0x0000: 0000 6175 6469 7428 3131 3435 3939 3436 ..audit(11459946
> 0x0010: 3031 2e33 3639 3a34 3729 3a20 7573 6572 01.369:47):.user
> 0x0020: 2070 6964 3d33 3239 3820 7569 643d .pid=3298.uid=
> 12:56:30.916718 00:00:f0:71:61:00 > 28:37:03:5b:3a:00 null I (s=16,r=0,C) len=42
> 12:56:30.923558 00:00:21:00:00:00 > 00:00:00:00:00:00 null rnr (r=55,C) len=42
> 12:56:32.667413 00:00:d0:2e:30:42 > 10:60:61:00:00:00, ethertype Unknown (0x572b), length 60:
> 0x0000: 0000 d675 0d00 0000 0000 0200 0000 0000 ...u............
> 0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 0x0020: 0000 ffff ffff 0000 0000 1300 0000 ..............
> 12:56:33.296384 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 12:56:33.303222 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> [..]
> 13:00:44.340062 00:00:00:00:00:00 > 5f:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:44.672350 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:44.868724 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:45.340123 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:46.340173 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:46.688433 IP truncated-ip - 1454 bytes missing! 192.168.65.66.40313 > 192.168.65.65.5001: . 1426488980:1426490428(1448) ack 1790562292 win 1460 <nop,nop,timestamp[|tcp]>
> 13:00:48.704431 00:00:21:00:00:00 > 00:00:00:00:00:00 null I (s=17,r=18,C) len=42
> 13:00:48.886426 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:50.720463 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:52.736496 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.752522 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.927556 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> 13:00:54.934394 00:00:00:00:00:00 > 00:00:00:00:00:00 null I (s=0,r=0,C) len=42
> -->8--
> On a different host connected to the same switch, traffic looks more like:
> --8<--
> 12:01:49.388992 IP 192.168.64.1.ntp > 255.255.255.255.ntp: NTPv3, Broadcast, length 48
> 12:01:50.176550 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:51.235034 arp reply 192.168.64.32 is-at 00:0a:49:00:5e:8a
> 12:01:51.241857 arp reply 192.168.64.33 is-at 00:0a:49:00:5e:8b
> 12:01:51.891193 00:00:01:02:c8:58 > 45:c0:00:1c:00:20, ethertype Unknown (0xe000), length 60:
> 0x0000: 0001 1164 ee9b 0000 0000 0000 0000 0000 ...d............
> 0x0010: 0000 0000 0000 0000 0000 0000 2f6b 8c87 ............/k..
> 0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
> 12:01:52.192552 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:52.801392 arp reply 192.168.64.34 is-at 00:0a:49:00:5e:8c
> 12:01:52.808240 arp reply 192.168.64.35 is-at 00:0a:49:00:5e:8d
> 12:01:54.208495 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:56.224453 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:01:58.240464 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 12:02:00.029320 arp reply 192.168.64.39 is-at 00:0a:49:00:5e:ff
> 12:02:00.256420 802.1d config 8000.00:a0:d1:e1:b4:78.8026 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> -->8--
>
> I noticed that the packet count on eth1 is very low too (the interrupt
> count as shown in /proc/interrupts is much higher):
> --8<--
> [root@penguin1 ~]# ifconfig
> eth0 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D8
> inet addr:192.168.65.65 Bcast:192.168.65.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4559786 errors:0 dropped:0 overruns:0 frame:0
> TX packets:4071967 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:4680823977 (4.3 GiB) TX bytes:4332319475 (4.0 GiB)
> Interrupt:169
>
> eth1 Link encap:Ethernet HWaddr 00:A0:D1:E1:F2:D9
> inet addr:192.168.64.199 Bcast:192.168.64.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:2193 errors:0 dropped:0 overruns:0 frame:0
> TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:180137 (175.9 KiB) TX bytes:1856 (1.8 KiB)
> Interrupt:169
> -->8--
>
> I then tried 2.6.17-rc2-git6. At first it looked OK, the second ethernet
> device was configured properly and I got some traffic through. Once
> I started copying large files (some 5GB were successfully copied) over
> NFS using a (very) fast NFS server though, traffic received by eth1 got
> corrupted again:
>
> --8<--
> [root@penguin1 ~]# tcpdump -n -i eth1 -s 0
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
> 14:23:14.049450 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:14.049519 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:14.745075 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:14.745082 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:14.852108 IP truncated-ip - 1454 bytes missing! 192.168.64.110.nfs > 192.168.64.199.1021: . 159991419:159992879(1460) ack 3444328765 win 64240
> 14:23:14.944489 00:00:00:00:00:00 > a3:00:00:00:50:04, ethertype Unknown (0x210d), length 98:
> 0x0000: 0000 6175 6469 7428 3131 3436 3030 3030 ..audit(11460000
> 0x0010: 3032 2e31 3836 3a36 3329 3a20 7573 6572 02.186:63):.user
> 0x0020: 2070 6964 3d33 3336 3120 7569 643d 3020 .pid=3361.uid=0.
> 0x0030: 6175 6964 3d34 3239 3439 3637 3239 3520 auid=4294967295.
> 0x0040: 6d73 673d 2750 414d 2073 6574 6372 6564 msg='PAM.setcred
> 0x0050: 3a20 7573 :.us
> 14:23:15.944703 arp who-has 192.168.64.253 tell 192.168.79.254
> 14:23:16.868291 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:16.868301 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:16.944907 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> 14:23:17.945113 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> 14:23:18.884430 arp who-has 192.168.64.199 tell 192.168.64.202
> 14:23:18.884441 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 14:23:18.945318 IP truncated-ip - 12 bytes missing! 192.168.64.101.netbios-ns > 192.168.64.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
> -->8--
>
> The ".audit ... PAM.setcred" string is interesting. This is most likely
> not traffic from the net, but text from the host's RAM. Did some
> pointer get mangled?
>
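The observation above can be checked mechanically by turning the hex column of the capture back into bytes and looking for long printable runs. A minimal Python sketch, assuming the hex lines have the layout shown in the captures (offset, up to eight 16-bit groups, then an ASCII column). The helper names payload_bytes and printable_runs are hypothetical; they are not part of tcpdump or the sky2 driver.

```python
import re

# Offset prefix of a tcpdump hex line, e.g. "0x0010:".
OFFSET = re.compile(r"0x[0-9a-fA-F]{4}:")
# One hex group: one or two bytes written as 2 or 4 hex digits.
HEX_GROUP = re.compile(r"^(?:[0-9a-fA-F]{2}){1,2}$")


def payload_bytes(dump_lines):
    """Rebuild the raw bytes from the hex column of a tcpdump hex dump.

    Assumption: at most eight groups per line. On a short final line an
    ASCII column that happens to look like hex would be misread.
    """
    data = bytearray()
    for line in dump_lines:
        m = OFFSET.search(line)
        if not m:
            continue  # not a hex line (timestamps, quoting, etc.)
        for token in line[m.end():].split()[:8]:
            if not HEX_GROUP.match(token):
                break  # reached the ASCII column
            data += bytes.fromhex(token)
    return bytes(data)


def printable_runs(data, min_len=6):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hex lines copied from the 14:23:14.944489 frame in the capture above.
SAMPLE = [
    "> 0x0000: 0000 6175 6469 7428 3131 3436 3030 3030 ..audit(11460000",
    "> 0x0010: 3032 2e31 3836 3a36 3329 3a20 7573 6572 02.186:63):.user",
    "> 0x0020: 2070 6964 3d33 3336 3120 7569 643d 3020 .pid=3361.uid=0.",
    "> 0x0030: 6175 6964 3d34 3239 3439 3637 3239 3520 auid=4294967295.",
    "> 0x0040: 6d73 673d 2750 414d 2073 6574 6372 6564 msg='PAM.setcred",
    "> 0x0050: 3a20 7573 :.us",
]

if __name__ == "__main__":
    for run in printable_runs(payload_bytes(SAMPLE)):
        print(repr(run))
```

A frame whose payload decodes to one long text run, as this one does, is more likely a stale or wrong buffer than something that arrived on the wire, though the sketch alone cannot prove where the bytes came from.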
> I recompiled the kernel, now with RHFC4's gcc32. The result is similar
> (only after some data was copied using NFS, the second interface goes
> bad):
> --8<--
> [root@penguin1 ~]# tcpdump -n -s 0 -i eth1
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
> 15:48:02.306927 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8801
> 15:48:02.316088 arp who-has 192.168.64.202 tell 192.168.64.199
> 15:48:02.316329 arp who-has 192.168.64.199 tell 192.168.79.254
> 15:48:02.316335 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 15:48:02.316338 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 15:48:03.307095 IP 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8802
> 15:48:03.307289 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8803
> 15:48:03.316166 arp who-has 192.168.64.202 tell 192.168.64.199
> 15:48:03.316397 arp who-has 192.168.64.199 tell 192.168.79.254
> 15:48:03.316401 arp reply 192.168.64.199 is-at 00:a0:d1:e1:f2:d9
> 15:48:03.316404 802.1d config 8000.00:a0:d1:e1:b4:78.8025 root 8000.00:a0:d1:e1:b4:78 pathcost 0 age 0 max 20 hello 2 fdelay 15
> 15:48:03.784698 IP truncated-ip - 38 bytes missing! 192.168.64.202 > 192.168.64.199: icmp 64: echo request seq 8804
>
> 12 packets captured
> 12 packets received by filter
> 0 packets dropped by kernel
> -->8--
> No suspect text and no zero filled packets, only truncated ones now,
> but that's bad enough to stop NFS and cause bad packet loss:
> --8<--
> 64 bytes from 192.168.64.199: icmp_seq=83 ttl=64 time=147073 ms
> 64 bytes from 192.168.64.199: icmp_seq=84 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=85 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=87 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=88 ttl=64 time=149073 ms
> 64 bytes from 192.168.64.199: icmp_seq=233 ttl=64 time=82023 ms
> 64 bytes from 192.168.64.199: icmp_seq=236 ttl=64 time=80018 ms
> 64 bytes from 192.168.64.199: icmp_seq=241 ttl=64 time=81018 ms
> 64 bytes from 192.168.64.199: icmp_seq=243 ttl=64 time=81018 ms
> 64 bytes from 192.168.64.199: icmp_seq=253 ttl=64 time=85018 ms
> 64 bytes from 192.168.64.199: icmp_seq=255 ttl=64 time=85018 ms
> 64 bytes from 192.168.64.199: icmp_seq=256 ttl=64 time=85629 ms
> 64 bytes from 192.168.64.199: icmp_seq=257 ttl=64 time=87023 ms
>
> --- 192.168.64.199 ping statistics ---
> 346 packets transmitted, 63 received, +3 errors, 81% packet loss, time 345136ms
> rtt min/avg/max/mdev = 80018.748/119940.275/149073.885/21090.211 ms, pipe 151
> -->8--
>
> Considering the recent NFS changes, I tried to get the system into this
> state using just ttcp. With some determination, three more hosts and
> a few million packets, I succeeded. This time eth0 truncated packets
> and traffic slowed to a crawl (~1 good packet every 2s).
>
> Some progress has been made, but it's not quite solid yet.
>
Are you saturating both ports on the card or only one?
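
One way to answer that is to sample the per-interface byte counters twice and compare the resulting rate with the link speed. A minimal Python sketch, assuming the usual /proc/net/dev column layout (receive bytes first, transmit bytes ninth) and a 1000 Mbit/s link. The link speed is an assumption; substitute whatever the ports actually negotiated. The function names are hypothetical.

```python
def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, rest = line.split(":", 1)
        fields = rest.split()
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        # Field 0 is receive bytes, field 8 is transmit bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_counters(path="/proc/net/dev"):
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def utilization(before, after, seconds, link_mbps=1000):
    """Return name -> (rx, tx) as a fraction of link_mbps over the interval.

    Counter wrap-around between the two samples is not handled.
    """
    result = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue
        rx0, tx0 = before[name]
        result[name] = (
            (rx1 - rx0) * 8 / seconds / 1e6 / link_mbps,
            (tx1 - tx0) * 8 / seconds / 1e6 / link_mbps,
        )
    return result
```

Calling read_counters() twice a few seconds apart while the test is running and passing both samples to utilization() shows whether eth0, eth1, or both are close to 1.0 in either direction.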