netdev.vger.kernel.org archive mirror
* net-2.6.22 UDP stalls/hangs
@ 2007-04-23 20:07 Andrew Morton
  2007-04-23 20:18 ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 20:07 UTC
  To: David S. Miller; +Cc: netdev


I have a problem here.  To eliminate other -mm things I tested bare
git+ssh://master.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.22.git
as of 15 minutes ago and the problem is there too.

The machine is x86_64 running FC6.  The config is based on RH's own FC6
config and it's at http://userweb.kernel.org/~akpm/config-akpm2.txt

Symptoms are that networking-related initscripts take a looooong time. 
statd and cups take maybe a minute and when it gets to starting sendmail,
things appear to hang permanently - I hit the switch after a few minutes.

A sysrq-T was taken during the statd bringup stall:
http://userweb.kernel.org/~akpm/dmesg-akpm2.txt

The interesting bit is:


Apr 23 12:01:15 akpm2 kernel: rpc.statd     S 0000001f2b1f297b     0  3479   3478 (NOTLB)
Apr 23 12:01:15 akpm2 kernel:  ffff81024ef2fb28 0000000000000082 0000000000000000 00000009000000c6
Apr 23 12:01:15 akpm2 kernel:  0000000000000246 000000000000004c ffff81025eba8040 ffff81025fe08100
Apr 23 12:01:15 akpm2 kernel:  ffff81025eba8258 000000075e936000 00000000ffff29e4 0000000000000286
Apr 23 12:01:15 akpm2 kernel: Call Trace:
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8048c513>] udp_poll+0x0/0x104
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8025e759>] schedule_timeout+0x8a/0xad
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8028d7a3>] process_timeout+0x0/0x5
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8022ed7b>] do_sys_poll+0x27a/0x35c
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8021e6a6>] __pollwait+0x0/0xdd
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff80284486>] default_wake_function+0x0/0xe
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff80333c56>] socket_has_perm+0x5b/0x68
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff802510ea>] sock_sendmsg+0xea/0x107
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8048eacb>] arp_bind_neighbour+0x6b/0x9f
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8029696e>] autoremove_wake_function+0x0/0x2e
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff80254bf4>] __ip_route_output_key+0x709/0x7c4
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff804565a5>] sys_sendto+0x128/0x151
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff80331502>] file_has_perm+0x48/0xa3
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff80249319>] sys_poll+0x32/0x3b
Apr 23 12:01:15 akpm2 kernel:  [<ffffffff8025911e>] system_call+0x7e/0x83
Apr 23 12:01:15 akpm2 kernel: 
Apr 23 12:01:51 akpm2 rpc.statd[3479]: gethostbyname error for akpm2.corp.google.com

I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
didn't have time to investigate further.  So it's not some recent thing.

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 20:07 net-2.6.22 UDP stalls/hangs Andrew Morton
@ 2007-04-23 20:18 ` David Miller
  2007-04-23 20:27   ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2007-04-23 20:18 UTC
  To: akpm; +Cc: netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 23 Apr 2007 13:07:34 -0700

> The interesting bit is:
 ...
> I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> didn't have time to investigate further.  So it's not some recent thing.

My initial reaction is that DNS responses are being lost or dropped
for some reason.

Please run tcpdump on the network while this machine comes up and
see if DNS requests and replies are going back and forth
correctly.

Also, please indicate what kind of network card is in this computer.

Thanks.

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 20:18 ` David Miller
@ 2007-04-23 20:27   ` Andrew Morton
  2007-04-23 20:37     ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 20:27 UTC
  To: David Miller; +Cc: netdev

On Mon, 23 Apr 2007 13:18:10 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 23 Apr 2007 13:07:34 -0700
> 
> > The interesting bit is:
>  ...
> > I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> > didn't have time to investigate further.  So it's not some recent thing.
> 
> My initial reaction is that DNS responses are being lost or dropped
> for some reason.

Plausible.   I'll try booting it with the ethernet unplugged.

> Please run tcpdump on the network while this machine comes up and
> see if DNS requests and replies are going back and forth
> correctly.

hm.  That would require a hub and rather a lot of head-scratching.

> Also, please indicate what kind of network card is in this computer.

e1000:

05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)


* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 20:27   ` Andrew Morton
@ 2007-04-23 20:37     ` David Miller
  2007-04-23 20:56       ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2007-04-23 20:37 UTC
  To: akpm; +Cc: netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 23 Apr 2007 13:27:19 -0700

> On Mon, 23 Apr 2007 13:18:10 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Mon, 23 Apr 2007 13:07:34 -0700
> > 
> > > The interesting bit is:
> >  ...
> > > I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> > > didn't have time to investigate further.  So it's not some recent thing.
> > 
> > My initial reaction is that DNS responses are being lost or dropped
> > for some reason.
> 
> Plausible.   I'll try booting it with the ethernet unplugged.

That won't test the same scenario.

If the network cable is unplugged, ARP responses won't arrive and
therefore sendmsg() calls will return with a host unreachable error.

The situation you need to recreate is specifically UDP packets getting
dropped.

The reason I wanted the tcpdump trace is so that we can see whether
the problem is UDP packets going out or going in which are being
mangled/dropped.

You don't need a hub to get a dump.  Instead you can run a caching
named on some other system, configure your FC6 box to use that system
for DNS via /etc/resolv.conf, then run tcpdump on the caching named
machine.

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 20:37     ` David Miller
@ 2007-04-23 20:56       ` Andrew Morton
  2007-04-23 21:17         ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 20:56 UTC
  To: David Miller; +Cc: netdev

On Mon, 23 Apr 2007 13:37:30 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 23 Apr 2007 13:27:19 -0700
> 
> > On Mon, 23 Apr 2007 13:18:10 -0700 (PDT)
> > David Miller <davem@davemloft.net> wrote:
> > 
> > > From: Andrew Morton <akpm@linux-foundation.org>
> > > Date: Mon, 23 Apr 2007 13:07:34 -0700
> > > 
> > > > The interesting bit is:
> > >  ...
> > > > I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> > > > didn't have time to investigate further.  So it's not some recent thing.
> > > 
> > > My initial reaction is that DNS responses are being lost or dropped
> > > for some reason.
> > 
> > Plausible.   I'll try booting it with the ethernet unplugged.
> 
> That won't test the same scenario.
> 
> If the network cable is unplugged, ARP responses won't arrive and
> therefore sendmsg() calls will return with a host unreachable error.
> 
> The situation you need to recreate is specifically UDP packets getting
> dropped.
> 
> The reason I wanted the tcpdump trace is so that we can see whether
> the problem is UDP packets going out or going in which are being
> mangled/dropped.
> 
> You don't need a hub to get a dump.  Instead you can run a caching
> named on some other system, configure your FC6 box to use that system
> for DNS via /etc/resolv.conf, then run tcpdump on the caching named
> machine.

hm, fancy.



I unplugged the cable and the machine booted normally.  Lots of commands
were hanging when I plugged it back in.

I plugged the cable back in and on one console ran

	tcpdump -l -i eth0

but of course tcpdump didn't do anything because it wants to do reverse
lookups.  But interestingly, tcpdump was taking maybe 15 seconds to respond
to ^c and to killall.  tcpdump was stuck in udp_poll(), like statd was. 
But I think it's significant that we're not taking signals while in that
interruptible sleep.

I am able to ping the test machine from another host on the same network.

On the test machine I used `tcpdump -l -n -i eth0' and on another vt, ran
`ping www.google.com'.  The test machine is 172.18.116.155

13:40:51.120004 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:51.489171 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:40:52.567615 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:53.489201 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:40:53.755655 arp who-has 172.18.119.254 tell 172.18.116.155
13:40:53.755991 arp reply 172.18.119.254 is-at 00:00:0c:07:ac:01
13:40:53.755997 IP 172.18.116.155.32806 > 172.24.0.7.domain:  42807+ A? www.google.com. (32)
13:40:53.991979 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:55.435664 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:55.514942 IP 172.18.116.45.netbios-dgm > 172.18.119.255.netbios-dgm: NBT UDP PACKET(138)
13:40:55.710092 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:40:56.463086 arp who-has 172.18.119.254 tell 172.18.116.45
13:40:56.856033 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:57.709673 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:40:58.331717 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:58.751949 IP 172.18.116.155.32807 > 172.25.146.107.domain:  42807+ A? www.google.com. (32)
13:40:59.276068 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-unknown (3) 16: state=initial group=2 [|hsrp]
13:40:59.709703 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:40:59.716492 IP 172.18.119.178.netbios-dgm > 172.18.119.255.netbios-dgm: NBT UDP PACKET(138)
13:40:59.814742 arp who-has 172.18.119.254 tell 172.18.116.206
13:40:59.844096 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:41:01.215791 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:41:01.709583 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
13:41:01.751918 IP 172.18.116.199.ipp > 172.18.119.255.ipp: UDP, length 124
13:41:02.776596 arp who-has 172.18.119.254 tell 172.18.117.227
13:41:02.836204 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:41:03.709613 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 

so it looks like we tried to send the query but we didn't see anything come back.
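
The by-eye check above (query goes out, no reply comes back) can also be
done mechanically.  What follows is a minimal sketch, not a tool used in
this thread.  It assumes the `tcpdump -n' text format shown in the trace,
and LINE_RE and unanswered_dns_queries are hypothetical names.  It pairs
each DNS query with a reply that carries the same query ID back to the
same port, and returns the queries that never got one.

```python
import re

# "HH:MM:SS.us IP src.port > dst.port:  <id>..." as printed by tcpdump -n.
# Only lines whose payload starts with a numeric DNS query ID will match.
LINE_RE = re.compile(r"^\S+ IP (\S+)\.(\w+) > (\S+)\.(\w+): +(\d+)")


def unanswered_dns_queries(lines):
    """Return the query lines for which no matching reply was captured."""
    pending = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # HSRP, STP, ARP, TCP and so on
        src, sport, dst, dport, qid = m.groups()
        if dport == "domain":
            # Outgoing query: remember it until a reply shows up.
            pending[(src, sport, dst, qid)] = line
        elif sport == "domain":
            # Reply: drop the query it answers, if we saw one.
            pending.pop((dst, dport, src, qid), None)
    return list(pending.values())


# Three lines copied from the trace above.
SAMPLE = [
    "13:40:53.755997 IP 172.18.116.155.32806 > 172.24.0.7.domain:  42807+ A? www.google.com. (32)",
    "13:40:53.991979 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254",
    "13:40:58.751949 IP 172.18.116.155.32807 > 172.25.146.107.domain:  42807+ A? www.google.com. (32)",
]

if __name__ == "__main__":
    for query in unanswered_dns_queries(SAMPLE):
        print("no reply seen for:", query)
```

Fed the sample lines, both queries come back as unanswered, which matches
the reading of the trace.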



Which means I need to do the caching named thing.  I tried (using RH's fc6
kernel), but it doesn't work.  Help?

On 172.18.116.160 I'm running

root      7375  0.0  0.0  75496   500 ?        Ssl  Jan22   0:00 /usr/sbin/nscd-2.3.2 -f /etc/nscd-2.3.2.conf

and on the test machine I put

nameserver 172.18.116.160

into /etc/resolv.conf.

Is nscd the caching named which you're referring to?

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 20:56       ` Andrew Morton
@ 2007-04-23 21:17         ` David Miller
  2007-04-23 21:45           ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2007-04-23 21:17 UTC
  To: akpm; +Cc: netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 23 Apr 2007 13:56:39 -0700

> On Mon, 23 Apr 2007 13:37:30 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Mon, 23 Apr 2007 13:27:19 -0700
> > 
> > > On Mon, 23 Apr 2007 13:18:10 -0700 (PDT)
> > > David Miller <davem@davemloft.net> wrote:
> > > 
> > > > From: Andrew Morton <akpm@linux-foundation.org>
> > > > Date: Mon, 23 Apr 2007 13:07:34 -0700
> > > > 
> > > > > The interesting bit is:
> > > >  ...
> > > > > I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> > > > > didn't have time to investigate further.  So it's not some recent thing.
> > > > 
> > > > My initial reaction is that DNS responses are being lost or dropped
> > > > for some reason.
> > > 
> > > Plausible.   I'll try booting it with the ethernet unplugged.
> > 
> > That won't test the same scenario.
> > 
> > If the network cable is unplugged, ARP responses won't arrive and
> > therefore sendmsg() calls will return with a host unreachable error.
> > 
> > The situation you need to recreate is specifically UDP packets getting
> > dropped.
> > 
> > The reason I wanted the tcpdump trace is so that we can see whether
> > the problem is UDP packets going out or going in which are being
> > mangled/dropped.
> > 
> > You don't need a hub to get a dump.  Instead you can run a caching
> > named on some other system, configure your FC6 box to use that system
> > for DNS via /etc/resolv.conf, then run tcpdump on the caching named
> > machine.
> 
> hm, fancy.
> 
> 
> 
> I unplugged the cable and the machine booted normally.  Lots of commands
> were hanging when I plugged it back in.
> 
> I plugged the cable back in and on one console ran
> 
> 	tcpdump -l -i eth0
> 
> but of course tcpdump didn't do anything because it wants to do reverse
> lookups.  But interestingly, tcpdump was taking maybe 15 seconds to respond
> to ^c and to killall.  tcpdump was stuck in udp_poll(), like statd was. 
> But I think it's significant that we're not taking signals while in that
> interruptible sleep.
> 
> I am able to ping the test machine from another host on the same network.
> 
> On the test machine I used `tcpdump -l -n -i eth0' and on another vt, ran
> `ping www.google.com'.  The test machine is 172.18.116.155
> 
> 13:40:53.755997 IP 172.18.116.155.32806 > 172.24.0.7.domain:  42807+ A? www.google.com. (32)

...

no reply from 172.24.0.7

> 13:40:58.751949 IP 172.18.116.155.32807 > 172.25.146.107.domain:  42807+ A? www.google.com. (32)

...

no reply from 172.25.146.107

> so it looks like we tried to send the query but we didn't see anything come back.

Right.
> 
> Is nscd the caching named which you're referring to?

I would respond, but I first checked how many responses show up when
giving "caching named fedora" to google, and decided that you can
figure it out yourself :-)

More seriously, you need to install the "caching-nameserver" package
it appears, on Fedora.

nscd is not named, nscd is a part of glibc

named is part of the 'bind' package, you know, the standard DNS daemon
implementation for the past say 15 years or so... 

Apparently this 'caching-nameserver' package will bring in 'bind' plus
some configuration files that will give you a caching nameserver
setup.

You might have to tweak things for bind to allow non-local
connections.  On the machine where you install 'caching-nameserver'
use 127.0.0.1 in /etc/resolv.conf and make sure DNS lookups work, then
you can test on the FC6 system by using the other system's IP
address.

And that's enough sysadmin FAQ'age for me for one day... :-/

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 21:17         ` David Miller
@ 2007-04-23 21:45           ` Andrew Morton
  2007-04-23 22:12             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 21:45 UTC
  To: David Miller; +Cc: netdev

On Mon, 23 Apr 2007 14:17:06 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> > Is nscd the caching named which you're referring to?
> 
> I would respond, but I first checked how many responses show up when
> giving "caching named fedora" to google, and decided that you can
> figure it out yourself :-)
> 
> More seriously, you need to install the "caching-nameserver" package
> it appears, on Fedora.

The machine on which I'd need to run the caching nameserver is in fact
running hacked-about Ubuntu.  It's on the corporate network so I risk being
chased by angry people with pointy sticks, and `apt-cache search nameserver'
and `apt-cache search caching' don't come up with anything useful.

Sigh.  Looks like I'll need to drag in the hub from home tomorrow.

Or git-bisect.  Seems that rc7-mm1 isn't getting any closer.  I guess I'll
need to drop git-net and git-wireless and a bunch of other stuff for now so
I can get in and diagnose all the other bugs.


Let me play around with udpspam a bit.  It doesn't _have_ to be the
resolver.  Are there any simple UDP-based client/server test apps around?
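
On the question of a simple UDP client/server test: the following is a
minimal sketch using only the Python standard library.  It is not udpspam
or netperf; the function names and the 2 second default timeout are
arbitrary choices made for illustration.  The server echoes datagrams
back, and the client reports None if no echo arrives in time.

```python
import socket
import threading


def serve_echo(sock, count):
    """Echo `count` datagrams back to whoever sent them, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def udp_round_trip(server_addr, payload, timeout=2.0):
    """Send one datagram and wait for the echo; None means it timed out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(payload, server_addr)
        try:
            data, _ = client.recvfrom(2048)
        except socket.timeout:
            return None
        return data


def loopback_demo():
    """Run server and client in one process over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    worker = threading.Thread(target=serve_echo, args=(server, 1), daemon=True)
    worker.start()
    try:
        return udp_round_trip(server.getsockname(), b"ping")
    finally:
        worker.join(timeout=1.0)
        server.close()
```

Across two machines, serve_echo would run on the test machine (bound to
its own address and a fixed port) and udp_round_trip on the other host.
A None result there, with the request visible in tcpdump on the test
machine, would point at the reply path.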

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 21:45           ` Andrew Morton
@ 2007-04-23 22:12             ` Andrew Morton
  2007-04-23 22:15               ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 22:12 UTC
  To: David Miller, netdev

On Mon, 23 Apr 2007 14:45:57 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> Let me play around with udpspam a bit.

tcpdump does show stuff coming in when I run udpspam against the test
machine from another host.

More rtnl weirdness.  Running `ifup eth0' gave me:


Apr 23 14:53:57 localhost smartd[4051]: smartd has fork()ed into background mode. New PID=4051. 
Apr 23 14:56:47 localhost kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Apr 23 14:56:47 localhost kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Apr 23 14:56:47 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Apr 23 14:56:48 localhost dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Apr 23 14:56:48 localhost dhclient: DHCPACK from 172.18.119.253
Apr 23 14:56:48 localhost avahi-daemon[3971]: New relevant interface eth0.IPv4 for mDNS.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Joining mDNS multicast group on interface eth0.IPv4 with address 172.18.116.155.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Registering new address record for 172.18.116.155 on eth0.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Withdrawing address record for 172.18.116.155 on eth0.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Leaving mDNS multicast group on interface eth0.IPv4 with address 172.18.116.155.
Apr 23 14:56:48 localhost avahi-daemon[3971]: iface.c: interface_mdns_mcast_join() called but no local address available.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Interface eth0.IPv4 no longer relevant for mDNS.
Apr 23 14:56:48 localhost avahi-daemon[3971]: New relevant interface eth0.IPv4 for mDNS.
Apr 23 14:56:48 localhost avahi-daemon[3971]: Joining mDNS multicast group on interface eth0.IPv4 with address 172.18.116.155.
Apr 23 14:56:48 localhost kernel: RTNL: assertion failed at net/ipv4/igmp.c (1205)
Apr 23 14:56:48 localhost kernel: 
Apr 23 14:56:48 localhost kernel: Call Trace:
Apr 23 14:56:48 localhost kernel:  [<ffffffff8049340c>] ip_mc_inc_group+0x3e/0x1f2
Apr 23 14:56:48 localhost kernel:  [<ffffffff80493b2b>] ip_mc_join_group+0xca/0xe8
Apr 23 14:56:48 localhost kernel:  [<ffffffff8047e441>] do_ip_setsockopt+0x6db/0x9d7
Apr 23 14:56:48 localhost kernel:  [<ffffffff8029696e>] autoremove_wake_function+0x0/0x2e
Apr 23 14:56:48 localhost kernel:  [<ffffffff80336018>] selinux_inode_getattr+0x50/0x5e
Apr 23 14:56:48 localhost kernel:  [<ffffffff80333c56>] socket_has_perm+0x5b/0x68
Apr 23 14:56:48 localhost kernel:  [<ffffffff8047e7e5>] ip_setsockopt+0x22/0x86
Apr 23 14:56:48 localhost kernel:  [<ffffffff8045587b>] sys_setsockopt+0x8f/0xb5
Apr 23 14:56:48 localhost kernel:  [<ffffffff8025911e>] system_call+0x7e/0x83
Apr 23 14:56:48 localhost kernel: 
Apr 23 14:56:48 localhost NET[4351]: /sbin/dhclient-script : updated /etc/resolv.conf
Apr 23 14:56:48 localhost avahi-daemon[3971]: Registering new address record for 172.18.116.155 on eth0.
Apr 23 14:56:48 localhost dhclient: bound to 172.18.116.155 -- renewal in 3205 seconds.
Apr 23 14:56:49 localhost avahi-daemon[3971]: New relevant interface eth0.IPv6 for mDNS.
Apr 23 14:56:49 localhost avahi-daemon[3971]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::204:23ff:fec6:d7d2.
Apr 23 14:56:49 localhost avahi-daemon[3971]: Registering new address record for fe80::204:23ff:fec6:d7d2 on eth0.

which is just stupid.  The rtnl_lock() is right there in ip_mc_join_group().
And this is a different architecture and config and compiler from yesterday's
fun.  And no scheduler patches involved here.

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:12             ` Andrew Morton
@ 2007-04-23 22:15               ` David Miller
  2007-04-23 22:37                 ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2007-04-23 22:15 UTC
  To: akpm; +Cc: netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 23 Apr 2007 15:12:40 -0700

> which is just stupid.  The rtnl_lock() is right there in ip_mc_join_group().
> And this is a different architecture and config and compiler from yesterday's
> fun.  And no scheduler patches involved here.

Perhaps something on another cpu is dropping the rtnl semaphore one
time too many.

Recently a bug of this nature was discovered in the wireless stack.
But unless you have a wireless device in this box too, it's probably
unrelated.

* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:15               ` David Miller
@ 2007-04-23 22:37                 ` Andrew Morton
  2007-04-23 22:45                   ` David Miller
  2007-04-23 23:14                   ` Rick Jones
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 22:37 UTC
  To: David Miller; +Cc: netdev

On Mon, 23 Apr 2007 15:15:31 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 23 Apr 2007 15:12:40 -0700
> 
> > which is just stupid.  The rtnl_lock() is right there in ip_mc_join_group().
> > And this is a different architecture and config and compiler from yesterday's
> > fun.  And no scheduler patches involved here.
> 
> Perhaps something on another cpu is dropping the rtnl semaphore one
> time too many.
> 
> Recently a bug of this nature was discovered in the wireless stack.
> But unless you have a wireless device in this box too, it's probably
> unrelated.

Could be.  But I'd expect the mutex code to whine about the extra unlock,
and about an unlock from a different thread, if I have that option
turned on.

Oh well, one thing at a time.  The good news is that I can reproduce the
problem with netperf.

kpm:/usr/src/netperf-2.4.3> netperf -H akpm2 -t UDP_RR
UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to akpm2 (172.18.116.155) port 0 AF_INET
netperf: receive_response: no response received. errno 0 counter 0

That's running netserver on the test machine.

The machine running netperf is 172.18.116.160 and the test machine running
netserver is 172.18.116.155

tcpdump from the test machine:

15:24:37.924210 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:38.859309 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
15:24:39.078273 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
15:24:39.924074 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:40.017081 IP 172.24.0.7.domain > 172.18.116.57.37456:  59635 4/7/6 CNAME[|domain]
15:24:41.383433 IP 172.18.116.160.33137 > 172.18.116.155.12865: S 2760291763:2760291763(0) win 5840 <mss 1460,sackOK,timestamp 1967355840 0,nop,wscale 8>
15:24:41.383479 IP 172.18.116.155.12865 > 172.18.116.160.33137: S 1640262480:1640262480(0) ack 2760291764 win 5792 <mss 1460,sackOK,timestamp 7714 1967355840,nop,wscale 7>
15:24:41.383683 IP 172.18.116.160.33137 > 172.18.116.155.12865: . ack 1 win 23 <nop,nop,timestamp 1967355840 7714>
15:24:41.383883 IP 172.18.116.160.33137 > 172.18.116.155.12865: P 1:257(256) ack 1 win 23 <nop,nop,timestamp 1967355840 7714>
15:24:41.383902 IP 172.18.116.155.12865 > 172.18.116.160.33137: . ack 257 win 54 <nop,nop,timestamp 7714 1967355840>
15:24:41.384065 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7714 1967355840>
15:24:41.587266 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7765 1967355840>
15:24:41.839234 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
15:24:41.924303 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:41.995285 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7867 1967355840>
15:24:42.030341 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
15:24:42.811330 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 8071 1967355840>
15:24:43.924183 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:44.121880 IP 172.24.0.7.domain > 172.18.116.22.46700:  52073* 1/4/4 A[|domain]
15:24:44.443419 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 8479 1967355840>
15:24:44.723257 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
15:24:44.886356 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
15:24:45.924263 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:47.659300 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
15:24:47.707599 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 9295 1967355840>
15:24:47.874419 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
15:24:47.952350 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
15:24:48.037569 IP 172.24.0.7.domain > 172.18.117.18.46665:  59092 2/7/6 CNAME[|domain]

So I think we did a bit of TCP chatter then no UDP at all?
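
One detail in the TCP part of the trace may be worth noting: the test
machine sends the same 256-byte segment on the netperf control connection
(port 12865) six times, at roughly doubling intervals, which looks like
retransmission backoff with no acknowledgement showing up in this capture.
A minimal sketch of spotting that mechanically, assuming the older tcpdump
text format shown above; TCP_RE and repeated_segments are hypothetical
names:

```python
import re
from collections import Counter

# "time IP src.port > dst.port: <flags> <first:last(len)> ..."
# Pure ACKs (". ack N") carry no sequence range and do not match.
TCP_RE = re.compile(r"^\S+ IP (\S+) > (\S+): ([SFPRWE.]+) (\d+:\d+\(\d+\))")


def repeated_segments(lines):
    """Map (src, dst, flags, seq range) to its count, for counts above 1."""
    counts = Counter()
    for line in lines:
        m = TCP_RE.match(line)
        if m:
            counts[m.groups()] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Lines copied from the trace above, with the TCP options trimmed.
SAMPLE = [
    "15:24:41.383883 IP 172.18.116.160.33137 > 172.18.116.155.12865: P 1:257(256) ack 1 win 23",
    "15:24:41.384065 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
    "15:24:41.587266 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
    "15:24:41.995285 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
    "15:24:42.811330 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
    "15:24:44.443419 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
    "15:24:47.707599 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54",
]

if __name__ == "__main__":
    for key, n in repeated_segments(SAMPLE).items():
        print(n, "copies of", key)
```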

It's interesting that the test machine can see other people's DNS queries
go past.



* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:37                 ` Andrew Morton
@ 2007-04-23 22:45                   ` David Miller
  2007-04-23 23:35                     ` Andrew Morton
  2007-04-24  0:04                     ` Herbert Xu
  2007-04-23 23:14                   ` Rick Jones
  1 sibling, 2 replies; 15+ messages in thread
From: David Miller @ 2007-04-23 22:45 UTC
  To: akpm; +Cc: netdev, acme, herbert

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 23 Apr 2007 15:37:14 -0700

> So I think we did a bit of TCP chatter then no UDP at all?
> 
> It's interesting that the test machine can see other people's DNS queries
> go past.

It's mysterious alright.

I can't say that the UDP packets are going out corrupted because tcpdump
seems to decode the DNS queries just fine.  Hmmm, if you're sending
this out on the broken machine we can't rule out corrupted checksums.

And if tcpdump doesn't see the UDP replies it means that they aren't even
reaching the device, let alone the stack.  At least that rules out
the stack dropping UDP packets for some reason.

It's possible we've stuffed up some expectation the e1000 driver
has for TX checksum offload.  Turn off TX checksums with
"ethtool -K eth0 tx off" and see if that makes the problem
go away.  Next, try "ethtool -K eth0 rx off".
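
To check for a corrupted UDP checksum directly, the bytes have to be
captured on the receiving side: with TX checksum offload enabled, a
capture taken on the sender sees the packet before the NIC has filled in
the checksum.  A minimal sketch of the verification itself, which is the
standard Internet checksum over the IPv4 pseudo-header plus the UDP
segment.  The function names are hypothetical and IPv4 is assumed.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def _pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    # source address, destination address, zero byte, protocol 17, UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum for a UDP header+payload whose checksum field is zero."""
    csum = internet_checksum(_pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Verify a received UDP header+payload, checksum field included."""
    if segment[6:8] == b"\x00\x00":
        return True  # IPv4 only: zero means the sender computed no checksum
    return internet_checksum(
        _pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0
```

A segment that fails udp_checksum_ok on the receiving host, but only while
TX offload is enabled on the sender, would be consistent with the offload
theory.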

I suspect skb_transport_offset() might be wrong for UDP packets
for some reason, as that's what e1000_tx_csum() in
drivers/net/e1000/e1000_main.c depends upon.

Either that or there is some error in Herbert's recent checksum offload
handling changes, listed below.  In fact I am highly suspicious of
the second change; you may want to try reverting just that one:

commit 8952d6c988ec31070732117f353666a4b9a09fea
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Mon Apr 9 11:59:39 2007 -0700

    [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY
    
    When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
    maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
    treat it as such in the stack.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 7f8be19f5a5737ce6ad670756183235c71b560bb
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Mon Apr 9 11:59:07 2007 -0700

    [NET]: Use csum_start offset instead of skb_transport_header
    
    The skb transport pointer is currently used to specify the start
    of the checksum region for transmit checksum offload.  Unfortunately,
    the same pointer is also used during receive side processing.
    
    This creates a problem when we want to retransmit a received
    packet with partial checksums since the skb transport pointer
    would be overwritten.
    
    This patch solves this problem by creating a new 16-bit csum_start
    offset value to replace the skb transport header for the purpose
    of checksums.  This offset is calculated from skb->head so that
    it does not have to change when skb->data changes.
    
    No extra space is required since csum_offset itself fits within
    a 16-bit word so we can use the other 16 bits for csum_start.
    
    For backwards compatibility, just before we push a packet with
    partial checksums off into the device driver, we set the skb
    transport header to what it would have been under the old scheme.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
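
The bookkeeping described in the second commit message can be illustrated
with a toy model.  This is not kernel code: ToySkb and its methods are
hypothetical, and offsets are plain integers measured from the start of
the buffer (standing in for skb->head).  What it shows is that an offset
stored relative to the head stays valid while headers are pushed in front
of the data pointer.

```python
from dataclasses import dataclass


@dataclass
class ToySkb:
    data: int             # offset of the data pointer from the buffer head
    csum_start: int = 0   # where checksumming starts, measured from the head
    csum_offset: int = 0  # where the checksum field sits, from csum_start

    def set_partial_csum(self, transport_off_from_data: int, field_off: int):
        """Record the checksum region relative to the head, not to data."""
        self.csum_start = self.data + transport_off_from_data
        self.csum_offset = field_off

    def push(self, n: int):
        """Prepend an n-byte header: the data pointer moves toward the head."""
        self.data -= n

    def csum_start_from_data(self) -> int:
        """What a driver would need: checksum start relative to data."""
        return self.csum_start - self.data

    def csum_field_from_data(self) -> int:
        return self.csum_start_from_data() + self.csum_offset


if __name__ == "__main__":
    skb = ToySkb(data=64)        # 64 bytes of headroom, data at the UDP header
    skb.set_partial_csum(0, 6)   # UDP checksum field is 6 bytes into the header
    skb.push(20)                 # IPv4 header without options
    skb.push(14)                 # Ethernet header
    print(skb.csum_start, skb.csum_start_from_data(), skb.csum_field_from_data())
```

In this model csum_start never changes after it is set, while the
data-relative offsets grow as each header is pushed.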


* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:37                 ` Andrew Morton
  2007-04-23 22:45                   ` David Miller
@ 2007-04-23 23:14                   ` Rick Jones
  1 sibling, 0 replies; 15+ messages in thread
From: Rick Jones @ 2007-04-23 23:14 UTC
  To: Andrew Morton; +Cc: David Miller, netdev

> Oh well, one thing at a time.  The good news is that I can reproduce the
> problem with netperf.
> 
> kpm:/usr/src/netperf-2.4.3> netperf -H akpm2 -t UDP_RR
> UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to akpm2 (172.18.116.155) port 0 AF_INET
> netperf: receive_response: no response received. errno 0 counter 0
> 
> That's running netserver on the test machine.
> 
> The machine running netperf is 172.18.116.160 and the test machine running
> netserver is 172.18.116.155
> 
> tcpdump from the test machine:
> 
> 15:24:37.924210 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:38.859309 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
> 15:24:39.078273 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
> 15:24:39.924074 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:40.017081 IP 172.24.0.7.domain > 172.18.116.57.37456:  59635 4/7/6 CNAME[|domain]
> 15:24:41.383433 IP 172.18.116.160.33137 > 172.18.116.155.12865: S 2760291763:2760291763(0) win 5840 <mss 1460,sackOK,timestamp 1967355840 0,nop,wscale 8>
> 15:24:41.383479 IP 172.18.116.155.12865 > 172.18.116.160.33137: S 1640262480:1640262480(0) ack 2760291764 win 5792 <mss 1460,sackOK,timestamp 7714 1967355840,nop,wscale 7>
> 15:24:41.383683 IP 172.18.116.160.33137 > 172.18.116.155.12865: . ack 1 win 23 <nop,nop,timestamp 1967355840 7714>
> 15:24:41.383883 IP 172.18.116.160.33137 > 172.18.116.155.12865: P 1:257(256) ack 1 win 23 <nop,nop,timestamp 1967355840 7714>
> 15:24:41.383902 IP 172.18.116.155.12865 > 172.18.116.160.33137: . ack 257 win 54 <nop,nop,timestamp 7714 1967355840>
> 15:24:41.384065 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7714 1967355840>
> 15:24:41.587266 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7765 1967355840>
> 15:24:41.839234 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
> 15:24:41.924303 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:41.995285 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 7867 1967355840>
> 15:24:42.030341 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
> 15:24:42.811330 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 8071 1967355840>
> 15:24:43.924183 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:44.121880 IP 172.24.0.7.domain > 172.18.116.22.46700:  52073* 1/4/4 A[|domain]
> 15:24:44.443419 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 8479 1967355840>
> 15:24:44.723257 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
> 15:24:44.886356 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
> 15:24:45.924263 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:47.659300 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
> 15:24:47.707599 IP 172.18.116.155.12865 > 172.18.116.160.33137: P 1:257(256) ack 257 win 54 <nop,nop,timestamp 9295 1967355840>
> 15:24:47.874419 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
> 15:24:47.952350 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15 
> 15:24:48.037569 IP 172.24.0.7.domain > 172.18.117.18.46665:  59092 2/7/6 CNAME[|domain]
> 
> So I think we did a bit of TCP chatter then no UDP at all?

Looks that way, and on top of that netperf got no results back from 
netserver on the control (TCP, port 12865) connection.  Adding some -d's 
to the global options will cause netperf to regurgitate what messages it 
is sending and such.

I'd have expected that even if no UDP traffic could flow between netperf 
and netserver the timer running in the netserver _should_ have gotten it 
out of the recv()/recvfrom() call in recv_udp_rr() (src/nettest_bsd.c) 
and that netperf would then report a "normal" result of just 0 
transactions per second.

Either that timer didn't get set, didn't fire, or was insufficient to 
get netserver out of that recv() on the UDP socket, or comms between the 
two systems got fubar for TCP too.
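
For readers unfamiliar with the mechanism, the sketch below shows one
common way a timer can pull a process out of a blocking recv(): a SIGALRM
handler installed without SA_RESTART, so that the call fails with EINTR
when the timer fires.  This is an assumption about the general technique,
not a copy of the code in src/nettest_bsd.c; the names on_alarm, times_up
and recv_with_timer are made up for this example.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

static volatile sig_atomic_t times_up;

static void on_alarm(int signo)
{
	(void)signo;
	times_up = 1;
}

/*
 * Block in recv() on an idle UDP socket for at most timeout_ms.
 * Returns 0 if the timer interrupted recv(), 1 if a datagram arrived,
 * and -1 on any other error.
 */
static int recv_with_timer(int timeout_ms)
{
	struct sigaction sa;
	struct itimerval it;
	struct sockaddr_in addr;
	char buf[64];
	ssize_t n;
	int fd, rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;	/* sa_flags == 0: no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a port */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	times_up = 0;
	memset(&it, 0, sizeof(it));	/* one-shot timer, no interval */
	it.it_value.tv_sec = timeout_ms / 1000;
	it.it_value.tv_usec = (timeout_ms % 1000) * 1000;
	if (setitimer(ITIMER_REAL, &it, NULL) < 0) {
		close(fd);
		return -1;
	}

	n = recv(fd, buf, sizeof(buf), 0);
	if (n >= 0)
		rc = 1;
	else if (errno == EINTR && times_up)
		rc = 0;
	else
		rc = -1;

	memset(&it, 0, sizeof(it));	/* disarm the timer if still pending */
	setitimer(ITIMER_REAL, &it, NULL);
	close(fd);
	return rc;
}
```

If the handler were installed with SA_RESTART, recv() would be restarted
after the signal and the process would stay blocked, which is one of the
ways a timer can be "insufficient" in the sense used above.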

rick jones




* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:45                   ` David Miller
@ 2007-04-23 23:35                     ` Andrew Morton
  2007-04-24  0:04                     ` Herbert Xu
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2007-04-23 23:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, acme, herbert

On Mon, 23 Apr 2007 15:45:09 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 23 Apr 2007 15:37:14 -0700
> 
> > So I think we did a bit of TCP chatter then no UDP at all?
> > 
> > It's interesting that the test machine can see other people's DNS queries
> > go past.
> 
> It's mysterious alright.
> 
> I can't say that the UDP's are going out corrupted because tcpdump
> seems to decode the DNS queries just fine.  Hmmm, if you're sending
> this out on the broken machine we can't rule out corrupted checksums.
> 
> And if tcpdump doesn't see the UDP replies it means that it isn't even
> reaching the device, let alone the stack.  At least that rules out
> the stack dropping UDP packets for some reason.
> 
> It's possible we've stuffed up some expectation the e1000 driver
> has for TX checksum offload.  Turn off TX checksums with
> "ethtool -K eth0 tx off" and see if that makes the problem
> go away.  Next, try "ethtool -K eth0 rx off".
> 
> I suspect skb_transport_offset() might be wrong for UDP packets
> for some reason, as that's what drivers/net/e1000/e1000_main.c
> e1000_tx_csum() depend upon.
> 
> Either that or some error in Herbert's recent checksum offload
> handling changes, such as, in fact I am highly suspicious of
> the second change listed below, you may want to try reverting
> just that one:

Bingo.

> commit 8952d6c988ec31070732117f353666a4b9a09fea
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Mon Apr 9 11:59:39 2007 -0700
> 
>     [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY
>     
>     When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
>     maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
>     treat it as such in the stack.
>     
>     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> commit 7f8be19f5a5737ce6ad670756183235c71b560bb
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Mon Apr 9 11:59:07 2007 -0700
> 
>     [NET]: Use csum_start offset instead of skb_transport_header
>     
>     The skb transport pointer is currently used to specify the start
>     of the checksum region for transmit checksum offload.  Unfortunately,
>     the same pointer is also used during receive side processing.
>     
>     This creates a problem when we want to retransmit a received
>     packet with partial checksums since the skb transport pointer
>     would be overwritten.
>     
>     This patch solves this problem by creating a new 16-bit csum_start
>     offset value to replace the skb transport header for the purpose
>     of checksums.  This offset is calculated from skb->head so that
>     it does not have to change when skb->data changes.
>     
>     No extra space is required since csum_offset itself fits within
>     a 16-bit word so we can use the other 16 bits for csum_start.
>     
>     For backwards compatibility, just before we push a packet with
>     partial checksums off into the device driver, we set the skb
>     transport header to what it would have been under the old scheme.
>     
>     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>     Signed-off-by: David S. Miller <davem@davemloft.net>

Reverting both 8952d6c988ec31070732117f353666a4b9a09fea and
7f8be19f5a5737ce6ad670756183235c71b560bb fixes it.  Reverting only
7f8be19f5a5737ce6ad670756183235c71b560bb also fixes it.

Thanks.


* Re: net-2.6.22 UDP stalls/hangs
  2007-04-23 22:45                   ` David Miller
  2007-04-23 23:35                     ` Andrew Morton
@ 2007-04-24  0:04                     ` Herbert Xu
  2007-04-24  0:07                       ` David Miller
  1 sibling, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2007-04-24  0:04 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, netdev, acme

On Mon, Apr 23, 2007 at 03:45:09PM -0700, David Miller wrote:
> 
> Either that or some error in Herbert's recent checksum offload
> handling changes, such as, in fact I am highly suspicious of
> the second change listed below, you may want to try reverting
> just that one:

Indeed.  My change depended on drivers correctly using csum_offset
instead of the old csum field.  That was wrong anyway since sparse
would now warn against that usage.  However, prior to my change it
was harmless.

[NETDRV]: Perform missing csum_offset conversions

When csum_offset was introduced we did a conversion from csum to
csum_offset where applicable.  A couple of drivers were missed in
this process.

It was harmless to begin with since the two fields coincided.  Now
that we've made them different with the addition of csum_start, the
missed drivers must be converted or they can't send packets out at
all that require checksum offload.
     
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/net/atl1/atl1_main.c b/drivers/net/atl1/atl1_main.c
index d60c221..4b1d4d1 100644
--- a/drivers/net/atl1/atl1_main.c
+++ b/drivers/net/atl1/atl1_main.c
@@ -1328,7 +1328,7 @@ static int atl1_tx_csum(struct atl1_adapter *adapter, struct sk_buff *skb,
 
 	if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
 		cso = skb_transport_offset(skb);
-		css = cso + skb->csum;
+		css = cso + skb->csum_offset;
 		if (unlikely(cso & 0x1)) {
 			printk(KERN_DEBUG "%s: payload offset != even number\n",
 				atl1_driver_name);
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 610216e..48e2ade 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2959,7 +2959,8 @@ e1000_tx_csum(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring,
 
 		context_desc->lower_setup.ip_config = 0;
 		context_desc->upper_setup.tcp_fields.tucss = css;
-		context_desc->upper_setup.tcp_fields.tucso = css + skb->csum;
+		context_desc->upper_setup.tcp_fields.tucso =
+			css + skb->csum_offset;
 		context_desc->upper_setup.tcp_fields.tucse = 0;
 		context_desc->tcp_seg_setup.data = 0;
 		context_desc->cmd_and_length = cpu_to_le32(E1000_TXD_CMD_DEXT);
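
To see why the unconverted drivers broke, consider the simplified
user-space model below.  The union layout is an assumption reconstructed
from the commit descriptions in this thread (csum sharing 32 bits with a
16-bit csum_start and a 16-bit csum_offset); the type and function names
are hypothetical and are not taken from the kernel tree.

```c
#include <stdint.h>

/* Hypothetical model of the checksum fields after the csum_start change. */
struct mock_skb_csum {
	union {
		uint32_t csum;			/* full checksum value */
		struct {
			uint16_t csum_start;	/* new: offset from skb->head */
			uint16_t csum_offset;	/* offset of the checksum field */
		} partial;
	} u;
};

/* What an unconverted driver computes: reads all 32 bits of csum. */
static uint32_t tucso_unconverted(const struct mock_skb_csum *skb,
				  uint32_t css)
{
	return css + skb->u.csum;
}

/* What the patched driver computes: reads only csum_offset. */
static uint32_t tucso_converted(const struct mock_skb_csum *skb,
				uint32_t css)
{
	return css + skb->u.partial.csum_offset;
}
```

With css = 34 (Ethernet plus IPv4 headers), csum_offset = 6 (the UDP
checksum field) and any non-zero csum_start, the converted helper yields
40, while the unconverted one folds csum_start into the result and points
the hardware at the wrong place.  Before csum_start existed the two reads
returned the same value, which matches the "harmless to begin with"
remark in the patch description.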


* Re: net-2.6.22 UDP stalls/hangs
  2007-04-24  0:04                     ` Herbert Xu
@ 2007-04-24  0:07                       ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2007-04-24  0:07 UTC (permalink / raw)
  To: herbert; +Cc: akpm, netdev, acme

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 24 Apr 2007 10:04:58 +1000

> On Mon, Apr 23, 2007 at 03:45:09PM -0700, David Miller wrote:
> > 
> > Either that or some error in Herbert's recent checksum offload
> > handling changes, such as, in fact I am highly suspicious of
> > the second change listed below, you may want to try reverting
> > just that one:
> 
> Indeed.  My change depended on drivers correctly using csum_offset
> instead of the old csum field.  That was wrong anyway since sparse
> would now warn against that usage.  However, prior to my change it
> was harmless.
> 
> [NETDRV]: Perform missing csum_offset conversions
> 
> When csum_offset was introduced we did a conversion from csum to
> csum_offset where applicable.  A couple of drivers were missed in
> this process.
> 
> It was harmless to begin with since the two fields coincided.  Now
> that we've made them different with the addition of csum_start, the
> missed drivers must be converted or they can't send packets out at
> all that require checksum offload.
>      
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks a lot Herbert.


end of thread, other threads:[~2007-04-24  0:06 UTC | newest]

Thread overview: 15+ messages
2007-04-23 20:07 net-2.6.22 UDP stalls/hangs Andrew Morton
2007-04-23 20:18 ` David Miller
2007-04-23 20:27   ` Andrew Morton
2007-04-23 20:37     ` David Miller
2007-04-23 20:56       ` Andrew Morton
2007-04-23 21:17         ` David Miller
2007-04-23 21:45           ` Andrew Morton
2007-04-23 22:12             ` Andrew Morton
2007-04-23 22:15               ` David Miller
2007-04-23 22:37                 ` Andrew Morton
2007-04-23 22:45                   ` David Miller
2007-04-23 23:35                     ` Andrew Morton
2007-04-24  0:04                     ` Herbert Xu
2007-04-24  0:07                       ` David Miller
2007-04-23 23:14                   ` Rick Jones
