From: Andrew Morton <akpm@linux-foundation.org>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: net-2.6.22 UDP stalls/hangs
Date: Mon, 23 Apr 2007 13:56:39 -0700
Message-ID: <20070423135639.26ca125a.akpm@linux-foundation.org>
In-Reply-To: <20070423.133730.115642229.davem@davemloft.net>
On Mon, 23 Apr 2007 13:37:30 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 23 Apr 2007 13:27:19 -0700
>
> > On Mon, 23 Apr 2007 13:18:10 -0700 (PDT)
> > David Miller <davem@davemloft.net> wrote:
> >
> > > From: Andrew Morton <akpm@linux-foundation.org>
> > > Date: Mon, 23 Apr 2007 13:07:34 -0700
> > >
> > > > The interesting bit is:
> > > ...
> > > > I think I saw the same problem maybe 1.5 weeks ago on this machine, but I
> > > > didn't have time to investigate further. So it's not some recent thing.
> > >
> > > My initial reaction is that DNS responses are being lost or dropped
> > > for some reason.
> >
> > Plausible. I'll try booting it with the ethernet unplugged.
>
> That won't test the same scenario.
>
> If the network cable is unplugged, ARP responses won't arrive and
> therefore sendmsg() calls will return with a host unreachable error.
>
> The situation you need to recreate is specifically UDP packets getting
> dropped.
>
> The reason I wanted the tcpdump trace is so that we can see whether
> the problem is UDP packets going out or going in which are being
> mangled/dropped.
>
> You don't need a hub to get a dump. Instead you can run a caching
> named on some other system, configure your FC6 box to use that system
> for DNS via /etc/resolv.conf, then run tcpdump on the caching named
> machine.
hm, fancy.

I unplugged the cable and the machine booted normally. Lots of commands
were hanging when I plugged it back in.

I plugged the cable back in and on one console ran

	tcpdump -l -i eth0

but of course tcpdump didn't do anything because it wants to do reverse
lookups. But interestingly, tcpdump was taking maybe 15 seconds to respond
to ^c and to killall. tcpdump was stuck in udp_poll(), like statd was.
But I think it's significant that we're not taking signals while in that
interruptible sleep.

I am able to ping the test machine from another host on the same network.

On the test machine I used `tcpdump -l -n -i eth0' and on another vt, ran
`ping www.google.com'. The test machine is 172.18.116.155

13:40:51.120004 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:51.489171 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:40:52.567615 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:53.489201 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:40:53.755655 arp who-has 172.18.119.254 tell 172.18.116.155
13:40:53.755991 arp reply 172.18.119.254 is-at 00:00:0c:07:ac:01
13:40:53.755997 IP 172.18.116.155.32806 > 172.24.0.7.domain: 42807+ A? www.google.com. (32)
13:40:53.991979 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:55.435664 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:55.514942 IP 172.18.116.45.netbios-dgm > 172.18.119.255.netbios-dgm: NBT UDP PACKET(138)
13:40:55.710092 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:40:56.463086 arp who-has 172.18.119.254 tell 172.18.116.45
13:40:56.856033 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:40:57.709673 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:40:58.331717 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:40:58.751949 IP 172.18.116.155.32807 > 172.25.146.107.domain: 42807+ A? www.google.com. (32)
13:40:59.276068 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-unknown (3) 16: state=initial group=2 [|hsrp]
13:40:59.709703 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:40:59.716492 IP 172.18.119.178.netbios-dgm > 172.18.119.255.netbios-dgm: NBT UDP PACKET(138)
13:40:59.814742 arp who-has 172.18.119.254 tell 172.18.116.206
13:40:59.844096 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:41:01.215791 IP 172.18.119.253.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=active group=1 addr=172.18.119.254
13:41:01.709583 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15
13:41:01.751918 IP 172.18.116.199.ipp > 172.18.119.255.ipp: UDP, length 124
13:41:02.776596 arp who-has 172.18.119.254 tell 172.18.117.227
13:41:02.836204 IP 172.18.119.252.hsrp > 224.0.0.2.hsrp: HSRPv0-hello 20: state=standby group=1 addr=172.18.119.254
13:41:03.709613 802.1d config 8000.00:18:74:5d:04:66.80ae root 0066.00:15:c7:20:57:c0 pathcost 4 age 1 max 20 hello 2 fdelay 15

so it looks like we tried to send the query but we didn't see anything come
back.

Which means I need to do the caching named thing. I tried (using RH's fc6
kernel), but it doesn't work. Help?
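
The reading of the trace above (queries go out, nothing comes back) can be
checked mechanically rather than by eye. Below is a minimal sketch, assuming
the one-line text format tcpdump printed in this trace, where port 53 shows
up as "domain". The helper name split_dns_traffic and the extra ".53" suffix
are illustrative assumptions, not anything tcpdump provides.

```python
import re

# One-line tcpdump summaries for IP traffic look like this in the trace above:
#   13:40:53.755997 IP 172.18.116.155.32806 > 172.24.0.7.domain: 42807+ A? ...
# ARP and 802.1d lines have no " IP " token and are skipped.
IP_LINE = re.compile(r"^(?P<ts>\S+) IP (?P<src>\S+) > (?P<dst>\S+): (?P<rest>.*)$")

# The trace prints port 53 as "domain"; ".53" is accepted as well on the
# assumption that other tcpdump invocations may print the port numerically.
DNS_SUFFIXES = (".domain", ".53")


def split_dns_traffic(lines, host):
    """Return (queries, replies): DNS lines sent by `host` and sent to `host`."""
    queries, replies = [], []
    prefix = host + "."  # address followed by ".port", e.g. 172.18.116.155.32806
    for raw in lines:
        line = raw.strip()
        m = IP_LINE.match(line)
        if not m:
            continue
        src, dst = m.group("src"), m.group("dst")
        if src.startswith(prefix) and dst.endswith(DNS_SUFFIXES):
            queries.append(line)
        elif src.endswith(DNS_SUFFIXES) and dst.startswith(prefix):
            replies.append(line)
    return queries, replies
```

Fed the trace above with host "172.18.116.155", this picks out the two
queries (13:40:53.755997 and 13:40:58.751949) and no replies.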

On 172.18.116.160 I'm running

	root 7375 0.0 0.0 75496 500 ? Ssl Jan22 0:00 /usr/sbin/nscd-2.3.2 -f /etc/nscd-2.3.2.conf

and on the test machine I put

	nameserver 172.18.116.160

into /etc/resolv.conf.

Is nscd the caching named which you're referring to?
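
One way to tell whether a given host answers DNS over UDP at all, independent
of the resolver library, is to send it a hand-built query and wait for a
reply. The following is a rough sketch under stated assumptions: the function
names build_a_query and probe_dns, the default name, and the 3 second timeout
are all made up for illustration, and the packet layout follows RFC 1035.

```python
import socket
import struct


def build_a_query(name, txid=0x1234):
    """Build a minimal DNS query (RFC 1035) asking for an A record."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, then
    # ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe_dns(server, name="www.google.com", timeout=3.0):
    """Return True if `server` sends back a UDP reply carrying our query ID."""
    query = build_a_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(query, (server, 53))
            data, _ = s.recvfrom(512)
        except OSError:  # timeouts and unreachable errors both land here
            return False
    # Only the transaction ID is compared; the answer itself is not parsed.
    return data[:2] == query[:2]
```

For example, probe_dns("172.18.116.160") would report whether that box
replies on port 53. The query built for www.google.com is 32 bytes long,
which is consistent with the "(32)" tcpdump shows for the queries in the
trace above.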