From: Andrew Savchenko <bircoph@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Subject: Re: [BUG] Kernel recieves DNS reply, but doesn't deliver it to a waiting application
Date: Sun, 21 Oct 2012 03:25:43 +0400
Message-ID: <20121021032543.09d1844f.bircoph@gmail.com>
In-Reply-To: <20121014031119.a60263d6.bircoph@gmail.com>
Hello,
On Sun, 14 Oct 2012 03:11:19 +0400 Andrew Savchenko wrote:
> On Sat, 13 Oct 2012 15:44:20 +0200 Eric Dumazet wrote:
> > On Sat, 2012-10-13 at 16:36 +0400, Andrew Savchenko wrote:
> > > On Wed, 3 Oct 2012 23:25:48 +0400 Andrew Savchenko wrote:
> > > > I encountered a very weird bug: after some uptime the kernel stops
> > > > delivering DNS replies to applications. Tcpdump shows that the correct
> > > > reply is received, but strace shows that the querying application never
> > > > receives it and ends with a timeout; epoll_wait() always returns 0.
> > > > A slice from: $ host kernel.org 8.8.8.8:
> [...]
> > > > In a few days I'll try 3.4.12 (I need to rebuild the kernel anyway due to
> > > > an unrelated issue) and will report if this bug occurs again. But please
> > > > note it may take several weeks to check this.
> > >
> > > I got this problem again with the 3.4.12 kernel. The system lasted less
> > > than a week and a reboot was the only option...
> >
> > You should investigate and check where the incoming packet is lost
> >
> > Tools :
> >
> > netstat -s
> >
> > drop_monitor module and dropwatch command
> >
> > cat /proc/net/udp
>
> Thank you for your reply; I updated my kernel to 3.4.14, enabled
> CONFIG_NET_DROP_MONITOR, and installed the dropwatch utility.
>
> I will report back when the bug strikes again.
> This may take a week or two, however.

This bug is back again on kernel 3.4.14, but this time I was able to
get debug data and to recover the running kernel without a reboot.

Dropwatch showed that DNS UDP replies are always dropped here:
1 drops at __udp_queue_rcv_skb+61 (0xffffffff813bd670)

Other observations:
- only UDP replies are lost, TCP works fine;
- if network load is reduced dramatically (ip_forward disabled, most
  network daemons stopped), UDP DNS queries work again; but as load
  gradually increases, replies first become slow and then stop altogether;
- CPU load is very low (the load average reported by uptime is below
  0.05), so this should not be a matter of insufficient computing power.

I found the __udp_queue_rcv_skb function in net/ipv4/udp.c. From the code
and the observations above it follows that this is likely an ENOMEM
condition leading to packet loss.
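
The ENOMEM hypothesis can be cross-checked against the UDP counters in
/proc/net/snmp (the same data netstat -s reports). Below is a minimal Python
sketch, assuming the usual two-line "Udp:" header/value layout of that file.
The helper name parse_udp_counters is hypothetical and the sample numbers are
made up for illustration. In kernels of this era an ENOMEM drop in this path
appears to be counted in both InErrors and RcvbufErrors, so a RcvbufErrors
value that grows with every failed query would be consistent with the
hypothesis.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp content as a dict.

    The file holds pairs of lines per protocol: a header line with the
    counter names followed by a line with the values.
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    # Skip the leading "Udp:" token on both lines.
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


# Illustrative sample only; on a real system read /proc/net/snmp instead:
#   with open("/proc/net/snmp") as f: text = f.read()
SAMPLE_SNMP = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1000 3 42 900 42 0\n"
)

counters = parse_udp_counters(SAMPLE_SNMP)
for name in ("InDatagrams", "InErrors", "RcvbufErrors"):
    print(name, counters[name])
```

Taking two snapshots a few seconds apart while running a failing DNS query
and comparing the RcvbufErrors values would show whether the drops land in
this counter.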

This is the memory data after the bug happened:
# cat /proc/meminfo
MemTotal: 1021576 kB
MemFree: 32056 kB
Buffers: 105204 kB
Cached: 646716 kB
SwapCached: 236 kB
Active: 205932 kB
Inactive: 587156 kB
Active(anon): 20636 kB
Inactive(anon): 22488 kB
Active(file): 185296 kB
Inactive(file): 564668 kB
Unevictable: 2152 kB
Mlocked: 2152 kB
SwapTotal: 995992 kB
SwapFree: 995020 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 43120 kB
Mapped: 7504 kB
Shmem: 148 kB
Slab: 176004 kB
SReclaimable: 118636 kB
SUnreclaim: 57368 kB
KernelStack: 688 kB
PageTables: 2948 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1506780 kB
Committed_AS: 62708 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 262732 kB
VmallocChunk: 34359474615 kB
AnonHugePages: 0 kB
DirectMap4k: 33536 kB
DirectMap2M: 1013760 kB
# sysctl -a | grep mem
net.core.optmem_max = 20480
net.core.rmem_default = 229376
net.core.rmem_max = 131071
net.core.wmem_default = 229376
net.core.wmem_max = 131071
net.ipv4.igmp_max_memberships = 20
net.ipv4.tcp_mem = 22350 29801 44700
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.udp_mem = 24150 32202 48300
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
vm.lowmem_reserve_ratio = 256 256 32
vm.overcommit_memory = 0

The sysctl memory parameters are system defaults; I haven't changed them
via sysctl or /proc interfaces.
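
To see how close UDP memory usage is to the udp_mem thresholds listed above,
the "mem" field of the UDP line in /proc/net/sockstat (counted in pages) can
be compared with the three udp_mem values (min, pressure, max, also in
pages). A minimal Python sketch follows. The function names are hypothetical,
the sockstat sample is made up, and only the udp_mem values are taken from
the sysctl output above.

```python
def udp_pages_in_use(sockstat_text):
    """Return the 'mem' value (pages) from the UDP line of /proc/net/sockstat."""
    for line in sockstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "UDP:":
            # The rest of the line is "key value key value ...".
            pairs = dict(zip(fields[1::2], fields[2::2]))
            return int(pairs["mem"])
    raise ValueError("no UDP line found")


def udp_mem_status(sockstat_text, udp_mem_text):
    """Compare current UDP page usage with the udp_mem thresholds."""
    used = udp_pages_in_use(sockstat_text)
    minimum, pressure, maximum = (int(v) for v in udp_mem_text.split())
    return {
        "used_pages": used,
        "min": minimum,
        "pressure": pressure,
        "max": maximum,
        "over_pressure": used > pressure,
        "fraction_of_max": used / maximum,
    }


# Illustrative sockstat content; on a real system read /proc/net/sockstat
# and /proc/sys/net/ipv4/udp_mem instead.
SAMPLE_SOCKSTAT = "sockets: used 120\nUDP: inuse 14 mem 47000\n"
DEFAULT_UDP_MEM = "24150 32202 48300"  # defaults reported in this message

print(udp_mem_status(SAMPLE_SOCKSTAT, DEFAULT_UDP_MEM))
```

If "used_pages" stays near "max" while few sockets are in use, that would
point at accounting that is never released rather than at real buffering.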

I tried increasing the udp_mem values to the following:
net.ipv4.udp_mem = 100000 150000 200000
This solved my issue, at least for a while: DNS queries are working
fine now.

But I suspect that there is a memory leak in the kernel UDP stack,
because this issue never happens right after a reboot and always appears
after about a week of network operation. So this increase should help only
for a month or so, if the leak is linear.
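
The "month or so" figure can be reproduced with simple arithmetic. The
Python sketch below assumes that usage grows linearly from zero, that the
old maximum of 48300 pages was reached after roughly one week, and that
pages are 4 KiB. All three are assumptions, not measurements, and the
function names are hypothetical.

```python
def weeks_until_limit(old_max_pages, new_max_pages, weeks_to_hit_old=1.0):
    """Estimate the time to reach new_max_pages under linear growth.

    Assumes usage grows from zero at a constant rate and hit
    old_max_pages after weeks_to_hit_old weeks.
    """
    rate_pages_per_week = old_max_pages / weeks_to_hit_old
    return new_max_pages / rate_pages_per_week


def pages_to_mib(pages, page_size=4096):
    """Convert a page count to MiB (page_size in bytes, 4 KiB assumed)."""
    return pages * page_size / (1024 * 1024)


old_max, new_max = 48300, 200000  # third udp_mem value before and after
print("estimated weeks until the new limit:", weeks_until_limit(old_max, new_max))
print("old max in MiB:", pages_to_mib(old_max))
print("new max in MiB:", pages_to_mib(new_max))
```

Under these assumptions the new limit is about four times the old one, hence
roughly four weeks. Note that with 4 KiB pages the new maximum corresponds to
781.25 MiB, which is a large share of the MemTotal shown above.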

If you need any memory debug information, let me know what data to collect
and which tools to use.

Best regards,
Andrew Savchenko