From: "Holger Hoffstaette" <holger.hoffstaette@googlemail.com>
To: netdev@vger.kernel.org
Subject: Re: Network hangs with 2.6.30.5
Date: Thu, 03 Sep 2009 21:55:39 +0200
Message-ID: <pan.2009.09.03.19.55.37.672875@googlemail.com>
In-Reply-To: 4AA0188C.20107@gmail.com
On Thu, 03 Sep 2009 21:27:08 +0200, Eric Dumazet wrote:
> Holger Hoffstaette wrote:
>> Problem found! At least for me..
>>
>>> On 01-09-2009 17:32, Holger Hoffstaette wrote:
>>>> On Tue, 01 Sep 2009 16:17:08 +0200, Holger Hoffstaette wrote:
>>>>
>>>> [network regressions in .30]
>>
>> I got the git .30.y stable tree and reverted various e1000 commits that
>> seemed to coincide with the various .30-rc releases but nothing helped.
>> Also no relation to offloads etc.
>>
>> However I did notice that the "stuck squid" problem seemed to magically
>> fix itself after a few seconds - then hang again, fix itself after
>> timeouts etc. So I suspected something TCP related and BINGO!
>>
>> Turns out I had both tcp_tw_recycle and tcp_tw_reuse set to 1 for
>> reasons I don't want to explain. :)
>>
>> I can now arbitrarily fix the hanging behaviour by setting
>> tcp_tw_recycle to 0, and cause hangs by setting it to 1 again. For
>> obvious reasons this seems to affect squid more than other tasks with
>> more long-lived connections. What is the right behaviour? Beats me.
>>
>> tcp_tw_reuse does not appear to play a role, so the real culprit at
>> least in my case seems to be tcp_tw_recycle. In previous releases this
>> (and tw_reuse) was necessary for various server tasks.
>>
>> Nevertheless, something has changed between .29 and .30 that "broke" the
>> previous behaviour. Whether this is progress or a regression I cannot
>> say. Maybe someone else has an idea?
>>
>>
> Well... not yet :)
>
> We probably can reproduce this problem with any NIC...
>
> Could you send from the 'buggy' setup
>
> $ grep . /proc/sys/net/ipv4/*

Sure:

root>grep . /proc/sys/net/ipv4/*
grep: /proc/sys/net/ipv4/conf: Invalid argument
/proc/sys/net/ipv4/icmp_echo_ignore_all:0
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0
/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1
/proc/sys/net/ipv4/icmp_ratelimit:1000
/proc/sys/net/ipv4/icmp_ratemask:6168
/proc/sys/net/ipv4/igmp_max_memberships:20
/proc/sys/net/ipv4/igmp_max_msf:10
/proc/sys/net/ipv4/inet_peer_gc_maxtime:120
/proc/sys/net/ipv4/inet_peer_gc_mintime:10
/proc/sys/net/ipv4/inet_peer_maxttl:600
/proc/sys/net/ipv4/inet_peer_minttl:120
/proc/sys/net/ipv4/inet_peer_threshold:65664
/proc/sys/net/ipv4/ip_default_ttl:64
/proc/sys/net/ipv4/ip_dynaddr:0
/proc/sys/net/ipv4/ip_forward:0
/proc/sys/net/ipv4/ip_local_port_range:32768 61000
/proc/sys/net/ipv4/ip_no_pmtu_disc:0
/proc/sys/net/ipv4/ip_nonlocal_bind:0
/proc/sys/net/ipv4/ipfrag_high_thresh:8388608
/proc/sys/net/ipv4/ipfrag_low_thresh:1048575
/proc/sys/net/ipv4/ipfrag_max_dist:64
/proc/sys/net/ipv4/ipfrag_secret_interval:600
/proc/sys/net/ipv4/ipfrag_time:30
grep: /proc/sys/net/ipv4/neigh: Invalid argument
grep: /proc/sys/net/ipv4/route: Invalid argument
/proc/sys/net/ipv4/rt_cache_rebuild_count:4
/proc/sys/net/ipv4/tcp_abc:0
/proc/sys/net/ipv4/tcp_abort_on_overflow:0
/proc/sys/net/ipv4/tcp_adv_win_scale:2
/proc/sys/net/ipv4/tcp_allowed_congestion_control:cubic reno
/proc/sys/net/ipv4/tcp_app_win:31
/proc/sys/net/ipv4/tcp_available_congestion_control:cubic reno
/proc/sys/net/ipv4/tcp_base_mss:512
/proc/sys/net/ipv4/tcp_congestion_control:cubic
/proc/sys/net/ipv4/tcp_dsack:1
/proc/sys/net/ipv4/tcp_ecn:0
/proc/sys/net/ipv4/tcp_fack:1
/proc/sys/net/ipv4/tcp_fin_timeout:60
/proc/sys/net/ipv4/tcp_frto:2
/proc/sys/net/ipv4/tcp_frto_response:0
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
/proc/sys/net/ipv4/tcp_low_latency:0
/proc/sys/net/ipv4/tcp_max_orphans:16384
/proc/sys/net/ipv4/tcp_max_ssthresh:0
/proc/sys/net/ipv4/tcp_max_syn_backlog:1024
/proc/sys/net/ipv4/tcp_max_tw_buckets:180000
/proc/sys/net/ipv4/tcp_mem:82944 110592 165888
/proc/sys/net/ipv4/tcp_moderate_rcvbuf:1
/proc/sys/net/ipv4/tcp_mtu_probing:0
/proc/sys/net/ipv4/tcp_no_metrics_save:0
/proc/sys/net/ipv4/tcp_orphan_retries:0
/proc/sys/net/ipv4/tcp_reordering:3
/proc/sys/net/ipv4/tcp_retrans_collapse:1
/proc/sys/net/ipv4/tcp_retries1:3
/proc/sys/net/ipv4/tcp_retries2:15
/proc/sys/net/ipv4/tcp_rfc1337:0
/proc/sys/net/ipv4/tcp_rmem:4096 262144 8388608
/proc/sys/net/ipv4/tcp_sack:1
/proc/sys/net/ipv4/tcp_slow_start_after_idle:1
/proc/sys/net/ipv4/tcp_stdurg:0
/proc/sys/net/ipv4/tcp_syn_retries:5
/proc/sys/net/ipv4/tcp_synack_retries:5
/proc/sys/net/ipv4/tcp_timestamps:1
/proc/sys/net/ipv4/tcp_tso_win_divisor:3
/proc/sys/net/ipv4/tcp_tw_recycle:1
/proc/sys/net/ipv4/tcp_tw_reuse:1
/proc/sys/net/ipv4/tcp_window_scaling:1
/proc/sys/net/ipv4/tcp_wmem:4096 262144 8388608
/proc/sys/net/ipv4/tcp_workaround_signed_windows:0
/proc/sys/net/ipv4/udp_mem:82944 110592 165888
/proc/sys/net/ipv4/udp_rmem_min:4096
/proc/sys/net/ipv4/udp_wmem_min:4096
root>
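
A dump like the one above is long, so it can help to pull out only the
settings under discussion. The following is a minimal sketch, not part of
the original exchange: the helper name parse_sysctl_dump and the shortened
SAMPLE text are illustrative assumptions, and the sample lines are copied
from the output above.

```python
def parse_sysctl_dump(text):
    """Parse `grep . /proc/sys/net/ipv4/*` output into a {name: value} dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip grep's own error lines for directories (conf, neigh, route).
        if not line.startswith("/proc/sys/"):
            continue
        path, _, value = line.partition(":")
        # Use the last path component as the setting name.
        settings[path.rsplit("/", 1)[-1]] = value
    return settings


# Shortened excerpt of the dump above (hypothetical variable name).
SAMPLE = """\
grep: /proc/sys/net/ipv4/conf: Invalid argument
/proc/sys/net/ipv4/tcp_rmem:4096 262144 8388608
/proc/sys/net/ipv4/tcp_tw_recycle:1
/proc/sys/net/ipv4/tcp_tw_reuse:1
"""

settings = parse_sysctl_dump(SAMPLE)
for name in ("tcp_tw_recycle", "tcp_tw_reuse"):
    print(name, "=", settings.get(name))
```

Multi-value entries such as tcp_rmem are kept as a single string here; split
them on whitespace if the individual numbers are needed.
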
> When you say squid is stuck, does it mean it doesn't accept new
> connections?
Yes, that seems to be the behaviour. To verify, I just browse any random
website; after a few requests the browser no longer gets any replies and
runs into timeouts. After setting tw_recycle to 0 it immediately starts
working again (and then continues to work).
Took a bit to find out in which direction things were hanging/getting
stuck, since it was not immediately clear from simply observing the client.
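
The toggling described above amounts to writing 0 or 1 to the
tcp_tw_recycle file. Below is a hedged sketch of that step; the function
names read_sysctl and write_sysctl and the `base` parameter are assumptions
made for illustration. The demo runs against a scratch directory so that
nothing on the host is changed.

```python
import os
import tempfile


def read_sysctl(name, base="/proc/sys/net/ipv4"):
    """Return the current value of a sysctl file as a stripped string."""
    with open(os.path.join(base, name)) as f:
        return f.read().strip()


def write_sysctl(name, value, base="/proc/sys/net/ipv4"):
    """Write a new value; on the real /proc tree this needs root."""
    with open(os.path.join(base, name), "w") as f:
        f.write(f"{value}\n")


# Demonstrate against a scratch directory instead of the real /proc tree.
scratch = tempfile.mkdtemp()
write_sysctl("tcp_tw_recycle", 1, base=scratch)  # setting that caused hangs here
before = read_sysctl("tcp_tw_recycle", base=scratch)
write_sysctl("tcp_tw_recycle", 0, base=scratch)  # setting that worked here
after = read_sysctl("tcp_tw_recycle", base=scratch)
print(before, "->", after)
```

On an affected machine the same calls would be made as root with the
default `base`; the tcp_tw_recycle file may not exist on every kernel
version.
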
> Could help to strace it and check what it is doing?
After a rather frantic start (an endless number of 54-byte writes??) it
sits in its epoll loop and waits:
..
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007105, 637264}, NULL) = 0
gettimeofday({1252007105, 637309}, NULL) = 0
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007106, 637262}, NULL) = 0
gettimeofday({1252007106, 637308}, NULL) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007106, 637389}, NULL) = 0
gettimeofday({1252007106, 637421}, NULL) = 0
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007107, 637266}, NULL) = 0
gettimeofday({1252007107, 637311}, NULL) = 0
..etc..
and occasionally diddles with its cache directory:
..
epoll_wait(4, {}, 8192, 997) = 0
gettimeofday({1252007148, 554039}, NULL) = 0
gettimeofday({1252007148, 554097}, NULL) = 0
open("/var/cache/squid/02/09", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 9
getdents64(9, /* 113 entries */, 32768) = 3600
getdents64(9, /* 0 entries */, 32768) = 0
close(9) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007148, 555923}, NULL) = 0
..
When requests come in, it does whatever squid does, but when it hangs, the
following pops up:
gettimeofday({1252007207, 116223}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=23, u64=582647025628086295}}) = 0
gettimeofday({1252007207, 116380}, NULL) = 0
epoll_wait(4, {{EPOLLIN, {u32=15, u64=582647025628086287}}}, 8192, 359) = 1
gettimeofday({1252007207, 117309}, NULL) = 0
read(15, "48,33,48,31,40,31\" href=\"/2/hi/af"..., 2046) = 581
gettimeofday({1252007207, 117476}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=23, u64=23}}) = 0
epoll_wait(4, {{EPOLLIN, {u32=15, u64=582647025628086287}}, {EPOLLOUT, {u32=23, u64=23}}}, 8192, 358) = 2
gettimeofday({1252007207, 117790}, NULL) = 0
read(15, ""..., 2046) = 0
write(23, "48,33,48,31,40,31\" href=\"/2/hi/af"..., 581) = 581
gettimeofday({1252007207, 118032}, NULL) = 0
read(15, ""..., 65535) = 0
read(23, 0xbf925401, 65535) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(4, EPOLL_CTL_DEL, 15, {0, {u32=15, u64=15}}) = 0
close(15) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=23, u64=581683939341500439}}) = 0
epoll_ctl(4, EPOLL_CTL_DEL, 23, {0, {u32=23, u64=581683939341500439}}) = 0
close(23) = 0
epoll_wait(4, {}, 8192, 357) = 0
gettimeofday({1252007207, 477363}, NULL) = 0
gettimeofday({1252007207, 477408}, NULL) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007207, 477561}, NULL) = 0
gettimeofday({1252007207, 477629}, NULL) = 0
epoll_wait(4, {}, 8192, 449) = 0
Then it again spins in the epoll loop for a while until it recovers and
starts to handle the outstanding requests (as far as I can read - it
scrolls by too quickly).
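
Because the trace scrolls by too quickly to read, a small script can
summarise it instead. This is only a sketch under assumptions: it expects
strace output saved as text in the format shown above, and the name
summarize_epoll and the SAMPLE excerpt are illustrative. It counts how
often epoll_wait returned with no events versus with at least one.

```python
import re

# Matches an epoll_wait line and captures its return value.
EPOLL_RE = re.compile(r"^epoll_wait\(.*\)\s*=\s*(-?\d+)")


def summarize_epoll(trace_text):
    """Return (idle, busy): epoll_wait calls with zero events vs. some events."""
    idle = busy = 0
    for line in trace_text.splitlines():
        match = EPOLL_RE.match(line.strip())
        if not match:
            continue  # not an epoll_wait call
        result = int(match.group(1))
        if result == 0:
            idle += 1
        elif result > 0:
            busy += 1
    return idle, busy


# Lines copied from the trace above (hypothetical variable name).
SAMPLE = """\
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007105, 637264}, NULL) = 0
epoll_wait(4, {{EPOLLIN, {u32=15, u64=582647025628086287}}}, 8192, 359) = 1
epoll_wait(4, {}, 8192, 357) = 0
"""

idle, busy = summarize_epoll(SAMPLE)
print("idle:", idle, "busy:", busy)
```

A long run of idle returns while clients are waiting would match the
"sits in its epoll loop" behaviour described above.
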
Was that somewhat helpful? I can certainly create a full trace but that's
going to be big.
regards
Holger