From: "Holger Hoffstaette" <holger.hoffstaette@googlemail.com>
To: netdev@vger.kernel.org
Subject: Re: Network hangs with 2.6.30.5
Date: Thu, 03 Sep 2009 21:55:39 +0200 [thread overview]
Message-ID: <pan.2009.09.03.19.55.37.672875@googlemail.com> (raw)
In-Reply-To: 4AA0188C.20107@gmail.com
On Thu, 03 Sep 2009 21:27:08 +0200, Eric Dumazet wrote:
> Holger Hoffstaette a écrit :
>> Problem found! At least for me..
>>
>>> On 01-09-2009 17:32, Holger Hoffstaette wrote:
>>>> On Tue, 01 Sep 2009 16:17:08 +0200, Holger Hoffstaette wrote:
>>>>
>>>> [network regressions in .30]
>>
>> I got the git .30.y stable tree and reverted various e1000 commits that
>> seemed to coincide with the various .30-rc releases but nothing helped.
>> Also no relation to offloads etc.
>>
>> However I did notice that the "stuck squid" problem seemed to magically
>> fix itself after a few seconds - then hang again, fix itself after
>> timeouts etc. So I suspected something TCP related and BINGO!
>>
>> Turns out I had both tcp_tw_recycle and tcp_tw_reuse set to 1 for
>> reasons I don't want to explain. :)
>>
>> I can now arbitrarily fix the hanging behaviour by setting
>> tcp_tw_recycle to 0, and cause hangs by setting it to 1 again. For
>> obvious reasons this seems to affect squid more than other tasks with
>> more long-lived connections. What is the right behaviour? beats me.
>>
>> tcp_tw_reuse does not appear to play a role, so the real culprit at
>> least in my case seems to be tcp_tw_recycle. In previous releases this
>> (and tw_reuse) was necessary for various server tasks.
>>
>> Nevertheless, something has changed between .29 and .30 that "broke" the
>> previous behaviour. Whether this is progress or an regression I cannot
>> say. Maybe someone else has an idea?
>>
>>
> Well... not yet :)
>
> We probably can reproduce this problem with any NIC...
>
> Could you send from the 'buggy' setup
>
> $ grep . /proc/sys/net/ipv4/*
Sure:
root>grep . /proc/sys/net/ipv4/*
grep: /proc/sys/net/ipv4/conf: Invalid argument
/proc/sys/net/ipv4/icmp_echo_ignore_all:0
/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1
/proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0
/proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1
/proc/sys/net/ipv4/icmp_ratelimit:1000
/proc/sys/net/ipv4/icmp_ratemask:6168
/proc/sys/net/ipv4/igmp_max_memberships:20
/proc/sys/net/ipv4/igmp_max_msf:10
/proc/sys/net/ipv4/inet_peer_gc_maxtime:120
/proc/sys/net/ipv4/inet_peer_gc_mintime:10
/proc/sys/net/ipv4/inet_peer_maxttl:600
/proc/sys/net/ipv4/inet_peer_minttl:120
/proc/sys/net/ipv4/inet_peer_threshold:65664
/proc/sys/net/ipv4/ip_default_ttl:64
/proc/sys/net/ipv4/ip_dynaddr:0
/proc/sys/net/ipv4/ip_forward:0
/proc/sys/net/ipv4/ip_local_port_range:32768 61000
/proc/sys/net/ipv4/ip_no_pmtu_disc:0
/proc/sys/net/ipv4/ip_nonlocal_bind:0
/proc/sys/net/ipv4/ipfrag_high_thresh:8388608
/proc/sys/net/ipv4/ipfrag_low_thresh:1048575
/proc/sys/net/ipv4/ipfrag_max_dist:64
/proc/sys/net/ipv4/ipfrag_secret_interval:600
/proc/sys/net/ipv4/ipfrag_time:30
grep: /proc/sys/net/ipv4/neigh: Invalid argument
grep: /proc/sys/net/ipv4/route: Invalid argument
/proc/sys/net/ipv4/rt_cache_rebuild_count:4
/proc/sys/net/ipv4/tcp_abc:0
/proc/sys/net/ipv4/tcp_abort_on_overflow:0
/proc/sys/net/ipv4/tcp_adv_win_scale:2
/proc/sys/net/ipv4/tcp_allowed_congestion_control:cubic reno
/proc/sys/net/ipv4/tcp_app_win:31
/proc/sys/net/ipv4/tcp_available_congestion_control:cubic reno
/proc/sys/net/ipv4/tcp_base_mss:512
/proc/sys/net/ipv4/tcp_congestion_control:cubic
/proc/sys/net/ipv4/tcp_dsack:1
/proc/sys/net/ipv4/tcp_ecn:0
/proc/sys/net/ipv4/tcp_fack:1
/proc/sys/net/ipv4/tcp_fin_timeout:60
/proc/sys/net/ipv4/tcp_frto:2
/proc/sys/net/ipv4/tcp_frto_response:0
/proc/sys/net/ipv4/tcp_keepalive_intvl:75
/proc/sys/net/ipv4/tcp_keepalive_probes:9
/proc/sys/net/ipv4/tcp_keepalive_time:7200
/proc/sys/net/ipv4/tcp_low_latency:0
/proc/sys/net/ipv4/tcp_max_orphans:16384
/proc/sys/net/ipv4/tcp_max_ssthresh:0
/proc/sys/net/ipv4/tcp_max_syn_backlog:1024
/proc/sys/net/ipv4/tcp_max_tw_buckets:180000
/proc/sys/net/ipv4/tcp_mem:82944 110592 165888
/proc/sys/net/ipv4/tcp_moderate_rcvbuf:1
/proc/sys/net/ipv4/tcp_mtu_probing:0
/proc/sys/net/ipv4/tcp_no_metrics_save:0
/proc/sys/net/ipv4/tcp_orphan_retries:0
/proc/sys/net/ipv4/tcp_reordering:3
/proc/sys/net/ipv4/tcp_retrans_collapse:1
/proc/sys/net/ipv4/tcp_retries1:3
/proc/sys/net/ipv4/tcp_retries2:15
/proc/sys/net/ipv4/tcp_rfc1337:0
/proc/sys/net/ipv4/tcp_rmem:4096 262144 8388608
/proc/sys/net/ipv4/tcp_sack:1
/proc/sys/net/ipv4/tcp_slow_start_after_idle:1
/proc/sys/net/ipv4/tcp_stdurg:0
/proc/sys/net/ipv4/tcp_syn_retries:5
/proc/sys/net/ipv4/tcp_synack_retries:5
/proc/sys/net/ipv4/tcp_timestamps:1
/proc/sys/net/ipv4/tcp_tso_win_divisor:3
/proc/sys/net/ipv4/tcp_tw_recycle:1
/proc/sys/net/ipv4/tcp_tw_reuse:1
/proc/sys/net/ipv4/tcp_window_scaling:1
/proc/sys/net/ipv4/tcp_wmem:4096 262144 8388608
/proc/sys/net/ipv4/tcp_workaround_signed_windows:0
/proc/sys/net/ipv4/udp_mem:82944 110592 165888
/proc/sys/net/ipv4/udp_rmem_min:4096
/proc/sys/net/ipv4/udp_wmem_min:4096
root>
> When you say squid is stuck, does it mean it doesnt accept new connections
> ?
Yes, that seems to be the behaviour. To verify - I just browse any random
website and after a few requests the browser does not get any replies any
longer, timeouts etc. After setting tw_recycle to 0 it immediately starts
working again (and then continues to work).
Took a bit to find out in which direction things were hanging/getting
stuck, since it was not immediately clear from simply observing the client.
> Could help to strace it and check what it is doing ? --
After a quite frantic start (endless amount of 54 byte writes??) it sits
in its epoll loop and waits:
..
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007105, 637264}, NULL) = 0
gettimeofday({1252007105, 637309}, NULL) = 0
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007106, 637262}, NULL) = 0
gettimeofday({1252007106, 637308}, NULL) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007106, 637389}, NULL) = 0
gettimeofday({1252007106, 637421}, NULL) = 0
epoll_wait(4, {}, 8192, 1000) = 0
gettimeofday({1252007107, 637266}, NULL) = 0
gettimeofday({1252007107, 637311}, NULL) = 0
..etc..
and occasionally diddles with its cache directory:
..
epoll_wait(4, {}, 8192, 997) = 0
gettimeofday({1252007148, 554039}, NULL) = 0
gettimeofday({1252007148, 554097}, NULL) = 0
open("/var/cache/squid/02/09", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 9
getdents64(9, /* 113 entries */, 32768) = 3600
getdents64(9, /* 0 entries */, 32768) = 0
close(9) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007148, 555923}, NULL) = 0
..
When requests come in, it does whatever squid does, but when it hangs, the
following popped up:
gettimeofday({1252007207, 116223}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=23, u64=582647025628086295}}) = 0
gettimeofday({1252007207, 116380}, NULL) = 0
epoll_wait(4, {{EPOLLIN, {u32=15, u64=582647025628086287}}}, 8192, 359) = 1
gettimeofday({1252007207, 117309}, NULL) = 0
read(15, "48,33,48,31,40,31\" href=\"/2/hi/af"..., 2046) = 581
gettimeofday({1252007207, 117476}, NULL) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=23, u64=23}}) = 0
epoll_wait(4, {{EPOLLIN, {u32=15, u64=582647025628086287}}, {EPOLLOUT, {u32=23, u64=23}}}, 8192, 358) = 2
gettimeofday({1252007207, 117790}, NULL) = 0
read(15, ""..., 2046) = 0
write(23, "48,33,48,31,40,31\" href=\"/2/hi/af"..., 581) = 581
gettimeofday({1252007207, 118032}, NULL) = 0
read(15, ""..., 65535) = 0
read(23, 0xbf925401, 65535) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(4, EPOLL_CTL_DEL, 15, {0, {u32=15, u64=15}}) = 0
close(15) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 23, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=23, u64=581683939341500439}}) = 0
epoll_ctl(4, EPOLL_CTL_DEL, 23, {0, {u32=23, u64=581683939341500439}}) = 0
close(23) = 0
epoll_wait(4, {}, 8192, 357) = 0
gettimeofday({1252007207, 477363}, NULL) = 0
gettimeofday({1252007207, 477408}, NULL) = 0
epoll_wait(4, {}, 8192, 0) = 0
gettimeofday({1252007207, 477561}, NULL) = 0
gettimeofday({1252007207, 477629}, NULL) = 0
epoll_wait(4, {}, 8192, 449) = 0
Then it again spins in the epoll loop for a while until it recovers and
starts to handle the outstanding requests (as far as I can read - it
scrolls by too quickly).
Was that somewhat helpful? I can certainly create a full trace but that's
going to be big.
regards
Holger
next prev parent reply other threads:[~2009-09-03 19:56 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-01 9:50 Network hangs with 2.6.30.5 Clifford Heath
2009-09-01 10:47 ` Eric Dumazet
2009-09-01 11:20 ` Ben Hutchings
2009-09-01 14:17 ` Holger Hoffstaette
2009-09-01 15:32 ` Holger Hoffstaette
2009-09-03 7:46 ` Jarek Poplawski
2009-09-03 19:20 ` Holger Hoffstaette
2009-09-03 19:27 ` Eric Dumazet
2009-09-03 19:55 ` Holger Hoffstaette [this message]
2009-09-07 7:21 ` Jarek Poplawski
2009-09-10 22:41 ` Clifford Heath
2009-10-01 22:49 ` David Miller
2009-10-02 8:11 ` Ilpo Järvinen
2009-10-02 12:29 ` Ilpo Järvinen
2009-10-02 12:38 ` Eric Dumazet
2009-11-19 23:40 ` David Miller
2009-11-20 12:04 ` Evgeniy Polyakov
2009-11-20 12:09 ` Ilpo Järvinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pan.2009.09.03.19.55.37.672875@googlemail.com \
--to=holger.hoffstaette@googlemail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.