* TCP default settings (bugzilla)
@ 2026-04-15 14:14 Stephen Hemminger
0 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2026-04-15 14:14 UTC (permalink / raw)
To: netdev
A pair of TCP configuration-related bug reports just showed up in bugzilla.
Getting the right time values here seems like a trade-off between fast
failover and not dropping crappy connections.
Given how well formatted the bugs are, they look AI generated.
https://bugzilla.kernel.org/show_bug.cgi?id=221366
The default value of net.ipv4.tcp_retries2 (15 retries, resulting in
~924 seconds / ~15.4 minutes before TCP abandons a dead connection) is
far too high for modern data center environments. When a remote host
becomes unreachable (server crash, failover, network partition),
applications are stuck for up to 16 minutes before receiving an error
and taking recovery action. This causes cascading failures, connection
pool exhaustion, and prolonged service outages.
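The ~924 s figure can be reproduced from the retransmission backoff: the RTO doubles on every retry, starting near TCP_RTO_MIN (200 ms) and capped at TCP_RTO_MAX (120 s). A minimal sketch (the helper name and the assumption of a minimal initial RTO are illustrative, not from the report):

```python
# Sum the waits before TCP gives up: one RTO after the initial send,
# plus one after each of `retries` retransmissions, with the RTO
# doubling each time and capped at rto_max. Assumes the minimal
# initial RTO of 200 ms; a measured RTT only makes the total longer.
def time_to_abort(retries, rto=0.2, rto_max=120.0):
    total = 0.0
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total

print(round(time_to_abort(15), 1))  # 924.6 -- the ~15.4 minutes above
print(round(time_to_abort(7), 1))   # 51.0  -- well under a minute
```

Whether ~15 minutes of retrying is robustness or an outage is exactly the trade-off in question.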
https://bugzilla.kernel.org/show_bug.cgi?id=221365
The default value of net.ipv4.tcp_keepalive_time (7200 seconds / 2
hours) is incompatible with virtually all modern network
infrastructure, causing silent connection failures. Intermediate
stateful devices (load balancers, firewalls, NAT gateways) routinely
expire idle TCP connections after 300-1800 seconds — long before the
first keepalive probe is ever sent.
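For scale: with the stock sysctls (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9), the worst-case time to declare an idle peer dead is even longer than two hours. A quick sketch (helper name is illustrative):

```python
# Worst case for keepalive to declare a peer dead: idle time before
# the first probe, then `probes` unanswered probes spaced `intvl`
# seconds apart. Defaults here are the Linux sysctl defaults.
def keepalive_detection_time(idle=7200, intvl=75, probes=9):
    return idle + probes * intvl

print(keepalive_detection_time())            # 7875 s (~2h11m) with defaults
print(keepalive_detection_time(300, 30, 3))  # 390 s with aggressive tuning
```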
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: TCP default settings (bugzilla)
@ 2026-04-17 5:58 plantegg ren
0 siblings, 0 replies; 4+ messages in thread
From: plantegg ren @ 2026-04-17 5:58 UTC (permalink / raw)
To: stephen; +Cc: netdev
Hi Stephen,
I'm the reporter of those two bugs. I'm a DBA and Linux SRE with over
10 years at Alibaba Cloud (Aliyun).
These come from real production pain, not just theory. During my time at
Alibaba Cloud, I pushed to change the default tcp_retries2 from 15 to 7
in Alibaba Cloud Linux 3 (ALinux3) — our in-house distro serving millions
of ECS instances. That change alone eliminated a whole class of prolonged
outages across the fleet.
The most memorable case: MySQL crashed and restarted in seconds, but the
application tier stayed down for ~16 minutes because all existing
connections were stuck in retransmission. After changing tcp_retries2 from
15 to 5, recovery time dropped from 957s to about 20s.
The tcp_keepalive_time issue bit us through LVS — connections silently
dropped after 900s of idle time, but TCP didn't notice until 7200s later.
We spent days chasing "random" Connection Reset errors across dozens of
services before tracing it to this mismatch.
Every ops team I've talked to ends up applying these tweaks independently
after getting burned. If a major cloud distro already ships tcp_retries2=7,
maybe it's time for upstream to reconsider the default too.
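For concreteness, the tweaks in question as they typically land in a drop-in sysctl file. The values are illustrative only (this thread itself uses both 5 and 7 for tcp_retries2), not a recommendation:

```
# /etc/sysctl.d/90-tcp-timeouts.conf (example values only)
# Abort dead connections after tens of seconds instead of ~15 minutes
net.ipv4.tcp_retries2 = 7
# Probe idle connections well before typical middlebox idle
# timeouts (300-1800 s per the report above)
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
```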
I did use AI to help format the bug reports (guilty as charged), but the
problems and the data are from years of production experience.
Thanks for forwarding to the list.
Xijun Ren
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: TCP default settings (bugzilla)
@ 2026-04-17 7:01 plantegg ren
2026-04-17 7:33 ` Willy Tarreau
0 siblings, 1 reply; 4+ messages in thread
From: plantegg ren @ 2026-04-17 7:01 UTC (permalink / raw)
To: stephen; +Cc: netdev
Hi,
One more real-world data point that just happened two weeks ago,
directly related to tcp_keepalive_time.
AWS recently rolled out Nitro V6 (8th-gen EC2 instances) which reduced
the ENI connection tracking timeout from 432000 seconds (5 days) to
just 350 seconds:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-connection-tracking.html
Our MySQL/HikariCP connection pools started seeing intermittent timeout
errors every 20-30 minutes after migrating to 8th-gen instances. We
captured packets on both client and server simultaneously. Here is what
we found on a single connection (idle for 818 seconds, well past the
350-second ENI timeout):
Server side -- MySQL receives the request and sends responses normally:
#270 71.51s 10.23.99.71 -> 172.20.64.240 [ACK] last activity
~~~ connection idle for 818 seconds ~~~
#271 889.94s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 client
request arrives
#272 889.94s 172.20.64.240 -> 10.23.99.71 [PSH,ACK] len=11 server
responds OK
#275 890.15s 172.20.64.240 -> 10.23.99.71 [PSH,ACK] len=11 server
retransmits
#278 890.59s 172.20.64.240 -> 10.23.99.71 [PSH,ACK] len=11 server
retransmits
#281 891.02s 172.20.64.240 -> 10.23.99.71 [PSH,ACK] len=11 server
retransmits
... (server keeps retransmitting, client never ACKs)
Client side -- sends request, but NEVER receives any server response:
#267 71.51s 10.23.99.71 -> 172.20.64.240 [ACK] last activity
~~~ connection idle for 818 seconds ~~~
#268 889.94s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 sends request
#269 890.15s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 retransmit 1
#270 890.37s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 retransmit 2
#271 890.79s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 retransmit 3
#272 891.65s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 retransmit 4
#273 893.38s 10.23.99.71 -> 172.20.64.240 [PSH,ACK] len=5 retransmit 5
#274 894.94s 10.23.99.71 -> 172.20.64.240 [FIN,ACK] gives up
Zero packets from 172.20.64.240 after the idle gap. Zero RSTs.
The ENI silently drops all inbound packets (server -> client) because
the connection tracking entry expired after 350 seconds. Outbound
packets (client -> server) still pass through, so the server receives
the request and responds -- but its responses are black-holed by the
ENI. No RST is sent, so both sides are completely unaware.
If tcp_keepalive_time were lower than 350 seconds, the keepalive probes
would have kept the ENI tracking entry alive, and none of this would
have happened.
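Applications that can't wait for a sysctl change can also arm keepalive per socket. A sketch of the Linux-side socket options (the 300/30/3 values are chosen here to stay under the 350 s timeout; they are not from AWS's docs):

```python
import socket

# Enable keepalive on this socket and probe before the middlebox's
# 350 s conntrack entry can expire (Linux-specific option names).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300)  # first probe after 300 s idle
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # then every 30 s
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # drop after 3 missed probes
```

Whether a pool like HikariCP can reach these options depends on the JDBC driver exposing them; the system-wide sysctl remains the blunt fallback.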
The trend is clear -- middlebox idle timeouts are getting shorter (AWS
went from 432000s to 350s overnight), while tcp_keepalive_time has
stayed at 7200 seconds for decades. The gap is widening.
Xijun
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: TCP default settings (bugzilla)
2026-04-17 7:01 plantegg ren
@ 2026-04-17 7:33 ` Willy Tarreau
0 siblings, 0 replies; 4+ messages in thread
From: Willy Tarreau @ 2026-04-17 7:33 UTC (permalink / raw)
To: plantegg ren; +Cc: stephen, netdev
On Fri, Apr 17, 2026 at 03:01:08PM +0800, plantegg ren wrote:
> Hi,
>
> One more real-world data point that just happened two weeks ago,
> directly related to tcp_keepalive_time.
> [...]
>
> The trend is clear -- middlebox idle timeouts are getting shorter (AWS
> went from 432000s to 350s overnight), while tcp_keepalive_time has
> stayed at 7200 seconds for decades. The gap is widening.
It's up to the application to configure keepalive if it is relying on
long-lived connections; that's done per socket with SO_KEEPALIVE plus
TCP_KEEPIDLE/TCP_KEEPINTVL. And if you're dealing with an application
that doesn't expose the setting, you indeed still have access to the
system-wide sysctls above.
It's been well-known for at least two decades that no middle box could
sanely keep idle connections forever with the amount of traffic they're
seeing. 25 years ago I was already tuning the conntrack timeouts for a
bank firewall that was dealing with only 6k connections per second so
as to stay within reasonable memory sizes while keeping a good quality
of service. There's nothing new here.
Willy
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-04-17 7:33 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-15 14:14 TCP default settings (bugzilla) Stephen Hemminger
-- strict thread matches above, loose matches on Subject: below --
2026-04-17 5:58 plantegg ren
2026-04-17 7:01 plantegg ren
2026-04-17 7:33 ` Willy Tarreau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox