From: Kuniyuki Iwashima <kuniyu@google.com>
To: gopalmalaviya53@gmail.com
Cc: netdev@vger.kernel.org
Subject: Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
Date: Tue, 20 Jan 2026 21:47:16 +0000
Message-ID: <20260120214802.270100-1-kuniyu@google.com>
In-Reply-To: <CAFWXMN3WexdQEsjZRGumLuyA8phXmSte_j7JTAjL_v11ZxAmtg@mail.gmail.com>
From: Gopal Malaviya <gopalmalaviya53@gmail.com>
Date: Wed, 21 Jan 2026 02:03:39 +0530
> Hi,
>
> Background:
>
> I am looking into cases where TCP sockets that transition into
> half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
> remain available long enough to be reused by userland connection
> pools. In some HTTP client workloads, especially those involving
> frequent requests with large request bodies, reuse of such sockets
> can lead to follow-up failures such as timeouts or premature close
> events on subsequent operations.
>
> This behavior is compliant with TCP semantics, but application-level
> connection pools may incorrectly assume that a socket is still usable
> as long as it has not been explicitly closed.
>
> Problem:
>
> When a remote peer closes its send side early, the local socket
> enters a half-closed state as described in RFC 793, RFC 1122, and
> RFC 9293. These states are correct and expected. However, sockets
> in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
> to userland pools, even though practical data exchange is no longer
> possible.
>
> For workloads that rely heavily on persistent connection reuse,
> this can cause intermittent and difficult-to-diagnose failures.
>
> Proposal:
>
> Introduce an optional sysctl:
>
> net.ipv4.tcp_aggressive_halfclose = 0 (default)
>
> When enabled:
>
> - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
> the socket is marked as a candidate for early teardown.
>
> - After a short configurable grace period (seconds or keepalive
> probes), if the socket remains half-closed, the kernel performs
> a normal teardown using existing mechanisms (e.g. tcp_done()).
>
> - Sockets handled in this mode would also avoid TIME_WAIT reuse,
> ensuring they are not inadvertently returned to userland.
>
> A secondary sysctl could control the grace interval, for example:
>
> net.ipv4.tcp_aggressive_halfclose_grace = <seconds>
>
> Default TCP behavior remains unchanged unless explicitly enabled.
>
> Rationale:
>
> The intent is to provide an opt-in mechanism for environments where
> reuse of half-closed sockets interacts poorly with application-managed
> connection pools. The proposal does not modify semantics for established
> connections, connection setup, or orderly close initiated locally.
>
> RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
> half-close behavior but allow implementations flexibility in resource
> management and socket lifetime. This proposal aims to use that
> flexibility in a narrowly-scoped and optional manner.
>
> Implementation notes (initial thoughts):
>
> - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
> - Apply a short timer or probe-based grace period.
> - On expiry, perform standard teardown.
> - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
> - Keep all behavior gated behind sysctl(s).
>
> Request for feedback:
>
> Before preparing a full patch series, I would appreciate feedback on:
>
> - Whether the general idea is acceptable as an opt-in extension.
> - Preferred naming and placement of the sysctl(s).
> - Whether a grace period is preferred over immediate teardown.
> - Any interactions with existing timers or state transitions
> that should be considered.
> - Any related prior discussions worth reviewing.
You can implement the logic in userspace,
e.g. with "ss --kill":

1. Create CLOSE-WAIT and FIN-WAIT-2 sockets

# python3
>>> from socket import *
>>> s = socket()
>>> s.listen()
>>> c = socket()
>>> c.connect(s.getsockname())
>>> s1, _ = s.accept()
>>> c
<socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 46490), raddr=('127.0.0.1', 58241)>
>>> c.close()

# ss -tan
...
CLOSE-WAIT 1 0 127.0.0.1:58241 127.0.0.1:46490
FIN-WAIT-2 0 0 127.0.0.1:46490 127.0.0.1:58241

2. Close them

# ss --kill -t sport == 46490
...
FIN-WAIT-2 0 0 127.0.0.1:46490 127.0.0.1:58241

# ss --kill -t dport == 46490
...
CLOSE-WAIT 1 0 127.0.0.1:58241 127.0.0.1:46490
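
The two steps above could be combined into a small userspace reaper that
applies the proposed grace period. What follows is only a rough sketch
under several assumptions: the function names, the 30 second grace and
the 5 second poll interval are made up for illustration, only IPv4 is
handled, and it expects an ss that supports -H and --kill, run with
enough privilege on a kernel built with CONFIG_INET_DIAG_DESTROY.

```python
import subprocess
import time

# States of interest, lowercased as ss prints them in its State column.
HALF_CLOSED = ("close-wait", "fin-wait-2")


def parse_ss(output):
    """Return {(state, local, peer)} for half-closed sockets in `ss -tanH` output."""
    found = set()
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) >= 5 and fields[0].lower() in HALF_CLOSED:
            found.add((fields[0].lower(), fields[3], fields[4]))
    return found


def split_addr(addr):
    """Split 'host:port' at the last colon (IPv4 only in this sketch)."""
    host, _, port = addr.rpartition(":")
    return host, port


def expired(first_seen, current, now, grace):
    """Update first_seen in place; return sockets half-closed for >= grace seconds."""
    for key in list(first_seen):
        if key not in current:
            del first_seen[key]          # socket went away or changed state
    for key in current:
        first_seen.setdefault(key, now)  # start the grace timer on first sighting
    return sorted(k for k, t in first_seen.items() if now - t >= grace)


def kill(local, peer):
    """Ask the kernel to destroy one socket, matched by its full 4-tuple."""
    lhost, lport = split_addr(local)
    phost, pport = split_addr(peer)
    subprocess.run(
        ["ss", "--kill", "-4", "-t", "-n",
         "src", lhost, "sport", "==", lport,
         "dst", phost, "dport", "==", pport],
        check=False, stdout=subprocess.DEVNULL)


def run(grace=30.0, interval=5.0):
    """Poll ss forever and kill sockets that stay half-closed past the grace period."""
    first_seen = {}
    while True:
        out = subprocess.run(["ss", "-4", "-tanH"], capture_output=True,
                             text=True, check=True).stdout
        for _state, local, peer in expired(first_seen, parse_ss(out),
                                           time.monotonic(), grace):
            kill(local, peer)
        time.sleep(interval)
```

Calling run() starts the loop. Matching on the full 4-tuple keeps each
kill limited to one end of one connection, unlike the port-only filters
in step 2, which would hit every socket using that port.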
Thread overview: 3+ messages
2026-01-20 20:33 [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets Gopal Malaviya
2026-01-20 21:47 ` Kuniyuki Iwashima [this message]
2026-01-29 16:51 ` Gopal Malaviya