Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets

From: Kuniyuki Iwashima <kuniyu@google.com>
To: gopalmalaviya53@gmail.com
Cc: netdev@vger.kernel.org
Subject: Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
Date: Tue, 20 Jan 2026 21:47:16 +0000
Message-ID: <20260120214802.270100-1-kuniyu@google.com>
In-Reply-To: <CAFWXMN3WexdQEsjZRGumLuyA8phXmSte_j7JTAjL_v11ZxAmtg@mail.gmail.com>

From: Gopal Malaviya <gopalmalaviya53@gmail.com>
Date: Wed, 21 Jan 2026 02:03:39 +0530
> Hi,
> 
> Background:
> 
> I am looking into cases where TCP sockets that transition into
> half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
> remain available long enough to be reused by userland connection
> pools. In some HTTP client workloads, especially those involving
> frequent requests with large request bodies, reuse of such sockets
> can lead to follow-up failures such as timeouts or premature close
> events on subsequent operations.
> 
> This behavior is compliant with TCP semantics, but application-level
> connection pools may incorrectly assume that a socket is still usable
> as long as it has not been explicitly closed.
> 
> Problem:
> 
> When a remote peer closes its send side early, the local socket
> enters a half-closed state as described in RFC 793, RFC 1122, and
> RFC 9293. These states are correct and expected. However, sockets
> in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
> to userland pools, even though practical data exchange is no longer
> possible.
> 
> For workloads that rely heavily on persistent connection reuse,
> this can cause intermittent and difficult-to-diagnose failures.
> 
> Proposal:
> 
> Introduce an optional sysctl:
> 
>     net.ipv4.tcp_aggressive_halfclose = 0 (default)
> 
> When enabled:
> 
>   - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
>     the socket is marked as a candidate for early teardown.
> 
>   - After a short configurable grace period (seconds or keepalive
>     probes), if the socket remains half-closed, the kernel performs
>     a normal teardown using existing mechanisms (e.g. tcp_done()).
> 
>   - Sockets handled in this mode would also avoid TIME_WAIT reuse,
>     ensuring they are not inadvertently returned to userland.
> 
> A secondary sysctl could control the grace interval, for example:
> 
>     net.ipv4.tcp_aggressive_halfclose_grace = <seconds>
> 
> Default TCP behavior remains unchanged unless explicitly enabled.
> 
> Rationale:
> 
> The intent is to provide an opt-in mechanism for environments where
> reuse of half-closed sockets interacts poorly with application-managed
> connection pools. The proposal does not modify semantics for established
> connections, connection setup, or orderly close initiated locally.
> 
> RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
> half-close behavior but allow implementations flexibility in resource
> management and socket lifetime. This proposal aims to use that
> flexibility in a narrowly-scoped and optional manner.
> 
> Implementation notes (initial thoughts):
> 
>   - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
>   - Apply a short timer or probe-based grace period.
>   - On expiry, perform standard teardown.
>   - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
>   - Keep all behavior gated behind sysctl(s).
> 
> Request for feedback:
> 
> Before preparing a full patch series, I would appreciate feedback on:
> 
>   - Whether the general idea is acceptable as an opt-in extension.
>   - Preferred naming and placement of the sysctl(s).
>   - Whether a grace period is preferred over immediate teardown.
>   - Any interactions with existing timers or state transitions
>     that should be considered.
>   - Any related prior discussions worth reviewing.

You can implement this logic in userspace,
e.g. with "ss --kill":

1. Create CLOSE-WAIT and FIN-WAIT-2 sockets

  # python3
  >>> from socket import *
  >>> s = socket()
  >>> s.listen()
  >>> c = socket()
  >>> c.connect(s.getsockname())
  >>> s1, _ = s.accept()
  >>> c
  <socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 46490), raddr=('127.0.0.1', 58241)>
  >>> c.close()

  # ss -tan
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

2. Close them

  # ss --kill -t sport == 46490
  ...
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

  # ss --kill -t dport == 46490
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490
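
As a complement to the "ss --kill" approach above, a connection pool could
also check a socket itself before handing it out. The following is only a
minimal sketch under stated assumptions: the names peer_has_closed() and
demo() are hypothetical and not part of any existing library, and the check
only detects that the peer's FIN has already arrived (the CLOSE-WAIT case
from the application's point of view). It does not cover a socket sitting
in FIN-WAIT-2.

```python
import select
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Best-effort check: has the peer already sent FIN on this socket?

    Hypothetical helper for illustration. Returns True when the socket is
    readable but a peek yields no bytes (end of stream), or when the peek
    fails with a socket error. Returns False when nothing is pending or
    when unread data is still queued.
    """
    # Zero timeout: poll without blocking.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # no data and no FIN observed yet
    try:
        # MSG_PEEK leaves any pending data in the receive queue.
        data = sock.recv(1, socket.MSG_PEEK)
    except OSError:
        return True  # e.g. connection reset: not reusable either
    return data == b""


def demo():
    """Recreate the scenario from step 1 and probe the accepted socket."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen()

    client = socket.socket()
    client.connect(listener.getsockname())
    accepted, _ = listener.accept()

    before = peer_has_closed(accepted)

    # The client closes first, so the accepted socket moves to CLOSE-WAIT.
    client.close()
    # Wait (up to 1 second) until the FIN is visible to the accepted socket.
    select.select([accepted], [], [], 1.0)
    after = peer_has_closed(accepted)

    accepted.close()
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

A pool would call such a helper just before reusing a connection and
discard the socket when it returns True. The check is racy by nature, since
a FIN can still arrive right after it returns False.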

Thread overview:
2026-01-20 20:33 [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets Gopal Malaviya
2026-01-20 21:47 ` Kuniyuki Iwashima [this message]
2026-01-29 16:51   ` Gopal Malaviya
