public inbox for netdev@vger.kernel.org
* [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
From: Gopal Malaviya @ 2026-01-20 20:33 UTC
  To: netdev

Hi,

Background:

I am looking into cases where TCP sockets that transition into
half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
remain available long enough to be reused by userland connection
pools. In some HTTP client workloads, especially those involving
frequent requests with large request bodies, reuse of such sockets
can lead to follow-up failures such as timeouts or premature close
events on subsequent operations.

This behavior is compliant with TCP semantics, but application-level
connection pools may incorrectly assume that a socket is still usable
as long as it has not been explicitly closed.

Problem:

When a remote peer closes its send side early, the local socket
enters a half-closed state as described in RFC 793, RFC 1122, and
RFC 9293. These states are correct and expected. However, sockets
in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
to userland pools, even though practical data exchange is no longer
possible.

For workloads that rely heavily on persistent connection reuse,
this can cause intermittent and difficult-to-diagnose failures.
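
To make the failure mode concrete, here is a minimal loopback sketch in Python. It is an illustration only, not taken from the affected workload, and the helper names (make_loopback_pair, demo) are hypothetical. It shows that after the peer has sent its FIN, the local descriptor is still valid and a write is still accepted by the kernel, so a pool that only checks "not closed locally" would hand the socket out again.

```python
import select
import socket


def make_loopback_pair():
    """Return (client, server) TCP sockets connected over 127.0.0.1."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen()
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def demo():
    local, peer = make_loopback_pair()
    peer.close()  # peer sends FIN; the local side moves to CLOSE_WAIT

    # The received FIN makes the socket readable (a read would return EOF).
    readable, _, _ = select.select([local], [], [], 2.0)

    fd_still_valid = local.fileno() != -1  # nothing closed it locally
    sent = local.send(b"ping")             # the first write is still accepted
    local.close()
    return bool(readable), fd_still_valid, sent


if __name__ == "__main__":
    print(demo())
```

Only a later operation would surface the failure, which matches the "follow-up failures" described in the background section.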

Proposal:

Introduce an optional sysctl:

    net.ipv4.tcp_aggressive_halfclose = 0 (default)

When enabled:

  - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
    the socket is marked as a candidate for early teardown.

  - After a short configurable grace period (seconds or keepalive
    probes), if the socket remains half-closed, the kernel performs
    a normal teardown using existing mechanisms (e.g. tcp_done()).

  - Sockets handled in this mode would also avoid TIME_WAIT reuse,
    ensuring they are not inadvertently returned to userland.

A secondary sysctl could control the grace interval, for example:

    net.ipv4.tcp_aggressive_halfclose_grace = <seconds>

Default TCP behavior remains unchanged unless explicitly enabled.

Rationale:

The intent is to provide an opt-in mechanism for environments where
reuse of half-closed sockets interacts poorly with application-managed
connection pools. The proposal does not modify semantics for established
connections, connection setup, or orderly close initiated locally.

RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
half-close behavior but allow implementations flexibility in resource
management and socket lifetime. This proposal aims to use that
flexibility in a narrowly-scoped and optional manner.

Implementation notes (initial thoughts):

  - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
  - Apply a short timer or probe-based grace period.
  - On expiry, perform standard teardown.
  - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
  - Keep all behavior gated behind sysctl(s).
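
The notes above can be modelled in a few lines of Python to pin down the intended semantics before any kernel code exists. This is a toy model under stated assumptions, not kernel code: ModelSocket, HalfClosePolicy, and the callback names are all hypothetical, time is passed in explicitly, and the TIME_WAIT aspect is left out.

```python
from dataclasses import dataclass
from typing import Optional

HALF_CLOSED = {"CLOSE_WAIT", "FIN_WAIT2"}


@dataclass
class ModelSocket:
    state: str = "ESTABLISHED"
    teardown_deadline: Optional[float] = None  # set while the socket is tagged


class HalfClosePolicy:
    """Toy model of the proposed sysctl pair (enabled flag + grace seconds)."""

    def __init__(self, enabled: bool = False, grace: float = 5.0):
        self.enabled = enabled
        self.grace = grace

    def on_state_change(self, sk: ModelSocket, new_state: str, now: float) -> None:
        sk.state = new_state
        if self.enabled and new_state in HALF_CLOSED:
            sk.teardown_deadline = now + self.grace  # tag and arm the timer
        else:
            sk.teardown_deadline = None              # untag on any other state

    def on_timer(self, sk: ModelSocket, now: float) -> bool:
        """Return True if the socket was torn down on this tick."""
        expired = sk.teardown_deadline is not None and now >= sk.teardown_deadline
        if expired and sk.state in HALF_CLOSED:
            sk.state = "CLOSE"
            sk.teardown_deadline = None
            return True
        return False
```

With enabled=False the model never tags a socket, which corresponds to the "default behavior unchanged" requirement.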

Request for feedback:

Before preparing a full patch series, I would appreciate feedback on:

  - Whether the general idea is acceptable as an opt-in extension.
  - Preferred naming and placement of the sysctl(s).
  - Whether a grace period is preferred over immediate teardown.
  - Any interactions with existing timers or state transitions
    that should be considered.
  - Any related prior discussions worth reviewing.

Thanks for your time and guidance.

Gopal Malaviya


* Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
From: Kuniyuki Iwashima @ 2026-01-20 21:47 UTC
  To: gopalmalaviya53; +Cc: netdev

From: Gopal Malaviya <gopalmalaviya53@gmail.com>
Date: Wed, 21 Jan 2026 02:03:39 +0530
> Hi,
> 
> Background:
> 
> I am looking into cases where TCP sockets that transition into
> half-closed states (CLOSE_WAIT or FIN_WAIT2 after receipt of FIN)
> remain available long enough to be reused by userland connection
> pools. In some HTTP client workloads, especially those involving
> frequent requests with large request bodies, reuse of such sockets
> can lead to follow-up failures such as timeouts or premature close
> events on subsequent operations.
> 
> This behavior is compliant with TCP semantics, but application-level
> connection pools may incorrectly assume that a socket is still usable
> as long as it has not been explicitly closed.
> 
> Problem:
> 
> When a remote peer closes its send side early, the local socket
> enters a half-closed state as described in RFC 793, RFC 1122, and
> RFC 9293. These states are correct and expected. However, sockets
> in CLOSE_WAIT or FIN_WAIT2 may persist long enough to be returned
> to userland pools, even though practical data exchange is no longer
> possible.
> 
> For workloads that rely heavily on persistent connection reuse,
> this can cause intermittent and difficult-to-diagnose failures.
> 
> Proposal:
> 
> Introduce an optional sysctl:
> 
>     net.ipv4.tcp_aggressive_halfclose = 0 (default)
> 
> When enabled:
> 
>   - Upon receiving FIN and transitioning into CLOSE_WAIT or FIN_WAIT2,
>     the socket is marked as a candidate for early teardown.
> 
>   - After a short configurable grace period (seconds or keepalive
>     probes), if the socket remains half-closed, the kernel performs
>     a normal teardown using existing mechanisms (e.g. tcp_done()).
> 
>   - Sockets handled in this mode would also avoid TIME_WAIT reuse,
>     ensuring they are not inadvertently returned to userland.
> 
> A secondary sysctl could control the grace interval, for example:
> 
>     net.ipv4.tcp_aggressive_halfclose_grace = <seconds>
> 
> Default TCP behavior remains unchanged unless explicitly enabled.
> 
> Rationale:
> 
> The intent is to provide an opt-in mechanism for environments where
> reuse of half-closed sockets interacts poorly with application-managed
> connection pools. The proposal does not modify semantics for established
> connections, connection setup, or orderly close initiated locally.
> 
> RFC 793, RFC 1122, and RFC 9293 define the TCP state machine and
> half-close behavior but allow implementations flexibility in resource
> management and socket lifetime. This proposal aims to use that
> flexibility in a narrowly-scoped and optional manner.
> 
> Implementation notes (initial thoughts):
> 
>   - Tag sockets on FIN reception when entering CLOSE_WAIT or FIN_WAIT2.
>   - Apply a short timer or probe-based grace period.
>   - On expiry, perform standard teardown.
>   - Avoid TIME_WAIT reuse for sockets marked for aggressive half-close.
>   - Keep all behavior gated behind sysctl(s).
> 
> Request for feedback:
> 
> Before preparing a full patch series, I would appreciate feedback on:
> 
>   - Whether the general idea is acceptable as an opt-in extension.
>   - Preferred naming and placement of the sysctl(s).
>   - Whether a grace period is preferred over immediate teardown.
>   - Any interactions with existing timers or state transitions
>     that should be considered.
>   - Any related prior discussions worth reviewing.

You can implement this logic in userspace,
e.g. with "ss --kill":

1. Create CLOSE-WAIT and FIN-WAIT-2 sockets

  # python3
  >>> from socket import *
  >>> s = socket()
  >>> s.listen()
  >>> c = socket()
  >>> c.connect(s.getsockname())
  >>> s1, _ = s.accept()
  >>> c
  <socket.socket fd=6, family=2, type=1, proto=0, laddr=('127.0.0.1', 46490), raddr=('127.0.0.1', 58241)>
  >>> c.close()

  # ss -tan
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

2. Close them

  # ss --kill -t sport == 46490
  ...
  FIN-WAIT-2 0      0  127.0.0.1:46490      127.0.0.1:58241

  # ss --kill -t dport == 46490
  ...
  CLOSE-WAIT 1      0  127.0.0.1:58241      127.0.0.1:46490


* Re: [RFC] net: ipv4: optional early cleanup of half-closed TCP sockets
From: Gopal Malaviya @ 2026-01-29 16:51 UTC
  To: Kuniyuki Iwashima; +Cc: netdev

Hi,

Thanks for the userspace example using ss --kill; that is a helpful
reference point.

Apart from userspace approaches, the motivation for raising this as
an RFC comes from scenarios where half-closed sockets are reused by
application-managed connection pools before any external userspace
action can detect or intervene. In those cases, polling- or
admin-driven mechanisms tend to be reactive and can race with reuse.

The question I am trying to explore is whether a narrowly scoped,
opt-in kernel policy tied directly to TCP state transitions
(particularly FIN_WAIT2) could be useful for workloads that need
deterministic cleanup without relying on external tooling.
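
For comparison, the closest userspace equivalent is probably a check made by the pool itself at checkout time. It runs in the same code path as the reuse, so there is no separate poller to race with, although a small window between the check and the first write remains. A minimal sketch, assuming a pool of plain blocking sockets; CheckedPool and peer_has_closed are hypothetical names, not an existing library API:

```python
import select
import socket
from collections import deque


def peer_has_closed(sock):
    """True if a zero-timeout poll shows EOF (FIN received) or a socket error."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # idle connection, nothing pending
    try:
        # MSG_PEEK leaves any real data in the receive queue.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False  # spurious wakeup on a non-blocking socket
    except OSError:
        return True   # e.g. connection reset


class CheckedPool:
    """Idle-socket pool that discards half-closed sockets on acquire()."""

    def __init__(self):
        self._idle = deque()

    def release(self, sock):
        self._idle.append(sock)

    def acquire(self):
        while self._idle:
            sock = self._idle.popleft()
            if peer_has_closed(sock):
                sock.close()  # drop it instead of handing it out
                continue
            return sock
        return None  # caller opens a fresh connection
```

This covers sockets that have received a FIN; it does not help when the peer's FIN arrives after the check, which is the residual case a kernel-side policy would also have to define.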

Happy to adjust scope or direction based on feedback.

Thanks,
Gopal Malaviya
