Netdev List
* [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM)
@ 2026-06-04  8:22 xietangxin
  2026-06-08 11:55 ` Menglong Dong
  0 siblings, 1 reply; 2+ messages in thread
From: xietangxin @ 2026-06-04  8:22 UTC (permalink / raw)
  To: edumazet, davem, kuba, pabeni
  Cc: jmaloy, menglong8.dong, kuniyu, horms, willemb, netdev,
	linux-kernel

Hi all,

We have observed a TCP connection deadlock on stable 6.6 under heavy stress testing.

1. Both Peer A and Peer B enter the ICSK_ACK_NOMEM branch in tcp_select_window().
After commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
both peers freeze their rcv_nxt and set rcv_wnd = 0.

2. Before freezing, both sides already had data in flight.
Since both sides drop incoming data packets due to OOM, rcv_nxt stops advancing
on each side, but the seq of each peer's subsequent packets keeps growing.

3. When Peer A receives Peer B's zero-window ACK,
the packet's seq is far ahead of Peer A's frozen rcv_nxt, so Peer A drops it.
The same happens in the other direction. Because each side drops the other's
ACKs, snd_wnd is never updated to 0 and no zero-window probes are triggered.
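
To illustrate the drop in step 3, here is a minimal user-space sketch of the
acceptability rule. It is a simplified model based on a reading of
tcp_sequence() and tcp_receive_window(), not the kernel source; struct
rx_state, receive_window() and seq_acceptable() are hypothetical names used
only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified receiver state. Field names mirror struct tcp_sock, but this
 * is a user-space model for illustration, not kernel code. */
struct rx_state {
	uint32_t rcv_nxt;	/* next sequence number expected */
	uint32_t rcv_wup;	/* rcv_nxt when the last window was advertised */
	uint32_t rcv_wnd;	/* window currently advertised */
};

/* Wrap-safe sequence comparisons, in the style of before()/after(). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Remaining advertised window, clamped at zero. */
static uint32_t receive_window(const struct rx_state *rx)
{
	int32_t win = (int32_t)(rx->rcv_wup + rx->rcv_wnd - rx->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* Assumed rule: reject a segment that ends before rcv_wup, or that starts
 * beyond rcv_nxt plus the remaining window. */
static bool seq_acceptable(const struct rx_state *rx,
			   uint32_t seq, uint32_t end_seq)
{
	if (seq_before(end_seq, rx->rcv_wup))
		return false;
	if (seq_after(seq, rx->rcv_nxt + receive_window(rx)))
		return false;
	return true;
}
```

Under this model, with rcv_nxt = rcv_wup = 1000 and rcv_wnd = 0, a pure ACK
carrying seq 1500 is rejected, while the same ACK is accepted if the window
is still 3000.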


Simplified Packet Trace:

Assume Peer A's rcv_nxt = 1000, and Peer B's rcv_nxt = 5000 initially.

Time  Dir      Type        Seq   Ack   Win  Len  Status
------------------------------------------------------------------------
T1:   B -> A   [PSH, ACK]  1000  5000  3000 100  (A hits OOM, rcv_nxt=1000)
T2:   B -> A   [ACK]       1100  5000  3000 200  (Dropped due to A's OOM)
T3:   B -> A   [PSH, ACK]  1300  5000  3000 200  (Dropped due to A's OOM)

T4:   A -> B   [PSH, ACK]  5000  1000  3000 100  (B hits OOM, rcv_nxt=5000)
T5:   A -> B   [ACK]       5100  1000  3000 200  (Dropped due to B's OOM)
T6:   A -> B   [PSH, ACK]  5300  1000  3000 200  (Dropped due to B's OOM)

-- Both sides are now in OOM. B's Seq is 1500; A's Seq is 5500 --

T7:   B -> A   [ZeroWin]   1500  5000  0    0    (Dropped: Seq 1500 != 1000)
T8:   A -> B   [ZeroWin]   5500  1000  0    0    (Dropped: Seq 5500 != 5000)
T9:   A -> B   [WinUpdate] 5500  1000  20   0    (Dropped: Seq 5500 != 5000)
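
The sequence numbers in the trace can be re-derived with a small sketch. It
only replays the arithmetic of T1-T6 under the values assumed above; struct
segment, next_seq_after() and seq_gap() are hypothetical helpers, not kernel
functions.

```c
#include <stdint.h>

/* One transmitted data segment: starting sequence number and payload length. */
struct segment {
	uint32_t seq;
	uint32_t len;
};

/* Sequence number the sender uses next after sending segs[0..n-1].
 * Assumes n >= 1. The comparison is wrap-safe. */
static uint32_t next_seq_after(const struct segment *segs, int n)
{
	uint32_t nxt = segs[0].seq;

	for (int i = 0; i < n; i++) {
		uint32_t end = segs[i].seq + segs[i].len;

		if ((int32_t)(end - nxt) > 0)
			nxt = end;
	}
	return nxt;
}

/* Distance between the sender's current seq and the receiver's frozen
 * rcv_nxt. Any nonzero gap combined with a zero receive window means the
 * sender's pure ACKs fall outside what the receiver accepts. */
static uint32_t seq_gap(uint32_t sender_seq, uint32_t frozen_rcv_nxt)
{
	return sender_seq - frozen_rcv_nxt;
}
```

Feeding in T1-T3 gives B a next seq of 1500 against A's frozen rcv_nxt of
1000, and T4-T6 gives A a next seq of 5500 against B's frozen rcv_nxt of
5000, so both directions carry a 500-byte gap at T7-T9.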

Should we relax the sequence check in tcp_sequence() for zero-window ACKs?

Any feedback or guidance would be greatly appreciated.

-- 
Best regards,
Tangxin Xie



* Re: [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM)
  2026-06-04  8:22 [BUG] TCP connection deadlock under simultaneous bidirectional ICSK_ACK_NOMEM (OOM) xietangxin
@ 2026-06-08 11:55 ` Menglong Dong
  0 siblings, 0 replies; 2+ messages in thread
From: Menglong Dong @ 2026-06-08 11:55 UTC (permalink / raw)
  To: xietangxin
  Cc: edumazet, davem, kuba, pabeni, jmaloy, menglong8.dong, kuniyu,
	horms, willemb, netdev, linux-kernel, linux-stable

On 2026/6/4 16:22 xietangxin <xietangxin@yeah.net> wrote:
> Hi all,
> 
> We have observed a TCP connection deadlock on stable 6.6 under heavy stress testing.
> 
> 1. Both Peer A and Peer B enter the ICSK_ACK_NOMEM branch in tcp_select_window().
> After commit 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"),
> both peers freeze their rcv_nxt and set rcv_wnd = 0.
> 
> 2. Before freezing, both sides already had data in flight.
> Since both sides drop incoming data packets due to OOM, rcv_nxt stops advancing
> on each side, but the seq of each peer's subsequent packets keeps growing.
> 
> 3. When Peer A receives Peer B's zero-window ACK,
> the packet's seq is far ahead of Peer A's frozen rcv_nxt, so Peer A drops it.
> The same happens in the other direction. Because each side drops the other's
> ACKs, snd_wnd is never updated to 0 and no zero-window probes are triggered.
> 

Hi,

The problem you describe is already fixed by this commit:
0e24d17bd966 ("tcp: implement RFC 7323 window retraction receiver requirements"),
which hasn't been backported to the 6.6 branch.

That patch doesn't have a Fixes tag, so I'm not sure whether it will be picked
up for the 6.6 branch. I've CCed linux-stable :)

Thanks!
Menglong Dong

> 
> Simplified Packet Trace:
> 
> Assume Peer A's rcv_nxt = 1000, and Peer B's rcv_nxt = 5000 initially.
> 
> Time  Dir      Type        Seq   Ack   Win  Len  Status
> ------------------------------------------------------------------------
> T1:   B -> A   [PSH, ACK]  1000  5000  3000 100  (A hits OOM, rcv_nxt=1000)
> T2:   B -> A   [ACK]       1100  5000  3000 200  (Dropped due to A's OOM)
> T3:   B -> A   [PSH, ACK]  1300  5000  3000 200  (Dropped due to A's OOM)
> 
> T4:   A -> B   [PSH, ACK]  5000  1000  3000 100  (B hits OOM, rcv_nxt=5000)
> T5:   A -> B   [ACK]       5100  1000  3000 200  (Dropped due to B's OOM)
> T6:   A -> B   [PSH, ACK]  5300  1000  3000 200  (Dropped due to B's OOM)
> 
> -- Both sides are now in OOM. B's Seq is 1500; A's Seq is 5500 --
> 
> T7:   B -> A   [ZeroWin]   1500  5000  0    0    (Dropped: Seq 1500 != 1000)
> T8:   A -> B   [ZeroWin]   5500  1000  0    0    (Dropped: Seq 5500 != 5000)
> T9:   A -> B   [WinUpdate] 5500  1000  20   0    (Dropped: Seq 5500 != 5000)
> 
> Should we relax the sequence check in tcp_sequence() for zero window ACK?
> 
> Any feedback or guidance would be greatly appreciated.
> 
> -- 
> Best regards,
> Tangxin Xie
> 
> 
> 