* [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO
@ 2022-09-03 12:10 Neal Cardwell
2022-09-03 17:03 ` Eric Dumazet
2022-09-06 9:20 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 3+ messages in thread
From: Neal Cardwell @ 2022-09-03 12:10 UTC (permalink / raw)
To: David Miller, Jakub Kicinski
Cc: netdev, Neal Cardwell, Nagaraj Arankal, Yuchung Cheng
From: Neal Cardwell <ncardwell@google.com>
Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.
Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:
(*1) Send one data packet on a non-SACK connection
(*2) Because no ACK packet is received, the packet is retransmitted
and we enter CA_Loss; but this retransmission is spurious.
(*3) The ACK for the original data is received. The transmitted packet
is acknowledged. The TCP timestamp is before the retrans_stamp,
so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
true without changing state to Open (because tcp_is_sack() is
false), and tcp_process_loss() returns without calling
tcp_try_undo_recovery(). Normally after undoing a CA_Loss
episode, tcp_fastretrans_alert() would see that the connection
has returned to CA_Open and fall through and call
tcp_try_to_open(), which would set retrans_stamp to 0. However,
for non-SACK connections we hold the connection in CA_Loss, so do
not fall through to call tcp_try_to_open() and do not set
retrans_stamp to 0. So retrans_stamp is (erroneously) still
non-zero.
At this point the first "retransmission event" has passed and
been recovered from. Any future retransmission is a completely
new "event". However, retrans_stamp is erroneously still
set. (And we are still in CA_Loss, which is correct.)
(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and
(*4) and we disabled keep alives.
The socket's timeout SHOULD be calculated from this point in
time, but instead it's calculated from the prior "event" 16
minutes ago (step (*2)).
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
too far in the past (set at the time of (*2)).
This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.
Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
---
net/ipv4/tcp_input.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b85a9f755da41..bc2ea12221f95 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2513,6 +2513,21 @@ static inline bool tcp_may_undo(const struct tcp_sock *tp)
return tp->undo_marker && (!tp->undo_retrans || tcp_packet_delayed(tp));
}
+static bool tcp_is_non_sack_preventing_reopen(struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
+ /* Hold old state until something *above* high_seq
+ * is ACKed. For Reno it is MUST to prevent false
+ * fast retransmits (RFC2582). SACK TCP is safe. */
+ if (!tcp_any_retrans_done(sk))
+ tp->retrans_stamp = 0;
+ return true;
+ }
+ return false;
+}
+
/* People celebrate: "We love our President!" */
static bool tcp_try_undo_recovery(struct sock *sk)
{
@@ -2535,14 +2550,8 @@ static bool tcp_try_undo_recovery(struct sock *sk)
} else if (tp->rack.reo_wnd_persist) {
tp->rack.reo_wnd_persist--;
}
- if (tp->snd_una == tp->high_seq && tcp_is_reno(tp)) {
- /* Hold old state until something *above* high_seq
- * is ACKed. For Reno it is MUST to prevent false
- * fast retransmits (RFC2582). SACK TCP is safe. */
- if (!tcp_any_retrans_done(sk))
- tp->retrans_stamp = 0;
+ if (tcp_is_non_sack_preventing_reopen(sk))
return true;
- }
tcp_set_ca_state(sk, TCP_CA_Open);
tp->is_sack_reneg = 0;
return false;
@@ -2578,6 +2587,8 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo)
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPSPURIOUSRTOS);
inet_csk(sk)->icsk_retransmits = 0;
+ if (tcp_is_non_sack_preventing_reopen(sk))
+ return true;
if (frto_undo || tcp_is_sack(tp)) {
tcp_set_ca_state(sk, TCP_CA_Open);
tp->is_sack_reneg = 0;
--
2.37.2.789.g6183377224-goog
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO
2022-09-03 12:10 [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO Neal Cardwell
@ 2022-09-03 17:03 ` Eric Dumazet
2022-09-06 9:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: Eric Dumazet @ 2022-09-03 17:03 UTC (permalink / raw)
To: Neal Cardwell, David Miller, Jakub Kicinski
Cc: netdev, Neal Cardwell, Nagaraj Arankal, Yuchung Cheng
On 9/3/22 05:10, Neal Cardwell wrote:
> From: Neal Cardwell <ncardwell@google.com>
>
> Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
> of a spurious non-SACK RTO could cause a connection to fail to clear
> retrans_stamp, causing a later RTO to very prematurely time out the
> connection with ETIMEDOUT.
>
> Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
> report:
>
> (*1) Send one data packet on a non-SACK connection
>
> (*2) Because no ACK packet is received, the packet is retransmitted
> and we enter CA_Loss; but this retransmission is spurious.
>
> (*3) The ACK for the original data is received. The transmitted packet
> is acknowledged. The TCP timestamp is before the retrans_stamp,
> so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
> true without changing state to Open (because tcp_is_sack() is
> false), and tcp_process_loss() returns without calling
> tcp_try_undo_recovery(). Normally after undoing a CA_Loss
> episode, tcp_fastretrans_alert() would see that the connection
> has returned to CA_Open and fall through and call
> tcp_try_to_open(), which would set retrans_stamp to 0. However,
> for non-SACK connections we hold the connection in CA_Loss, so do
> not fall through to call tcp_try_to_open() and do not set
> retrans_stamp to 0. So retrans_stamp is (erroneously) still
> non-zero.
>
> At this point the first "retransmission event" has passed and
> been recovered from. Any future retransmission is a completely
> new "event". However, retrans_stamp is erroneously still
> set. (And we are still in CA_Loss, which is correct.)
>
> (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
> packet is sent. Note: No data is transmitted between (*3) and
> (*4) and we disabled keep alives.
>
> The socket's timeout SHOULD be calculated from this point in
> time, but instead it's calculated from the prior "event" 16
> minutes ago (step (*2)).
>
> (*5) Because no ACK packet is received, the packet is retransmitted.
>
> (*6) At the time of the 2nd retransmission, the socket returns
> ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
> too far in the past (set at the time of (*2)).
>
> This commit fixes this bug by ensuring that we reuse in
> tcp_try_undo_loss() the same careful logic for non-SACK connections
> that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
> we factor out that logic into a new
> tcp_is_non_sack_preventing_reopen() helper and call that helper from
> both undo functions.
>
> Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss")
> Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
> Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Amazing that some folks still do not enable SACK...
Thanks !
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO
2022-09-03 12:10 [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO Neal Cardwell
2022-09-03 17:03 ` Eric Dumazet
@ 2022-09-06 9:20 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-09-06 9:20 UTC (permalink / raw)
To: Neal Cardwell; +Cc: davem, kuba, netdev, ncardwell, nagaraj.p.arankal, ycheng
Hello:
This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni@redhat.com>:
On Sat, 3 Sep 2022 08:10:23 -0400 you wrote:
> From: Neal Cardwell <ncardwell@google.com>
>
> Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
> of a spurious non-SACK RTO could cause a connection to fail to clear
> retrans_stamp, causing a later RTO to very prematurely time out the
> connection with ETIMEDOUT.
>
> [...]
Here is the summary with links:
- [net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO
https://git.kernel.org/netdev/net/c/686dc2db2a0f
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-09-06 9:20 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-03 12:10 [PATCH net] tcp: fix early ETIMEDOUT after spurious non-SACK RTO Neal Cardwell
2022-09-03 17:03 ` Eric Dumazet
2022-09-06 9:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).