* [PATCH net-next 0/3] tcp: remove obsolete RFC3517/RFC6675 code
@ 2025-06-13 23:09 Neal Cardwell
From: Neal Cardwell @ 2025-06-13 23:09 UTC
To: David Miller, Jakub Kicinski, Eric Dumazet; +Cc: netdev, Neal Cardwell
From: Neal Cardwell <ncardwell@google.com>
RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:
commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")
In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.
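
For readers who want to check what a given tcp_recovery value selected before
this series, the bit test can be sketched as below. This is an illustration
only, not kernel code: the macro and function names are made up for the
example, the bit values 0x1 and 0x2 are the ones documented for tcp_recovery
in ip-sysctl.rst, and the procfs path is assumed to be the usual location of
net.ipv4.tcp_recovery.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative names; the values follow the "RACK: 0x1" and "RACK: 0x2"
 * entries documented for tcp_recovery.
 */
#define EX_RACK_LOSS_DETECTION	0x1
#define EX_RACK_STATIC_REO_WND	0x2

/* Before this series, clearing bit 0x1 selected RFC3517/RFC6675 recovery
 * for SACK connections. After it, the bit no longer has any effect.
 */
bool selected_rfc6675(unsigned int tcp_recovery)
{
	return !(tcp_recovery & EX_RACK_LOSS_DETECTION);
}

/* Read the current sysctl value; returns 0 on success, -1 on failure. */
int read_tcp_recovery(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_tcp_recovery("/proc/sys/net/ipv4/tcp_recovery", &v) and passing
v to selected_rfc6675() would report whether a host had opted out of RACK.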
In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.
RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.
Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.
RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.
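
To make the difference concrete, here is a minimal sketch contrasting the two
loss rules. It is a simplification written for this note, not the kernel
implementation: struct seg, the function names, and the parameters are all
hypothetical, and real RACK-TLP (RFC8985) also handles reordering window
adaptation, timers, and tail loss probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-segment state, for illustration only. */
struct seg {
	uint64_t xmit_us;	/* time of the most recent (re)transmission */
	int sacked_above;	/* SACKed segments with higher sequence numbers */
};

/* RFC6675-style rule: lost once at least dupthresh higher-sequence
 * segments have been SACKed.
 */
bool lost_by_dupthresh(const struct seg *s, int dupthresh)
{
	assert(dupthresh > 0);
	return s->sacked_above >= dupthresh;
}

/* RACK-style rule: lost when some segment sent later has already been
 * delivered and this segment has been outstanding for more than
 * RTT + reordering window.
 */
bool lost_by_rack(const struct seg *s, uint64_t newest_delivered_xmit_us,
		  uint64_t now_us, uint64_t rtt_us, uint64_t reo_wnd_us)
{
	if (s->xmit_us >= newest_delivered_xmit_us)
		return false;	/* nothing sent after it has been delivered */
	return now_us - s->xmit_us > rtt_us + reo_wnd_us;
}
```

With only one later segment SACKed (as can happen near the tail of a flight),
the count-based rule with a threshold of 3 never fires, while the time-based
rule marks the segment lost once RTT + reordering window has elapsed.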
It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This allows a substantial simplification of the
Linux TCP code base, and removes 12 bytes of state from every tcp_sock
on 64-bit machines (8 bytes on 32-bit machines).
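
The byte counts appear to correspond to the two hint fields removed in this
series, a pointer (lost_skb_hint) and an int (lost_cnt_hint). A quick sketch
of the arithmetic, assuming a 4-byte int; the helper name is hypothetical and
the member sizes are summed individually because the two fields live in
different parts of struct tcp_sock:

```c
#include <stddef.h>

/* Sum of the sizes of the two removed members:
 *   struct sk_buff *lost_skb_hint;   (a pointer)
 *   int lost_cnt_hint;
 * A plain void pointer stands in for struct sk_buff * here.
 */
size_t removed_hint_bytes(void)
{
	return sizeof(void *) + sizeof(int);
}
```

On a typical 64-bit build this gives 8 + 4 = 12, and on a typical 32-bit
build 4 + 4 = 8, matching the figures above.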
To arrange the commits in reasonable sizes, this patch series is split
into 3 commits:
(1) Removes the core RFC3517/RFC6675 logic.
(2) Removes the RFC3517/RFC6675 hint state and the first layer of logic that
updates that state.
(3) Removes the emptied-out tcp_clear_retrans_hints_partial() helper function
and all of its call sites.
Neal Cardwell (3):
  tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
  tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
  tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()

 Documentation/networking/ip-sysctl.rst        |   8 +-
 .../networking/net_cachelines/tcp_sock.rst    |   2 -
 include/linux/tcp.h                           |   3 -
 include/net/tcp.h                             |   6 -
 net/ipv4/tcp.c                                |   3 +-
 net/ipv4/tcp_input.c                          | 151 ++----------------
 net/ipv4/tcp_output.c                         |   6 -
 7 files changed, 15 insertions(+), 164 deletions(-)
--
2.50.0.rc1.591.g9c95f17f64-goog

* [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
@ 2025-06-13 23:09 Neal Cardwell
From: Neal Cardwell @ 2025-06-13 23:09 UTC
To: David Miller, Jakub Kicinski, Eric Dumazet
Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Neal Cardwell <ncardwell@google.com>

RACK-TLP loss detection has been enabled as the default loss detection
algorithm for Linux TCP since 2018, in:

  commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

In case users ran into unexpected bugs or performance regressions,
that commit allowed Linux system administrators to revert to using
RFC3517/RFC6675 loss recovery by setting net.ipv4.tcp_recovery to 0.

In the seven years since 2018, our team has not heard reports of
anyone reverting Linux TCP to use RFC3517/RFC6675 loss recovery, and
we can't find any record in web searches of such a revert.

RACK-TLP was published as a standards-track RFC, RFC8985, in February
2021.

Several other major TCP implementations have default-enabled RACK-TLP
at this point as well.

RACK-TLP offers several significant performance advantages over
RFC3517/RFC6675 loss recovery, including much better performance in
the common cases of tail drops, lost retransmissions, and reordering.

It is now time to remove the obsolete and unused RFC3517/RFC6675 loss
recovery code. This allows a substantial simplification of the
Linux TCP code base, and removes 12 bytes of state from every tcp_sock
on 64-bit machines (8 bytes on 32-bit machines).

To arrange the commits in reasonable sizes, this patch series is split
into 3 commits. The following 2 commits remove bookkeeping state and
code that is no longer needed after this removal of RFC3517/RFC6675
loss recovery.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 Documentation/networking/ip-sysctl.rst |   8 +-
 net/ipv4/tcp_input.c                   | 134 ++-----------------------
 2 files changed, 14 insertions(+), 128 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 0f1251cce3149..b31c055f576fa 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -645,9 +645,11 @@ tcp_recovery - INTEGER
 	features.
 
 	=========   =============================================================
-	RACK: 0x1   enables the RACK loss detection for fast detection of lost
-		    retransmissions and tail drops. It also subsumes and disables
-		    RFC6675 recovery for SACK connections.
+	RACK: 0x1   enables RACK loss detection, for fast detection of lost
+		    retransmissions and tail drops, and resilience to
+		    reordering. Currently, setting this bit to 0 has no
+		    effect, since RACK is the only supported loss detection
+		    algorithm.
 
 	RACK: 0x2   makes RACK's reordering window static (min_rtt/4).
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8ec92dec321a9..b52eaa45e652f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2151,12 +2151,6 @@ static inline void tcp_init_undo(struct tcp_sock *tp)
 	tp->undo_retrans = -1;
 }
 
-static bool tcp_is_rack(const struct sock *sk)
-{
-	return READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_recovery) &
-		TCP_RACK_LOSS_DETECTION;
-}
-
 /* If we detect SACK reneging, forget all SACK information
  * and reset tags completely, otherwise preserve SACKs. If receiver
  * dropped its ofo queue, we will know this due to reneging detection.
@@ -2182,8 +2176,7 @@ static void tcp_timeout_mark_lost(struct sock *sk)
 	skb_rbtree_walk_from(skb) {
 		if (is_reneg)
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
-		else if (tcp_is_rack(sk) && skb != head &&
-			 tcp_rack_skb_timeout(tp, skb, 0) > 0)
+		else if (skb != head && tcp_rack_skb_timeout(tp, skb, 0) > 0)
 			continue; /* Don't mark recently sent ones lost yet */
 		tcp_mark_skb_lost(sk, skb);
 	}
@@ -2264,22 +2257,6 @@ static bool tcp_check_sack_reneging(struct sock *sk, int *ack_flag)
 	return false;
 }
 
-/* Heurestics to calculate number of duplicate ACKs. There's no dupACKs
- * counter when SACK is enabled (without SACK, sacked_out is used for
- * that purpose).
- *
- * With reordering, holes may still be in flight, so RFC3517 recovery
- * uses pure sacked_out (total number of SACKed segments) even though
- * it violates the RFC that uses duplicate ACKs, often these are equal
- * but when e.g. out-of-window ACKs or packet duplication occurs,
- * they differ. Since neither occurs due to loss, TCP should really
- * ignore them.
- */
-static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
-{
-	return tp->sacked_out + 1;
-}
-
 /* Linux NewReno/SACK/ECN state machine.
  * --------------------------------------
  *
@@ -2332,13 +2309,7 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
  *
  *		If the receiver supports SACK:
  *
- *		RFC6675/3517: It is the conventional algorithm. A packet is
- *		considered lost if the number of higher sequence packets
- *		SACKed is greater than or equal the DUPACK thoreshold
- *		(reordering). This is implemented in tcp_mark_head_lost and
- *		tcp_update_scoreboard.
- *
- *		RACK (draft-ietf-tcpm-rack-01): it is a newer algorithm
+ *		RACK (RFC8985): RACK is a newer loss detection algorithm
  *		(2017-) that checks timing instead of counting DUPACKs.
  *		Essentially a packet is considered lost if it's not S/ACKed
  *		after RTT + reordering_window, where both metrics are
@@ -2353,8 +2324,8 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
  *		is lost (NewReno). This heuristics are the same in NewReno
  *		and SACK.
  *
- * Really tricky (and requiring careful tuning) part of algorithm
- * is hidden in functions tcp_time_to_recover() and tcp_xmit_retransmit_queue().
+ * The really tricky (and requiring careful tuning) part of the algorithm
+ * is hidden in the RACK code in tcp_recovery.c and tcp_xmit_retransmit_queue().
  * The first determines the moment _when_ we should reduce CWND and,
  * hence, slow down forward transmission. In fact, it determines the moment
  * when we decide that hole is caused by loss, rather than by a reorder.
@@ -2381,79 +2352,8 @@ static bool tcp_time_to_recover(struct sock *sk, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	/* Trick#1: The loss is proven. */
-	if (tp->lost_out)
-		return true;
-
-	/* Not-A-Trick#2 : Classic rule... */
-	if (!tcp_is_rack(sk) && tcp_dupack_heuristics(tp) > tp->reordering)
-		return true;
-
-	return false;
-}
-
-/* Detect loss in event "A" above by marking head of queue up as lost.
- * For RFC3517 SACK, a segment is considered lost if it
- * has at least tp->reordering SACKed seqments above it; "packets" refers to
- * the maximum SACKed segments to pass before reaching this limit.
- */
-static void tcp_mark_head_lost(struct sock *sk, int packets, int mark_head)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
-	int cnt;
-	/* Use SACK to deduce losses of new sequences sent during recovery */
-	const u32 loss_high = tp->snd_nxt;
-
-	WARN_ON(packets > tp->packets_out);
-	skb = tp->lost_skb_hint;
-	if (skb) {
-		/* Head already handled? */
-		if (mark_head && after(TCP_SKB_CB(skb)->seq, tp->snd_una))
-			return;
-		cnt = tp->lost_cnt_hint;
-	} else {
-		skb = tcp_rtx_queue_head(sk);
-		cnt = 0;
-	}
-
-	skb_rbtree_walk_from(skb) {
-		/* TODO: do this better */
-		/* this is not the most efficient way to do this... */
-		tp->lost_skb_hint = skb;
-		tp->lost_cnt_hint = cnt;
-
-		if (after(TCP_SKB_CB(skb)->end_seq, loss_high))
-			break;
-
-		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
-			cnt += tcp_skb_pcount(skb);
-
-		if (cnt > packets)
-			break;
-
-		if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST))
-			tcp_mark_skb_lost(sk, skb);
-
-		if (mark_head)
-			break;
-	}
-	tcp_verify_left_out(tp);
-}
-
-/* Account newly detected lost packet(s) */
-
-static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	if (tcp_is_sack(tp)) {
-		int sacked_upto = tp->sacked_out - tp->reordering;
-		if (sacked_upto >= 0)
-			tcp_mark_head_lost(sk, sacked_upto, 0);
-		else if (fast_rexmit)
-			tcp_mark_head_lost(sk, 1, 1);
-	}
+	/* Has loss detection marked at least one packet lost? */
+	return tp->lost_out != 0;
 }
 
 static bool tcp_tsopt_ecr_before(const struct tcp_sock *tp, u32 when)
@@ -2990,17 +2890,8 @@ static void tcp_process_loss(struct sock *sk, int flag, int num_dupack,
 	*rexmit = REXMIT_LOST;
 }
 
-static bool tcp_force_fast_retransmit(struct sock *sk)
-{
-	struct tcp_sock *tp = tcp_sk(sk);
-
-	return after(tcp_highest_sack_seq(tp),
-		     tp->snd_una + tp->reordering * tp->mss_cache);
-}
-
 /* Undo during fast recovery after partial ACK. */
-static bool tcp_try_undo_partial(struct sock *sk, u32 prior_snd_una,
-				 bool *do_lost)
+static bool tcp_try_undo_partial(struct sock *sk, u32 prior_snd_una)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
@@ -3025,9 +2916,6 @@ static bool tcp_try_undo_partial(struct sock *sk, u32 prior_snd_una,
 		tcp_undo_cwnd_reduction(sk, true);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPARTIALUNDO);
 		tcp_try_keep_open(sk);
-	} else {
-		/* Partial ACK arrived. Force fast retransmit. */
-		*do_lost = tcp_force_fast_retransmit(sk);
 	}
 	return false;
 }
@@ -3041,7 +2929,7 @@ static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag)
 
 	if (unlikely(tcp_is_reno(tp))) {
 		tcp_newreno_mark_lost(sk, *ack_flag & FLAG_SND_UNA_ADVANCED);
-	} else if (tcp_is_rack(sk)) {
+	} else {
 		u32 prior_retrans = tp->retrans_out;
 
 		if (tcp_rack_mark_lost(sk))
@@ -3070,8 +2958,6 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 	struct tcp_sock *tp = tcp_sk(sk);
 	int fast_rexmit = 0, flag = *ack_flag;
 	bool ece_ack = flag & FLAG_ECE;
-	bool do_lost = num_dupack || ((flag & FLAG_DATA_SACKED) &&
-				      tcp_force_fast_retransmit(sk));
 
 	if (!tp->packets_out && tp->sacked_out)
 		tp->sacked_out = 0;
@@ -3120,7 +3006,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		if (!(flag & FLAG_SND_UNA_ADVANCED)) {
 			if (tcp_is_reno(tp))
 				tcp_add_reno_sack(sk, num_dupack, ece_ack);
-		} else if (tcp_try_undo_partial(sk, prior_snd_una, &do_lost))
+		} else if (tcp_try_undo_partial(sk, prior_snd_una))
 			return;
 
 		if (tcp_try_undo_dsack(sk))
@@ -3178,8 +3064,6 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		fast_rexmit = 1;
 	}
 
-	if (!tcp_is_rack(sk) && do_lost)
-		tcp_update_scoreboard(sk, fast_rexmit);
 	*rexmit = REXMIT_LOST;
 }
 
-- 
2.50.0.rc1.591.g9c95f17f64-goog

* Re: [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
@ 2025-06-14 20:07 Jakub Kicinski
From: Jakub Kicinski @ 2025-06-14 20:07 UTC
To: Neal Cardwell
Cc: David Miller, Eric Dumazet, netdev, Neal Cardwell, Yuchung Cheng

On Fri, 13 Jun 2025 19:09:04 -0400 Neal Cardwell wrote:
> RACK-TLP loss detection has been enabled as the default loss detection
> algorithm for Linux TCP since 2018, in:
>
>  commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")

Hi! There is a warning here:

net/ipv4/tcp_input.c:2959:6: warning: variable 'fast_rexmit' set but not used [-Wunused-but-set-variable]
 2959 |         int fast_rexmit = 0, flag = *ack_flag;
      |             ^

and another one in patch 2:

net/ipv4/tcp_input.c:3367:29: warning: variable ‘delta’ set but not used [-Wunused-but-set-variable]
 3367 |                         int delta;
      |                             ^~~~~
-- 
pw-bot: cr

* Re: [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
@ 2025-06-15  0:18 Neal Cardwell
From: Neal Cardwell @ 2025-06-15  0:18 UTC
To: Jakub Kicinski
Cc: Neal Cardwell, David Miller, Eric Dumazet, netdev, Yuchung Cheng

On Sat, Jun 14, 2025 at 4:07 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 13 Jun 2025 19:09:04 -0400 Neal Cardwell wrote:
> > RACK-TLP loss detection has been enabled as the default loss detection
> > algorithm for Linux TCP since 2018, in:
> >
> >  commit b38a51fec1c1 ("tcp: disable RFC6675 loss detection")
>
> Hi! There is a warning here:
>
> net/ipv4/tcp_input.c:2959:6: warning: variable 'fast_rexmit' set but not used [-Wunused-but-set-variable]
>  2959 |         int fast_rexmit = 0, flag = *ack_flag;
>       |             ^
>
> and another one in patch 2:
>
> net/ipv4/tcp_input.c:3367:29: warning: variable ‘delta’ set but not used [-Wunused-but-set-variable]
>  3367 |                         int delta;
>       |                             ^~~~~

Sorry about that! Sent a v2:
  https://lore.kernel.org/netdev/20250615001435.2390793-1-ncardwell.sw@gmail.com/T/#t

thanks,
neal

* Re: [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
@ 2025-06-14 22:03 kernel test robot
From: kernel test robot @ 2025-06-14 22:03 UTC
To: Neal Cardwell; +Cc: llvm, oe-kbuild-all

Hi Neal,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Neal-Cardwell/tcp-remove-obsolete-and-unused-RFC3517-RFC6675-loss-recovery-code/20250614-071032
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250613230907.1702265-2-ncardwell.sw%40gmail.com
patch subject: [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code
config: i386-buildonly-randconfig-004-20250615 (https://download.01.org/0day-ci/archive/20250615/202506150546.5gETHB6Z-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250615/202506150546.5gETHB6Z-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506150546.5gETHB6Z-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp_input.c:2959:6: warning: variable 'fast_rexmit' set but not used [-Wunused-but-set-variable]
    2959 |         int fast_rexmit = 0, flag = *ack_flag;
         |             ^
   1 warning generated.


vim +/fast_rexmit +2959 net/ipv4/tcp_input.c

98e36d449cc681  Yuchung Cheng  2017-01-12  2941  
^1da177e4c3f41  Linus Torvalds  2005-04-16  2942  /* Process an event, which can update packets-in-flight not trivially.
^1da177e4c3f41  Linus Torvalds  2005-04-16  2943   * Main goal of this function is to calculate new estimate for left_out,
^1da177e4c3f41  Linus Torvalds  2005-04-16  2944   * taking into account both packets sitting in receiver's buffer and
^1da177e4c3f41  Linus Torvalds  2005-04-16  2945   * packets lost by network.
^1da177e4c3f41  Linus Torvalds  2005-04-16  2946   *
31ba0c10723e9e  Yuchung Cheng  2016-02-02  2947   * Besides that it updates the congestion state when packet loss or ECN
31ba0c10723e9e  Yuchung Cheng  2016-02-02  2948   * is detected. But it does not reduce the cwnd, it is done by the
31ba0c10723e9e  Yuchung Cheng  2016-02-02  2949   * congestion control later.
^1da177e4c3f41  Linus Torvalds  2005-04-16  2950   *
^1da177e4c3f41  Linus Torvalds  2005-04-16  2951   * It does _not_ decide what to send, it is made in function
^1da177e4c3f41  Linus Torvalds  2005-04-16  2952   * tcp_xmit_retransmit_queue().
^1da177e4c3f41  Linus Torvalds  2005-04-16  2953   */
737ff314563ca2  Yuchung Cheng  2017-11-08  2954  static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
19119f298bb1f2  Eric Dumazet  2018-11-27  2955  				  int num_dupack, int *ack_flag, int *rexmit)
^1da177e4c3f41  Linus Torvalds  2005-04-16  2956  {
6687e988d9aeac  Arnaldo Carvalho de Melo  2005-08-10  2957  	struct inet_connection_sock *icsk = inet_csk(sk);
^1da177e4c3f41  Linus Torvalds  2005-04-16  2958  	struct tcp_sock *tp = tcp_sk(sk);
31ba0c10723e9e  Yuchung Cheng  2016-02-02 @2959  	int fast_rexmit = 0, flag = *ack_flag;
c634e34f6ebfb7  Yousuk Seung  2020-06-26  2960  	bool ece_ack = flag & FLAG_ECE;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2961  
8ba6ddaaf86c4c  Eric Dumazet  2017-10-05  2962  	if (!tp->packets_out && tp->sacked_out)
^1da177e4c3f41  Linus Torvalds  2005-04-16  2963  		tp->sacked_out = 0;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2964  
^1da177e4c3f41  Linus Torvalds  2005-04-16  2965  	/* Now state machine starts.
^1da177e4c3f41  Linus Torvalds  2005-04-16  2966  	 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
c634e34f6ebfb7  Yousuk Seung  2020-06-26  2967  	if (ece_ack)
^1da177e4c3f41  Linus Torvalds  2005-04-16  2968  		tp->prior_ssthresh = 0;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2969  
^1da177e4c3f41  Linus Torvalds  2005-04-16  2970  	/* B. In all the states check for reneging SACKs. */
d2a0fc372aca56  Fred Chen  2023-10-21  2971  	if (tcp_check_sack_reneging(sk, ack_flag))
^1da177e4c3f41  Linus Torvalds  2005-04-16  2972  		return;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2973  
974c12360dfe6a  Yuchung Cheng  2012-01-19  2974  	/* C. Check consistency of the current state. */
005903bc3a0e84  Ilpo Järvinen  2007-08-09  2975  	tcp_verify_left_out(tp);
^1da177e4c3f41  Linus Torvalds  2005-04-16  2976  
974c12360dfe6a  Yuchung Cheng  2012-01-19  2977  	/* D. Check state exit conditions. State can be terminated
^1da177e4c3f41  Linus Torvalds  2005-04-16  2978  	 *    when high_seq is ACKed. */
6687e988d9aeac  Arnaldo Carvalho de Melo  2005-08-10  2979  	if (icsk->icsk_ca_state == TCP_CA_Open) {
a7abf3cd76e1e1  Eric Dumazet  2021-03-11  2980  		WARN_ON(tp->retrans_out != 0 && !tp->syn_data);
^1da177e4c3f41  Linus Torvalds  2005-04-16  2981  		tp->retrans_stamp = 0;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2982  	} else if (!before(tp->snd_una, tp->high_seq)) {
6687e988d9aeac  Arnaldo Carvalho de Melo  2005-08-10  2983  		switch (icsk->icsk_ca_state) {
^1da177e4c3f41  Linus Torvalds  2005-04-16  2984  		case TCP_CA_CWR:
^1da177e4c3f41  Linus Torvalds  2005-04-16  2985  			/* CWR is to be held something *above* high_seq
^1da177e4c3f41  Linus Torvalds  2005-04-16  2986  			 * is ACKed for CWR bit to reach receiver. */
^1da177e4c3f41  Linus Torvalds  2005-04-16  2987  			if (tp->snd_una != tp->high_seq) {
684bad1107571d  Yuchung Cheng  2012-09-02  2988  				tcp_end_cwnd_reduction(sk);
6687e988d9aeac  Arnaldo Carvalho de Melo  2005-08-10  2989  				tcp_set_ca_state(sk, TCP_CA_Open);
^1da177e4c3f41  Linus Torvalds  2005-04-16  2990  			}
^1da177e4c3f41  Linus Torvalds  2005-04-16  2991  			break;
^1da177e4c3f41  Linus Torvalds  2005-04-16  2992  
^1da177e4c3f41  Linus Torvalds  2005-04-16  2993  		case TCP_CA_Recovery:
e60402d0a909ca  Ilpo Järvinen  2007-08-09  2994  			if (tcp_is_reno(tp))
^1da177e4c3f41  Linus Torvalds  2005-04-16  2995  				tcp_reset_reno_sack(tp);
9e412ba7632f71  Ilpo Järvinen  2007-04-20  2996  			if (tcp_try_undo_recovery(sk))
^1da177e4c3f41  Linus Torvalds  2005-04-16  2997  				return;
684bad1107571d  Yuchung Cheng  2012-09-02  2998  			tcp_end_cwnd_reduction(sk);
^1da177e4c3f41  Linus Torvalds  2005-04-16  2999  			break;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3000  		}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3001  	}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3002  
974c12360dfe6a  Yuchung Cheng  2012-01-19  3003  	/* E. Process state. */
6687e988d9aeac  Arnaldo Carvalho de Melo  2005-08-10  3004  	switch (icsk->icsk_ca_state) {
^1da177e4c3f41  Linus Torvalds  2005-04-16  3005  	case TCP_CA_Recovery:
2e6052941ae1f2  Ilpo Järvinen  2007-08-02  3006  		if (!(flag & FLAG_SND_UNA_ADVANCED)) {
19119f298bb1f2  Eric Dumazet  2018-11-27  3007  			if (tcp_is_reno(tp))
c634e34f6ebfb7  Yousuk Seung  2020-06-26  3008  				tcp_add_reno_sack(sk, num_dupack, ece_ack);
f6e254d10bca5a  Neal Cardwell  2025-06-13  3009  		} else if (tcp_try_undo_partial(sk, prior_snd_una))
7026b912f97d91  Yuchung Cheng  2013-05-29  3010  			return;
a29cb6914681a5  Yuchung Cheng  2021-06-02  3011  
a29cb6914681a5  Yuchung Cheng  2021-06-02  3012  		if (tcp_try_undo_dsack(sk))
a6458ab7fd4f42  Neal Cardwell  2024-06-26  3013  			tcp_try_to_open(sk, flag);
a29cb6914681a5  Yuchung Cheng  2021-06-02  3014  
a29cb6914681a5  Yuchung Cheng  2021-06-02  3015  		tcp_identify_packet_loss(sk, ack_flag);
a29cb6914681a5  Yuchung Cheng  2021-06-02  3016  		if (icsk->icsk_ca_state != TCP_CA_Recovery) {
a29cb6914681a5  Yuchung Cheng  2021-06-02  3017  			if (!tcp_time_to_recover(sk, flag))
c7d9d6a185a7ea  Yuchung Cheng  2013-05-29  3018  				return;
a29cb6914681a5  Yuchung Cheng  2021-06-02  3019  			/* Undo reverts the recovery state. If loss is evident,
a29cb6914681a5  Yuchung Cheng  2021-06-02  3020  			 * starts a new recovery (e.g. reordering then loss);
a29cb6914681a5  Yuchung Cheng  2021-06-02  3021  			 */
a29cb6914681a5  Yuchung Cheng  2021-06-02  3022  			tcp_enter_recovery(sk, ece_ack);
c7d9d6a185a7ea  Yuchung Cheng  2013-05-29  3023  		}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3024  		break;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3025  	case TCP_CA_Loss:
19119f298bb1f2  Eric Dumazet  2018-11-27  3026  		tcp_process_loss(sk, flag, num_dupack, rexmit);
3868ab0f192581  Aananth V  2023-09-14  3027  		if (icsk->icsk_ca_state != TCP_CA_Loss)
3868ab0f192581  Aananth V  2023-09-14  3028  			tcp_update_rto_time(tp);
6ac06ecd3a5d1d  Yuchung Cheng  2018-05-16  3029  		tcp_identify_packet_loss(sk, ack_flag);
98e36d449cc681  Yuchung Cheng  2017-01-12  3030  		if (!(icsk->icsk_ca_state == TCP_CA_Open ||
98e36d449cc681  Yuchung Cheng  2017-01-12  3031  		      (*ack_flag & FLAG_LOST_RETRANS)))
^1da177e4c3f41  Linus Torvalds  2005-04-16  3032  			return;
291a00d1a70f96  Yuchung Cheng  2015-07-01  3033  		/* Change state if cwnd is undone or retransmits are lost */
a8eceea84a3a35  Joe Perches  2020-03-12  3034  		fallthrough;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3035  	default:
e60402d0a909ca  Ilpo Järvinen  2007-08-09  3036  		if (tcp_is_reno(tp)) {
2e6052941ae1f2  Ilpo Järvinen  2007-08-02  3037  			if (flag & FLAG_SND_UNA_ADVANCED)
^1da177e4c3f41  Linus Torvalds  2005-04-16  3038  				tcp_reset_reno_sack(tp);
c634e34f6ebfb7  Yousuk Seung  2020-06-26  3039  			tcp_add_reno_sack(sk, num_dupack, ece_ack);
^1da177e4c3f41  Linus Torvalds  2005-04-16  3040  		}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3041  
f698204bd0bfdc  Neal Cardwell  2011-11-16  3042  		if (icsk->icsk_ca_state <= TCP_CA_Disorder)
9e412ba7632f71  Ilpo Järvinen  2007-04-20  3043  			tcp_try_undo_dsack(sk);
^1da177e4c3f41  Linus Torvalds  2005-04-16  3044  
6ac06ecd3a5d1d  Yuchung Cheng  2018-05-16  3045  		tcp_identify_packet_loss(sk, ack_flag);
750ea2bafa55aa  Yuchung Cheng  2012-05-02  3046  		if (!tcp_time_to_recover(sk, flag)) {
31ba0c10723e9e  Yuchung Cheng  2016-02-02  3047  			tcp_try_to_open(sk, flag);
^1da177e4c3f41  Linus Torvalds  2005-04-16  3048  			return;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3049  		}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3050  
5d424d5a674f78  John Heffner  2006-03-20  3051  		/* MTU probe failure: don't reduce cwnd */
5d424d5a674f78  John Heffner  2006-03-20  3052  		if (icsk->icsk_ca_state < TCP_CA_CWR &&
5d424d5a674f78  John Heffner  2006-03-20  3053  		    icsk->icsk_mtup.probe_size &&
0e7b13685f9a06  John Heffner  2006-03-20  3054  		    tp->snd_una == tp->mtu_probe.probe_seq_start) {
5d424d5a674f78  John Heffner  2006-03-20  3055  			tcp_mtup_probe_failed(sk);
5d424d5a674f78  John Heffner  2006-03-20  3056  			/* Restores the reduction we did in tcp_mtup_probe() */
40570375356c87  Eric Dumazet  2022-04-05  3057  			tcp_snd_cwnd_set(tp, tcp_snd_cwnd(tp) + 1);
5d424d5a674f78  John Heffner  2006-03-20  3058  			tcp_simple_retransmit(sk);
5d424d5a674f78  John Heffner  2006-03-20  3059  			return;
5d424d5a674f78  John Heffner  2006-03-20  3060  		}
5d424d5a674f78  John Heffner  2006-03-20  3061  
^1da177e4c3f41  Linus Torvalds  2005-04-16  3062  		/* Otherwise enter Recovery state */
c634e34f6ebfb7  Yousuk Seung  2020-06-26  3063  		tcp_enter_recovery(sk, ece_ack);
85cc391c0e4584  Ilpo Järvinen  2007-11-15  3064  		fast_rexmit = 1;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3065  	}
^1da177e4c3f41  Linus Torvalds  2005-04-16  3066  
e662ca40de846e  Yuchung Cheng  2016-02-02  3067  	*rexmit = REXMIT_LOST;
^1da177e4c3f41  Linus Torvalds  2005-04-16  3068  }
^1da177e4c3f41  Linus Torvalds  2005-04-16  3069  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint 2025-06-13 23:09 [PATCH net-next 0/3] tcp: remove obsolete RFC3517/RFC6675 code Neal Cardwell 2025-06-13 23:09 ` [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code Neal Cardwell @ 2025-06-13 23:09 ` Neal Cardwell 2025-06-14 22:55 ` kernel test robot 2025-06-13 23:09 ` [PATCH net-next 3/3] tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial() Neal Cardwell 2 siblings, 1 reply; 8+ messages in thread From: Neal Cardwell @ 2025-06-13 23:09 UTC (permalink / raw) To: David Miller, Jakub Kicinski, Eric Dumazet Cc: netdev, Neal Cardwell, Yuchung Cheng From: Neal Cardwell <ncardwell@google.com> Now that obsolete RFC3517/RFC6675 TCP loss detection has been removed, we can remove the somewhat complex and intrusive code to maintain its hint state: lost_skb_hint and lost_cnt_hint. This commit makes tcp_clear_retrans_hints_partial() empty. We will remove tcp_clear_retrans_hints_partial() and its call sites in the next commit. 
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 .../networking/net_cachelines/tcp_sock.rst |  2 --
 include/linux/tcp.h                        |  3 ---
 include/net/tcp.h                          |  1 -
 net/ipv4/tcp.c                             |  3 +--
 net/ipv4/tcp_input.c                       | 15 ---------------
 net/ipv4/tcp_output.c                      |  5 -----
 6 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index bc9b2131bf7ac..7bbda5944ee2f 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -115,7 +115,6 @@ u32 lost_out read_mostly read_m
 u32 sacked_out read_mostly read_mostly tcp_left_out(tx);tcp_packets_in_flight(tx/rx);tcp_clean_rtx_queue(rx)
 struct hrtimer pacing_timer
 struct hrtimer compressed_ack_timer
-struct sk_buff* lost_skb_hint read_mostly tcp_clean_rtx_queue
 struct sk_buff* retransmit_skb_hint read_mostly tcp_clean_rtx_queue
 struct rb_root out_of_order_queue read_mostly tcp_data_queue,tcp_fast_path_check
 struct sk_buff* ooo_last_skb
@@ -123,7 +122,6 @@ struct tcp_sack_block[1] duplicate_sack
 struct tcp_sack_block[4] selective_acks
 struct tcp_sack_block[4] recv_sack_cache
 struct sk_buff* highest_sack read_write tcp_event_new_data_sent
-int lost_cnt_hint
 u32 prior_ssthresh
 u32 high_seq
 u32 retrans_stamp
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 29f59d50dc73f..1a5737b3753d0 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -208,7 +208,6 @@ struct tcp_sock {
 	u32	notsent_lowat;	/* TCP_NOTSENT_LOWAT */
 	u16	gso_segs;	/* Max number of segs per GSO packet */
 	/* from STCP, retrans queue hinting */
-	struct sk_buff *lost_skb_hint;
 	struct sk_buff *retransmit_skb_hint;
 	__cacheline_group_end(tcp_sock_read_tx);
 
@@ -419,8 +418,6 @@ struct tcp_sock {
 
 	struct tcp_sack_block recv_sack_cache[4];
 
-	int	lost_cnt_hint;
-
 	u32	prior_ssthresh;	/* ssthresh saved at recovery start */
 	u32	high_seq;	/* snd_nxt at onset of congestion */
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5078ad868feef..f57d121837949 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1813,7 +1813,6 @@ static inline void tcp_mib_init(struct net *net)
 /* from STCP */
 static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
 {
-	tp->lost_skb_hint = NULL;
 }
 
 static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f64f8276a73cd..27d3ef83ce7b2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -5053,9 +5053,8 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_tx, reordering);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_tx, notsent_lowat);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_tx, gso_segs);
-	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_tx, lost_skb_hint);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_tx, retransmit_skb_hint);
-	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_tx, 40);
+	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_tx, 32);
 
 	/* TXRX read-mostly hotpath cache lines */
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, tsoffset);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b52eaa45e652f..9ded9b371d98a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1451,11 +1451,6 @@ static u8 tcp_sacktag_one(struct sock *sk,
 		tp->sacked_out += pcount;
 		/* Out-of-order packets delivered */
 		state->sack_delivered += pcount;
-
-		/* Lost marker hint past SACKed? Tweak RFC3517 cnt */
-		if (tp->lost_skb_hint &&
-		    before(start_seq, TCP_SKB_CB(tp->lost_skb_hint)->seq))
-			tp->lost_cnt_hint += pcount;
 	}
 
 	/* D-SACK. We can detect redundant retransmission in S|R and plain R
@@ -1496,9 +1491,6 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *prev,
 			tcp_skb_timestamp_us(skb));
 	tcp_rate_skb_delivered(sk, skb, state->rate);
 
-	if (skb == tp->lost_skb_hint)
-		tp->lost_cnt_hint += pcount;
-
 	TCP_SKB_CB(prev)->end_seq += shifted;
 	TCP_SKB_CB(skb)->seq += shifted;
 
@@ -1531,10 +1523,6 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *prev,
 
 	if (skb == tp->retransmit_skb_hint)
 		tp->retransmit_skb_hint = prev;
-	if (skb == tp->lost_skb_hint) {
-		tp->lost_skb_hint = prev;
-		tp->lost_cnt_hint -= tcp_skb_pcount(prev);
-	}
 
 	TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
 	TCP_SKB_CB(prev)->eor = TCP_SKB_CB(skb)->eor;
@@ -3319,8 +3307,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, const struct sk_buff *ack_skb,
 		next = skb_rb_next(skb);
 		if (unlikely(skb == tp->retransmit_skb_hint))
 			tp->retransmit_skb_hint = NULL;
-		if (unlikely(skb == tp->lost_skb_hint))
-			tp->lost_skb_hint = NULL;
 		tcp_highest_sack_replace(sk, skb, next);
 		tcp_rtx_queue_unlink_and_free(skb, sk);
 	}
@@ -3385,7 +3371,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, const struct sk_buff *ack_skb,
 				tcp_check_sack_reordering(sk, reord, 0);
 
 			delta = prior_sacked - tp->sacked_out;
-			tp->lost_cnt_hint -= min(tp->lost_cnt_hint, delta);
 		}
 	} else if (skb && rtt_update && sack_rtt_us >= 0 &&
 		   sack_rtt_us > tcp_stamp_us_delta(tp->tcp_mstamp,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3ac8d2d17e1ff..b0ffefe604b4c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1554,11 +1554,6 @@ static void tcp_adjust_pcount(struct sock *sk, const struct sk_buff *skb, int de
 	if (tcp_is_reno(tp) && decr > 0)
 		tp->sacked_out -= min_t(u32, tp->sacked_out, decr);
 
-	if (tp->lost_skb_hint &&
-	    before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(tp->lost_skb_hint)->seq) &&
-	    (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
-		tp->lost_cnt_hint -= decr;
-
 	tcp_verify_left_out(tp);
 }
 
-- 
2.50.0.rc1.591.g9c95f17f64-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread
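
The size arithmetic behind this patch (one pointer leaving the tcp_sock_read_tx group, and a pointer plus an int leaving tcp_sock overall) can be checked with a small standalone sketch. The helper names below are hypothetical and model only the arithmetic, not the real tcp_sock layout; a 4-byte int is assumed.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; they do not exist in the kernel. */

/* Assumption: int is 4 bytes, as on the usual Linux targets. */
#define ASSUMED_INT_BYTES 4u

/* Bytes of field payload removed from tcp_sock by this patch:
 * one pointer (lost_skb_hint) plus one int (lost_cnt_hint).
 */
static size_t removed_hint_bytes(size_t pointer_bytes)
{
	return pointer_bytes + ASSUMED_INT_BYTES;
}

/* Size bound of the tcp_sock_read_tx group after dropping the single
 * pointer (lost_skb_hint) that lived inside that group.
 */
static size_t read_tx_group_size_after(size_t size_before, size_t pointer_bytes)
{
	return size_before - pointer_bytes;
}
```

With 8-byte pointers this gives 12 bytes removed and a group bound of 32 (down from 40); with 4-byte pointers it gives 8 bytes removed, matching the figures quoted in the cover letter.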
* Re: [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
  2025-06-13 23:09 ` [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint Neal Cardwell
@ 2025-06-14 22:55   ` kernel test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-06-14 22:55 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: llvm, oe-kbuild-all

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=utf-8, Size: 17912 bytes --]

Hi Neal,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Neal-Cardwell/tcp-remove-obsolete-and-unused-RFC3517-RFC6675-loss-recovery-code/20250614-071032
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250613230907.1702265-3-ncardwell.sw%40gmail.com
patch subject: [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint
config: i386-buildonly-randconfig-004-20250615 (https://download.01.org/0day-ci/archive/20250615/202506150617.IXN2yJ4v-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250615/202506150617.IXN2yJ4v-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506150617.IXN2yJ4v-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/ipv4/tcp_input.c:2947:6: warning: variable 'fast_rexmit' set but not used [-Wunused-but-set-variable]
    2947 |         int fast_rexmit = 0, flag = *ack_flag;
         |             ^
>> net/ipv4/tcp_input.c:3367:8: warning: variable 'delta' set but not used [-Wunused-but-set-variable]
    3367 |                         int delta;
         |                             ^
   2 warnings generated.


vim +/delta +3367 net/ipv4/tcp_input.c

ad971f616aa98e Eric Dumazet 2014-10-11  3212  
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3213  /* Remove acknowledged frames from the retransmission queue. If our packet
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3214   * is before the ack sequence we can discard it as it's confirmed to have
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3215   * arrived at the other end.
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3216   */
e7ed11ee945438 Yousuk Seung 2021-01-20  3217  static int tcp_clean_rtx_queue(struct sock *sk, const struct sk_buff *ack_skb,
e7ed11ee945438 Yousuk Seung 2021-01-20  3218  			       u32 prior_fack, u32 prior_snd_una,
c634e34f6ebfb7 Yousuk Seung 2020-06-26  3219  			       struct tcp_sacktag_state *sack, bool ece_ack)
^1da177e4c3f41 Linus Torvalds 2005-04-16  3220  {
2d2abbab63f672 Stephen Hemminger 2005-11-10  3221  	const struct inet_connection_sock *icsk = inet_csk(sk);
9a568de4818dea Eric Dumazet 2017-05-16  3222  	u64 first_ackt, last_ackt;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3223  	struct tcp_sock *tp = tcp_sk(sk);
740b0f1841f6e3 Eric Dumazet 2014-02-26  3224  	u32 prior_sacked = tp->sacked_out;
737ff314563ca2 Yuchung Cheng 2017-11-08  3225  	u32 reord = tp->snd_nxt; /* lowest acked un-retx un-sacked seq */
75c119afe14f74 Eric Dumazet 2017-10-05  3226  	struct sk_buff *skb, *next;
34a6eda163975d Peter Senna Tschudin 2013-10-02  3227  	bool fully_acked = true;
31231a8a873026 Kenneth Klette Jonassen 2015-05-01  3228  	long sack_rtt_us = -1L;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3229  	long seq_rtt_us = -1L;
31231a8a873026 Kenneth Klette Jonassen 2015-05-01  3230  	long ca_rtt_us = -1L;
7201883599ac8b Ilpo Järvinen 2007-12-30  3231  	u32 pkts_acked = 0;
2f715c1dde6e17 Yuchung Cheng 2013-10-24  3232  	bool rtt_update;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3233  	int flag = 0;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3234  
9a568de4818dea Eric Dumazet 2017-05-16  3235  	first_ackt = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16  3236  
75c119afe14f74 Eric Dumazet 2017-10-05  3237  	for (skb = skb_rb_first(&sk->tcp_rtx_queue); skb; skb = next) {
^1da177e4c3f41 Linus Torvalds 2005-04-16  3238  		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);
737ff314563ca2 Yuchung Cheng 2017-11-08  3239  		const u32 start_seq = scb->seq;
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3240  		u8 sacked = scb->sacked;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3241  		u32 acked_pcount;
^1da177e4c3f41 Linus Torvalds 2005-04-16  3242  
2072c228c9a05c Gavin McCullagh 2007-12-29  3243  		/* Determine how many packets and what bytes were acked, tso and else */
^1da177e4c3f41 Linus Torvalds 2005-04-16  3244  		if (after(scb->end_seq, tp->snd_una)) {
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3245  			if (tcp_skb_pcount(skb) == 1 ||
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3246  			    !after(tp->snd_una, scb->seq))
^1da177e4c3f41 Linus Torvalds 2005-04-16  3247  				break;
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3248  
7201883599ac8b Ilpo Järvinen 2007-12-30  3249  			acked_pcount = tcp_tso_acked(sk, skb);
7201883599ac8b Ilpo Järvinen 2007-12-30  3250  			if (!acked_pcount)
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3251  				break;
a2a385d627e154 Eric Dumazet 2012-05-16  3252  			fully_acked = false;
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3253  		} else {
7201883599ac8b Ilpo Järvinen 2007-12-30  3254  			acked_pcount = tcp_skb_pcount(skb);
^1da177e4c3f41 Linus Torvalds 2005-04-16  3255  		}
^1da177e4c3f41 Linus Torvalds 2005-04-16  3256  
ad971f616aa98e Eric Dumazet 2014-10-11  3257  		if (unlikely(sacked & TCPCB_RETRANS)) {
^1da177e4c3f41 Linus Torvalds 2005-04-16  3258  			if (sacked & TCPCB_SACKED_RETRANS)
7201883599ac8b Ilpo Järvinen 2007-12-30  3259  				tp->retrans_out -= acked_pcount;
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3260  			flag |= FLAG_RETRANS_DATA_ACKED;
3d0d26c7976bf1 Kenneth Klette Jonassen 2015-04-11  3261  		} else if (!(sacked & TCPCB_SACKED_ACKED)) {
2fd66ffba50716 Eric Dumazet 2018-09-21  3262  			last_ackt = tcp_skb_timestamp_us(skb);
9a568de4818dea Eric Dumazet 2017-05-16  3263  			WARN_ON_ONCE(last_ackt == 0);
9a568de4818dea Eric Dumazet 2017-05-16  3264  			if (!first_ackt)
740b0f1841f6e3 Eric Dumazet 2014-02-26  3265  				first_ackt = last_ackt;
740b0f1841f6e3 Eric Dumazet 2014-02-26  3266  
737ff314563ca2 Yuchung Cheng 2017-11-08  3267  			if (before(start_seq, reord))
737ff314563ca2 Yuchung Cheng 2017-11-08  3268  				reord = start_seq;
e33099f96d99c3 Yuchung Cheng 2013-03-20  3269  			if (!after(scb->end_seq, tp->high_seq))
e33099f96d99c3 Yuchung Cheng 2013-03-20  3270  				flag |= FLAG_ORIG_SACK_ACKED;
c7caf8d3ed7a66 Ilpo Järvinen 2007-11-10  3271  		}
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3272  
ddf1af6fa00e77 Yuchung Cheng 2016-02-02  3273  		if (sacked & TCPCB_SACKED_ACKED) {
7201883599ac8b Ilpo Järvinen 2007-12-30  3274  			tp->sacked_out -= acked_pcount;
ddf1af6fa00e77 Yuchung Cheng 2016-02-02  3275  		} else if (tcp_is_sack(tp)) {
082d4fa980b07b Yousuk Seung 2020-06-26  3276  			tcp_count_delivered(tp, acked_pcount, ece_ack);
ddf1af6fa00e77 Yuchung Cheng 2016-02-02  3277  			if (!tcp_skb_spurious_retrans(tp, skb))
1d0833df594390 Yuchung Cheng 2017-01-12  3278  				tcp_rack_advance(tp, sacked, scb->end_seq,
2fd66ffba50716 Eric Dumazet 2018-09-21  3279  						 tcp_skb_timestamp_us(skb));
ddf1af6fa00e77 Yuchung Cheng 2016-02-02  3280  		}
^1da177e4c3f41 Linus Torvalds 2005-04-16  3281  		if (sacked & TCPCB_LOST)
7201883599ac8b Ilpo Järvinen 2007-12-30  3282  			tp->lost_out -= acked_pcount;
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3283  
7201883599ac8b Ilpo Järvinen 2007-12-30  3284  		tp->packets_out -= acked_pcount;
7201883599ac8b Ilpo Järvinen 2007-12-30  3285  		pkts_acked += acked_pcount;
b9f64820fb226a Yuchung Cheng 2016-09-19  3286  		tcp_rate_skb_delivered(sk, skb, sack->rate);
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3287  
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3288  		/* Initial outgoing SYN's get put onto the write_queue
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3289  		 * just like anything else we transmit. It is not
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3290  		 * true data, and if we misinform our callers that
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3291  		 * this ACK acks real data, we will erroneously exit
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3292  		 * connection startup slow start one packet too
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3293  		 * quickly. This is severely frowned upon behavior.
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3294  		 */
ad971f616aa98e Eric Dumazet 2014-10-11  3295  		if (likely(!(scb->tcp_flags & TCPHDR_SYN))) {
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3296  			flag |= FLAG_DATA_ACKED;
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3297  		} else {
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3298  			flag |= FLAG_SYN_ACKED;
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3299  			tp->retrans_stamp = 0;
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3300  		}
009a2e3e4ec395 Ilpo Järvinen 2007-09-20  3301  
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3302  		if (!fully_acked)
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3303  			break;
13fcf850cc2037 Ilpo Järvinen 2007-10-09  3304  
e7ed11ee945438 Yousuk Seung 2021-01-20  3305  		tcp_ack_tstamp(sk, skb, ack_skb, prior_snd_una);
fdb7eb21ddd3cc Yousuk Seung 2020-06-26  3306  
75c119afe14f74 Eric Dumazet 2017-10-05  3307  		next = skb_rb_next(skb);
ad971f616aa98e Eric Dumazet 2014-10-11  3308  		if (unlikely(skb == tp->retransmit_skb_hint))
ef9da47c7cc64d Ilpo Järvinen 2008-09-20  3309  			tp->retransmit_skb_hint = NULL;
2bec445f9bf35e Eric Dumazet 2020-01-22  3310  		tcp_highest_sack_replace(sk, skb, next);
75c119afe14f74 Eric Dumazet 2017-10-05  3311  		tcp_rtx_queue_unlink_and_free(skb, sk);
^1da177e4c3f41 Linus Torvalds 2005-04-16  3312  	}
^1da177e4c3f41 Linus Torvalds 2005-04-16  3313  
0f87230d1a6c25 Francis Yan 2016-11-27  3314  	if (!skb)
0f87230d1a6c25 Francis Yan 2016-11-27  3315  		tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
0f87230d1a6c25 Francis Yan 2016-11-27  3316  
33f5f57eeb0c63 Ilpo Järvinen 2008-10-07  3317  	if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
33f5f57eeb0c63 Ilpo Järvinen 2008-10-07  3318  		tp->snd_up = tp->snd_una;
33f5f57eeb0c63 Ilpo Järvinen 2008-10-07  3319  
ff91e9292fc5aa Yousuk Seung 2020-06-30  3320  	if (skb) {
e7ed11ee945438 Yousuk Seung 2021-01-20  3321  		tcp_ack_tstamp(sk, skb, ack_skb, prior_snd_una);
ff91e9292fc5aa Yousuk Seung 2020-06-30  3322  		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
cadbd0313bc897 Ilpo Järvinen 2007-12-31  3323  			flag |= FLAG_SACK_RENEGING;
ff91e9292fc5aa Yousuk Seung 2020-06-30  3324  	}
cadbd0313bc897 Ilpo Järvinen 2007-12-31  3325  
9a568de4818dea Eric Dumazet 2017-05-16  3326  	if (likely(first_ackt) && !(flag & FLAG_RETRANS_DATA_ACKED)) {
9a568de4818dea Eric Dumazet 2017-05-16  3327  		seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);
9a568de4818dea Eric Dumazet 2017-05-16  3328  		ca_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, last_ackt);
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3329  
40bc6063796ec7 Yuchung Cheng 2021-09-23  3330  		if (pkts_acked == 1 && fully_acked && !prior_sacked &&
40bc6063796ec7 Yuchung Cheng 2021-09-23  3331  		    (tp->snd_una - prior_snd_una) < tp->mss_cache &&
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3332  		    sack->rate->prior_delivered + 1 == tp->delivered &&
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3333  		    !(flag & (FLAG_CA_ALERT | FLAG_SYN_ACKED))) {
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3334  			/* Conservatively mark a delayed ACK. It's typically
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3335  			 * from a lone runt packet over the round trip to
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3336  			 * a receiver w/o out-of-order or CE events.
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3337  			 */
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3338  			flag |= FLAG_ACK_MAYBE_DELAYED;
eb36be0fd55e0a Yuchung Cheng 2018-01-17  3339  		}
31231a8a873026 Kenneth Klette Jonassen 2015-05-01  3340  	}
9a568de4818dea Eric Dumazet 2017-05-16  3341  	if (sack->first_sackt) {
9a568de4818dea Eric Dumazet 2017-05-16  3342  		sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);
9a568de4818dea Eric Dumazet 2017-05-16  3343  		ca_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->last_sackt);
740b0f1841f6e3 Eric Dumazet 2014-02-26  3344  	}
f672258391b42a Yuchung Cheng 2015-10-16  3345  	rtt_update = tcp_ack_update_rtt(sk, flag, seq_rtt_us, sack_rtt_us,
775e68a93fe4d3 Yuchung Cheng 2017-05-31  3346  					ca_rtt_us, sack->rate);
ed08495c31bb99 Yuchung Cheng 2013-07-22  3347  
7c46a03e67d11d Ilpo Järvinen 2007-09-20  3348  	if (flag & FLAG_ACKED) {
df92c8394e6ea0 Neal Cardwell 2017-08-03  3349  		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
72211e90501f95 Ilpo Järvinen 2009-03-14  3350  		if (unlikely(icsk->icsk_mtup.probe_size &&
72211e90501f95 Ilpo Järvinen 2009-03-14  3351  			     !after(tp->mtu_probe.probe_seq_end, tp->snd_una))) {
72211e90501f95 Ilpo Järvinen 2009-03-14  3352  			tcp_mtup_probe_success(sk);
72211e90501f95 Ilpo Järvinen 2009-03-14  3353  		}
72211e90501f95 Ilpo Järvinen 2009-03-14  3354  
c7caf8d3ed7a66 Ilpo Järvinen 2007-11-10  3355  		if (tcp_is_reno(tp)) {
c634e34f6ebfb7 Yousuk Seung 2020-06-26  3356  			tcp_remove_reno_sacks(sk, pkts_acked, ece_ack);
1236f22fbae15d Ilpo Järvinen 2018-06-29  3357  
1236f22fbae15d Ilpo Järvinen 2018-06-29  3358  			/* If any of the cumulatively ACKed segments was
1236f22fbae15d Ilpo Järvinen 2018-06-29  3359  			 * retransmitted, non-SACK case cannot confirm that
1236f22fbae15d Ilpo Järvinen 2018-06-29  3360  			 * progress was due to original transmission due to
1236f22fbae15d Ilpo Järvinen 2018-06-29  3361  			 * lack of TCPCB_SACKED_ACKED bits even if some of
1236f22fbae15d Ilpo Järvinen 2018-06-29  3362  			 * the packets may have been never retransmitted.
1236f22fbae15d Ilpo Järvinen 2018-06-29  3363  			 */
1236f22fbae15d Ilpo Järvinen 2018-06-29  3364  			if (flag & FLAG_RETRANS_DATA_ACKED)
1236f22fbae15d Ilpo Järvinen 2018-06-29  3365  				flag &= ~FLAG_ORIG_SACK_ACKED;
c7caf8d3ed7a66 Ilpo Järvinen 2007-11-10  3366  		} else {
59a08cba6a604a Ilpo Järvinen 2009-02-28 @3367  			int delta;
59a08cba6a604a Ilpo Järvinen 2009-02-28  3368  
c7caf8d3ed7a66 Ilpo Järvinen 2007-11-10  3369  			/* Non-retransmitted hole got filled? That's reordering */
737ff314563ca2 Yuchung Cheng 2017-11-08  3370  			if (before(reord, prior_fack))
737ff314563ca2 Yuchung Cheng 2017-11-08  3371  				tcp_check_sack_reordering(sk, reord, 0);
90638a04ad8484 Ilpo Järvinen 2008-09-20  3372  
713bafea929201 Yuchung Cheng 2017-11-08  3373  			delta = prior_sacked - tp->sacked_out;
c7caf8d3ed7a66 Ilpo Järvinen 2007-11-10  3374  		}
740b0f1841f6e3 Eric Dumazet 2014-02-26  3375  	} else if (skb && rtt_update && sack_rtt_us >= 0 &&
2fd66ffba50716 Eric Dumazet 2018-09-21  3376  		   sack_rtt_us > tcp_stamp_us_delta(tp->tcp_mstamp,
2fd66ffba50716 Eric Dumazet 2018-09-21  3377  						    tcp_skb_timestamp_us(skb))) {
2f715c1dde6e17 Yuchung Cheng 2013-10-24  3378  		/* Do not re-arm RTO if the sack RTT is measured from data sent
2f715c1dde6e17 Yuchung Cheng 2013-10-24  3379  		 * after when the head was last (re)transmitted. Otherwise the
2f715c1dde6e17 Yuchung Cheng 2013-10-24  3380  		 * timeout may continue to extend in loss recovery.
2f715c1dde6e17 Yuchung Cheng 2013-10-24  3381  		 */
df92c8394e6ea0 Neal Cardwell 2017-08-03  3382  		flag |= FLAG_SET_XMIT_TIMER;  /* set TLP or RTO timer */
^1da177e4c3f41 Linus Torvalds 2005-04-16  3383  	}
^1da177e4c3f41 Linus Torvalds 2005-04-16  3384  
756ee1729b2feb Lawrence Brakmo 2016-05-11  3385  	if (icsk->icsk_ca_ops->pkts_acked) {
756ee1729b2feb Lawrence Brakmo 2016-05-11  3386  		struct ack_sample sample = { .pkts_acked = pkts_acked,
40bc6063796ec7 Yuchung Cheng 2021-09-23  3387  					     .rtt_us = sack->rate->rtt_us };
756ee1729b2feb Lawrence Brakmo 2016-05-11  3388  
40bc6063796ec7 Yuchung Cheng 2021-09-23  3389  		sample.in_flight = tp->mss_cache *
40bc6063796ec7 Yuchung Cheng 2021-09-23  3390  			(tp->delivered - sack->rate->prior_delivered);
756ee1729b2feb Lawrence Brakmo 2016-05-11  3391  		icsk->icsk_ca_ops->pkts_acked(sk, &sample);
756ee1729b2feb Lawrence Brakmo 2016-05-11  3392  	}
138998fdd12e73 Kenneth Klette Jonassen 2015-05-01  3393  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 8+ messages in thread
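
The 'delta' warning arises because patch 2/3 removes the only reader of the variable (the lost_cnt_hint update) while leaving its declaration and assignment behind. One plausible way to silence it, not confirmed anywhere in this thread, is to drop the dead local as well. The standalone sketch below models that shape with hypothetical names (struct mini_tp, seq_before(), sack_branch()); it is not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of the state touched by the non-Reno branch. */
struct mini_tp {
	uint32_t sacked_out;
	int reorder_checks;	/* counts calls a real stack would make to
				 * tcp_check_sack_reordering() */
};

/* Wraparound-safe sequence comparison, in the style of the kernel's before(). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Shape of the branch once the dead store is gone. The version flagged by
 * the robot additionally contained:
 *
 *	int delta;
 *	...
 *	delta = prior_sacked - tp->sacked_out;
 *
 * with no remaining reader of 'delta', hence -Wunused-but-set-variable.
 */
static void sack_branch(struct mini_tp *tp, uint32_t reord, uint32_t prior_fack)
{
	/* Non-retransmitted hole got filled? That's reordering */
	if (seq_before(reord, prior_fack))
		tp->reorder_checks++;
}
```

Whether prior_sacked still has other users after such a change would need to be checked against the full function before removing anything further.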
* [PATCH net-next 3/3] tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial()
  2025-06-13 23:09 [PATCH net-next 0/3] tcp: remove obsolete RFC3517/RFC6675 code Neal Cardwell
  2025-06-13 23:09 ` [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code Neal Cardwell
  2025-06-13 23:09 ` [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint Neal Cardwell
@ 2025-06-13 23:09 ` Neal Cardwell
  2 siblings, 0 replies; 8+ messages in thread
From: Neal Cardwell @ 2025-06-13 23:09 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski, Eric Dumazet
  Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Neal Cardwell <ncardwell@google.com>

Now that we have removed the RFC3517/RFC6675 hints,
tcp_clear_retrans_hints_partial() is empty, and can be removed.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h     | 5 -----
 net/ipv4/tcp_input.c  | 2 --
 net/ipv4/tcp_output.c | 1 -
 3 files changed, 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f57d121837949..9f852f5f8b95e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1811,13 +1811,8 @@ static inline void tcp_mib_init(struct net *net)
 }
 
 /* from STCP */
-static inline void tcp_clear_retrans_hints_partial(struct tcp_sock *tp)
-{
-}
-
 static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
 {
-	tcp_clear_retrans_hints_partial(tp);
 	tp->retransmit_skb_hint = NULL;
 }
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9ded9b371d98a..937a0085598e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2769,8 +2769,6 @@ void tcp_simple_retransmit(struct sock *sk)
 			tcp_mark_skb_lost(sk, skb);
 	}
 
-	tcp_clear_retrans_hints_partial(tp);
-
 	if (!tp->lost_out)
 		return;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b0ffefe604b4c..eb50746dc4820 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3247,7 +3247,6 @@ static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
 
 	/* changed transmit queue under us so clear hints */
-	tcp_clear_retrans_hints_partial(tp);
 	if (next_skb == tp->retransmit_skb_hint)
 		tp->retransmit_skb_hint = skb;
 
-- 
2.50.0.rc1.591.g9c95f17f64-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-15  0:18 UTC | newest]

Thread overview: 8+ messages
2025-06-13 23:09 [PATCH net-next 0/3] tcp: remove obsolete RFC3517/RFC6675 code Neal Cardwell
2025-06-13 23:09 ` [PATCH net-next 1/3] tcp: remove obsolete and unused RFC3517/RFC6675 loss recovery code Neal Cardwell
2025-06-14 20:07   ` Jakub Kicinski
2025-06-15  0:18     ` Neal Cardwell
2025-06-14 22:03   ` kernel test robot
2025-06-13 23:09 ` [PATCH net-next 2/3] tcp: remove RFC3517/RFC6675 hint state: lost_skb_hint, lost_cnt_hint Neal Cardwell
2025-06-14 22:55   ` kernel test robot
2025-06-13 23:09 ` [PATCH net-next 3/3] tcp: remove RFC3517/RFC6675 tcp_clear_retrans_hints_partial() Neal Cardwell