* [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes @ 2008-04-07 11:55 Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 1/4] [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack Ilpo Järvinen 2008-04-07 21:59 ` [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 0 siblings, 2 replies; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 11:55 UTC (permalink / raw) To: David Miller Cc: netdev, linux-kernel, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi Hi Dave, Ok, here's the v2 series which tries hard to avoid opening loopholes to if condition in tcp_mark_head_lost which again would make the added tcp_fragment code to not execute :-). The rest (patches 3-4 are identical to previous series 2-3). ...I repost the whole series to make it easy for you if you were fast enough to deleted them from your inbox already, others please bear me :-). --- v1 intro follows --- Here are some TCP fixes that resulted from the last weeks reports & debug. The first one is quite likely to hit but it's considerably harder to get it print an overflow warning, while I've seen two reports about the second one (one of them is for 2.6.24.y), please ignore the earlier version of the tcp_simple_retransmit patch. The FRTO patch is once again result of code review rather than some report but seems necessary to avoid detection of some not that likely cases as spurious RTOs when there were some other losses in the same window with the probe. Nevertheless, it should be safe to return non-FRTO behavior. ...So far, they're just compile tested. I'll see if I find some time to boot them on the evening. There might still be one fackets_out miscounter awaiting detection & debug output (I hope Georgi catches it) because none of these seem an obvious reason for triggery of the !sacked_out && fackets_out trap. All of them are also valid for stables but please note that it won't be too easy, at least for the first & second patch because of recent changes (especially if stable != 2.6.24.y). I'll try to do the adapted versions later on for the stable. -- i. ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCHv2 net-2.6 1/4] [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack 2008-04-07 11:55 [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen @ 2008-04-07 11:55 ` Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 2/4] [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb Ilpo Järvinen 2008-04-07 21:59 ` [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 1 sibling, 1 reply; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 11:55 UTC (permalink / raw) To: David Miller Cc: netdev, linux-kernel, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi, Ilpo J�rvinen The fast retransmission can be forced locally to the rfc3517 branch in tcp_update_scoreboard instead of making such fragile constructs deeper in tcp_mark_head_lost. This is necessary for the next patch which must not have loopholes for cnt > packets check. As one can notice, readability got some improvements too because of this :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6e46b4c..f82b573 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2134,7 +2134,7 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb) /* Mark head of queue up as lost. With RFC3517 SACK, the packets is * is against sacked "cnt", otherwise it's against facked "cnt" */ -static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit) +static void tcp_mark_head_lost(struct sock *sk, int packets) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; @@ -2161,7 +2161,7 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit) (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) cnt += tcp_skb_pcount(skb); - if (((!fast_rexmit || (tp->lost_out > 0)) && (cnt > packets)) || + if ((cnt > packets) || after(TCP_SKB_CB(skb)->end_seq, tp->high_seq)) break; if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_SACKED_ACKED|TCPCB_LOST))) { @@ -2180,17 +2180,17 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit) struct tcp_sock *tp = tcp_sk(sk); if (tcp_is_reno(tp)) { - tcp_mark_head_lost(sk, 1, fast_rexmit); + tcp_mark_head_lost(sk, 1); } else if (tcp_is_fack(tp)) { int lost = tp->fackets_out - tp->reordering; if (lost <= 0) lost = 1; - tcp_mark_head_lost(sk, lost, fast_rexmit); + tcp_mark_head_lost(sk, lost); } else { int sacked_upto = tp->sacked_out - tp->reordering; - if (sacked_upto < 0) - sacked_upto = 0; - tcp_mark_head_lost(sk, sacked_upto, fast_rexmit); + if (sacked_upto < fast_rexmit) + sacked_upto = fast_rexmit; + tcp_mark_head_lost(sk, sacked_upto); } /* New heuristics: it is possible only after we switched @@ -2524,7 +2524,7 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag) before(tp->snd_una, tp->high_seq) && icsk->icsk_ca_state != TCP_CA_Open && tp->fackets_out > tp->reordering) { - tcp_mark_head_lost(sk, tp->fackets_out - tp->reordering, 0); + tcp_mark_head_lost(sk, tp->fackets_out - tp->reordering); NET_INC_STATS_BH(LINUX_MIB_TCPLOSS); } -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv2 net-2.6 2/4] [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb 2008-04-07 11:55 ` [PATCHv2 net-2.6 1/4] [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack Ilpo Järvinen @ 2008-04-07 11:55 ` Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 3/4] [TCP]: tcp_simple_retransmit can cause S+L Ilpo Järvinen 0 siblings, 1 reply; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 11:55 UTC (permalink / raw) To: David Miller Cc: netdev, linux-kernel, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi, Ilpo J�rvinen Fixes a long-standing bug which makes NewReno recovery crippled. With GSO the whole head skb was marked as LOST which is in violation of NewReno procedure that only wants to mark one packet and ended up breaking our TCP code by causing counter overflow because our code was built on top of assumption about valid NewReno procedure. This manifested as triggering a WARN_ON for the overflow in a number of places. It seems relatively safe alternative to just do nothing if tcp_fragment fails due to oom because another duplicate ACK is likely to be received soon and the fragmentation will be retried. Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was lucky enough to be able to reproduce this so that the warning for the overflow was hit. It's not as easy task as it seems even if this bug happens quite often because the amount of outstanding data is pretty significant for the mismarkings to lead to an overflow. Because it's very late in 2.6.25-rc cycle (if this even makes in time), I didn't want to touch anything with SACK enabled here. Fragmenting might be useful for it as well but it's more or less a policy decision rather than mandatory fix. Thus there's no need to rush and we can postpone considering tcp_fragment with SACK for 2.6.26. In 2.6.24 and earlier, this very same bug existed but the effect is slightly different because of a small changes in the if conditions that fit to the patch's context. With them nothing got lost marker and thus no retransmissions happened. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 22 ++++++++++++++++++---- 1 files changed, 18 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f82b573..f053bfe 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2138,7 +2138,9 @@ static void tcp_mark_head_lost(struct sock *sk, int packets) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *skb; - int cnt; + int cnt, oldcnt; + int err; + unsigned int mss; BUG_TRAP(packets <= tp->packets_out); if (tp->lost_skb_hint) { @@ -2157,13 +2159,25 @@ static void tcp_mark_head_lost(struct sock *sk, int packets) tp->lost_skb_hint = skb; tp->lost_cnt_hint = cnt; + if (after(TCP_SKB_CB(skb)->end_seq, tp->high_seq)) + break; + + oldcnt = cnt; if (tcp_is_fack(tp) || tcp_is_reno(tp) || (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) cnt += tcp_skb_pcount(skb); - if ((cnt > packets) || - after(TCP_SKB_CB(skb)->end_seq, tp->high_seq)) - break; + if (cnt > packets) { + if (tcp_is_sack(tp) || (oldcnt >= packets)) + break; + + mss = skb_shinfo(skb)->gso_size; + err = tcp_fragment(sk, skb, (packets - oldcnt) * mss, mss); + if (err < 0) + break; + cnt = packets; + } + if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_SACKED_ACKED|TCPCB_LOST))) { TCP_SKB_CB(skb)->sacked |= TCPCB_LOST; tp->lost_out += tcp_skb_pcount(skb); -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv2 net-2.6 3/4] [TCP]: tcp_simple_retransmit can cause S+L 2008-04-07 11:55 ` [PATCHv2 net-2.6 2/4] [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb Ilpo Järvinen @ 2008-04-07 11:55 ` Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 4/4] [TCP]: Don't allow FRTO to take place while MTU is being probed Ilpo Järvinen 0 siblings, 1 reply; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 11:55 UTC (permalink / raw) To: David Miller Cc: netdev, linux-kernel, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi, Ilpo J�rvinen This fixes Bugzilla #10384 tcp_simple_retransmit does L increment without any checking whatsoever for overflowing S+L when Reno is in use. The simplest scenario I can currently think of is rather complex in practice (there might be some more straightforward cases though). Ie., if mss is reduced during mtu probing, it may end up marking everything lost and if some duplicate ACKs arrived prior to that sacked_out will be non-zero as well, leading to S+L > packets_out, tcp_clean_rtx_queue on the next cumulative ACK or tcp_fastretrans_alert on the next duplicate ACK will fix the S counter. More straightforward (but questionable) solution would be to just call tcp_reset_reno_sack() in tcp_simple_retransmit but it would negatively impact the probe's retransmission, ie., the retransmissions would not occur if some duplicate ACKs had arrived. So I had to add reno sacked_out reseting to CA_Loss state when the first cumulative ACK arrives (this stale sacked_out might actually be the explanation for the reports of left_out overflows in kernel prior to 2.6.23 and S+L overflow reports of 2.6.24). However, this alone won't be enough to fix kernel before 2.6.24 because it is building on top of the commit 1b6d427bb7e ([TCP]: Reduce sacked_out with reno when purging write_queue) to keep the sacked_out from overflowing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com> --- include/net/tcp.h | 2 ++ net/ipv4/tcp_input.c | 24 ++++++++++++++++++------ net/ipv4/tcp_output.c | 3 +++ 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 723b368..8f5fc52 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -760,6 +760,8 @@ static inline unsigned int tcp_packets_in_flight(const struct tcp_sock *tp) return tp->packets_out - tcp_left_out(tp) + tp->retrans_out; } +extern int tcp_limit_reno_sacked(struct tcp_sock *tp); + /* If cwnd > ssthresh, we may raise ssthresh to be half-way to cwnd. * The exception is rate halving phase, when cwnd is decreasing towards * ssthresh. diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index f053bfe..a2a06d6 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1625,13 +1625,11 @@ out: return flag; } -/* If we receive more dupacks than we expected counting segments - * in assumption of absent reordering, interpret this as reordering. - * The only another reason could be bug in receiver TCP. +/* Limits sacked_out so that sum with lost_out isn't ever larger than + * packets_out. Returns zero if sacked_out adjustement wasn't necessary. */ -static void tcp_check_reno_reordering(struct sock *sk, const int addend) +int tcp_limit_reno_sacked(struct tcp_sock *tp) { - struct tcp_sock *tp = tcp_sk(sk); u32 holes; holes = max(tp->lost_out, 1U); @@ -1639,8 +1637,20 @@ static void tcp_check_reno_reordering(struct sock *sk, const int addend) if ((tp->sacked_out + holes) > tp->packets_out) { tp->sacked_out = tp->packets_out - holes; - tcp_update_reordering(sk, tp->packets_out + addend, 0); + return 1; } + return 0; +} + +/* If we receive more dupacks than we expected counting segments + * in assumption of absent reordering, interpret this as reordering. + * The only another reason could be bug in receiver TCP. + */ +static void tcp_check_reno_reordering(struct sock *sk, const int addend) +{ + struct tcp_sock *tp = tcp_sk(sk); + if (tcp_limit_reno_sacked(tp)) + tcp_update_reordering(sk, tp->packets_out + addend, 0); } /* Emulate SACKs for SACKless connection: account for a new dupack. */ @@ -2600,6 +2610,8 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag) case TCP_CA_Loss: if (flag & FLAG_DATA_ACKED) icsk->icsk_retransmits = 0; + if (tcp_is_reno(tp) && flag & FLAG_SND_UNA_ADVANCED) + tcp_reset_reno_sack(tp); if (!tcp_try_undo_loss(sk)) { tcp_moderate_cwnd(tp); tcp_xmit_retransmit_queue(sk); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 6e25540..441fdd3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1808,6 +1808,9 @@ void tcp_simple_retransmit(struct sock *sk) if (!lost) return; + if (tcp_is_reno(tp)) + tcp_limit_reno_sacked(tp); + tcp_verify_left_out(tp); /* Don't muck with the congestion window here. -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv2 net-2.6 4/4] [TCP]: Don't allow FRTO to take place while MTU is being probed 2008-04-07 11:55 ` [PATCHv2 net-2.6 3/4] [TCP]: tcp_simple_retransmit can cause S+L Ilpo Järvinen @ 2008-04-07 11:55 ` Ilpo Järvinen 0 siblings, 0 replies; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 11:55 UTC (permalink / raw) To: David Miller Cc: netdev, linux-kernel, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi, Ilpo J�rvinen MTU probe can cause some remedies for FRTO because the normal packet ordering may be violated allowing FRTO to make a wrong decision (it might not be that serious threat for anything though). Thus it's safer to not run FRTO while MTU probe is underway. It seems that the basic FRTO variant should also look for an skb at probe_seq.start to check if that's retransmitted one but I didn't implement it now (plain seqno in window check isn't robust against wraparounds). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- net/ipv4/tcp_input.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a2a06d6..2b09a89 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1691,11 +1691,16 @@ static inline void tcp_reset_reno_sack(struct tcp_sock *tp) int tcp_use_frto(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); + const struct inet_connection_sock *icsk = inet_csk(sk); struct sk_buff *skb; if (!sysctl_tcp_frto) return 0; + /* MTU probe and F-RTO won't really play nicely along currently */ + if (icsk->icsk_mtup.probe_size) + return 0; + if (IsSackFrto()) return 1; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes 2008-04-07 11:55 [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 1/4] [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack Ilpo Järvinen @ 2008-04-07 21:59 ` Ilpo Järvinen 2008-04-08 9:40 ` David Miller 2008-04-09 5:56 ` Soeren Sonnenburg 1 sibling, 2 replies; 8+ messages in thread From: Ilpo Järvinen @ 2008-04-07 21:59 UTC (permalink / raw) To: David Miller Cc: Netdev, LKML, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Soeren Sonnenburg, Denys Fedoryshchenko, Alessandro Suardi [-- Attachment #1: Type: TEXT/PLAIN, Size: 397 bytes --] On Mon, 7 Apr 2008, Ilpo Järvinen wrote: > Ok, here's the v2 series [...snip...] > ...So far, they're just compile tested. I'll see if I find some time > to boot them on the evening. $ uptime 00:53:23 up 1:20, 6 users, load average: 0.54, 0.22, 0.13 ...some torrent & browsing, mainly with tso & tcp_sack = 0. Didn't hit any problems with it (I had my debug patch in as well). -- i. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes 2008-04-07 21:59 ` [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen @ 2008-04-08 9:40 ` David Miller 2008-04-09 5:56 ` Soeren Sonnenburg 1 sibling, 0 replies; 8+ messages in thread From: David Miller @ 2008-04-08 9:40 UTC (permalink / raw) To: ilpo.jarvinen Cc: netdev, linux-kernel, akpm, rjw, robbat2, gf, kernel, denys, alessandro.suardi From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Tue, 8 Apr 2008 00:59:45 +0300 (EEST) > On Mon, 7 Apr 2008, Ilpo Järvinen wrote: > > > Ok, here's the v2 series > > [...snip...] > > > ...So far, they're just compile tested. I'll see if I find some time > > to boot them on the evening. > > $ uptime > 00:53:23 up 1:20, 6 users, load average: 0.54, 0.22, 0.13 > > ...some torrent & browsing, mainly with tso & tcp_sack = 0. Didn't hit > any problems with it (I had my debug patch in as well). I've applied all four patches to net-2.6 and will push them to Linus when I get a chance. Thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes 2008-04-07 21:59 ` [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 2008-04-08 9:40 ` David Miller @ 2008-04-09 5:56 ` Soeren Sonnenburg 1 sibling, 0 replies; 8+ messages in thread From: Soeren Sonnenburg @ 2008-04-09 5:56 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, Netdev, LKML, Andrew Morton, Rafael J. Wysocki, Robin H. Johnson, Georgi Chorbadzhiyski, Denys Fedoryshchenko, Alessandro Suardi On Tue, 2008-04-08 at 00:59 +0300, Ilpo Järvinen wrote: > On Mon, 7 Apr 2008, Ilpo Järvinen wrote: > > > Ok, here's the v2 series > > [...snip...] > > > ...So far, they're just compile tested. I'll see if I find some time > > to boot them on the evening. > > $ uptime > 00:53:23 up 1:20, 6 users, load average: 0.54, 0.22, 0.13 > > ...some torrent & browsing, mainly with tso & tcp_sack = 0. Didn't hit > any problems with it (I had my debug patch in as well). I've applied the 4 patches and created massive network load over the last 12 hours - the problem did not re-appear and the machine is still alive and kicking. So I guess you've fixed it :-) Soeren. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-04-09 5:57 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-04-07 11:55 [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 1/4] [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 2/4] [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 3/4] [TCP]: tcp_simple_retransmit can cause S+L Ilpo Järvinen 2008-04-07 11:55 ` [PATCHv2 net-2.6 4/4] [TCP]: Don't allow FRTO to take place while MTU is being probed Ilpo Järvinen 2008-04-07 21:59 ` [PATCHv2 net-2.6 0/4 (was 0/3)]: TCP fixes Ilpo Järvinen 2008-04-08 9:40 ` David Miller 2008-04-09 5:56 ` Soeren Sonnenburg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).