From mboxrd@z Thu Jan 1 00:00:00 1970
From: "David S. Miller"
Subject: Re: bad TSO performance in 2.6.9-rc2-BK
Date: Wed, 29 Sep 2004 16:29:23 -0700
Sender: netdev-bounce@oss.sgi.com
Message-ID: <20040929162923.796d142e.davem@davemloft.net>
References: <415B24C0.2020208@us.ibm.com> <20040929145050.71afa1ac.davem@davemloft.net> <20040929215613.GC26714@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: niv@us.ibm.com, jheffner@psc.edu, ak@suse.de, herbert@gondor.apana.org.au, andy.grover@gmail.com, anton@samba.org, netdev@oss.sgi.com
Return-path:
To: Andi Kleen
In-Reply-To: <20040929215613.GC26714@wotan.suse.de>
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

Ok, found the bug.  The bug was in the existing code; the assertions in
the new code were merely discovering it.

What happens is that we set TCP_SKB_CB(skb)->tso_factor, for
tcp_snd_test() or similar.  But this packet does not go out, and then
tcp_sendmsg() or tcp_sendpages() tacks more data onto the tail of the
SKB without updating TCP_SKB_CB(skb)->tso_factor.

Simply setting the tso_factor to zero in these cases gets it
recalculated on demand, and also reduces the number of tso_factor
recalculations.  This works because, by definition, these SKBs have not
been sent for the first time yet.  We never tack data onto the end of
retransmitted SKBs, and because of that invariant there will be a
tcp_set_skb_tso_factor() call before the SKB gets to tcp_transmit_skb()
(which will BUG otherwise).

I've also audited other spots that don't update the tso_factor when
they should; tcp_trim_head() was another such spot.  Finally, I added an
assertion to retransmit queue collapsing to make sure we don't collapse
SKBs with non-1 TSO factors.  And the Solaris FIN workaround
pskb_trim() needs to reset the TSO factor as well.
Let me know if this cures the issue.  If it does, we can move back to
Andi's performance issue and the MSS stuff John Heffner just
discovered.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/09/29 16:09:18-07:00 davem@nuts.davemloft.net
#   [TCP]: Fix inaccuracies in tso_factor settings.
#
#   1) If tcp_{sendmsg,sendpage} tacks on more data to an
#      existing SKB, this can make tso_factor inaccurate.
#      Invalidate it, which forces it to be recalculated,
#      by simply setting it to zero.
#   2) __tcp_trim_head() changes skb->len thus we need
#      to recalculate tso_factor
#   3) BUG check that tcp_retrans_try_collapse() does not
#      try to collapse packets with non-1 tso_factor
#   4) The Solaris FIN workaround in tcp_retransmit_skb()
#      changes packet size, need to fixup tso_factor
#
#   Signed-off-by: David S. Miller
#
# net/ipv4/tcp_output.c
#   2004/09/29 16:06:54-07:00 davem@nuts.davemloft.net +15 -5
#   [TCP]: Fix inaccuracies in tso_factor settings.
#
# net/ipv4/tcp.c
#   2004/09/29 16:06:54-07:00 davem@nuts.davemloft.net +2 -0
#   [TCP]: Fix inaccuracies in tso_factor settings.
#
diff -Nru a/net/ipv4/tcp.c b/net/ipv4/tcp.c
--- a/net/ipv4/tcp.c	2004-09-29 16:09:42 -07:00
+++ b/net/ipv4/tcp.c	2004-09-29 16:09:42 -07:00
@@ -691,6 +691,7 @@
 				skb->ip_summed = CHECKSUM_HW;
 				tp->write_seq += copy;
 				TCP_SKB_CB(skb)->end_seq += copy;
+				TCP_SKB_CB(skb)->tso_factor = 0;
 
 				if (!copied)
 					TCP_SKB_CB(skb)->flags &= ~TCPCB_FLAG_PSH;
@@ -937,6 +938,7 @@
 
 			tp->write_seq += copy;
 			TCP_SKB_CB(skb)->end_seq += copy;
+			TCP_SKB_CB(skb)->tso_factor = 0;
 
 			from += copy;
 			copied += copy;
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c	2004-09-29 16:09:42 -07:00
+++ b/net/ipv4/tcp_output.c	2004-09-29 16:09:42 -07:00
@@ -553,7 +553,7 @@
 	return skb->tail;
 }
 
-static int __tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
+static int __tcp_trim_head(struct tcp_opt *tp, struct sk_buff *skb, u32 len)
 {
 	if (skb_cloned(skb) &&
 	    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
@@ -567,12 +567,18 @@
 	}
 	skb->ip_summed = CHECKSUM_HW;
+
+	/* Any change of skb->len requires recalculation of tso
+	 * factor and mss.
+	 */
+	tcp_set_skb_tso_factor(skb, tp->mss_cache_std);
+
 	return 0;
 }
 
-static inline int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
+static inline int tcp_trim_head(struct tcp_opt *tp, struct sk_buff *skb, u32 len)
 {
-	int err = __tcp_trim_head(sk, skb, len);
+	int err = __tcp_trim_head(tp, skb, len);
 
 	if (!err)
 		TCP_SKB_CB(skb)->seq += len;
@@ -897,6 +903,9 @@
 	    ((skb_size + next_skb_size) > mss_now))
 		return;
 
+	BUG_ON(TCP_SKB_CB(skb)->tso_factor != 1 ||
+	       TCP_SKB_CB(next_skb)->tso_factor != 1);
+
 	/* Ok.  We will be able to collapse the packet.
 	 */
 	__skb_unlink(next_skb, next_skb->list);
@@ -1018,7 +1027,7 @@
 		if (skb->len > (data_end_seq - data_seq)) {
 			u32 to_trim = skb->len - (data_end_seq - data_seq);
 
-			if (__tcp_trim_head(sk, skb, to_trim))
+			if (__tcp_trim_head(tp, skb, to_trim))
 				return -ENOMEM;
 		}
@@ -1032,7 +1041,7 @@
 			tp->mss_cache = tp->mss_cache_std;
 		}
 
-		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
+		if (tcp_trim_head(tp, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
 			return -ENOMEM;
 	}
@@ -1080,6 +1089,7 @@
 	    tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
 		if (!pskb_trim(skb, 0)) {
 			TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1;
+			TCP_SKB_CB(skb)->tso_factor = 1;
 			skb->ip_summed = CHECKSUM_NONE;
 			skb->csum = 0;
 		}