netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "David S. Miller" <davem@davemloft.net>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: ak@suse.de, niv@us.ibm.com, jheffner@psc.edu,
	andy.grover@gmail.com, anton@samba.org, netdev@oss.sgi.com
Subject: Re: bad TSO performance in 2.6.9-rc2-BK
Date: Wed, 29 Sep 2004 21:33:10 -0700	[thread overview]
Message-ID: <20040929213310.40f5f33a.davem@davemloft.net> (raw)
In-Reply-To: <20040930000515.GA10496@gondor.apana.org.au>

On Thu, 30 Sep 2004 10:05:15 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Sep 29, 2004 at 04:29:23PM -0700, David S. Miller wrote:
> >
> > @@ -567,12 +567,18 @@
> >  	skb->ip_summed = CHECKSUM_HW;
> > +
> > +	/* Any change of skb->len requires recalculation of tso
> > +	 * factor and mss.
> > +	 */
> > +	tcp_set_skb_tso_factor(skb, tp->mss_cache_std);
> 
> Minor optimsations: __tcp_trim_head is only called directly when
> tso_factor has already been adjusted by tcp_tso_acked.  So you can
> move this setting into tcp_trim_head.

Right.  This patch below combines that with adjustment of socket
send queue usage when we trim the head.

I also added John Heffner's snd_cwnd TSO factor tweak.  I adjusted
it down to 1/4 of the congestion window because it gave the best
ramp-up performance for a cross-continental transfer test.

John, this might make your netperf case go better.  Give it a try
and let me know how it goes.

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/09/29 21:12:18-07:00 davem@nuts.davemloft.net 
#   [TCP]: Smooth out TSO ack clocking.
#   
#   - Export tcp_trim_head() and call it directly from
#     tcp_tso_acked().  This also fixes URG handling.
#   
#   - Make tcp_trim_head() adjust the skb->truesize of
#     the packet and liberate that space from the socket
#     send buffer.
#   
#   - In tcp_current_mss(), limit TSO factor to 1/4 of
#     snd_cwnd.  The idea is from John Heffner.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
# net/ipv4/tcp_output.c
#   2004/09/29 21:11:53-07:00 davem@nuts.davemloft.net +15 -35
#   [TCP]: Smooth out TSO ack clocking.
#   
#   - Export tcp_trim_head() and call it directly from
#     tcp_tso_acked().  This also fixes URG handling.
#   
#   - Make tcp_trim_head() adjust the skb->truesize of
#     the packet and liberate that space from the socket
#     send buffer.
#   
#   - In tcp_current_mss(), limit TSO factor to 1/4 of
#     snd_cwnd.  The idea is from John Heffner.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
# net/ipv4/tcp_input.c
#   2004/09/29 21:11:53-07:00 davem@nuts.davemloft.net +9 -13
#   [TCP]: Smooth out TSO ack clocking.
#   
#   - Export tcp_trim_head() and call it directly from
#     tcp_tso_acked().  This also fixes URG handling.
#   
#   - Make tcp_trim_head() adjust the skb->truesize of
#     the packet and liberate that space from the socket
#     send buffer.
#   
#   - In tcp_current_mss(), limit TSO factor to 1/4 of
#     snd_cwnd.  The idea is from John Heffner.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
# include/net/tcp.h
#   2004/09/29 21:11:52-07:00 davem@nuts.davemloft.net +1 -0
#   [TCP]: Smooth out TSO ack clocking.
#   
#   - Export tcp_trim_head() and call it directly from
#     tcp_tso_acked().  This also fixes URG handling.
#   
#   - Make tcp_trim_head() adjust the skb->truesize of
#     the packet and liberate that space from the socket
#     send buffer.
#   
#   - In tcp_current_mss(), limit TSO factor to 1/4 of
#     snd_cwnd.  The idea is from John Heffner.
#   
#   Signed-off-by: David S. Miller <davem@davemloft.net>
# 
diff -Nru a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h	2004-09-29 21:12:59 -07:00
+++ b/include/net/tcp.h	2004-09-29 21:12:59 -07:00
@@ -944,6 +944,7 @@
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern void tcp_xmit_retransmit_queue(struct sock *);
 extern void tcp_simple_retransmit(struct sock *);
+extern int tcp_trim_head(struct sock *, struct sk_buff *, u32);
 
 extern void tcp_send_probe0(struct sock *);
 extern void tcp_send_partial(struct sock *);
diff -Nru a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c	2004-09-29 21:12:59 -07:00
+++ b/net/ipv4/tcp_input.c	2004-09-29 21:12:59 -07:00
@@ -2364,13 +2364,14 @@
  * then making a write space wakeup callback is a possible
  * future enhancement.  WARNING: it is not trivial to make.
  */
-static int tcp_tso_acked(struct tcp_opt *tp, struct sk_buff *skb,
+static int tcp_tso_acked(struct sock *sk, struct sk_buff *skb,
 			 __u32 now, __s32 *seq_rtt)
 {
+	struct tcp_opt *tp = tcp_sk(sk);
 	struct tcp_skb_cb *scb = TCP_SKB_CB(skb); 
 	__u32 mss = scb->tso_mss;
 	__u32 snd_una = tp->snd_una;
-	__u32 seq = scb->seq;
+	__u32 orig_seq, seq;
 	__u32 packets_acked = 0;
 	int acked = 0;
 
@@ -2379,22 +2380,18 @@
 	 */
 	BUG_ON(!after(scb->end_seq, snd_una));
 
+	seq = orig_seq = scb->seq;
 	while (!after(seq + mss, snd_una)) {
 		packets_acked++;
 		seq += mss;
 	}
 
+	if (tcp_trim_head(sk, skb, (seq - orig_seq)))
+		return 0;
+
 	if (packets_acked) {
 		__u8 sacked = scb->sacked;
 
-		/* We adjust scb->seq but we do not pskb_pull() the
-		 * SKB.  We let tcp_retransmit_skb() handle this case
-		 * by checking skb->len against the data sequence span.
-		 * This way, we avoid the pskb_pull() work unless we
-		 * actually need to retransmit the SKB.
-		 */
-		scb->seq = seq;
-
 		acked |= FLAG_DATA_ACKED;
 		if (sacked) {
 			if (sacked & TCPCB_RETRANS) {
@@ -2413,7 +2410,7 @@
 							packets_acked);
 			if (sacked & TCPCB_URG) {
 				if (tp->urg_mode &&
-				    !before(scb->seq, tp->snd_up))
+				    !before(orig_seq, tp->snd_up))
 					tp->urg_mode = 0;
 			}
 		} else if (*seq_rtt < 0)
@@ -2425,7 +2422,6 @@
 			tcp_dec_pcount_explicit(&tp->fackets_out, dval);
 		}
 		tcp_dec_pcount_explicit(&tp->packets_out, packets_acked);
-		scb->tso_factor -= packets_acked;
 
 		BUG_ON(scb->tso_factor == 0);
 		BUG_ON(!before(scb->seq, scb->end_seq));
@@ -2455,7 +2451,7 @@
 		 */
 		if (after(scb->end_seq, tp->snd_una)) {
 			if (scb->tso_factor > 1)
-				acked |= tcp_tso_acked(tp, skb,
+				acked |= tcp_tso_acked(sk, skb,
 						       now, &seq_rtt);
 			break;
 		}
diff -Nru a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c	2004-09-29 21:12:59 -07:00
+++ b/net/ipv4/tcp_output.c	2004-09-29 21:12:59 -07:00
@@ -525,7 +525,7 @@
  * eventually). The difference is that pulled data not copied, but
  * immediately discarded.
  */
-unsigned char * __pskb_trim_head(struct sk_buff *skb, int len)
+static unsigned char *__pskb_trim_head(struct sk_buff *skb, int len)
 {
 	int i, k, eat;
 
@@ -553,8 +553,10 @@
 	return skb->tail;
 }
 
-static int __tcp_trim_head(struct tcp_opt *tp, struct sk_buff *skb, u32 len)
+int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
 {
+	struct tcp_opt *tp = tcp_sk(sk);
+
 	if (skb_cloned(skb) &&
 	    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 		return -ENOMEM;
@@ -566,8 +568,14 @@
 			return -ENOMEM;
 	}
 
+	TCP_SKB_CB(skb)->seq += len;
 	skb->ip_summed = CHECKSUM_HW;
 
+	skb->truesize	     -= len;
+	sk->sk_queue_shrunk   = 1;
+	sk->sk_wmem_queued   -= len;
+	sk->sk_forward_alloc += len;
+
 	/* Any change of skb->len requires recalculation of tso
 	 * factor and mss.
 	 */
@@ -576,16 +584,6 @@
 	return 0;
 }
 
-static inline int tcp_trim_head(struct tcp_opt *tp, struct sk_buff *skb, u32 len)
-{
-	int err = __tcp_trim_head(tp, skb, len);
-
-	if (!err)
-		TCP_SKB_CB(skb)->seq += len;
-
-	return err;
-}
-
 /* This function synchronize snd mss to current pmtu/exthdr set.
 
    tp->user_mss is mss set by user by TCP_MAXSEG. It does NOT counts
@@ -686,11 +684,12 @@
 					68U - tp->tcp_header_len);
 
 		/* Always keep large mss multiple of real mss, but
-		 * do not exceed congestion window.
+		 * do not exceed 1/4 of the congestion window so we
+		 * can keep the ACK clock ticking.
 		 */
 		factor = large_mss / mss_now;
-		if (factor > tp->snd_cwnd)
-			factor = tp->snd_cwnd;
+		if (factor > (tp->snd_cwnd >> 2))
+			factor = max(1, tp->snd_cwnd >> 2);
 
 		tp->mss_cache = mss_now * factor;
 
@@ -1003,7 +1002,6 @@
 {
 	struct tcp_opt *tp = tcp_sk(sk);
  	unsigned int cur_mss = tcp_current_mss(sk, 0);
-	__u32 data_seq, data_end_seq;
 	int err;
 
 	/* Do not sent more than we queued. 1/4 is reserved for possible
@@ -1013,24 +1011,6 @@
 	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
 		return -EAGAIN;
 
-	/* What is going on here?  When TSO packets are partially ACK'd,
-	 * we adjust the TCP_SKB_CB(skb)->seq value forward but we do
-	 * not adjust the data area of the SKB.  We defer that to here
-	 * so that we can avoid the work unless we really retransmit
-	 * the packet.
-	 */
-	data_seq = TCP_SKB_CB(skb)->seq;
-	data_end_seq = TCP_SKB_CB(skb)->end_seq;
-	if (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN)
-		data_end_seq--;
-
-	if (skb->len > (data_end_seq - data_seq)) {
-		u32 to_trim = skb->len - (data_end_seq - data_seq);
-
-		if (__tcp_trim_head(tp, skb, to_trim))
-			return -ENOMEM;
-	}		
-
 	if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
 		if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
 			BUG();
@@ -1041,7 +1021,7 @@
 			tp->mss_cache = tp->mss_cache_std;
 		}
 
-		if (tcp_trim_head(tp, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
+		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
 			return -ENOMEM;
 	}
 

  reply	other threads:[~2004-09-30  4:33 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-09-20  6:30 bad TSO performance in 2.6.9-rc2-BK Anton Blanchard
2004-09-20 15:54 ` Nivedita Singhvi
2004-09-21 15:55   ` Anton Blanchard
2004-09-20 20:30 ` Andi Kleen
2004-09-21 22:58   ` David S. Miller
2004-09-22 14:00     ` Andi Kleen
2004-09-22 18:12       ` David S. Miller
2004-09-22 19:55         ` Andi Kleen
2004-09-22 20:07           ` Nivedita Singhvi
2004-09-22 20:30             ` David S. Miller
2004-09-22 20:56               ` Nivedita Singhvi
2004-09-22 21:56               ` Andi Kleen
2004-09-22 22:04                 ` David S. Miller
2004-09-22 20:12           ` Andrew Grover
2004-09-22 20:39             ` David S. Miller
2004-09-22 22:06               ` Andi Kleen
2004-09-22 22:25                 ` David S. Miller
2004-09-22 22:47                   ` Andi Kleen
2004-09-22 22:50                     ` David S. Miller
2004-09-23 23:11                     ` David S. Miller
2004-09-23 23:41                       ` Herbert Xu
2004-09-23 23:41                         ` David S. Miller
2004-09-24  0:12                           ` Herbert Xu
2004-09-24  0:40                             ` Herbert Xu
2004-09-24  1:07                               ` Herbert Xu
2004-09-24  1:17                                 ` David S. Miller
2004-09-27  1:27                           ` Herbert Xu
2004-09-27  2:50                             ` Herbert Xu
2004-09-27  4:00                               ` David S. Miller
2004-09-27  5:45                                 ` Herbert Xu
2004-09-27 19:01                                   ` David S. Miller
2004-09-27 21:32                                     ` Herbert Xu
2004-09-28 21:10                                       ` David S. Miller
2004-09-28 21:34                                         ` Andi Kleen
2004-09-28 21:53                                           ` David S. Miller
2004-09-28 22:33                                             ` Andi Kleen
2004-09-28 22:57                                               ` David S. Miller
2004-09-28 23:27                                                 ` Andi Kleen
2004-09-28 23:35                                                   ` David S. Miller
2004-09-28 23:55                                                     ` Andi Kleen
2004-09-29  0:04                                                       ` David S. Miller
2004-09-29 20:58                                                   ` John Heffner
2004-09-29 21:10                                                     ` Nivedita Singhvi
2004-09-29 21:50                                                       ` David S. Miller
2004-09-29 21:56                                                         ` Andi Kleen
2004-09-29 23:29                                                           ` David S. Miller
2004-09-29 23:51                                                             ` John Heffner
2004-09-30  0:03                                                               ` David S. Miller
2004-09-30  0:10                                                                 ` Herbert Xu
2004-10-01  0:34                                                                   ` David S. Miller
2004-10-01  1:12                                                                     ` David S. Miller
2004-10-01  3:40                                                                       ` David S. Miller
2004-10-01 10:35                                                                         ` Andi Kleen
2004-10-01 10:23                                                                       ` Andi Kleen
2004-09-30  0:10                                                               ` John Heffner
2004-09-30 17:25                                                                 ` John Heffner
2004-09-30 20:23                                                                   ` David S. Miller
2004-09-30  0:05                                                             ` Herbert Xu
2004-09-30  4:33                                                               ` David S. Miller [this message]
2004-09-30  5:47                                                                 ` Herbert Xu
2004-09-30  7:39                                                                   ` David S. Miller
2004-09-30  8:09                                                                     ` Herbert Xu
2004-09-30  9:29                                                                 ` Andi Kleen
2004-09-30 20:20                                                                   ` David S. Miller
2004-09-29  3:27                                               ` John Heffner
2004-09-29  9:01                                                 ` Andi Kleen
2004-09-29 19:56                                                   ` David S. Miller
2004-09-29 20:56                                                     ` Andi Kleen
2004-09-29 21:17                                                       ` David S. Miller
2004-09-29 21:00                                                 ` David S. Miller
2004-09-29 21:16                                                   ` Nivedita Singhvi
2004-09-29 21:22                                                     ` David S. Miller
2004-09-29 21:43                                                       ` Andi Kleen
2004-09-29 21:51                                                         ` John Heffner
2004-09-29 21:52                                                           ` David S. Miller
2004-09-24  8:30                       ` Andi Kleen
2004-09-27 22:38                       ` John Heffner
2004-09-27 23:04                         ` David S. Miller
2004-09-27 23:25                           ` Andi Kleen
2004-09-27 23:37                             ` David S. Miller
2004-09-27 23:51                               ` Andi Kleen
2004-09-28  0:15                                 ` David S. Miller
2004-09-27 23:36                           ` Herbert Xu
2004-09-28  0:13                             ` David S. Miller
2004-09-28  0:34                               ` Herbert Xu
2004-09-28  4:59                                 ` David S. Miller
2004-09-28  5:15                                   ` Herbert Xu
2004-09-28  5:58                                     ` David S. Miller
2004-09-28  6:45                                   ` Nivedita Singhvi
2004-09-28  7:20                               ` Nivedita Singhvi
2004-09-28 20:38                                 ` David S. Miller
2004-09-28  7:23                         ` Nivedita Singhvi
2004-09-28  8:23                           ` Herbert Xu
2004-09-28 12:53                           ` John Heffner
2004-09-22 20:28           ` David S. Miller
     [not found] <Pine.NEB.4.33.0409301625560.13549-100000@dexter.psc.edu>
2004-10-02  1:32 ` John Heffner
2004-10-04 20:07   ` David S. Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040929213310.40f5f33a.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=ak@suse.de \
    --cc=andy.grover@gmail.com \
    --cc=anton@samba.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jheffner@psc.edu \
    --cc=netdev@oss.sgi.com \
    --cc=niv@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).