netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Fix issues with SACK processing
@ 2007-01-27 16:47 Baruch Even
  2007-01-27 16:48 ` [PATCH 1/3] Advance fast path pointer for first block only Baruch Even
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Baruch Even @ 2007-01-27 16:47 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

These patches are intended to fix the issues I've raised in a former
email in addition to the sorting code.

I still was not able to runtime test these patches, they were only
compile tested.

Baruch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] Advance fast path pointer for first block only
  2007-01-27 16:47 [PATCH 0/3] Fix issues with SACK processing Baruch Even
@ 2007-01-27 16:48 ` Baruch Even
  2007-01-27 16:49 ` [PATCH 2/3] Seperate DSACK from SACK fast path Baruch Even
  2007-01-27 16:50 ` [PATCH 3/3] Check num sacks in " Baruch Even
  2 siblings, 0 replies; 12+ messages in thread
From: Baruch Even @ 2007-01-27 16:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Only advance the SACK fast-path pointer for the first block, the fast-path
assumes that only the first block advances next time so we should not move the
skb for the next sack blocks.

Signed-Off-By: Baruch Even <baruch@ev-en.org>

---

I'm not sure about the fack_count part, this patch changes the value that is
maintained in this case.

Index: 2.6-rc6/net/ipv4/tcp_input.c
===================================================================
--- 2.6-rc6.orig/net/ipv4/tcp_input.c	2007-01-27 14:53:27.000000000 +0200
+++ 2.6-rc6/net/ipv4/tcp_input.c	2007-01-27 15:59:22.000000000 +0200
@@ -1048,8 +1048,13 @@
 			int in_sack, pcount;
 			u8 sacked;
 
-			tp->fastpath_skb_hint = skb;
-			tp->fastpath_cnt_hint = fack_count;
+			if (i == 0) {
+				/* Only advance the hint for the first SACK
+				 * block, the hint is for quickly handling the
+				 * advancing of the first SACK blocks only. */
+				tp->fastpath_skb_hint = skb;
+				tp->fastpath_cnt_hint = fack_count;
+			}
 
 			/* The retransmission queue is always in order, so
 			 * we can short-circuit the walk early.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/3] Seperate DSACK from SACK fast path
  2007-01-27 16:47 [PATCH 0/3] Fix issues with SACK processing Baruch Even
  2007-01-27 16:48 ` [PATCH 1/3] Advance fast path pointer for first block only Baruch Even
@ 2007-01-27 16:49 ` Baruch Even
  2007-01-28  4:06   ` David Miller
  2007-01-27 16:50 ` [PATCH 3/3] Check num sacks in " Baruch Even
  2 siblings, 1 reply; 12+ messages in thread
From: Baruch Even @ 2007-01-27 16:49 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Move DSACK code outside the SACK fast-path checking code. If the DSACK
determined that the information was too old we stayed with a partial cache
copied. Most likely this matters very little since the next packet will not be
DSACK and we will find it in the cache. but it's still not good form and there
is little reason to couple the two checks.

Since the SACK receive cache doesn't need the data to be in host order we also
remove the ntohl in the checking loop.

Signed-Off-By: Baruch Even <baruch@ev-en.org>

Index: 2.6-rc6/net/ipv4/tcp_input.c
===================================================================
--- 2.6-rc6.orig/net/ipv4/tcp_input.c	2007-01-27 15:06:30.000000000 +0200
+++ 2.6-rc6/net/ipv4/tcp_input.c	2007-01-27 15:59:15.000000000 +0200
@@ -948,16 +948,43 @@
 		tp->fackets_out = 0;
 	prior_fackets = tp->fackets_out;
 
+	/* Check for D-SACK. */
+	if (before(ntohl(sp[0].start_seq), TCP_SKB_CB(ack_skb)->ack_seq)) {
+		dup_sack = 1;
+		tp->rx_opt.sack_ok |= 4;
+		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
+	} else if (num_sacks > 1 &&
+			!after(ntohl(sp[0].end_seq), ntohl(sp[1].end_seq)) &&
+			!before(ntohl(sp[0].start_seq), ntohl(sp[1].start_seq))) {
+		dup_sack = 1;
+		tp->rx_opt.sack_ok |= 4;
+		NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
+	}
+
+	/* D-SACK for already forgotten data...
+	 * Do dumb counting. */
+	if (dup_sack &&
+			!after(ntohl(sp[0].end_seq), prior_snd_una) &&
+			after(ntohl(sp[0].end_seq), tp->undo_marker))
+		tp->undo_retrans--;
+
+	/* Eliminate too old ACKs, but take into
+	 * account more or less fresh ones, they can
+	 * contain valid SACK info.
+	 */
+	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
+		return 0;
+
 	/* SACK fastpath:
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
 	 * and use retrans queue hinting otherwise slowpath */
 	flag = 1;
-	for (i = 0; i< num_sacks; i++) {
-		__u32 start_seq = ntohl(sp[i].start_seq);
-		__u32 end_seq =	 ntohl(sp[i].end_seq);
+	for (i = 0; i < num_sacks; i++) {
+		__u32 start_seq = sp[i].start_seq;
+		__u32 end_seq = sp[i].end_seq;
 
-		if (i == 0){
+		if (i == 0) {
 			if (tp->recv_sack_cache[i].start_seq != start_seq)
 				flag = 0;
 		} else {
@@ -967,37 +994,6 @@
 		}
 		tp->recv_sack_cache[i].start_seq = start_seq;
 		tp->recv_sack_cache[i].end_seq = end_seq;
-
-		/* Check for D-SACK. */
-		if (i == 0) {
-			u32 ack = TCP_SKB_CB(ack_skb)->ack_seq;
-
-			if (before(start_seq, ack)) {
-				dup_sack = 1;
-				tp->rx_opt.sack_ok |= 4;
-				NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
-			} else if (num_sacks > 1 &&
-				   !after(end_seq, ntohl(sp[1].end_seq)) &&
-				   !before(start_seq, ntohl(sp[1].start_seq))) {
-				dup_sack = 1;
-				tp->rx_opt.sack_ok |= 4;
-				NET_INC_STATS_BH(LINUX_MIB_TCPDSACKOFORECV);
-			}
-
-			/* D-SACK for already forgotten data...
-			 * Do dumb counting. */
-			if (dup_sack &&
-			    !after(end_seq, prior_snd_una) &&
-			    after(end_seq, tp->undo_marker))
-				tp->undo_retrans--;
-
-			/* Eliminate too old ACKs, but take into
-			 * account more or less fresh ones, they can
-			 * contain valid SACK info.
-			 */
-			if (before(ack, prior_snd_una - tp->max_window))
-				return 0;
-		}
 	}
 
 	if (flag)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/3] Check num sacks in SACK fast path
  2007-01-27 16:47 [PATCH 0/3] Fix issues with SACK processing Baruch Even
  2007-01-27 16:48 ` [PATCH 1/3] Advance fast path pointer for first block only Baruch Even
  2007-01-27 16:49 ` [PATCH 2/3] Seperate DSACK from SACK fast path Baruch Even
@ 2007-01-27 16:50 ` Baruch Even
  2 siblings, 0 replies; 12+ messages in thread
From: Baruch Even @ 2007-01-27 16:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

When we check for SACK fast path make sure that we also have the same number of
SACK blocks in the cache and in the new SACK data. This prevents us from
mistakenly taking the cache data if the old data in the SACK cache is the same
as the data in the SACK block.

Signed-Off-By: Baruch Even <baruch@ev-en.org>

Index: 2.6-rc6/include/linux/tcp.h
===================================================================
--- 2.6-rc6.orig/include/linux/tcp.h	2007-01-27 15:06:02.000000000 +0200
+++ 2.6-rc6/include/linux/tcp.h	2007-01-27 15:19:04.000000000 +0200
@@ -317,6 +317,7 @@
 	struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
 
 	struct tcp_sack_block recv_sack_cache[4];
+	u32     recv_sack_cache_size;
 
 	/* from STCP, retrans queue hinting */
 	struct sk_buff* lost_skb_hint;
Index: 2.6-rc6/net/ipv4/tcp_input.c
===================================================================
--- 2.6-rc6.orig/net/ipv4/tcp_input.c	2007-01-27 15:18:30.000000000 +0200
+++ 2.6-rc6/net/ipv4/tcp_input.c	2007-01-27 15:30:09.000000000 +0200
@@ -979,7 +979,8 @@
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
 	 * and use retrans queue hinting otherwise slowpath */
-	flag = 1;
+	flag = num_sacks == tp->recv_sack_cache_size;
+	tp->recv_sack_cache_size = num_sacks;
 	for (i = 0; i < num_sacks; i++) {
 		__u32 start_seq = sp[i].start_seq;
 		__u32 end_seq = sp[i].end_seq;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] Seperate DSACK from SACK fast path
  2007-01-27 16:49 ` [PATCH 2/3] Seperate DSACK from SACK fast path Baruch Even
@ 2007-01-28  4:06   ` David Miller
  2007-01-28  5:27     ` Baruch Even
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-01-28  4:06 UTC (permalink / raw)
  To: baruch; +Cc: netdev

From: Baruch Even <baruch@ev-en.org>
Date: Sat, 27 Jan 2007 18:49:49 +0200

> Since the SACK receive cache doesn't need the data to be in host
> order we also remove the ntohl in the checking loop.
 ...
> -	for (i = 0; i< num_sacks; i++) {
> -		__u32 start_seq = ntohl(sp[i].start_seq);
> -		__u32 end_seq =	 ntohl(sp[i].end_seq);
> +	for (i = 0; i < num_sacks; i++) {
> +		__u32 start_seq = sp[i].start_seq;
> +		__u32 end_seq = sp[i].end_seq;
 ...
>  		}
>  		tp->recv_sack_cache[i].start_seq = start_seq;
>  		tp->recv_sack_cache[i].end_seq = end_seq;

Ok, and now the sack cache and the real sack blocks are
stored in net-endian and this works out because we only
make direct equality comparisons with the recv_sack_cache[]
entry values?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] Seperate DSACK from SACK fast path
  2007-01-28  4:06   ` David Miller
@ 2007-01-28  5:27     ` Baruch Even
  2007-01-29  5:06       ` Herbert Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Baruch Even @ 2007-01-28  5:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

* David Miller <davem@davemloft.net> [070128 06:06]:
> From: Baruch Even <baruch@ev-en.org>
> Date: Sat, 27 Jan 2007 18:49:49 +0200
> 
> > Since the SACK receive cache doesn't need the data to be in host
> > order we also remove the ntohl in the checking loop.
>  ...
> > -	for (i = 0; i< num_sacks; i++) {
> > -		__u32 start_seq = ntohl(sp[i].start_seq);
> > -		__u32 end_seq =	 ntohl(sp[i].end_seq);
> > +	for (i = 0; i < num_sacks; i++) {
> > +		__u32 start_seq = sp[i].start_seq;
> > +		__u32 end_seq = sp[i].end_seq;
>  ...
> >  		}
> >  		tp->recv_sack_cache[i].start_seq = start_seq;
> >  		tp->recv_sack_cache[i].end_seq = end_seq;
> 
> Ok, and now the sack cache and the real sack blocks are
> stored in net-endian and this works out because we only
> make direct equality comparisons with the recv_sack_cache[]
> entry values?

Yes. The only comparison we do with recv_sack_cache entries is != and
that works for net-endian just fine.

The only reason recv_sack_cache was in host-order before that was that
start_seq and end_seq were used to do more before/after comparisons for
DSACK.

Baruch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] Seperate DSACK from SACK fast path
  2007-01-28  5:27     ` Baruch Even
@ 2007-01-29  5:06       ` Herbert Xu
  2007-01-29  5:28         ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Herbert Xu @ 2007-01-29  5:06 UTC (permalink / raw)
  To: Baruch Even; +Cc: davem, netdev

Baruch Even <baruch@ev-en.org> wrote:
>> 
>> > Since the SACK receive cache doesn't need the data to be in host
>> > order we also remove the ntohl in the checking loop.
>>  ...
>> > -   for (i = 0; i< num_sacks; i++) {
>> > -           __u32 start_seq = ntohl(sp[i].start_seq);
>> > -           __u32 end_seq =  ntohl(sp[i].end_seq);
>> > +   for (i = 0; i < num_sacks; i++) {
>> > +           __u32 start_seq = sp[i].start_seq;
>> > +           __u32 end_seq = sp[i].end_seq;
> 
> Yes. The only comparison we do with recv_sack_cache entries is != and
> that works for net-endian just fine.

In that case you need to use __be32 before Al Viro starts coming after
you :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] Seperate DSACK from SACK fast path
  2007-01-29  5:06       ` Herbert Xu
@ 2007-01-29  5:28         ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-01-29  5:28 UTC (permalink / raw)
  To: herbert; +Cc: baruch, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 29 Jan 2007 16:06:07 +1100

> Baruch Even <baruch@ev-en.org> wrote:
> >> 
> >> > Since the SACK receive cache doesn't need the data to be in host
> >> > order we also remove the ntohl in the checking loop.
> >>  ...
> >> > -   for (i = 0; i< num_sacks; i++) {
> >> > -           __u32 start_seq = ntohl(sp[i].start_seq);
> >> > -           __u32 end_seq =  ntohl(sp[i].end_seq);
> >> > +   for (i = 0; i < num_sacks; i++) {
> >> > +           __u32 start_seq = sp[i].start_seq;
> >> > +           __u32 end_seq = sp[i].end_seq;
> > 
> > Yes. The only comparison we do with recv_sack_cache entries is != and
> > that works for net-endian just fine.
> 
> In that case you need to use __be32 before Al Viro starts coming after
> you :)

Good catch, Baruch please fix this up :-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Check num sacks in SACK fast path
       [not found] ` <20070129071349.24050.56022.sendpatchset@galon.ev-en.org>
@ 2007-01-31 20:52   ` David Miller
  2007-02-01  7:22     ` Baruch Even
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2007-01-31 20:52 UTC (permalink / raw)
  To: baruch; +Cc: netdev

From: Baruch Even <baruch@ev-en.org>
Date: Mon, 29 Jan 2007 09:13:49 +0200

> When we check for SACK fast path make sure that we also have the same number of
> SACK blocks in the cache and in the new SACK data. This prevents us from
> mistakenly taking the cache data if the old data in the SACK cache is the same
> as the data in the SACK block.
> 
> Signed-Off-By: Baruch Even <baruch@ev-en.org>

We could implement this without extra state, for example by
clearing out the rest of the recv_sack_cache entries.

We should never see a SACK block from sequence zero to zero,
which would be an empty SACK block.

Something like the following?

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c26076f..84cd722 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 				return 0;
 		}
 	}
+	for (; i <= 4; i++) {
+		tp->recv_sack_cache[i].start_seq = 0;
+		tp->recv_sack_cache[i].end_seq = 0;
+	}
 
 	if (flag)
 		num_sacks = 1;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Check num sacks in SACK fast path
  2007-01-31 20:52   ` David Miller
@ 2007-02-01  7:22     ` Baruch Even
  2007-02-01 22:38       ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Baruch Even @ 2007-02-01  7:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

* David Miller <davem@davemloft.net> [070131 22:52]:
> From: Baruch Even <baruch@ev-en.org>
> Date: Mon, 29 Jan 2007 09:13:49 +0200
> 
> > When we check for SACK fast path make sure that we also have the same number of
> > SACK blocks in the cache and in the new SACK data. This prevents us from
> > mistakenly taking the cache data if the old data in the SACK cache is the same
> > as the data in the SACK block.
> > 
> > Signed-Off-By: Baruch Even <baruch@ev-en.org>
> 
> We could implement this without extra state, for example by
> clearing out the rest of the recv_sack_cache entries.
> 
> We should never see a SACK block from sequence zero to zero,
> which would be an empty SACK block.

That would work as well at the cost of extra writing to memory for each
ack packet. Though I won't guess what is worse, the extra memory used or
the extra writing.

> Something like the following?
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c26076f..84cd722 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  				return 0;
>  		}
>  	}
> +	for (; i <= 4; i++) {

That won't work though, the <= should be <, I've actually used
ARRAY_SIZE just to be on the safe side.

> +		tp->recv_sack_cache[i].start_seq = 0;
> +		tp->recv_sack_cache[i].end_seq = 0;
> +	}
>  
>  	if (flag)
>  		num_sacks = 1;

Baruch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Check num sacks in SACK fast path
  2007-02-01  7:22     ` Baruch Even
@ 2007-02-01 22:38       ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-02-01 22:38 UTC (permalink / raw)
  To: baruch; +Cc: netdev

From: Baruch Even <baruch@ev-en.org>
Date: Thu, 1 Feb 2007 09:22:52 +0200

> * David Miller <davem@davemloft.net> [070131 22:52]:
> > We should never see a SACK block from sequence zero to zero,
> > which would be an empty SACK block.
> 
> That would work as well at the cost of extra writing to memory for each
> ack packet. Though I won't guess what is worse, the extra memory used or
> the extra writing.

Good point.  The recv_sack_cache is 32-bytes, so sits in at most
2 cache lines.  The writes are consequetive and in order so would
compress into cpu store buffers.

But it's definitely not free. :-)

> > Something like the following?
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index c26076f..84cd722 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -999,6 +1001,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> >  				return 0;
> >  		}
> >  	}
> > +	for (; i <= 4; i++) {
> 
> That won't work though, the <= should be <, I've actually used
> ARRAY_SIZE just to be on the safe side.

Thanks for catching that.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Check num sacks in SACK fast path
       [not found] ` <20070202144127.26863.97696.sendpatchset@galon.ev-en.org>
@ 2007-02-05  7:38   ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2007-02-05  7:38 UTC (permalink / raw)
  To: baruch; +Cc: netdev

From: Baruch Even <baruch@ev-en.org>
Date: Fri, 02 Feb 2007 16:41:27 +0200

> We clear the unused parts of the SACK cache, This prevents us from mistakenly
> taking the cache data if the old data in the SACK cache is the same as the data
> in the SACK block. This assumes that we never receive an empty SACK block with
> start and end both at zero.
> 
> Signed-Off-By: Baruch Even <baruch@ev-en.org>

I've applied this one too, thanks a lot.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-02-05  7:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-27 16:47 [PATCH 0/3] Fix issues with SACK processing Baruch Even
2007-01-27 16:48 ` [PATCH 1/3] Advance fast path pointer for first block only Baruch Even
2007-01-27 16:49 ` [PATCH 2/3] Seperate DSACK from SACK fast path Baruch Even
2007-01-28  4:06   ` David Miller
2007-01-28  5:27     ` Baruch Even
2007-01-29  5:06       ` Herbert Xu
2007-01-29  5:28         ` David Miller
2007-01-27 16:50 ` [PATCH 3/3] Check num sacks in " Baruch Even
     [not found] <20070129071339.24050.21052.sendpatchset@galon.ev-en.org>
     [not found] ` <20070129071349.24050.56022.sendpatchset@galon.ev-en.org>
2007-01-31 20:52   ` David Miller
2007-02-01  7:22     ` Baruch Even
2007-02-01 22:38       ` David Miller
     [not found] <20070202144116.26863.3722.sendpatchset@galon.ev-en.org>
     [not found] ` <20070202144127.26863.97696.sendpatchset@galon.ev-en.org>
2007-02-05  7:38   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).