netdev.vger.kernel.org archive mirror
* [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
From: TAKANO Ryousei @ 2007-10-04  9:44 UTC
  To: netdev; +Cc: y-kodama

This patch makes the sender process only newly reported SACK blocks.
An ACK packet contains up to three SACK blocks, and some of them may
already have been reported and processed.  This patch skips such
already-processed SACK blocks.

Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
---
 net/ipv4/tcp_input.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bbad2cd..9615fc9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int cached_fack_count;
 	int i;
 	int first_sack_index;
+	u8 sack_block_skip[4] = {0,0,0,0};
 
 	if (!tp->sacked_out)
 		tp->fackets_out = 0;
@@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
 		return 0;
 
+	/* Skip processing cached SACK blocks. */
+	for (i = 0; i < num_sacks; i++) {
+		__be32 start_seq = sp[i].start_seq;
+		__be32 end_seq = sp[i].end_seq;
+		int j;
+
+		for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
+			if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
+			    (tp->recv_sack_cache[j].end_seq == end_seq)) {
+				sack_block_skip[i] = 1;
+				break;
+			}
+		}
+	}
+
 	/* SACK fastpath:
 	 * if the only SACK change is the increase of the end_seq of
 	 * the first block then only apply that SACK block
@@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 				if (after(ntohl(sp[j].start_seq),
 					  ntohl(sp[j+1].start_seq))){
 					struct tcp_sack_block_wire tmp;
+					u8 sbtmp;
 
 					tmp = sp[j];
 					sp[j] = sp[j+1];
 					sp[j+1] = tmp;
 
+					sbtmp = sack_block_skip[j];
+					sack_block_skip[j] = sack_block_skip[j+1];
+					sack_block_skip[j+1] = sbtmp;
+
 					/* Track where the first SACK block goes to */
 					if (j == first_sack_index)
 						first_sack_index = j+1;
@@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 		int fack_count;
 		int dup_sack = (found_dup_sack && (i == first_sack_index));
 
+		if (sack_block_skip[i])
+			continue;
+
 		skb = cached_skb;
 		fack_count = cached_fack_count;
 
-- 
1.5.2.4


* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
From: Ilpo Järvinen @ 2007-10-05 10:37 UTC
  To: TAKANO Ryousei; +Cc: Netdev, y-kodama

On Thu, 4 Oct 2007, TAKANO Ryousei wrote:

> This patch allows to process only newly reported SACK blocks at the
> sender side. An ACK packet contains up to three SACK blocks, and some

  "A SACK option that specifies n blocks will have a length of 8*n+2
   bytes, so the 40 bytes available for TCP options can specify a
   maximum of 4 blocks.  It is expected that SACK will often be used in
   conjunction with the Timestamp option used for RTTM [Jacobson92],
   which takes an additional 10 bytes (plus two bytes of padding); thus
   a maximum of 3 SACK blocks will be allowed in this case." [RFC2018]

:-)
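
The arithmetic in that RFC 2018 passage can be checked with a small sketch. The helper name `max_sack_blocks` and the constant names are made up for illustration; only the byte counts come from the quoted text. It also suggests why the patch sizes `sack_block_skip` at four entries rather than three.

```c
/* Byte counts taken from the RFC 2018 text quoted above. */
enum {
	TCP_OPTION_SPACE   = 40, /* bytes available for TCP options */
	SACK_OPTION_HEADER = 2,  /* kind and length bytes: the "+2" in 8*n+2 */
	SACK_BLOCK_LEN     = 8,  /* two 32-bit sequence numbers per block */
	TIMESTAMP_PADDED   = 12, /* 10-byte Timestamp option plus 2 bytes of padding */
};

/*
 * Hypothetical helper (not a kernel function): number of SACK blocks
 * that fit once other_opts bytes of option space are already in use.
 */
static int max_sack_blocks(int other_opts)
{
	int room = TCP_OPTION_SPACE - other_opts - SACK_OPTION_HEADER;

	return room < 0 ? 0 : room / SACK_BLOCK_LEN;
}
```

Under these assumptions, `max_sack_blocks(0)` gives 4 and `max_sack_blocks(TIMESTAMP_PADDED)` gives 3, matching the two limits stated in the RFC.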

> of them may be already reported and processed blocks.  This patch 
> prevents processing of such already processed SACK blocks.
> 
> Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
> Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
> ---
>  net/ipv4/tcp_input.c |   24 ++++++++++++++++++++++++
>  1 files changed, 24 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bbad2cd..9615fc9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	int cached_fack_count;
>  	int i;
>  	int first_sack_index;
> +	u8 sack_block_skip[4] = {0,0,0,0};
>  
>  	if (!tp->sacked_out)
>  		tp->fackets_out = 0;
> @@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>  		return 0;
>  
> +	/* Skip processing cached SACK blocks. */
> +	for (i = 0; i < num_sacks; i++) {
> +		__be32 start_seq = sp[i].start_seq;
> +		__be32 end_seq = sp[i].end_seq;
> +		int j;
> +
> +		for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
> +			if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
> +			    (tp->recv_sack_cache[j].end_seq == end_seq)) {
> +				sack_block_skip[i] = 1;
> +				break;
> +			}
> +		}
> +	}
> +

I'm somewhat against adding more and more special cases to sacktag.
Even after this one, further special cases would still be needed to avoid
very expensive processing (I guess they just don't occur in your scenario)!
I would rather remove the whole special-case mess of the fastpath and
have a more generic solution (see the patch I point to in the reply
to patch 1/2).

>  	/* SACK fastpath:
>  	 * if the only SACK change is the increase of the end_seq of
>  	 * the first block then only apply that SACK block
> @@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  				if (after(ntohl(sp[j].start_seq),
>  					  ntohl(sp[j+1].start_seq))){
>  					struct tcp_sack_block_wire tmp;
> +					u8 sbtmp;
>  
>  					tmp = sp[j];
>  					sp[j] = sp[j+1];
>  					sp[j+1] = tmp;
>  
> +					sbtmp = sack_block_skip[j];
> +					sack_block_skip[j] = sack_block_skip[j+1];
> +					sack_block_skip[j+1] = sbtmp;
> +
>  					/* Track where the first SACK block goes to */
>  					if (j == first_sack_index)
>  						first_sack_index = j+1;
> @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  		int fack_count;
>  		int dup_sack = (found_dup_sack && (i == first_sack_index));
>  
> +		if (sack_block_skip[i])

DSACKs must always be processed, so please add:

&& !dup_sack
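
As a rough, userspace-only sketch of what the skip test would look like with that condition added: `struct sack_block`, `in_cache` and `should_skip` are simplified stand-ins invented for illustration, not the kernel's types or functions, and the cache is assumed to hold four entries like `sack_block_skip` in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_SLOTS 4 /* assumed size, mirroring sack_block_skip[4] */

/* Simplified stand-in for a SACK block; not the kernel structure. */
struct sack_block {
	uint32_t start_seq;
	uint32_t end_seq;
};

/* True if blk exactly matches one of the blocks remembered from earlier ACKs. */
static bool in_cache(const struct sack_block *cache,
		     const struct sack_block *blk)
{
	for (int j = 0; j < CACHE_SLOTS; j++) {
		if (cache[j].start_seq == blk->start_seq &&
		    cache[j].end_seq == blk->end_seq)
			return true;
	}
	return false;
}

/*
 * A block is skipped only when it was already seen AND it is not a DSACK.
 * A DSACK is always processed, even if it matches a cached block.
 */
static bool should_skip(const struct sack_block *cache,
			const struct sack_block *blk, bool dup_sack)
{
	return in_cache(cache, blk) && !dup_sack;
}
```

In the patch itself the match result is stored in `sack_block_skip[i]` before the sort, so the equivalent test in the main loop would be `sack_block_skip[i] && !dup_sack`.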

> +			continue;

By skipping here, you actually end up crippling lost_retrans
detection even more than it was already broken. You probably just didn't
notice that during tests because of unrelated suboptimal behavior (in
fastpath_skb_hint handling). Anyway, the correctness of this should be
evaluated against the fixed lost_retrans rather than the already
broken one.

> +
>  		skb = cached_skb;
>  		fack_count = cached_fack_count;

Other than what's noted above:

Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>


-- 
 i.

* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
From: TAKANO Ryousei @ 2007-10-07  6:11 UTC
  To: ilpo.jarvinen; +Cc: netdev, y-kodama

Hi Ilpo,

Thanks for your reply. 
Most of my responses are in the reply to patch 1/2.

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Subject: Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
Date: Fri, 5 Oct 2007 13:37:21 +0300 (EEST)

> On Thu, 4 Oct 2007, TAKANO Ryousei wrote:
> 
> > This patch allows to process only newly reported SACK blocks at the
> > sender side. An ACK packet contains up to three SACK blocks, and some
> 
>   "A SACK option that specifies n blocks will have a length of 8*n+2
>    bytes, so the 40 bytes available for TCP options can specify a
>    maximum of 4 blocks.  It is expected that SACK will often be used in
>    conjunction with the Timestamp option used for RTTM [Jacobson92],
>    which takes an additional 10 bytes (plus two bytes of padding); thus
>    a maximum of 3 SACK blocks will be allowed in this case." [RFC2018]
> 
> :-)
> 
Yes, indeed:-)

> > of them may be already reported and processed blocks.  This patch 
> > prevents processing of such already processed SACK blocks.
> > 
> > Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
> > Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
> > ---
> >  net/ipv4/tcp_input.c |   24 ++++++++++++++++++++++++
> >  1 files changed, 24 insertions(+), 0 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index bbad2cd..9615fc9 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> >  	int cached_fack_count;
> >  	int i;
> >  	int first_sack_index;
> > +	u8 sack_block_skip[4] = {0,0,0,0};
> >  
> >  	if (!tp->sacked_out)
> >  		tp->fackets_out = 0;
> > @@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> >  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
> >  		return 0;
> >  
> > +	/* Skip processing cached SACK blocks. */
> > +	for (i = 0; i < num_sacks; i++) {
> > +		__be32 start_seq = sp[i].start_seq;
> > +		__be32 end_seq = sp[i].end_seq;
> > +		int j;
> > +
> > +		for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
> > +			if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
> > +			    (tp->recv_sack_cache[j].end_seq == end_seq)) {
> > +				sack_block_skip[i] = 1;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> 
> I'm somewhat against adding more and more special cases to sacktag, 
> there's still need for more special cases after this one to avoid very 
> expensive processing (I guess they just won't occur in your scenario)!
> ...I would rather remove whole special case mess of the fastpath and
> have a more generic solution (see the patch I point into in the reply
> to patch 1/2)...
> 
> >  	/* SACK fastpath:
> >  	 * if the only SACK change is the increase of the end_seq of
> >  	 * the first block then only apply that SACK block
> > @@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> >  				if (after(ntohl(sp[j].start_seq),
> >  					  ntohl(sp[j+1].start_seq))){
> >  					struct tcp_sack_block_wire tmp;
> > +					u8 sbtmp;
> >  
> >  					tmp = sp[j];
> >  					sp[j] = sp[j+1];
> >  					sp[j+1] = tmp;
> >  
> > +					sbtmp = sack_block_skip[j];
> > +					sack_block_skip[j] = sack_block_skip[j+1];
> > +					sack_block_skip[j+1] = sbtmp;
> > +
> >  					/* Track where the first SACK block goes to */
> >  					if (j == first_sack_index)
> >  						first_sack_index = j+1;
> > @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> >  		int fack_count;
> >  		int dup_sack = (found_dup_sack && (i == first_sack_index));
> >  
> > +		if (sack_block_skip[i])
> 
> DSACKs must always be processed, so please add:
> 
> && !dup_sack
> 
I did not notice DSACKs. Thanks.

> > +			continue;
> 
> By doing this skipping here, you actually end up crippling lost_retrans 
> detection even more than it was broken before. ...You probably didn't just 
> notice that during tests because of unrelated suboptimal behavior (in 
> fastpath_skb_hint handling). ...Anyway, correctness of this should be 
> evaluated against the fixed lost_retrans, rather than the already 
> broken one.
> 
Our results show that the average goodput improves slightly compared
with a kernel that has only PATCH #1 (the lost_retrans fix) applied.
Sorry, our web page is down this weekend because of a power outage.

http://projects.gtrc.aist.go.jp/gnet/sack-bug.html

> > +
> >  		skb = cached_skb;
> >  		fack_count = cached_fack_count;
> 
> Other than what's noted above:
> 
> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> 
> -- 
>  i.

Regards,
Ryousei Takano

* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
From: Ilpo Järvinen @ 2007-10-08 11:20 UTC
  To: TAKANO Ryousei; +Cc: Netdev, y-kodama

On Sun, 7 Oct 2007, TAKANO Ryousei wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > On Thu, 4 Oct 2007, TAKANO Ryousei wrote:
> > 
> > > @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > >  		int fack_count;
> > >  		int dup_sack = (found_dup_sack && (i == first_sack_index));
> > >  
> > > +		if (sack_block_skip[i])
> > > +			continue;
> > 
> > By doing this skipping here, you actually end up crippling lost_retrans 
> > detection even more than it was broken before. ...You probably didn't just 
> > notice that during tests because of unrelated suboptimal behavior (in 
> > fastpath_skb_hint handling). ...Anyway, correctness of this should be 
> > evaluated against the fixed lost_retrans, rather than the already 
> > broken one.
> 
> You can find the result that the average goodput slightly improves
> against the only PATCH #1 (fixed lost_retrans) applied kernel.

...Yeah, sorry, I was mistaken about when the slow path is taken and when
it's not; I thought that the old (current) fastpath_skb_hint code was more
intelligent than it actually is... :-) That continue should not be
a problem.


-- 
 i.
