* [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
@ 2007-10-04 9:44 TAKANO Ryousei
2007-10-05 10:37 ` Ilpo Järvinen
From: TAKANO Ryousei @ 2007-10-04 9:44 UTC (permalink / raw)
To: netdev; +Cc: y-kodama
This patch allows the sender to process only newly reported SACK blocks.
An ACK packet contains up to three SACK blocks, some of which may have
already been reported and processed. This patch skips such
already-processed SACK blocks.
Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
---
net/ipv4/tcp_input.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bbad2cd..9615fc9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int cached_fack_count;
int i;
int first_sack_index;
+ u8 sack_block_skip[4] = {0,0,0,0};
if (!tp->sacked_out)
tp->fackets_out = 0;
@@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
return 0;
+ /* Skip processing cached SACK blocks. */
+ for (i = 0; i < num_sacks; i++) {
+ __be32 start_seq = sp[i].start_seq;
+ __be32 end_seq = sp[i].end_seq;
+ int j;
+
+ for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
+ if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
+ (tp->recv_sack_cache[j].end_seq == end_seq)) {
+ sack_block_skip[i] = 1;
+ break;
+ }
+ }
+ }
+
/* SACK fastpath:
* if the only SACK change is the increase of the end_seq of
* the first block then only apply that SACK block
@@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (after(ntohl(sp[j].start_seq),
ntohl(sp[j+1].start_seq))){
struct tcp_sack_block_wire tmp;
+ u8 sbtmp;
tmp = sp[j];
sp[j] = sp[j+1];
sp[j+1] = tmp;
+ sbtmp = sack_block_skip[j];
+ sack_block_skip[j] = sack_block_skip[j+1];
+ sack_block_skip[j+1] = sbtmp;
+
/* Track where the first SACK block goes to */
if (j == first_sack_index)
first_sack_index = j+1;
@@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
int fack_count;
int dup_sack = (found_dup_sack && (i == first_sack_index));
+ if (sack_block_skip[i])
+ continue;
+
skb = cached_skb;
fack_count = cached_fack_count;
--
1.5.2.4
* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
2007-10-04 9:44 [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks TAKANO Ryousei
@ 2007-10-05 10:37 ` Ilpo Järvinen
2007-10-07 6:11 ` TAKANO Ryousei
From: Ilpo Järvinen @ 2007-10-05 10:37 UTC (permalink / raw)
To: TAKANO Ryousei; +Cc: Netdev, y-kodama
On Thu, 4 Oct 2007, TAKANO Ryousei wrote:
> This patch allows to process only newly reported SACK blocks at the
> sender side. An ACK packet contains up to three SACK blocks, and some
"A SACK option that specifies n blocks will have a length of 8*n+2
bytes, so the 40 bytes available for TCP options can specify a
maximum of 4 blocks. It is expected that SACK will often be used in
conjunction with the Timestamp option used for RTTM [Jacobson92],
which takes an additional 10 bytes (plus two bytes of padding); thus
a maximum of 3 SACK blocks will be allowed in this case." [RFC2018]
:-)
> of them may be already reported and processed blocks. This patch
> prevents processing of such already processed SACK blocks.
>
> Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
> Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
> ---
> net/ipv4/tcp_input.c | 24 ++++++++++++++++++++++++
> 1 files changed, 24 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index bbad2cd..9615fc9 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> int cached_fack_count;
> int i;
> int first_sack_index;
> + u8 sack_block_skip[4] = {0,0,0,0};
>
> if (!tp->sacked_out)
> tp->fackets_out = 0;
> @@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
> return 0;
>
> + /* Skip processing cached SACK blocks. */
> + for (i = 0; i < num_sacks; i++) {
> + __be32 start_seq = sp[i].start_seq;
> + __be32 end_seq = sp[i].end_seq;
> + int j;
> +
> + for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
> + if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
> + (tp->recv_sack_cache[j].end_seq == end_seq)) {
> + sack_block_skip[i] = 1;
> + break;
> + }
> + }
> + }
> +
I'm somewhat against adding more and more special cases to sacktag;
there is still a need for more special cases after this one to avoid very
expensive processing (I guess they just won't occur in your scenario)!
...I would rather remove the whole special-case mess of the fastpath and
have a more generic solution (see the patch I point to in the reply
to patch 1/2)...
> /* SACK fastpath:
> * if the only SACK change is the increase of the end_seq of
> * the first block then only apply that SACK block
> @@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> if (after(ntohl(sp[j].start_seq),
> ntohl(sp[j+1].start_seq))){
> struct tcp_sack_block_wire tmp;
> + u8 sbtmp;
>
> tmp = sp[j];
> sp[j] = sp[j+1];
> sp[j+1] = tmp;
>
> + sbtmp = sack_block_skip[j];
> + sack_block_skip[j] = sack_block_skip[j+1];
> + sack_block_skip[j+1] = sbtmp;
> +
> /* Track where the first SACK block goes to */
> if (j == first_sack_index)
> first_sack_index = j+1;
> @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> int fack_count;
> int dup_sack = (found_dup_sack && (i == first_sack_index));
>
> + if (sack_block_skip[i])
DSACKs must always be processed, so please add:
&& !dup_sack
> + continue;
By doing this skipping here, you actually end up crippling lost_retrans
detection even more than it was broken before. ...You probably just didn't
notice that during tests because of unrelated suboptimal behavior (in
fastpath_skb_hint handling). ...Anyway, the correctness of this should be
evaluated against the fixed lost_retrans, rather than the already
broken one.
> +
> skb = cached_skb;
> fack_count = cached_fack_count;
Other than what's noted above:
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
--
i.
* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
2007-10-05 10:37 ` Ilpo Järvinen
@ 2007-10-07 6:11 ` TAKANO Ryousei
2007-10-08 11:20 ` Ilpo Järvinen
From: TAKANO Ryousei @ 2007-10-07 6:11 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev, y-kodama
Hi Ilpo,
Thanks for your reply.
Most of my responses are in the reply to patch 1/2.
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Subject: Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
Date: Fri, 5 Oct 2007 13:37:21 +0300 (EEST)
> On Thu, 4 Oct 2007, TAKANO Ryousei wrote:
>
> > This patch allows to process only newly reported SACK blocks at the
> > sender side. An ACK packet contains up to three SACK blocks, and some
>
> "A SACK option that specifies n blocks will have a length of 8*n+2
> bytes, so the 40 bytes available for TCP options can specify a
> maximum of 4 blocks. It is expected that SACK will often be used in
> conjunction with the Timestamp option used for RTTM [Jacobson92],
> which takes an additional 10 bytes (plus two bytes of padding); thus
> a maximum of 3 SACK blocks will be allowed in this case." [RFC2018]
>
> :-)
>
Yes, indeed :-)
> > of them may be already reported and processed blocks. This patch
> > prevents processing of such already processed SACK blocks.
> >
> > Signed-off-by: Ryousei Takano <takano-ryousei@aist.go.jp>
> > Signed-off-by: Yuetsu Kodama <y-kodama@aist.go.jp>
> > ---
> > net/ipv4/tcp_input.c | 24 ++++++++++++++++++++++++
> > 1 files changed, 24 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index bbad2cd..9615fc9 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -978,6 +978,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > int cached_fack_count;
> > int i;
> > int first_sack_index;
> > + u8 sack_block_skip[4] = {0,0,0,0};
> >
> > if (!tp->sacked_out)
> > tp->fackets_out = 0;
> > @@ -1012,6 +1013,21 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
> > return 0;
> >
> > + /* Skip processing cached SACK blocks. */
> > + for (i = 0; i < num_sacks; i++) {
> > + __be32 start_seq = sp[i].start_seq;
> > + __be32 end_seq = sp[i].end_seq;
> > + int j;
> > +
> > + for (j = 0; j < ARRAY_SIZE(tp->recv_sack_cache); j++) {
> > + if ((tp->recv_sack_cache[j].start_seq == start_seq) &&
> > + (tp->recv_sack_cache[j].end_seq == end_seq)) {
> > + sack_block_skip[i] = 1;
> > + break;
> > + }
> > + }
> > + }
> > +
>
> I'm somewhat against adding more and more special cases to sacktag,
> there's still need for more special cases after this one to avoid very
> expensive processing (I guess they just won't occur in your scenario)!
> ...I would rather remove whole special case mess of the fastpath and
> have a more generic solution (see the patch I point into in the reply
> to patch 1/2)...
>
> > /* SACK fastpath:
> > * if the only SACK change is the increase of the end_seq of
> > * the first block then only apply that SACK block
> > @@ -1051,11 +1067,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > if (after(ntohl(sp[j].start_seq),
> > ntohl(sp[j+1].start_seq))){
> > struct tcp_sack_block_wire tmp;
> > + u8 sbtmp;
> >
> > tmp = sp[j];
> > sp[j] = sp[j+1];
> > sp[j+1] = tmp;
> >
> > + sbtmp = sack_block_skip[j];
> > + sack_block_skip[j] = sack_block_skip[j+1];
> > + sack_block_skip[j+1] = sbtmp;
> > +
> > /* Track where the first SACK block goes to */
> > if (j == first_sack_index)
> > first_sack_index = j+1;
> > @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > int fack_count;
> > int dup_sack = (found_dup_sack && (i == first_sack_index));
> >
> > + if (sack_block_skip[i])
>
> DSACKs must always be processed, so please add:
>
> && !dup_sack
>
I did not consider DSACKs. Thanks.
> > + continue;
>
> By doing this skipping here, you actually end up crippling lost_retrans
> detection even more than it was broken before. ...You probably didn't just
> notice that during tests because of unrelated suboptimal behavior (in
> fastpath_skb_hint handling). ...Anyway, correctness of this should be
> evaluated against the fixed lost_retrans, rather than the already
> broken one.
>
You can find results showing that the average goodput slightly improves
over a kernel with only PATCH #1 (fixed lost_retrans) applied.
Sorry, our web page is down this weekend due to a power outage.
http://projects.gtrc.aist.go.jp/gnet/sack-bug.html
> > +
> > skb = cached_skb;
> > fack_count = cached_fack_count;
>
> Other than what's noted above:
>
> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
>
> --
> i.
Regards,
Ryousei Takano
* Re: [RFC][PATCH 2/2] TCP: skip processing cached SACK blocks
2007-10-07 6:11 ` TAKANO Ryousei
@ 2007-10-08 11:20 ` Ilpo Järvinen
From: Ilpo Järvinen @ 2007-10-08 11:20 UTC (permalink / raw)
To: TAKANO Ryousei; +Cc: Netdev, y-kodama
On Sun, 7 Oct 2007, TAKANO Ryousei wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > On Thu, 4 Oct 2007, TAKANO Ryousei wrote:
> >
> > > @@ -1083,6 +1104,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
> > > int fack_count;
> > > int dup_sack = (found_dup_sack && (i == first_sack_index));
> > >
> > > + if (sack_block_skip[i])
> > > + continue;
> >
> > By doing this skipping here, you actually end up crippling lost_retrans
> > detection even more than it was broken before. ...You probably didn't just
> > notice that during tests because of unrelated suboptimal behavior (in
> > fastpath_skb_hint handling). ...Anyway, correctness of this should be
> > evaluated against the fixed lost_retrans, rather than the already
> > broken one.
>
> You can find the result that the average goodput slightly improves
> against the only PATCH #1 (fixed lost_retrans) applied kernel.
...Yeah, sorry, I was mistaken about when the slow-path is taken and when
it's not; I thought that the old (current) fastpath_skb_hint code was more
intelligent than it actually is... :-) That continue should not be
a problem.
--
i.