* [RFC PATCH 0/2]: TCP MTUprobe fixes
@ 2007-11-21 16:01 Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed Ilpo Järvinen
0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-11-21 16:01 UTC (permalink / raw)
To: Herbert Xu, John Heffner; +Cc: netdev
Hi all,
Here are two other things in MTU probe code that caught my
attention while attempting to figure out the sk_send_head
usage there (sent patch to that earlier). The latter here is
not strictly speaking a fix but the original code has striking
complexity to perform a query which can be reduced to a simple
operation, thus I included it here as well. If these seem fine
to you as well, inclusion net-2.6 would be nice. Only compile
tested.
--
i.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
2007-11-21 16:01 [RFC PATCH 0/2]: TCP MTUprobe fixes Ilpo Järvinen
@ 2007-11-21 16:01 ` Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2007-11-21 16:01 UTC (permalink / raw)
To: Herbert Xu, John Heffner; +Cc: netdev
It seems that the checked range for receiver window check should
begin from the first rather than from the last skb that is going
to be included to the probe. And that can be achieved without
reference to skbs at all, snd_nxt and write_seq provides the
correct seqno already. Plus, it SHOULD account packets that are
necessary to trigger fast retransmit [RFC4821].
Location of snd_wnd < probe_size/size_needed check is bogus
because it will cause the other if() match as well (due to
snd_nxt >= snd_una invariant).
Removed dead obvious comment.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_output.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 30d6737..ff22ce8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1289,6 +1289,7 @@ static int tcp_mtu_probe(struct sock *sk)
struct sk_buff *skb, *nskb, *next;
int len;
int probe_size;
+ int size_needed;
unsigned int pif;
int copy;
int mss_now;
@@ -1307,6 +1308,7 @@ static int tcp_mtu_probe(struct sock *sk)
/* Very simple search strategy: just double the MSS. */
mss_now = tcp_current_mss(sk, 0);
probe_size = 2*tp->mss_cache;
+ size_needed = probe_size + (tp->reordering + 1) * mss_now;
if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) {
/* TODO: set timer for probe_converge_event */
return -1;
@@ -1316,18 +1318,15 @@ static int tcp_mtu_probe(struct sock *sk)
len = 0;
if ((skb = tcp_send_head(sk)) == NULL)
return -1;
- while ((len += skb->len) < probe_size && !tcp_skb_is_last(sk, skb))
+ while ((len += skb->len) < size_needed && !tcp_skb_is_last(sk, skb))
skb = tcp_write_queue_next(sk, skb);
- if (len < probe_size)
+ if (len < size_needed)
return -1;
- /* Receive window check. */
- if (after(TCP_SKB_CB(skb)->seq + probe_size, tp->snd_una + tp->snd_wnd)) {
- if (tp->snd_wnd < probe_size)
- return -1;
- else
- return 0;
- }
+ if (tp->snd_wnd < size_needed)
+ return -1;
+ if (after(tp->snd_nxt + size_needed, tp->snd_una + tp->snd_wnd))
+ return 0;
/* Do we need to wait to drain cwnd? */
pif = tcp_packets_in_flight(tp);
--
1.5.0.6
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop)
2007-11-21 16:01 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed Ilpo Järvinen
@ 2007-11-21 16:01 ` Ilpo Järvinen
2007-11-21 16:25 ` John Heffner
2007-11-22 11:00 ` Herbert Xu
2007-11-21 16:25 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed John Heffner
2007-11-22 9:27 ` Herbert Xu
2 siblings, 2 replies; 9+ messages in thread
From: Ilpo Järvinen @ 2007-11-21 16:01 UTC (permalink / raw)
To: Herbert Xu, John Heffner; +Cc: netdev
The original code has striking complexity to perform a query
which can be reduced to a very simple compare.
FIN seqno may be included to write_seq but it should not make
any significant difference here compared to skb->len which was
used previously. One won't end up there with SYN still queued.
Use of write_seq check guarantees that there's a valid skb in
send_head so I removed the extra check.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_output.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ff22ce8..1822ce6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1315,12 +1315,7 @@ static int tcp_mtu_probe(struct sock *sk)
}
/* Have enough data in the send queue to probe? */
- len = 0;
- if ((skb = tcp_send_head(sk)) == NULL)
- return -1;
- while ((len += skb->len) < size_needed && !tcp_skb_is_last(sk, skb))
- skb = tcp_write_queue_next(sk, skb);
- if (len < size_needed)
+ if (tp->write_seq - tp->snd_nxt < size_needed)
return -1;
if (tp->snd_wnd < size_needed)
--
1.5.0.6
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop)
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
@ 2007-11-21 16:25 ` John Heffner
2007-11-22 11:00 ` Herbert Xu
1 sibling, 0 replies; 9+ messages in thread
From: John Heffner @ 2007-11-21 16:25 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Herbert Xu, netdev
Ilpo Järvinen wrote:
> The original code has striking complexity to perform a query
> which can be reduced to a very simple compare.
>
> FIN seqno may be included to write_seq but it should not make
> any significant difference here compared to skb->len which was
> used previously. One won't end up there with SYN still queued.
>
> Use of write_seq check guarantees that there's a valid skb in
> send_head so I removed the extra check.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: John Heffner <jheffner@psc.edu>
> ---
> net/ipv4/tcp_output.c | 7 +------
> 1 files changed, 1 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ff22ce8..1822ce6 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1315,12 +1315,7 @@ static int tcp_mtu_probe(struct sock *sk)
> }
>
> /* Have enough data in the send queue to probe? */
> - len = 0;
> - if ((skb = tcp_send_head(sk)) == NULL)
> - return -1;
> - while ((len += skb->len) < size_needed && !tcp_skb_is_last(sk, skb))
> - skb = tcp_write_queue_next(sk, skb);
> - if (len < size_needed)
> + if (tp->write_seq - tp->snd_nxt < size_needed)
> return -1;
>
> if (tp->snd_wnd < size_needed)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
2007-11-21 16:01 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
@ 2007-11-21 16:25 ` John Heffner
2007-11-22 9:27 ` Herbert Xu
2 siblings, 0 replies; 9+ messages in thread
From: John Heffner @ 2007-11-21 16:25 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Herbert Xu, netdev
Ilpo Järvinen wrote:
> It seems that the checked range for receiver window check should
> begin from the first rather than from the last skb that is going
> to be included to the probe. And that can be achieved without
> reference to skbs at all, snd_nxt and write_seq provides the
> correct seqno already. Plus, it SHOULD account packets that are
> necessary to trigger fast retransmit [RFC4821].
>
> Location of snd_wnd < probe_size/size_needed check is bogus
> because it will cause the other if() match as well (due to
> snd_nxt >= snd_una invariant).
>
> Removed dead obvious comment.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: John Heffner <jheffner@psc.edu>
> ---
> net/ipv4/tcp_output.c | 17 ++++++++---------
> 1 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 30d6737..ff22ce8 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1289,6 +1289,7 @@ static int tcp_mtu_probe(struct sock *sk)
> struct sk_buff *skb, *nskb, *next;
> int len;
> int probe_size;
> + int size_needed;
> unsigned int pif;
> int copy;
> int mss_now;
> @@ -1307,6 +1308,7 @@ static int tcp_mtu_probe(struct sock *sk)
> /* Very simple search strategy: just double the MSS. */
> mss_now = tcp_current_mss(sk, 0);
> probe_size = 2*tp->mss_cache;
> + size_needed = probe_size + (tp->reordering + 1) * mss_now;
> if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) {
> /* TODO: set timer for probe_converge_event */
> return -1;
> @@ -1316,18 +1318,15 @@ static int tcp_mtu_probe(struct sock *sk)
> len = 0;
> if ((skb = tcp_send_head(sk)) == NULL)
> return -1;
> - while ((len += skb->len) < probe_size && !tcp_skb_is_last(sk, skb))
> + while ((len += skb->len) < size_needed && !tcp_skb_is_last(sk, skb))
> skb = tcp_write_queue_next(sk, skb);
> - if (len < probe_size)
> + if (len < size_needed)
> return -1;
>
> - /* Receive window check. */
> - if (after(TCP_SKB_CB(skb)->seq + probe_size, tp->snd_una + tp->snd_wnd)) {
> - if (tp->snd_wnd < probe_size)
> - return -1;
> - else
> - return 0;
> - }
> + if (tp->snd_wnd < size_needed)
> + return -1;
> + if (after(tp->snd_nxt + size_needed, tp->snd_una + tp->snd_wnd))
> + return 0;
>
> /* Do we need to wait to drain cwnd? */
> pif = tcp_packets_in_flight(tp);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
2007-11-21 16:01 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
2007-11-21 16:25 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed John Heffner
@ 2007-11-22 9:27 ` Herbert Xu
2007-11-22 13:11 ` Ilpo Järvinen
2 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2007-11-22 9:27 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: John Heffner, netdev
On Wed, Nov 21, 2007 at 06:01:27PM +0200, Ilpo Järvinen wrote:
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Thanks for the patch Ilpo!
I've just got a couple small questions.
> @@ -1307,6 +1308,7 @@ static int tcp_mtu_probe(struct sock *sk)
> /* Very simple search strategy: just double the MSS. */
> mss_now = tcp_current_mss(sk, 0);
> probe_size = 2*tp->mss_cache;
> + size_needed = probe_size + (tp->reordering + 1) * mss_now;
Should we use mss_now here or mss_cache? It would seem that an
over-estimate would be safer than an under-estimate.
Also should that be Tcprexmtthresh segments at the current MTU
or the probed MTU?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop)
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
2007-11-21 16:25 ` John Heffner
@ 2007-11-22 11:00 ` Herbert Xu
1 sibling, 0 replies; 9+ messages in thread
From: Herbert Xu @ 2007-11-22 11:00 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: John Heffner, netdev
On Wed, Nov 21, 2007 at 06:01:28PM +0200, Ilpo Järvinen wrote:
> The original code has striking complexity to perform a query
> which can be reduced to a very simple compare.
>
> FIN seqno may be included to write_seq but it should not make
> any significant difference here compared to skb->len which was
> used previously. One won't end up there with SYN still queued.
>
> Use of write_seq check guarantees that there's a valid skb in
> send_head so I removed the extra check.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
OK, this one looks pretty straightforward. I'll put it in once
we resolve the other patch.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
2007-11-22 9:27 ` Herbert Xu
@ 2007-11-22 13:11 ` Ilpo Järvinen
2007-11-22 15:16 ` Herbert Xu
0 siblings, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-11-22 13:11 UTC (permalink / raw)
To: Herbert Xu; +Cc: John Heffner, Netdev
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3724 bytes --]
On Thu, 22 Nov 2007, Herbert Xu wrote:
> On Wed, Nov 21, 2007 at 06:01:27PM +0200, Ilpo Järvinen wrote:
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> Thanks for the patch Ilpo!
>
> I've just got a couple small questions.
...Thanks for actually commenting it. :-)
> > @@ -1307,6 +1308,7 @@ static int tcp_mtu_probe(struct sock *sk)
> > /* Very simple search strategy: just double the MSS. */
> > mss_now = tcp_current_mss(sk, 0);
> > probe_size = 2*tp->mss_cache;
> > + size_needed = probe_size + (tp->reordering + 1) * mss_now;
>
> Should we use mss_now here or mss_cache? It would seem that an
> over-estimate would be safer than an under-estimate.
You're right, it's definately better that way... New version below, I
just tweaked that line inline :-).
> Also should that be Tcprexmtthresh segments at the current MTU
> or the probed MTU?
...We'll be using current MTU sized packets except for the probe until
the probe has succeeded, only from that point onward will the new packets
will be sent with the larger MTU. If the probe succeeds, all reservations
we made here are no longer meaningful or necessary anyway. The point is to
detect failure by having fast recovery triggered by the current MTU sized
packets that follow the probe, that's because we "know" that those smaller
packets are not going to be dropped by the path even though an oversize
probe would be.
--
i.
--
[PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
It seems that the checked range for receiver window check should
begin from the first rather than from the last skb that is going
to be included to the probe. And that can be achieved without
reference to skbs at all, snd_nxt and write_seq provides the
correct seqno already. Plus, it SHOULD account packets that are
necessary to trigger fast retransmit [RFC4821].
Location of snd_wnd < probe_size/size_needed check is bogus
because it will cause the other if() match as well (due to
snd_nxt >= snd_una invariant).
Removed dead obvious comment.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
net/ipv4/tcp_output.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 30d6737..ff22ce8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1289,6 +1289,7 @@ static int tcp_mtu_probe(struct sock *sk)
struct sk_buff *skb, *nskb, *next;
int len;
int probe_size;
+ int size_needed;
unsigned int pif;
int copy;
int mss_now;
@@ -1307,6 +1308,7 @@ static int tcp_mtu_probe(struct sock *sk)
/* Very simple search strategy: just double the MSS. */
mss_now = tcp_current_mss(sk, 0);
probe_size = 2*tp->mss_cache;
+ size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high)) {
/* TODO: set timer for probe_converge_event */
return -1;
@@ -1316,18 +1318,15 @@ static int tcp_mtu_probe(struct sock *sk)
len = 0;
if ((skb = tcp_send_head(sk)) == NULL)
return -1;
- while ((len += skb->len) < probe_size && !tcp_skb_is_last(sk, skb))
+ while ((len += skb->len) < size_needed && !tcp_skb_is_last(sk, skb))
skb = tcp_write_queue_next(sk, skb);
- if (len < probe_size)
+ if (len < size_needed)
return -1;
- /* Receive window check. */
- if (after(TCP_SKB_CB(skb)->seq + probe_size, tp->snd_una + tp->snd_wnd)) {
- if (tp->snd_wnd < probe_size)
- return -1;
- else
- return 0;
- }
+ if (tp->snd_wnd < size_needed)
+ return -1;
+ if (after(tp->snd_nxt + size_needed, tp->snd_una + tp->snd_wnd))
+ return 0;
/* Do we need to wait to drain cwnd? */
pif = tcp_packets_in_flight(tp);
--
1.5.0.6
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed
2007-11-22 13:11 ` Ilpo Järvinen
@ 2007-11-22 15:16 ` Herbert Xu
0 siblings, 0 replies; 9+ messages in thread
From: Herbert Xu @ 2007-11-22 15:16 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: John Heffner, Netdev
On Thu, Nov 22, 2007 at 03:11:35PM +0200, Ilpo Järvinen wrote:
>
> ...We'll be using current MTU sized packets except for the probe until
> the probe has succeeded, only from that point onward will the new packets
> will be sent with the larger MTU. If the probe succeeds, all reservations
> we made here are no longer meaningful or necessary anyway. The point is to
> detect failure by having fast recovery triggered by the current MTU sized
> packets that follow the probe, that's because we "know" that those smaller
> packets are not going to be dropped by the path even though an oversize
> probe would be.
Thanks for the explanation!
I'll apply both patches to net-2.6.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-11-22 15:16 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-21 16:01 [RFC PATCH 0/2]: TCP MTUprobe fixes Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed Ilpo Järvinen
2007-11-21 16:01 ` [RFC PATCH 2/2] [TCP] MTUprobe: Cleanup send queue check (no need to loop) Ilpo Järvinen
2007-11-21 16:25 ` John Heffner
2007-11-22 11:00 ` Herbert Xu
2007-11-21 16:25 ` [RFC PATCH 1/2] [TCP]: MTUprobe: receiver window & data available checks fixed John Heffner
2007-11-22 9:27 ` Herbert Xu
2007-11-22 13:11 ` Ilpo Järvinen
2007-11-22 15:16 ` Herbert Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).