[PATCH net-next 0/4] Fix for TCP PMTU

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 0/4] Fix for TCP PMTU
@ 2015-03-10  9:16 Fan Du
  2015-03-10  9:16 ` [PATCH net-next 1/4] ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe Fan Du
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Fan Du @ 2015-03-10  9:16 UTC (permalink / raw)
  To: johnwheffner; +Cc: davem, netdev, fengyuleidian0615

Hi,

This patchset performs some minor fixes for TCP PMTU,
including correct length condition checking when building
probe segment, then checking if there is enough data avaiable
with probe_size, and finally a cosmetics patch to rename
tcp_mtu_probing with tcp_blackhole_probe for clarity.
In addition, clamp sysctl_tcp_probe_threshold.

Fan Du (4):
  ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe
  ipv4: Correct two conditions checking when buildig nskb
  ipv4: Use probe_size to check write queue data length
  ipv4: clamp sysctl_tcp_probe_threshold

 net/ipv4/tcp_output.c |   14 ++++++--------
 net/ipv4/tcp_timer.c  |    5 +++--
 2 files changed, 9 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net-next 1/4] ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe
  2015-03-10  9:16 [PATCH net-next 0/4] Fix for TCP PMTU Fan Du
@ 2015-03-10  9:16 ` Fan Du
  2015-03-10  9:16 ` [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb Fan Du
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Fan Du @ 2015-03-10  9:16 UTC (permalink / raw)
  To: johnwheffner; +Cc: davem, netdev, fengyuleidian0615

This is a cosmetic patch by using tcp_blackhole_probe to
clear confusion between tcp_mtu_probing and tcp_mtu_probe.
In essence keep running into tcp_mtu_probing is a sign of
blackhole incident.

Signed-off-by: Fan Du <fan.du@intel.com>
---
 net/ipv4/tcp_timer.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 1550593..abde4e1 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -99,7 +99,8 @@ static int tcp_orphan_retries(struct sock *sk, int alive)
 	return retries;
 }
 
-static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
+static void tcp_blackhole_probe(struct inet_connection_sock *icsk,
+				struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 
@@ -177,7 +178,7 @@ static int tcp_write_timeout(struct sock *sk)
 	} else {
 		if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
 			/* Black hole detection */
-			tcp_mtu_probing(icsk, sk);
+			tcp_blackhole_probe(icsk, sk);
 
 			dst_negative_advice(sk);
 		}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb
  2015-03-10  9:16 [PATCH net-next 0/4] Fix for TCP PMTU Fan Du
  2015-03-10  9:16 ` [PATCH net-next 1/4] ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe Fan Du
@ 2015-03-10  9:16 ` Fan Du
  2015-03-18 12:44   ` John Heffner
  2015-03-10  9:16 ` [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length Fan Du
  2015-03-10  9:16 ` [PATCH net-next 4/4] ipv4: clamp sysctl_tcp_probe_threshold Fan Du
  3 siblings, 1 reply; 9+ messages in thread
From: Fan Du @ 2015-03-10  9:16 UTC (permalink / raw)
  To: johnwheffner; +Cc: davem, netdev, fengyuleidian0615

When traversing write queue to build 'nskb', the size of 'copy'
could only be equal or smaller than skb->len.

And as we need exactly 'probe_size' bytes data for 'nskb', thus
break condition could only be len == probe_size.

Signed-off-by: Fan Du <fan.du@intel.com>
---
 net/ipv4/tcp_output.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5a73ad5..79977d1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1954,7 +1954,7 @@ static int tcp_mtu_probe(struct sock *sk)
 							    skb_put(nskb, copy),
 							    copy, nskb->csum);
 
-		if (skb->len <= copy) {
+		if (skb->len == copy) {
 			/* We've eaten all the data from this skb.
 			 * Throw it away. */
 			TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
@@ -1977,7 +1977,7 @@ static int tcp_mtu_probe(struct sock *sk)
 
 		len += copy;
 
-		if (len >= probe_size)
+		if (len == probe_size)
 			break;
 	}
 	tcp_init_tso_segs(sk, nskb, nskb->len);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length
  2015-03-10  9:16 [PATCH net-next 0/4] Fix for TCP PMTU Fan Du
  2015-03-10  9:16 ` [PATCH net-next 1/4] ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe Fan Du
  2015-03-10  9:16 ` [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb Fan Du
@ 2015-03-10  9:16 ` Fan Du
  2015-03-10 12:26   ` John Heffner
  2015-03-10  9:16 ` [PATCH net-next 4/4] ipv4: clamp sysctl_tcp_probe_threshold Fan Du
  3 siblings, 1 reply; 9+ messages in thread
From: Fan Du @ 2015-03-10  9:16 UTC (permalink / raw)
  To: johnwheffner; +Cc: davem, netdev, fengyuleidian0615

When building a probe segment, only 'probe_size' of bytes
is needed, so check write queue data length with probe_size
instead of size_needed.

Signed-off-by: Fan Du <fan.du@intel.com>
---
 net/ipv4/tcp_output.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 79977d1..89d3f84 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1872,7 +1872,6 @@ static int tcp_mtu_probe(struct sock *sk)
 	struct net *net = sock_net(sk);
 	int len;
 	int probe_size;
-	int size_needed;
 	int copy;
 	int mss_now;
 	int interval;
@@ -1895,7 +1894,6 @@ static int tcp_mtu_probe(struct sock *sk)
 	mss_now = tcp_current_mss(sk);
 	probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high +
 				    icsk->icsk_mtup.search_low) >> 1);
-	size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
 	interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low;
 	/* When misfortune happens, we are reprobing actively,
 	 * and then reprobe timer has expired. We stick with current
@@ -1911,12 +1909,12 @@ static int tcp_mtu_probe(struct sock *sk)
 	}
 
 	/* Have enough data in the send queue to probe? */
-	if (tp->write_seq - tp->snd_nxt < size_needed)
+	if (tp->write_seq - tp->snd_nxt < probe_size)
 		return -1;
 
-	if (tp->snd_wnd < size_needed)
+	if (tp->snd_wnd < probe_size)
 		return -1;
-	if (after(tp->snd_nxt + size_needed, tcp_wnd_end(tp)))
+	if (after(tp->snd_nxt + probe_size, tcp_wnd_end(tp)))
 		return 0;
 
 	/* Do we need to wait to drain cwnd? With none in flight, don't stall */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net-next 4/4] ipv4: clamp sysctl_tcp_probe_threshold
  2015-03-10  9:16 [PATCH net-next 0/4] Fix for TCP PMTU Fan Du
                   ` (2 preceding siblings ...)
  2015-03-10  9:16 ` [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length Fan Du
@ 2015-03-10  9:16 ` Fan Du
  3 siblings, 0 replies; 9+ messages in thread
From: Fan Du @ 2015-03-10  9:16 UTC (permalink / raw)
  To: johnwheffner; +Cc: davem, netdev, fengyuleidian0615

Signed-off-by: Fan Du <fan.du@intel.com>
---
 net/ipv4/tcp_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 89d3f84..dc0221b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1900,7 +1900,7 @@ static int tcp_mtu_probe(struct sock *sk)
 	 * probing process by not resetting search range to its orignal.
 	 */
 	if (probe_size > tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_high) ||
-		interval < net->ipv4.sysctl_tcp_probe_threshold) {
+		interval < max(1, net->ipv4.sysctl_tcp_probe_threshold)) {
 		/* Check whether enough time has elaplased for
 		 * another round of probing.
 		 */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length
  2015-03-10  9:16 ` [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length Fan Du
@ 2015-03-10 12:26   ` John Heffner
  2015-03-18  1:37     ` Fan Du
  0 siblings, 1 reply; 9+ messages in thread
From: John Heffner @ 2015-03-10 12:26 UTC (permalink / raw)
  To: Fan Du; +Cc: David Miller, Netdev, Fan Du

NACK.  From RFC4821:

   In addition, the timely loss detection algorithms in most protocols
   have pre-conditions that SHOULD be satisfied before sending a probe.
   For example, TCP Fast Retransmit is not robust unless there are
   sufficient segments following a probe; that is, the sender SHOULD
   have enough data queued and sufficient receiver window to send the
   probe plus at least Tcprexmtthresh [RFC2760] additional segments.
   This restriction may inhibit probing in some protocol states, such as
   too close to the end of a connection, or when the window is too
   small.


On Tue, Mar 10, 2015 at 5:16 AM, Fan Du <fan.du@intel.com> wrote:
> When building a probe segment, only 'probe_size' of bytes
> is needed, so check write queue data length with probe_size
> instead of size_needed.
>
> Signed-off-by: Fan Du <fan.du@intel.com>
> ---
>  net/ipv4/tcp_output.c |    8 +++-----
>  1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 79977d1..89d3f84 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1872,7 +1872,6 @@ static int tcp_mtu_probe(struct sock *sk)
>         struct net *net = sock_net(sk);
>         int len;
>         int probe_size;
> -       int size_needed;
>         int copy;
>         int mss_now;
>         int interval;
> @@ -1895,7 +1894,6 @@ static int tcp_mtu_probe(struct sock *sk)
>         mss_now = tcp_current_mss(sk);
>         probe_size = tcp_mtu_to_mss(sk, (icsk->icsk_mtup.search_high +
>                                     icsk->icsk_mtup.search_low) >> 1);
> -       size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
>         interval = icsk->icsk_mtup.search_high - icsk->icsk_mtup.search_low;
>         /* When misfortune happens, we are reprobing actively,
>          * and then reprobe timer has expired. We stick with current
> @@ -1911,12 +1909,12 @@ static int tcp_mtu_probe(struct sock *sk)
>         }
>
>         /* Have enough data in the send queue to probe? */
> -       if (tp->write_seq - tp->snd_nxt < size_needed)
> +       if (tp->write_seq - tp->snd_nxt < probe_size)
>                 return -1;
>
> -       if (tp->snd_wnd < size_needed)
> +       if (tp->snd_wnd < probe_size)
>                 return -1;
> -       if (after(tp->snd_nxt + size_needed, tcp_wnd_end(tp)))
> +       if (after(tp->snd_nxt + probe_size, tcp_wnd_end(tp)))
>                 return 0;
>
>         /* Do we need to wait to drain cwnd? With none in flight, don't stall */
> --
> 1.7.1
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length
  2015-03-10 12:26   ` John Heffner
@ 2015-03-18  1:37     ` Fan Du
  2015-03-18 12:41       ` John Heffner
  0 siblings, 1 reply; 9+ messages in thread
From: Fan Du @ 2015-03-18  1:37 UTC (permalink / raw)
  To: John Heffner; +Cc: Fan Du, David Miller, Netdev

于 2015年03月10日 20:26, John Heffner 写道:
> NACK.  From RFC4821:
>
>     In addition, the timely loss detection algorithms in most protocols
>     have pre-conditions that SHOULD be satisfied before sending a probe.
>     For example, TCP Fast Retransmit is not robust unless there are
>     sufficient segments following a probe; that is, the sender SHOULD
>     have enough data queued and sufficient receiver window to send the
>     probe plus at least Tcprexmtthresh [RFC2760] additional segments.
>     This restriction may inhibit probing in some protocol states, such as
>     too close to the end of a connection, or when the window is too
>     small.
>

Thanks for pointing this out for me.

My limit understanding is the extra segments is used to trigger fast retransmit,
and the conditions is the count of duplicate ack. Then why needs an extra one more
segment here besides 'reordering' segment to trigger this?

size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
                                             ^^^
btw, any comments for the rest of patches?

-- 
天下英雄出我辈，一入江湖岁月催。
鸿图霸业谈笑间，不胜人生一场醉。
提剑跨骑挥鬼雨，白骨如山鸟惊飞。
尘世如潮人如水，只叹江湖几人回。

来，干了这碗酒，接着解BUG!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length
  2015-03-18  1:37     ` Fan Du
@ 2015-03-18 12:41       ` John Heffner
  0 siblings, 0 replies; 9+ messages in thread
From: John Heffner @ 2015-03-18 12:41 UTC (permalink / raw)
  To: Fan Du; +Cc: Fan Du, David Miller, Netdev

On Tue, Mar 17, 2015 at 9:37 PM, Fan Du <fengyuleidian0615@gmail.com> wrote:
> 于 2015年03月10日 20:26, John Heffner 写道:
>>
>> NACK.  From RFC4821:
>>
>>     In addition, the timely loss detection algorithms in most protocols
>>     have pre-conditions that SHOULD be satisfied before sending a probe.
>>     For example, TCP Fast Retransmit is not robust unless there are
>>     sufficient segments following a probe; that is, the sender SHOULD
>>     have enough data queued and sufficient receiver window to send the
>>     probe plus at least Tcprexmtthresh [RFC2760] additional segments.
>>     This restriction may inhibit probing in some protocol states, such as
>>     too close to the end of a connection, or when the window is too
>>     small.
>>
>
> Thanks for pointing this out for me.
>
> My limit understanding is the extra segments is used to trigger fast
> retransmit,
> and the conditions is the count of duplicate ack. Then why needs an extra
> one more
> segment here besides 'reordering' segment to trigger this?
>
> size_needed = probe_size + (tp->reordering + 1) * tp->mss_cache;
>                                             ^^^

I can't recall a particular reason for the plus one.  It may be
unnecessarily conservative.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb
  2015-03-10  9:16 ` [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb Fan Du
@ 2015-03-18 12:44   ` John Heffner
  0 siblings, 0 replies; 9+ messages in thread
From: John Heffner @ 2015-03-18 12:44 UTC (permalink / raw)
  To: Fan Du; +Cc: David Miller, Netdev, Fan Du

On Tue, Mar 10, 2015 at 5:16 AM, Fan Du <fan.du@intel.com> wrote:
> When traversing write queue to build 'nskb', the size of 'copy'
> could only be equal or smaller than skb->len.
>
> And as we need exactly 'probe_size' bytes data for 'nskb', thus
> break condition could only be len == probe_size.
>
> Signed-off-by: Fan Du <fan.du@intel.com>
> ---
>  net/ipv4/tcp_output.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5a73ad5..79977d1 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1954,7 +1954,7 @@ static int tcp_mtu_probe(struct sock *sk)
>                                                             skb_put(nskb, copy),
>                                                             copy, nskb->csum);
>
> -               if (skb->len <= copy) {
> +               if (skb->len == copy) {
>                         /* We've eaten all the data from this skb.
>                          * Throw it away. */
>                         TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
> @@ -1977,7 +1977,7 @@ static int tcp_mtu_probe(struct sock *sk)
>
>                 len += copy;
>
> -               if (len >= probe_size)
> +               if (len == probe_size)
>                         break;
>         }
>         tcp_init_tso_segs(sk, nskb, nskb->len);
> --
> 1.7.1
>

This is called "defensive programming." ;-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-03-18 12:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-10  9:16 [PATCH net-next 0/4] Fix for TCP PMTU Fan Du
2015-03-10  9:16 ` [PATCH net-next 1/4] ipv4: Rename tcp_mtu_probing to tcp_blackhole_probe Fan Du
2015-03-10  9:16 ` [PATCH net-next 2/4] ipv4: Correct two conditions checking when buildig nskb Fan Du
2015-03-18 12:44   ` John Heffner
2015-03-10  9:16 ` [PATCH net-next 3/4] ipv4: Use probe_size to check write queue data length Fan Du
2015-03-10 12:26   ` John Heffner
2015-03-18  1:37     ` Fan Du
2015-03-18 12:41       ` John Heffner
2015-03-10  9:16 ` [PATCH net-next 4/4] ipv4: clamp sysctl_tcp_probe_threshold Fan Du

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).