netdev.vger.kernel.org archive mirror
* [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
@ 2014-07-25 11:52 Christoph Paasch
  2014-07-25 18:14 ` Stephen Hemminger
  2014-07-29  0:26 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-25 11:52 UTC
  To: David Miller
  Cc: netdev, Christoph Paasch, Neal Cardwell, David Laight, Doug Leith

In vegas we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. The current code
however does not cast the cwnd to a u64 and thus 32-bit arithmetic will
be done. This means that, in case of an integer overflow, the result is
completely wrong.
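
As an illustration only (not part of the patch), the truncation can be
reproduced in userspace. The sketch below assumes <stdint.h> types in place
of the kernel's u32/u64; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the old expression: both operands are 32-bit, so the product
 * wraps modulo 2^32 before it is widened to 64 bits. */
uint64_t target_cwnd_u32_mul(uint32_t snd_cwnd, uint32_t base_rtt, uint32_t rtt)
{
	uint32_t product;

	assert(rtt != 0);
	product = snd_cwnd * base_rtt;	/* wraps on overflow */
	return product / rtt;
}

/* Casting one operand first forces a 64-bit multiplication. */
uint64_t target_cwnd_u64_mul(uint32_t snd_cwnd, uint32_t base_rtt, uint32_t rtt)
{
	assert(rtt != 0);
	return (uint64_t)snd_cwnd * base_rtt / rtt;
}
```

With snd_cwnd = 100000 and baseRTT = rtt = 50000, the product 5 * 10^9 does
not fit in 32 bits: the first function yields 14100, the second 100000.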

This patch fixes it, by splitting the calculation of target_cwnd in two:

1. The non-overflow case: We just do a regular division here.
2. The overflow-case: In this case we also want to avoid doing a costly do_div.
   So, we calculate the upper 32 bits (that are overflowing) and the
   error and add everything up. More details are in the comment in
   tcp_vegas.c
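
Spelled out as a worked equation, the overflow case can be read as follows.
This is an interpretation of the code in the diff, assuming "upper" is the
high 32 bits of the product and "err" the low 32 bits:

```latex
\[
\mathrm{cwnd}\cdot\mathrm{baseRTT} = \mathrm{upper}\cdot 2^{32} + \mathrm{err},
\qquad 0 \le \mathrm{err} < 2^{32}
\]
\[
\frac{\mathrm{cwnd}\cdot\mathrm{baseRTT}}{\mathrm{rtt}}
  = \mathrm{upper}\cdot\frac{2^{32}}{\mathrm{rtt}} + \frac{\mathrm{err}}{\mathrm{rtt}}
  \approx \mathrm{upper}\cdot\left\lfloor\frac{2^{32}-1}{\mathrm{rtt}}\right\rfloor
  + \left\lfloor\frac{\mathrm{err}}{\mathrm{rtt}}\right\rfloor
\]
```

The approximation comes from replacing 2^32 by U32_MAX and from truncating
each quotient separately; the truncation of the first quotient is scaled by
"upper", so it has more effect when rtt is large.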

To check the accuracy, I wrote a Python script that performs the same
32-bit arithmetic and compared its result with the result of
floating-point arithmetic, sampling the following ranges with
a space-filling design across this 3-dimensional space:

snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
                             assuming a send-buffer of 2^31 and a MSS of 1500)
rtt: [1, 2^28]
baseRTT: [1, rtt]

The error is never bigger than 10% in this simulation.

If I set the rtt bigger than 2^28 the error may grow up to 50%.
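
A comparison of this kind can be sketched in C as well. This is not the
Python script mentioned above, only a minimal illustration that assumes
<stdint.h> types and hypothetical function names; it checks single points
and does not reproduce the space-filling sweep.

```c
#include <assert.h>
#include <stdint.h>

/* The split calculation from this patch, with kernel types replaced by
 * their <stdint.h> equivalents. */
uint32_t target_cwnd_split(uint32_t snd_cwnd, uint32_t base_rtt, uint32_t rtt)
{
	uint64_t cwnd_rtt = (uint64_t)snd_cwnd * base_rtt;

	assert(rtt != 0);
	if (cwnd_rtt > UINT32_MAX) {
		uint32_t upper = (uint32_t)(cwnd_rtt >> 32);
		uint32_t err = (uint32_t)(cwnd_rtt & UINT32_MAX);

		return UINT32_MAX / rtt * upper + err / rtt;
	}
	return (uint32_t)cwnd_rtt / rtt;
}

/* Relative error of the split result against double-precision division. */
double split_relative_error(uint32_t snd_cwnd, uint32_t base_rtt, uint32_t rtt)
{
	double exact, approx, diff;

	assert(snd_cwnd != 0 && base_rtt != 0);
	exact = (double)snd_cwnd * (double)base_rtt / (double)rtt;
	approx = (double)target_cwnd_split(snd_cwnd, base_rtt, rtt);
	diff = exact > approx ? exact - approx : approx - exact;
	return diff / exact;
}
```

For snd_cwnd = 100000 and baseRTT = rtt = 50000 the split yields 99999
where the exact value is 100000, a relative error of 1e-5.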

Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Doug Leith <doug.leith@nuim.ie>
Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---

Notes:
    v2: David Laight noted that a do_div is necessary to allow this on 32-bit machines.
        David Miller then added that a do_div should be avoided. So, v2 handles overflows
        now correctly.
    
        Additionally, the target_cwnd could actually be computed a bit later in the code
        (inside the "if", where it is used). But that's probably rather net-next material.

 net/ipv4/tcp_vegas.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index 9a5e05f27f4f..ec714d91581e 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -196,8 +196,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 			 */
 			tcp_reno_cong_avoid(sk, ack, acked);
 		} else {
-			u32 rtt, diff;
-			u64 target_cwnd;
+			u32 rtt, diff, target_cwnd;
+			u64 cwnd_rtt;
 
 			/* We have enough RTT samples, so, using the Vegas
 			 * algorithm, we determine if we should increase or
@@ -218,7 +218,35 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 			 * This is:
 			 *     (actual rate in segments) * baseRTT
 			 */
-			target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+			cwnd_rtt = (u64)tp->snd_cwnd * vegas->baseRTT;
+			if (cwnd_rtt > U32_MAX) {
+				/* We would overflow 32-bit integer arithmetic.
+				 *
+				 * So, we split the calculation by using:
+				 * cwnd * baseRTT = U32_MAX * x
+				 * and x = upper + err / U32_MAX
+				 *
+				 * Which brings us to:
+				 * target_cwnd = U32_MAX /rtt * upper + err / rtt
+				 *
+				 * This approach allows an error of less than
+				 * 10% of the target_cwnd compared to the
+				 * intended cwnd (calculated with floating-point
+				 * numbers) for the following ranges:
+				 * cwnd: 1 to 2^31/1500
+				 * rtt: 1 to 2^28
+				 *
+				 * In case the rtt becomes bigger, the error
+				 * increases to 50%.
+				 */
+
+				u32 upper = (u32)(cwnd_rtt >> 32);
+				u32 err = (u32)(cwnd_rtt & U32_MAX);
+
+				target_cwnd = U32_MAX / rtt * upper + err / rtt;
+			} else {
+				target_cwnd = (u32)cwnd_rtt / rtt;
+			}
 
 			/* Calculate the difference between the window we had,
 			 * and the window we would like to have. This quantity
-- 
1.9.3


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-25 11:52 [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas Christoph Paasch
@ 2014-07-25 18:14 ` Stephen Hemminger
  2014-07-26  8:59   ` Christoph Paasch
  2014-07-29  0:26 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2014-07-25 18:14 UTC
  To: Christoph Paasch
  Cc: David Miller, netdev, Neal Cardwell, David Laight, Doug Leith

On Fri, 25 Jul 2014 13:52:39 +0200
Christoph Paasch <christoph.paasch@uclouvain.be> wrote:

> In vegas we do a multiplication of the cwnd and the rtt. This
> may overflow and thus their result is stored in a u64. The current code
> however does not cast the cwnd to a u64 and thus 32-bit arithmetic will
> be done. This means that, in case of an integer overflow, the result is
> completely wrong.
> 
> This patch fixes it, by splitting the calculation of target_cwnd in two:
> 
> 1. The non-overflow case: We just do a regular division here.
> 2. The overflow-case: In this case we also want to avoid doing a costly do_div.
>    So, we calculate the upper 32 bits (that are overflowing) and the
>    error and add everything up. More details are in the comment in
>    tcp_vegas.c
> 
> To check the accuracy, I wrote a Python script that performs the same
> 32-bit arithmetic and compared its result with the result of
> floating-point arithmetic, sampling the following ranges with
> a space-filling design across this 3-dimensional space:
> 
> snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
>                              assuming a send-buffer of 2^31 and a MSS of 1500)
> rtt: [1, 2^28]
> baseRTT: [1, rtt]
> 
> The error is never bigger than 10% in this simulation.
> 
> If I set the rtt bigger than 2^28 the error may grow up to 50%.
> 
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Doug Leith <doug.leith@nuim.ie>
> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>

Wouldn't the simple, dumb approach used by other places doing 64-bit by 32-bit divides
in the kernel be sufficient?

--- a/net/ipv4/tcp_vegas.c	2014-05-16 20:27:32.499419952 -0700
+++ b/net/ipv4/tcp_vegas.c	2014-07-25 11:14:18.161465900 -0700
@@ -218,7 +218,9 @@ static void tcp_vegas_cong_avoid(struct
 			 * This is:
 			 *     (actual rate in segments) * baseRTT
 			 */
-			target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+			target_cwnd = tp->snd_cwnd;
+			target_cwnd *= vegas->baseRTT;
+			do_div(target_cwnd, rtt);
 
 			/* Calculate the difference between the window we had,
 			 * and the window we would like to have. This quantity
@@ -238,7 +240,7 @@ static void tcp_vegas_cong_avoid(struct
 				 * truncation robs us of full link
 				 * utilization.
 				 */
-				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
+				tp->snd_cwnd = min_t(u64, tp->snd_cwnd, target_cwnd+1);
 				tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
 
 			} else if (tp->snd_cwnd <= tp->snd_ssthresh) {


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-25 18:14 ` Stephen Hemminger
@ 2014-07-26  8:59   ` Christoph Paasch
  2014-07-26  9:54     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Paasch @ 2014-07-26  8:59 UTC
  To: Stephen Hemminger
  Cc: David Miller, netdev, Neal Cardwell, David Laight, Doug Leith

Hello Stephen,

On 25/07/14 - 11:14:48, Stephen Hemminger wrote:
> On Fri, 25 Jul 2014 13:52:39 +0200
> Christoph Paasch <christoph.paasch@uclouvain.be> wrote:
> 
> > In vegas we do a multiplication of the cwnd and the rtt. This
> > may overflow and thus their result is stored in a u64. The current code
> > however does not cast the cwnd to a u64 and thus 32-bit arithmetic will
> > be done. This means that, in case of an integer overflow, the result is
> > completely wrong.
> > 
> > This patch fixes it, by splitting the calculation of target_cwnd in two:
> > 
> > 1. The non-overflow case: We just do a regular division here.
> > 2. The overflow-case: In this case we also want to avoid doing a costly do_div.
> >    So, we calculate the upper 32 bits (that are overflowing) and the
> >    error and add everything up. More details are in the comment in
> >    tcp_vegas.c
> > 
> > To check the accuracy, I wrote a Python script that performs the same
> > 32-bit arithmetic and compared its result with the result of
> > floating-point arithmetic, sampling the following ranges with
> > a space-filling design across this 3-dimensional space:
> > 
> > snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
> >                              assuming a send-buffer of 2^31 and a MSS of 1500)
> > rtt: [1, 2^28]
> > baseRTT: [1, rtt]
> > 
> > The error is never bigger than 10% in this simulation.
> > 
> > If I set the rtt bigger than 2^28 the error may grow up to 50%.
> > 
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Cc: David Laight <David.Laight@ACULAB.COM>
> > Cc: Doug Leith <doug.leith@nuim.ie>
> > Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
> > Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> 
> Wouldn't the simple, dumb approach used by other places doing 64-bit by 32-bit divides
> in the kernel be sufficient?

Do you mean using "do_div"?

David suggested avoiding do_div in tcp_vegas.


Cheers,
Christoph


> 
> --- a/net/ipv4/tcp_vegas.c	2014-05-16 20:27:32.499419952 -0700
> +++ b/net/ipv4/tcp_vegas.c	2014-07-25 11:14:18.161465900 -0700
> @@ -218,7 +218,9 @@ static void tcp_vegas_cong_avoid(struct
>  			 * This is:
>  			 *     (actual rate in segments) * baseRTT
>  			 */
> -			target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
> +			target_cwnd = tp->snd_cwnd;
> +			target_cwnd *= vegas->baseRTT;
> +			do_div(target_cwnd, rtt);
>  
>  			/* Calculate the difference between the window we had,
>  			 * and the window we would like to have. This quantity
> @@ -238,7 +240,7 @@ static void tcp_vegas_cong_avoid(struct
>  				 * truncation robs us of full link
>  				 * utilization.
>  				 */
> -				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
> +				tp->snd_cwnd = min_t(u64, tp->snd_cwnd, target_cwnd+1);
>  				tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
>  
>  			} else if (tp->snd_cwnd <= tp->snd_ssthresh) {
> 
> 


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-26  8:59   ` Christoph Paasch
@ 2014-07-26  9:54     ` Eric Dumazet
  2014-07-27  9:48       ` Christoph Paasch
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-07-26  9:54 UTC
  To: Christoph Paasch
  Cc: Stephen Hemminger, David Miller, netdev, Neal Cardwell,
	David Laight, Doug Leith

On Sat, 2014-07-26 at 10:59 +0200, Christoph Paasch wrote:

> Do you mean using "do_div"?
> 
> David suggested avoiding do_div in tcp_vegas.

My understanding is the following:

On 64-bit arches, used on most servers that really care about TCP
performance these days, do_div() is the fastest thing: no extra
conditional.

# define do_div(n,base) ({                                      \
        uint32_t __base = (base);                               \
        uint32_t __rem;                                         \
        __rem = ((uint64_t)(n)) % __base;                       \
        (n) = ((uint64_t)(n)) / __base;                         \
        __rem;                                                  \
 })


Then on 32-bit, do_div(target_cwnd, Y) will perform a single divide
if target_cwnd is < 2^32, which is very likely the case:


# define do_div(n,base) ({                              \
        uint32_t __base = (base);                       \
        uint32_t __rem;                                 \
        (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
        if (likely(((n) >> 32) == 0)) {                 \
                __rem = (uint32_t)(n) % __base;         \
                (n) = (uint32_t)(n) / __base;           \
        } else                                          \
                __rem = __div64_32(&(n), __base);       \
        __rem;                                          \
 })



(In both cases, the compiler will remove the modulo operation, as we do not use it)
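
For readers outside the kernel tree, the behaviour described above can be
approximated in userspace. The sketch below is not the kernel macro: it
assumes <stdint.h> types, uses a plain 64-bit divide where the 32-bit kernel
calls __div64_32(), and the names div_u64_by_u32() and vegas_target_cwnd()
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Divides *n in place by a 32-bit base and returns the remainder,
 * following the calling convention of do_div(). */
uint32_t div_u64_by_u32(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	if ((*n >> 32) == 0) {
		/* Fast path: the dividend fits in 32 bits, so one
		 * 32-bit divide is enough. */
		uint32_t lo = (uint32_t)*n;

		rem = lo % base;
		*n = lo / base;
	} else {
		/* Slow path: full 64-by-32 division. */
		rem = (uint32_t)(*n % base);
		*n /= base;
	}
	return rem;
}

/* target_cwnd computed the way the do_div() variant does it:
 * 64-bit product first, then a single division. */
uint64_t vegas_target_cwnd(uint32_t snd_cwnd, uint32_t base_rtt, uint32_t rtt)
{
	uint64_t target = (uint64_t)snd_cwnd * base_rtt;

	div_u64_by_u32(&target, rtt);
	return target;
}
```

Unlike the split calculation, this variant returns the exact truncated
quotient, for example 100000 for snd_cwnd = 100000 and baseRTT = rtt = 50000.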


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-26  9:54     ` Eric Dumazet
@ 2014-07-27  9:48       ` Christoph Paasch
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-27  9:48 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, netdev, Neal Cardwell,
	David Laight, Doug Leith

On 26/07/14 - 11:54:57, Eric Dumazet wrote:
> On Sat, 2014-07-26 at 10:59 +0200, Christoph Paasch wrote:
> 
> > Do you mean using "do_div"?
> > 
> > David suggested avoiding do_div in tcp_vegas.
> 
> My understanding is the following:
> 
> On 64-bit arches, used on most servers that really care about TCP
> performance these days, do_div() is the fastest thing: no extra
> conditional.
> 
> # define do_div(n,base) ({                                      \
>         uint32_t __base = (base);                               \
>         uint32_t __rem;                                         \
>         __rem = ((uint64_t)(n)) % __base;                       \
>         (n) = ((uint64_t)(n)) / __base;                         \
>         __rem;                                                  \
>  })
> 
> 
> Then on 32-bit, do_div(target_cwnd, Y) will perform a single divide
> if target_cwnd is < 2^32, which is very likely the case:
> 
> 
> # define do_div(n,base) ({                              \
>         uint32_t __base = (base);                       \
>         uint32_t __rem;                                 \
>         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
>         if (likely(((n) >> 32) == 0)) {                 \
>                 __rem = (uint32_t)(n) % __base;         \
>                 (n) = (uint32_t)(n) / __base;           \
>         } else                                          \
>                 __rem = __div64_32(&(n), __base);       \
>         __rem;                                          \
>  })
> 
> 
> 
> (In both cases, the compiler will remove the modulo operation, as we do not use it)

I am fine with using do_div. Indeed, cwnd and rtt must be quite high to
hit the 64-bit divide path.

I will wait a bit for other feedback and then send a new version with do_div.


Thanks,
Christoph


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-25 11:52 [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas Christoph Paasch
  2014-07-25 18:14 ` Stephen Hemminger
@ 2014-07-29  0:26 ` David Miller
  2014-07-29  9:52   ` Christoph Paasch
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2014-07-29  0:26 UTC
  To: christoph.paasch; +Cc: netdev, ncardwell, David.Laight, doug.leith

From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Fri, 25 Jul 2014 13:52:39 +0200

> +				target_cwnd = U32_MAX / rtt * upper + err / rtt;

Doing two divides is probably more expensive than using do_div().

Why don't we go back to the do_div() implementation? Sorry about
changing my mind again.

And please resubmit the veno change, it's fine as-is.

Thanks again.


* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
  2014-07-29  0:26 ` David Miller
@ 2014-07-29  9:52   ` Christoph Paasch
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-29  9:52 UTC
  To: David Miller; +Cc: netdev, ncardwell, David.Laight, doug.leith

On 28/07/14 - 17:26:27, David Miller wrote:
> From: Christoph Paasch <christoph.paasch@uclouvain.be>
> Date: Fri, 25 Jul 2014 13:52:39 +0200
> 
> > +				target_cwnd = U32_MAX / rtt * upper + err / rtt;
> 
> Doing two divides is probably more expensive than using do_div().
> 
> Why don't we go back to the do_div() implementation? Sorry about
> changing my mind again.

No worries :)
I will resubmit.


Cheers,
Christoph

