* [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
@ 2014-07-25 11:52 Christoph Paasch
2014-07-25 18:14 ` Stephen Hemminger
2014-07-29 0:26 ` David Miller
0 siblings, 2 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-25 11:52 UTC (permalink / raw)
To: David Miller
Cc: netdev, Christoph Paasch, Neal Cardwell, David Laight, Doug Leith
In vegas we multiply the cwnd by the baseRTT. This product may overflow
32 bits, which is why the result is stored in a u64. However, the current
code does not cast the cwnd to a u64, so the multiplication is done in
32-bit arithmetic. This means that, in case of an integer overflow, the
result is completely wrong.
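As a user-space illustration of the problem (a minimal sketch, not kernel code: the function names product_32bit and product_64bit are hypothetical, and uint32_t/uint64_t stand in for the kernel's u32/u64), the two multiplications differ as follows:

```c
#include <stdint.h>

/* Mirrors the current code: both operands are 32-bit, so the product is
 * computed modulo 2^32 and only afterwards widened to 64 bits.
 */
static uint64_t product_32bit(uint32_t cwnd, uint32_t base_rtt)
{
	return cwnd * base_rtt;
}

/* Casting one operand first makes the whole multiplication 64-bit. */
static uint64_t product_64bit(uint32_t cwnd, uint32_t base_rtt)
{
	return (uint64_t)cwnd * base_rtt;
}
```

With cwnd = 100000 and baseRTT = 100000, product_64bit() returns 10000000000, while product_32bit() returns 1410065408, which is 10^10 modulo 2^32.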
This patch fixes it, by splitting the calculation of target_cwnd in two:
1. The non-overflow case: We just do a regular division here.
2. The overflow-case: In this case we also want to avoid doing a costly do_div.
So, we calculate the upper 32 bits (that are overflowing) and the
error and add everything up. More details are in the comment in
tcp_vegas.c
To check the accuracy, I tested this with a Python script that does the
same 32-bit arithmetic and compared the difference of this one with
the result of floating-point arithmetic with the following ranges in
a space-filling design across this 3-dimensional space:
snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
assuming a send-buffer of 2^31 and a MSS of 1500)
rtt: [1, 2^28]
baseRTT: [1, rtt]
The error is never bigger than 10% in this simulation.
If I set the rtt bigger than 2^28 the error may grow up to 50%.
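The kind of comparison described above can be sketched in user-space C. This is not the Python script used for the test. It is a hedged illustration in which target_cwnd_split() copies the arithmetic of the patch, target_cwnd_exact() uses a plain 64-bit integer division as a stand-in for the floating-point reference, and both function names are hypothetical:

```c
#include <stdint.h>

/* Split calculation as in the patch: a 32-bit division in the
 * non-overflow case, two 32-bit divisions in the overflow case.
 */
static uint32_t target_cwnd_split(uint32_t cwnd, uint32_t base_rtt,
				  uint32_t rtt)
{
	uint64_t cwnd_rtt = (uint64_t)cwnd * base_rtt;

	if (cwnd_rtt > UINT32_MAX) {
		uint32_t upper = (uint32_t)(cwnd_rtt >> 32);
		uint32_t err = (uint32_t)(cwnd_rtt & UINT32_MAX);

		return UINT32_MAX / rtt * upper + err / rtt;
	}
	return (uint32_t)cwnd_rtt / rtt;
}

/* Reference value: a full 64-bit by 32-bit integer division. */
static uint64_t target_cwnd_exact(uint32_t cwnd, uint32_t base_rtt,
				  uint32_t rtt)
{
	return (uint64_t)cwnd * base_rtt / rtt;
}
```

For example, with snd_cwnd = 1000000 and baseRTT = rtt = 100000, the split calculation yields 999984 where the exact result is 1000000.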
Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Doug Leith <doug.leith@nuim.ie>
Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
Notes:
v2: David Laight noted that a do_div is necessary to allow this on 32-bit machines.
David Miller then added that a do_div should be avoided. So, v2 handles overflows
now correctly.
Additionally, the target_cwnd could actually be computed a bit later in the code
(inside the "if", where it is used). But that's probably rather net-next material.
net/ipv4/tcp_vegas.c | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index 9a5e05f27f4f..ec714d91581e 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -196,8 +196,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
*/
tcp_reno_cong_avoid(sk, ack, acked);
} else {
- u32 rtt, diff;
- u64 target_cwnd;
+ u32 rtt, diff, target_cwnd;
+ u64 cwnd_rtt;
/* We have enough RTT samples, so, using the Vegas
* algorithm, we determine if we should increase or
@@ -218,7 +218,35 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
* This is:
* (actual rate in segments) * baseRTT
*/
- target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+ cwnd_rtt = (u64)tp->snd_cwnd * vegas->baseRTT;
+ if (cwnd_rtt > U32_MAX) {
+ /* We would overflow 32-bit integer arithmetic.
+ *
+ * So, we split the calculation by using:
+ * cwnd * baseRTT = U32_MAX * x
+ * and x = upper + err / U32_MAX
+ *
+ * Which brings us to:
+ * target_cwnd = U32_MAX /rtt * upper + err / rtt
+ *
+ * This approach allows an error of less than
+ * 10% of the target_cwnd compared to the
+ * intended cwnd (calculated with floating-point
+ * numbers) for the following ranges:
+ * cwnd: 1 to 2^31/1500
+ * rtt: 1 to 2^28
+ *
+ * In case the rtt becomes bigger, the error
+ * increases to 50%.
+ */
+
+ u32 upper = (u32)(cwnd_rtt >> 32);
+ u32 err = (u32)(cwnd_rtt & U32_MAX);
+
+ target_cwnd = U32_MAX / rtt * upper + err / rtt;
+ } else {
+ target_cwnd = (u32)cwnd_rtt / rtt;
+ }
/* Calculate the difference between the window we had,
* and the window we would like to have. This quantity
--
1.9.3
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-25 11:52 [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas Christoph Paasch
@ 2014-07-25 18:14 ` Stephen Hemminger
2014-07-26 8:59 ` Christoph Paasch
2014-07-29 0:26 ` David Miller
1 sibling, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2014-07-25 18:14 UTC (permalink / raw)
To: Christoph Paasch
Cc: David Miller, netdev, Neal Cardwell, David Laight, Doug Leith
On Fri, 25 Jul 2014 13:52:39 +0200
Christoph Paasch <christoph.paasch@uclouvain.be> wrote:
> In vegas we do a multiplication of the cwnd and the rtt. This
> may overflow and thus their result is stored in a u64. The current code
> however does not cast the cwnd to a u64 and thus 32-bit arithmetic will
> be done. This means, that in case of an integer overflow, the result is
> completely wrong.
>
> This patch fixes it, by splitting the calculation of target_cwnd in two:
>
> 1. The non-overflow case: We just do a regular division here.
> 2. The overflow-case: In this case we also want to avoid doing a costly do_div.
> So, we calculate the upper 32 bits (that are overflowing) and the
> error and add everything up. More details are in the comment in
> tcp_vegas.c
>
> For the accuracy, I tested this with a python script that does the
> same 32-bit arithmetic and compared the difference of this one with
> the result of floating-point arithmetic with the following ranges in
> a space-filling design across this 3-dimensional space:
>
> snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
> assuming a send-buffer of 2^31 and a MSS of 1500)
> rtt: [1, 2^28]
> baseRTT: [1, rtt]
>
> The error is never bigger than 10% in this simulation.
>
> If I set the rtt bigger than 2^28 the error may grow up to 50%.
>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Doug Leith <doug.leith@nuim.ie>
> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Wouldn't the simple, dumb approach used elsewhere in the kernel for
64-bit by 32-bit divides be sufficient?
--- a/net/ipv4/tcp_vegas.c 2014-05-16 20:27:32.499419952 -0700
+++ b/net/ipv4/tcp_vegas.c 2014-07-25 11:14:18.161465900 -0700
@@ -218,7 +218,9 @@ static void tcp_vegas_cong_avoid(struct
* This is:
* (actual rate in segments) * baseRTT
*/
- target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
+ target_cwnd = tp->snd_cwnd;
+ target_cwnd *= vegas->baseRTT;
+ do_div(target_cwnd, rtt);
/* Calculate the difference between the window we had,
* and the window we would like to have. This quantity
@@ -238,7 +240,7 @@ static void tcp_vegas_cong_avoid(struct
* truncation robs us of full link
* utilization.
*/
- tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
+ tp->snd_cwnd = min_t(u64, tp->snd_cwnd, target_cwnd+1);
tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
} else if (tp->snd_cwnd <= tp->snd_ssthresh) {
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-25 18:14 ` Stephen Hemminger
@ 2014-07-26 8:59 ` Christoph Paasch
2014-07-26 9:54 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Paasch @ 2014-07-26 8:59 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Miller, netdev, Neal Cardwell, David Laight, Doug Leith
Hello Stephen,
On 25/07/14 - 11:14:48, Stephen Hemminger wrote:
> On Fri, 25 Jul 2014 13:52:39 +0200
> Christoph Paasch <christoph.paasch@uclouvain.be> wrote:
>
> > In vegas we do a multiplication of the cwnd and the rtt. This
> > may overflow and thus their result is stored in a u64. The current code
> > however does not cast the cwnd to a u64 and thus 32-bit arithmetic will
> > be done. This means, that in case of an integer overflow, the result is
> > completely wrong.
> >
> > This patch fixes it, by splitting the calculation of target_cwnd in two:
> >
> > 1. The non-overflow case: We just do a regular division here.
> > 2. The overflow-case: In this case we also want to avoid doing a costly do_div.
> > So, we calculate the upper 32 bits (that are overflowing) and the
> > error and add everything up. More details are in the comment in
> > tcp_vegas.c
> >
> > For the accuracy, I tested this with a python script that does the
> > same 32-bit arithmetic and compared the difference of this one with
> > the result of floating-point arithmetic with the following ranges in
> > a space-filling design across this 3-dimensional space:
> >
> > snd_cwnd : [1, 2^31 / 1500] (that's the maximum congestion-window size,
> > assuming a send-buffer of 2^31 and a MSS of 1500)
> > rtt: [1, 2^28]
> > baseRTT: [1, rtt]
> >
> > The error is never bigger than 10% in this simulation.
> >
> > If I set the rtt bigger than 2^28 the error may grow up to 50%.
> >
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Cc: David Laight <David.Laight@ACULAB.COM>
> > Cc: Doug Leith <doug.leith@nuim.ie>
> > Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix)
> > Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
>
> Wouldn't the simple, dumb approach used elsewhere in the kernel for
> 64-bit by 32-bit divides be sufficient?
do you mean, using "do_div"?
David suggested to avoid using do_div in tcp_vegas.
Cheers,
Christoph
>
> --- a/net/ipv4/tcp_vegas.c 2014-05-16 20:27:32.499419952 -0700
> +++ b/net/ipv4/tcp_vegas.c 2014-07-25 11:14:18.161465900 -0700
> @@ -218,7 +218,9 @@ static void tcp_vegas_cong_avoid(struct
> * This is:
> * (actual rate in segments) * baseRTT
> */
> - target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
> + target_cwnd = tp->snd_cwnd;
> + target_cwnd *= vegas->baseRTT;
> + do_div(target_cwnd, rtt);
>
> /* Calculate the difference between the window we had,
> * and the window we would like to have. This quantity
> @@ -238,7 +240,7 @@ static void tcp_vegas_cong_avoid(struct
> * truncation robs us of full link
> * utilization.
> */
> - tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
> + tp->snd_cwnd = min_t(u64, tp->snd_cwnd, target_cwnd+1);
> tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
>
> } else if (tp->snd_cwnd <= tp->snd_ssthresh) {
>
>
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-26 8:59 ` Christoph Paasch
@ 2014-07-26 9:54 ` Eric Dumazet
2014-07-27 9:48 ` Christoph Paasch
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-07-26 9:54 UTC (permalink / raw)
To: Christoph Paasch
Cc: Stephen Hemminger, David Miller, netdev, Neal Cardwell,
David Laight, Doug Leith
On Sat, 2014-07-26 at 10:59 +0200, Christoph Paasch wrote:
> do you mean, using "do_div"?
>
> David suggested to avoid using do_div in tcp_vegas.
My understanding is the following :
On 64bit arches, used on most servers that really care about TCP
performance these days, do_div() is the fastest thing : No extra
conditional.
# define do_div(n,base) ({ \
uint32_t __base = (base); \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base; \
(n) = ((uint64_t)(n)) / __base; \
__rem; \
})
Then on 32bit, do_div(target_cwnd, Y) will perform a single divide
if target_cwnd is < 2^32, which is very likely the case :
# define do_div(n,base) ({ \
uint32_t __base = (base); \
uint32_t __rem; \
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
if (likely(((n) >> 32) == 0)) { \
__rem = (uint32_t)(n) % __base; \
(n) = (uint32_t)(n) / __base; \
} else \
__rem = __div64_32(&(n), __base); \
__rem; \
})
(In both cases, compiler will remove the modulo operation, as we do not use it)
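The behaviour of the 32-bit variant can be sketched in user space as an ordinary function. This is only an illustration under stated assumptions: div_u64_by_u32() is a hypothetical name, it takes a pointer instead of being a macro, and the slow path uses the compiler's native 64-bit division where the kernel would call __div64_32():

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for do_div(): divides *n in place
 * by base and returns the remainder.
 */
static uint32_t div_u64_by_u32(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	if ((*n >> 32) == 0) {
		/* Fast path: the dividend fits in 32 bits, so a single
		 * 32-bit divide is enough.
		 */
		uint32_t lo = (uint32_t)*n;

		rem = lo % base;
		*n = lo / base;
	} else {
		/* Slow path: full 64-bit by 32-bit division. */
		rem = (uint32_t)(*n % base);
		*n = *n / base;
	}
	return rem;
}
```

As with do_div(), the quotient replaces the dividend and the remainder is the return value, so a caller that ignores the return value only pays for the division.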
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-26 9:54 ` Eric Dumazet
@ 2014-07-27 9:48 ` Christoph Paasch
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-27 9:48 UTC (permalink / raw)
To: Eric Dumazet
Cc: Stephen Hemminger, David Miller, netdev, Neal Cardwell,
David Laight, Doug Leith
On 26/07/14 - 11:54:57, Eric Dumazet wrote:
> On Sat, 2014-07-26 at 10:59 +0200, Christoph Paasch wrote:
>
> > do you mean, using "do_div"?
> >
> > David suggested to avoid using do_div in tcp_vegas.
>
> My understanding is the following :
>
> On 64bit arches, used on most servers that really care about TCP
> performance these days, do_div() is the fastest thing : No extra
> conditional.
>
> # define do_div(n,base) ({ \
> uint32_t __base = (base); \
> uint32_t __rem; \
> __rem = ((uint64_t)(n)) % __base; \
> (n) = ((uint64_t)(n)) / __base; \
> __rem; \
> })
>
>
> Then on 32bit, do_div(target_cwnd, Y) will perform a single divide
> if target_cwnd is < 2^32, which is very likely the case :
>
>
> # define do_div(n,base) ({ \
> uint32_t __base = (base); \
> uint32_t __rem; \
> (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
> if (likely(((n) >> 32) == 0)) { \
> __rem = (uint32_t)(n) % __base; \
> (n) = (uint32_t)(n) / __base; \
> } else \
> __rem = __div64_32(&(n), __base); \
> __rem; \
> })
>
>
>
> (In both cases, compiler will remove the modulo operation, as we do not use it)
I am very fine with using do_div. Indeed, cwnd and rtt must be quite high to
fall into the case of 64-bit divides.
I will wait a bit for other feedback and then send a new version with do_div.
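To give a rough idea of how high the values must be, the following sketch computes the smallest baseRTT for which the product with a given cwnd no longer fits in 32 bits. The helper name min_base_rtt_for_64bit() is hypothetical and only serves this illustration:

```c
#include <stdint.h>

/* Smallest baseRTT for which cwnd * baseRTT exceeds UINT32_MAX, that is,
 * for which the 64-bit slow path of do_div() would be taken.
 * Assumes cwnd > 0. The result may itself exceed 32 bits (e.g. cwnd == 1),
 * in which case no u32 baseRTT can trigger the slow path.
 */
static uint64_t min_base_rtt_for_64bit(uint32_t cwnd)
{
	return (uint64_t)UINT32_MAX / cwnd + 1;
}
```

For a cwnd of 1000 segments this gives 4294968, in whatever unit baseRTT is stored. If that unit is microseconds (an assumption here), this corresponds to a baseRTT of roughly 4.3 seconds.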
Thanks,
Christoph
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-25 11:52 [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas Christoph Paasch
2014-07-25 18:14 ` Stephen Hemminger
@ 2014-07-29 0:26 ` David Miller
2014-07-29 9:52 ` Christoph Paasch
1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2014-07-29 0:26 UTC (permalink / raw)
To: christoph.paasch; +Cc: netdev, ncardwell, David.Laight, doug.leith
From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Fri, 25 Jul 2014 13:52:39 +0200
> + target_cwnd = U32_MAX / rtt * upper + err / rtt;
Doing two divides is probably more expensive than using do_div().
Why don't we go back to the do_div() implementation, sorry about
changing my mind again.
And please resubmit the veno change, it's fine as-is.
Thanks again.
* Re: [PATCH v2 net] tcp: Fix integer-overflows in TCP vegas
2014-07-29 0:26 ` David Miller
@ 2014-07-29 9:52 ` Christoph Paasch
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Paasch @ 2014-07-29 9:52 UTC (permalink / raw)
To: David Miller; +Cc: netdev, ncardwell, David.Laight, doug.leith
On 28/07/14 - 17:26:27, David Miller wrote:
> From: Christoph Paasch <christoph.paasch@uclouvain.be>
> Date: Fri, 25 Jul 2014 13:52:39 +0200
>
> > + target_cwnd = U32_MAX / rtt * upper + err / rtt;
>
> Doing two divides is probably more expensive than using do_div().
>
> Why don't we go back to the do_div() implementation, sorry about
> changing my mind again.
No worries :)
I will resubmit.
Cheers,
Christoph