* [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
@ 2013-05-16 5:25 Eric Dumazet
From: Eric Dumazet @ 2013-05-16 5:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng
From: Eric Dumazet <edumazet@google.com>
tcp_fixup_rcvbuf() contains a loop to estimate initial socket
rcv space needed for a given mss. With large MTU (like 64K on lo),
we can loop ~500 times and consume a lot of cpu cycles.
perf top of 200 concurrent netperf -t TCP_CRR
5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
1.50% netperf [kernel.kallsyms] [k] tcp_ack
Let's use a 100% factor, and remove the loop.
100% is needed anyway for tcp_adv_win_scale=1
default value, and is also the maximum factor.
Refs: commit b49960a05e32
("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
net/ipv4/tcp_input.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 08bbe60..b358e8c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -360,9 +360,7 @@ static void tcp_fixup_rcvbuf(struct sock *sk)
 	if (mss > 1460)
 		icwnd = max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
 
-	rcvmem = SKB_TRUESIZE(mss + MAX_TCP_HEADER);
-	while (tcp_win_from_space(rcvmem) < mss)
-		rcvmem += 128;
+	rcvmem = 2 * SKB_TRUESIZE(mss + MAX_TCP_HEADER);
 	rcvmem *= icwnd;
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Christoph Paasch @ 2013-05-16 7:06 UTC (permalink / raw)
To: Eric Dumazet, netdev
Hello Eric,
On Wednesday 15 May 2013 22:25:55 Eric Dumazet wrote:
> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
> rcv space needed for a given mss. With large MTU (like 64K on lo),
> we can loop ~500 times and consume a lot of cpu cycles.
>
> perf top of 200 concurrent netperf -t TCP_CRR
Just out of curiosity, how do you run 200 concurrent netperfs?
Is there an option as in iperf (-P)?
I did not find anything like this in the netperf code.
Thanks,
Christoph
--
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain
--
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Eric Dumazet @ 2013-05-16 7:11 UTC (permalink / raw)
To: christoph.paasch; +Cc: netdev
On Thu, 2013-05-16 at 09:06 +0200, Christoph Paasch wrote:
> just out of curiosity, how do you run 200 concurrent netperfs?
> Is there an option as in iperf (-P) ?
> I did not find anything like this in the netperf-code.
I am pretty sure there are some scripts in the netperf tree, but I am so
lazy ;)
I use the following shell script (it works even with old netperf versions):
$ ./super_netperf 200 -t TCP_CRR
$ cat super_netperf
#!/bin/bash
# Launch N concurrent netperf instances in the background, extract the
# result field from each instance's output, and sum the results.
run_netperf() {
	loops=$1
	shift
	for ((i=0; i<loops; i++)); do
		# The result is on line 7 of netperf's output, or wraps
		# there from line 6; /Min/ prints the header once.
		netperf -s 2 "$@" | awk '/Min/{
			if (!once) {
				print;
				once=1;
			}
		}
		{
			if (NR == 6)
				save = $NF
			else if (NR == 7) {
				if (NF > 0)
					print $NF
				else
					print save
			} else if (NR == 11) {
				print $0
			}
		}' &
	done
	wait
	return 0
}

# Pass header lines (7 fields) through; sum everything else.
run_netperf "$@" | awk '{if (NF==7) {print $0; next}} {sum += $1} END {print sum}'
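The final awk stage just sums the first field of each background instance's one-line result (header lines, which have 7 fields, pass through untouched). That aggregation can be checked on its own with made-up per-instance rates:

```shell
# Feed three fake netperf transaction rates through the same awk
# program super_netperf uses for aggregation; it prints their sum.
printf '100.5\n200.25\n300.25\n' |
awk '{if (NF==7) {print $0; next}} {sum += $1} END {print sum}'
# prints 601
```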
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Neal Cardwell @ 2013-05-16 15:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Yuchung Cheng
On Thu, May 16, 2013 at 1:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
> rcv space needed for a given mss. With large MTU (like 64K on lo),
> we can loop ~500 times and consume a lot of cpu cycles.
>
> perf top of 200 concurrent netperf -t TCP_CRR
>
> 5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
> 1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
> 1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
> 1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
> 1.50% netperf [kernel.kallsyms] [k] tcp_ack
>
> Lets use a 100% factor, and remove the loop.
>
> 100% is needed anyway for tcp_adv_win_scale=1
> default value, and is also the maximum factor.
>
> Refs: commit b49960a05e32
> ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> net/ipv4/tcp_input.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Rick Jones @ 2013-05-16 17:42 UTC (permalink / raw)
To: christoph.paasch; +Cc: Eric Dumazet, netdev
On 05/16/2013 12:06 AM, Christoph Paasch wrote:
> just out of curiosity, how do you run 200 concurrent netperfs?
> Is there an option as in iperf (-P) ?
> I did not find anything like this in the netperf-code.
There is nothing like that in the netperf2 code. Concurrent netperfs are
handled outside of netperf itself via scripting. There is some
discussion of different mechanisms in netperf to use in conjunction
with that external scripting to mitigate issues of skew error:
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance
My favorite these days is to use the interim results emitted when
netperf is ./configure'd with --enable-demo, plus reasonably
synchronized clocks on the different systems running netperf, and then
post-process them. A single-system example of that being done is in
doc/examples/runemomniaggdemo.sh, the results of which can be
post-processed with doc/examples/post_proc.py.
I have used the interim results plus post processing mechanism as far
out as 512ish concurrent netperfs running on 512ish systems targeting
512ish other systems. Apart from my innate lack of patience :) I don't
believe there is much there to limit that mechanism scaling further.
Perhaps others have already gone farther.
In this specific situation, where Eric was running 200 netperf TCP_CRR
tests over loopback, if the difference from removing the loop was
sufficiently large (and I'm guessing so based on the perf top output),
then I would expect the difference to appear in service demand even for
a single stream of TCP_CRR tests.
Something like:
netperf -t TCP_CRR -c -i 30,3
before and after the change. Perhaps use the -I option to request a
narrower confidence interval than the default 5% and use a longish
per-iteration runtime (-l option) to help ensure hitting the confidence
intervals.
happy benchmarking,
rick jones
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: David Miller @ 2013-05-16 22:20 UTC (permalink / raw)
To: ncardwell; +Cc: eric.dumazet, netdev, ycheng
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 16 May 2013 11:23:03 -0400
> On Thu, May 16, 2013 at 1:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
>> rcv space needed for a given mss. With large MTU (like 64K on lo),
>> we can loop ~500 times and consume a lot of cpu cycles.
>>
>> perf top of 200 concurrent netperf -t TCP_CRR
>>
>> 5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
>> 1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
>> 1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
>> 1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
>> 1.50% netperf [kernel.kallsyms] [k] tcp_ack
>>
>> Lets use a 100% factor, and remove the loop.
>>
>> 100% is needed anyway for tcp_adv_win_scale=1
>> default value, and is also the maximum factor.
>>
>> Refs: commit b49960a05e32
>> ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
...
> Acked-by: Neal Cardwell <ncardwell@google.com>
Applied.