* [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
@ 2013-05-16 5:25 Eric Dumazet
From: Eric Dumazet @ 2013-05-16 5:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng
From: Eric Dumazet <edumazet@google.com>
tcp_fixup_rcvbuf() contains a loop to estimate initial socket
rcv space needed for a given mss. With large MTU (like 64K on lo),
we can loop ~500 times and consume a lot of cpu cycles.
perf top of 200 concurrent netperf -t TCP_CRR
5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
1.50% netperf [kernel.kallsyms] [k] tcp_ack
Let's use a 100% factor, and remove the loop.
100% is needed anyway for tcp_adv_win_scale=1
default value, and is also the maximum factor.
Refs: commit b49960a05e32
("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
net/ipv4/tcp_input.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 08bbe60..b358e8c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -360,9 +360,7 @@ static void tcp_fixup_rcvbuf(struct sock *sk)
 	if (mss > 1460)
 		icwnd = max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
 
-	rcvmem = SKB_TRUESIZE(mss + MAX_TCP_HEADER);
-	while (tcp_win_from_space(rcvmem) < mss)
-		rcvmem += 128;
+	rcvmem = 2 * SKB_TRUESIZE(mss + MAX_TCP_HEADER);
 	rcvmem *= icwnd;
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Christoph Paasch @ 2013-05-16 7:06 UTC (permalink / raw)
To: Eric Dumazet, netdev
Hello Eric,
On Wednesday 15 May 2013 22:25:55 Eric Dumazet wrote:
> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
> rcv space needed for a given mss. With large MTU (like 64K on lo),
> we can loop ~500 times and consume a lot of cpu cycles.
>
> perf top of 200 concurrent netperf -t TCP_CRR
Just out of curiosity, how do you run 200 concurrent netperfs?
Is there an option as in iperf (-P)?
I did not find anything like this in the netperf code.
Thanks,
Christoph
--
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain
--
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Eric Dumazet @ 2013-05-16 7:11 UTC (permalink / raw)
To: christoph.paasch; +Cc: netdev
On Thu, 2013-05-16 at 09:06 +0200, Christoph Paasch wrote:
> just out of curiosity, how do you run 200 concurrent netperfs?
> Is there an option as in iperf (-P) ?
> I did not find anything like this in the netperf-code.
I am pretty sure there are some scripts in the netperf tree, but I am so
lazy ;)
I use the following shell script (it works even with old netperf versions):
$ ./super_netperf 200 -t TCP_CRR
$ cat super_netperf
#!/bin/bash
# Launch N concurrent netperf instances in the background, extract the
# result field from each instance's output, and sum the results.
run_netperf() {
	loops=$1
	shift
	for ((i=0; i<loops; i++)); do
		# The result is on line 7 of netperf's output, or wraps
		# there from line 6; /Min/ prints the header once.
		netperf -s 2 "$@" | awk '/Min/{
			if (!once) {
				print;
				once=1;
			}
		}
		{
			if (NR == 6)
				save = $NF
			else if (NR == 7) {
				if (NF > 0)
					print $NF
				else
					print save
			} else if (NR == 11) {
				print $0
			}
		}' &
	done
	wait
	return 0
}

# Pass header lines (7 fields) through; sum everything else.
run_netperf "$@" | awk '{if (NF==7) {print $0; next}} {sum += $1} END {print sum}'
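The final awk stage just sums the first field of each background instance's one-line result (header lines, which have 7 fields, pass through untouched). That aggregation can be checked on its own with made-up per-instance rates:

```shell
# Feed three fake netperf transaction rates through the same awk
# program super_netperf uses for aggregation; it prints their sum.
printf '100.5\n200.25\n300.25\n' |
awk '{if (NF==7) {print $0; next}} {sum += $1} END {print sum}'
# prints 601
```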
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Neal Cardwell @ 2013-05-16 15:23 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Yuchung Cheng
On Thu, May 16, 2013 at 1:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
> rcv space needed for a given mss. With large MTU (like 64K on lo),
> we can loop ~500 times and consume a lot of cpu cycles.
>
> perf top of 200 concurrent netperf -t TCP_CRR
>
> 5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
> 1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
> 1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
> 1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
> 1.50% netperf [kernel.kallsyms] [k] tcp_ack
>
> Lets use a 100% factor, and remove the loop.
>
> 100% is needed anyway for tcp_adv_win_scale=1
> default value, and is also the maximum factor.
>
> Refs: commit b49960a05e32
> ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> net/ipv4/tcp_input.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
Acked-by: Neal Cardwell <ncardwell@google.com>
neal
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: Rick Jones @ 2013-05-16 17:42 UTC (permalink / raw)
To: christoph.paasch; +Cc: Eric Dumazet, netdev
On 05/16/2013 12:06 AM, Christoph Paasch wrote:
> just out of curiosity, how do you run 200 concurrent netperfs?
> Is there an option as in iperf (-P) ?
> I did not find anything like this in the netperf-code.
There is nothing like that in the netperf2 code. Concurrent netperfs are
handled outside of netperf itself via scripting. There is some
discussion of different mechanisms in netperf to use in conjunction
with that external scripting to mitigate issues of skew error:
http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance
My favorite these days is to use the interim results emitted when
netperf is ./configure'd with --enable-demo, plus reasonably
synchronized clocks on the different systems running netperf, and then
post-process them. A single-system example of that being done is in
doc/examples/runemomniaggdemo.sh, the results of which can be
post-processed with doc/examples/post_proc.py.
I have used the interim results plus post processing mechanism as far
out as 512ish concurrent netperfs running on 512ish systems targeting
512ish other systems. Apart from my innate lack of patience :) I don't
believe there is much there to limit that mechanism scaling further.
Perhaps others have already gone farther.
In this specific situation, where Eric was running 200 netperf TCP_CRR
tests over loopback, if the difference from removing the loop was
sufficiently large (and I'm guessing so based on the perf top output),
then I would expect the difference to appear in service demand even for
a single stream of TCP_CRR tests.
Something like:
netperf -t TCP_CRR -c -i 30,3
before and after the change. Perhaps use the -I option to request a
narrower confidence interval than the default 5% and use a longish
per-iteration runtime (-l option) to help ensure hitting the confidence
intervals.
happy benchmarking,
rick jones
* Re: [PATCH net-next] tcp: speedup tcp_fixup_rcvbuf()
From: David Miller @ 2013-05-16 22:20 UTC (permalink / raw)
To: ncardwell; +Cc: eric.dumazet, netdev, ycheng
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 16 May 2013 11:23:03 -0400
> On Thu, May 16, 2013 at 1:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> tcp_fixup_rcvbuf() contains a loop to estimate initial socket
>> rcv space needed for a given mss. With large MTU (like 64K on lo),
>> we can loop ~500 times and consume a lot of cpu cycles.
>>
>> perf top of 200 concurrent netperf -t TCP_CRR
>>
>> 5.62% netperf [kernel.kallsyms] [k] tcp_init_buffer_space
>> 1.71% netperf [kernel.kallsyms] [k] _raw_spin_lock
>> 1.55% netperf [kernel.kallsyms] [k] kmem_cache_free
>> 1.51% netperf [kernel.kallsyms] [k] tcp_transmit_skb
>> 1.50% netperf [kernel.kallsyms] [k] tcp_ack
>>
>> Lets use a 100% factor, and remove the loop.
>>
>> 100% is needed anyway for tcp_adv_win_scale=1
>> default value, and is also the maximum factor.
>>
>> Refs: commit b49960a05e32
>> ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
...
> Acked-by: Neal Cardwell <ncardwell@google.com>
Applied.