* [PATCH?] tcp and delayed acks
@ 2006-08-16 20:55 Benjamin LaHaise
From: Benjamin LaHaise @ 2006-08-16 20:55 UTC
To: David S. Miller; +Cc: netdev
Hello folks,
In looking at a few benchmarks (especially netperf) run locally, it seems
that tcp is unable to make full use of available CPU cycles as the sender
is throttled waiting for ACKs to arrive. The problem is exacerbated when
the sender is using a small send buffer -- running netperf -C -c -- -s 1024
shows a miserable 420Kbit/s at essentially 0% CPU usage. Tests over gige
are similarly constrained to a mere 96Mbit/s.
Since there is no way for the receiver to know if the sender is being
blocked on transmit space, would it not make sense for the receiver to
send out any delayed ACKs when it is clear that the receiving process is
waiting for more data? The patch below attempts this (I make no guarantees
of its correctness with respect to the rest of the delayed ack code). One
point I'm still contemplating is what to do if the receiver is waiting in
poll/select/epoll.
[All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
Base (2.6.17-rc4):
default send buffer size
netperf -C -c
87380 16384 16384 10.02 14127.79 99.90 99.90 0.579 0.579
87380 16384 16384 10.02 13875.28 99.90 99.90 0.590 0.590
87380 16384 16384 10.01 13777.25 99.90 99.90 0.594 0.594
87380 16384 16384 10.02 13796.31 99.90 99.90 0.593 0.593
87380 16384 16384 10.01 13801.97 99.90 99.90 0.593 0.593
netperf -C -c -- -s 1024
87380 2048 2048 10.02 0.43 -0.04 -0.04 -7.105 -7.377
87380 2048 2048 10.02 0.43 -0.01 -0.01 -2.337 -2.620
87380 2048 2048 10.02 0.43 -0.03 -0.03 -5.683 -5.940
87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
from a remote system over gigabit ethernet
netperf -H woody -C -c
87380 16384 16384 10.03 936.23 19.32 20.47 3.382 1.791
87380 16384 16384 10.03 936.27 17.67 20.95 3.091 1.833
87380 16384 16384 10.03 936.17 19.18 20.77 3.356 1.817
87380 16384 16384 10.03 936.26 18.22 20.26 3.188 1.773
87380 16384 16384 10.03 936.26 17.35 20.54 3.036 1.797
netperf -H woody -C -c -- -s 1024
87380 2048 2048 10.00 95.72 10.04 6.64 17.188 5.683
87380 2048 2048 10.00 95.94 9.47 6.42 16.170 5.478
87380 2048 2048 10.00 96.83 9.62 5.72 16.283 4.840
87380 2048 2048 10.00 95.91 9.58 6.13 16.368 5.236
87380 2048 2048 10.00 95.91 9.58 6.13 16.368 5.236
Patched:
default send buffer size
netperf -C -c
87380 16384 16384 10.01 13923.16 99.90 99.90 0.588 0.588
87380 16384 16384 10.01 13854.59 99.90 99.90 0.591 0.591
87380 16384 16384 10.02 13840.42 99.90 99.90 0.591 0.591
87380 16384 16384 10.01 13810.96 99.90 99.90 0.593 0.593
87380 16384 16384 10.01 13771.27 99.90 99.90 0.594 0.594
netperf -C -c -- -s 1024
87380 2048 2048 10.02 2473.48 99.90 99.90 3.309 3.309
87380 2048 2048 10.02 2421.46 99.90 99.90 3.380 3.380
87380 2048 2048 10.02 2288.07 99.90 99.90 3.577 3.577
87380 2048 2048 10.02 2405.41 99.90 99.90 3.402 3.402
87380 2048 2048 10.02 2284.41 99.90 99.90 3.582 3.582
netperf -H woody -C -c
87380 16384 16384 10.04 936.10 23.04 21.60 4.033 1.890
87380 16384 16384 10.03 936.20 18.52 21.06 3.242 1.843
87380 16384 16384 10.03 936.52 17.61 21.05 3.082 1.841
87380 16384 16384 10.03 936.18 18.24 20.73 3.191 1.814
87380 16384 16384 10.03 936.28 18.30 21.04 3.202 1.841
netperf -H woody -C -c -- -s 1024
87380 2048 2048 10.00 142.46 10.19 7.53 11.714 4.332
87380 2048 2048 10.00 147.28 9.73 7.93 10.829 4.412
87380 2048 2048 10.00 143.37 10.64 6.54 12.161 3.738
87380 2048 2048 10.00 146.41 9.18 7.43 10.277 4.158
87380 2048 2048 10.01 145.58 9.80 7.25 11.032 4.081
Comments/thoughts?
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..e554ceb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1277,8 +1277,11 @@ #endif
 			/* Do not sleep, just process backlog. */
 			release_sock(sk);
 			lock_sock(sk);
-		} else
+		} else {
+			if (inet_csk_ack_scheduled(sk))
+				tcp_send_ack(sk);
 			sk_wait_data(sk, &timeo);
+		}
 
 #ifdef CONFIG_NET_DMA
 		tp->ucopy.wakeup = 0;
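For reference, with the patch applied the affected branch of tcp_recvmsg() reads
roughly as follows (paraphrased from the 2.6.17-era source, not a verbatim excerpt):

	if (copied >= target) {
		/* Enough data has been copied to the user:
		 * do not sleep, just process the socket backlog. */
		release_sock(sk);
		lock_sock(sk);
	} else {
		/* About to block for more data: if a delayed ACK is
		 * already scheduled, push it out now so the sender can
		 * refill its (possibly tiny) send buffer sooner. */
		if (inet_csk_ack_scheduled(sk))
			tcp_send_ack(sk);
		sk_wait_data(sk, &timeo);
	}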
* Re: [PATCH?] tcp and delayed acks
@ 2006-08-16 19:11 ` Stephen Hemminger
From: Stephen Hemminger @ 2006-08-16 19:11 UTC
To: Benjamin LaHaise; +Cc: David S. Miller, netdev
On Wed, 16 Aug 2006 16:55:32 -0400
Benjamin LaHaise <bcrl@kvack.org> wrote:
> Hello folks,
>
> In looking at a few benchmarks (especially netperf) run locally, it seems
> that tcp is unable to make full use of available CPU cycles as the sender
> is throttled waiting for ACKs to arrive. The problem is exacerbated when
> the sender is using a small send buffer -- running netperf -C -c -- -s 1024
> shows a miserable 420Kbit/s at essentially 0% CPU usage. Tests over gige
> are similarly constrained to a mere 96Mbit/s.
What ethernet hardware? The defaults are often not big enough
for full speed on gigabit hardware. I need to increase rmem/wmem to allow
for more buffering.
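For reference, a sketch of that sort of tuning (the sysctl paths are the standard
ones; the values are illustrative, not recommendations -- the three fields are
min, default and max buffer size in bytes):

#include <stdio.h>

/* Sketch only: raise the TCP autotuning limits via the usual sysctls. */
static void raise_tcp_buffer_limits(void)
{
	FILE *f;

	f = fopen("/proc/sys/net/ipv4/tcp_rmem", "w");
	if (f) { fprintf(f, "4096 87380 4194304\n"); fclose(f); }

	f = fopen("/proc/sys/net/ipv4/tcp_wmem", "w");
	if (f) { fprintf(f, "4096 65536 4194304\n"); fclose(f); }
}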
> Since there is no way for the receiver to know if the sender is being
> blocked on transmit space, would it not make sense for the receiver to
> send out any delayed ACKs when it is clear that the receiving process is
> waiting for more data? The patch below attempts this (I make no guarantees
> of its correctness with respect to the rest of the delayed ack code). One
> point I'm still contemplating is what to do if the receiver is waiting in
> poll/select/epoll.
The point of delayed ACKs was to merge the response and the ACK on request/response
protocols like NFS or telnet. It does make sense to get it out sooner, though.
> [All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]
>
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> Base (2.6.17-rc4):
> default send buffer size
> netperf -C -c
> 87380 16384 16384 10.02 14127.79 99.90 99.90 0.579 0.579
> 87380 16384 16384 10.02 13875.28 99.90 99.90 0.590 0.590
> 87380 16384 16384 10.01 13777.25 99.90 99.90 0.594 0.594
> 87380 16384 16384 10.02 13796.31 99.90 99.90 0.593 0.593
> 87380 16384 16384 10.01 13801.97 99.90 99.90 0.593 0.593
>
> netperf -C -c -- -s 1024
> 87380 2048 2048 10.02 0.43 -0.04 -0.04 -7.105 -7.377
> 87380 2048 2048 10.02 0.43 -0.01 -0.01 -2.337 -2.620
> 87380 2048 2048 10.02 0.43 -0.03 -0.03 -5.683 -5.940
> 87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
> 87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
>
> from a remote system over gigabit ethernet
> netperf -H woody -C -c
> 87380 16384 16384 10.03 936.23 19.32 20.47 3.382 1.791
> 87380 16384 16384 10.03 936.27 17.67 20.95 3.091 1.833
> 87380 16384 16384 10.03 936.17 19.18 20.77 3.356 1.817
> 87380 16384 16384 10.03 936.26 18.22 20.26 3.188 1.773
> 87380 16384 16384 10.03 936.26 17.35 20.54 3.036 1.797
>
> netperf -H woody -C -c -- -s 1024
> 87380 2048 2048 10.00 95.72 10.04 6.64 17.188 5.683
> 87380 2048 2048 10.00 95.94 9.47 6.42 16.170 5.478
> 87380 2048 2048 10.00 96.83 9.62 5.72 16.283 4.840
> 87380 2048 2048 10.00 95.91 9.58 6.13 16.368 5.236
> 87380 2048 2048 10.00 95.91 9.58 6.13 16.368 5.236
>
>
> Patched:
> default send buffer size
> netperf -C -c
> 87380 16384 16384 10.01 13923.16 99.90 99.90 0.588 0.588
> 87380 16384 16384 10.01 13854.59 99.90 99.90 0.591 0.591
> 87380 16384 16384 10.02 13840.42 99.90 99.90 0.591 0.591
> 87380 16384 16384 10.01 13810.96 99.90 99.90 0.593 0.593
> 87380 16384 16384 10.01 13771.27 99.90 99.90 0.594 0.594
>
> netperf -C -c -- -s 1024
> 87380 2048 2048 10.02 2473.48 99.90 99.90 3.309 3.309
> 87380 2048 2048 10.02 2421.46 99.90 99.90 3.380 3.380
> 87380 2048 2048 10.02 2288.07 99.90 99.90 3.577 3.577
> 87380 2048 2048 10.02 2405.41 99.90 99.90 3.402 3.402
> 87380 2048 2048 10.02 2284.41 99.90 99.90 3.582 3.582
>
> netperf -H woody -C -c
> 87380 16384 16384 10.04 936.10 23.04 21.60 4.033 1.890
> 87380 16384 16384 10.03 936.20 18.52 21.06 3.242 1.843
> 87380 16384 16384 10.03 936.52 17.61 21.05 3.082 1.841
> 87380 16384 16384 10.03 936.18 18.24 20.73 3.191 1.814
> 87380 16384 16384 10.03 936.28 18.30 21.04 3.202 1.841
>
> netperf -H woody -C -c -- -s 1024
> 87380 2048 2048 10.00 142.46 10.19 7.53 11.714 4.332
> 87380 2048 2048 10.00 147.28 9.73 7.93 10.829 4.412
> 87380 2048 2048 10.00 143.37 10.64 6.54 12.161 3.738
> 87380 2048 2048 10.00 146.41 9.18 7.43 10.277 4.158
> 87380 2048 2048 10.01 145.58 9.80 7.25 11.032 4.081
>
> Comments/thoughts?
>
> -ben
* Re: [PATCH?] tcp and delayed acks
@ 2006-08-16 21:15 ` David Miller
From: David Miller @ 2006-08-16 21:15 UTC
To: shemminger; +Cc: bcrl, netdev
From: Stephen Hemminger <shemminger@osdl.org>
Date: Wed, 16 Aug 2006 12:11:12 -0700
> What ethernet hardware? The defaults are often not big enough
> for full speed on gigabit hardware. I need to increase rmem/wmem to allow
> for more buffering.
Current kernels allow the TCP send and receive socket buffers
to grow up to at least 4MB in size; how much more do you need?
tcp_{w,r}mem[2] will now have a value of at least 4MB; see
net/ipv4/tcp.c:tcp_init().
* Re: [PATCH?] tcp and delayed acks
@ 2006-08-16 21:37 ` Rick Jones
From: Rick Jones @ 2006-08-16 21:37 UTC
To: Stephen Hemminger; +Cc: Benjamin LaHaise, David S. Miller, netdev
> The point of delayed ACKs was to merge the response and the ACK on request/response
> protocols like NFS or telnet. It does make sense to get it out sooner, though.
Well, to a point at least - I wouldn't go so far as to suggest immediate
ACKs.
However, I was always under the impression that ACKs were sent (in the
mythical generic TCP stack) when:
a) there was data going the other way
b) there was a window update going the other way
c) the standalone ACK timer expired.
Does this patch then implement b? Were there perhaps "holes" in the
logic when things were smaller than the MTU/MSS? (-v 2 on the netperf
command line should show what the MSS was for the connection)
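In other words, something like this hypothetical sketch (field names invented
for illustration; this is not the Linux implementation):

/* Receiver-side ACK decision mirroring (a)-(c) above. */
struct rx_ack_state {
	int data_queued_to_send;    /* (a) reply data ready to carry the ACK */
	int window_update_pending;  /* (b) receive window opened enough to advertise */
	int delack_timer_expired;   /* (c) standalone delayed-ACK timer fired */
};

static int should_ack_now(const struct rx_ack_state *s)
{
	return s->data_queued_to_send ||
	       s->window_update_pending ||
	       s->delack_timer_expired;
}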
rick jones
BTW, many points scored for including CPU utilization and service demand
figures with the netperf output :)
>
>
>>[All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]
>>
>>Recv Send Send Utilization Service Demand
>>Socket Socket Message Elapsed Send Recv Send Recv
>>Size Size Size Time Throughput local remote local remote
>>bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>>
>>Base (2.6.17-rc4):
>>default send buffer size
>>netperf -C -c
>> 87380 16384 16384 10.02 14127.79 99.90 99.90 0.579 0.579
>> 87380 16384 16384 10.02 13875.28 99.90 99.90 0.590 0.590
>> 87380 16384 16384 10.01 13777.25 99.90 99.90 0.594 0.594
>> 87380 16384 16384 10.02 13796.31 99.90 99.90 0.593 0.593
>> 87380 16384 16384 10.01 13801.97 99.90 99.90 0.593 0.593
>>
>>netperf -C -c -- -s 1024
>> 87380 2048 2048 10.02 0.43 -0.04 -0.04 -7.105 -7.377
>> 87380 2048 2048 10.02 0.43 -0.01 -0.01 -2.337 -2.620
>> 87380 2048 2048 10.02 0.43 -0.03 -0.03 -5.683 -5.940
>> 87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
>> 87380 2048 2048 10.02 0.43 -0.05 -0.05 -9.373 -9.625
Hmm, those CPU numbers don't look right. I guess there must still be
some holes in the procstat CPU method code in netperf :(
* Re: [PATCH?] tcp and delayed acks
@ 2006-08-16 21:41 ` Benjamin LaHaise
From: Benjamin LaHaise @ 2006-08-16 21:41 UTC
To: Stephen Hemminger; +Cc: David S. Miller, netdev
On Wed, Aug 16, 2006 at 12:11:12PM -0700, Stephen Hemminger wrote:
> > is throttled waiting for ACKs to arrive. The problem is exacerbated when
> > the sender is using a small send buffer -- running netperf -C -c -- -s 1024
> > shows a miserable 420Kbit/s at essentially 0% CPU usage. Tests over gige
> > are similarly constrained to a mere 96Mbit/s.
>
> What ethernet hardware? The defaults are often not big enough
> for full speed on gigabit hardware. I need to increase rmem/wmem to allow
> for more buffering.
This is for small transmit buffer sizes over either loopback or
e1000. The artifact also shows up over localhost for somewhat larger buffer
sizes, although it is much more difficult to get results that don't have
large fluctuations because of other scheduling issues. Pinning the tasks to
CPUs is on my list of things to try, but something in the multiple variants
of sched_setaffinity() has resulted in it being broken in netperf.
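(A minimal sketch of the pinning I mean, using the glibc sched_setaffinity()
interface; CPU 0 is chosen arbitrarily:)

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling task to CPU 0; returns 0 on success. */
static int pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}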
> The point of delayed ACKs was to merge the response and the ACK on request/response
> protocols like NFS or telnet. It does make sense to get it out sooner, though.
I would like to see what sort of effect this change has on higher-latency links.
Ideally, quick ack mode should be doing the right thing, but it might need
more input about the receiver's intent.
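For experiments, the receiver can also be nudged from userspace; a minimal
sketch, assuming a connected TCP socket fd (note the option is not sticky --
the stack may fall back to delaying ACKs again later):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the stack to ACK immediately instead of delaying. */
static int set_quickack(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}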
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.
* Re: [PATCH?] tcp and delayed acks
@ 2006-08-16 22:39 ` Alexey Kuznetsov
From: Alexey Kuznetsov @ 2006-08-16 22:39 UTC
To: Benjamin LaHaise; +Cc: David S. Miller, netdev
Hello!
> send out any delayed ACKs when it is clear that the receiving process is
> waiting for more data?
It has already been done in tcp_cleanup_rbuf(), a few lines before your chunk.
There is a somewhat complex condition to be satisfied there, and it is
impossible to relax it any further.
I do not know what is wrong in your case; check which of the conditions
in tcp_cleanup_rbuf() was not satisfied, or just tcpdump a little.
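Roughly, from memory, the test amounts to something like the sketch below
(a paraphrase with invented field names, not the kernel source; check the
real tcp_cleanup_rbuf() for the exact condition):

/* Paraphrase of the "send an ACK now after this read?" decision. */
struct rbuf_state {
	int ack_scheduled;       /* a delayed ACK is pending */
	int ack_was_blocked;     /* the delack attempt hit a locked socket */
	unsigned int unacked;    /* bytes received but not yet ACKed */
	unsigned int rcv_mss;    /* receiver-side MSS estimate */
	int read_drained_queue;  /* this recvmsg() emptied the receive queue */
	int pingpong;            /* interactive request/response mode */
	int small_seg_pushed;    /* a sub-MSS segment was queued with PSH set */
};

static int time_to_ack(const struct rbuf_state *s)
{
	if (!s->ack_scheduled)
		return 0;
	return s->ack_was_blocked ||
	       s->unacked > s->rcv_mss ||
	       (s->read_drained_queue && !s->pingpong && s->small_seg_pushed);
}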
BTW, what "buffer" do you mean? SO_SNDBUF? SO_RCVBUF? Something else?
TCP tries to tune itself to weird buffer sizes so that it can keep
at least 2 segments in flight, exactly to address the problem with delayed
ACKs, but it is quite easy to confuse those heuristics.
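For what it's worth, here is a sketch of what netperf's -s 1024 does to its
data socket, and why the tables above report a 2048-byte send socket size:
the kernel doubles the value passed to SO_SNDBUF to cover bookkeeping
overhead (see socket(7)):

#include <stdio.h>
#include <sys/socket.h>

/* Request a 1024-byte send buffer and read back the effective size. */
static void shrink_sndbuf(int fd)
{
	int requested = 1024, effective = 0;
	socklen_t len = sizeof(effective);

	setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested));
	getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len);
	printf("requested %d, effective %d\n", requested, effective); /* typically 2048 */
}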
Alexey