* Re: [PATCH?] tcp and delayed acks
From: Stephen Hemminger @ 2006-08-16 19:11 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

On Wed, 16 Aug 2006 16:55:32 -0400
Benjamin LaHaise <bcrl@kvack.org> wrote:

> Hello folks,
> 
> In looking at a few benchmarks (especially netperf) run locally, it seems 
> that tcp is unable to make full use of available CPU cycles as the sender 
> is throttled waiting for ACKs to arrive.  The problem is exacerbated when 
> the sender is using a small send buffer -- running netperf -C -c -- -s 1024 
> shows a miserable 420Kbit/s at essentially 0% CPU usage.  Tests over gige 
> are similarly constrained to a mere 96Mbit/s.

What ethernet hardware? The defaults are often not big enough
for full speed on gigabit hardware. I need to increase rmem/wmem to allow
for more buffering.
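
For example, by bumping the usual sysctls along these lines (values purely
illustrative, not tuned recommendations):

sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"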

> Since there is no way for the receiver to know if the sender is being 
> blocked on transmit space, would it not make sense for the receiver to 
> send out any delayed ACKs when it is clear that the receiving process is 
> waiting for more data?  The patch below attempts this (I make no guarantees 
> of its correctness with respect to the rest of the delayed ack code).  One 
> point I'm still contemplating is what to do if the receiver is waiting in 
> poll/select/epoll.

The point of delayed ACKs was to merge the response and the ACK on request/response
protocols like NFS or telnet. It does make sense to get it out sooner, though.

> [All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]
> 
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
> 
> Base (2.6.17-rc4):
> default send buffer size
> netperf -C -c
>  87380  16384  16384    10.02      14127.79   99.90    99.90    0.579   0.579 
>  87380  16384  16384    10.02      13875.28   99.90    99.90    0.590   0.590 
>  87380  16384  16384    10.01      13777.25   99.90    99.90    0.594   0.594 
>  87380  16384  16384    10.02      13796.31   99.90    99.90    0.593   0.593 
>  87380  16384  16384    10.01      13801.97   99.90    99.90    0.593   0.593 
> 
> netperf -C -c -- -s 1024
>  87380   2048   2048    10.02         0.43   -0.04    -0.04    -7.105  -7.377
>  87380   2048   2048    10.02         0.43   -0.01    -0.01    -2.337  -2.620
>  87380   2048   2048    10.02         0.43   -0.03    -0.03    -5.683  -5.940
>  87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625
>  87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625
> 
> from a remote system over gigabit ethernet
> netperf -H woody -C -c
>  87380  16384  16384    10.03       936.23   19.32    20.47    3.382   1.791 
>  87380  16384  16384    10.03       936.27   17.67    20.95    3.091   1.833 
>  87380  16384  16384    10.03       936.17   19.18    20.77    3.356   1.817 
>  87380  16384  16384    10.03       936.26   18.22    20.26    3.188   1.773 
>  87380  16384  16384    10.03       936.26   17.35    20.54    3.036   1.797 
> 
> netperf -H woody -C -c -- -s 1024
>  87380   2048   2048    10.00        95.72   10.04    6.64     17.188  5.683 
>  87380   2048   2048    10.00        95.94   9.47     6.42     16.170  5.478 
>  87380   2048   2048    10.00        96.83   9.62     5.72     16.283  4.840 
>  87380   2048   2048    10.00        95.91   9.58     6.13     16.368  5.236 
>  87380   2048   2048    10.00        95.91   9.58     6.13     16.368  5.236 
> 
> 
> Patched:
> default send buffer size
> netperf -C -c
>  87380  16384  16384    10.01      13923.16   99.90    99.90    0.588   0.588 
>  87380  16384  16384    10.01      13854.59   99.90    99.90    0.591   0.591 
>  87380  16384  16384    10.02      13840.42   99.90    99.90    0.591   0.591 
>  87380  16384  16384    10.01      13810.96   99.90    99.90    0.593   0.593 
>  87380  16384  16384    10.01      13771.27   99.90    99.90    0.594   0.594 
> 
> netperf -C -c -- -s 1024
>  87380   2048   2048    10.02      2473.48   99.90    99.90    3.309   3.309 
>  87380   2048   2048    10.02      2421.46   99.90    99.90    3.380   3.380 
>  87380   2048   2048    10.02      2288.07   99.90    99.90    3.577   3.577 
>  87380   2048   2048    10.02      2405.41   99.90    99.90    3.402   3.402 
>  87380   2048   2048    10.02      2284.41   99.90    99.90    3.582   3.582 
> 
> netperf -H woody -C -c
>  87380  16384  16384    10.04       936.10   23.04    21.60    4.033   1.890 
>  87380  16384  16384    10.03       936.20   18.52    21.06    3.242   1.843 
>  87380  16384  16384    10.03       936.52   17.61    21.05    3.082   1.841 
>  87380  16384  16384    10.03       936.18   18.24    20.73    3.191   1.814 
>  87380  16384  16384    10.03       936.28   18.30    21.04    3.202   1.841 
> 
> netperf -H woody -C -c -- -s 1024
>  87380   2048   2048    10.00       142.46   10.19    7.53     11.714  4.332 
>  87380   2048   2048    10.00       147.28   9.73     7.93     10.829  4.412 
>  87380   2048   2048    10.00       143.37   10.64    6.54     12.161  3.738 
>  87380   2048   2048    10.00       146.41   9.18     7.43     10.277  4.158 
>  87380   2048   2048    10.01       145.58   9.80     7.25     11.032  4.081 
> 
> Comments/thoughts?
> 
> 		-ben


* [PATCH?] tcp and delayed acks
From: Benjamin LaHaise @ 2006-08-16 20:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hello folks,

In looking at a few benchmarks (especially netperf) run locally, it seems 
that tcp is unable to make full use of available CPU cycles as the sender 
is throttled waiting for ACKs to arrive.  The problem is exacerbated when 
the sender is using a small send buffer -- running netperf -C -c -- -s 1024 
shows a miserable 420Kbit/s at essentially 0% CPU usage.  Tests over gige 
are similarly constrained to a mere 96Mbit/s.

Since there is no way for the receiver to know if the sender is being 
blocked on transmit space, would it not make sense for the receiver to 
send out any delayed ACKs when it is clear that the receiving process is 
waiting for more data?  The patch below attempts this (I make no guarantees 
of its correctness with respect to the rest of the delayed ack code).  One 
point I'm still contemplating is what to do if the receiver is waiting in 
poll/select/epoll.

[All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

Base (2.6.17-rc4):
default send buffer size
netperf -C -c
 87380  16384  16384    10.02      14127.79   99.90    99.90    0.579   0.579 
 87380  16384  16384    10.02      13875.28   99.90    99.90    0.590   0.590 
 87380  16384  16384    10.01      13777.25   99.90    99.90    0.594   0.594 
 87380  16384  16384    10.02      13796.31   99.90    99.90    0.593   0.593 
 87380  16384  16384    10.01      13801.97   99.90    99.90    0.593   0.593 

netperf -C -c -- -s 1024
 87380   2048   2048    10.02         0.43   -0.04    -0.04    -7.105  -7.377
 87380   2048   2048    10.02         0.43   -0.01    -0.01    -2.337  -2.620
 87380   2048   2048    10.02         0.43   -0.03    -0.03    -5.683  -5.940
 87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625
 87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625

from a remote system over gigabit ethernet
netperf -H woody -C -c
 87380  16384  16384    10.03       936.23   19.32    20.47    3.382   1.791 
 87380  16384  16384    10.03       936.27   17.67    20.95    3.091   1.833 
 87380  16384  16384    10.03       936.17   19.18    20.77    3.356   1.817 
 87380  16384  16384    10.03       936.26   18.22    20.26    3.188   1.773 
 87380  16384  16384    10.03       936.26   17.35    20.54    3.036   1.797 

netperf -H woody -C -c -- -s 1024
 87380   2048   2048    10.00        95.72   10.04    6.64     17.188  5.683 
 87380   2048   2048    10.00        95.94   9.47     6.42     16.170  5.478 
 87380   2048   2048    10.00        96.83   9.62     5.72     16.283  4.840 
 87380   2048   2048    10.00        95.91   9.58     6.13     16.368  5.236 
 87380   2048   2048    10.00        95.91   9.58     6.13     16.368  5.236 


Patched:
default send buffer size
netperf -C -c
 87380  16384  16384    10.01      13923.16   99.90    99.90    0.588   0.588 
 87380  16384  16384    10.01      13854.59   99.90    99.90    0.591   0.591 
 87380  16384  16384    10.02      13840.42   99.90    99.90    0.591   0.591 
 87380  16384  16384    10.01      13810.96   99.90    99.90    0.593   0.593 
 87380  16384  16384    10.01      13771.27   99.90    99.90    0.594   0.594 

netperf -C -c -- -s 1024
 87380   2048   2048    10.02      2473.48   99.90    99.90    3.309   3.309 
 87380   2048   2048    10.02      2421.46   99.90    99.90    3.380   3.380 
 87380   2048   2048    10.02      2288.07   99.90    99.90    3.577   3.577 
 87380   2048   2048    10.02      2405.41   99.90    99.90    3.402   3.402 
 87380   2048   2048    10.02      2284.41   99.90    99.90    3.582   3.582 

netperf -H woody -C -c
 87380  16384  16384    10.04       936.10   23.04    21.60    4.033   1.890 
 87380  16384  16384    10.03       936.20   18.52    21.06    3.242   1.843 
 87380  16384  16384    10.03       936.52   17.61    21.05    3.082   1.841 
 87380  16384  16384    10.03       936.18   18.24    20.73    3.191   1.814 
 87380  16384  16384    10.03       936.28   18.30    21.04    3.202   1.841 

netperf -H woody -C -c -- -s 1024
 87380   2048   2048    10.00       142.46   10.19    7.53     11.714  4.332 
 87380   2048   2048    10.00       147.28   9.73     7.93     10.829  4.412 
 87380   2048   2048    10.00       143.37   10.64    6.54     12.161  3.738 
 87380   2048   2048    10.00       146.41   9.18     7.43     10.277  4.158 
 87380   2048   2048    10.01       145.58   9.80     7.25     11.032  4.081 

Comments/thoughts?

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.


diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 934396b..e554ceb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1277,8 +1277,11 @@ #endif
 			/* Do not sleep, just process backlog. */
 			release_sock(sk);
 			lock_sock(sk);
-		} else
+		} else {
+			if (inet_csk_ack_scheduled(sk))
+				tcp_send_ack(sk);
 			sk_wait_data(sk, &timeo);
+		}
 
 #ifdef CONFIG_NET_DMA
 		tp->ucopy.wakeup = 0;


* Re: [PATCH?] tcp and delayed acks
From: David Miller @ 2006-08-16 21:15 UTC (permalink / raw)
  To: shemminger; +Cc: bcrl, netdev

From: Stephen Hemminger <shemminger@osdl.org>
Date: Wed, 16 Aug 2006 12:11:12 -0700

> What ethernet hardware? The defaults are often not big enough
> for full speed on gigabit hardware. I need to increase rmem/wmem to allow
> for more buffering. 

Current kernels allow the TCP send and receive socket buffers
to grow up to at least 4MB in size; how much more do you need?

tcp_{w,r}mem[2] will now have a value of at least 4MB, see
net/ipv4/tcp.c:tcp_init().
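
The limits in effect are easy to check; the third field of each file is the
autotuning maximum:

cat /proc/sys/net/ipv4/tcp_rmem    # min, default, max (tcp_rmem[2])
cat /proc/sys/net/ipv4/tcp_wmem    # min, default, max (tcp_wmem[2])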


* Re: [PATCH?] tcp and delayed acks
From: Rick Jones @ 2006-08-16 21:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Benjamin LaHaise, David S. Miller, netdev

> The point of delayed ACKs was to merge the response and the ACK on request/response
> protocols like NFS or telnet. It does make sense to get it out sooner, though.

Well, to a point at least - I wouldn't go so far as to suggest immediate 
ACKs.

However, I was always under the impression that ACKs were sent (in the 
mythical generic TCP stack) when:

a) there was data going the other way
b) there was a window update going the other way
c) the standalone ACK timer expired.

Does this patch then implement b?  Were there perhaps "holes" in the 
logic when things were smaller than the MTU/MSS?  (-v 2 on the netperf 
command line should show what the MSS was for the connection)
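
For the small-send-buffer case that would be something along the lines of:

netperf -H woody -C -c -v 2 -- -s 1024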

rick jones

BTW, many points scored for including CPU utilization and service demand 
figures with the netperf output :)

> 
> 
>>[All tests run with maxcpus=1 on a 2.67GHz Woodcrest system.]
>>
>>Recv   Send    Send                          Utilization       Service Demand
>>Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
>>Size   Size    Size     Time     Throughput  local    remote   local   remote
>>bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>>
>>Base (2.6.17-rc4):
>>default send buffer size
>>netperf -C -c
>> 87380  16384  16384    10.02      14127.79   99.90    99.90    0.579   0.579 
>> 87380  16384  16384    10.02      13875.28   99.90    99.90    0.590   0.590 
>> 87380  16384  16384    10.01      13777.25   99.90    99.90    0.594   0.594 
>> 87380  16384  16384    10.02      13796.31   99.90    99.90    0.593   0.593 
>> 87380  16384  16384    10.01      13801.97   99.90    99.90    0.593   0.593 
>>
>>netperf -C -c -- -s 1024
>> 87380   2048   2048    10.02         0.43   -0.04    -0.04    -7.105  -7.377
>> 87380   2048   2048    10.02         0.43   -0.01    -0.01    -2.337  -2.620
>> 87380   2048   2048    10.02         0.43   -0.03    -0.03    -5.683  -5.940
>> 87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625
>> 87380   2048   2048    10.02         0.43   -0.05    -0.05    -9.373  -9.625

Hmm, those CPU numbers don't look right.  I guess there must still be 
some holes in the procstat CPU method code in netperf :(




* Re: [PATCH?] tcp and delayed acks
From: Benjamin LaHaise @ 2006-08-16 21:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev

On Wed, Aug 16, 2006 at 12:11:12PM -0700, Stephen Hemminger wrote:
> > is throttled waiting for ACKs to arrive.  The problem is exacerbated when 
> > the sender is using a small send buffer -- running netperf -C -c -- -s 1024 
> > shows a miserable 420Kbit/s at essentially 0% CPU usage.  Tests over gige 
> > are similarly constrained to a mere 96Mbit/s.
> 
> What ethernet hardware? The defaults are often not big enough
> for full speed on gigabit hardware. I need to increase rmem/wmem to allow
> for more buffering. 

This is for small transmit buffer sizes over either loopback or 
e1000.  The artifact also shows up over localhost for somewhat larger buffer 
sizes, although it is much more difficult to get results that don't have 
large fluctuations because of other scheduling issues.  Pinning the tasks to 
CPUs is on my list of things to try, but something in the multiple variants 
of sched_setaffinity() has resulted in it being broken in netperf.
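
In the meantime the pinning can be done from outside netperf; a rough sketch
for the loopback runs, assuming taskset is available and CPU 0 is the one
left online by maxcpus=1:

taskset -c 0 netserver
taskset -c 0 netperf -C -c -- -s 1024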

> The point of delayed ACKs was to merge the response and the ACK on request/response
> protocols like NFS or telnet. It does make sense to get it out sooner, though.

I would like to see what sort of effect this change has on higher-latency links.  
Ideally, quick ack mode should be doing the right thing, but it might need 
more input about the receiver's intent.
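
One receiver-side hint that already exists is TCP_QUICKACK; a minimal sketch of
how a test program might use it (hypothetical helper, not part of the patch;
the option is not sticky, so it has to be re-armed around each read):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Re-arm TCP_QUICKACK before each read so any pending ACK goes out
 * immediately instead of waiting for the delayed-ACK timer. */
static ssize_t quickack_read(int fd, void *buf, size_t len)
{
        int one = 1;

        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
        return read(fd, buf, len);
}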

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.


* Re: [PATCH?] tcp and delayed acks
From: Alexey Kuznetsov @ 2006-08-16 22:39 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

Hello!

> send out any delayed ACKs when it is clear that the receiving process is 
> waiting for more data?

It is already done in tcp_cleanup_rbuf(), a few lines before your chunk.
There is a complex condition to be satisfied there, and it is
impossible to relax it any further.

I do not know what is wrong in your case; check which of the conditions
in tcp_cleanup_rbuf() was not satisfied, or just tcpdump a little.

BTW, what "buffer" do you mean? SO_SNDBUF? SO_RCVBUF? Something else?
TCP tries to tune itself to weird buffer sizes so that there can be
at least 2 segments in flight, exactly to address this problem with delayed
ACKs, but it is quite easy to confuse those heuristics.
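
The decision there is roughly of the following shape -- a paraphrased sketch
of the 2.6-era checks that run after recvmsg() has copied data out, not the
exact kernel code:

/* Paraphrased sketch; field names follow the real code, but the condition
 * is simplified. */
static int should_flush_delayed_ack(struct sock *sk, int copied)
{
        const struct inet_connection_sock *icsk = inet_csk(sk);
        struct tcp_sock *tp = tcp_sk(sk);

        if (!inet_csk_ack_scheduled(sk))
                return 0;       /* no delayed ACK pending at all */
        if (icsk->icsk_ack.blocked)
                return 1;       /* ACK was blocked on a locked socket */
        if (tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss)
                return 1;       /* more than one MSS still unacknowledged */
        if (copied > 0 &&
            (icsk->icsk_ack.pending & ICSK_ACK_PUSHED) &&
            !atomic_read(&sk->sk_rmem_alloc))
                return 1;       /* this read drained a pushed small segment */
        return 0;               /* otherwise leave the ACK delayed */
}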

Alexey

