* [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Andy Lutomirski @ 2011-10-26 2:25 UTC
To: netdev; +Cc: Andy Lutomirski
When talking to an unfixable interactive peer that fails to set
TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
This is an evil thing to do, but if the entire network is private,
it's not that evil.
This works around a problem with the remote *application*, so make
it a socket option instead of a sysctl or a per-route option.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
This patch is a bit embarrassing. We talk to remote applications over
TCP that are very much interactive but don't set TCP_NODELAY. These
applications apparently cannot be fixed. As a partial workaround, if we
ACK every incoming segment, then as long as they don't transmit two
segments per rtt, we do pretty well.
Windows can do something similar, but it's per interface instead of per
socket:
http://support.microsoft.com/kb/328890
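For reference, enabling the option from userspace would look something
like this (a sketch against this patch only; the constant is defined by
hand because no released kernel header carries it):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

#ifndef TCP_NO_DELAYED_ACK
#define TCP_NO_DELAYED_ACK 19  /* value proposed by this patch */
#endif

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;

    /* ACK every incoming segment on this socket instead of delaying */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NO_DELAYED_ACK,
                   &one, sizeof(one)) < 0)
        perror("setsockopt(TCP_NO_DELAYED_ACK)"); /* unpatched kernel */
    return 0;
}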
include/linux/tcp.h | 1 +
include/net/inet_connection_sock.h | 3 ++-
net/ipv4/tcp.c | 11 +++++++++++
net/ipv4/tcp_input.c | 3 ++-
4 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 531ede8..2116f31 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -106,6 +106,7 @@ enum {
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams*/
#define TCP_THIN_DUPACK 17 /* Fast retrans. after 1 dupack */
#define TCP_USER_TIMEOUT 18 /* How long for loss retry before timeout */
+#define TCP_NO_DELAYED_ACK 19 /* Do not delay ACKs. */
/* for TCP_INFO socket option */
#define TCPI_OPT_TIMESTAMPS 1
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index e6db62e..1ad91bf 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -106,8 +106,9 @@ struct inet_connection_sock {
struct {
__u8 pending; /* ACK is pending */
__u8 quick; /* Scheduled number of quick acks */
- __u8 pingpong; /* The session is interactive */
__u8 blocked; /* Delayed ACK was blocked by socket lock */
+ __u8 pingpong:1; /* The session is interactive */
+ __u8 nodelack:1; /* Delayed ACKs are disabled */
__u32 ato; /* Predicted tick of soft clock */
unsigned long timeout; /* Currently scheduled timeout */
__u32 lrcvtime; /* timestamp of last received data packet */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46febca..e8e98dc 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2385,6 +2385,13 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
}
break;
+ case TCP_NO_DELAYED_ACK:
+ if (val == 0 || val == 1)
+ icsk->icsk_ack.nodelack = !!val;
+ else
+ err = -EINVAL;
+ break;
+
#ifdef CONFIG_TCP_MD5SIG
case TCP_MD5SIG:
/* Read the IP->Key mappings from userspace */
@@ -2564,6 +2571,10 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
val = !icsk->icsk_ack.pingpong;
break;
+ case TCP_NO_DELAYED_ACK:
+ val = icsk->icsk_ack.nodelack;
+ break;
+
case TCP_CONGESTION:
if (get_user(len, optlen))
return -EFAULT;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 21fab3e..e7d7ee0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -197,7 +197,8 @@ static void tcp_enter_quickack_mode(struct sock *sk)
static inline int tcp_in_quickack_mode(const struct sock *sk)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
- return icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong;
+ return (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong) ||
+ icsk->icsk_ack.nodelack;
}
static inline void TCP_ECN_queue_cwr(struct tcp_sock *tp)
--
1.7.6.4
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Rick Jones @ 2011-10-26 17:56 UTC
To: Andy Lutomirski; +Cc: netdev
On 10/25/2011 07:25 PM, Andy Lutomirski wrote:
> When talking to an unfixable interactive peer that fails to set
> TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
> This is an evil thing to do, but if the entire network is private,
> it's not that evil.
>
> This works around a problem with the remote *application*, so make
> it a socket option instead of a sysctl or a per-route option.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
> This patch is a bit embarrassing. We talk to remote applications over
> TCP that are very much interactive but don't set TCP_NODELAY. These
> applications apparently cannot be fixed. As a partial workaround, if we
> ACK every incoming segment, then as long as they don't transmit two
> segments per rtt, we do pretty well.
Embarrassing/evil indeed - is it really something to go into the kernel?
If the networks where this happens are indeed truly private, can they
run a private kernel? Or use an LD_PRELOAD hack to wedge-in a
setsockopt(TCP_NODELAY) call into the application? Or set something
like tcp_naglim_def on the application system(s)? Or have the server
application make a setsockopt(TCP_MAXSEG) call before listen() to a
value one byte below that of what the application is sending?
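(For concreteness, such an LD_PRELOAD shim can be tiny - an untested
sketch, assuming a glibc-style dlsym(RTLD_NEXT) interposer over
connect():)

/* nodelay_preload.c: force TCP_NODELAY on every connect()ed socket.
 * Build: gcc -shared -fPIC -o nodelay_preload.so nodelay_preload.c -ldl
 * Run:   LD_PRELOAD=./nodelay_preload.so ./stubborn_app
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    static int (*real_connect)(int, const struct sockaddr *, socklen_t);
    int one = 1;

    if (!real_connect)
        real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
                       dlsym(RTLD_NEXT, "connect");

    /* harmless on non-TCP sockets; the setsockopt simply fails */
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    return real_connect(fd, addr, len);
}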
Is the application actually "virtuous" in sending logically associated
data in one "send" call, and simply running afoul of Nagle+DelayedACK in
having multiple distinct requests outstanding at once, or is it actually
quite evil in that it is sending logically associated data in separate
send calls?
rick jones
choir preaching follows:
raj@tardy:~$ cat usenet_replies/nagle_algorithm
> I'm not familiar with this issue, and I'm mostly ignorant about what
> tcp does below the sockets interface. Can anybody briefly explain what
> "nagle" is, and how and when to turn it off? Or point me to the
> appropriate manual.
In broad terms, whenever an application does a send() call, the logic
of the Nagle algorithm is supposed to go something like this:
1) Is the quantity of data in this send, plus any queued, unsent data,
greater than the MSS (Maximum Segment Size) for this connection? If
yes, send the data in the user's send now (modulo any other
constraints such as receiver's advertised window and the TCP
congestion window). If no, go to 2.
2) Is the connection to the remote otherwise idle? That is, is there
no unACKed data outstanding on the network? If yes, send the data
in the user's send now. If no, queue the data and wait. Either the
application will continue to call send() with enough data to get to
a full MSS-worth of data, or the remote will ACK all the currently
sent, unACKed data, or our retransmission timer will expire.
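Or, as a toy model in code (illustration only, not any stack's actual
implementation):

/* nagle_model.c: toy model of rules 1 and 2 above */
#include <stdbool.h>
#include <stdio.h>

struct conn {
    size_t queued_unsent;  /* bytes queued but not yet transmitted */
    size_t unacked;        /* bytes sent but not yet ACKed */
    size_t mss;            /* maximum segment size */
};

static bool nagle_send_now(const struct conn *c, size_t send_len)
{
    /* rule 1: a full MSS worth (including queued data) goes out now */
    if (send_len + c->queued_unsent >= c->mss)
        return true;
    /* rule 2: a small send goes out only on an otherwise-idle conn */
    if (c->unacked == 0)
        return true;
    /* otherwise queue and wait: more data, an ACK, or the rtx timer */
    return false;
}

int main(void)
{
    struct conn c = { 0, 0, 1460 };

    printf("1st small write, idle conn: %d\n", nagle_send_now(&c, 100));
    c.unacked = 100;           /* the first write is now in flight */
    printf("2nd small write:            %d\n", nagle_send_now(&c, 100));
    return 0;
}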
Now, where applications run into trouble is when they have what might
be described as "write, write, read" behaviour, where they present
logically associated data to the transport in separate 'send' calls
and those sends are typically less than the MSS for the connection.
It isn't so much that they run afoul of Nagle as they run into issues
with the interaction of Nagle and the other heuristics operating on
the remote. In particular, the delayed ACK heuristics.
When a receiving TCP is deciding whether or not to send an ACK back to
the sender, in broad handwaving terms it goes through logic similar to
this:
a) is there data being sent back to the sender? if yes, piggy-back the
ACK on the data segment.
b) is there a window update being sent back to the sender? if yes,
piggy-back the ACK on the window update.
c) has the standalone ACK timer expired?
Window updates are generally triggered by the following heuristics:
i) would the window update be for a non-trivial fraction of the window
- typically somewhere at or above 1/4 the window, that is, has the
application "consumed" at least that much data? if yes, send a
window update. if no, check ii.
ii) would the window update be for at least 2*MSS worth of data -
that is, has the application "consumed" at least 2*MSS? if yes,
send a window update. if no, wait.
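The receiver-side checks, again as a toy model (pure illustration;
real stacks track much more state):

/* delack_model.c: toy model of heuristics a/b/c and i/ii above */
#include <stdbool.h>
#include <stdio.h>

struct rcvr {
    size_t window;       /* advertised receive window */
    size_t consumed;     /* data the app has read since the last update */
    size_t mss;
    bool   reply_queued; /* app has data to send back (heuristic a) */
};

/* heuristics i and ii: is a window update due? */
static bool window_update_due(const struct rcvr *r)
{
    if (r->consumed >= r->window / 4)  /* i: >= 1/4 of the window */
        return true;
    if (r->consumed >= 2 * r->mss)     /* ii: >= 2*MSS consumed */
        return true;
    return false;
}

/* heuristics a and b; if neither fires, the ACK waits for the
 * standalone ACK timer (heuristic c) */
static bool ack_now(const struct rcvr *r)
{
    if (r->reply_queued)       /* a: piggy-back on reply data */
        return true;
    if (window_update_due(r))  /* b: piggy-back on a window update */
        return true;
    return false;
}

int main(void)
{
    struct rcvr r = { 65535, 512, 1460, false };

    printf("small segment, no reply: ack now? %d\n", ack_now(&r));
    r.reply_queued = true;
    printf("reply queued:            ack now? %d\n", ack_now(&r));
    return 0;
}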
Now, going back to that write, write, read application, on the sending
side, the first write will be transmitted by TCP via nagle rule 2 -
the connection is otherwise idle. However, the second small send will
be delayed as there is at that point unACKnowledged data outstanding
on the connection.
At the receiver, that small TCP segment will arrive and will be passed
to the application. The application does not have the entire app-level
message, so it will not send a reply (data to TCP) back. The typical
TCP window is much much larger than the MSS, so no window update would
be triggered by heuristic i. The data just arrived and consumed by the
application is < 2*MSS, so no window update from heuristic ii. Since
there is no window update, no ACK is sent by heuristic b.
So, that leaves heuristic c - the standalone ACK timer. That ranges
anywhere between 50 and 200 milliseconds depending on the TCP stack in
use.
If you've read this far :) now we can take a look at the effect of
various things touted as "fixes" to applications experiencing this
interaction. We take as our example a client-server application where
both the client and the server are implemented with a write of a small
application header, followed by application data. First, the
"default" case which is with Nagle enabled (TCP_NODELAY _NOT_ set) and
with standard ACK behaviour:
Client Server
Req Header ->
<- Standalone ACK after Nms
Req Data ->
<- Possible standalone ACK
<- Rsp Header
Standalone ACK ->
<- Rsp Data
Possible standalone ACK ->
For two "messages" we end-up with at least six segments on the wire.
The possible standalone ACKs will depend on whether the server's
response time, or client's think time is longer than the standalone
ACK interval on their respective sides. Now, if TCP_NODELAY is set we
see:
Client Server
Req Header ->
Req Data ->
<- Possible Standalone ACK after Nms
<- Rsp Header
<- Rsp Data
Possible Standalone ACK ->
In theory, we are down to four segments on the wire, which seems good,
but frankly we can do better. First though, consider what happens
when someone disables delayed ACKs:
Client Server
Req Header ->
<- Immediate Standalone ACK
Req Data ->
<- Immediate Standalone ACK
<- Rsp Header
Immediate Standalone ACK ->
<- Rsp Data
Immediate Standalone ACK ->
Now we definitely see 8 segments on the wire. It will also be that way
if both TCP_NODELAY is set and delayed ACKs are disabled.
How about if the application did the "right" thing in the first place?
That is, sent the logically associated data at the same time:
Client Server
Request ->
<- Possible Standalone ACK
<- Response
Possible Standalone ACK ->
We are down to two segments on the wire.
For "small" packets, the CPU cost is about the same regardless of data
or ACK. This means that the application which is making the proper
gathering send call will spend far fewer CPU cycles in the networking
stack.
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Andy Lutomirski @ 2011-10-26 19:35 UTC
To: Rick Jones; +Cc: netdev
On Wed, Oct 26, 2011 at 10:56 AM, Rick Jones <rick.jones2@hp.com> wrote:
> On 10/25/2011 07:25 PM, Andy Lutomirski wrote:
>>
>> When talking to an unfixable interactive peer that fails to set
>> TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
>> This is an evil thing to do, but if the entire network is private,
>> it's not that evil.
>>
>> This works around a problem with the remote *application*, so make
>> it a socket option instead of a sysctl or a per-route option.
>>
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>> ---
>>
>> This patch is a bit embarrassing. We talk to remote applications over
>> TCP that are very much interactive but don't set TCP_NODELAY. These
>> applications apparently cannot be fixed. As a partial workaround, if we
>> ACK every incoming segment, then as long as they don't transmit two
>> segments per rtt, we do pretty well.
>
> Embarrassing/evil indeed - is it really something to go into the kernel?
That's a good question. It's in our kernel -- I don't know whether it
should go upstream.
>
> If the networks where this happens are indeed truly private, can they run a
> private kernel? Or use an LD_PRELOAD hack to wedge-in a
> setsockopt(TCP_NODELAY) call into the application? Or set something like
> tcp_naglim_def on the application system(s)? Or have the server application
> make a setsockopt(TCP_MAXSEG) call before listen() to a value one byte below
> that of what the application is sending?
We control our server. We don't control the server at the other end.
We've tried to get them to do any of the above, but they seem
unwilling or unable to do it. I suspect that they're using various
pieces from various third-party vendors that just don't care.
>
> Is the application actually "virtuous" in sending logically associated data
> in one "send" call, and simply running afoul of Nagle+DelayedACK in having
> multiple distinct requests outstanding at once, or is it actually quite evil
> in that it is sending logically associated data in separate send calls?
>
The remote application generates messages meant for us, and they
appear to send each message in its own segment. I don't have the
source, so I don't know whether they're really using one send call per
message or whether they're using MSG_MORE, TCP_CORK, or some other
mechanism. Each message is time-sensitive and should be received as
soon as possible after it's sent (i.e. one-half rtt). Unfortunately,
when they send two messages and we don't ack the first one, the second
gets delayed. Turning off delayed acks helps but does not completely
solve the problem.
> rick jones
>
> choir preaching follows:
:) I agree. Unfortunately I didn't write all this stuff.
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Rick Jones @ 2011-10-26 20:06 UTC
To: Andy Lutomirski; +Cc: netdev
>> If the networks where this happens are indeed truly private, can they run a
>> private kernel? Or use an LD_PRELOAD hack to wedge-in a
>> setsockopt(TCP_NODELAY) call into the application? Or set something like
>> tcp_naglim_def on the application system(s)? Or have the server application
>> make a setsockopt(TCP_MAXSEG) call before listen() to a value one byte below
>> that of what the application is sending?
>
> We control our server. We don't control the server at the other end.
> We've tried to get them to do any of the above, but they seem
> unwilling or unable to do it. I suspect that they're using various
> pieces from various third-party vendors that just don't care.
Making the setsockopt(TCP_MAXSEG) call would be at your end :) Presumably
based on the minimum message size. That would cause the connection to
have an MSS == the request size so every request send should take the
"is this send plus any queued unsent data >= MSS" path.
Another "at your end" possibility would be setting a rather small
SO_RCVBUF size at your end before calling listen(), in hopes of
triggering the window update.
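A sketch of both knobs on the listening socket (MIN_MSG_SIZE here is a
stand-in for your real minimum message size, and error checking is
omitted):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#define MIN_MSG_SIZE 128           /* assumption: your smallest message */

int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int mss = MIN_MSG_SIZE - 1;    /* one byte below the message size */
    int rcvbuf = 4 * MIN_MSG_SIZE; /* deliberately small window */
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    sa.sin_port = htons(port);

    /* set both before listen() so accepted sockets inherit them */
    setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    bind(fd, (struct sockaddr *)&sa, sizeof(sa));
    listen(fd, 16);
    return fd;
}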
>> Is the application actually "virtuous" in sending logically associated data
>> in one "send" call, and simply running afoul of Nagle+DelayedACK in having
>> multiple distinct requests outstanding at once, or is it actually quite evil
>> in that it is sending logically associated data in separate send calls?
>>
>
> The remote application generates messages meant for us, and they
> appear to send each message in its own segment. I don't have the
> source, so I don't know whether they're really using one send call per
> message or whether they're using MSG_MORE, TCP_CORK, or some other
> mechanism. Each message is time-sensitive and should be received as
> soon as possible after it's sent (i.e. one-half rtt). Unfortunately,
> when they send two messages and we don't ack the first one, the second
> gets delayed. Turning off delayed acks helps but does not completely
> solve the problem.
If it is write, write, read (multiple sends per logical message), in a
packet trace you should see a partial request in the first segment,
followed by the rest of the request (and perhaps the second through
Nth) in the second segment. Or, I suppose your server application would
have a receive complete with the first part of the first request,
getting the second part of the request in a subsequent receive call.
If it is multiple requests at a time each sent in one send call, you
should see a first segment arriving with a complete request within it,
followed by a second segment with the next request(s).
rick jones
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Andy Lutomirski @ 2011-10-27 5:35 UTC
To: Rick Jones; +Cc: netdev
On Wed, Oct 26, 2011 at 1:06 PM, Rick Jones <rick.jones2@hp.com> wrote:
>>> If the networks where this happens are indeed truly private, can they run
>>> a
>>> private kernel? Or use an LD_PRELOAD hack to wedge-in a
>>> setsockopt(TCP_NODELAY) call into the application? Or set something like
>>> tcp_naglim_def on the application system(s)? Or have the server
>>> application
>>> make a setsockopt(TCP_MAXSEG) call before listen() to a value one byte
>>> below
>>> that of what the application is sending?
>>
>> We control our server. We don't control the server at the other end.
>> We've tried to get them to do any of the above, but they seem
>> unwilling or unable to do it. I suspect that they're using various
>> pieces from various third-party vendors that just don't care.
>
> Making the setsockopt(TCP_MAXSEG) call would be at your end :) Presumably based
> on the minimum message size. That would cause the connection to have an MSS
> == the request size so every request send should take the "is this send plus
> any queued unsent data >= MSS" path.
That's cute. The messages are variable-size (but they don't vary
much), so doing this would probably be worse for the network than
having them set TCP_NODELAY or having us turn off delayed acks, but we
don't really care about the network, and it might work well.
>
> Another "at your end" possibility would be setting a rather small SO_RCVBUF
> size at your end before calling listen(), in hopes of triggering the window
> update.
That scares me. If they ever start sending in bursts (it happens on
occasion), then we lose if they want to exceed an artificially
small window.
>
>>> Is the application actually "virtuous" in sending logically associated
>>> data
>>> in one "send" call, and simply running afoul of Nagle+DelayedACK in
>>> having
>>> multiple distinct requests outstanding at once, or is it actually quite
>>> evil
>>> in that it is sending logically associated data in separate send calls?
>>>
>>
>> The remote application generates messages meant for us, and they
>> appear to send each message in its own segment. I don't have the
>> source, so I don't know whether they're really using one send call per
>> message or whether they're using MSG_MORE, TCP_CORK, or some other
>> mechanism. Each message is time-sensitive and should be received as
>> soon as possible after it's sent (i.e. one-half rtt). Unfortunately,
>> when they send two messages and we don't ack the first one, the second
>> gets delayed. Turning off delayed acks helps but does not completely
>> solve the problem.
>
> If it is write,write,read (multiple sends per logical message) in a packet
> trace you should see a partial request in the first segment, followed by the
> rest of the request (and perhaps the second through Nth) in the second
> segment. Or, I suppose your server application would have a receive
> complete with the first part of the first request, getting the second part
> of the request in a subsequent receive call.
>
> If it is multiple requests at a time each sent in one send call, you should
> see a first segment arriving with a complete request within it, followed by
> a second segment with the next request(s).
These are asynchronous messages and we don't reply to the vast
majority of them. We see one request arriving per segment.
I'll play with TCP_MAXSEG. But I'll probably leave TCP_NO_DELAYED_ACK
patched into my kernel for the time being. I'm not thrilled about
forcing the other side to split their messages across multiple
segments.
--Andy
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Eric Dumazet @ 2011-10-27 10:24 UTC
To: Andy Lutomirski; +Cc: netdev
On Tuesday, 25 October 2011 at 19:25 -0700, Andy Lutomirski wrote:
> When talking to an unfixable interactive peer that fails to set
> TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
> This is an evil thing to do, but if the entire network is private,
> it's not that evil.
>
> This works around a problem with the remote *application*, so make
> it a socket option instead of a sysctl or a per-route option.
>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>
> This patch is a bit embarrassing. We talk to remote applications over
> TCP that are very much interactive but don't set TCP_NODELAY. These
> applications apparently cannot be fixed. As a partial workaround, if we
> ACK every incoming segment, then as long as they don't transmit two
> segments per rtt, we do pretty well.
>
> Windows can do something similar, but it's per interface instead of per
> socket:
>
> http://support.microsoft.com/kb/328890
Hi Andy
Yet another delayed ack hacking proposal :)
Well, to be honest, I find the MS Windows tunable more generic.
[ But doing it for a whole interface is wrong, it should be per socket
to allow best tuning ]
Setting the value to 4 (instead of the default 2), for example, would
_reduce_ the number of ACK packets in bulk transfers [ We can do that
if GRO is on, as a side effect ]
Also the 40ms/200ms values (TCP_DELACK_{MIN|MAX}) could be tunables
(system-wide or per socket).
RFC 1122 says it SHOULD be less than 500ms. The time criterion is IMHO
far more palatable for an application author than "number of delayed
acks".
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Daniel Baluta @ 2011-10-27 11:54 UTC
To: Eric Dumazet; +Cc: Andy Lutomirski, netdev
On Thu, Oct 27, 2011 at 1:24 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 25 October 2011 at 19:25 -0700, Andy Lutomirski wrote:
>> When talking to an unfixable interactive peer that fails to set
>> TCP_NODELAY, disabling delayed ACKs can help mitigate the problem.
>> This is an evil thing to do, but if the entire network is private,
>> it's not that evil.
>>
>> This works around a problem with the remote *application*, so make
>> it a socket option instead of a sysctl or a per-route option.
>>
>> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
>> ---
>>
>> This patch is a bit embarrassing. We talk to remote applications over
>> TCP that are very much interactive but don't set TCP_NODELAY. These
>> applications apparently cannot be fixed. As a partial workaround, if we
>> ACK every incoming segment, then as long as they don't transmit two
>> segments per rtt, we do pretty well.
>>
>> Windows can do something similar, but it's per interface instead of per
>> socket:
>>
>> http://support.microsoft.com/kb/328890
>
> Hi Andy
>
> Yet another delayed ack hacking proposal :)
>
> Well, to be honest, I find the MS Windows tunable more generic.
> [ But doing it for a whole interface is wrong, it should be per socket
> to allow best tuning ]
>
> Setting the value to 4 (instead of the default 2), for example, would
> _reduce_ the number of ACK packets in bulk transfers [ We can do that
> if GRO is on, as a side effect ]
>
> Also the 40ms/200ms values (TCP_DELACK_{MIN|MAX}) could be tunables
> (system-wide or per socket).
> RFC 1122 says it SHOULD be less than 500ms. The time criterion is IMHO
> far more palatable for an application author than "number of delayed
> acks".
Hello Eric,
A few days ago, in our custom kernel, we made the TCP delayed-ACK
segment count and timeout parameters tunable via proc entries.
By increasing tcp_delack_segs (the number of full-sized segments that
must be received before an ACK is sent) we observed throughput
improvements of up to 20% in some test cases.
Do you think this kind of patch would have a chance of being
included in mainline?
thanks,
Daniel.
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Eric Dumazet @ 2011-10-27 12:13 UTC
To: Daniel Baluta; +Cc: Andy Lutomirski, netdev
On Thursday, 27 October 2011 at 14:54 +0300, Daniel Baluta wrote:
> A few days ago, in our custom kernel, we made the TCP delayed-ACK
> segment count and timeout parameters tunable via proc entries.
> By increasing tcp_delack_segs (the number of full-sized segments that
> must be received before an ACK is sent) we observed throughput
> improvements of up to 20% in some test cases.
>
> Do you think this kind of patch would have a chance of being
> included in mainline?
>
If your patches are ready, why not send them as an RFC?
* Re: [PATCH] Add TCP_NO_DELAYED_ACK socket option
From: Daniel Baluta @ 2011-10-27 12:18 UTC
To: Eric Dumazet; +Cc: Andy Lutomirski, netdev
On Thu, Oct 27, 2011 at 3:13 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, 27 October 2011 at 14:54 +0300, Daniel Baluta wrote:
>
>> A few days ago, in our custom kernel, we made the TCP delayed-ACK
>> segment count and timeout parameters tunable via proc entries.
>> By increasing tcp_delack_segs (the number of full-sized segments that
>> must be received before an ACK is sent) we observed throughput
>> improvements of up to 20% in some test cases.
>>
>> Do you think this kind of patch would have a chance of being
>> included in mainline?
>>
>
> If your patches are ready, why not send them as an RFC?
OK, I will port them to the latest kernel and send a patch.
thanks,
Daniel.