netdev.vger.kernel.org archive mirror
* Can I limit the number of active tx per TCP socket?
@ 2014-03-06 12:28 David Laight
  2014-03-06 14:15 ` John Heffner
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David Laight @ 2014-03-06 12:28 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Is it possible to stop a TCP connection having more than one
tx skb (in the ethernet tx ring) at any one time?
The idea is to allow time for short sends from the application
to accumulate so that the transmitted frames are longer.

Basically I have a TCP connection which carries a lot of separate
short 'user buffers'. These are not command-response so
TCP_NODELAY has to be set to avoid long delays. But this means
that the ethernet packet rate is very high - with 3.14 about
2000/sec even though the data rate is well under 1MB/sec.

Anything that reduces this packet rate will help the poor little
embedded ppc that has to receive them!

From the descriptions I've found I suspect that setting a very
low TCP_NOTSENT_LOWAT (like 1 byte) might have other side effects.
I think that limits the writes into kernel memory - which isn't
really what I'm trying to do.
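
For reference, a minimal sketch of how TCP_NOTSENT_LOWAT is set on a
connected socket (the 1-byte value below is only the extreme case
speculated about here):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25    /* kernel uapi value; older userspace headers may lack it */
#endif

/* Limit the amount of unsent data queued in the socket to 'bytes'. */
static int set_notsent_lowat(int fd, unsigned int bytes)
{
        return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                          &bytes, sizeof(bytes));
}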

With a 3.14.0-rc5 kernel reducing the network speed to 10M (from Ge)
halves the number of transmitted packets (with the same aggregate
data rate). But I suspect it could still be reduced further.

Limiting the number of tx packets per TCP connection might also
help stop bulk transfers affecting low-latency connections,
especially if the throughput of individual connections isn't
especially important - as it may not be on a big ftp/web server.

Limiting the window size offered by the remote system won't help me.
The window needs to be large enough for several full-sized packets, and
I'm trying to stop large numbers of very short packets being sent.

In this particular case the connection is local, but we have a similar
problem with sigtran m3ua traffic over sctp.
If we are sending 15000 sctp data chunks every second, with an average size
under 300 bytes (possibly nearer 150) then we really want to fill the
ethernet packets.
(That is a real data pattern, not a development test.)

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 12:28 Can I limit the number of active tx per TCP socket? David Laight
@ 2014-03-06 14:15 ` John Heffner
  2014-03-06 15:03   ` David Laight
  2014-03-06 14:38 ` Eric Dumazet
  2014-03-06 17:17 ` Rick Jones
  2 siblings, 1 reply; 17+ messages in thread
From: John Heffner @ 2014-03-06 14:15 UTC (permalink / raw)
  To: David Laight; +Cc: netdev@vger.kernel.org

Is TCP_CORK what you're looking for?
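
(A minimal sketch of the usual TCP_CORK pattern, assuming a connected TCP
socket: cork, issue the batch of short writes, then uncork so whatever
accumulated goes out as full-sized segments.)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a batch of small buffers in as few (full) segments as possible. */
static void send_corked(int fd, const struct iovec *iov, int iovcnt)
{
        int on = 1, off = 0;
        int i;

        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
        for (i = 0; i < iovcnt; i++)
                write(fd, iov[i].iov_base, iov[i].iov_len);     /* error handling omitted */
        setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));   /* uncork: flush */
}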


On Thu, Mar 6, 2014 at 7:28 AM, David Laight <David.Laight@aculab.com> wrote:
> Is it possible to stop a TCP connection having more than one
> tx skb (in the ethernet tx ring) at any one time?
> The idea is to allow time for short sends from the application
> to accumulate so that the transmitted frames are longer.
>
> Basically I have a TCP connection which carries a lot of separate
> short 'user buffers'. These are not command-response so
> TCP_NODELAY has to be set to avoid long delays. But this means
> that the ethernet packet rate is very high - with 3.14 about
> 2000/sec even though the data rate is well under 1MB/sec.
>
> Anything that reduces this packet rate will help the poor little
> embedded ppc that has to receive them!
>
> From the descriptions I've found I suspect that setting a very
> low TCP_NOTSENT_LOWAT (like 1 byte) might have other side effects.
> I think that limits the writes into kernel memory - which isn't
> really what I'm trying to do.
>
> With a 3.14.0-rc5 kernel reducing the network speed to 10M (from Ge)
> halves the number of transmitted packets (with the same aggregate
> data rate). But I suspect it could still be reduced further.
>
> Limiting the number of tx packets per TCP connection might also
> help stop bulk transfers affecting low-latency connections,
> especially if the throughput of individual connections isn't
> especially important - as it may not be on a big ftp/web server.
>
> Limiting the window size offered by the remote system won't help me.
> The window needs to be large enough for several full-sized packets, and
> I'm trying to stop large numbers of very short packets being sent.
>
> In this particular case the connection is local, but we have a similar
> problem with sigtran m3ua traffic over sctp.
> If we are sending 15000 sctp data chunks every second, with an average size
> under 300 bytes (possibly nearer 150) then we really want to fill the
> ethernet packets.
> (That is a real data pattern, not a development test.)
>
>         David
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 12:28 Can I limit the number of active tx per TCP socket? David Laight
  2014-03-06 14:15 ` John Heffner
@ 2014-03-06 14:38 ` Eric Dumazet
  2014-03-06 14:52   ` Eric Dumazet
  2014-03-06 17:17 ` Rick Jones
  2 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2014-03-06 14:38 UTC (permalink / raw)
  To: David Laight; +Cc: netdev@vger.kernel.org

On Thu, 2014-03-06 at 12:28 +0000, David Laight wrote:
> Is it possible to stop a TCP connection having more than one
> tx skb (in the ethernet tx ring) at any one time?
> The idea is to allow time for short sends from the application
> to accumulate so that the transmitted frames are longer.
> 
> Basically I have a TCP connection which carries a lot of separate
> short 'user buffers'. These are not command-response so
> TCP_NODELAY has to be set to avoid long delays. But this means
> that the ethernet packet rate is very high - with 3.14 about
> 2000/sec even though the data rate is well under 1MB/sec.
> 
> Anything that reduces this packet rate will help the poor little
> embedded ppc that has to receive them!
> 
> From the descriptions I've found I suspect that setting a very
> low TCP_NOTSENT_LOWAT (like 1 byte) might have other side effects.
> I think that limits the writes into kernel memory - which isn't
> really what I'm trying to do.
> 
> With a 3.14.0-rc5 kernel reducing the network speed to 10M (from Ge)
> halves the number of transmitted packets (with the same aggregate
> data rate). But I suspect it could still be reduced further.
> 
> Limiting the number of tx packets per TCP connection might also
> help stop bulk transfers affecting low-latency connections,
> especially if the throughput of individual connections isn't
> especially important - as it may not be on a big ftp/web server.
> 
> Limiting the window size offered by the remote system won't help me.
> The window needs to be large enough for several full-sized packets, and
> I'm trying to stop large numbers of very short packets being sent.
> 
> In this particular case the connection is local, but we have a similar
> problem with sigtran m3ua traffic over sctp.
> If we are sending 15000 sctp data chunks every second, with an average size
> under 300 bytes (possibly nearer 150) then we really want to fill the
> ethernet packets.
> (That is a real data pattern, not a development test.)

This is the intent of TCP Small Queues, but it was somewhat broken by
the latest patch.

Also try FQ pacing

tc qdisc replace dev eth0 root fq

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5286228679bd..6e1db9889a8f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1909,8 +1909,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		 * of queued bytes to ensure line rate.
 		 * One example is wifi aggregation (802.11 AMPDU)
 		 */
-		limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
-			      sk->sk_pacing_rate >> 10);
+		limit = 2 * skb->truesize;
 
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 14:38 ` Eric Dumazet
@ 2014-03-06 14:52   ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2014-03-06 14:52 UTC (permalink / raw)
  To: David Laight; +Cc: netdev@vger.kernel.org

On Thu, 2014-03-06 at 06:38 -0800, Eric Dumazet wrote:

> This is the intent of TCP Small Queues, but it was somewhat broken by
> the latest patch.
> 
> Also try FQ pacing
> 
> tc qdisc replace dev eth0 root fq

BTW, once you have the FQ packet scheduler in place, you can also
use the SO_MAX_PACING_RATE socket option on your flow if you know
the optimal throughput for your ppc receiver.
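
A minimal sketch of that option, assuming a 3.13+ kernel and the fq qdisc
installed as above; the rate is in bytes per second and the example value
is purely illustrative:

#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47   /* kernel uapi value; older userspace headers may lack it */
#endif

/* Cap this flow's pacing rate in bytes/sec (e.g. 8000 for ~64kbit/s). */
static int cap_pacing_rate(int fd, unsigned int bytes_per_sec)
{
        return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                          &bytes_per_sec, sizeof(bytes_per_sec));
}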

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-06 14:15 ` John Heffner
@ 2014-03-06 15:03   ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2014-03-06 15:03 UTC (permalink / raw)
  To: 'John Heffner'; +Cc: netdev@vger.kernel.org

From: John Heffner 
> On Thu, Mar 6, 2014 at 7:28 AM, David Laight <David.Laight@aculab.com> wrote:
> > Is it possible to stop a TCP connection having more than one
> > tx skb (in the ethernet tx ring) at any one time?
> > The idea is to allow time for short sends from the application
> > to accumulate so that the transmitted frames are longer.
>
> Is TCP_CORK what you're looking for?

No - TCP_CORK requires that you know you have more data to send,
in which case you are better off saving the system calls and
sending it with sendv() or writev().

I may well write the code to sit on the data for up to a ms
before sending it.  The data is sent out on a 64k line so
1ms is only 8 byte times.
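
A rough sketch of the "save the system calls" approach (the queue structure
is hypothetical): gather the buffers that accumulated during the delay and
push them with one writev(), so the stack can build one large segment
instead of many small ones.

#include <sys/uio.h>
#include <unistd.h>

#define MAX_BATCH 32

struct pending_buf {            /* hypothetical per-connection send queue entry */
        void   *data;
        size_t  len;
};

/* Flush up to MAX_BATCH queued buffers in a single system call. */
static ssize_t flush_pending(int fd, const struct pending_buf *q, int n)
{
        struct iovec iov[MAX_BATCH];
        int i;

        if (n > MAX_BATCH)
                n = MAX_BATCH;
        for (i = 0; i < n; i++) {
                iov[i].iov_base = q[i].data;
                iov[i].iov_len  = q[i].len;
        }
        return writev(fd, iov, n);      /* error handling omitted */
}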

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 12:28 Can I limit the number of active tx per TCP socket? David Laight
  2014-03-06 14:15 ` John Heffner
  2014-03-06 14:38 ` Eric Dumazet
@ 2014-03-06 17:17 ` Rick Jones
  2014-03-06 18:09   ` Neal Cardwell
  2014-03-07 10:17   ` David Laight
  2 siblings, 2 replies; 17+ messages in thread
From: Rick Jones @ 2014-03-06 17:17 UTC (permalink / raw)
  To: David Laight, netdev@vger.kernel.org

On 03/06/2014 04:28 AM, David Laight wrote:
> Is it possible to stop a TCP connection having more than one
> tx skb (in the ethernet tx ring) at any one time?
> The idea is to allow time for short sends from the application
> to accumulate so that the transmitted frames are longer.

That is precisely what Nagle is supposed to be doing - at least where 
the definition of "time" is the round-trip-time rather than "time it 
takes to get transmitted out the NIC."

> Basically I have a TCP connection which carries a lot of separate
> short 'user buffers'. These are not command-response so
> TCP_NODELAY has to be set to avoid long delays.

When you are saturating the receiver and/or the 64K line, are you 
certain that not setting TCP_NODELAY means long delays?

 From a later message:

> The data is sent out on a 64k line so 1ms is only 8 byte times.

Are you still using a 1460 byte MSS on such a connection?

Perhaps you can set the MSS (or drop the MTU on the 64K line and use 
PMTU) to something less to trigger window updates a bit sooner and so 
get piggy-backed ACKs rather than delayed ACKs and so not have to set 
TCP_NODELAY?  Yes, you will have a question of headers versus 
headers+data but with TCP_NODELAY set as you have it you are (probably) 
already trashing that.
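
A minimal sketch of the MSS side of that suggestion (the value is purely
illustrative); to change the MSS announced to the peer, TCP_MAXSEG has to
be set before the connection is established:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Clamp the segment size; e.g. mss = 512. */
static int clamp_mss(int fd, int mss)
{
        return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}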

Setting TCP_NODELAY to avoid "long delays" and then having a 64Kbyte/s 
link seems a trifle, well, contradictory.

rick jones

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 17:17 ` Rick Jones
@ 2014-03-06 18:09   ` Neal Cardwell
  2014-03-06 19:06     ` Rick Jones
  2014-03-07 10:24     ` David Laight
  2014-03-07 10:17   ` David Laight
  1 sibling, 2 replies; 17+ messages in thread
From: Neal Cardwell @ 2014-03-06 18:09 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Laight, netdev@vger.kernel.org

Eric's recent "auto corking" feature may be helpful in this context:

  http://lwn.net/Articles/576263/

neal

On Thu, Mar 6, 2014 at 12:17 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 03/06/2014 04:28 AM, David Laight wrote:
>>
>> Is it possible to stop a TCP connection having more than one
>> tx skb (in the ethernet tx ring) at any one time?
>> The idea is to allow time for short sends from the application
>> to accumulate so that the transmitted frames are longer.
>
>
> That is precisely what Nagle is supposed to be doing - at least where the
> definition of "time" is the round-trip-time rather than "time it takes to
> get transmitted out the NIC."
>
>
>> Basically I have a TCP connection which carries a lot of separate
>> short 'user buffers'. These are not command-response so
>> TCP_NODELAY has to be set to avoid long delays.
>
>
> When you are saturating the receiver and/or the 64K line, are you certain
> that not setting TCP_NODELAY means long delays?
>
> From a later message:
>
>
>> The data is sent out on a 64k line so 1ms is only 8 byte times.
>
>
> Are you still using a 1460 byte MSS on such a connection?
>
> Perhaps you can set the MSS (or drop the MTU on the 64K line and use PMTU)
> to something less to trigger window updates a bit sooner and so get
> piggy-backed ACKs rather than delayed ACKs and so not have to set
> TCP_NODELAY?  Yes, you will have a question of headers versus headers+data
> but with TCP_NODELAY set as you have it you are (probably) already trashing
> that.
>
> Setting TCP_NODELAY to avoid "long delays" and then having a 64Kbyte/s link
> seems a trifle, well, contradictory.
>
> rick jones
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 18:09   ` Neal Cardwell
@ 2014-03-06 19:06     ` Rick Jones
  2014-03-06 19:09       ` Neal Cardwell
  2014-03-06 19:27       ` Eric Dumazet
  2014-03-07 10:24     ` David Laight
  1 sibling, 2 replies; 17+ messages in thread
From: Rick Jones @ 2014-03-06 19:06 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: David Laight, netdev@vger.kernel.org

On 03/06/2014 10:09 AM, Neal Cardwell wrote:
> Eric's recent "auto corking" feature may be helpful in this context:
>
>    http://lwn.net/Articles/576263/

Doesn't that depend on the bottleneck being local to the sending side? 
Perhaps I've mis-understood David's setup, but I get the impression the 
bottleneck is not at the sending side but either in the middle or at the 
end, so tx completions will still be happening quickly.

rick

> neal
>
> On Thu, Mar 6, 2014 at 12:17 PM, Rick Jones <rick.jones2@hp.com> wrote:
>> On 03/06/2014 04:28 AM, David Laight wrote:
>>>
>>> Is it possible to stop a TCP connection having more than one
>>> tx skb (in the ethernet tx ring) at any one time?
>>> The idea is to allow time for short sends from the application
>>> to accumulate so that the transmitted frames are longer.
>>
>>
>> That is precisely what Nagle is supposed to be doing - at least where the
>> definition of "time" is the round-trip-time rather than "time it takes to
>> get transmitted out the NIC."
>>
>>
>>> Basically I have a TCP connection which carries a lot of separate
>>> short 'user buffers'. These are not command-response so
>>> TCP_NODELAY has to be set to avoid long delays.
>>
>>
>> When you are saturating the receiver and/or the 64K line, are you certain
>> that not setting TCP_NODELAY means long delays?
>>
>>  From a later message:
>>
>>
>>> The data is sent out on a 64k line so 1ms is only 8 byte times.
>>
>>
>> Are you still using a 1460 byte MSS on such a connection?
>>
>> Perhaps you can set the MSS (or drop the MTU on the 64K line and use PMTU)
>> to something less to trigger window updates a bit sooner and so get
>> piggy-backed ACKs rather than delayed ACKs and so not have to set
>> TCP_NODELAY?  Yes, you will have a question of headers versus headers+data
>> but with TCP_NODELAY set as you have it you are (probably) already trashing
>> that.
>>
>> Setting TCP_NODELAY to avoid "long delays" and then having a 64Kbyte/s link
>> seems a trifle, well, contradictory.
>>
>> rick jones
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 19:06     ` Rick Jones
@ 2014-03-06 19:09       ` Neal Cardwell
  2014-03-06 19:27       ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Neal Cardwell @ 2014-03-06 19:09 UTC (permalink / raw)
  To: Rick Jones; +Cc: David Laight, netdev@vger.kernel.org

On Thu, Mar 6, 2014 at 2:06 PM, Rick Jones <rick.jones2@hp.com> wrote:
> On 03/06/2014 10:09 AM, Neal Cardwell wrote:
>>
>> Eric's recent "auto corking" feature may be helpful in this context:
>>
>>    http://lwn.net/Articles/576263/
>
>
> Doesn't that depend on the bottleneck being local to the sending side?
> Perhaps I've mis-understood David's setup, but I get the impression the
> bottleneck is not at the sending side but either in the middle or at the
> end, so tx completions will still be happening quickly.

Yes, I think auto corking works well if you are using the fq/pacing
qdisc, and otherwise may not buy much.

neal

> rick
>
>
>> neal
>>
>> On Thu, Mar 6, 2014 at 12:17 PM, Rick Jones <rick.jones2@hp.com> wrote:
>>>
>>> On 03/06/2014 04:28 AM, David Laight wrote:
>>>>
>>>>
>>>> Is it possible to stop a TCP connection having more than one
>>>> tx skb (in the ethernet tx ring) at any one time?
>>>> The idea is to allow time for short sends from the application
>>>> to accumulate so that the transmitted frames are longer.
>>>
>>>
>>>
>>> That is precisely what Nagle is supposed to be doing - at least where the
>>> definition of "time" is the round-trip-time rather than "time it takes to
>>> get transmitted out the NIC."
>>>
>>>
>>>> Basically I have a TCP connection which carries a lot of separate
>>>> short 'user buffers'. These are not command-response so
>>>> TCP_NODELAY has to be set to avoid long delays.
>>>
>>>
>>>
>>> When you are saturating the receiver and/or the 64K line, are you certain
>>> that not setting TCP_NODELAY means long delays?
>>>
>>>  From a later message:
>>>
>>>
>>>> The data is sent out on a 64k line so 1ms is only 8 byte times.
>>>
>>>
>>>
>>> Are you still using a 1460 byte MSS on such a connection?
>>>
>>> Perhaps you can set the MSS (or drop the MTU on the 64K line and use
>>> PMTU)
>>> to something less to trigger window updates a bit sooner and so get
>>> piggy-backed ACKs rather than delayed ACKs and so not have to set
>>> TCP_NODELAY?  Yes, you will have a question of headers versus
>>> headers+data
>>> but with TCP_NODELAY set as you have it you are (probably) already
>>> trashing
>>> that.
>>>
>>> Setting TCP_NODELAY to avoid "long delays" and then having a 64Kbyte/s
>>> link
>>> seems a trifle, well, contradictory.
>>>
>>> rick jones
>>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Can I limit the number of active tx per TCP socket?
  2014-03-06 19:06     ` Rick Jones
  2014-03-06 19:09       ` Neal Cardwell
@ 2014-03-06 19:27       ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2014-03-06 19:27 UTC (permalink / raw)
  To: Rick Jones; +Cc: Neal Cardwell, David Laight, netdev@vger.kernel.org

On Thu, 2014-03-06 at 11:06 -0800, Rick Jones wrote:
> On 03/06/2014 10:09 AM, Neal Cardwell wrote:
> > Eric's recent "auto corking" feature may be helpful in this context:
> >
> >    http://lwn.net/Articles/576263/
> 
> Doesn't that depend on the bottleneck being local to the sending side? 
> Perhaps I've mis-understood David's setup, but I get the impression the 
> bottleneck is not at the sending side but either in the middle or at the 
> end, so tx completions will still be happening quickly.

There were multiple aspects.

One skb per flow in qdisc/TX ring: the TCP Small Queues spirit (recently
kind of broken by commit 98e09386c0ef4dfd ("tcp: tsq: restore minimal
amount of queueing")).

FQ adds the fairness part (bulk flows won't slow down interactive flows).

Then, if the problem is a high speed interface on the sender and a low
speed receiver (or bottleneck), SO_MAX_PACING_RATE is an easy way to
solve it. Only flows setting this option will be paced, while others can
use all the bandwidth they need.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-06 17:17 ` Rick Jones
  2014-03-06 18:09   ` Neal Cardwell
@ 2014-03-07 10:17   ` David Laight
  1 sibling, 0 replies; 17+ messages in thread
From: David Laight @ 2014-03-07 10:17 UTC (permalink / raw)
  To: 'Rick Jones', netdev@vger.kernel.org

From: Rick Jones
> On 03/06/2014 04:28 AM, David Laight wrote:
> > Is it possible to stop a TCP connection having more than one
> > tx skb (in the ethernet tx ring) at any one time?
> > The idea is to allow time for short sends from the application
> > to accumulate so that the transmitted frames are longer.
> 
> That is precisely what Nagle is supposed to be doing - at least where
> the definition of "time" is the round-trip-time rather than "time it
> takes to get transmitted out the NIC."

No, enabling Nagle leads to unacceptable delays in sending packets.
Try sending one short message every 5ms (without any return data):
the 2nd message gets delayed until the remote ack timer or the Nagle
timer expires. Both of these timers are far too long, and the Nagle
timeout isn't specifiable on a per-socket basis.
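
(A minimal sketch of that experiment, assuming 'fd' is an already-connected
TCP socket; the message size is illustrative:)

#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one short message every 5ms; with Nagle enabled (no TCP_NODELAY)
 * the 2nd and later messages are held until the peer's (possibly delayed)
 * ACK arrives. */
static void send_short_every_5ms(int fd, int count)
{
        char msg[32];

        memset(msg, 'x', sizeof(msg));
        while (count-- > 0) {
                send(fd, msg, sizeof(msg), 0);  /* error handling omitted */
                usleep(5000);                   /* 5ms between short messages */
        }
}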

> > Basically I have a TCP connection which carries a lot of separate
> > short 'user buffers'. These are not command-response so
> > TCP_NODELAY has to be set to avoid long delays.
> 
> When you are saturating the receiver and/or the 64K line, are you
> certain that not setting TCP_NODELAY means long delays?

Yes, because it delays data for at least tens of milliseconds (100ms?).
(I can't remember the exact Nagle delay - I know it is visible if you
'page forward' when running 'vi' in a large xterm.)

We've had issues with 'slow start' being affected by 'delayed acks'.
IIRC 'slow start' will only send 4 frames without receiving an ack.
The 'delayed ack' code waits 50ms? before sending one out.
By that time we are discarding data received on the slow link!
I avoided that one by requesting an application level data ack
every 4 packets rather than every 8.

>  From a later message:
> 
> > The data is sent out on a 64k line so 1ms is only 8 byte times.
> 
> Are you still using a 1460 byte MSS on such a connection?

The 64k line isn't carrying TCP; the TCP connection is carrying
the data that will be sent over the 64k line.
(It is actually SS7 - telephony signalling data.)
The reason there are a lot of small packets on the TCP connection
is that the 'mtu' for those links is 273, and typical packets are
much shorter.

Also the traffic for multiple (max 64) 64k links is sent over the
same TCP connection, so the aggregate packet rate is considerable.
(Especially when I'm running performance tests!)

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-06 18:09   ` Neal Cardwell
  2014-03-06 19:06     ` Rick Jones
@ 2014-03-07 10:24     ` David Laight
  2014-03-07 12:29       ` David Laight
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2014-03-07 10:24 UTC (permalink / raw)
  To: 'Neal Cardwell', Rick Jones; +Cc: netdev@vger.kernel.org

From: Neal Cardwell 
> Eric's recent "auto corking" feature may be helpful in this context:
> 
>   http://lwn.net/Articles/576263/

Yes, I was running some tests to make sure this didn't cause us
any grief.

In fact very aggressive "auto corking" would help my traffic flow.
I haven't yet tried locally reverting the patch that stopped
auto corking being quite as effective.
(I might even try setting the limit lower than 2 * skb->truesize.)

	David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-07 10:24     ` David Laight
@ 2014-03-07 12:29       ` David Laight
  2014-03-07 14:22         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: David Laight @ 2014-03-07 12:29 UTC (permalink / raw)
  To: David Laight, 'Neal Cardwell', Rick Jones; +Cc: netdev@vger.kernel.org

From: David Laight
> From: Neal Cardwell
> > Eric's recent "auto corking" feature may be helpful in this context:
> >
> >   http://lwn.net/Articles/576263/
> 
> Yes, I was running some tests to make sure this didn't cause us
> any grief.
> 
> In fact very aggressive "auto corking" would help my traffic flow.
> I haven't yet tried locally reverting the patch that stopped
> auto corking being quite as effective.
> (I might even try setting the limit lower than 2 * skb->truesize.)

Aggressive auto corking helps (it reduces the cpu load of the ppc from
60% to 30% for the same traffic flow) - but only if I reduce the
ethernet speed to 10M.
(Actually I only have the process cpu load for the ppc; it probably
excludes a lot of the tcp rx code.)

I guess that the transmit is completing before the other active
processes/kernel threads manage to request the next transmit.
This won't be helped by the first packet being short.

Spinning all but one of the cpus with 'while :; do :; done'
has an interesting effect on the workload/throughput.
The aggregate message rate goes from 5200/sec to 9000/sec (now
limited by the 64k links).
I think this happens because the scheduler 'resists' pre-empting
running processes - so the TCP send processing happens in bursts.
The ppc process is then using about 85% (from top).

I'll probably look at delaying the sends within our own code.

	David



^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-07 12:29       ` David Laight
@ 2014-03-07 14:22         ` Eric Dumazet
  2014-03-07 14:35           ` David Laight
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2014-03-07 14:22 UTC (permalink / raw)
  To: David Laight; +Cc: 'Neal Cardwell', Rick Jones, netdev@vger.kernel.org

On Fri, 2014-03-07 at 12:29 +0000, David Laight wrote:

> I'll probably look at delaying the sends within our own code.

That would be bad.

Just use fq/pacing; this is way better. It's designed for this usage.

The trick is to reduce 'quantum', as your mtu is 273+14 bytes.

QUANTUM=$((273+14))
tc qdisc replace dev eth0 root fq quantum $QUANTUM initial_quantum $QUANTUM

This will _perfectly_ pace packets every 34 ms.

If you share your ethernet device between this 64k destination and other
uses, then you need a more complex setup with HTB plus two classes, and
fq running at each htb leaf:

DEV=eth0

MTU=273
QUANTUM=$(($MTU+14))
DESTINATION=10.246.11.52/32   # change this to meet the network/host

tc qdisc del dev $DEV root &> /dev/null

tc qdisc add dev $DEV handle 1: root htb default 1

tc class add dev $DEV parent 1: classid 1:1 htb rate 1Gbit
tc class add dev $DEV parent 1: classid 1:2 htb rate 64kbit

tc filter add dev $DEV protocol ip parent 1:0 prio 1 u32 match ip dst $DESTINATION flowid 1:2

tc qdisc add dev $DEV parent 1:1 est 1sec 4sec fq
tc qdisc add dev $DEV parent 1:2 est 1sec 4sec fq quantum $QUANTUM initial_quantum $QUANTUM

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-07 14:22         ` Eric Dumazet
@ 2014-03-07 14:35           ` David Laight
  2014-03-07 15:00             ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: David Laight @ 2014-03-07 14:35 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: 'Neal Cardwell', Rick Jones, netdev@vger.kernel.org

From: Eric Dumazet 
> On Fri, 2014-03-07 at 12:29 +0000, David Laight wrote:
> 
> > I'll probably look at delaying the sends within our own code.
> 
> That would be bad.

The sending code can be told whether the packets are control
(which want sending immediately) or data (which can be delayed
based on knowledge of how much data has been sent recently).

I also probably ought to make this work on the Windows version
of our code - but most of the high throughput systems are linux.
The overheads through the M$ IP stack are horrid.

> Just use fq/pacing, this is way better. Its designed for this usage.
> trick is to reduce 'quantum' as you mtu is 273+14 bytes.
> 
> QUANTUM=$((273+14))
> tc qdisc replace dev eth0 root fq quantum $QUANTUM initial_quantum $QUANTUM
> 
> This will _perfectly_ pace packets every 34 ms.

Unfortunately that isn't what I need to do.
The 64k links run a reliable protocol and we use application level
flow control to limit the number of packets being sent.

So not all the data on the tcp connection is destined to be sent
over the slow link(s).

Everything works fine - except that I'd like the traffic to fill
ethernet packets under heavy load.
If I could set the Nagle timeout to 1-2ms (on a per-socket basis)
I could enable Nagle and that would probably suffice.

> If you share your ethernet device with this 64k destination, and other
> uses, then you need a more complex setup with HTB plus two classes, and
> fq running at the htb leaf

Only the one connection between the two IP addresses is carrying this data.
Other connections carry other traffic that has entirely different
characteristics.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-07 14:35           ` David Laight
@ 2014-03-07 15:00             ` Eric Dumazet
  2014-03-07 16:39               ` David Laight
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2014-03-07 15:00 UTC (permalink / raw)
  To: David Laight; +Cc: 'Neal Cardwell', Rick Jones, netdev@vger.kernel.org

On Fri, 2014-03-07 at 14:35 +0000, David Laight wrote:

> Only the one connection between the two IP addresses is carrying this data.
> Other connections carry other traffic that has entirely different
> characteristics.

You can adapt the filter to match _this_ connection.

Other traffic will just use a different path (other qdisc if you prefer)

You can match things like source port, destination port, or fw marks
(using iptables or a socket option)

By letting the stack do optimal aggregation, you'll have no overhead
and perfect pacing (using high resolution timers).

fq has the maxrate parameter so that you can avoid using
SO_MAX_PACING_RATE in the application.

By shaping the flow, you'll allow the TCP stack to build efficient packets,
because tcp_sendmsg() will append data to the last skb in the write queue,
thanks to auto corking.
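
A sketch of the socket-option route to an fw mark (needs CAP_NET_ADMIN;
the mark value and the matching tc rule mentioned in the comment are
illustrative only):

#include <sys/socket.h>

#ifndef SO_MARK
#define SO_MARK 36      /* kernel uapi value; older userspace headers may lack it */
#endif

/* Tag the socket so a filter like "tc filter add ... handle 2 fw flowid 1:2"
 * steers just this flow into the slow class. */
static int mark_flow(int fd, unsigned int mark)
{
        return setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
}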

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Can I limit the number of active tx per TCP socket?
  2014-03-07 15:00             ` Eric Dumazet
@ 2014-03-07 16:39               ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2014-03-07 16:39 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: 'Neal Cardwell', Rick Jones, netdev@vger.kernel.org

From: Eric Dumazet 
> On Fri, 2014-03-07 at 14:35 +0000, David Laight wrote:
> 
> > Only the one connection between the two IP addresses is carrying this data.
> > Other connections carry other traffic that has entirely different
> > characteristics.
...
> By letting the stack doing optimal aggregation, you'll have no overhead
> an perfect pacing (using high resolution timer)

I don't need 'perfect pacing', and indeed it wouldn't work.
I'm also actually sending from within the kernel, so high resolution
timers aren't a big problem.

The code also has a (per socket) kernel thread dedicated to the
sends (partially because we didn't work out how to do the equivalent
of poll from within the kernel), but doing the socket send (and
receive) from separate kernel threads helps share the load between
cpus - most of the rest of the protocol stack runs as a single
kernel thread to simplify locking.

Fortunately I'm fairly sure that our customers don't try to use
the same traffic patterns as I do in testing.

We do have customers sending very large numbers of packets through
sctp (I believe 15000 data chunks/sec).
I might have to look at that code to see if anything can be
done to get multiple data chunks into a single ethernet packet.
Since each send() generates a separate data chunk I can't use sendv()
to merge fragments (as I do for tcp).

	David


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-03-07 16:40 UTC | newest]

Thread overview: 17+ messages
-- links below jump to the message on this page --
2014-03-06 12:28 Can I limit the number of active tx per TCP socket? David Laight
2014-03-06 14:15 ` John Heffner
2014-03-06 15:03   ` David Laight
2014-03-06 14:38 ` Eric Dumazet
2014-03-06 14:52   ` Eric Dumazet
2014-03-06 17:17 ` Rick Jones
2014-03-06 18:09   ` Neal Cardwell
2014-03-06 19:06     ` Rick Jones
2014-03-06 19:09       ` Neal Cardwell
2014-03-06 19:27       ` Eric Dumazet
2014-03-07 10:24     ` David Laight
2014-03-07 12:29       ` David Laight
2014-03-07 14:22         ` Eric Dumazet
2014-03-07 14:35           ` David Laight
2014-03-07 15:00             ` Eric Dumazet
2014-03-07 16:39               ` David Laight
2014-03-07 10:17   ` David Laight
