netdev.vger.kernel.org archive mirror
From: Rick Jones <rick.jones2@hp.com>
To: Tom Herbert <therbert@google.com>
Cc: netdev@vger.kernel.org
Subject: Re: Very low latency TCP for clusters
Date: Mon, 19 Jul 2010 11:13:33 -0700	[thread overview]
Message-ID: <4C4495CD.3090605@hp.com> (raw)
In-Reply-To: <AANLkTilNmNZbFWS8LF-UHU65QYIC32HZlgVZ7lXJHxPh@mail.gmail.com>

Tom Herbert wrote:
> We have been looking at best case TCP latencies that might be achieved
> within a cluster (low loss fabric).  The goal is to have latency
> numbers roughly comparable to that which can be produced using RDMA/IB
> in a low latency configuration  (<5 usecs round trip on netperf TCP_RR
> test with one byte data for directly connected hosts as a starting
> point).  This would be without changing sockets API, fabric, and
> preferably not using TCP offload or a user space stack.
> 
> I think there are at least two techniques that will drive down TCP
> latency: per connection queues and polling queues.  Per connection
> queues (supported by device) should eliminate costs of connection
> look-up, hopefully some locking.  Polling becomes viable as core
> counts on systems increase, and burning a few CPUs for networking
> polling on behalf of very low-latency threads would be reasonable.

Likely preaching to the choir - but "just so long as it doesn't give the 
system's coherence fits."  Every once and again things get stuck into the idle 
loop of various OSes on the premise that they are only burning cycles on an 
otherwise idle core, but they end up trashing cache lines and/or the memory 
subsystem and so drag down the other cores.

Just how close to even 5 usecs/tran is the service demand on a TCP_RR test now? 
The best I've seen for a 10GbE NIC under SLES11 SP1 (sorry, not the latest 
upstream) has been 10-12.6 usec/tran, but the range went as high as 20 or more, 
depending on where netperf/netserver were running relative to the interrupt CPU:

ftp://ftp.netperf.org/netperf/misc/dl380g6_X5560_sles11sp1_ad386a_cxgb3_1.1.3-ko_b2b_to_same_1500mtu_20100602.csv
ftp://ftp.netperf.org/netperf/misc/dl380g6_X5560_sles11sp1_nc550_be2net_2.102.147s_b2b_to_same_1500mtu_20100520.csv

Getting rid of connection lookup and some locking will no doubt be necessary, 
but I suspect there will be a lot more to it as well.  Quite a few sacred 
path-length cows may have to be slaughtered along the way to get the service 
demand << 5 microseconds needed to allow a < 5 usec RTT.

happy benchmarking,

rick jones

Thread overview: 14+ messages
2010-07-19 17:05 Very low latency TCP for clusters Tom Herbert
2010-07-19 17:35 ` David Miller
2010-07-19 17:41 ` Eric Dumazet
2010-07-19 18:44   ` Tom Herbert
2010-07-19 19:27     ` David Miller
2010-07-19 22:03     ` Eric Dumazet
2010-07-19 23:37       ` Tom Herbert
2010-07-20  5:26         ` Eric Dumazet
2010-07-20 17:24           ` Rick Jones
2010-07-20 12:57         ` Brian Bloniarz
2010-07-19 18:13 ` Rick Jones [this message]
2010-07-19 18:28 ` Nivedita Singhvi
2010-07-19 19:46 ` Mitchell Erblich
2010-07-19 21:16   ` Tom Herbert
