From: Tom Herbert <tom@herbertland.com>
To: Rick Jones <rick.jones2@hpe.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>,
Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: Initial thoughts on TXDP
Date: Thu, 1 Dec 2016 12:18:50 -0800
Message-ID: <CALx6S37aknqfrj66AP8hEfVT9X2OmBFK9GVa9A+0FpydPbm9kg@mail.gmail.com>
In-Reply-To: <aac93b13-6298-b9eb-7f3c-b074f22c388c@hpe.com>
On Thu, Dec 1, 2016 at 11:48 AM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 12/01/2016 11:05 AM, Tom Herbert wrote:
>>
>> For the GSO and GRO the rationale is that performing the extra SW
>> processing to do the offloads is significantly less expensive than
>> running each packet through the full stack. This is true in a
>> multi-layered generalized stack. In TXDP, however, we should be able
>> to optimize the stack data path such that that would no longer be
>> true. For instance, if we can process the packets received on a
>> connection quickly enough so that it's about the same or just a little
>> more costly than GRO processing then we might bypass GRO entirely.
>> TSO is probably still relevant in TXDP since it reduces overheads
>> processing TX in the device itself.
>
>
> Just how much per-packet path-length are you thinking will go away under the
> likes of TXDP? It is admittedly "just" netperf but losing TSO/GSO does some
> non-trivial things to effective overhead (service demand) and so throughput:
>
For plain in-order TCP packets I believe we should be able to process
each packet at nearly the same speed as GRO. Most of the protocol
processing we do in GRO and in the stack is the same; the differences
are that we need to do a connection lookup in the stack path (note we
now do this in UDP GRO and that hasn't shown up as a major hit), and
that we also need to consider enqueue/dequeue on the socket, which is
a major reason to try for lockless sockets in this instance.
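To make the enqueue/dequeue point concrete, here is a minimal sketch
of the sort of single-producer/single-consumer lockless receive ring
a lockless socket could use. Illustrative only: the txdp_ring names
are hypothetical and it uses C11 atomics rather than the kernel's
primitives.

/* Illustrative sketch: an SPSC ring a lockless-socket receive path
 * might use.  Hypothetical names; C11 atomics for clarity.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SLOTS 256                  /* power of two so we can mask */

struct txdp_ring {
	void *slot[RING_SLOTS];         /* pointers to received packets */
	_Atomic unsigned int head;      /* advanced by the producer (softirq) */
	_Atomic unsigned int tail;      /* advanced by the consumer (app) */
};

/* Producer side: enqueue one packet, no locks taken. */
static bool txdp_ring_produce(struct txdp_ring *r, void *pkt)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SLOTS)
		return false;           /* full: caller drops or backpressures */

	r->slot[head & (RING_SLOTS - 1)] = pkt;
	/* Release ordering publishes the slot before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: dequeue one packet, no locks taken. */
static void *txdp_ring_consume(struct txdp_ring *r)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *pkt;

	if (tail == head)
		return NULL;            /* empty */

	pkt = r->slot[tail & (RING_SLOTS - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return pkt;
}

In the kernel the same pattern would be expressed with
smp_store_release()/smp_load_acquire(), but the point is the same:
publishing to the ring costs a couple of loads and one ordered store,
with no lock in the per-packet path.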
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -- -P 12867
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      9260.24   2.02     -1.00    0.428   -1.000
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ sudo ethtool -K hed0 tso off gso off
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -- -P 12867
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      5621.82   4.25     -1.00    1.486   -1.000
>
> And that is still with the stretch-ACKs induced by GRO at the receiver.
>
Sure, but try running something that emulates a more realistic
workload than a bulk TCP stream, like an RR test with relatively
small payloads and many connections.
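For example (hypothetical host and sizes, same setup as your runs
above), something along these lines is closer to what I mean -- a
bunch of concurrent small request/response flows rather than one bulk
stream:

# Illustrative only: 16 concurrent TCP_RR instances, 128-byte
# requests and responses, 30 second runs.
for i in $(seq 16); do
    ./netperf -t TCP_RR -H np-cp1-c1-m3-mgmt -l 30 -- -r 128,128 &
done
wait

With payloads that small there is little for TSO/GSO/GRO to
aggregate, so the per-packet path cost is what dominates.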
> Losing GRO has quite similar results:
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -t TCP_MAERTS -- -P 12867
> MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Recv     Send     Recv    Send
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      9154.02   4.00     -1.00    0.860   -1.000
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ sudo ethtool -K hed0 gro off
>
> stack@np-cp1-c0-m1-mgmt:~/rjones2$ ./netperf -c -H np-cp1-c1-m3-mgmt -t TCP_MAERTS -- -P 12867
> MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-c1-m3-mgmt () port 12867 AF_INET : demo
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Recv     Send     Recv    Send
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB
>
>  87380  16384  16384    10.00      4212.06   5.36     -1.00    2.502   -1.000
>
> I'm sure there is a very non-trivial "it depends" component here - netperf
> will get the peak benefit from *SO and so one will see the peak difference
> in service demands - but even if one gets only 6 segments per *SO that is a
> lot of path-length to make-up.
>
True, but I think there's a lot of path we'll be able to cut out. In
this mode we don't need iptables, netfilter, the input route lookup,
the IPvlan check, or other similar lookups. Once we've successfully
matched established TCP state, anything related to policy on both TX
and RX for that connection is inferred from that state. We want the
processing path in this case to be concerned with just protocol
processing and the interface to the user.
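As a rough illustration (hypothetical names and structures, nothing
like the eventual kernel code), the idea is that the per-packet
checks collapse into a single established-connection lookup whose
policy verdict was settled when the connection was set up:

/* Illustrative only: policy decided once at connection setup, then the
 * per-packet receive path is a lookup plus protocol processing.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct txdp_conn {
	struct flow_key key;
	bool policy_ok;         /* netfilter/route/IPvlan verdict cached at setup */
	bool in_use;
};

#define CONN_TABLE_SIZE 64
static struct txdp_conn conn_table[CONN_TABLE_SIZE];

static unsigned int flow_hash(const struct flow_key *k)
{
	/* Toy hash, a stand-in for the real flow hash. */
	return (k->saddr ^ k->daddr ^ k->sport ^ k->dport) & (CONN_TABLE_SIZE - 1);
}

static struct txdp_conn *conn_lookup(const struct flow_key *k)
{
	struct txdp_conn *c = &conn_table[flow_hash(k)];

	return (c->in_use && !memcmp(&c->key, k, sizeof(*k))) ? c : NULL;
}

/* Fallback: the ordinary stack path with its per-packet lookups. */
static void slow_path_receive(const struct flow_key *k)
{
	printf("slow path: netfilter, input route, ... for port %u\n",
	       (unsigned int)k->dport);
}

/* Fast path: one lookup, then straight to TCP processing and the user. */
static void txdp_receive(const struct flow_key *k)
{
	struct txdp_conn *c = conn_lookup(k);

	if (c && c->policy_ok) {
		printf("fast path: TCP processing + socket enqueue, port %u\n",
		       (unsigned int)k->dport);
		return;
	}
	slow_path_receive(k);
}

int main(void)
{
	struct flow_key k = { .saddr = 1, .daddr = 2, .sport = 12867, .dport = 80 };

	/* Connection establishment runs the policy checks once. */
	conn_table[flow_hash(&k)] = (struct txdp_conn){
		.key = k, .policy_ok = true, .in_use = true,
	};
	txdp_receive(&k);       /* takes the fast path */
	return 0;
}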
> 4.4 kernel, BE3 NICs ... E5-2640 0 @ 2.50GHz
>
> And even if one does have the CPU cycles to burn so to speak, the effect on
> power consumption needs to be included in the calculus.
>
Definitely, power consumption is the downside of spin-polling CPUs.
As I said, we should never be spinning any more CPUs than necessary
to handle the load.
Tom
> happy benchmarking,
>
> rick jones