From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: Tom Herbert <tom@herbertland.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: Initial thoughts on TXDP
Date: Thu, 1 Dec 2016 08:55:08 -0500
Message-ID: <20161201135508.GB24547@oracle.com>
In-Reply-To: <CALx6S34qPqXa7s1eHmk9V-k6xb=36dfiQvx3JruaNnqg4v8r9g@mail.gmail.com>

On (11/30/16 14:54), Tom Herbert wrote:
> 
> Posting for discussion....
   :
> One simplifying assumption we might make is that TXDP is primarily for
> optimizing latency, specifically request/response type operations
> (think HPC, HFT, flash server, or other tightly coupled communications
> within the datacenter). Notably, I don't think that saving CPU is as
> relevant to TXDP, in fact we have already seen that CPU utilization
> can be traded off for lower latency via spin polling. Similar to XDP
> though, we might assume that single CPU performance is relevant (i.e.
> on a cache server we'd like to spin as few CPUs as needed and no more
> to handle the load and maintain throughput and latency requirements).
> High throughput (ops/sec) and low variance should be side effects of
> any design.

I'm sending this with some hesitation (esp. as the flamebait threads
are starting up - I have no interest in getting into food-fights!!),
because it sounds like the HPC/request-response use-case you have in mind
(HTTP based?) is very likely different from the DB use-cases in
my environment (RDBMS, cluster req/responses). But to provide some
perspective from the latter use-case..

We also have request-response transactions, but CPU utilization
is extremely critical - many DB operations are highly CPU bound,
so it's not acceptable for the network stack to hog the CPU by polling.
In that sense, the DB req/resp model has a lot of overlap with the
Suricata use-case.

Also we need a select()able socket, because we have to deal with
input from several sources - network I/O, but also disk and
file-system I/O. So we need to make sure there is no starvation,
and that we multiplex between I/O sources efficiently.
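
To make that a bit more concrete, the sort of event loop I have in
mind looks roughly like the sketch below - the fds and the handle_*()
functions are placeholders for the actual sources and handlers, and
error handling is trimmed:

/* Sketch only: multiplexing several input sources with select().
 * net_fd, disk_fd and nl_fd stand in for the real descriptors
 * (e.g. PF_PACKET socket, disk/eventfd, netlink socket). */
#include <sys/select.h>

extern void handle_network(int fd);   /* placeholder handlers */
extern void handle_disk(int fd);
extern void handle_netlink(int fd);

static void io_loop(int net_fd, int disk_fd, int nl_fd)
{
        for (;;) {
                fd_set rfds;
                int maxfd = net_fd;

                FD_ZERO(&rfds);
                FD_SET(net_fd, &rfds);
                FD_SET(disk_fd, &rfds);
                FD_SET(nl_fd, &rfds);
                if (disk_fd > maxfd)
                        maxfd = disk_fd;
                if (nl_fd > maxfd)
                        maxfd = nl_fd;

                if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                        break;                    /* error handling elided */

                if (FD_ISSET(net_fd, &rfds))
                        handle_network(net_fd);
                if (FD_ISSET(disk_fd, &rfds))
                        handle_disk(disk_fd);
                if (FD_ISSET(nl_fd, &rfds))
                        handle_netlink(nl_fd);
        }
}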

And one other critical difference: the hot-potato-forwarding
model (the sort of OVS model that DPDK etc. might arguably be a fit for)
does not apply here. In order to figure out the ethernet and IP headers
in the response correctly at all times (in the face of things like VRRP,
gw changes, gw's mac addr changes etc.), the application should really
be listening on NETLINK sockets for modifications to the networking
state - again this points to needing a select()able fd set where you can
have both the I/O fds and the netlink socket.
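
Roughly, the netlink side would be something like the sketch below
(the multicast group choices are illustrative, error handling trimmed);
the returned fd just goes into the same select() set as the data fds:

/* Sketch only: a NETLINK_ROUTE socket subscribed to route, neighbour
 * (ARP) and link changes, so gw or gw-mac changes can be noticed. */
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <unistd.h>

static int open_rtnl_fd(void)
{
        struct sockaddr_nl snl;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0)
                return -1;

        memset(&snl, 0, sizeof(snl));
        snl.nl_family = AF_NETLINK;
        /* groups of interest: routes, neighbour cache, link state */
        snl.nl_groups = RTMGRP_IPV4_ROUTE | RTMGRP_NEIGH | RTMGRP_LINK;

        if (bind(fd, (struct sockaddr *)&snl, sizeof(snl)) < 0) {
                close(fd);
                return -1;
        }
        return fd;
}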

For all of these reasons, we are investigating approaches similar to
Suricata's - PF_PACKET with TPACKET_V2 (since we need both Tx and Rx,
and so far tpacket_v2 seems "good enough"). FWIW, we also took
a look at netmap and so far have not seen any significant benefits
to netmap over pf_packet.. investigation still ongoing.
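
For reference, the kind of setup we're experimenting with looks roughly
like this (just a sketch of a TPACKET_V2 rx ring; the ring geometry
numbers are made up, and the tx ring and bind-to-interface steps are
elided):

/* Sketch only: PF_PACKET socket with a TPACKET_V2 mmap'ed rx ring. */
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

static int setup_rx_ring(void **ring_out)
{
        struct tpacket_req req;
        int ver = TPACKET_V2;
        void *ring;
        int fd;

        fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0)
                return -1;

        if (setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver)) < 0)
                goto err;

        memset(&req, 0, sizeof(req));
        req.tp_block_size = 1 << 16;            /* made-up geometry */
        req.tp_block_nr   = 64;
        req.tp_frame_size = 1 << 11;
        req.tp_frame_nr   = (req.tp_block_size / req.tp_frame_size) *
                            req.tp_block_nr;

        if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) < 0)
                goto err;

        ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ring == MAP_FAILED)
                goto err;

        *ring_out = ring;
        return fd;
err:
        close(fd);
        return -1;
}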

>   - Call into TCP/IP stack with page data directly from driver-- no
> skbuff allocation or interface. This is essentially provided by the

I'm curious - one thing that came out of the IPsec evaluation
is that TSO is very valuable for performance, and this is most easily
accessed via the sk_buff interfaces.  I have not had a chance
to review your patches yet, but isn't that an issue if you bypass
sk_buff usage? But I should probably go and review your patchset..

--Sowmini
