From: Jesper Dangaard Brouer via iovisor-dev <iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
To: Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
Cc: "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org"
	<iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
	John Fastabend
	<john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"Fastabend, John R"
	<john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>,
	Simon Horman
	<simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
	Rana Shahout <ranas-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Ari Saha <as754m-60p5jsuXm+c@public.gmane.org>
Subject: Re: XDP seeking input from NIC hardware vendors
Date: Sat, 9 Jul 2016 13:27:26 +0200
Message-ID: <20160709132726.3cbccf11@redhat.com>
In-Reply-To: <20160708185107.10c0dbe4@jkicinski-Precision-T1700>

On Fri, 8 Jul 2016 18:51:07 +0100
Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:

> On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> > The only distinction between VFs and queue groupings on my side is VFs
> > provide RSS where as queue groupings have to be selected explicitly.
> > In a programmable NIC world the distinction might be lost if a "RSS"
> > program can be loaded into the NIC to select queues but for existing
> > hardware the distinction is there.  
> 
> To do BPF RSS we need a way to select the queue which I think is all
> Jesper wanted.  So we will have to tackle the queue selection at some
> point.  The main obstacle with it for me is to define what queue
> selection means when program is not offloaded to HW...  Implementing
> queue selection on HW side is trivial.

Yes, I do see the fallback problem when the program's "filter" demux
cannot be offloaded to hardware.

At first I thought it was a good idea to keep the "demux-filter" part of
the eBPF program, as the software fallback can still apply this filter
in SW and just mark the packets as not-zero-copy-safe.  But when HW
offloading is not possible, packets can be delivered to every RX queue,
and SW would need to handle that, which is hard to keep transparent.
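
As a rough illustration of that fallback problem, here is a small
userspace C model (not kernel code).  All names in it (flow_key,
rx_verdict, classify_rx) and the UDP port 53 filter are made up for
this sketch.  It assumes that with HW offload the flow only ever shows
up on one dedicated queue, while without it every RX queue has to run
the filter and matching packets must be treated as not-zero-copy-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified flow key; not a kernel structure. */
struct flow_key {
    uint8_t  ip_proto;   /* IP protocol number, 17 = UDP */
    uint16_t dst_port;   /* host byte order, for readability */
};

/* Outcome of RX classification for one packet. */
struct rx_verdict {
    bool for_app;         /* packet belongs to the application's flow */
    bool zero_copy_safe;  /* page may be handed over without copying */
};

/* The "demux-filter": an arbitrary example matching UDP port 53. */
static bool demux_filter(const struct flow_key *key)
{
    return key->ip_proto == 17 && key->dst_port == 53;
}

struct rx_verdict classify_rx(const struct flow_key *key,
                              unsigned int rx_queue,
                              unsigned int app_queue,
                              bool hw_offloaded)
{
    struct rx_verdict v = { false, false };

    assert(key != NULL);

    if (hw_offloaded) {
        /* HW already steered the flow: only app_queue carries it,
         * and its pages come from the dedicated pool. */
        v.for_app = (rx_queue == app_queue);
        v.zero_copy_safe = v.for_app;
        return v;
    }

    /* SW fallback: the flow may arrive on any RX queue, so every
     * queue runs the filter and a match has to be copied. */
    v.for_app = demux_filter(key);
    v.zero_copy_safe = false;
    return v;
}
```

The part that is hard to keep transparent is the second branch: the
filter cost lands on every queue, and the application silently loses
the zero-copy property.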


> > If you demux using a eBPF program or via a filter model like
> > flow_director or cls_{u32|flower} I think we can support both. And this
> > just depends on the programmability of the hardware. Note flow_director
> > and cls_{u32|flower} steering to VFs is already in place.  

Maybe we should keep HW demuxing as a separate setup step.

Today I can almost do what I want by setting up ntuple filters and (if
Alexei allows it) assigning an application-specific XDP eBPF program to
a specific RX queue.

 ethtool -K eth2 ntuple on
 ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42

Then the XDP program can be attached to RX queue 42 and
promise/guarantee that it will consume all packets.  The backing
page-pool can then allow zero-copy RX (and enable scrubbing when
refilling the pool).
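
To illustrate what I mean by scrubbing, a toy userspace model of such a
pool follows.  This is not an existing kernel API; toy_page_pool,
pool_init, pool_refill, pool_get and the tiny 64-byte "page" are
assumptions made for the sketch.  The only point is that a page gets
cleared before it enters a pool whose pages become visible to an
application, so stale data cannot leak through zero-copy RX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 4
#define PAGE_BYTES 64   /* tiny "page" so the example stays readable */

/* Toy per-RX-queue page pool; hypothetical, not a kernel structure. */
struct toy_page_pool {
    unsigned char *free_pages[POOL_SLOTS];
    size_t nr_free;
    bool scrub_on_refill;   /* set when pages are exposed zero-copy */
};

void pool_init(struct toy_page_pool *p, bool scrub_on_refill)
{
    assert(p != NULL);
    p->nr_free = 0;
    p->scrub_on_refill = scrub_on_refill;
}

/* Add a page to the pool, clearing it first if scrubbing is enabled.
 * Returns false when the pool is already full. */
bool pool_refill(struct toy_page_pool *p, unsigned char *page)
{
    assert(p != NULL && page != NULL);

    if (p->nr_free == POOL_SLOTS)
        return false;
    if (p->scrub_on_refill)
        memset(page, 0, PAGE_BYTES);
    p->free_pages[p->nr_free++] = page;
    return true;
}

/* Take a page out of the pool for RX, or NULL if the pool is empty. */
unsigned char *pool_get(struct toy_page_pool *p)
{
    assert(p != NULL);

    if (p->nr_free == 0)
        return NULL;
    return p->free_pages[--p->nr_free];
}
```

Scrubbing is kept as a per-pool flag here because only a queue that
hands pages to an application needs to pay for the memset.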


> Yes, for steering to VFs we could potentially reuse a lot of existing
> infrastructure.
> 
> > The question I have is should the "filter" part of the eBPF program
> > be a separate program from the XDP program and loaded using specific
> > semantics (e.g. "load_hardware_demux" ndo op) at the risk of building
> > a ever growing set of "ndo" ops. If you are running multiple XDP
> > programs on the same NIC hardware then I think this actually makes
> > sense otherwise how would the hardware and even software find the
> > "demux" logic. In this model there is a "demux" program that selects
> > a queue/VF and a program that runs on the netdev queues.  
> 
> I don't think we should enforce the separation here.  What we may want
> to do before forwarding to the VF can be much more complicated than
> pure demux/filtering (simple eg - pop VLAN/tunnel).  VF representative
> model works well here as fallback - if program could not be offloaded
> it will be run on the host and "trombone" packets via VFR into the VF.

That is an interesting idea.

> If we have a chain of BPF programs we can order them in increasing
> level of complexity/features required and then HW could transparently
> offload the first parts - the easier ones - leaving more complex
> processing on the host.

I'll try to keep out of the discussion of how to structure the BPF
program, as it is outside my "area".
 
> This should probably be paired with some sort of "skip-sw" flag to let
> user space enforce the HW offload on the fast path part.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
