From: Jesper Dangaard Brouer via iovisor-dev <iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
To: Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>
Cc: "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org"
<iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
John Fastabend
<john.fastabend-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
"Fastabend, John R"
<john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>,
Simon Horman
<simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
Rana Shahout <ranas-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
Ari Saha <as754m-60p5jsuXm+c@public.gmane.org>
Subject: Re: XDP seeking input from NIC hardware vendors
Date: Sat, 9 Jul 2016 13:27:26 +0200
Message-ID: <20160709132726.3cbccf11@redhat.com>
In-Reply-To: <20160708185107.10c0dbe4@jkicinski-Precision-T1700>
On Fri, 8 Jul 2016 18:51:07 +0100
Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:
> On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> > The only distinction between VFs and queue groupings on my side is VFs
> > provide RSS where as queue groupings have to be selected explicitly.
> > In a programmable NIC world the distinction might be lost if a "RSS"
> > program can be loaded into the NIC to select queues but for existing
> > hardware the distinction is there.
>
> To do BPF RSS we need a way to select the queue which I think is all
> Jesper wanted. So we will have to tackle the queue selection at some
> point. The main obstacle with it for me is to define what queue
> selection means when program is not offloaded to HW... Implementing
> queue selection on HW side is trivial.
Yes, I do see the fallback problem when the program's "filter" demux
cannot be offloaded to hardware.
At first I thought it was a good idea to keep the "demux-filter" part of
the eBPF program, as the software fallback can still apply this filter
in SW and just mark the packets as not-zero-copy-safe. But when HW
offloading is not possible, packets can be delivered to every RX
queue, and SW would need to handle that, which is hard to keep
transparent.
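To make the fallback problem concrete, here is a small Python model (not
kernel code). It assumes a hypothetical demux_match() filter for UDP
port 53 traffic, an assumed count of 8 RX queues, and a made-up RSS
stand-in that spreads packets by source port; none of these names come
from an existing API.

```python
from typing import List, NamedTuple, Set


class Packet(NamedTuple):
    proto: str
    dst_ip: str
    dst_port: int
    src_port: int


NUM_RX_QUEUES = 8  # assumed queue count, for illustration only


def rss_queue(pkt: Packet) -> int:
    """Stand-in for the NIC's RSS hash: spread flows by source port."""
    return pkt.src_port % NUM_RX_QUEUES


def demux_match(pkt: Packet) -> bool:
    """Hypothetical "demux-filter" part of the program, run in software."""
    return (pkt.proto, pkt.dst_ip, pkt.dst_port) == ("udp", "192.168.254.1", 53)


def queues_with_matches(packets: List[Packet]) -> Set[int]:
    """RX queues on which the software filter sees matching packets."""
    return {rss_queue(p) for p in packets if demux_match(p)}


# Four DNS flows that differ only in their source port.
flows = [Packet("udp", "192.168.254.1", 53, sport)
         for sport in (1000, 1001, 1002, 1003)]
hit_queues = queues_with_matches(flows)
```

In this model the matching flows land on four different queues, so no
single queue can be dedicated to the application and every queue has to
run the filter.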
> > If you demux using a eBPF program or via a filter model like
> > flow_director or cls_{u32|flower} I think we can support both. And this
> > just depends on the programmability of the hardware. Note flow_director
> > and cls_{u32|flower} steering to VFs is already in place.
Maybe we should keep HW demuxing as a separate setup step.
Today I can almost do what I want: by setting up ntuple filters and (if
Alexei allows it) assigning an application-specific XDP eBPF program to
a specific RX queue.
  ethtool -K eth2 ntuple on
  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
Then the XDP program can be attached to RX queue 42, and
promise/guarantee that it will consume all packets. And then the
backing page-pool can allow zero-copy RX (and enable scrubbing when
refilling the pool).
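The setup above can be sketched as a Python model (again not kernel
code, and not an existing API): an ntuple-style rule table steers
matching packets to queue 42, and a per-queue table decides which
program runs. NtupleRule, steer(), queue_progs and receive() are
hypothetical names. The model also assumes that RSS only spreads
unmatched traffic over queues 0..41, so nothing else reaches queue 42.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    proto: str
    dst_ip: str
    dst_port: int
    src_port: int


class NtupleRule(NamedTuple):
    proto: str
    dst_ip: str
    dst_port: int
    action: int  # target RX queue, as in "action 42"


DEDICATED_QUEUE = 42
RULES = [NtupleRule("udp", "192.168.254.1", 53, DEDICATED_QUEUE)]


def steer(pkt: Packet, rules: List[NtupleRule]) -> int:
    """Pick the RX queue: first matching rule wins, else RSS stand-in."""
    for rule in rules:
        if (pkt.proto, pkt.dst_ip, pkt.dst_port) == (
                rule.proto, rule.dst_ip, rule.dst_port):
            return rule.action
    # Assumption: RSS is restricted to queues below the dedicated one.
    return pkt.src_port % DEDICATED_QUEUE


def dns_prog(pkt: Packet) -> str:
    """Hypothetical per-queue program that consumes every packet it sees."""
    return "consumed"


# Only the dedicated queue has a program attached in this model.
queue_progs: Dict[int, Callable[[Packet], str]] = {DEDICATED_QUEUE: dns_prog}


def receive(pkt: Packet) -> Tuple[int, str]:
    """Return (rx_queue, verdict) for one packet."""
    queue = steer(pkt, RULES)
    prog = queue_progs.get(queue)
    verdict = prog(pkt) if prog else "pass-to-stack"
    return queue, verdict
```

Because every packet on queue 42 is known to match the rule and is
consumed by the attached program, that queue's pages never reach the
normal stack, which is the property the zero-copy page-pool relies on.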
> Yes, for steering to VFs we could potentially reuse a lot of existing
> infrastructure.
>
> > The question I have is should the "filter" part of the eBPF program
> > be a separate program from the XDP program and loaded using specific
> > semantics (e.g. "load_hardware_demux" ndo op) at the risk of building
> > a ever growing set of "ndo" ops. If you are running multiple XDP
> > programs on the same NIC hardware then I think this actually makes
> > sense otherwise how would the hardware and even software find the
> > "demux" logic. In this model there is a "demux" program that selects
> > a queue/VF and a program that runs on the netdev queues.
>
> I don't think we should enforce the separation here. What we may want
> to do before forwarding to the VF can be much more complicated than
> pure demux/filtering (simple eg - pop VLAN/tunnel). VF representative
> model works well here as fallback - if program could not be offloaded
> it will be run on the host and "trombone" packets via VFR into the VF.
That is an interesting idea.
> If we have a chain of BPF programs we can order them in increasing
> level of complexity/features required and then HW could transparently
> offload the first parts - the easier ones - leaving more complex
> processing on the host.
I'll try to keep out of the discussion of how to structure the BPF
program, as it is outside my "area".
> This should probably be paired with some sort of "skip-sw" flag to let
> user space enforce the HW offload on the fast path part.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer