From: John Fastabend via iovisor-dev <iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
To: Jakub Kicinski
	<jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
	Jesper Dangaard Brouer
	<brouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org"
	<iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
	"Fastabend, John R"
	<john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>,
	Simon Horman
	<simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
	Rana Shahout <ranas-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Ari Saha <as754m-60p5jsuXm+c@public.gmane.org>
Subject: Re: XDP seeking input from NIC hardware vendors
Date: Fri, 8 Jul 2016 09:45:25 -0700
Message-ID: <577FD8A5.8020700@gmail.com>
In-Reply-To: <20160708170751.2eaae790@jkicinski-Precision-T1700>

On 16-07-08 09:07 AM, Jakub Kicinski wrote:
> On Fri, 8 Jul 2016 17:19:43 +0200, Jesper Dangaard Brouer wrote:
>> On Fri, 8 Jul 2016 14:44:53 +0100 Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:
>>> On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
>>>>> If the goal is to just separate XDP traffic from non-XDP traffic
>>>>> you could accomplish this with a combination of SR-IOV/macvlan to
>>>>> separate the device queues into multiple netdevs and then run XDP
>>>>> on just one of the netdevs. Then use flow director (ethtool) or
>>>>> 'tc cls_u32/flower' to steer traffic to the netdev. This is how
>>>>> we support multiple networking stacks on one device; by the way, it
>>>>> is called the bifurcated driver. It's not too far of a stretch to
>>>>> think we could offload some simple XDP programs to program the
>>>>> splitting of traffic instead of cls_u32/flower/flow_director and
>>>>> then you would have a stack of XDP programs. One running in
>>>>> hardware and a set running on the queues in software.      
>>>>
>>>>
>>>> the above sounds like a much better approach than Jesper's/my
>>>> prog_per_ring stuff.
>>>>
>>>> If we can split the nic via sriov and have dedicated netdev via VF
>>>> just for XDP that's way cleaner approach. I guess we won't need to
>>>> do xdp_rxqmask after all.    
>>>
>>> +1
>>>
>>> I was thinking about using eBPF to direct to NIC queues but concluded
>>> that doing a redirect to a VF is cleaner.  Especially if the PF driver
>>> supports VF representatives we could potentially just use
>>> bpf_redirect(VFR netdev) and the VF doesn't even have to be handled by
>>> the same stack.  
>>
>> I actually disagree.
>>
>> I _do_ want to use the "filter" part of eBPF to direct to NIC queues, and
>> then run a single/specific XDP program on that queue.
>>
>> Why do I want this?
>>
>> This is part of solving a very fundamental CS problem (early demux)
>> when wanting to support zero-copy on RX.  The basic problem is that
>> the NIC driver needs to map RX pages into the RX ring prior to
>> receiving packets. Thus, we need HW support to steer packets, to gain
>> enough isolation (e.g. between tenant domains) to allow zero-copy.
>>
>>
>> Based on the flexibility of the HW filter, the granularity achievable
>> for isolation (e.g. application specific) is much more flexible than
>> splitting up the entire NIC with SR-IOV, VFs or macvlans.
> 
> I think of SR-IOV VFs as a way of grouping queues.  If HW is capable of
> directing to a queue it's usually capable of directing to a VF as well.
> And the VF could have all other traffic disabled so you would get only
> packets directed to it by the (BPF) filter - same as you would for the
> queue.  Does that make sense for zero copy apps?
> 

The only distinction between VFs and queue groupings on my side is that
VFs provide RSS, whereas queue groupings have to be selected explicitly.
In a programmable NIC world the distinction might be lost if an "RSS"
program can be loaded into the NIC to select queues, but for existing
hardware the distinction is there.
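
To make that distinction concrete, here is a rough userspace sketch in
plain C. It is not driver or eBPF code. Every name in it (flow_key,
toy_flow_hash, vf_rss_queue, explicit_queue, VF_NUM_QUEUES) is made up
for illustration, and the hash is only a stand-in; real NICs typically
use a Toeplitz hash with an indirection table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF_NUM_QUEUES 4u /* hypothetical number of queues inside one VF */

/* Hypothetical flow tuple used only for this sketch. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Stand-in hash: FNV-style mixing over 32-bit words (NOT Toeplitz). */
static uint32_t toy_flow_hash(const struct flow_key *k)
{
	uint32_t words[3] = {
		k->saddr,
		k->daddr,
		((uint32_t)k->sport << 16) | k->dport,
	};
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* VF case: the filter only picks the VF; RSS then spreads flows
 * across the VF's queues by hashing the flow tuple. */
static unsigned int vf_rss_queue(const struct flow_key *k)
{
	assert(k != NULL);
	return toy_flow_hash(k) % VF_NUM_QUEUES;
}

/* Queue-grouping case: the filter rule itself names the queue. */
static unsigned int explicit_queue(const struct flow_key *k,
				   uint16_t match_dport,
				   unsigned int match_queue,
				   unsigned int default_queue)
{
	return k->dport == match_dport ? match_queue : default_queue;
}
```

With a VF the filter author never names a queue; with a queue grouping
every rule has to.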

If you demux using an eBPF program or via a filter model like
flow_director or cls_{u32|flower}, I think we can support both; which
one you get just depends on the programmability of the hardware. Note
that flow_director and cls_{u32|flower} steering to VFs is already in
place.

The question I have is: should the "filter" part of the eBPF program
be a separate program from the XDP program, loaded using specific
semantics (e.g. a "load_hardware_demux" ndo op), at the risk of building
an ever-growing set of "ndo" ops? If you are running multiple XDP
programs on the same NIC hardware then I think this actually makes
sense; otherwise how would the hardware, and even software, find the
"demux" logic? In this model there is a "demux" program that selects
a queue/VF and a program that runs on the netdev queues.
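
A rough sketch of that two-program model, again as plain userspace C
rather than kernel code. All names here (toy_netdev, toy_rx,
demux_by_port, drop_small, the verdict enum) are hypothetical and do
not correspond to any existing ndo op or XDP API. The only point is
that the demux stage is attached separately from the per-queue
programs, so both hardware and software know where to find it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES 4u /* hypothetical queue count */

enum verdict { VERDICT_DROP, VERDICT_PASS };

/* Hypothetical, heavily simplified packet descriptor. */
struct pkt {
	uint16_t dport;
	uint32_t len;
};

typedef unsigned int (*demux_fn)(const struct pkt *);
typedef enum verdict (*queue_prog_fn)(const struct pkt *);

struct toy_netdev {
	/* Stage 1: the "demux" program, the part that would be offloaded. */
	demux_fn demux;
	/* Stage 2: one optional program per queue; NULL means no program. */
	queue_prog_fn queue_prog[NUM_QUEUES];
};

/* Example demux: steer DNS traffic to queue 1, everything else to 0. */
static unsigned int demux_by_port(const struct pkt *p)
{
	return p->dport == 53 ? 1u : 0u;
}

/* Example per-queue program: drop runt packets. */
static enum verdict drop_small(const struct pkt *p)
{
	return p->len < 64 ? VERDICT_DROP : VERDICT_PASS;
}

/* Receive path: run the demux first, then the selected queue's program. */
static enum verdict toy_rx(const struct toy_netdev *dev,
			   const struct pkt *p, unsigned int *queue_out)
{
	unsigned int q = dev->demux ? dev->demux(p) : 0u;

	assert(q < NUM_QUEUES);
	*queue_out = q;
	if (dev->queue_prog[q] == NULL)
		return VERDICT_PASS; /* no program on this queue */
	return dev->queue_prog[q](p);
}
```

Attaching drop_small to queue 1 only means a small packet to port 53 is
dropped there, while the same packet to port 80 lands on queue 0 and
passes untouched.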

Any thoughts?

.John
