From: John Fastabend via iovisor-dev <iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
To: Jakub Kicinski
	<jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
	Jesper Dangaard Brouer
	<brouer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: "netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org"
	<iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
	"Fastabend,
	John R"
	<john.r.fastabend-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>,
	Simon Horman
	<simon.horman-wFxRvT7yatFl57MIdRCFDg@public.gmane.org>,
	Rana Shahout <ranas-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	Ari Saha <as754m-60p5jsuXm+c@public.gmane.org>
Subject: Re: XDP seeking input from NIC hardware vendors
Date: Fri, 8 Jul 2016 09:45:25 -0700
Message-ID: <577FD8A5.8020700@gmail.com>
In-Reply-To: <20160708170751.2eaae790@jkicinski-Precision-T1700>

On 16-07-08 09:07 AM, Jakub Kicinski wrote:
> On Fri, 8 Jul 2016 17:19:43 +0200, Jesper Dangaard Brouer wrote:
>> On Fri, 8 Jul 2016 14:44:53 +0100 Jakub Kicinski <jakub.kicinski-wFxRvT7yatFl57MIdRCFDg@public.gmane.org> wrote:
>>> On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
>>>>> If the goal is to just separate XDP traffic from non-XDP traffic
>>>>> you could accomplish this with a combination of SR-IOV/macvlan to
>>>>> separate the device queues into multiple netdevs and then run XDP
>>>>> on just one of the netdevs. Then use flow director (ethtool) or
>>>>> 'tc cls_u32/flower' to steer traffic to the netdev. This is how
>>>>> we support multiple networking stacks on one device; by the way, it
>>>>> is called the bifurcated driver. It's not too far of a stretch to
>>>>> think we could offload some simple XDP programs to program the
>>>>> splitting of traffic instead of cls_u32/flower/flow_director and
>>>>> then you would have a stack of XDP programs. One running in
>>>>> hardware and a set running on the queues in software.      
>>>>
>>>>
>>>> the above sounds like a much better approach than Jesper/mine
>>>> prog_per_ring stuff.
>>>>
>>>> If we can split the nic via sriov and have a dedicated netdev via VF
>>>> just for XDP that's a way cleaner approach. I guess we won't need to
>>>> do xdp_rxqmask after all.
>>>
>>> +1
>>>
>>> I was thinking about using eBPF to direct to NIC queues but concluded
>>> that doing a redirect to a VF is cleaner.  Especially if the PF driver
>>> supports VF representatives we could potentially just use
>>> bpf_redirect(VFR netdev) and the VF doesn't even have to be handled by
>>> the same stack.  
>>
>> I actually disagree.
>>
>> I _do_ want to use the "filter" part of eBPF to direct to NIC queues, and
>> then run a single/specific XDP program on that queue.
>>
>> Why do I want this?
>>
>> This is part of solving a very fundamental CS problem (early demux), when
>> wanting to support zero-copy on RX.  The basic problem is that the NIC
>> driver needs to map RX pages into the RX ring prior to receiving
>> packets. Thus, we need HW support to steer packets, to gain enough
>> isolation (e.g. between tenant domains) to allow zero-copy.
>>
>>
>> Based on the flexibility of the HW filter, the granularity achievable
>> for isolation (e.g. application specific) is much more flexible than
>> splitting up the entire NIC with SR-IOV, VFs or macvlans.
> 
> I think of SR-IOV VFs as a way of grouping queues.  If HW is capable of
> directing to a queue it's usually capable of directing to a VF as well.
> And the VF could have all other traffic disabled so you would get only
> packets directed to it by the (BPF) filter - same as you would for the
> queue.  Does that make sense for zero copy apps?
> 

The only distinction between VFs and queue groupings on my side is that
VFs provide RSS, whereas queue groupings have to be selected explicitly.
In a programmable NIC world the distinction might be lost if an "RSS"
program can be loaded into the NIC to select queues, but for existing
hardware the distinction is there.
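
To make the RSS versus explicit-selection distinction concrete, here is a
rough userspace sketch, not driver code. Everything in it is an assumption
for illustration: the function names are hypothetical, a flow is modelled
as a plain 5-tuple, and zlib.crc32 merely stands in for whatever hash the
hardware uses for RSS.

```python
import zlib

# A flow is modelled as (src_ip, dst_ip, proto, sport, dport).

def rss_select(flow, queues):
    """Hash-based spreading: a given flow always maps to the same queue.

    zlib.crc32 is only a stand-in for the hardware RSS hash.
    """
    key = "|".join(str(field) for field in flow).encode()
    return queues[zlib.crc32(key) % len(queues)]


def explicit_select(flow, rules, default_queue):
    """Explicit steering: the first matching rule names the queue.

    Each rule is (match, queue); a None field in match is a wildcard.
    """
    for match, queue in rules:
        if all(have == want for have, want in zip(flow, match)
               if want is not None):
            return queue
    return default_queue


# Hypothetical example: TCP port 80 is steered to queue 3, the rest to 0.
web_rules = [((None, None, 6, None, 80), 3)]
```

With RSS the caller never names a queue; with explicit steering every
queue that should receive traffic needs a rule that points at it.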

If you demux using an eBPF program or via a filter model like
flow_director or cls_{u32|flower}, I think we can support both. This
just depends on the programmability of the hardware. Note that steering
to VFs with flow_director and cls_{u32|flower} is already in place.

The question I have is whether the "filter" part of the eBPF program
should be a separate program from the XDP program, loaded using specific
semantics (e.g. a "load_hardware_demux" ndo op), at the risk of building
an ever-growing set of "ndo" ops. If you are running multiple XDP
programs on the same NIC hardware then I think this actually makes
sense; otherwise, how would the hardware, and even software, find the
"demux" logic? In this model there is a "demux" program that selects
a queue/VF and a program that runs on the netdev queues.
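
The two-stage model can be sketched as a toy userspace simulation. This
is only an illustration under stated assumptions: ModelNic, load_demux
and attach_queue_prog are hypothetical names (load_demux loosely mirrors
the "load_hardware_demux" idea), packets are plain dicts, and no real
kernel or driver API is involved.

```python
from typing import Callable, Dict, Optional, Tuple

Packet = Dict[str, int]
DROP, PASS = "drop", "pass"  # illustrative verdict labels


class ModelNic:
    """Toy NIC: one demux program, plus an optional program per queue."""

    def __init__(self, num_queues: int) -> None:
        self.num_queues = num_queues
        self.demux: Optional[Callable[[Packet], int]] = None
        self.queue_progs: Dict[int, Callable[[Packet], str]] = {}

    def load_demux(self, prog: Callable[[Packet], int]) -> None:
        # Stage 1: the program that picks a queue (or VF) per packet.
        self.demux = prog

    def attach_queue_prog(self, queue: int,
                          prog: Callable[[Packet], str]) -> None:
        # Stage 2: the program that runs on one specific queue.
        if not 0 <= queue < self.num_queues:
            raise ValueError("no such queue: %d" % queue)
        self.queue_progs[queue] = prog

    def receive(self, pkt: Packet) -> Tuple[int, str]:
        queue = self.demux(pkt) if self.demux else 0
        if not 0 <= queue < self.num_queues:
            queue = 0  # out-of-range selections fall back to queue 0
        prog = self.queue_progs.get(queue)
        # Queues without a program hand the packet to the normal stack.
        return queue, (prog(pkt) if prog else PASS)


def steer_udp_4789(pkt: Packet) -> int:
    """Hypothetical demux: UDP port 4789 goes to queue 1, the rest to 0."""
    return 1 if pkt.get("proto") == 17 and pkt.get("dport") == 4789 else 0


def drop_all(pkt: Packet) -> str:
    """Hypothetical per-queue program that drops everything it sees."""
    return DROP


nic = ModelNic(num_queues=2)
nic.load_demux(steer_udp_4789)
nic.attach_queue_prog(1, drop_all)
```

Keeping the demux as its own object is what lets both the hardware and
the software path find it independently of the per-queue programs.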

Any thoughts?

.John
