From: John Fastabend <john.fastabend@gmail.com>
To: Tom Herbert <tom@herbertland.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: "John W. Linville" <linville@tuxdriver.com>,
	Jesse Gross <jesse@kernel.org>,
	David Miller <davem@davemloft.net>,
	Anjali Singhai Jain <anjali.singhai@intel.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Kiran Patil <kiran.patil@intel.com>
Subject: Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
Date: Fri, 04 Dec 2015 11:54:01 -0800
Message-ID: <5661EF59.9040606@gmail.com>
In-Reply-To: <CALx6S342OMLB6s7LcmnZP1rnYNUs345Dy1gjYX3JUKqqB_waPA@mail.gmail.com>

[...]

>>>> Please provide a sketch for a protocol generic api that can tell
>>>> hardware where an inner protocol header starts that supports vxlan,
>>>> vxlan-gpe, geneve and ipv6 extension headers and knows which protocol is
>>>> starting at that point.
>>>>
>>> BPF. Implementing protocol generic offloads is not just a HW concern
>>> either; adding kernel GRO code for every possible protocol that comes
>>> along doesn't scale well. This becomes especially obvious when we
>>> consider how to provide offloads for application protocols. If the
>>> kernel provides a programmable framework for the offloads then
>>> application protocols, such as QUIC, could use that without
>>> needing to hack the kernel to support the specific protocol (which no
>>> one wants!). Application protocol parsing in KCM and some other use
>>> cases of BPF have already foreshadowed this, and we are working on a
>>> prototype for a BPF programmable engine in the kernel. Presumably,
>>> this same model could eventually be applied as the HW API to
>>> programmable offload.
>>
>> So your proposal is like this:
>>
>> dev->ops->ndo_add_offload(struct net_device *, struct bpf_prog *) ?
>>
>> What do network cards do when they don't support bpf in hardware, as
>> is currently the case for all cards? Should they do program equivalence
>> testing on the bpf program to check if it conforms to some of its
>> offload capabilities and activate those for the port they parsed out of
>> the bpf program? I don't
>> really care about more function pointers in struct net_device_ops
>> because it really doesn't matter but what really concerns me is the huge
>> size of the drivers in the kernel. Just tell the driver specifically
>> what is wanted and let them do that. Don't force them to do program
>> inspection or anything.
>>
> Nobody is forcing anyone to do anything. If someone implements generic
> offload like this it's treated just like any other optional feature of
> a NIC.
> 

My concern with this approach is that it seems to imply one of two
things: either you have a BPF engine in hardware (via FPGA or NPU), or
you transform the BPF program into device registers, possibly by
building the control flow graph and mapping that onto a parse graph.
Maybe this could be done in library code for drivers to use, but it
seems unnecessary to me when we could make an API that maps directly
to this class of hardware.

Note that I think ndo_add_offload is really useful and needed for
one class of devices, but it misses the mark slightly for a large class
of devices we have today or will have tomorrow.

>> About your argument regarding GRO for every possible protocol:
>>
>> Adding GRO for QUIC or SPUD transparently does not work as it breaks the
>> semantics of UDP. UDP is a framed protocol not a streamed one so it does
>> not make sense to add that. You can implement GRO for fragmented UDP,
>> though. The length of the packet is end-to-end information. If you add a
>> new protocol with a new socket type, sure you can add GRO engine
>> transparently for that but not simply peeking data inside UDP if you
>> don't know how the local application uses this data. In case of
>> forwarding you can never do that, it will break the internet actually.
>> In case you are the end host GRO engine can ask the socket what type it
>> is or what framing inside UDP is used. Thus this cannot work on hardware
>> either.
>>
> This is not correct. We already have many instances of GRO being used
> over UDP in several UDP encapsulations, there is no issue with
> breaking UDP semantics. QUIC is a stream based transport like TCP so
> it will fit into the model (granted the fact that this is coming from
> userspace and the per packet security will make it a little more
> challenging to implement offload). I don't know if this is needed, but
> I can only assume that server performance in QUIC must be miserable if
> all the I/O is 1350 bytes.
> 
>> I am not very happy with the use cases of BPF outside of tracing and
>> cls_bpf and packet steering.
>>
>> Please don't propose that we should use BPF as the API for HW
>> programmable offloading currently. It does not make sense.
>>
> If you have an alternative, please propose it now.

My proposal is still the Flow API I proposed back in Feb. It maps
well to at least the segment of hardware that exists today and/or will
exist in the very near future, and it requires less mangling by the
driver, kernel, etc.

As a reminder here are the operations I proposed,

On the read-only side for parse graphs,

 get_hdrs : returns a list of header types supported
 get_hdr_graph : returns a parse graph of the header types
 get_actions : returns a list of actions the device supports on nodes
	       in the above graph.
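
As a rough illustration only, the three read-only operations could be
modelled in plain userspace C as below. Everything here is an
assumption made for the sketch: the names hdr_id, action_id, hdr_edge,
flow_ro_ops and the demo_* tables are hypothetical and are not the
signatures from the Flow API proposal or from any kernel header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header identifiers; not taken from any kernel header. */
enum hdr_id { HDR_ETH, HDR_IPV4, HDR_UDP, HDR_VXLAN, HDR_COUNT };

/* Hypothetical actions a device might apply at a parse graph node. */
enum action_id { ACT_RSS_HASH, ACT_QUEUE_MAP, ACT_COUNT };

/* One edge: at header 'from', if the selector equals 'key', 'to' follows. */
struct hdr_edge {
	enum hdr_id from;
	enum hdr_id to;
	unsigned int key;
};

/* Assumed shape of the read-only ops a driver would fill in. */
struct flow_ro_ops {
	size_t (*get_hdrs)(enum hdr_id *out, size_t max);
	size_t (*get_hdr_graph)(struct hdr_edge *out, size_t max);
	size_t (*get_actions)(enum hdr_id node, enum action_id *out, size_t max);
};

/* Fixed tables for an imaginary NIC parsing Ethernet/IPv4/UDP/VXLAN. */
static const enum hdr_id demo_hdrs[] = { HDR_ETH, HDR_IPV4, HDR_UDP, HDR_VXLAN };
static const struct hdr_edge demo_graph[] = {
	{ HDR_ETH,  HDR_IPV4,  0x0800 }, /* ethertype IPv4 */
	{ HDR_IPV4, HDR_UDP,   17 },     /* IP protocol UDP */
	{ HDR_UDP,  HDR_VXLAN, 4789 },   /* IANA-assigned VXLAN port */
};

/* Copy up to 'max' supported header types; return how many were copied. */
static size_t demo_get_hdrs(enum hdr_id *out, size_t max)
{
	size_t n = sizeof(demo_hdrs) / sizeof(demo_hdrs[0]);
	size_t i;

	assert(out != NULL);
	for (i = 0; i < n && i < max; i++)
		out[i] = demo_hdrs[i];
	return i;
}

/* Copy up to 'max' parse graph edges; return how many were copied. */
static size_t demo_get_hdr_graph(struct hdr_edge *out, size_t max)
{
	size_t n = sizeof(demo_graph) / sizeof(demo_graph[0]);
	size_t i;

	assert(out != NULL);
	for (i = 0; i < n && i < max; i++)
		out[i] = demo_graph[i];
	return i;
}

/* Every node can feed RSS; only the VXLAN node can pick a queue. */
static size_t demo_get_actions(enum hdr_id node, enum action_id *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	if (n < max)
		out[n++] = ACT_RSS_HASH;
	if (node == HDR_VXLAN && n < max)
		out[n++] = ACT_QUEUE_MAP;
	return n;
}

static const struct flow_ro_ops demo_ops = {
	.get_hdrs = demo_get_hdrs,
	.get_hdr_graph = demo_get_hdr_graph,
	.get_actions = demo_get_actions,
};
```

A caller would query demo_ops.get_hdrs() and demo_ops.get_hdr_graph()
to discover what the device can parse before deciding which actions to
request on which node.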

Then I also proposed some operations for reading out table formats
but I think you could ignore that for the time being if your main
concern is parsing headers for queue mappings, RSS, etc. These were
more about building pipelines of operations. For completeness the
operations were get_tbls and get_tbl_graph.

Further, there could be write ops as well. I didn't propose them in
the talk (a) because the hardware wasn't ready and (b) because rocker,
which was my prototype vehicle at the time, could not support them.
Examples would be,

  set_hdrs : push a list of header types to support
  set_hdr_graph : push a parse graph to support
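
A minimal sketch of how a driver might validate such write ops,
assuming a device with small fixed limits. The names parser_state,
demo_set_hdrs, demo_set_hdr_graph and the PARSER_MAX_* limits are
hypothetical and exist only for this illustration; header types are
reduced to small integer ids.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PARSER_MAX_HDRS  8   /* assumed hardware limit on header types */
#define PARSER_MAX_EDGES 16  /* assumed hardware limit on graph edges */

/* One edge: at header 'from', if the selector equals 'key', 'to' follows. */
struct hdr_edge {
	unsigned int from;
	unsigned int to;
	unsigned int key;
};

/* Software copy of what has been pushed to the imaginary device. */
struct parser_state {
	unsigned int hdr_mask;                   /* bit i set: header id i loaded */
	size_t nedges;
	struct hdr_edge edges[PARSER_MAX_EDGES];
};

/* Push a list of header types; rejects ids the device cannot hold. */
static int demo_set_hdrs(struct parser_state *ps, const unsigned int *ids, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	assert(ps != NULL && ids != NULL);
	for (i = 0; i < n; i++) {
		if (ids[i] >= PARSER_MAX_HDRS)
			return -EINVAL;
		mask |= 1u << ids[i];
	}
	ps->hdr_mask = mask;
	ps->nedges = 0; /* a new header set invalidates the old graph */
	return 0;
}

/* Push a parse graph; every edge must connect already-loaded headers. */
static int demo_set_hdr_graph(struct parser_state *ps,
			      const struct hdr_edge *edges, size_t n)
{
	size_t i;

	assert(ps != NULL && edges != NULL);
	if (n > PARSER_MAX_EDGES)
		return -ENOSPC;
	for (i = 0; i < n; i++) {
		if (edges[i].from >= PARSER_MAX_HDRS ||
		    edges[i].to >= PARSER_MAX_HDRS)
			return -EINVAL;
		if (!(ps->hdr_mask & (1u << edges[i].from)) ||
		    !(ps->hdr_mask & (1u << edges[i].to)))
			return -EINVAL;
	}
	/* Validation passed: commit the whole graph at once. */
	memcpy(ps->edges, edges, n * sizeof(*edges));
	ps->nedges = n;
	return 0;
}
```

Validating the whole graph before committing it means a rejected push
leaves the previously loaded graph untouched, which is one reason to
keep the two operations separate.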

If you wanted, the hdrs and hdr_graph ops could be merged into a single
operation, but I found it easier to deal with two separate operations.

This would (I think at least) easily support NICs that don't have a
more general purpose engine like BPF or some other instruction set
but do support generic parsers. I think this is the trend that we will
see. It's a big jump to go from fixed logic to an instruction set; it's
much more manageable on the hardware side to go from fixed logic to
a generic parser engine.
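
To make "generic parser engine" concrete, here is a small table-driven
walker written as ordinary C. It is only a sketch under stated
assumptions: the node and edge tables, the names parse_node, parse_edge
and parse_walk, and the fixed 20-byte IPv4 header (no options) are all
made up for the example and do not describe any particular NIC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_ETH, N_IPV4, N_UDP, N_VXLAN, N_INNER, N_COUNT };

/* Per-header data: fixed length and where the next-protocol selector is. */
struct parse_node {
	size_t hdr_len; /* header length in bytes (assumed fixed) */
	size_t sel_off; /* selector offset within the header */
	size_t sel_len; /* selector width in bytes; 0 means "always key 0" */
};

struct parse_edge {
	int from;
	unsigned int key;
	int to;
};

static const struct parse_node nodes[N_COUNT] = {
	[N_ETH]   = { 14, 12, 2 }, /* ethertype */
	[N_IPV4]  = { 20,  9, 1 }, /* protocol field, no IP options assumed */
	[N_UDP]   = {  8,  2, 2 }, /* destination port */
	[N_VXLAN] = {  8,  0, 0 }, /* unconditional edge to inner frame */
	[N_INNER] = {  0,  0, 0 }, /* leaf: start of the inner header */
};

static const struct parse_edge edges[] = {
	{ N_ETH,   0x0800, N_IPV4 },
	{ N_IPV4,  17,     N_UDP },
	{ N_UDP,   4789,   N_VXLAN },
	{ N_VXLAN, 0,      N_INNER },
};

/*
 * Walk the graph from the outer Ethernet header. On reaching 'target',
 * store the byte offset where that header starts and return 0.
 * Return -1 if the packet is too short or no edge matches.
 */
static int parse_walk(const uint8_t *pkt, size_t len, int target, size_t *off_out)
{
	size_t off = 0, i, e;
	int node = N_ETH;
	int steps;

	assert(pkt != NULL && off_out != NULL);
	/* The graph is acyclic, so N_COUNT steps are always enough. */
	for (steps = 0; steps < N_COUNT; steps++) {
		const struct parse_node *n = &nodes[node];
		unsigned int key = 0;
		int next = -1;

		if (node == target) {
			*off_out = off;
			return 0;
		}
		if (n->hdr_len > len - off)
			return -1; /* truncated packet */
		/* Read the selector as a big-endian integer. */
		for (i = 0; i < n->sel_len; i++)
			key = (key << 8) | pkt[off + n->sel_off + i];
		for (e = 0; e < sizeof(edges) / sizeof(edges[0]); e++) {
			if (edges[e].from == node && edges[e].key == key) {
				next = edges[e].to;
				break;
			}
		}
		if (next < 0)
			return -1; /* no matching edge */
		off += n->hdr_len;
		node = next;
	}
	return -1;
}
```

Supporting a new encapsulation in this model means adding rows to the
two tables rather than new parsing logic, which is the property that
makes a parse graph a natural thing to push to this class of hardware.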

If we insist on BPF programs I don't see how to avoid converting the
BPF program to a CFG and mapping that onto a parse graph to support the
class of devices I am looking at supporting. Perhaps the argument is
that this isn't horrible to do, but I would ask that, if we go that
route, we do that mapping in the core kernel code and expose the above
ndo ops to the driver.

Maybe when I was talking at netconf/netdev0.1 I (others?) got hung
up on how the above ops related to switchdev or to offloading some
feature of the kernel. Specifically the set_flow piece, which lets
users push flow rules into the hardware similar to the ethtool ntuple
case, was controversial. But you don't necessarily need to include the
set_flow part to make it useful for loading parse graphs into the
hardware.

Thanks
.John