From: Thomas Graf <tgraf@suug.ch>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Simon Horman <simon.horman@netronome.com>,
netdev@vger.kernel.org, davem@davemloft.net,
nhorman@tuxdriver.com, andy@greyhouse.net, dborkman@redhat.com,
ogerlitz@mellanox.com, jesse@nicira.com, jpettit@nicira.com,
joestringer@nicira.com, john.r.fastabend@intel.com,
jhs@mojatatu.com, sfeldma@gmail.com, f.fainelli@gmail.com,
roopa@cumulusnetworks.com, linville@tuxdriver.com,
shrijeet@gmail.com, gospo@cumulusnetworks.com, bcrl@kvack.org
Subject: Re: Flows! Offload them.
Date: Thu, 26 Feb 2015 13:33:26 +0000
Message-ID: <20150226133326.GC23050@casper.infradead.org>
In-Reply-To: <20150226091628.GA4059@nanopsycho.orion>
On 02/26/15 at 10:16am, Jiri Pirko wrote:
> Well, on netdev01, I believe that a consensus was reached that for every
> switch offloaded functionality there has to be an implementation in
> kernel.
Agreed. This should not prevent the policy being driven from user
space though.
> What John's Flow API originally did was to provide a way to
> configure hardware independently of kernel. So the right way is to
> configure kernel and, if hw allows it, to offload the configuration to hw.
>
> In this case, seems to me logical to offload from one place, that being
> TC. The reason is, as I stated above, the possible conversion from OVS
> datapath to TC.
Offloading of TC definitely makes a lot of sense. I think that even in
that case you will already encounter independent configuration of
hardware and kernel. Example: The hardware provides a fixed, generic
function to push up to n bytes onto a packet. This hardware function
could be used to implement TC actions "push_vlan", "push_vxlan",
"push_mpls". I think you would agree that TC should make use
of such a function even if the hardware version is different from the
software version. So I don't think we'll have a 1:1 mapping for all
configurations, regardless of whether the "how" is decided in the kernel
or in user space.
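To make the push example concrete, here is a minimal C sketch. It assumes a
hypothetical fixed hardware primitive, hw_push_bytes(), that inserts up to
HW_PUSH_MAX bytes at a given offset. act_push_vlan() and act_push_vxlan() are
illustrative names, not the real TC actions or any driver API. They only show
how two different software actions could map onto the same hardware function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a fixed hardware primitive that can insert up to
 * HW_PUSH_MAX bytes at a given offset of a packet. All names here are
 * illustrative; no real driver or TC API is implied. */
#define HW_PUSH_MAX 64
#define PKT_BUF_SIZE 256

struct pkt {
	uint8_t buf[PKT_BUF_SIZE];
	size_t head; /* offset of the first valid byte (= free headroom) */
	size_t len;  /* number of valid bytes */
};

/* Copy 'len' bytes of frame data into 'p', leaving 'headroom' bytes free. */
static int pkt_init(struct pkt *p, const uint8_t *data, size_t len,
		    size_t headroom)
{
	if (headroom + len > PKT_BUF_SIZE)
		return -1;
	p->head = headroom;
	p->len = len;
	memcpy(p->buf + headroom, data, len);
	return 0;
}

/* The single generic primitive: insert 'n' bytes of 'hdr' at offset 'off'. */
static int hw_push_bytes(struct pkt *p, size_t off, const uint8_t *hdr,
			 size_t n)
{
	if (n > HW_PUSH_MAX || n > p->head || off > p->len)
		return -1;
	/* Slide the first 'off' bytes into the headroom to open an n-byte gap. */
	memmove(p->buf + p->head - n, p->buf + p->head, off);
	p->head -= n;
	memcpy(p->buf + p->head + off, hdr, n);
	p->len += n;
	assert(p->head + p->len <= PKT_BUF_SIZE);
	return 0;
}

/* "push_vlan": a 4-byte 802.1Q tag goes right after the two MAC addresses. */
static int act_push_vlan(struct pkt *p, uint16_t vid)
{
	const uint8_t tag[4] = { 0x81, 0x00, (uint8_t)((vid >> 8) & 0x0f),
				 (uint8_t)(vid & 0xff) };
	return hw_push_bytes(p, 12, tag, sizeof(tag));
}

/* "push_vxlan": a prebuilt outer Ethernet/IPv4/UDP/VXLAN header
 * (14 + 20 + 8 + 8 = 50 bytes) goes in front of the whole frame. */
#define VXLAN_OUTER_LEN 50
static int act_push_vxlan(struct pkt *p, const uint8_t outer[VXLAN_OUTER_LEN])
{
	return hw_push_bytes(p, 0, outer, VXLAN_OUTER_LEN);
}
```

A "push_mpls" built the same way would plausibly insert 4 bytes at offset 14,
but it would also have to rewrite the EtherType (to 0x8847). That is exactly
the kind of place where the hardware and software versions stop being 1:1.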
My primary concern with allowing *only* the kernel to decide how to
program the hardware is the lack of context. A given L3/L4 software
pipeline in the Linux kernel consists of various subsystems: tc
ingress, Linux bridge, various iptables chains, routing rules, routing
tables, tc egress, etc. All of them can be stacked in almost unlimited
combinations using virtual software devices and segmented using
net namespaces.
Given this complexity we'll most likely have to solve some of it with
a flag to control offloading (as already introduced for bridging) and
allow the user to shoot himself in the foot (as Jamal and others
pointed out a couple of times). I currently don't see how the kernel
could *always* get it right automatically. We need some additional
input from the user (see also Patrick's comments regarding iptables
offload).
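A rough C sketch of such a per-rule offload flag follows. All names
(enum offload_policy, rule_install(), the driver callback type hw_install_fn)
are hypothetical; this does not describe the existing bridge flag or any real
tc option. The software copy of the rule is assumed to be installed by the
caller, in line with every offloaded feature having a kernel implementation.
The flag only decides what happens on the hardware side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-supplied offload policy attached to a single rule. */
enum offload_policy {
	OFFLOAD_NONE,    /* keep the rule in software only */
	OFFLOAD_PREFER,  /* try hardware, silently stay software-only on failure */
	OFFLOAD_REQUIRE, /* report an error if hardware cannot take the rule */
};

struct rule {
	int id;
	enum offload_policy policy;
	bool in_hw; /* set when the hardware accepted the rule */
};

/* Stand-in for a driver callback; returns true if the hardware accepted it. */
typedef bool (*hw_install_fn)(const struct rule *r);

/* Decide the hardware part of an install based on the user's flag.
 * Returns -1 only when offload was required and the hardware refused. */
static int rule_install(struct rule *r, hw_install_fn hw_install)
{
	assert(r != NULL && hw_install != NULL);
	r->in_hw = false;
	if (r->policy == OFFLOAD_NONE)
		return 0;
	if (hw_install(r)) {
		r->in_hw = true;
		return 0;
	}
	return r->policy == OFFLOAD_REQUIRE ? -1 : 0;
}
```

The point of the sketch is that the placement decision comes from explicit
user input rather than being inferred by the kernel.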
However, for certain datacenter server use cases we actually have the
full user intent in user space as we configure all of the kernel
subsystems from a single central management agent running locally
on the server (OpenStack, Kubernetes, Mesos, ...), i.e. we do know
exactly what the user wants on the system as a whole. This intent is
then split into small configuration pieces to configure iptables, tc,
routes on multiple net namespaces (for example to implement VRF).
E.g. a VRF in software would make use of net namespaces which hold
tenant-specific ACLs, routes and QoS settings. A separate action
would forward packets to the namespace. Easy and straightforward in
software. OTOH, the hardware, capable of implementing the ACLs,
would also need to know about the tc action which selected the
namespace when attempting to offload the ACL, as it would otherwise
apply the ACLs to the wrong packets.
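The VRF case can be modelled in a few lines of C. This is a hypothetical
sketch: a tc-like selector (struct ns_select) maps the ingress VLAN to a tenant
namespace, and each namespace has its own ACL (struct acl_rule). When the ACL is
flattened into a single hardware table, the selector has to become part of the
match key. Calling hw_drops() with with_context set to false shows what happens
when the ACL is offloaded without knowing which action selected the namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata seen by the hardware. */
struct pkt_meta { uint16_t vlan; uint32_t dst_ip; };

/* Software view: a tc-like action sends VLAN 'vlan' into namespace 'ns_id',
 * and the namespace holds its own ACL rule. */
struct ns_select { uint16_t vlan; int ns_id; };
struct acl_rule  { uint32_t dst_ip; bool drop; };

/* Flattened hardware entry: the namespace selector is part of the key. */
struct hw_acl_entry { uint16_t vlan; uint32_t dst_ip; bool drop; };

static struct hw_acl_entry flatten(const struct ns_select *sel,
				   const struct acl_rule *acl)
{
	struct hw_acl_entry e = { sel->vlan, acl->dst_ip, acl->drop };
	return e;
}

/* Linear lookup over flattened entries; first match wins. If 'with_context'
 * is false the VLAN part of the key is ignored, modelling an ACL that was
 * offloaded without the namespace-selecting action. */
static bool hw_drops(const struct hw_acl_entry *tbl, size_t n,
		     const struct pkt_meta *m, bool with_context)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dst_ip != m->dst_ip)
			continue;
		if (with_context && tbl[i].vlan != m->vlan)
			continue;
		return tbl[i].drop;
	}
	return false;
}
```

With tenant A on VLAN 10 dropping traffic to 10.0.0.5 and tenant B on VLAN 20
having no ACL at all, a tenant B packet to 10.0.0.5 passes when the VLAN is in
the key, but is dropped once the selector context is lost.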
I would love to have the possibility to make use of that rich intent
available in user space to program the hardware in combination with
configuring the kernel.
Would love to hear your thoughts on this. I think we all share the same
goal which is to have in-kernel drivers for chips which can perform
advanced switching and support it natively with Linux and have it
become the de-facto standard for both hardware switch management and
compute servers.