* [rfc] Merging the Open vSwitch datapath
@ 2010-08-30 6:27 Simon Horman
2010-08-30 6:52 ` Joe Perches
2010-08-30 17:22 ` Ben Pfaff
0 siblings, 2 replies; 31+ messages in thread
From: Simon Horman @ 2010-08-30 6:27 UTC (permalink / raw)
To: netdev
Cc: Jesse Gross, Stephen Hemminger, Chris Wright, Herbert Xu,
Arnd Bergmann, David Miller
Hi,
I am looking to submit the Open vSwitch datapath for merging into the
kernel. To this end I have posted a preliminary round of patches to the
dev@openvswitch.org mailing list (unfortunately the online archive seems
incomplete). It seems to me that I am now at or close to the point where
the patches can be posted to netdev. However, I have a few questions.
1) The current patches place the datapath in drivers/staging/ovs-datapath.
It has been proposed that net/openvswitch would be a better location.
What is the feeling of netdev on this?
In a similar vein, Open vSwitch has some headers that are shared
with user-space. Is include/net/openvswitch an acceptable location
for those headers?
2) The code could do with some cleaning up. The current todo list
includes the following items.
* Use pr_*
* Remove trailing whitespace and other formatting fixes
* Stephen Hemminger would like make_writable() removed in favour
of using skb_cow_data() or made generic.
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-August/002993.html
* Removal of compatibility code
* Possible use of netlink for user-space interface
* Network namespace support
The last two items seem to be post-merge material to me.
I am wondering if it is OK to handle the other items post-merge too.
As the person doing the merge, this would make my life a lot easier
but I'm unclear if it is an acceptable approach or not.
About Open vSwitch:
(Text by Jesse Gross)
Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments. In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and control through the OpenFlow protocol.
This control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the need
to maintain logical abstractions for multiple tenants.
The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 6:27 [rfc] Merging the Open vSwitch datapath Simon Horman
@ 2010-08-30 6:52 ` Joe Perches
2010-08-30 7:11 ` Simon Horman
2010-08-30 17:22 ` Ben Pfaff
1 sibling, 1 reply; 31+ messages in thread
From: Joe Perches @ 2010-08-30 6:52 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Jesse Gross, Stephen Hemminger, Chris Wright, Herbert Xu,
Arnd Bergmann, David Miller
On Mon, 2010-08-30 at 15:27 +0900, Simon Horman wrote:
> I am looking to submit the Open vSwitch datapath for merging into the
> kernel. To this end I have posted a preliminary round of patches to the
> dev@openvswitch.org mailing list
> 2) The code could do with some cleaning up. The current todo list
> includes the following items.
>
> * Use pr_*
> * Remove trailing whitespace and other formatting fixes
[]
These are trivial and easy to fix via script.
I think these should be done prior to being merged.
I've got patches if you want them.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 6:52 ` Joe Perches
@ 2010-08-30 7:11 ` Simon Horman
2010-08-30 7:25 ` Joe Perches
0 siblings, 1 reply; 31+ messages in thread
From: Simon Horman @ 2010-08-30 7:11 UTC (permalink / raw)
To: Joe Perches
Cc: netdev, Jesse Gross, Stephen Hemminger, Chris Wright, Herbert Xu,
Arnd Bergmann, David Miller
On Sun, Aug 29, 2010 at 11:52:05PM -0700, Joe Perches wrote:
> On Mon, 2010-08-30 at 15:27 +0900, Simon Horman wrote:
> > I am looking to submit the Open vSwitch datapath for merging into the
> > kernel. To this end I have posted a preliminary round of patches to the
> > dev@openvswitch.org mailing list
>
> > 2) The code could do with some cleaning up. The current todo list
> > includes the following items.
> >
> > * Use pr_*
> > * Remove trailing whitespace and other formatting fixes
>
> []
>
> These are trivial and easy to fix via script.
> I think these should be done prior to being merged.
>
> I've got patches if you want them.
Yes, please send them.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 7:11 ` Simon Horman
@ 2010-08-30 7:25 ` Joe Perches
2010-08-30 7:33 ` Simon Horman
0 siblings, 1 reply; 31+ messages in thread
From: Joe Perches @ 2010-08-30 7:25 UTC (permalink / raw)
To: Simon Horman
Cc: netdev, Jesse Gross, Stephen Hemminger, Chris Wright, Herbert Xu,
Arnd Bergmann, David Miller
On Mon, 2010-08-30 at 16:11 +0900, Simon Horman wrote:
> On Sun, Aug 29, 2010 at 11:52:05PM -0700, Joe Perches wrote:
> > On Mon, 2010-08-30 at 15:27 +0900, Simon Horman wrote:
> > > I am looking to submit the Open vSwitch datapath for merging into the
> > > kernel. To this end I have posted a preliminary round of patches to the
> > > dev@openvswitch.org mailing list
> > > 2) The code could do with some cleaning up. The current todo list
> > > includes the following items.
> > > * Use pr_*
> > > * Remove trailing whitespace and other formatting fixes
> > These are trivial and easy to fix via script.
> > I think these should be done prior to being merged.
> > I've got patches if you want them.
> Yes, please send them.
I've sent them to you and cc'd the dev@openvswitch.org mailing list.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 7:25 ` Joe Perches
@ 2010-08-30 7:33 ` Simon Horman
0 siblings, 0 replies; 31+ messages in thread
From: Simon Horman @ 2010-08-30 7:33 UTC (permalink / raw)
To: Joe Perches
Cc: netdev, Jesse Gross, Stephen Hemminger, Chris Wright, Herbert Xu,
Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 12:25:18AM -0700, Joe Perches wrote:
> On Mon, 2010-08-30 at 16:11 +0900, Simon Horman wrote:
> > On Sun, Aug 29, 2010 at 11:52:05PM -0700, Joe Perches wrote:
> > > On Mon, 2010-08-30 at 15:27 +0900, Simon Horman wrote:
> > > > I am looking to submit the Open vSwitch datapath for merging into the
> > > > kernel. To this end I have posted a preliminary round of patches to the
> > > > dev@openvswitch.org mailing list
> > > > 2) The code could do with some cleaning up. The current todo list
> > > > includes the following items.
> > > > * Use pr_*
> > > > * Remove trailing whitespace and other formatting fixes
> > > These are trivial and easy to fix via script.
> > > I think these should be done prior to being merged.
> > > I've got patches if you want them.
> > Yes, please send them.
>
> I've sent them to you and cc'd the dev@openvswitch.org mailing list.
Thanks
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 6:27 [rfc] Merging the Open vSwitch datapath Simon Horman
2010-08-30 6:52 ` Joe Perches
@ 2010-08-30 17:22 ` Ben Pfaff
2010-08-30 18:26 ` Rose, Gregory V
2010-10-15 11:31 ` openvswitch/flow WAS ( " jamal
1 sibling, 2 replies; 31+ messages in thread
From: Ben Pfaff @ 2010-08-30 17:22 UTC (permalink / raw)
To: netdev
Simon Horman <horms@verge.net.au> writes:
> * Possible use of netlink for user-space interface
I'm working on this one this week, for what it's worth.
* RE: [rfc] Merging the Open vSwitch datapath
2010-08-30 17:22 ` Ben Pfaff
@ 2010-08-30 18:26 ` Rose, Gregory V
2010-08-30 18:33 ` Ben Pfaff
2010-10-15 11:31 ` openvswitch/flow WAS ( " jamal
1 sibling, 1 reply; 31+ messages in thread
From: Rose, Gregory V @ 2010-08-30 18:26 UTC (permalink / raw)
To: Ben Pfaff, netdev@vger.kernel.org
>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Ben Pfaff
>Sent: Monday, August 30, 2010 10:22 AM
>To: netdev@vger.kernel.org
>Subject: Re: [rfc] Merging the Open vSwitch datapath
>
>Simon Horman <horms@verge.net.au> writes:
>
>> * Possible use of netlink for user-space interface
>
>I'm working on this one this week, for what it's worth.
I just want to put in a plug for the netlink interface. For NICs with EVB we'll need it.
- Greg
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 18:26 ` Rose, Gregory V
@ 2010-08-30 18:33 ` Ben Pfaff
2010-08-30 18:45 ` Rose, Gregory V
0 siblings, 1 reply; 31+ messages in thread
From: Ben Pfaff @ 2010-08-30 18:33 UTC (permalink / raw)
To: Rose, Gregory V
Cc: netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, Arnd Bergmann, David Miller
[restoring CCs that I inadvertently busted upthread]
On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> >From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> >On Behalf Of Ben Pfaff
> >
> >Simon Horman <horms@verge.net.au> writes:
> >
> >> * Possible use of netlink for user-space interface
> >
> >I'm working on this one this week, for what it's worth.
>
> I just want to put in a plug for the netlink interface. For NICs with
> EVB we'll need it.
Off-hand, the main reasons to use Netlink, instead of the existing
character device interface, are that Netlink is easier to extend and
that it should reduce or eliminate the 32-to-64 bit compat layer
currently in the Open vSwitch tree.
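To illustrate the compat-layer point, here is a minimal userspace sketch.
The struct and function names are hypothetical and are not taken from the
Open vSwitch tree; they only show why an ioctl argument that carries a
pointer needs translating for 32-bit callers, which fixed-width Netlink
attributes would largely avoid.

```c
#include <stdint.h>

/* Hypothetical ioctl argument as a 64-bit kernel sees it. The pointer
 * member makes the layout depend on the caller's word size. */
struct flow_dump_native {
	void *flows;        /* 8 bytes on LP64, 4 bytes on ILP32 */
	uint32_t n_flows;
};

/* The same argument as laid out by a 32-bit process. */
struct flow_dump_compat32 {
	uint32_t flows;     /* user pointer carried as a 32-bit integer */
	uint32_t n_flows;
};

/* The kind of per-structure translation a compat layer has to carry. */
static void flow_dump_from_compat32(struct flow_dump_native *dst,
				    const struct flow_dump_compat32 *src)
{
	dst->flows = (void *)(uintptr_t)src->flows;
	dst->n_flows = src->n_flows;
}
```

Every pointer-carrying structure in a character device interface would
need a helper of this kind.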
Why will NICs with EVB require Netlink for the Open vSwitch interface?
* RE: [rfc] Merging the Open vSwitch datapath
2010-08-30 18:33 ` Ben Pfaff
@ 2010-08-30 18:45 ` Rose, Gregory V
2010-08-30 20:59 ` Chris Wright
2010-08-30 21:04 ` Arnd Bergmann
0 siblings, 2 replies; 31+ messages in thread
From: Rose, Gregory V @ 2010-08-30 18:45 UTC (permalink / raw)
To: Ben Pfaff
Cc: netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, Arnd Bergmann, David Miller
>-----Original Message-----
>From: Ben Pfaff [mailto:blp@nicira.com]
>Sent: Monday, August 30, 2010 11:33 AM
>To: Rose, Gregory V
>Cc: netdev@vger.kernel.org; Jesse Gross; Stephen Hemminger; Chris
>Wright; Herbert Xu; Arnd Bergmann; David Miller
>Subject: Re: [rfc] Merging the Open vSwitch datapath
>
>[restoring CCs that I inadvertently busted upthread]
>
>On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
>> >From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>> >On Behalf Of Ben Pfaff
>> >
>> >Simon Horman <horms@verge.net.au> writes:
>> >
>> >> * Possible use of netlink for user-space interface
>> >
>> >I'm working on this one this week, for what it's worth.
>>
>> I just want to put in a plug for the netlink interface. For NICs with
>> EVB we'll need it.
>
>Off-hand, the main reasons to use Netlink, instead of the existing
>character device interface, are that Netlink is easier to extend and
>that it should reduce or eliminate the 32-to-64 bit compat layer
>currently in the Open vSwitch tree.
>
>Why will NICs with EVB require Netlink for the Open vSwitch interface?
As of now there are no existing ways to get switch configuration to a NIC without resorting to a customized interface such as a private IOCTL. EVB is an emerging standard that I think would be desirable to support in the kernel. As you mention, netlink is easier to extend, and I think it would be a great way to add support for NIC EVB in the kernel. But even with a kernel interface there is still no user-level tool.
From what I can tell, the Open vSwitch interface, with its ability to set packet forwarding rules, is also a good candidate for a user-space tool to set rules for EVB-capable NICs. Seems like a natural extension to me.
- Greg
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 18:45 ` Rose, Gregory V
@ 2010-08-30 20:59 ` Chris Wright
2010-08-31 0:48 ` Simon Horman
2010-08-30 21:04 ` Arnd Bergmann
1 sibling, 1 reply; 31+ messages in thread
From: Chris Wright @ 2010-08-30 20:59 UTC (permalink / raw)
To: Rose, Gregory V
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, Arnd Bergmann, David Miller
* Rose, Gregory V (gregory.v.rose@intel.com) wrote:
> >From: Ben Pfaff [mailto:blp@nicira.com]
> >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> >> I just want to put in a plug for the netlink interface. For NICs with
> >> EVB we'll need it.
> >
> >Off-hand, the main reasons to use Netlink, instead of the existing
> >character device interface, are that Netlink is easier to extend and
> >that it should reduce or eliminate the 32-to-64 bit compat layer
> >currently in the Open vSwitch tree.
That, plus it's a typical way to do network configuration. Esp. with
bi-directional communication. So the userspace bit both listens to
netlink messages, like any of the routing daemons or lldpad or similar
do, and sends netlink messages to update the driver's flow table.
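As a rough sketch of the encoding side of sending such messages: Netlink
payloads are built from type-length-value attributes padded to 4 bytes.
The header below has the same shape as the kernel's struct nlattr, but the
FLOW_ATTR_* types and attr_put() are hypothetical names for illustration,
not an existing Open vSwitch or libnl API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the kernel's struct nlattr: 16-bit length, 16-bit type. */
struct attr_hdr {
	uint16_t len;   /* header plus payload, excluding trailing padding */
	uint16_t type;
};

/* Hypothetical attribute types for a flow key. */
enum { FLOW_ATTR_IN_PORT = 1, FLOW_ATTR_ETH_DST = 2 };

/* Round up to the 4-byte boundary that Netlink attributes use. */
static size_t attr_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Append one attribute at offset off; return the new offset,
 * or 0 if it does not fit in the buffer. */
static size_t attr_put(uint8_t *buf, size_t cap, size_t off,
		       uint16_t type, const void *data, size_t data_len)
{
	struct attr_hdr hdr;
	size_t total;

	if (data_len > UINT16_MAX - sizeof(hdr))
		return 0;
	total = attr_align(sizeof(hdr) + data_len);
	if (off > cap || total > cap - off)
		return 0;

	hdr.len = (uint16_t)(sizeof(hdr) + data_len);
	hdr.type = type;
	memset(buf + off, 0, total);                  /* zero the padding */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, data_len);
	return off + total;
}
```

With this layout a new field becomes a new attribute type, and a receiver
that does not know the type can skip it by its length.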
BTW, this kind of discussion was why Herbert felt strongly against
drivers/staging/. He wanted to be sure the interfaces were well-defined
first.
> >Why will NICs with EVB require Netlink for the Open vSwitch interface?
>
> As of now there are no existing ways to get switch configuration to a NIC without resorting to a customized interface such as a private IOCTL. EVB is an emerging standard that I think would be desirable to support in the kernel. As you mention netlink is easier to extend and I think it would be a great way to add support for NIC EVB in the kernel. But even with a kernel interface there is still no user level tool.
Right, there's the netlink interface for VFINFO, and a short list I
compiled a while back of "requirements"
http://permalink.gmane.org/gmane.linux.network/158930
> From what I can tell the Open vSwitch interface with its ability to set packet forwarding rules is also a good candidate for a user space tool to set rules for EVB capable NICs. Seems like a natural extension to me.
Yup, but also consider that the NIC's switches will lag sw. So likely
need a way to say what the thing is capable of so that rules that can't
be enforced in NIC hw are done in sw (or in external hw, ala 802.1Qbg).
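One way to picture that capability check, as a sketch only: the feature
bits and the function name here are made up for illustration and do not
correspond to any existing kernel or Open vSwitch interface.

```c
#include <stdint.h>

/* Hypothetical feature bits a rule may require and a NIC may advertise. */
enum {
	RULE_MATCH_L2    = 1u << 0,
	RULE_MATCH_L3    = 1u << 1,
	RULE_ACTION_VLAN = 1u << 2,
};

enum rule_target { TARGET_NIC_HW, TARGET_SOFTWARE };

/* Offload only if every feature the rule needs is advertised by the NIC;
 * otherwise keep the rule in the software (or external) switch. */
static enum rule_target place_rule(uint32_t nic_caps, uint32_t rule_needs)
{
	return (rule_needs & ~nic_caps) == 0 ? TARGET_NIC_HW : TARGET_SOFTWARE;
}
```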
thanks,
-chris
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 18:45 ` Rose, Gregory V
2010-08-30 20:59 ` Chris Wright
@ 2010-08-30 21:04 ` Arnd Bergmann
2010-08-30 22:15 ` Rose, Gregory V
1 sibling, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2010-08-30 21:04 UTC (permalink / raw)
To: Rose, Gregory V
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
On Monday 30 August 2010 20:45:19 Rose, Gregory V wrote:
> As of now there are no existing ways to get switch configuration to a
> NIC without resorting to a customized interface such as a private IOCTL.
Well, there are the IFLA_VF_INFO netlink attributes that I would
assume are to be used for switch configuration and extended where
required for that, e.g. to set VEPA mode per channel.
> EVB is an emerging standard that I think would be desirable to support
> in the kernel.
Do you mean 802.1Qbg? Why would you want kernel support? There is
already support for VEPA in the kernel, and 802.1ad provider bridges
should probably be added in order to support multi-channel setups.
The other parts are configuration protocols like LLDP and CDP, which
we normally do in user space (e.g. lldpad).
What else is there that you think should go into the kernel?
> As you mention netlink is easier to extend and I think
> it would be a great way to add support for NIC EVB in the kernel.
> But even with a kernel interface there is still no user level tool.
Using the same interface as Open vSwitch to configure a NIC bridge
sounds interesting if we want to speed up
Open vSwitch, but I don't think it makes any sense for the EVB
protocols. Quite the contrary, you typically want the NIC to
get out of the way and do all the bridging in the external
switch in case of VEPA. Or you actually want to use features of
the software bridge implementation like iptables.
One idea that we have discussed in the past is to use the macvlan
netlink interface to create ports inside a NIC. This interface
already exists in the kernel, and it allows both bridged and VEPA
interfaces. The main advantage of this is that the kernel can
transparently create ports either using software macvlan or
hardware accelerated functions where available.
Arnd
* RE: [rfc] Merging the Open vSwitch datapath
2010-08-30 21:04 ` Arnd Bergmann
@ 2010-08-30 22:15 ` Rose, Gregory V
2010-08-31 11:48 ` Arnd Bergmann
0 siblings, 1 reply; 31+ messages in thread
From: Rose, Gregory V @ 2010-08-30 22:15 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Arnd Bergmann
>Sent: Monday, August 30, 2010 2:05 PM
>To: Rose, Gregory V
>Cc: Ben Pfaff; netdev@vger.kernel.org; Jesse Gross; Stephen Hemminger;
>Chris Wright; Herbert Xu; David Miller
>Subject: Re: [rfc] Merging the Open vSwitch datapath
>
>On Monday 30 August 2010 20:45:19 Rose, Gregory V wrote:
>> As of now there are no existing ways to get switch configuration to a
>> NIC without resorting to a customized interface such as a private
>> IOCTL.
>
>Well, there are the IFLA_VF_INFO netlink attributes that I would
>assume are to be used for switch configuration and extended where
>required for that, e.g. to set VEPA mode per channel.
>
>> EVB is an emerging standard that I think would be desirable to support
>> in the kernel.
>
>Do you mean 802.1Qbg?
Yes, and 802.1Qbh.
>Why would you want kernel support? There is
>already support for VEPA in the kernel, and 802.1ad provider bridges
>should probably be added in order to support multi-channel setups.
I should probably read up a bit more on 802.1ad.
>
>The other parts are configuration protocols like LLDP and CDP, which
>we normally do in user space (e.g. lldpad).
>
>What else is there that you think should go into the kernel?
It seems to me that the IFLA_VF_INFO netlink attributes are station oriented. The kernel support I see there is insufficient for some other things that need to be done for access control, forwarding rules and actions taken on certain kinds of packets. I think there'll be a need to configure the switch itself, not just the stations attached to the switch.
>
>> As you mention netlink is easier to extend and I think
>> it would be a great way to add support for NIC EVB in the kernel.
>> But even with a kernel interface there is still no user level tool.
>
>Using the same interface as Open vSwitch to configure a NIC bridge
>sounds interesting if we want to speed up
>Open vSwitch, but I don't think it makes any sense for the EVB
>protocols. Quite the contrary, you typically want the NIC to
>get out of the way and do all the bridging in the external
>switch in case of VEPA. Or you actually want to use features of
>the software bridge implementation like iptables.
What if the NIC is the external switch? I mean, what if the NIC has an edge virtual bridge embedded in it? The IFLA_VF_INFO messages are sufficient for many features but there are some that they don't address. And I don't know of any way to get iptables rules down to the VF using existing kernel interfaces. Perhaps I missed something.
>
>One idea that we have discussed in the past is to use the macvlan
>netlink interface to create ports inside a NIC. This interface
>already exists in the kernel, and it allows both bridged and VEPA
>interfaces. The main advantage of this is that the kernel can
>transparently create ports either using software macvlan or
>hardware accelerated functions where available.
This actually sounds like a good idea. I hadn't thought about that. It would cover one of the primary issues I'm dealing with right now.
- Greg
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 20:59 ` Chris Wright
@ 2010-08-31 0:48 ` Simon Horman
2010-08-31 0:54 ` Chris Wright
0 siblings, 1 reply; 31+ messages in thread
From: Simon Horman @ 2010-08-31 0:48 UTC (permalink / raw)
To: Chris Wright
Cc: Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org, Jesse Gross,
Stephen Hemminger, Herbert Xu, Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 01:59:07PM -0700, Chris Wright wrote:
> * Rose, Gregory V (gregory.v.rose@intel.com) wrote:
> > >From: Ben Pfaff [mailto:blp@nicira.com]
> > >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> > >> I just want to put in a plug for the netlink interface. For NICs with
> > >> EVB we'll need it.
> > >
> > >Off-hand, the main reasons to use Netlink, instead of the existing
> > >character device interface, are that Netlink is easier to extend and
> > >that it should reduce or eliminate the 32-to-64 bit compat layer
> > >currently in the Open vSwitch tree.
>
> That, plus it's a typical way to do network configuration. Esp. with
> bi-directional communication. So the userspace bit both listens to
> netlink messages, like any of the routing daemons or lldpad or similar
> do, and sends netlink messages to update the driver's flow table.
>
> BTW, this kind of discussion was why Herbert felt strongly against
> drivers/staging/. He wanted to be sure the interfaces were well-defined
> first.
Hi Chris,
Is the implication that there is a preference for finalising
the interface (as much as that is possible) before merging?
[ snip ]
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 0:48 ` Simon Horman
@ 2010-08-31 0:54 ` Chris Wright
2010-08-31 1:01 ` Simon Horman
2010-08-31 8:18 ` Herbert Xu
0 siblings, 2 replies; 31+ messages in thread
From: Chris Wright @ 2010-08-31 0:54 UTC (permalink / raw)
To: Simon Horman
Cc: Chris Wright, Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org,
Jesse Gross, Stephen Hemminger, Herbert Xu, Arnd Bergmann,
David Miller
* Simon Horman (horms@verge.net.au) wrote:
> On Mon, Aug 30, 2010 at 01:59:07PM -0700, Chris Wright wrote:
> > * Rose, Gregory V (gregory.v.rose@intel.com) wrote:
> > > >From: Ben Pfaff [mailto:blp@nicira.com]
> > > >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> > > >> I just want to put in a plug for the netlink interface. For NICs with
> > > >> EVB we'll need it.
> > > >
> > > >Off-hand, the main reasons to use Netlink, instead of the existing
> > > >character device interface, are that Netlink is easier to extend and
> > > >that it should reduce or eliminate the 32-to-64 bit compat layer
> > > >currently in the Open vSwitch tree.
> >
> > That, plus it's a typical way to do network configuration. Esp. with
> > bi-directional communication. So the userspace bit both listens to
> > netlink messages, like any of the routing daemons or lldpad or similar
> > do, and sends netlink messages to update the driver's flow table.
> >
> > BTW, this kind of discussion was why Herbert felt strongly against
> > drivers/staging/. He wanted to be sure the interfaces were well-defined
> > first.
>
> Is the implication that there is a preference for finalising
> the interface (as much as that is possible) before merging?
I'll let Herbert chime in; just a reminder that this was his thought earlier
this month at LinuxCon.
thanks,
-chris
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 0:54 ` Chris Wright
@ 2010-08-31 1:01 ` Simon Horman
2010-08-31 1:11 ` Jesse Gross
2010-08-31 8:18 ` Herbert Xu
1 sibling, 1 reply; 31+ messages in thread
From: Simon Horman @ 2010-08-31 1:01 UTC (permalink / raw)
To: Chris Wright
Cc: Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org, Jesse Gross,
Stephen Hemminger, Herbert Xu, Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 05:54:10PM -0700, Chris Wright wrote:
> * Simon Horman (horms@verge.net.au) wrote:
> > On Mon, Aug 30, 2010 at 01:59:07PM -0700, Chris Wright wrote:
> > > * Rose, Gregory V (gregory.v.rose@intel.com) wrote:
> > > > >From: Ben Pfaff [mailto:blp@nicira.com]
> > > > >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> > > > >> I just want to put in a plug for the netlink interface. For NICs with
> > > > >> EVB we'll need it.
> > > > >
> > > > >Off-hand, the main reasons to use Netlink, instead of the existing
> > > > >character device interface, are that Netlink is easier to extend and
> > > > >that it should reduce or eliminate the 32-to-64 bit compat layer
> > > > >currently in the Open vSwitch tree.
> > >
> > > That, plus it's a typical way to do network configuration. Esp. with
> > > bi-directional communication. So the userspace bit both listens to
> > > netlink messages, like any of the routing daemons or lldpad or similar
> > > do, and sends netlink messages to update the driver's flow table.
> > >
> > > BTW, this kind of discussion was why Herbert felt strongly against
> > > drivers/staging/. He wanted to be sure the interfaces were well-defined
> > > first.
> >
> > Is the implication that there is a preference for finalising
> > the interface (as much as that is possible) before merging?
>
> I'll let Herbert chime in; just a reminder that this was his thought earlier
> this month at LinuxCon.
Thanks, I must confess that had slipped my mind.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 1:01 ` Simon Horman
@ 2010-08-31 1:11 ` Jesse Gross
2010-08-31 1:38 ` Simon Horman
0 siblings, 1 reply; 31+ messages in thread
From: Jesse Gross @ 2010-08-31 1:11 UTC (permalink / raw)
To: Simon Horman
Cc: Chris Wright, Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org,
Stephen Hemminger, Herbert Xu, Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 6:01 PM, Simon Horman <horms@verge.net.au> wrote:
> On Mon, Aug 30, 2010 at 05:54:10PM -0700, Chris Wright wrote:
>> * Simon Horman (horms@verge.net.au) wrote:
>> > On Mon, Aug 30, 2010 at 01:59:07PM -0700, Chris Wright wrote:
>> > > * Rose, Gregory V (gregory.v.rose@intel.com) wrote:
>> > > > >From: Ben Pfaff [mailto:blp@nicira.com]
>> > > > >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
>> > > > >> I just want to put in a plug for the netlink interface. For NICs with
>> > > > >> EVB we'll need it.
>> > > > >
>> > > > >Off-hand, the main reasons to use Netlink, instead of the existing
>> > > > >character device interface, are that Netlink is easier to extend and
>> > > > >that it should reduce or eliminate the 32-to-64 bit compat layer
>> > > > >currently in the Open vSwitch tree.
>> > >
>> > > That, plus it's a typical way to do network configuration. Esp. with
>> > > bi-directional communication. So the userspace bit both listens to
>> > > netlink messages, like any of the routing daemons or lldpad or similar
>> > > do, and sends netlink messages to update the driver's flow table.
>> > >
>> > > BTW, this kind of discussion was why Herbert felt strongly against
>> > > drivers/staging/. He wanted to be sure the interfaces were well-defined
>> > > first.
>> >
>> > Is the implication that there is a preference for finalising
>> > the interface (as much as that is possible) before merging?
>>
>> I'll let Herbert chime in; just a reminder that this was his thought earlier
>> this month at LinuxCon.
>
> Thanks, I must confess that had slipped my mind.
I think it might be worth delaying the merge until we at least have a
starting point. As the userspace interface is such an important
aspect of the code, I don't want to ask people to review code that is
expected to undergo a large change soon (obviously comments are always
welcome at any time). It's probably also more productive to have a
discussion about minor improvements to a proposed interface than a
free-for-all.
As Ben mentioned, he's working on designing a Netlink-based interface
now. It shouldn't take too long to get a first cut out the door so
we'll have something concrete to discuss. I'll certainly be the first
one to promote the different uses that are possible with Open vSwitch
but I don't want to get bogged down in the details of future features
now. As long as the interface doesn't have serious problems
precluding future work, we can merge the existing code and then move
onto new things.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 1:11 ` Jesse Gross
@ 2010-08-31 1:38 ` Simon Horman
0 siblings, 0 replies; 31+ messages in thread
From: Simon Horman @ 2010-08-31 1:38 UTC (permalink / raw)
To: Jesse Gross
Cc: Chris Wright, Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org,
Stephen Hemminger, Herbert Xu, Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 06:11:30PM -0700, Jesse Gross wrote:
> On Mon, Aug 30, 2010 at 6:01 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Mon, Aug 30, 2010 at 05:54:10PM -0700, Chris Wright wrote:
> >> * Simon Horman (horms@verge.net.au) wrote:
> >> > On Mon, Aug 30, 2010 at 01:59:07PM -0700, Chris Wright wrote:
> >> > > * Rose, Gregory V (gregory.v.rose@intel.com) wrote:
> >> > > > >From: Ben Pfaff [mailto:blp@nicira.com]
> >> > > > >On Mon, Aug 30, 2010 at 11:26:17AM -0700, Rose, Gregory V wrote:
> >> > > > >> I just want to put in a plug for the netlink interface. For NICs with
> >> > > > >> EVB we'll need it.
> >> > > > >
> >> > > > >Off-hand, the main reasons to use Netlink, instead of the existing
> >> > > > >character device interface, are that Netlink is easier to extend and
> >> > > > >that it should reduce or eliminate the 32-to-64 bit compat layer
> >> > > > >currently in the Open vSwitch tree.
> >> > >
> >> > > That, plus it's a typical way to do network configuration. Esp. with
> >> > > bi-directional communication. So the userspace bit both listens to
> >> > > netlink messages, like any of the routing daemons or lldpad or similar
> >> > > do, and sends netlink messages to update the driver's flow table.
> >> > >
> >> > > BTW, this kind of discussion was why Herbert felt strongly against
> >> > > drivers/staging/. He wanted to be sure the interfaces were well-defined
> >> > > first.
> >> >
> >> > Is the implication that there is a preference for finalising
> >> > the interface (as much as that is possible) before merging?
> >>
> >> I'll let Herbert chime in; just a reminder that this was his thought earlier
> >> this month at LinuxCon.
> >
> > Thanks, I must confess that had slipped my mind.
>
> I think it might be worth delaying the merge until we at least have a
> starting point. As the userspace interface is such an important
> aspect of the code, I don't want to ask people to review code that is
> expected to undergo a large change soon (obviously comments are always
> welcome at any time). It's probably also more productive to have a
> discussion about minor improvements to a proposed interface than a
> free-for-all.
>
> As Ben mentioned, he's working on designing a Netlink-based interface
> now. It shouldn't take too long to get a first cut out the door so
> we'll have something concrete to discuss. I'll certainly be the first
> one to promote the different uses that are possible with Open vSwitch
> but I don't want to get bogged down in the details of future features
> now. As long as the interface doesn't have serious problems
> precluding future work, we can merge the existing code and then move
> onto new things.
That sounds entirely reasonable.
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 0:54 ` Chris Wright
2010-08-31 1:01 ` Simon Horman
@ 2010-08-31 8:18 ` Herbert Xu
1 sibling, 0 replies; 31+ messages in thread
From: Herbert Xu @ 2010-08-31 8:18 UTC (permalink / raw)
To: Chris Wright
Cc: Simon Horman, Rose, Gregory V, Ben Pfaff, netdev@vger.kernel.org,
Jesse Gross, Stephen Hemminger, Arnd Bergmann, David Miller
On Mon, Aug 30, 2010 at 05:54:10PM -0700, Chris Wright wrote:
>
> I'll let Herbert chime in, just a reminder that was his thought earlier
> this month at LinuxCon.
Well my reasoning was that Open vSwitch isn't a hardware driver
per se, it's much closer to say the bridge module or a tunnel
implementation.
As such we should be very careful before committing to a user-space
interface that we may have to carry forward for years to come.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 22:15 ` Rose, Gregory V
@ 2010-08-31 11:48 ` Arnd Bergmann
2010-08-31 17:04 ` Rose, Gregory V
0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2010-08-31 11:48 UTC (permalink / raw)
To: Rose, Gregory V
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
On Tuesday 31 August 2010, Rose, Gregory V wrote:
> >On Monday 30 August 2010 20:45:19 Rose, Gregory V wrote:
> >> As of now there are no existing ways to get switch configuration to a
> >> NIC without resorting to a customized interface such as a private
> >IOCTL.
> >
> >Well, there are the IFLA_VF_INFO netlink attributes that I would
> >assume are to be used for switch configuration and extended where
> >required for that, e.g. to set VEPA mode per channel.
> >
> >> EVB is an emerging standard that I think would be desirable to support
> >> in the kernel.
> >
> >Do you mean 802.1Qbg?
>
> Yes, and 802.1Qbh.
The situation for 802.1Qbh is a little trickier. We cannot do an
implementation of this in user space, because the spec is not public.
However, we have kernel interfaces that allow you to do this in
firmware/driver.
> > Why would you want kernel support? There is
> >already support for VEPA in the kernel, and 802.1ad provider bridges
> >should probably be added in order to support multi-channel setups.
>
> I should probably read up a bit more on 802.1ad.
What we need here is an extension of the vlan module to allow
double tagging with the right ethertype on the outer frame.
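A minimal sketch of the double tagging described above, assuming the standard TPID values (0x88A8 for the outer 802.1ad service tag, 0x8100 for the inner 802.1Q customer tag). The helper names `vlan_tag` and `double_tag` are hypothetical and exist only for illustration; this is plain byte manipulation in user space, not the vlan module's code.

```python
import struct

ETH_P_8021Q = 0x8100   # TPID of the inner (customer) 802.1Q tag
ETH_P_8021AD = 0x88A8  # TPID of the outer (service) 802.1ad tag


def vlan_tag(tpid, vid, pcp=0, dei=0):
    """Build one 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("invalid priority or DEI bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


def double_tag(frame, s_vid, c_vid):
    """Insert an outer S-tag and an inner C-tag into an untagged frame.

    The tags go right after the destination and source MAC addresses
    (the first 12 bytes); the original ethertype and payload follow.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    macs, rest = frame[:12], frame[12:]
    return (macs
            + vlan_tag(ETH_P_8021AD, s_vid)
            + vlan_tag(ETH_P_8021Q, c_vid)
            + rest)
```

The only point of the sketch is the ethertype on the outer tag: the inner tag keeps 0x8100 while the outer one carries 0x88A8.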
> >The other parts are configuration protocols like LLDP and CDP, which
> >we normally do in user space (e.g. lldpad).
> >
> >What else is there that you think should go into the kernel.
>
> It seems to me that the IFLA_VF_INFO netlink attributes are station
> oriented. The kernel support I see there is insufficient for some
> other things that need to be done for access control, forwarding
> rules and actions taken on certain kinds of packets. I think there'll
> be a need to configure the switch itself, not just the stations
> attached to the switch.
Ok, I'm beginning to understand what you want to do.
1. VEPA using software: use a traditional NIC, and macvtap (or similar)
in the hypervisor to separate traffic between the guests, do
bridging in an external switch. Works now.
2. VEPA using hardware: give each guest a VF, configure VFs into VEPA
mode. Requires a trivial addition to IFLA_VF_INFO to allow VEPA setting
3. Simple bridge using software: like 1, but forward traffic between
some or all macvtap ports. Works now.
4. Simple bridge using hardware: Like 2, this is what we do today when
using VFs.
5. Full-featured bridge using brctl/ebtables/iptables. This has access
to all features of the Linux kernel. Works today, but requires management
infrastructure (see: Vyatta) that is not present everywhere.
6. Full-featured bridge in hardware with the features of ebtables/iptables.
Not going to happen IMHO, see below.
7. Full-featured distributed bridge using Open vSwitch. This is
what the current discussion is about.
8. Full-featured distributed bridge using Open vSwitch and hardware support.
I was arguing against 6, which would not even work using the same Open
vSwitch netlink interface, while I guess what you want is 8.
Now I would not call that "configuring the switch", since the switch in
this case is basically a daemon running on the host and configuring the
data path, which has now moved into the hardware from the kernel.
> >> As you mention netlink is easier to extend and I think
> >> it would be a great way to add support for NIC EVB in the kernel.
> >> But even with a kernel interface there is still no user level tool.
> >
> >Using the same interface as Open vSwitch to configure a NIC bridge
> >sounds interesting if we want to speed up
> >Open vSwitch, but I don't think it makes any sense for the EVB
> >protocols. Quite the contrary, you typically want the NIC to
> >get out of the way and do all the bridging in the external
> >switch in case of VEPA. Or you actually want to use features of
> >the software bridge implementation like iptables.
>
> What if the NIC is the external switch?
I don't think that is going to happen. All embedded switches
are of the edge (a.k.a. dumb) type right now, and I believe that
will stay this way.
By an external switch, I mean something that is running an
operating system and allows users to log in for configuring
the switch rules.
> I mean, what if the
> NIC has an edge virtual bridge embedded in it? The IFLA_VF_INFO
> messages are sufficient for many features but there are some that
> it doesn't address. And I don't know of any way to get iptables
> rules down to the VF using existing kernel interfaces.
Exactly! The problem is that I don't think any edge virtual bridge
can ever implement the full set of features we have in software,
and for this reason I wouldn't spend too much time in adding a small
subset of the features.
We probably have a few hundred features implemented in iptables,
ebtables and tc, e.g. connection tracking, quality of service
and filtering. Implementing all these on a NIC is both an enormous
(or close to impossible) development task and a security risk,
unless you are thinking of actually running Linux on the NIC
to implement them.
Anyway, my point was that improvements to the bridging code
are not directly related to work on EVB, even if we had netfilter
rules for controlling the integrated bridge in your NIC.
Now, your suggestion to define the Open vSwitch netlink interface
in a way that works with both hardware bridges as well as the
kernel code we're discussing does sound great!
Obviously, there are some nice ways to combine this with the EVB
protocols, but I can see each being useful without the other.
> >One idea that we have discussed in the past is to use the macvlan
> >netlink interface to create ports inside a NIC. This interface
> >already exists in the kernel, and it allows both bridged and VEPA
> >interfaces. The main advantage of this is that the kernel can
> >transparently create ports either using software macvlan or
> >hardware accelerated functions where available.
>
> This actually sounds like a good idea. I hadn't thought about that.
> It would cover one of the primary issues I'm dealing with right now.
Ok, cool. Since this is something I've been meaning to work on for
some time but never got around to, I'll gladly give help and advice
if you want to work on the implementation. I have access to a number
of Intel NICs to test things.
Arnd
* RE: [rfc] Merging the Open vSwitch datapath
2010-08-31 11:48 ` Arnd Bergmann
@ 2010-08-31 17:04 ` Rose, Gregory V
2010-08-31 17:43 ` Arnd Bergmann
0 siblings, 1 reply; 31+ messages in thread
From: Rose, Gregory V @ 2010-08-31 17:04 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
>-----Original Message-----
>From: Arnd Bergmann [mailto:arnd@arndb.de]
>Sent: Tuesday, August 31, 2010 4:49 AM
>To: Rose, Gregory V
>Cc: Ben Pfaff; netdev@vger.kernel.org; Jesse Gross; Stephen Hemminger;
>Chris Wright; Herbert Xu; David Miller
>Subject: Re: [rfc] Merging the Open vSwitch datapath
>
>On Tuesday 31 August 2010, Rose, Gregory V wrote:
>>
>> I should probably read up a bit more on 802.1ad.
>
>What we need here is an extension of the vlan module to allow
>double tagging with the right ethertype on the outer frame.
Yes.
>
>> >The other parts are configuration protocols like LLDP and CDP, which
>> >we normally do in user space (e.g. lldpad).
>> >
>> >What else is there that you think should go into the kernel.
>>
>> It seems to me that the IFLA_VF_INFO netlink attributes are station
>> oriented. The kernel support I see there is insufficient for some
>> other things that need to be done for access control, forwarding
>> rules and actions taken on certain kinds of packets. I think there'll
>> be a need to configure the switch itself, not just the stations
>> attached to the switch.
>
>Ok, I'm beginning to understand what you want to do.
>
>1. VEPA using software: use a traditional NIC, and macvtap (or similar)
>   in the hypervisor to separate traffic between the guests, do
>   bridging in an external switch. Works now.
>2. VEPA using hardware: give each guest a VF, configure VFs into VEPA
>   mode. Requires a trivial addition to IFLA_VF_INFO to allow VEPA
>   setting
>3. Simple bridge using software: like 1, but forward traffic between
>   some or all macvtap ports. Works now.
>4. Simple bridge using hardware: Like 2, this is what we do today when
>   using VFs.
>5. Full-featured bridge using brctl/ebtables/iptables. This has access
>   to all features of the Linux kernel. Works today, but requires
>   management infrastructure (see: Vyatta) that is not present
>   everywhere.
>6. Full-featured bridge in hardware with the features of
>   ebtables/iptables. Not going to happen IMHO, see below.
>7. Full-featured distributed bridge using Open vSwitch. This is
>   what the current discussion is about.
>8. Full-featured distributed bridge using Open vSwitch and hardware
>   support.
Yep, that about covers it.
;-)
Agree on item # 6.
>I was arguing against 6, which would not even work using the same Open
>vSwitch netlink interface, while I guess what you want is 8.
>
>Now I would not call that "configuring the switch", since the switch in
>this case is basically a daemon running on the host and configuring the
>data path, which has now moved into the hardware from the kernel.
Yeah, the semantics get tricky sometimes but we're on the same page.
>>
>> What if the NIC is the external switch?
>
>I don't think that is going to happen. All embedded switches
>are of the edge (a.k.a. dumb) type right now, and I believe that
>will stay this way.
>By an external switch, I mean something that is running an
>operating system and allows users to log in for configuring
>the switch rules.
>
>> I mean, what if the
>> NIC has an edge virtual bridge embedded in it? The IFLA_VF_INFO
>> messages are sufficient for many features but there are some that
>> it doesn't address. And I don't know of any way to get iptables
>> rules down to the VF using existing kernel interfaces.
>
>Exactly! The problem is that I don't think any edge virtual bridge
>can ever implement the full set of features we have in software,
>and for this reason I wouldn't spend too much time in adding a small
>subset of the features.
Not sure I agree there. I've gotten specific requests for a small
number of features that would make an embedded NIC switch useful
to some customers.
>
>We probably have a few hundred features implemented in iptables,
>ebtables and tc, e.g. connection tracking, quality of service
>and filtering. Implementing all these on a NIC is both an enormous
>(or close to impossible) development task and a security risk,
>unless you are thinking of actually running Linux on the NIC
>to implement them.
No need to implement all of them, but there is a small subset of
useful rules and associated actions that would be very useful on
the embedded switch of an SR-IOV capable NIC. And these rules
and actions would actually promote security from my point of view.
I agree that the embedded NIC switch will never (and should never)
attempt to implement all the features of a full-fledged external
switch. But as things stand now, embedded NIC switches are so
dumb as to be almost useless for most security-conscious
virtualized applications. With the implementation of a small
set of rules and associated actions we could make them more
useful for a number of our customers.
>
>Anyway, my point was that improvements to the bridging code
>are not directly related to work on EVB, even if we had netfilter
>rules for controlling the integrated bridge in your NIC.
>
>Now, your suggestion to define the Open vSwitch netlink interface
>in a way that works with both hardware bridges as well as the
>kernel code we're discussing does sound great!
>Obviously, there are some nice ways to combine this with the EVB
>protocols, but I can see each being useful without the other.
Alright, I'm sort of new to Linux. Most of my past experience
is in the embedded space and is more device oriented so I
definitely appreciate getting your perspective on this.
Like many folks I just have product features that I need to make
available to customers. Finding a way to do this that is
acceptable to the Linux community and promotes the common welfare
(so to speak) is all I'm trying to do here.
>
>> >One idea that we have discussed in the past is to use the macvlan
>> >netlink interface to create ports inside a NIC. This interface
>> >already exists in the kernel, and it allows both bridged and VEPA
>> >interfaces. The main advantage of this is that the kernel can
>> >transparently create ports either using software macvlan or
>> >hardware accelerated functions where available.
>>
>> This actually sounds like a good idea. I hadn't thought about that.
>> It would cover one of the primary issues I'm dealing with right now.
>
>Ok, cool. Since this is something I've been meaning to work on for
>some time but never got around to, I'll gladly give help and advice
>if you want to work on the implementation. I have access to a number
>of Intel NICs to test things.
Excellent. I appreciate the offer and will probably take you up on it.
Thanks!
- Greg
* Re: [rfc] Merging the Open vSwitch datapath
2010-08-31 17:04 ` Rose, Gregory V
@ 2010-08-31 17:43 ` Arnd Bergmann
2010-08-31 20:16 ` Rose, Gregory V
0 siblings, 1 reply; 31+ messages in thread
From: Arnd Bergmann @ 2010-08-31 17:43 UTC (permalink / raw)
To: Rose, Gregory V
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
On Tuesday 31 August 2010, Rose, Gregory V wrote:
> >
> >Exactly! The problem is that I don't think any edge virtual bridge
> >can ever implement the full set of features we have in software,
> >and for this reason I wouldn't spend too much time in adding a small
> >subset of the features.
>
> Not sure I agree there. I've gotten specific requests for a small
> number of features that would make an embedded NIC switch useful
> to some customers.
That should be fine. Adding a small number of features probably
works well enough using extensions to the existing VF_INFO netlink
attributes, like a way to configure ports into VEPA mode.
I'm totally fine with making small additions here, which is something
completely different from extending the interface to the point where
it mimics all the features we have in the linux bridge code.
> No need to implement all of them but there is a small subset of
> useful rules and associated actions that would be very useful on
> the embedded switch of an SR-IOV capable NIC. And these rules
> and actions would actually promote security from my point of view.
> I agree that the embedded NIC switch will never (and should never)
> attempt to implement all the features of a full-fledged external
> switch. But as things stand now embedded NIC switches are so
> dumb as to be almost useless for most security conscious
> virtualized applications. With the implementation of a small
> set of rules and associated actions we could make them more
> useful for a number of our customers.
Right. Can you share your specific requirements with the rest of us?
Maybe start a new email thread with the same people on it, since
this is now really an Open vSwitch topic.
> Alright, I'm sort of new to Linux. Most of my past experience
> is in the embedded space and is more device oriented so I
> definitely appreciate getting your perspective on this.
> Like many folks I just have product features that I need to make
> available to customers. Finding a way to do this that is
> acceptable to the Linux community and promotes the common welfare
> (so to speak) is all I'm trying to do here.
Ok, cool. Now I think that getting an interface that fits the needs
of the NIC and Open vSwitch would be great for both Open vSwitch and
every NIC vendor implementing it, because it means that you can simply
add a smart NIC and Open vSwitch will run faster without any changes to
the software setup.
Even better for you if you are the first one on the market to implement
it in hardware ;-)
You should probably take a look at the current ioctl based implementation
in Open vSwitch and figure out if you could make that interface work
on your NIC, and if not, tell us all what is missing to make that work
with the new interface.
Arnd
* RE: [rfc] Merging the Open vSwitch datapath
2010-08-31 17:43 ` Arnd Bergmann
@ 2010-08-31 20:16 ` Rose, Gregory V
0 siblings, 0 replies; 31+ messages in thread
From: Rose, Gregory V @ 2010-08-31 20:16 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Ben Pfaff, netdev@vger.kernel.org, Jesse Gross, Stephen Hemminger,
Chris Wright, Herbert Xu, David Miller
>-----Original Message-----
>From: Arnd Bergmann [mailto:arnd@arndb.de]
>Sent: Tuesday, August 31, 2010 10:44 AM
>To: Rose, Gregory V
>Cc: Ben Pfaff; netdev@vger.kernel.org; Jesse Gross; Stephen Hemminger;
>Chris Wright; Herbert Xu; David Miller
>Subject: Re: [rfc] Merging the Open vSwitch datapath
>
>On Tuesday 31 August 2010, Rose, Gregory V wrote:
[snip]
>
>> No need to implement all of them but there is a small subset of
>> useful rules and associated actions that would be very useful on
>> the embedded switch of an SR-IOV capable NIC. And these rules
>> and actions would actually promote security from my point of view.
>> I agree that the embedded NIC switch will never (and should never)
>> attempt to implement all the features of a full-fledged external
>> switch. But as things stand now embedded NIC switches are so
>> dumb as to be almost useless for most security conscious
>> virtualized applications. With the implementation of a small
>> set of rules and associated actions we could make them more
>> useful for a number of our customers.
>
>Right. Can you share your specific requirements with the rest of us?
>Maybe start a new email thread with the same people on it, since
>this is now really an Open vSwitch topic.
I'm not quite ready to do that yet. Disclosure of the specific
requirements would be premature at this time because they aren't
fully baked yet. Hopefully we'll get there soon and at that time
I will take you up on your suggestion.
>
>> Alright, I'm sort of new to Linux. Most of my past experience
>> is in the embedded space and is more device oriented so I
>> definitely appreciate getting your perspective on this.
>> Like many folks I just have product features that I need to make
>> available to customers. Finding a way to do this that is
>> acceptable to the Linux community and promotes the common welfare
>> (so to speak) is all I'm trying to do here.
>
>Ok, cool. Now I think that getting an interface that fits the needs
>of the NIC and Open vSwitch would be great for both Open vSwitch and
>every NIC vendor implementing it, because it means that you can simply
>add a smart NIC and Open vSwitch will run faster without any changes to
>the software setup.
That's the idea!
>Even better for you if you are the first one on the market to implement
>it in hardware ;-)
Yep. And just getting the eco-system in place is good for everyone.
>
>You should probably take a look at the current ioctl based implementation
>in Open vSwitch and figure out if you could make that interface work
>on your NIC, and if not, tell us all what is missing to make that work
>with the new interface.
When the time is right I'll do that.
Again, thanks for your advice!
- Greg
* openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-08-30 17:22 ` Ben Pfaff
2010-08-30 18:26 ` Rose, Gregory V
@ 2010-10-15 11:31 ` jamal
2010-10-15 16:18 ` Ben Pfaff
2010-10-15 21:35 ` Jesse Gross
1 sibling, 2 replies; 31+ messages in thread
From: jamal @ 2010-10-15 11:31 UTC (permalink / raw)
To: Ben Pfaff, Jesse Gross; +Cc: netdev
Sorry, slightly off topic - and this is old (catching up
with netdev randomly going backwards)..
I was curious after seeing the exchange with Eric D. on
the tunnels and how you have your own tree so i glossed
over what your project is doing.
It seems to me that you reinvented things that exist in
Linux already such as bridging, tunnels and what really
caught my attention: ability to do flows (tc actions).
It is possible Linux is missing something you wanted or was
not efficient enough?
[For example: I couldnt see anything you needed
on flow-action management that Linux couldnt do already
(with already very nice well structured netlink APIs)]
cheers,
jamal
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-15 11:31 ` openvswitch/flow WAS ( " jamal
@ 2010-10-15 16:18 ` Ben Pfaff
2010-10-15 21:35 ` Jesse Gross
1 sibling, 0 replies; 31+ messages in thread
From: Ben Pfaff @ 2010-10-15 16:18 UTC (permalink / raw)
To: ovs-team; +Cc: jamal, Jesse Gross, netdev
[adding ovs-team]
On Fri, Oct 15, 2010 at 07:31:32AM -0400, jamal wrote:
>
> Sorry, slightly off topic - and this is old (catching up
> with netdev randomly going backwards)..
>
> I was curious after seeing the exchange with Eric D. on
> the tunnels and how you have your own tree so i glossed
> over what your project is doing.
>
> It seems to me that you reinvented things that exist in
> Linux already such as bridging, tunnels and what really
> caught my attention: ability to do flows (tc actions).
> It is possible Linux is missing something you wanted or was
> not efficient enough?
> [For example: I couldnt see anything you needed
> on flow-action management that Linux couldnt do already
> (with already very nice well structured netlink APIs)]
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-15 11:31 ` openvswitch/flow WAS ( " jamal
2010-10-15 16:18 ` Ben Pfaff
@ 2010-10-15 21:35 ` Jesse Gross
2010-10-16 11:35 ` jamal
1 sibling, 1 reply; 31+ messages in thread
From: Jesse Gross @ 2010-10-15 21:35 UTC (permalink / raw)
To: hadi; +Cc: Ben Pfaff, netdev
On Fri, Oct 15, 2010 at 4:31 AM, jamal <hadi@cyberus.ca> wrote:
> It seems to me that you reinvented things that exist in
> Linux already such as bridging, tunnels and what really
> caught my attention: ability to do flows (tc actions).
> It is possible Linux is missing something you wanted or was
> not efficient enough?
> [For example: I couldnt see anything you needed
> on flow-action management that Linux couldnt do already
> (with already very nice well structured netlink APIs)]
You're right, at a high level, it appears that there is a bit of an
overlap between bridging, tc, and Open vSwitch. However, in reality
each is targeting a pretty different use case. Given that the design
goals are not aligned, keeping separate things separate actually helps
with overall simplicity. Where there is overlap, I am certainly happy
to see common functionality reused: for example, Open vSwitch uses tc
for its QoS capabilities.
In the future, I expect there to be an even clearer delineation
between the various components. One of the primary use cases of Open
vSwitch at the moment is for virtualized data center networking but a
few of the other potential uses that have been brought up include
security processing (involving sending traffic of interest to
userspace) and configuring SR-IOV NICs (to appropriately program rules
in hardware). You can see how each of these makes sense in the
context of a virtual switch datapath but less so as a set of tc
actions.
So, in short, I don't see this as something lacking in Linux, just
complementary functionality.
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-15 21:35 ` Jesse Gross
@ 2010-10-16 11:35 ` jamal
2010-10-16 19:33 ` Jesse Gross
0 siblings, 1 reply; 31+ messages in thread
From: jamal @ 2010-10-16 11:35 UTC (permalink / raw)
To: Jesse Gross; +Cc: Ben Pfaff, netdev, ovs-team
Jesse,
I re-added the other address Ben put earlier on in case you
missed it.
yes, I have heard of TL;DR but unlike Alan Cox i find it hard to
make a point in one sentence of 3 words - so please bear with me
and read on.
On Fri, 2010-10-15 at 14:35 -0700, Jesse Gross wrote:
>
> You're right, at a high level, it appears that there is a bit of an
> overlap between bridging, tc, and Open vSwitch.
It looks like openvswitch rides on top of openflow, correct?
earlier i was looking at openflow/datapath but gleaning
openvswitch/datapath it still looks conceptually the same
at the lower level.
> However, in reality each is targeting a pretty different use case.
Sure, use case differences typically map either to policy
or extension/addition of a new mechanism.
To clarify - you have the following approach per VM:
-->ingress port --> filter match --> actions
Did i get this right?
You have a classifier that has 10 or so tuples. I could
replicate it with the u32 classifier - but it could be argued
that a brand new "hard-coded" classifier would be needed.
You have a series of actions like: redirect/mirror to port, drop etc
I can do most of these with existing tc actions and maybe replicate
most (like the vlan, MAC address, checksum etc rewrites) with pedit
action - but it could be argued that maybe one or more new tc actions
are needed.
Note: in linux, the above ingress port could be replaced with an
egress port instead. Bridging and L3 come after the actions in
the ingress path; and post that we have exactly the same approach of
port->filter->action
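The port->filter->action chain can be sketched roughly as follows. This is only a toy model in Python: the `Rule` class, the field names (`in_port`, `vlan_id`, `eth_dst`, `out_port`) and the helpers `set_field`, `redirect`, `drop` and `run_pipeline` are all hypothetical, and none of it reflects the actual tc or Open vSwitch data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]
Action = Callable[[Packet], Optional[Packet]]


@dataclass
class Rule:
    # Fields absent from `match` are wildcards.
    match: Dict[str, object]
    actions: List[Action]

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())


def set_field(name: str, value: object) -> Action:
    """Rewrite action (loosely in the spirit of a header rewrite)."""
    def act(pkt: Packet) -> Optional[Packet]:
        out = dict(pkt)
        out[name] = value
        return out
    return act


def redirect(port: int) -> Action:
    """Redirect action: record the port the packet should leave on."""
    return set_field("out_port", port)


def drop(pkt: Packet) -> Optional[Packet]:
    """Drop action: returning None ends processing."""
    return None


def run_pipeline(rules: List[Rule], pkt: Packet) -> Optional[Packet]:
    """Apply the actions of the first matching rule, in order."""
    for rule in rules:
        if rule.matches(pkt):
            current: Optional[Packet] = pkt
            for action in rule.actions:
                current = action(current)
                if current is None:
                    return None
            return current
    return pkt  # no filter matched: pass through unchanged
```

The same loop could sit on either the ingress or the egress side; only the point at which `run_pipeline` is called would differ.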
> Given that the design
> goals are not aligned, keeping separate things separate actually helps
> with overall simplicity.
In general i would agree with the simplicity sentiment - but i fail to
see it so far.
A lot of the complexity, such as your own proprietary headers for flows
+actions, doesnt need to sit in the kernel.
IOW, the semantics of openflow already exist albeit a different syntax.
You can map the syntax to semantic in user space. This adheres to the
principle of simple kernel and external policy.
I am sure thats what you would need to do with openflow on top of an
ASIC chip for example, no? I can see from the website you already run on
top of Broadcom and Marvell...
> Where there is overlap, I am certainly happy
> to see common functionality reused: for example, Open vSwitch uses tc
> for its QoS capabilities.
Refer to above.
> In the future, I expect there to be an even clearer delineation
> between the various components. One of the primary use cases of Open
> vSwitch at the moment is for virtualized data center networking but a
> few of the other potential uses that have been brought up include
> security processing (involving sending traffic of interest to
> userspace) and configuring SR-IOV NICs (to appropriately program rules
> in hardware). You can see how each of these makes sense in the
> context of a virtual switch datapath but less so as a set of tc
> actions.
Unless i am misunderstanding - these are clearly more control extensions
but I dont see any of it needing to be in the kernel. It is all control
path stuff.
i.e something in user space (maybe even in a hypervisor) that is aware
of the virtualization creates, destroys and manages the VMs (SR-IOV etc)
and then configures per-VM flows whether directly in the kernel or via
some ethtool or other interface to the NIC.
> So, in short, I don't see this as something lacking in Linux, just
> complementary functionality.
Like i said above, I dont see the complementary part.
cheers,
jamal
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-16 11:35 ` jamal
@ 2010-10-16 19:33 ` Jesse Gross
2010-10-18 12:16 ` jamal
0 siblings, 1 reply; 31+ messages in thread
From: Jesse Gross @ 2010-10-16 19:33 UTC (permalink / raw)
To: hadi; +Cc: Ben Pfaff, netdev, ovs-team
On Sat, Oct 16, 2010 at 4:35 AM, jamal <hadi@cyberus.ca> wrote:
> On Fri, 2010-10-15 at 14:35 -0700, Jesse Gross wrote:
>
>>
>> You're right, at a high level, it appears that there is a bit of an
>> overlap between bridging, tc, and Open vSwitch.
>
> It looks like openvswitch rides on top of openflow, correct?
> earlier i was looking at openflow/datapath but gleaning
> openvswitch/datapath it still looks conceptually the same
> at the lower level.
Yes, Open vSwitch supports the OpenFlow protocol. However, the Open
vSwitch kernel portion is completely different from the OpenFlow
reference implementation datapath and in fact does not speak OpenFlow
at the kernel level. You brought up the point of keeping the kernel
simple and making policy decisions in userspace. I completely agree
and, in fact, that is the reason why Open vSwitch is designed the way
it is.
I think it might be helpful if I gave a high level overview of packet
processing:
When a packet is received, the relevant fields from the packet are
extracted and matched against a hash table. The most interesting part
is actually what happens when the packets don't match a hash entry:
they get sent up to userspace. It is userspace that makes a policy
decision about the traffic and then pushes down a flow entry for
future packets to match. Some of the things that those decisions can
be based on include: OpenFlow rules, wildcarded entries, normal L2
learning, etc. From then on, packets in that flow can be processed on
the fast path in the kernel with minimal overhead, while still getting
the benefit of the knowledge of userspace.
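The fast-path/slow-path split described above can be sketched as follows. This is a toy model in Python, not the real datapath: the `FlowKey` fields, the `Datapath` class, the action strings and the L2-learning policy are all assumptions made for illustration, and a real implementation would also need flow expiry, which is omitted here.

```python
from typing import Callable, Dict, List, NamedTuple


class FlowKey(NamedTuple):
    """Fields extracted from a packet (a hypothetical, reduced set)."""
    in_port: int
    eth_src: str
    eth_dst: str
    vlan_id: int


Actions = List[str]  # e.g. ["output:2"] or ["flood"]
Upcall = Callable[[FlowKey, dict], Actions]


class Datapath:
    """Models the kernel side: an exact-match hash table plus a miss path."""

    def __init__(self, upcall: Upcall) -> None:
        self.flows: Dict[FlowKey, Actions] = {}
        self.upcall = upcall  # stands in for sending the packet to userspace
        self.hits = 0
        self.misses = 0

    @staticmethod
    def extract_key(in_port: int, packet: dict) -> FlowKey:
        return FlowKey(in_port, packet["eth_src"], packet["eth_dst"],
                       packet.get("vlan_id", 0))

    def receive(self, in_port: int, packet: dict) -> Actions:
        key = self.extract_key(in_port, packet)
        actions = self.flows.get(key)
        if actions is None:
            # Miss: userspace decides, then a flow entry is installed
            # so later packets of this flow stay on the fast path.
            self.misses += 1
            actions = self.upcall(key, packet)
            self.flows[key] = actions
        else:
            self.hits += 1
        return actions


def make_l2_learning_policy() -> Upcall:
    """Models the userspace side with plain L2 learning as the policy."""
    mac_to_port: Dict[str, int] = {}

    def policy(key: FlowKey, packet: dict) -> Actions:
        mac_to_port[key.eth_src] = key.in_port
        out = mac_to_port.get(key.eth_dst)
        return ["flood"] if out is None else [f"output:{out}"]

    return policy
```

In this sketch the policy function is the only place where decisions are made; `Datapath.receive` just looks up the key and replays whatever actions were installed.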
So I think that we are actually in agreement on quite a number of
points: the kernel should be kept as simple as possible, the control
plane should be abstracted out and handled in userspace, and it should
be possible to map the control rules (from OpenFlow or anywhere
really) onto a simpler set of primitives for handling packets.
So with those goals in mind, here's what is needed:
1. Packet field extraction and classification. Realistically speaking
a new, specialized classifier would probably be needed, as you
mention.
2. A mechanism to send/receive packets to/from userspace. This is an
important component that Open vSwitch adds to the pipeline. This will
probably expand in the future to suit different applications, like the
security processing that I talked about.
3. Output actions. A few exist today, at least some new ones will
need to be added.
So in reality, all of the major components of Open vSwitch are actually
not present in the kernel today. I know the argument could be made
that certain parts can be replicated in different ways but that's
back to the simplicity point that I was making earlier. The u32
classifier isn't well suited for these types of rules and neither is
pedit. If we're going to add the needed components either way, let's
not make everyone's lives more complicated by mixing everything
together.
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-16 19:33 ` Jesse Gross
@ 2010-10-18 12:16 ` jamal
2010-10-18 15:20 ` Simon Horman
0 siblings, 1 reply; 31+ messages in thread
From: jamal @ 2010-10-18 12:16 UTC (permalink / raw)
To: Jesse Gross; +Cc: Ben Pfaff, netdev, ovs-team
On Sat, 2010-10-16 at 12:33 -0700, Jesse Gross wrote:
> On Sat, Oct 16, 2010 at 4:35 AM, jamal <hadi@cyberus.ca> wrote:
> Yes, Open vSwitch supports the OpenFlow protocol. However, the Open
> vSwitch kernel portion is completely different from the OpenFlow
> reference implementation datapath and in fact does not speak OpenFlow
> at the kernel level.
Excellent.
Sorry - I may have misread the openflow code to be openvswitch.
> You brought up the point of keeping the kernel
> simple and making policy decisions in userspace. I completely agree
> and, in fact, that is the reason why Open vSwitch is designed the way
> it is.
>
> I think it might be helpful if I gave a high level overview of packet
> processing:
>
> When a packet is received, the relevant fields from the packet are
> extracted and matched against a hash table. The most interesting part
> is actually what happens when the packets don't match a hash entry:
> they get sent up to userspace. It is userspace that makes a policy
> decision about the traffic and then pushes down a flow entry for
> future packets to match. Some of the things that those decisions can
> be based on include: OpenFlow rules, wildcarded entries, normal L2
> learning, etc. From then on, packets in that flow can be processed on
> the fast path in the kernel with minimal overhead, while still getting
> the benefit of the knowledge of userspace.
Ok, pretty classical stuff - exception handling in control path, update
policy to data path based on exception, subsequent stuff happens in data
path.
> So I think that we are actually in agreement on quite a number of
> points: the kernel should be kept as simple as possible, the control
> plane should be abstracted out and handled in userspace, and it should
> be possible to map the control rules (from OpenFlow or anywhere
> really) onto a simpler set of primitives for handling packets.
>
> So with those goals in mind, here's what is needed:
> 1. Packet field extraction and classification. Realistically speaking
> a new, specialized classifier would probably be needed, as you
> mention.
I think a new classifier would make life simpler here.
> 2. A mechanism to send/receive packets to/from userspace. This is an
> important component that Open vSwitch adds to the pipeline. This will
> probably expand in the future to suit different applications, like the
> security processing that I talked about.
There are many ways to skin that proverbial cat. I guess it will depend
on whether you are redirecting or merely copying a whole packet, or part
of it (while storing a part in the kernel) etc. For an example of a scheme
that works using netlink, look at the netfilter examples. You could use
pf_packet if merely requiring copies. One simple scheme i have used is
to have the mirred action redirect to a tun device on which a user space
daemon is listening. If you look at the mirred action - there is an
option to redirect to a named socket which was never implemented because
workarounds exist.
> 3. Output actions. A few exist today, at least some new ones will
> need to be added.
Agreed.
> So in reality, all of the major components of Open vSwitch are actually
> not present in the kernel today. I know the argument could be made
> that certain parts can be replicated in different ways but that's
> back to the simplicity point that I was making earlier. The u32
> classifier isn't well suited for these types of rules and neither is
> pedit. If we're going to add the needed components either way, let's
> not make everyone's lives more complicated by mixing everything
> together.
I have to say it is a pleasant surprise that we agree. When i looked at
the openflow code i was worried. I always believe in improving what we
have in Linux rather than trying to add parallel competing interfaces.
[You'd be surprised, for example, at the number of vendors who put forward
the claim that they can route faster on Linux[1] by writing a little
barebone driver which ignores 99% of reality.]
cheers,
jamal
[1] I am forgiving of academics
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-18 12:16 ` jamal
@ 2010-10-18 15:20 ` Simon Horman
2010-10-19 10:22 ` jamal
0 siblings, 1 reply; 31+ messages in thread
From: Simon Horman @ 2010-10-18 15:20 UTC (permalink / raw)
To: jamal; +Cc: Jesse Gross, Ben Pfaff, netdev, ovs-team
On Mon, Oct 18, 2010 at 08:16:57AM -0400, jamal wrote:
>
> On Sat, 2010-10-16 at 12:33 -0700, Jesse Gross wrote:
> > On Sat, Oct 16, 2010 at 4:35 AM, jamal <hadi@cyberus.ca> wrote:
[ snip ]
> > 2. A mechanism to send/receive packets to/from userspace. This is an
> > important component that Open vSwitch adds to the pipeline. This will
> > probably expand in the future to suit different applications, like the
> > security processing that I talked about.
>
> There are many ways to skin that proverbial cat. I guess it will depend
> on whether you are redirecting or merely copying a whole packet, or part
> of it (while storing a part in the kernel) etc. For an example of a scheme
> that works using netlink, look at the netfilter examples. You could use
> pf_packet if merely requiring copies. One simple scheme i have used is
> to have the mirred action redirect to a tun device on which a user space
> daemon is listening. If you look at the mirred action - there is an
> option to redirect to a named socket which was never implemented because
> workarounds exist.
As I understand things, the packet goes from the kernel to userspace
and then (typically) comes back again.
I guess that it would be possible to send a copy of the headers
to user-space while the packet is quarantined in the kernel pending
a response from user-space. I say only the headers, as typically
that is all user-space needs to make a decision, though I guess it
may need the body to make some types of decisions. I have no idea
if such a scheme would be desirable in any circumstances.
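Purely to illustrate the idea, a header-only quarantine could be modelled along the following lines. This is a sketch under assumptions, not an existing kernel interface: the `Quarantine` class, the `HEADER_LEN` cut-off and the accept/drop verdict are all hypothetical.

```python
import itertools
from typing import Dict, Optional, Tuple

HEADER_LEN = 64  # hypothetical: how many leading bytes are copied to user-space


class Quarantine:
    """Toy model: hold whole packets, expose only their headers."""

    def __init__(self) -> None:
        self._pending: Dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def hold(self, packet: bytes) -> Tuple[int, bytes]:
        # Park the full packet and return (id, header copy) for user-space.
        pkt_id = next(self._ids)
        self._pending[pkt_id] = packet
        return pkt_id, packet[:HEADER_LEN]

    def verdict(self, pkt_id: int, accept: bool) -> Optional[bytes]:
        # Release the parked packet on accept, discard it on drop.
        # Raises KeyError if the id is unknown or was already resolved.
        packet = self._pending.pop(pkt_id)
        return packet if accept else None

    def pending(self) -> int:
        return len(self._pending)
```

The sketch leaves out timeouts and any bound on how many packets may be held while user-space is slow to answer, which a real implementation would presumably need.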
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-18 15:20 ` Simon Horman
@ 2010-10-19 10:22 ` jamal
2010-10-19 14:56 ` Simon Horman
0 siblings, 1 reply; 31+ messages in thread
From: jamal @ 2010-10-19 10:22 UTC (permalink / raw)
To: Simon Horman; +Cc: Jesse Gross, Ben Pfaff, netdev, ovs-team
On Mon, 2010-10-18 at 17:20 +0200, Simon Horman wrote:
> As I understand things, the packet goes from the kernel to userspace
> and then (typically) comes back again.
Injection back is trivial.
> I guess that it would be possible to send a copy of the headers
> to user-space while the packet is quarantined in the kernel pending
> a response from user-space. I say only the headers, as typically
> that is all user-space needs to make a decision, though I guess it
> may need the body to make some types of decisions. I have no idea
> if such a scheme would be desirable in any circumstances.
Quarantining the packet in the kernel would be trickier than sending the
whole thing up - for how it is done, i believe the netfilter approach
(ipq?) as well as ipsec would be good samples to look at.
cheers,
jamal
* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
2010-10-19 10:22 ` jamal
@ 2010-10-19 14:56 ` Simon Horman
0 siblings, 0 replies; 31+ messages in thread
From: Simon Horman @ 2010-10-19 14:56 UTC (permalink / raw)
To: jamal; +Cc: Jesse Gross, Ben Pfaff, netdev, ovs-team
On Tue, Oct 19, 2010 at 06:22:48AM -0400, jamal wrote:
> On Mon, 2010-10-18 at 17:20 +0200, Simon Horman wrote:
>
> > As I understand things, the packet goes from the kernel to userspace
> > and then (typically) comes back again.
>
> Injection back is trivial.
>
> > I guess that it would be possible to send a copy of the headers
> > to user-space while the packet is quarantined in the kernel pending
> > a response from user-space. I say only the headers, as typically
> > that is all user-space needs to make a decision, though I guess it
> > may need the body to make some types of decisions. I have no idea
> > if such a scheme would be desirable in any circumstances.
>
> Quarantining the packet in the kernel would be trickier than sending the
> whole thing up - for how it is done, i believe the netfilter approach
> (ipq?) as well as ipsec would be good samples to look at.
Ok, let's forget my quarantine idea - I was just thinking aloud.