* OVS Offload Decision Proposal
@ 2015-03-04 1:18 Simon Horman
2015-03-04 16:45 ` Tom Herbert
0 siblings, 1 reply; 19+ messages in thread
From: Simon Horman @ 2015-03-04 1:18 UTC (permalink / raw)
To: dev; +Cc: netdev
[ CCed netdev as although this is primarily about Open vSwitch userspace
I believe there are some interested parties not on the Open vSwitch
dev mailing list ]
Hi,
The purpose of this email is to describe a rough design for driving Open
vSwitch flow offload from user-space. But before getting to that I would
like to provide some background information.
The proposed design is for "OVS Offload Decision": a proposed component of
ovs-vswitchd. In short the top-most red box in the first figure in the
"OVS HW Offload Architecture" document edited by Thomas Graf[1].
[1] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit#heading=h.116je16s8xzw
Assumptions
-----------
There is currently a lively debate on various aspects of flow offloads
within the Linux networking community. As of writing the latest discussion
centers around the "Flows! Offload them." thread[2] on the netdev mailing
list.
[2] http://thread.gmane.org/gmane.linux.network/351860
My aim is not to preempt the outcome of those discussions, but rather to
investigate what offloads might look like in ovs-vswitchd. In order to make
that investigation concrete I have made some assumptions about facilities
that may be provided by the kernel in future. Clearly if the discussions
within the Linux networking community end in a solution that differs from
my assumptions then this work will need to be revisited. Indeed, I entirely
expect this work to be revised and refined and possibly even radically
rethought as time goes on.
That said, my working assumptions are:
* That Open vSwitch may manage flow offloads from user-space. This is as
opposed to them being transparently handled in the datapath. This does
not preclude the existence of transparent offloading in the datapath;
rather, it limits this discussion to a mode where offloads are managed
from user-space.
* That Open vSwitch may add flows to hardware via an API provided by the
kernel. In particular my working assumption is that the Flow API proposed
by John Fastabend[3] may be used to add flows to hardware, while the
existing netlink API may be used to add flows to the kernel datapath.
* That there will be an API provided by the kernel to allow the discovery
of hardware offload capabilities by user-space. Again my working
assumption is that the Flow API proposed by John Fastabend[3] may be used
for this purpose.
[3] http://thread.gmane.org/gmane.linux.network/347188
Rough Design
------------
* Modify flow translation so that the switch parent id[4] of the flow is
recorded as part of its translation context. The switch parent id was
recently added to the Linux kernel and provides a common identifier for
all netdevices that are backed by the same underlying switch hardware for
some very loose definition of switch. In this scheme if the input and all
output ports of a flow belong to the same switch hardware then the switch
id of the translation context would be set accordingly, indicating
offload of the flow may occur to that switch.
[4] https://github.com/torvalds/linux/blob/master/Documentation/networking/switchdev.txt
At this time this excludes flows that either span multiple switch
devices or use vports that are not backed directly by netdevices, for
example tunnel vports. While important, I believe these are topics for
further work.
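As a rough illustration, the switch-id tracking during translation might
look something like the sketch below. The names (xlate_offload_ctx,
port_switch_id, and so on) are hypothetical, not existing ovs-vswitchd code:

/*
 * Hypothetical sketch only: track whether all ports touched while
 * translating a flow share one switch parent id.  A NULL id means the
 * port has no backing netdevice (e.g. a tunnel vport).  Callers are
 * assumed to initialize offloadable = true and have_id = false.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SWITCH_ID_LEN 32

struct xlate_offload_ctx {
    char switch_id[SWITCH_ID_LEN];   /* common parent id seen so far */
    bool have_id;                    /* a first port has been recorded */
    bool offloadable;                /* false once ports span switches */
};

static void xlate_offload_note_port(struct xlate_offload_ctx *ctx,
                                    const char *port_switch_id)
{
    if (!ctx->offloadable) {
        return;
    }
    if (!port_switch_id) {
        ctx->offloadable = false;        /* not backed by a netdevice */
    } else if (!ctx->have_id) {
        snprintf(ctx->switch_id, sizeof ctx->switch_id, "%s", port_switch_id);
        ctx->have_id = true;             /* first port seen */
    } else if (strcmp(ctx->switch_id, port_switch_id) != 0) {
        ctx->offloadable = false;        /* flow spans switch devices */
    }
}

The input port would be noted first and each output port as actions are
composed; if offloadable is still true at the end, the flow is a candidate
for the switch identified by switch_id.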
* At the point where a flow is to be added to the datapath, ovs-vswitchd
should determine whether it should be offloaded and, if so, translate it
to a flow for the hardware offload API and queue this translated flow up
to be added to hardware as well as the datapath.
The translation to hardware flows could be performed along with the
translation that already occurs from OpenFlow to ODP flows. However, that
translation is already quite complex and called for a variety of reasons
other than to prepare flows to be added to the datapath. So I think it
makes some sense to keep the new translation separate from the existing
one.
The determination mentioned above could first check if the switch id is
set and then make further checks: for example, that there is space in
the hardware for a new flow and that all the matches and actions of the
flow may be offloaded.
There seems to be ample scope for complex logic to determine which flows
should be offloaded. And I believe that one motivation for handling
offloads in user-space is to allow such complex logic to live in user-space.
However, in order to keep things simple in the beginning I propose some
very simple logic: offload all flows that the hardware supports up until
the hardware runs out of space.
This seems like a reasonable start keeping in mind that all flows will
also be added to the datapath and that ovs-vswitchd constructs flows such
that they do not overlap.
A more conservative version of this simple rule would be to remove all
flows from hardware if a flow is encountered that is not to be added to
hardware. That is, ensure either all flows that are in hardware are also
in software or no flows are in hardware at all. This is the approach
being initially taken for L3 offloads in the Linux kernel[5].
[5] http://thread.gmane.org/gmane.linux.network/352481/focus=352658
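To make the simple policy and its conservative variant concrete, here is a
minimal sketch of the decision, assuming the per-flow facts and the table
capacity come from the capability-discovery API mentioned above; all names
are illustrative:

#include <stdbool.h>
#include <stddef.h>

/* Facts assumed to be gathered during translation for each candidate flow. */
struct offload_candidate {
    bool has_switch_id;     /* input and all output ports share one switch */
    bool hw_can_express;    /* every match and action is offloadable */
};

struct hw_flow_table {
    size_t used;            /* flows currently programmed into hardware */
    size_t capacity;        /* reported via the capability-discovery API */
};

/* Simple policy: offload every supported flow until the table is full. */
static bool offload_simple(struct hw_flow_table *hw,
                           const struct offload_candidate *c)
{
    if (!c->has_switch_id || !c->hw_can_express || hw->used >= hw->capacity) {
        return false;                   /* add to the datapath only */
    }
    hw->used++;                         /* also queue for the hardware API */
    return true;
}

/* Conservative variant: flush hardware the moment any flow cannot be
 * mirrored there, so hardware holds either every flow or none. */
static bool offload_conservative(struct hw_flow_table *hw,
                                 const struct offload_candidate *c)
{
    if (!offload_simple(hw, c)) {
        hw->used = 0;   /* placeholder: a real implementation would also
                         * remove the already-programmed flows and likely
                         * stop offloading from then on */
        return false;
    }
    return true;
}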
* It seems to me that a somewhat tricky problem is how to manage flows in
hardware. As things stand ovs-vswitchd generally manages flows in the
datapath by dumping flows, inspecting the dumped flows to see how
recently they have been used and removing idle flows from the datapath.
Unfortunately this approach may not be well suited to flows offloaded to
hardware as dumping flows may be prohibitively expensive. As such I would
like some consideration given to three approaches. Perhaps in the end all
will need to be supported. And perhaps there are others:
1. Dump Flows
This is the approach currently taken to managing datapath flows. As
stated above, my feeling is that this will not be well suited to much
hardware. However, for simplicity it may be a good place to start.
2. Notifications
In this approach flows are added to hardware with a soft timeout and
hardware removes flows when they time out, sending a notification when
that occurs. Notifications would be relayed up to user space from the
driver in the kernel. Some effort may be required to mitigate
notification storms if many flows are removed in a short space of
time. It is also of note that there is likely to be hardware that
can't generate notifications on flow removal. (A rough sketch of
handling these notifications appears after this list.)
3. Aging in hardware
In this approach flows are added to hardware with a soft timeout and
hardware removes the flows when they time out. However, no notification
is generated, and thus ovs-vswitchd has no way of knowing if a flow is
still present in hardware or not. From a hardware point of view this
seems to be the simplest to support. But I suspect that it would
present some significant challenges to ovs-vswitchd in the context of
its current implementation of flow management. Especially if flows are
also to be present in the datapath as proposed above.
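For approach 2 (Notifications), a sketch of how ovs-vswitchd might consume
expiry events is below; the event structure and helper names are
assumptions, not an existing kernel or OVS interface:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical event relayed by the driver when hardware ages out a flow. */
struct hw_flow_expired {
    uint64_t flow_id;           /* id handed out when the flow was added */
};

struct offload_state {
    size_t hw_flow_count;       /* ovs-vswitchd's view of hardware usage */
};

/* Placeholder for dropping the flow from ovs-vswitchd's offload map.  The
 * datapath copy of the flow is untouched; only hardware bookkeeping changes. */
static void forget_offloaded_flow(struct offload_state *st, uint64_t flow_id)
{
    (void) flow_id;
    if (st->hw_flow_count > 0) {
        st->hw_flow_count--;
    }
}

/* Handle a batch of expirations at once; batching is one way to soften the
 * notification storms mentioned above. */
static void handle_hw_expirations(struct offload_state *st,
                                  const struct hw_flow_expired *evs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        forget_offloaded_flow(st, evs[i].flow_id);
    }
}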
* Re: OVS Offload Decision Proposal
2015-03-04 1:18 OVS Offload Decision Proposal Simon Horman
@ 2015-03-04 16:45 ` Tom Herbert
2015-03-04 19:07 ` John Fastabend
2015-03-05 0:04 ` [ovs-dev] " David Christensen
0 siblings, 2 replies; 19+ messages in thread
From: Tom Herbert @ 2015-03-04 16:45 UTC (permalink / raw)
To: Simon Horman; +Cc: dev@openvswitch.org, Linux Netdev List
Hi Simon, a few comments inline.
On Tue, Mar 3, 2015 at 5:18 PM, Simon Horman <simon.horman@netronome.com> wrote:
> [ CCed netdev as although this is primarily about Open vSwitch userspace
> I believe there are some interested parties not on the Open vSwitch
> dev mailing list ]
>
> Hi,
>
> The purpose of this email is to describe a rough design for driving Open
> vSwitch flow offload from user-space. But before getting to that I would
> like to provide some background information.
>
> The proposed design is for "OVS Offload Decision": a proposed component of
> ovs-vswitchd. In short the top-most red box in the first figure in the
> "OVS HW Offload Architecture" document edited by Thomas Graf[1].
>
> [1] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit#heading=h.116je16s8xzw
>
> Assumptions
> -----------
>
> There is currently a lively debate on various aspects of flow offloads
> within the Linux networking community. As of writing the latest discussion
> centers around the "Flows! Offload them." thread[2] on the netdev mailing
> list.
>
> [2] http://thread.gmane.org/gmane.linux.network/351860
>
> My aim is not to preempt the outcome of those discussions. But rather to
> investigate what offloads might look like in ovs-vswitchd. In order to make
> that investigation concrete I have made some assumptions about facilities
> that may be provided by the kernel in future. Clearly if the discussions
> within the Linux networking community end in a solution that differs from
> my assumptions then this work will need to be revisited. Indeed, I entirely
> expect this work to be revised and refined and possibly even radically
> rethought as time goes on.
>
> That said, my working assumptions are:
>
> * That Open vSwitch may manage flow offloads from user-space. This is as
> opposed to them being transparently handled in the datapath. This does
> not preclude the existence of transparent offloading in the datapath.
> But rather limits this discussion to a mode where offloads are managed
> from user-space.
>
> * That Open vSwitch may add flows to hardware via an API provided by the
> kernel. In particular my working assumption is that the Flow API proposed
> by John Fastabend[3] may be used to add flows to hardware. While the
> existing netlink API may be used to add flows to the kernel datapath.
>
Doesn't this imply two entities independently managing the same
physical resource? If so, how would the resource be partitioned between
them? And how would conflicting requests between the two be rectified?
> * That there will be an API provided by the kernel to allow the discovery
> of hardware offload capabilities by user-space. Again my working
> assumption is that the Flow API proposed by John Fastabend[3] may be used
> for this purpose.
>
> [3] http://thread.gmane.org/gmane.linux.network/347188
>
> Rough Design
> ------------
>
> * Modify flow translation so that the switch parent id[4] of the flow is
> recorded as part of its translation context. The switch parent id was
> recently added to the Linux kernel and provides a common identifier for
> all netdevices that are backed by the same underlying switch hardware for
> some very loose definition of switch. In this scheme if the input and all
> output ports of a flow belong to the same switch hardware then the switch
> id of the translation context would be set accordingly, indicating
> offload of the flow may occur to that switch.
>
> [4] https://github.com/torvalds/linux/blob/master/Documentation/networking/switchdev.txt
>
> At this time this excludes both flows that either span multiple switch
> devices or use vports that are not backed directly by netdevices, for
> example tunnel vports. While important I believe these are topics for
> further work.
>
> * At the point where a flow is to be added to the datapath ovs-vswitchd
> should determine if it should be offloaded and if so translate it to a
> flow for the hardware offload API and queue this translated flow up to be
> added to hardware as well as the datapath.
>
> The translation to hardware flows could be performed along with the
> translation that already occurs from OpenFlow to ODP flows. However, that
> translation is already quite complex and called for a variety of reasons
> other than to prepare flows to be added to the datapath. So I think it
> makes some sense to keep the new translation separate from the existing
> one.
>
> The determination mentioned above could first check if the switch id is
> set and then may make further checks: for example that there is space in
> the hardware for a new flow, that all the matches and actions of the flow
> may be offloaded.
>
> There seems to be ample scope for complex logic to determine which flows
> should be offloaded. And I believe that one motivation for handling
> offloads in user-space for such complex logic to live in user-space.
I think there needs to be more thought around the long-term
ramifications of this model. Aside from the potential conflicts with the
kernel that I mentioned above, as well as the inevitable replication of
functionality between kernel and userspace, I don't see that we have
any good precedents for dynamically managing a HW offload from user
space like this. AFAIK, all current networking offloads are managed by
the kernel or the device, and I believe iSCSI, RDMA qp's, and even TOE
offloads were all managed in the kernel. The basic problem of choosing
the best M of N total flows to offload really isn't fundamentally
different from some other kernel mechanisms, such as how we need to
manage the memory allocated to the page cache.
> However, in order to keep things simple in the beginning I propose some
> very simple logic: offload all flows that the hardware supports up until
> the hardware runs out of space.
>
> This seems like a reasonable start keeping in mind that all flows will
> also be added to the datapath and that ovs-vswitchd constructs flows such
> that they do not overlap.
>
Again, who will enforce this?
> A more conservative version of this simple rule would be to remove all
> flows from hardware if a flow is encountered that is not to be added to
> hardware. That is, ensure either all flows that are in hardware are also
> in software or no flows are in hardware at all. This is the approach
> being initially taken for L3 offloads in the Linux kernel[5].
>
That approach is a non-starter for real deployment anyway. Graceful
degradation is a fundamental requirement.
> [5] http://thread.gmane.org/gmane.linux.network/352481/focus=352658
>
> * It seems to me that somewhat tricky problem is how to manage flows in
> hardware. As things stand ovs-vswitchd generally manages flows in the
> datapath by dumping flows, inspecting the dumped flows to see how
> recently they have been used and removing idle flows from the datapath.
> Unfortunately this approach may not be well suited to flows offloaded to
> hardware as dumping flows may be prohibitively expensive. As such I would
> like some consideration given to three approaches. Perhaps in the end all
> will need to be supported. And perhaps there are others:
>
> 1. Dump Flows
> This is the approach currently taken to managing datapath flows. As
> stated above my feeling is that this will not be well suited much
> hardware. However, for simplicity it may be a good place to start.
>
> 2. Notifications
> In this approach flows are added to hardware with a soft timeout and
> hardware removes flows when they timeout sending a notification when
> that occurs. Notifications would be relayed up to user space from the
> driver in the kernel. Some effort may be required to mitigate
> notification storms if many flows are removed in a short space of
> time. It is also of note that there is likely to be hardware that
> can't generate notifications on flow removal.
>
> 3. Aging in hardware
> In this approach flows are added to hardware with a soft timeout and
> hardware removes the flows when they timeout. However no notification
> is generated. And thus ovs-vswitchd has no way of knowing if a flow is
> still present in hardware or not. From a hardware point of view this
> seems to be the simplest to support. But I suspect that it would
> present some significant challenges to ovs-vswitchd in the context of
> its current implementation of flow management. Especially if flows are
> also to be present in the datapath as proposed above.
* Re: OVS Offload Decision Proposal
2015-03-04 16:45 ` Tom Herbert
@ 2015-03-04 19:07 ` John Fastabend
2015-03-04 21:36 ` Tom Herbert
2015-03-05 0:04 ` [ovs-dev] " David Christensen
1 sibling, 1 reply; 19+ messages in thread
From: John Fastabend @ 2015-03-04 19:07 UTC (permalink / raw)
To: Tom Herbert
Cc: Simon Horman, dev@openvswitch.org, Linux Netdev List, Neil Horman,
tgraf
On 03/04/2015 08:45 AM, Tom Herbert wrote:
> Hi Simon, a few comments inline.
>
> On Tue, Mar 3, 2015 at 5:18 PM, Simon Horman <simon.horman@netronome.com> wrote:
>> [ CCed netdev as although this is primarily about Open vSwitch userspace
>> I believe there are some interested parties not on the Open vSwitch
>> dev mailing list ]
>>
>> Hi,
>>
>> The purpose of this email is to describe a rough design for driving Open
>> vSwitch flow offload from user-space. But before getting to that I would
>> like to provide some background information.
>>
>> The proposed design is for "OVS Offload Decision": a proposed component of
>> ovs-vswitchd. In short the top-most red box in the first figure in the
>> "OVS HW Offload Architecture" document edited by Thomas Graf[1].
>>
>> [1] https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit#heading=h.116je16s8xzw
>>
>> Assumptions
>> -----------
>>
>> There is currently a lively debate on various aspects of flow offloads
>> within the Linux networking community. As of writing the latest discussion
>> centers around the "Flows! Offload them." thread[2] on the netdev mailing
>> list.
>>
>> [2] http://thread.gmane.org/gmane.linux.network/351860
>>
>> My aim is not to preempt the outcome of those discussions. But rather to
>> investigate what offloads might look like in ovs-vswitchd. In order to make
>> that investigation concrete I have made some assumptions about facilities
>> that may be provided by the kernel in future. Clearly if the discussions
>> within the Linux networking community end in a solution that differs from
>> my assumptions then this work will need to be revisited. Indeed, I entirely
>> expect this work to be revised and refined and possibly even radically
>> rethought as time goes on.
>>
>> That said, my working assumptions are:
>>
>> * That Open vSwitch may manage flow offloads from user-space. This is as
>> opposed to them being transparently handled in the datapath. This does
>> not preclude the existence of transparent offloading in the datapath.
>> But rather limits this discussion to a mode where offloads are managed
>> from user-space.
>>
>> * That Open vSwitch may add flows to hardware via an API provided by the
>> kernel. In particular my working assumption is that the Flow API proposed
>> by John Fastabend[3] may be used to add flows to hardware. While the
>> existing netlink API may be used to add flows to the kernel datapath.
>>
> Doesn't this imply two entities to be independently managing the same
> physical resource? If so, this raises questions of how the resource
> would be partitioned between them? How are conflicting requests
> between the two rectified?
What two entities? The driver + flow API code I have in this case manage
the physical resource.
I'm guessing the conflict you are thinking about is if we want to use
both L3 (or some other kernel subsystem) and OVS in the above case at
the same time? Not sure if people actually do this but what I expect is
the L3 sub-system should request a table from the hardware for L3
routes. Then the driver/kernel can allocate a part of the hardware
resources for L3 and a set for OVS.
This seems to work fairly well in practice in the user space drivers
but implies some provisioning up front which is what Neil was proposing.
Even without this OVS discussion I don't see how you avoid the
provisioning step.
>
>> * That there will be an API provided by the kernel to allow the discovery
>> of hardware offload capabilities by user-space. Again my working
>> assumption is that the Flow API proposed by John Fastabend[3] may be used
>> for this purpose.
>>
>> [3] http://thread.gmane.org/gmane.linux.network/347188
>>
>> Rough Design
>> ------------
>>
>> * Modify flow translation so that the switch parent id[4] of the flow is
>> recorded as part of its translation context. The switch parent id was
>> recently added to the Linux kernel and provides a common identifier for
>> all netdevices that are backed by the same underlying switch hardware for
>> some very loose definition of switch. In this scheme if the input and all
>> output ports of a flow belong to the same switch hardware then the switch
>> id of the translation context would be set accordingly, indicating
>> offload of the flow may occur to that switch.
>>
>> [4] https://github.com/torvalds/linux/blob/master/Documentation/networking/switchdev.txt
>>
>> At this time this excludes both flows that either span multiple switch
>> devices or use vports that are not backed directly by netdevices, for
>> example tunnel vports. While important I believe these are topics for
>> further work.
>>
>> * At the point where a flow is to be added to the datapath ovs-vswitchd
>> should determine if it should be offloaded and if so translate it to a
>> flow for the hardware offload API and queue this translated flow up to be
>> added to hardware as well as the datapath.
>>
>> The translation to hardware flows could be performed along with the
>> translation that already occurs from OpenFlow to ODP flows. However, that
>> translation is already quite complex and called for a variety of reasons
>> other than to prepare flows to be added to the datapath. So I think it
>> makes some sense to keep the new translation separate from the existing
>> one.
>>
>> The determination mentioned above could first check if the switch id is
>> set and then may make further checks: for example that there is space in
>> the hardware for a new flow, that all the matches and actions of the flow
>> may be offloaded.
>>
>> There seems to be ample scope for complex logic to determine which flows
>> should be offloaded. And I believe that one motivation for handling
>> offloads in user-space for such complex logic to live in user-space.
>
> I think there needs to be more thought around the long term
> ramifications of this model. Aside from the potential conflicts with
> kernel that I mentioned above as well as the inevitable replication of
> functionality between kernel and userspace, I don't see that we have
> any good precedents for dynamically managing a HW offload from user
> space like this. AFAIK, all current networking offloads are managed by
> kernel or device, and I believe iSCSI, RDMA qp's, and even TOE
> offloads were all managed in the kernel. The basic problem of choosing
> best M of N total flows to offload really isn't fundamentally
> different than some other kernel mechanisms such as how we need to
> manage the memory allocated to the page cache.
There is at least some precedent today where we configure VFs and the
hardware VEB/VEPA to forward traffic via 'ip' and 'fdb' dynamically. If
we get an indication from the controller that a new VM has landed on the
VF and that it should only send MAC/VLAN x, we add it to the hardware.
I would argue the controller is where the context to "know" which flows
should be sent to which VMs/queue_pairs/etc. lives. The controller also
has a policy it wants to enforce on the VMs and hypervisor; the kernel
doesn't have any of this context.
So without any of this context, how can we build policy that requires
flows to be sent directly to a VM/queue-set or pre-processed by
hardware? It's not clear to me how the kernel can decide which flows are
the "best" in this case. Three cases come to mind: (1) I always want
this done in hardware or I'll move my application/VM/whatever to another
system, (2) try to program this flow in hardware but if you can't it's
a don't care, and (3) never offload this flow. We may dynamically
change the criteria above depending on external configuration/policy
events. If it's a specific application the same three cases apply. It
might be required that pre-processing happens in hardware to meet
performance guarantees, it might be a nice-to-have, or it might be an
application for which we never want to do pre-processing in hardware.
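As a purely illustrative encoding of those three cases (the names are
mine, not part of any proposed API):

/* Hypothetical per-flow offload policy; illustrative only. */
enum offload_policy {
    OFFLOAD_REQUIRED,       /* must be in hardware or the workload moves */
    OFFLOAD_BEST_EFFORT,    /* offload if possible, otherwise don't care */
    OFFLOAD_NEVER,          /* always keep this flow in software */
};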
Another case is where you have two related rules, possibly in different
subsystems. If you offload a route that depends on setting some metadata,
for example, but don't offload the rule that sets the metadata, the
route offload is useless and consumes hardware resources. So you need
to account for this as well, and it's not clear to me how to do this in
the kernel cleanly.
The conflicts issue I think can be resolved as noted above.
>
>> However, in order to keep things simple in the beginning I propose some
>> very simple logic: offload all flows that the hardware supports up until
>> the hardware runs out of space.
>>
>> This seems like a reasonable start keeping in mind that all flows will
>> also be added to the datapath and that ovs-vswitchd constructs flows such
>> that they do not overlap.
>>
> Again, who will enforce this?
This is the OVS user space and only one policy; we can build better ones
following this. But from the kernel perspective it only gets requests to
add or delete flows; it doesn't have the above policy embedded in the
kernel.
You could implement the same policy on top of the L3 offloads if you
wanted: load L3 rules into hardware until it is full, then stop. In that
case it is the application driving the L3 interface that implements the
policy; we are saying the same thing here for OVS.
>
>> A more conservative version of this simple rule would be to remove all
>> flows from hardware if a flow is encountered that is not to be added to
>> hardware. That is, ensure either all flows that are in hardware are also
>> in software or no flows are in hardware at all. This is the approach
>> being initially taken for L3 offloads in the Linux kernel[5].
>>
> That approach is non-starter for real deployment anyway. Graceful
> degradation is a fundamental requirement.
Agreed, but we can improve it by making the applications smarter.
>
>> [5] http://thread.gmane.org/gmane.linux.network/352481/focus=352658
>>
>> * It seems to me that somewhat tricky problem is how to manage flows in
>> hardware. As things stand ovs-vswitchd generally manages flows in the
>> datapath by dumping flows, inspecting the dumped flows to see how
>> recently they have been used and removing idle flows from the datapath.
>> Unfortunately this approach may not be well suited to flows offloaded to
>> hardware as dumping flows may be prohibitively expensive. As such I would
>> like some consideration given to three approaches. Perhaps in the end all
>> will need to be supported. And perhaps there are others:
>>
>> 1. Dump Flows
>> This is the approach currently taken to managing datapath flows. As
>> stated above my feeling is that this will not be well suited much
>> hardware. However, for simplicity it may be a good place to start.
>>
>> 2. Notifications
>> In this approach flows are added to hardware with a soft timeout and
>> hardware removes flows when they timeout sending a notification when
>> that occurs. Notifications would be relayed up to user space from the
>> driver in the kernel. Some effort may be required to mitigate
>> notification storms if many flows are removed in a short space of
>> time. It is also of note that there is likely to be hardware that
>> can't generate notifications on flow removal.
>>
>> 3. Aging in hardware
>> In this approach flows are added to hardware with a soft timeout and
>> hardware removes the flows when they timeout. However no notification
>> is generated. And thus ovs-vswitchd has no way of knowing if a flow is
>> still present in hardware or not. From a hardware point of view this
>> seems to be the simplest to support. But I suspect that it would
>> present some significant challenges to ovs-vswitchd in the context of
>> its current implementation of flow management. Especially if flows are
>> also to be present in the datapath as proposed above.
--
John Fastabend Intel Corporation
* Re: OVS Offload Decision Proposal
2015-03-04 19:07 ` John Fastabend
@ 2015-03-04 21:36 ` Tom Herbert
2015-03-05 1:58 ` John Fastabend
0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2015-03-04 21:36 UTC (permalink / raw)
To: John Fastabend
Cc: Simon Horman, dev@openvswitch.org, Linux Netdev List, Neil Horman,
tgraf
On Wed, Mar 4, 2015 at 11:07 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 03/04/2015 08:45 AM, Tom Herbert wrote:
>>
>> Hi Simon, a few comments inline.
>>
>> On Tue, Mar 3, 2015 at 5:18 PM, Simon Horman <simon.horman@netronome.com>
>> wrote:
>>>
>>> [ CCed netdev as although this is primarily about Open vSwitch userspace
>>> I believe there are some interested parties not on the Open vSwitch
>>> dev mailing list ]
>>>
>>> Hi,
>>>
>>> The purpose of this email is to describe a rough design for driving Open
>>> vSwitch flow offload from user-space. But before getting to that I would
>>> like to provide some background information.
>>>
>>> The proposed design is for "OVS Offload Decision": a proposed component
>>> of
>>> ovs-vswitchd. In short the top-most red box in the first figure in the
>>> "OVS HW Offload Architecture" document edited by Thomas Graf[1].
>>>
>>> [1]
>>> https://docs.google.com/document/d/195waUliu7G5YYVuXHmLmHgJ38DFSte321WPq0oaFhyU/edit#heading=h.116je16s8xzw
>>>
>>> Assumptions
>>> -----------
>>>
>>> There is currently a lively debate on various aspects of flow offloads
>>> within the Linux networking community. As of writing the latest
>>> discussion
>>> centers around the "Flows! Offload them." thread[2] on the netdev mailing
>>> list.
>>>
>>> [2] http://thread.gmane.org/gmane.linux.network/351860
>>>
>>> My aim is not to preempt the outcome of those discussions. But rather to
>>> investigate what offloads might look like in ovs-vswitchd. In order to
>>> make
>>> that investigation concrete I have made some assumptions about facilities
>>> that may be provided by the kernel in future. Clearly if the discussions
>>> within the Linux networking community end in a solution that differs from
>>> my assumptions then this work will need to be revisited. Indeed, I
>>> entirely
>>> expect this work to be revised and refined and possibly even radically
>>> rethought as time goes on.
>>>
>>> That said, my working assumptions are:
>>>
>>> * That Open vSwitch may manage flow offloads from user-space. This is as
>>> opposed to them being transparently handled in the datapath. This does
>>> not preclude the existence of transparent offloading in the datapath.
>>> But rather limits this discussion to a mode where offloads are managed
>>> from user-space.
>>>
>>> * That Open vSwitch may add flows to hardware via an API provided by the
>>> kernel. In particular my working assumption is that the Flow API
>>> proposed
>>> by John Fastabend[3] may be used to add flows to hardware. While the
>>> existing netlink API may be used to add flows to the kernel datapath.
>>>
>> Doesn't this imply two entities to be independently managing the same
>> physical resource? If so, this raises questions of how the resource
>> would be partitioned between them? How are conflicting requests
>> between the two rectified?
>
>
> What two entities? The driver + flow API code I have in this case manage
> the physical resource.
>
OVS and non-OVS kernel. Management in this context refers to policies
for optimizing use of the HW resource (like which subset of flows to
offload for best utilization).
> I'm guessing the conflict you are thinking about is if we want to use
> both L3 (or some other kernel subsystem) and OVS in the above case at
> the same time? Not sure if people actually do this but what I expect is
> the L3 sub-system should request a table from the hardware for L3
> routes. Then the driver/kernel can allocate a part of the hardware
> resources for L3 and a set for OVS.
>
I'm thinking of this as a more general problem. We've established that
the existing kernel mechanisms (routing, tc, qdiscs, etc) should and
maybe are required to work with these HW offloads. I don't think that
a model where we can't use offloads with OVS and kernel simultaneously
would fly, nor are we going to want the kernel to be dependent on OVS
for resource management. So at some point, these two are going to need
to work together somehow to share common HW resources. By this
reasoning, OVS offload can't be defined in a vacuum. Strict
partitioning only goes so far and inevitably leads to poor resource
utilization. For instance, if we gave OVS and the kernel 1000 flow
states each to offload, but OVS has 2000 flows that are inundated and
the kernel ones aren't getting any traffic, then we have achieved poor
utilization. This problem becomes even more evident when someone adds
rate limiting to flows. What would it mean if both OVS and kernel
tried to instantiate a flow with guaranteed line rate bandwidth? It
seems like we need either a centralized resource manager, or at least
some sort of fairly dynamic delegation mechanism for managing the
resource (presumably kernel is master of the resource).
Maybe a solution to all of this has already been fleshed out, but I
didn't readily see this in Simon's write-up.
Thanks,
Tom
* RE: [ovs-dev] OVS Offload Decision Proposal
2015-03-04 16:45 ` Tom Herbert
2015-03-04 19:07 ` John Fastabend
@ 2015-03-05 0:04 ` David Christensen
[not found] ` <3A5015FE9E557D448AF7238AF0ACE20A2D8AE08A-Wwdb2uEOBX+nNEFK5l6JbL1+IgudQmzARxWJa1zDYLQ@public.gmane.org>
1 sibling, 1 reply; 19+ messages in thread
From: David Christensen @ 2015-03-05 0:04 UTC (permalink / raw)
To: Tom Herbert, Simon Horman; +Cc: dev@openvswitch.org, Linux Netdev List
> > That said, my working assumptions are:
> >
> > * That Open vSwitch may manage flow offloads from user-space. This is as
> > opposed to them being transparently handled in the datapath. This does
> > not preclude the existence of transparent offloading in the datapath.
> > But rather limits this discussion to a mode where offloads are managed
> > from user-space.
> >
> > * That Open vSwitch may add flows to hardware via an API provided by the
> > kernel. In particular my working assumption is that the Flow API
> proposed
> > by John Fastabend[3] may be used to add flows to hardware. While the
> > existing netlink API may be used to add flows to the kernel datapath.
> >
> Doesn't this imply two entities to be independently managing the same
> physical resource? If so, this raises questions of how the resource
> would be partitioned between them? How are conflicting requests
> between the two rectified?
The consensus at Netdev was that "set" operations would be removed from
flow API to limit hardware management to the kernel only. Existing "get"
operations would remain so user space is aware of the device capabilities.
Dave
* Re: OVS Offload Decision Proposal
[not found] ` <3A5015FE9E557D448AF7238AF0ACE20A2D8AE08A-Wwdb2uEOBX+nNEFK5l6JbL1+IgudQmzARxWJa1zDYLQ@public.gmane.org>
@ 2015-03-05 1:54 ` John Fastabend
2015-03-05 5:00 ` [ovs-dev] " David Miller
0 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2015-03-05 1:54 UTC (permalink / raw)
To: David Christensen
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Simon Horman,
Linux Netdev List, Pablo Neira Ayuso, Tom Herbert
On 03/04/2015 04:04 PM, David Christensen wrote:
>>> That said, my working assumptions are:
>>>
>>> * That Open vSwitch may manage flow offloads from user-space. This is as
>>> opposed to them being transparently handled in the datapath. This does
>>> not preclude the existence of transparent offloading in the datapath.
>>> But rather limits this discussion to a mode where offloads are managed
>>> from user-space.
>>>
>>> * That Open vSwitch may add flows to hardware via an API provided by the
>>> kernel. In particular my working assumption is that the Flow API
>> proposed
>>> by John Fastabend[3] may be used to add flows to hardware. While the
>>> existing netlink API may be used to add flows to the kernel datapath.
>>>
>> Doesn't this imply two entities to be independently managing the same
>> physical resource? If so, this raises questions of how the resource
>> would be partitioned between them? How are conflicting requests
>> between the two rectified?
>
> The consensus at Netdev was that "set" operations would be removed from
> flow API to limit hardware management to the kernel only. Existing "get"
> operations would remain so user space is aware of the device capabilities.
>
> Dave
I think a set operation _is_ necessary for OVS and other applications
that run in user space. The more I work with this the clearer it is
that this is needed for a class of applications/controllers that want
to work on a richer set of the pipeline for optimization reasons. For
example OVS doesn't want to query/set on a single table of the pipeline
but possibly multiple tables and it needs to know the layout. Now
if you want to make that set operation look like a 'tc' command or 'nft'
command I think we can debate that. Although I'm in favour of keeping
the existing flow api with the 'set' command for the class of
applications that want to use it. The set operation as it exists now
is ideal for the hardware case; 'nft' will require a translation step.
'tc' is actually a bit closer IMO, but I'm not sure 'tc' applications
want to work on optimizing hardware tables.
We need to support both models, both the kernel consumer and the user
space consumer.
What was wrong with the initial set operation, and what I've subsequently
resolved, is that it needs to be constrained a bit to only allow
well-defined actions and well-defined matches. Also, the core module needs
to manage the hardware resource and ensure it is managed correctly so that
multiple consumers do not stomp on each other.
I've CC'd Pablo.
Thanks,
.John
--
John Fastabend Intel Corporation
* Re: OVS Offload Decision Proposal
2015-03-04 21:36 ` Tom Herbert
@ 2015-03-05 1:58 ` John Fastabend
2015-03-06 0:44 ` Neil Horman
0 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2015-03-05 1:58 UTC (permalink / raw)
To: Tom Herbert
Cc: Simon Horman, dev@openvswitch.org, Linux Netdev List, Neil Horman,
tgraf
[...]
>>> Doesn't this imply two entities to be independently managing the same
>>> physical resource? If so, this raises questions of how the resource
>>> would be partitioned between them? How are conflicting requests
>>> between the two rectified?
>>
>>
>> What two entities? The driver + flow API code I have in this case manage
>> the physical resource.
>>
> OVS and non-OVS kernel. Management in this context refers to policies
> for optimizing use of the HW resource (like which subset of flows to
> offload for best utilization).
>
>> I'm guessing the conflict you are thinking about is if we want to use
>> both L3 (or some other kernel subsystem) and OVS in the above case at
>> the same time? Not sure if people actually do this but what I expect is
>> the L3 sub-system should request a table from the hardware for L3
>> routes. Then the driver/kernel can allocate a part of the hardware
>> resources for L3 and a set for OVS.
>>
> I'm thinking of this as a more general problem. We've established that
> the existing kernel mechanisms (routing, tc, qdiscs, etc) should and
> maybe are required to work with these HW offloads. I don't think that
> a model where we can't use offloads with OVS and kernel simultaneously
> would fly, nor are we going to want the kernel to be dependent on OVS
> for resource management. So at some point, these two are going to need
> to work together somehow to share common HW resources. By this
> reasoning, OVS offload can't be defined in a vacuum. Strict
> partitioning only goes so far an inevitably leads to poor resource
> utilization. For instance, if we gave OVS and kernel each 1000 flow
> states each to offload, but OVS has 2000 flows that are inundated and
> kernel ones are getting any traffic then we have achieved poor
> utilization. This problem becomes even more evident when someone adds
> rate limiting to flows. What would it mean if both OVS and kernel
> tried to instantiate a flow with guaranteed line rate bandwidth? It
> seems like we need either a centralized resource manager, or at least
> some sort of fairly dynamic delegation mechanism for managing the
> resource (presumably kernel is master of the resource).
>
> Maybe a solution to all of this has already been fleshed out, but I
> didn't readily see this in Simon's write-up.
I agree with all this, and no, I don't think it is all fleshed out yet.
I currently have something like the following, although it is currently
prototyped on a user space driver; I plan to move the prototype into
the kernel rocker switch over the next couple of weeks. The biggest amount
of work left is getting a "world" into rocker that doesn't have a
pre-defined table model and implementing constraints on the resources
to reflect how the tables are created.
Via a user space tool I can call into an API to allocate tables,
#./flowtl create table type flow name flow-table \
matches $my_matches actions $my_actions \
size 1024 source 1
this allocates a flow table resource in the hardware with the identifier
'flow-table' that can match on fields in $my_matches and provide actions
in $my_actions. This lets the driver create an optimized table in the
hardware that matches on just the matches and just the actions. One
reason we need this is because if the hardware (at least the hardware I
generally work on) tries to use wide matches it is severely limited in
the number of entries it can support. But if you build tables that just
match on the relevant fields we can support many more entries in the
table.
Then I have a few other 'well-defined' types to handle L3, L2.
#./flowtl create table type l3-route route-table size 2048 source dflt
these don't need matches/actions specifiers because it is known what
a l3-route type table is. Similarly we can have a l2 table,
#./flowtl create table type l2-fwd l2-table size 8k source dflt
the 'source' field instructs the hardware where to place the table in
the forwarding pipeline. I use 'dflt' to indicate the driver should
place it in the "normal" spot for that type.
Then the flow-api module in the kernel acts as the resource manager. If
a "route" rule is received it maps to the l3-route table; if an l2 ndo op
is received we point it at the "l2-table"; and so on. User space flowtl
set rule commands can only be directed at tables of type 'flow'. If the
user tries to push a flow rule into l2-table or l3-table it will be
rejected because these are reserved for the kernel subsystems.
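A minimal sketch of that ownership check follows, assuming the flow-api
module records who created each table; the names and the errno choice are
mine, not part of the posted Flow API:

#include <errno.h>
#include <stdbool.h>

/* Hypothetical record of who created a hardware table. */
enum table_owner {
    TABLE_OWNER_KERNEL_L2,          /* created for the bridge/fdb path */
    TABLE_OWNER_KERNEL_L3,          /* created for the routing path */
    TABLE_OWNER_USERSPACE_FLOW,     /* a 'flow' table, e.g. created by OVS */
};

struct hw_table {
    enum table_owner owner;
};

/* Reject user-space set-rule requests aimed at kernel-owned tables. */
static int flow_api_set_rule(const struct hw_table *tbl, bool from_userspace)
{
    if (from_userspace && tbl->owner != TABLE_OWNER_USERSPACE_FLOW) {
        return -EPERM;              /* reserved for a kernel subsystem */
    }
    /* ... then validate the rule is well-formed and program the hardware. */
    return 0;
}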
I would expect OVS user space data plane for example to reserve a table
or maybe multiple tables like this,
#./flowtl create table type flow name ovs-table-1 \
matches $ovs_matches1 actions $ovs_actions1 \
size 1k source 1
#./flowtl create table type flow name ovs-table-2 \
matches $ovs_matches2 actions $ovs_actions2 \
size 1k source 2
By manipulating the source fields you could have a table that forwards
packets to the l2/l3 tables or a "flow" table depending on some criteria,
or you could work the other way: have a set of routes and, if they miss,
forward to a "flow" table. Other combinations are possible as well.
I hope that is helpful; I'll try to do a better write-up when I post the
code. Also, it seems like a reasonable approach to me; any thoughts?
.John
--
John Fastabend Intel Corporation
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 1:54 ` John Fastabend
@ 2015-03-05 5:00 ` David Miller
2015-03-05 5:20 ` Tom Herbert
0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2015-03-05 5:00 UTC (permalink / raw)
To: john.fastabend; +Cc: davidch, therbert, simon.horman, dev, netdev, pablo
From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 04 Mar 2015 17:54:54 -0800
> I think a set operation _is_ necessary for OVS and other
> applications that run in user space.
It's necessary for the kernel to internally manage the chip
flow resources.
Full stop.
It's not being exported to userspace. That is exactly the kind
of open ended, outside the model, crap we're trying to avoid
by putting everything into the kernel where we have consistent
mechanisms, well understood behaviors, and rules.
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 5:00 ` [ovs-dev] " David Miller
@ 2015-03-05 5:20 ` Tom Herbert
2015-03-05 6:42 ` David Miller
0 siblings, 1 reply; 19+ messages in thread
From: Tom Herbert @ 2015-03-05 5:20 UTC (permalink / raw)
To: David Miller
Cc: john fastabend, David Christensen, Simon Horman,
dev@openvswitch.org, Linux Netdev List, Pablo Neira Ayuso
On Wed, Mar 4, 2015 at 9:00 PM, David Miller <davem@davemloft.net> wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 04 Mar 2015 17:54:54 -0800
>
>> I think a set operation _is_ necessary for OVS and other
>> applications that run in user space.
>
> It's necessary for the kernel to internally manage the chip
> flow resources.
>
> Full stop.
>
> It's not being exported to userspace. That is exactly the kind
> of open ended, outside the model, crap we're trying to avoid
> by putting everything into the kernel where we have consistent
> mechanisms, well understood behaviors, and rules.
David,
Just to make sure everyone is on the same page... this discussion has
been about where the policy of offload is implemented, not just who is
actually sending config bits to the device. The question is who gets
to decide how to best divvy up the finite resources of the device and
network amongst various requestors. Is this what you're referring to?
Thanks,
Tom
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 5:20 ` Tom Herbert
@ 2015-03-05 6:42 ` David Miller
2015-03-05 7:39 ` John Fastabend
0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2015-03-05 6:42 UTC (permalink / raw)
To: therbert; +Cc: john.fastabend, davidch, simon.horman, dev, netdev, pablo
From: Tom Herbert <therbert@google.com>
Date: Wed, 4 Mar 2015 21:20:41 -0800
> On Wed, Mar 4, 2015 at 9:00 PM, David Miller <davem@davemloft.net> wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Wed, 04 Mar 2015 17:54:54 -0800
>>
>>> I think a set operation _is_ necessary for OVS and other
>>> applications that run in user space.
>>
>> It's necessary for the kernel to internally manage the chip
>> flow resources.
>>
>> Full stop.
>>
>> It's not being exported to userspace. That is exactly the kind
>> of open ended, outside the model, crap we're trying to avoid
>> by putting everything into the kernel where we have consistent
>> mechanisms, well understood behaviors, and rules.
>
> David,
>
> Just to make sure everyone is on the same page... this discussion has
> been about where the policy of offload is implemented, not just who is
> actually sending config bits to the device. The question is who gets
> to decide how to best divvy up the finite resources of the device and
> network amongst various requestors. Is this what you're referring to?
I'm talking about only the kernel being able to make ->set() calls
through the flow manager API to the device.
Resource control is the kernel's job.
You cannot delegate this crap between ipv4 routing in the kernel,
L2 bridging in the kernel, and some user space crap. It's simply
not going to happen.
All of the delegation of the hardware resource must occur in the
kernel. Because only the kernel has a full view of all of the
resources and how each and every subsystem needs to use it.
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 6:42 ` David Miller
@ 2015-03-05 7:39 ` John Fastabend
2015-03-05 12:37 ` Jamal Hadi Salim
0 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2015-03-05 7:39 UTC (permalink / raw)
To: David Miller; +Cc: therbert, davidch, simon.horman, dev, netdev, pablo
On 03/04/2015 10:42 PM, David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Wed, 4 Mar 2015 21:20:41 -0800
>
>> On Wed, Mar 4, 2015 at 9:00 PM, David Miller <davem@davemloft.net> wrote:
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Wed, 04 Mar 2015 17:54:54 -0800
>>>
>>>> I think a set operation _is_ necessary for OVS and other
>>>> applications that run in user space.
>>>
>>> It's necessary for the kernel to internally manage the chip
>>> flow resources.
>>>
>>> Full stop.
>>>
>>> It's not being exported to userspace. That is exactly the kind
>>> of open ended, outside the model, crap we're trying to avoid
>>> by putting everything into the kernel where we have consistent
>>> mechanisms, well understood behaviors, and rules.
>>
>> David,
>>
>> Just to make sure everyone is on the same page... this discussion has
>> been about where the policy of offload is implemented, not just who is
>> actually sending config bits to the device. The question is who gets
>> to decide how to best divvy up the finite resources of the device and
>> network amongst various requestors. Is this what you're referring to?
>
> I'm talking about only the kernel being able to make ->set() calls
> through the flow manager API to the device.
>
> Resource control is the kernel's job.
>
> You cannot delegate this crap between ipv4 routing in the kernel,
> L2 bridging in the kernel, and some user space crap. It's simply
> not going to happen.
The intent was to reserve space in the tables for l2, l3, user space,
and whatever else is needed. This reservation needs to come from the
administrator because even the kernel doesn't know how much of my
table space I want to reserve for l2 vs l3 vs tc vs ... The sizing
of each of these tables will depend on the use case. If I'm provisioning
L3 networks I may want to create a large l3 table and no 'tc' table.
If I'm building a firewall box I might want a small l3 table and a
large 'tc' table. Also depending on how wide I want my matches in the
'tc' case I may consume more or less resources in the hardware.
Once the reservation of resources occurs we wouldn't let user space
arbitrarily write to any table but only tables that have been
explicitly reserved for user space to write to.
Even without the user space piece we need this reservation when
the table space for l2, l3, etc. is shared. Otherwise driver writers
end up doing a best guess for you, or end up delivering driver flavours
based on firmware, and you can only hope the driver writer guessed
something that is close to your network.
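For example, an administrator-provided slicing could be handed to the
driver at table-creation time as nothing more than a list like the sketch
below; the structure and the numbers are hypothetical, only meant to show
that the sizes come from policy rather than from the driver guessing:

#include <stddef.h>

/* Hypothetical description of one reserved table. */
struct table_reservation {
    const char *name;       /* identifier, e.g. "route-table" */
    const char *type;       /* "l2-fwd", "l3-route", "flow", ... */
    size_t      size;       /* number of entries to set aside */
};

/* One possible slicing for an L3-heavy box: big route table, no 'tc'. */
static const struct table_reservation l3_heavy_profile[] = {
    { "l2-table",    "l2-fwd",   4096 },
    { "route-table", "l3-route", 65536 },
    { "ovs-table-1", "flow",     1024 },
};

static const size_t l3_heavy_profile_len =
    sizeof l3_heavy_profile / sizeof l3_heavy_profile[0];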
>
> All of the delegation of the hardware resource must occur in the
> kernel. Because only the kernel has a full view of all of the
> resources and how each and every subsystem needs to use it.
>
So I'm going to ask... even if we restrict the set() using the above
scheme to only work on pre-defined tables, do you see an issue with it?
I might be missing the point, but I could similarly drive the set()
calls through 'tc' via a new filter, call it xflow.
.John
--
John Fastabend Intel Corporation
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 7:39 ` John Fastabend
@ 2015-03-05 12:37 ` Jamal Hadi Salim
2015-03-05 13:16 ` Jamal Hadi Salim
0 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi Salim @ 2015-03-05 12:37 UTC (permalink / raw)
To: John Fastabend, David Miller
Cc: therbert, davidch, simon.horman, dev, netdev, pablo
On 03/05/15 02:39, John Fastabend wrote:
>
> The intent was to reserve space in the tables for l2, l3, user space,
> and whatever else is needed. This reservation needs to come from the
> administrator because even the kernel doesn't know how much of my
> table space I want to reserve for l2 vs l3 vs tc vs ... The sizing
> of each of these tables will depend on the use case. If I'm provisioning
> L3 networks I may want to create a large l3 table and no 'tc' table.
> If I'm building a firewall box I might want a small l3 table and a
> large 'tc' table. Also depending on how wide I want my matches in the
> 'tc' case I may consume more or less resources in the hardware.
>
Would kernel boot/module options passed to the driver not suffice?
That implies a central authority that decides what this table size
slicing looks like.
> Once the reservation of resources occurs we wouldn't let user space
> arbitrarily write to any table but only tables that have been
> explicitly reserved for user space to write to.
>
How would one allow for a bypass to create tables (a write command)
but not to write to said tables? Likely I am missing something
subtle.
cheers,
jamal
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 12:37 ` Jamal Hadi Salim
@ 2015-03-05 13:16 ` Jamal Hadi Salim
2015-03-05 14:52 ` John Fastabend
0 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi Salim @ 2015-03-05 13:16 UTC (permalink / raw)
To: John Fastabend, David Miller
Cc: therbert, davidch, simon.horman, dev, netdev, pablo
On 03/05/15 07:37, Jamal Hadi Salim wrote:
> On 03/05/15 02:39, John Fastabend wrote:
> Would kernel boot/module options passed to the driver not suffice?
> That implies a central authority that decides what these table size
> slicing looks like.
>
>> Once the reservation of resources occurs we wouldn't let user space
>> arbitrarily write to any table but only tables that have been
>> explicitly reserved for user space to write to.
Seems I misread what you are saying.
I thought you wanted to just create the tables from user space
directly; however, rereading the above:
you are actually asking *to write* to these tables directly from user
space ;->
cheers,
jamal
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 13:16 ` Jamal Hadi Salim
@ 2015-03-05 14:52 ` John Fastabend
2015-03-05 16:33 ` B Viswanath
0 siblings, 1 reply; 19+ messages in thread
From: John Fastabend @ 2015-03-05 14:52 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: David Miller, therbert, davidch, simon.horman, dev, netdev, pablo
On 03/05/2015 05:16 AM, Jamal Hadi Salim wrote:
> On 03/05/15 07:37, Jamal Hadi Salim wrote:
>> On 03/05/15 02:39, John Fastabend wrote:
>
>> Would kernel boot/module options passed to the driver not suffice?
>> That implies a central authority that decides what these table size
>> slicing looks like.
>>
The problem with boot/module options is that they are really difficult to
manage from a controller agent. And yes, in most cases I am working on
there is a central authority that "knows" how to map the network policy
onto a set of table size slices. At least in my space I don't believe
people are logging into systems and using the CLI except for debugging
and experimenting.
>>> Once the reservation of resources occurs we wouldn't let user space
>>> arbitrarily write to any table but only tables that have been
>>> explicitly reserved for user space to write to.
>
> Seems i misread what you are saying.
> I thought you wanted to just create the tables from user space
> directly; however, rereading the above:
> you are actually asking *to write* to these tables directly from user
> space ;->
>
>
Actually I was proposing both. But I can see a workaround for the set
rule or *to write* by mapping a new xflow classifier onto my hardware.
Not ideal for my work, but I guess it might be possible.
The 'create' table from user space, though, I don't see any good
workaround for. You need this in order to provide some guidance to the
driver; otherwise we have to try and "guess" what the table size slicing
should look like, and this can create rather large variations in how
many rules fit in the table (think a 100 vs 100k difference). Also, at
least on the hardware I have, this is not dynamic: I can't start adding
rules to a table and then do a resizing later without disrupting the
traffic.
It would be interesting for folks working on other switch devices to
chime in.
Also, just to point out: even in the 'set' case we wouldn't let
arbitrary 'set rule' writes hit the hardware. We would verify that the
rule targets a table pre-defined to accept it and that the rule itself
is well-formed. In that sense the xflow classifier path is not
particularly different.
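As a rough sketch of that check (plain C with invented names, not the
actual flow API code):

  enum table_owner { TABLE_OWNER_KERNEL, TABLE_OWNER_USER };

  struct hw_table {
      enum table_owner owner;
      unsigned long match_mask;   /* fields this table was created to match */
      unsigned long action_mask;  /* actions this table was created to support */
  };

  struct flow_rule {
      unsigned long match_mask;
      unsigned long action_mask;
  };

  /* A 'set rule' request only reaches the hardware if the target table
   * was explicitly reserved for user space and the rule stays within the
   * matches/actions the table was declared with. */
  static int validate_set_rule(const struct hw_table *t, const struct flow_rule *r)
  {
      if (t->owner != TABLE_OWNER_USER)
          return -1;  /* table reserved for a kernel subsystem */
      if (r->match_mask & ~t->match_mask)
          return -1;  /* matches on a field the table cannot match */
      if (r->action_mask & ~t->action_mask)
          return -1;  /* uses an action the table cannot perform */
      return 0;
  }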
cheers,
> jamal
--
John Fastabend Intel Corporation
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 14:52 ` John Fastabend
@ 2015-03-05 16:33 ` B Viswanath
2015-03-05 17:45 ` B Viswanath
0 siblings, 1 reply; 19+ messages in thread
From: B Viswanath @ 2015-03-05 16:33 UTC (permalink / raw)
To: John Fastabend
Cc: Jamal Hadi Salim, David Miller, therbert, davidch, simon.horman,
dev, netdev@vger.kernel.org, pablo
On 5 March 2015 at 20:22, John Fastabend <john.fastabend@gmail.com> wrote:
> On 03/05/2015 05:16 AM, Jamal Hadi Salim wrote:
>>
<snip>
>>>> Once the reservation of resources occurs we wouldn't let user space
>>>> arbitrarily write to any table but only tables that have been
>>>> explicitly reserved for user space to write to.
>>
>>
>> Seems i misread what you are saying.
>> I thought you wanted to just create the tables from user space
>> directly; however, rereading the above:
>> you are actually asking *to write* to these tables directly from user
>> space ;->
>>
>>
>
> Actually I was proposing both. But I can see a workaround for the set
> rule or *to write* by mapping a new xflow classifier onto my hardware.
> Not ideal for my work but I guess it might be possible.
>
> The 'create' table from user space though I don't see any good work
> around for. You need this in order to provide some guidance to the
> driver otherwise we have to try and "guess" what the table size slicing
> should look like and this can create rather large variations in how
> many rules fit in the table think 100 - 100k difference. Also at least
> on the hardware I have this is not dynamic I can't start adding rules
> to a table and then do a resizing later without disrupting the traffic.
> It would be interesting for folks working on other switch devices to
> chime in.
Some of these abstractions are a little tough for me to map into.
Probably I need more reading. But the central resource manager notion
is very interesting to follow. The question being asked is where this
will live: user space or kernel space.
The drivers (and the SDKs) I have worked on provided a simple add-rule,
delete-rule notion. They hid away the complexity of how things are
managed inside, and they do tend to be complicated. For example, the
simple question of how many rules a chip can support doesn't have a
single answer. It depends on how deep the packets need to be inspected,
how many tags the packets are expected to carry, IPv6, tunnels, tunnels
inside tunnels and many other factors. Depending on the chip, some of
the standard operations (such as VLANs) are managed via rules, while
some chips support them natively. This knowledge tends to be chip
specific and very likely varies from chip to chip and manufacturer to
manufacturer.
Given this, the place where the 'resources (rules)' need to be managed
should be close to the chip: the driver. Kernel? Maybe. It would need
to define a lot of 'generic' interfaces and manage them in a single
place. User space? I don't think it can do it, not for the chips I am
aware of. Note that by 'user space' I mean close to the user, not an
SDK running in user space.
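To make that concrete, the SDK surface looks roughly like the following
(purely illustrative C, not any vendor's actual API); note that the
capacity question is completely hidden from the caller:

  #define SDK_MAX_RULES 64  /* invented; the real limit varies with match
                             * depth, tag count, IPv6/tunnel use, etc.  */

  struct sdk_rule {
      unsigned char src_mac[6];
      unsigned short vlan_id;
  };

  static struct sdk_rule rule_table[SDK_MAX_RULES];
  static int rule_used[SDK_MAX_RULES];

  /* Add a rule; returns a handle >= 0, or -1 when the chip is out of
   * resources.  The caller cannot tell in advance when that will happen. */
  int sdk_rule_add(const struct sdk_rule *rule)
  {
      int i;

      for (i = 0; i < SDK_MAX_RULES; i++) {
          if (!rule_used[i]) {
              rule_table[i] = *rule;
              rule_used[i] = 1;
              return i;
          }
      }
      return -1;
  }

  int sdk_rule_delete(int handle)
  {
      if (handle < 0 || handle >= SDK_MAX_RULES || !rule_used[handle])
          return -1;
      rule_used[handle] = 0;
      return 0;
  }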
>
> Also just to the point out even in the 'set' case we wouldn't let
> arbitrary 'set rule' writes hit the hardware we would verify the rule
> set is for a table that is pre-defined for it and that the rule itself
> is well-formed. In that sense the xflow classifier path is not
> particularly different.
>
> cheers,
>>
>> jamal
>
>
>
> --
> John Fastabend Intel Corporation
* Re: [ovs-dev] OVS Offload Decision Proposal
2015-03-05 16:33 ` B Viswanath
@ 2015-03-05 17:45 ` B Viswanath
[not found] ` <CAN+pFw+LDAiebOzFF+DD81vJp7y0OfVg=5BE0m47B2ZUp6zpeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: B Viswanath @ 2015-03-05 17:45 UTC (permalink / raw)
To: John Fastabend
Cc: Jamal Hadi Salim, David Miller, therbert, davidch, simon.horman,
dev, netdev@vger.kernel.org, pablo
On 5 March 2015 at 22:03, B Viswanath <marichika4@gmail.com> wrote:
> On 5 March 2015 at 20:22, John Fastabend <john.fastabend@gmail.com> wrote:
>> On 03/05/2015 05:16 AM, Jamal Hadi Salim wrote:
>>>
> <snip>
>>>>> Once the reservation of resources occurs we wouldn't let user space
>>>>> arbitrarily write to any table but only tables that have been
>>>>> explicitly reserved for user space to write to.
>>>
>>>
>>> Seems i misread what you are saying.
>>> I thought you wanted to just create the tables from user space
>>> directly; however, rereading the above:
>>> you are actually asking *to write* to these tables directly from user
>>> space ;->
>>>
>>>
>>
>> Actually I was proposing both. But I can see a workaround for the set
>> rule or *to write* by mapping a new xflow classifier onto my hardware.
>> Not ideal for my work but I guess it might be possible.
>>
>> The 'create' table from user space though I don't see any good work
>> around for. You need this in order to provide some guidance to the
>> driver otherwise we have to try and "guess" what the table size slicing
>> should look like and this can create rather large variations in how
>> many rules fit in the table think 100 - 100k difference. Also at least
>> on the hardware I have this is not dynamic I can't start adding rules
>> to a table and then do a resizing later without disrupting the traffic.
>> It would be interesting for folks working on other switch devices to
>> chime in.
>
> Some of these abstractions are a little tough to map into for me.
> Probably I need more reading. But it is very interesting to follow
> the central resource manager notion. The question being asked is where
> this will be, user space or kernel space.
>
> The drivers (and the SDKs) I have worked upon provided a simple add
> rule , delete rule notion. They have hidden away the complexity of how
> it is managing stuff inside. They do tend to be complicated. For
> example, the simple question of how many rules can be supported inside
> a chip, doesn't have a single answer. It depends on how deep the
> packets need to be looked into, how many tags the packets are expected
> to be supported, ipv6, tunnels, tunnels inside tunnels and many other
> factors. Depending on the chip, some of the standard operations (such
> as vlans) are managed via rules, and some chips support them natively.
> This knowledge tends to be chip specific and very likely varies from
> chip to chip and manufacturer to manufacturer.
>
> Given this, the place the 'resources (rules)' need to be managed
> should be close to the chip, the driver. Kernel ? May be. It needs to
> define lot of 'generic' interfaces and manage in a single place. User
> space ? I don't think it can do it, not for the chips I am aware of.
> Note that by 'user space', I mean close to user and not an SDK
> running in user space.
I would like to mention a couple of real-world issues I faced in a
previous life with some of the switch chips, to support my argument
about where the rules must be managed.
1. One of the chips required that I install three rules just to get a
simple classification of packets into a VLAN based on source MAC. Two
of these three are work-around rules to avoid a bug in the silicon. One
of the work-around rules needs to be installed only for the first
MAC-to-VLAN classification rule. In other words, if a MAC-to-VLAN
classification rule is already installed, I need to install just two
rules (see the sketch below).
2. Some chips can detect voice traffic and automatically classify such
traffic into a VLAN, so no rules are necessary for voice VLANs. For
some chips, a list of the OUIs of the VoIP phones to be supported is
configured as a set of rules matching incoming packets.
So, for a seemingly simple operation, the number of rules consumed from
the available set varies depending on the chip. It gets very difficult
to manage this rule budget as we move away from the driver. Only the
driver knows all the work-arounds needed to make something happen.
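A sketch of example 1 above (invented names; the hardware calls are
stand-ins) shows why only the driver can account for the real rule
budget:

  static int mac_vlan_rules_installed;

  static void hw_program_rule(const char *what)
  {
      /* stand-in for the real TCAM/register programming */
  }

  /* Returns the number of hardware entries this one logical request
   * actually consumed: three the first time (two of them silicon-bug
   * work-arounds), two for every rule after that. */
  int install_mac_to_vlan_rule(const unsigned char *mac, unsigned short vlan)
  {
      int cost = 2;

      if (mac_vlan_rules_installed == 0) {
          hw_program_rule("one-time work-around rule");
          cost++;
      }
      hw_program_rule("per-rule work-around rule");
      hw_program_rule("mac-to-vlan classification rule");
      mac_vlan_rules_installed++;
      return cost;
  }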
Vissu
>
>
>>
>> Also just to the point out even in the 'set' case we wouldn't let
>> arbitrary 'set rule' writes hit the hardware we would verify the rule
>> set is for a table that is pre-defined for it and that the rule itself
>> is well-formed. In that sense the xflow classifier path is not
>> particularly different.
>>
>> cheers,
>>>
>>> jamal
>>
>>
>>
>> --
>> John Fastabend Intel Corporation
* Re: OVS Offload Decision Proposal
[not found] ` <CAN+pFw+LDAiebOzFF+DD81vJp7y0OfVg=5BE0m47B2ZUp6zpeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-05 19:21 ` David Miller
0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2015-03-05 19:21 UTC (permalink / raw)
To: marichika4-Re5JQEeQqe8AvxtiuMwx3w
Cc: dev-yBygre7rU0TnMu66kgdUjQ, simon.horman-wFxRvT7yatFl57MIdRCFDg,
john.fastabend-Re5JQEeQqe8AvxtiuMwx3w, jhs-jkUAjuhPggJWk0Htik3J/w,
netdev-u79uwXL29TY76Z2rM5mHXA, pablo-Cap9r6Oaw4JrovVCs/uTlw,
therbert-hpIqsD4AKlfQT0dZR+AlfA
I find it funny that we haven't even got an L3 forwarding
implementation fleshed out enough to merge into the tree, and people
are talking about VOIP to VLAN classification, hw bug workarounds, and
shit like that.
Everyone is really jumping the gun on all of this.
Nobody knows what we will need, and I do mean nobody. Not me, not
switch hardware guys, not people working on the code actively right
now. Nobody.
The only way to find out is to _do_, in small incremental steps,
rather than big revolutionary changes.
Simplify, consolidate, and optimize later.
Let's get something that works for at least the simplest cases first.
We don't even have that yet.
So if people could drive their attention towards Scott's L3 forwarding
work instead of this flow crap which is too far into the horizon to
even be properly seen yet, I'd _really_ appreciate it.
Thanks.
* Re: OVS Offload Decision Proposal
2015-03-05 1:58 ` John Fastabend
@ 2015-03-06 0:44 ` Neil Horman
[not found] ` <20150306004425.GB6785-0o1r3XBGOEbbgkc5XkKeNuvMHUBZFtU3YPYVAmT7z5s@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Neil Horman @ 2015-03-06 0:44 UTC (permalink / raw)
To: John Fastabend
Cc: Tom Herbert, Simon Horman, dev@openvswitch.org, Linux Netdev List,
tgraf
On Wed, Mar 04, 2015 at 05:58:08PM -0800, John Fastabend wrote:
> [...]
>
> >>>Doesn't this imply two entities to be independently managing the same
> >>>physical resource? If so, this raises questions of how the resource
> >>>would be partitioned between them? How are conflicting requests
> >>>between the two rectified?
> >>
> >>
> >>What two entities? The driver + flow API code I have in this case manage
> >>the physical resource.
> >>
> >OVS and non-OVS kernel. Management in this context refers to policies
> >for optimizing use of the HW resource (like which subset of flows to
> >offload for best utilization).
> >
> >>I'm guessing the conflict you are thinking about is if we want to use
> >>both L3 (or some other kernel subsystem) and OVS in the above case at
> >>the same time? Not sure if people actually do this but what I expect is
> >>the L3 sub-system should request a table from the hardware for L3
> >>routes. Then the driver/kernel can allocate a part of the hardware
> >>resources for L3 and a set for OVS.
> >>
> >I'm thinking of this as a more general problem. We've established that
> >the existing kernel mechanisms (routing, tc, qdiscs, etc) should and
> >maybe are required to work with these HW offloads. I don't think that
> >a model where we can't use offloads with OVS and kernel simultaneously
> >would fly, nor are we going to want the kernel to be dependent on OVS
> >for resource management. So at some point, these two are going to need
> >to work together somehow to share common HW resources. By this
> >reasoning, OVS offload can't be defined in a vacuum. Strict
> >partitioning only goes so far and inevitably leads to poor resource
> >utilization. For instance, if we gave OVS and the kernel 1000 flow
> >states each to offload, but OVS has 2000 flows that are inundated and
> >the kernel ones aren't getting any traffic, then we have achieved poor
> >utilization. This problem becomes even more evident when someone adds
> >rate limiting to flows. What would it mean if both OVS and kernel
> >tried to instantiate a flow with guaranteed line rate bandwidth? It
> >seems like we need either a centralized resource manager, or at least
> >some sort of fairly dynamic delegation mechanism for managing the
> >resource (presumably kernel is master of the resource).
> >
> >Maybe a solution to all of this has already been fleshed out, but I
> >didn't readily see this in Simon's write-up.
>
In addition to John's notes below, I think it's important to keep in
mind here that no one is explicitly setting out to make OVS offload and
kernel dataplane offload mutually exclusive, nor do I think that any of
the available proposals actually do so. We just have two use cases that
require different semantics to make efficient use of those offloads
within their own environments.
OVS, in John's world, requires fine-grained control of the hardware
dataplane, so that the OVS bridge can optimally pass off the most
cycle-constrained operations to the hardware, be that L2/L3 or some
combination of both, in an effort to maximize whatever aggregate
software/hardware datapath it wishes to construct based on
user-supplied rules.
Alternatively, kernel functional offloads already have very well
defined semantics, and more than anything else really just want to
enforce those semantics in hardware to opportunistically accelerate
data movement when possible, but not if it means sacrificing how the
user interacts with those functions (routing should still act like
routing, bridging like bridging, etc.). That may require somewhat less
efficient resource utilization than we could otherwise achieve in the
hardware, but if the goal is semantic consistency, that may be a
necessary trade-off.
As to co-existence, there's no reason both models can't operate in
parallel, as long as the APIs for resource management collaborate under
the covers. The only question is: does the hardware have enough
resources to do both? I expect the answer is, not likely (though in
some situations it may). But for that very reason we need to make that
resource allocation an administrative decision. For kernel
functionality, the only aspect of the offload that we should expose to
the user is an on/off switch, and possibly some parameters with which
to define offload resource sizing and policy, i.e. commands like:
ip neigh offload enable dev sw0 cachesize 1000 policy lru
to reserve 1000 entries to store L2 lookups with a least-recently-used
replacement policy, or:
ip route offload enable dev sw0 cachesize 1000 policy maxuse
to reserve 1000 entries to store L3 lookups with a replacement policy
that only replaces routes whose hit count is larger than the least used
in the cache.
By enabling kernel functionality like that, the remaining resources can
be used by the lower-level API for things like OVS. If there aren't
enough left to enable OVS offload, so be it: the administrator has all
the tools at their disposal to reduce resource usage in one area in
order to free it up for use in another.
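The accounting behind that could be as simple as the following sketch
(not an existing kernel API; names invented): kernel offloads reserve
entries up front, and whatever is left over is what the lower-level
flow API can hand out.

  struct hw_resources {
      unsigned int total_entries;    /* what the ASIC can hold in total */
      unsigned int kernel_reserved;  /* ip neigh/route offload reservations */
  };

  /* Reserve entries for a kernel offload (e.g. 'ip route offload enable
   * ... cachesize N').  Returns the number of entries still available to
   * the lower-level flow API, or -1 if the reservation cannot be met. */
  static int reserve_for_kernel(struct hw_resources *res, unsigned int cachesize)
  {
      if (cachesize > res->total_entries - res->kernel_reserved)
          return -1;  /* offload enable fails; nothing is carved out */
      res->kernel_reserved += cachesize;
      return res->total_entries - res->kernel_reserved;
  }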
Best
Neil
> I agree with all this, and no, I don't think it is all fleshed out yet.
>
> I currently have something like the following, currently prototyped
> on a user-space driver; I plan to move the prototype into the kernel
> rocker switch over the next couple of weeks. The biggest amount
> of work left is getting a "world" into rocker that doesn't have a
> pre-defined table model and implementing constraints on the resources
> to reflect how the tables are created.
>
> Via user space tool I can call into an API to allocate tables,
>
> #./flowtl create table type flow name flow-table \
> matches $my_matches actions $my_actions \
> size 1024 source 1
>
> this allocates a flow table resource in the hardware with the identifier
> 'flow-table' that can match on fields in $my_matches and provide actions
> in $my_actions. This lets the driver create an optimized table in the
> hardware that matches on just the matches and just the actions. One
> reason we need this is because if the hardware (at least the hardware I
> generally work on) tries to use wide matches it is severely limited in
> the number of entries it can support. But if you build tables that just
> match on the relevant fields we can support many more entries in the
> table.
>
> Then I have a few other 'well-defined' types to handle L3, L2.
>
> #./flowtl create table type l3-route route-table size 2048 source dflt
>
> these don't need matches/actions specifiers because it is known what
> a l3-route type table is. Similarly we can have a l2 table,
>
> #./flowtl create table type l2-fwd l2-table size 8k source dflt
>
> the 'source' field instructs the hardware where to place the table in
> the forwarding pipeline. I use 'dflt' to indicate the driver should
> place it in the "normal" spot for that type.
>
> Then the flow-api module in the kernel acts as the resource manager. If
> a "route" rule is received it maps to the l3-route table if a l2 ndo op
> is received we point it at the "l2-table" and so on. User space flowtl
> set rule commands can only be directed at tables of type 'flow'. If the
> user tries to push a flow rule into l2-table or l3-table it will be
> rejected because these are reserved for the kernel subsystems.
>
> I would expect OVS user space data plane for example to reserve a table
> or maybe multiple tables like this,
>
> #./flowtl create table type flow name ovs-table-1 \
> matches $ovs_matches1 actions $ovs_actions1 \
> size 1k source 1
>
> #./flowtl create table type flow name ovs-table-2 \
> matches $ovs_matches2 actions $ovs_actions2 \
> size 1k source 2
>
> By manipulating the source fields you could have a table that forwards
> packets to the l2/l3 tables or a "flow" table depending on some criteria
> or you could work the other way have a set of routes and if they miss
> forward to a "flow" table. Other combinations are possible as well.
>
> I hope that is helpful; I'll try to do a better write-up when I post
> the code. Also, it seems like a reasonable approach to me; any thoughts?
>
> .John
>
> --
> John Fastabend Intel Corporation
>
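A minimal sketch, with invented names, of the dispatch John describes
above: kernel route/L2 updates are steered to the tables created for
them, while user-space 'set rule' requests are only accepted for tables
created with type 'flow'.

  enum table_type { TBL_FLOW, TBL_L3_ROUTE, TBL_L2_FWD };

  struct flowapi_table {
      enum table_type type;
      const char *name;   /* e.g. "flow-table", "route-table", "l2-table" */
  };

  enum rule_source { RULE_FROM_L3_SUBSYS, RULE_FROM_L2_NDO, RULE_FROM_USER };

  static int rule_allowed(const struct flowapi_table *t, enum rule_source src)
  {
      switch (src) {
      case RULE_FROM_L3_SUBSYS:
          return t->type == TBL_L3_ROUTE;
      case RULE_FROM_L2_NDO:
          return t->type == TBL_L2_FWD;
      case RULE_FROM_USER:
          return t->type == TBL_FLOW;  /* l2/l3 tables are rejected */
      }
      return 0;
  }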
* Re: OVS Offload Decision Proposal
[not found] ` <20150306004425.GB6785-0o1r3XBGOEbbgkc5XkKeNuvMHUBZFtU3YPYVAmT7z5s@public.gmane.org>
@ 2015-06-17 14:44 ` Neelakantam Gaddam
0 siblings, 0 replies; 19+ messages in thread
From: Neelakantam Gaddam @ 2015-06-17 14:44 UTC (permalink / raw)
To: Neil Horman
Cc: dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org, Simon Horman,
Tom Herbert, John Fastabend, Linux Netdev List
Hi All,
I am interested in OVS HW offload support.
Is there any plan for implementing HW offload support for upcoming OVS
releases?
If the implementation has already been started, please point me to the
source.
On Fri, Mar 6, 2015 at 6:14 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> <snip full quote of Neil Horman's message above>
--
Thanks & Regards
Neelakantam Gaddam
end of thread, other threads:[~2015-06-17 14:44 UTC | newest]
Thread overview: 19+ messages
2015-03-04 1:18 OVS Offload Decision Proposal Simon Horman
2015-03-04 16:45 ` Tom Herbert
2015-03-04 19:07 ` John Fastabend
2015-03-04 21:36 ` Tom Herbert
2015-03-05 1:58 ` John Fastabend
2015-03-06 0:44 ` Neil Horman
[not found] ` <20150306004425.GB6785-0o1r3XBGOEbbgkc5XkKeNuvMHUBZFtU3YPYVAmT7z5s@public.gmane.org>
2015-06-17 14:44 ` Neelakantam Gaddam
2015-03-05 0:04 ` [ovs-dev] " David Christensen
[not found] ` <3A5015FE9E557D448AF7238AF0ACE20A2D8AE08A-Wwdb2uEOBX+nNEFK5l6JbL1+IgudQmzARxWJa1zDYLQ@public.gmane.org>
2015-03-05 1:54 ` John Fastabend
2015-03-05 5:00 ` [ovs-dev] " David Miller
2015-03-05 5:20 ` Tom Herbert
2015-03-05 6:42 ` David Miller
2015-03-05 7:39 ` John Fastabend
2015-03-05 12:37 ` Jamal Hadi Salim
2015-03-05 13:16 ` Jamal Hadi Salim
2015-03-05 14:52 ` John Fastabend
2015-03-05 16:33 ` B Viswanath
2015-03-05 17:45 ` B Viswanath
[not found] ` <CAN+pFw+LDAiebOzFF+DD81vJp7y0OfVg=5BE0m47B2ZUp6zpeQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-05 19:21 ` David Miller