netdev.vger.kernel.org archive mirror
* [RFC net-next 0/6] offload linux bonding tc ingress rules
@ 2018-03-05 13:28 John Hurley
  2018-03-05 13:28 ` [RFC net-next 1/6] drivers: net: bonding: add tc offload infrastructure to bond John Hurley
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: John Hurley @ 2018-03-05 13:28 UTC (permalink / raw)
  To: netdev; +Cc: jiri, ogerlitz, jakub.kicinski, simon.horman, John Hurley

Hi,

This RFC patchset adds support for offloading tc ingress rules applied to
linux bonds. The premise of these patches is that if a rule is applied to
a bond port then the rule should be applied to each slave of the bond.

The linux bond itself registers a cb for offloading tc rules. Potential
slave netdevs on offload devices can then register with the bond for a
further callback - this code is basically the same as registering for an
egress dev offload in TC. Then when a rule is offloaded to the bond, it
can be relayed to each netdev that has registered with the bond code and
which is a slave of the given bond.
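As a rough user-space sketch of the relay described above (all names here are illustrative, not the actual kernel API): the bond keeps a list of callbacks registered by slave drivers, and its own block callback fans each command out to every registered slave.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model only -- not the kernel API. */
#define MAX_SLAVE_CBS 8

typedef int (*tc_cb_t)(int cmd, void *priv);

struct bond_model {
	tc_cb_t slave_cbs[MAX_SLAVE_CBS];
	void *slave_priv[MAX_SLAVE_CBS];
	int num_cbs;
};

/* A slave driver registers its callback with the bond. */
static int bond_register_slave_cb(struct bond_model *bond, tc_cb_t cb,
				  void *priv)
{
	if (bond->num_cbs >= MAX_SLAVE_CBS)
		return -1;
	bond->slave_cbs[bond->num_cbs] = cb;
	bond->slave_priv[bond->num_cbs] = priv;
	bond->num_cbs++;
	return 0;
}

/* The bond's own block callback: relay the rule to every registered slave. */
static int bond_setup_tc_cb(struct bond_model *bond, int cmd)
{
	int i, err;

	for (i = 0; i < bond->num_cbs; i++) {
		err = bond->slave_cbs[i](cmd, bond->slave_priv[i]);
		if (err)
			return err; /* already-offloaded slaves would need unwinding */
	}
	return 0;
}

/* Stand-in for a slave driver's offload handler. */
static int count_cb(int cmd, void *priv)
{
	(void)cmd;
	(*(int *)priv)++;
	return 0;
}
```

A rule delegated to the bond thus reaches each registered slave exactly once, which is what lets the slave drivers install it per-port.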

To prevent sync issues between the kernel and offload device, the linux
bond driver is effectively locked while it has offloaded rules, i.e. no new
ports can be enslaved and no slaves can be released until the offload
rules are removed. Similarly, if a port on a bond is deleted, the bond is
destroyed, forcing a flush of all offloaded rules.
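The locking behaviour might look something like the following simplified check (illustrative names, not the actual patch code): enslave and release attempts fail with -EBUSY while the bond holds offloaded rules.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative model only -- not the actual bonding driver state. */
struct bond_lock_model {
	int offload_rule_count; /* rules currently offloaded via the bond */
	int slave_count;
};

/* Refuse to enslave a new port while offloaded rules exist. */
static int bond_enslave_check(struct bond_lock_model *bond)
{
	if (bond->offload_rule_count > 0)
		return -EBUSY;
	bond->slave_count++;
	return 0;
}

/* The same restriction applies to releasing a slave. */
static int bond_release_check(struct bond_lock_model *bond)
{
	if (bond->offload_rule_count > 0)
		return -EBUSY;
	if (bond->slave_count > 0)
		bond->slave_count--;
	return 0;
}
```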

Also included in the RFC are changes to the NFP driver to utilise the new
code by registering NFP port representors for bond offload rules and
modifying cookie handling to allow the relaying of a rule to multiple
ports.

Thanks,
John

John Hurley (6):
  drivers: net: bonding: add tc offload infrastructure to bond
  driver: net: bonding: allow registration of tc offload callbacks in
    bond
  drivers: net: bonding: restrict bond mods when rules are offloaded
  nfp: add ndo_set_mac_address for representors
  nfp: register repr ports for bond offloads
  nfp: support offloading multiple rules with same cookie

 drivers/net/bonding/bond_main.c                    | 277 ++++++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  24 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  10 +-
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  20 +-
 .../net/ethernet/netronome/nfp/flower/offload.c    |  33 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |   1 +
 include/net/bonding.h                              |   9 +
 7 files changed, 351 insertions(+), 23 deletions(-)

-- 
2.7.4

* Re: [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond
@ 2018-03-13 15:51 Or Gerlitz
  2018-03-13 15:53 ` Or Gerlitz
  2018-03-14  9:50 ` Jiri Pirko
  0 siblings, 2 replies; 25+ messages in thread
From: Or Gerlitz @ 2018-03-13 15:51 UTC (permalink / raw)
  To: Jiri Pirko, Rabie Loulou, John Hurley
  Cc: Jakub Kicinski, Simon Horman, Linux Netdev List, ASAP_Direct_Dev,
	mlxsw

On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@netronome.com wrote:
>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>If a netdev has registered and is a slave of a given bond, then any tc
>>rules offloaded to the bond will be relayed to it if both the bond and the
>>slave permit hw offload.

>>Because the bond itself is not offloaded, just the rules, we don't care
>>about whether the bond ports are on the same device or whether some of the
>>slaves are representor ports and some are not.

John, I think we must also design here for the case where the bond IS
offloaded, e.g. some sort of HW LAG. For example, the mlxsw driver supports
both LAG offload and tcflower offload, and we need to see how these two live
together; mlx5 supports tcflower offload and we are working on bond offload,
etc.

>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>
> Please, no "bond" specific calls from drivers. That would be wrong.
> The idea behing block callbacks was that anyone who is interested could
> register to receive those. In this case, slave device is interested.
> So it should register to receive block callbacks in the same way as if
> the block was directly on top of the slave device. The only thing you
> need to handle is to propagate block bind/unbind from master down to the
> slaves.
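[Jiri's alternative can be sketched in the same user-space style as above (made-up names, not the real block-callback API): on block bind, the master does not register its own callback but instead walks its slaves and binds each slave's callback to the block directly, as if the block sat on top of each slave.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CBS 8

typedef int (*block_cb_t)(int cmd, void *priv);

/* A TC block modeled as a flat list of callbacks. */
struct block_model {
	block_cb_t cbs[MAX_CBS];
	void *priv[MAX_CBS];
	int n;
};

static int block_cb_register(struct block_model *blk, block_cb_t cb,
			     void *priv)
{
	if (blk->n >= MAX_CBS)
		return -1;
	blk->cbs[blk->n] = cb;
	blk->priv[blk->n] = priv;
	blk->n++;
	return 0;
}

struct slave_model {
	block_cb_t cb;
	void *priv;
};

/* Master propagates the bind down: each slave registers on the block
 * exactly as if the block were directly on top of the slave device. */
static int master_block_bind(struct block_model *blk,
			     struct slave_model *slaves, int n_slaves)
{
	int i, err;

	for (i = 0; i < n_slaves; i++) {
		err = block_cb_register(blk, slaves[i].cb, slaves[i].priv);
		if (err)
			return err;
	}
	return 0;
}

/* Stand-in for a slave driver's HW offload handler. */
static int hw_cb(int cmd, void *priv)
{
	(void)cmd;
	(*(int *)priv)++;
	return 0;
}
```

In this scheme the bond contributes no callback of its own; the block's normal callback iteration reaches every slave's driver, which is why no bond-specific export is needed.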

Jiri,

This sounds nice for the case where one installs ingress tc rules on the
bond (let's call them type 1, see next).

One obstacle pointed out by my colleague, Rabie, is that when the upper layer
issues a stats call on the filter, it will get two replies; this can confuse
it and lead to wrong decisions (aging). I wonder if/how we can set a knob
somewhere that unifies the stats (add the packet/byte counters, use the
latest lastuse).
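[The unification asked about here could be as simple as the following fold; the struct and field names are chosen for illustration and are not the actual kernel stats structure: sum the byte and packet counters, keep the most recent lastuse.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stats record, not the actual kernel struct. */
struct flow_stats_model {
	uint64_t bytes;
	uint64_t pkts;
	uint64_t lastuse; /* e.g. jiffies of last hit */
};

/* Fold one slave's reply into the accumulated filter stats:
 * add byte/packet counters, keep the newest lastuse. */
static void flow_stats_merge(struct flow_stats_model *acc,
			     const struct flow_stats_model *one)
{
	acc->bytes += one->bytes;
	acc->pkts += one->pkts;
	if (one->lastuse > acc->lastuse)
		acc->lastuse = one->lastuse;
}
```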

Also, let's see what other rules have to be offloaded in that scheme (call
them types 2/3/4), where one has bonded two HW ports:

2. bond being the egress port of a rule

TC rules in an overlay networks scheme, e.g. a NIC SRIOV scheme where one
bonds the two uplink representors.

Starting with type 2: with our current NIC HW APIs we have to duplicate each
such rule into two rules set to HW:

2.1 VF rep --> uplink 0
2.2 VF rep --> uplink 1

and we do that in the driver (add/del two HW rules, combine the stats
results, etc.).

3. ingress rule on a VF rep port with a shared tunnel device being the
egress (encap), where the routing of the underlay (tunnel) goes through LAG.

In our case, this is like 2.1/2.2 above: offload two rules, combine the stats.

4. ingress rule with a shared tunnel device being the ingress and a VF rep
port being the egress (decap)

This uses the egdev facility to be offloaded into our driver, and then in the
driver we will treat it like type 1: two rules need to be installed into HW.
But now we can't delegate them from the vxlan device b/c it has no direct
connection to the bond.

All in all, for the mlx5 use case, it seems we have an elegant solution only
for type 1.

I think we should do the elegant solution for the cases where it is applicable.

In parallel, if/when newer HW APIs are there such that types 2 and 3 can be
set using one HW rule whose dest is the bond, we are good. As for type 4, we
need to see if/how it can be made nicer.

Or.


end of thread, other threads:[~2018-03-15 21:38 UTC | newest]

Thread overview: 25+ messages
2018-03-05 13:28 [RFC net-next 0/6] offload linux bonding tc ingress rules John Hurley
2018-03-05 13:28 ` [RFC net-next 1/6] drivers: net: bonding: add tc offload infrastructure to bond John Hurley
2018-03-05 13:28 ` [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond John Hurley
2018-03-07 10:57   ` Jiri Pirko
2018-03-05 13:28 ` [RFC net-next 3/6] drivers: net: bonding: restrict bond mods when rules are offloaded John Hurley
2018-03-05 13:28 ` [RFC net-next 4/6] nfp: add ndo_set_mac_address for representors John Hurley
2018-03-05 21:39   ` Or Gerlitz
2018-03-05 23:20     ` John Hurley
2018-03-05 13:28 ` [RFC net-next 5/6] nfp: register repr ports for bond offloads John Hurley
2018-03-05 13:28 ` [RFC net-next 6/6] nfp: support offloading multiple rules with same cookie John Hurley
2018-03-05 21:43 ` [RFC net-next 0/6] offload linux bonding tc ingress rules Or Gerlitz
2018-03-05 23:57   ` John Hurley
2018-03-06  0:16     ` Jakub Kicinski
2018-03-05 22:08 ` Jakub Kicinski
2018-03-06  2:34   ` Roopa Prabhu
2018-03-06  7:49 ` Ido Schimmel
  -- strict thread matches above, loose matches on Subject: below --
2018-03-13 15:51 [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond Or Gerlitz
2018-03-13 15:53 ` Or Gerlitz
2018-03-14  1:50   ` Jakub Kicinski
2018-03-14  6:54     ` Or Gerlitz
2018-03-14 15:51     ` Jiri Pirko
2018-03-14  9:50 ` Jiri Pirko
2018-03-14 11:23   ` Or Gerlitz
2018-03-14 15:56     ` Jiri Pirko
2018-03-15 21:38       ` Or Gerlitz
