* [PATCH net-next v2 0/4] switchdev: add IPv4 routing offload
@ 2015-03-02 10:06 sfeldma
From: sfeldma @ 2015-03-02 10:06 UTC
To: netdev, davem, jiri, roopa
From: Scott Feldman <sfeldma@gmail.com>
(sorry, resending, this time with 'v2' in subject line)
v2:

Changes based on v1 review comments and discussions at netconf:

 - Allow route modification, but use same ndo op used for adding route.
   Driver/device is expected to modify route in-place, if it can, to avoid
   interruption of service.
 - Add new RTNH_F_EXTERNAL flag to mark FIB entries offloaded externally.
 - Don't offload routes if using custom IP rules.  If routes are already
   offloaded, and custom IP rules are turned on, flush routes from offload
   device.  (Offloaded routes are marked with RTNH_F_EXTERNAL).
 - Use kernel's neigh resolution code to resolve route's nexthops' neigh
   MAC addrs.  (Thanks davem, works great!).
 - Use fib->fib_priority in rocker driver to give priorities to routes in
   OF-DPA unicast route table.
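
The custom-IP-rules item in the list above can be pictured with a small
user-space model.  This is only a sketch under assumptions: struct route,
hw_del() and flush_offloaded_routes() are hypothetical names and are not code
from this series; only the flag name and value (RTNH_F_EXTERNAL, 8) are taken
from patch 1/4.

```c
#include <assert.h>
#include <stddef.h>

/* Flag value taken from patch 1/4; everything else here is hypothetical. */
#define RTNH_F_EXTERNAL 8

struct route {
	unsigned int dst;	/* destination prefix, host byte order */
	int dst_len;		/* prefix length */
	unsigned int flags;	/* RTNH_F_* bits */
};

/* Stand-in for the driver's delete op; counts calls so it can be observed. */
static int hw_del_calls;

static void hw_del(const struct route *r)
{
	assert(r->flags & RTNH_F_EXTERNAL);	/* only offloaded routes get here */
	hw_del_calls++;
}

/*
 * Model of "custom IP rules were just turned on": remove every offloaded
 * route from the device and clear its mark.  The routes stay in the software
 * table, so forwarding falls back to the kernel.  Returns the number of
 * routes flushed from the device.
 */
static int flush_offloaded_routes(struct route *tbl, size_t n)
{
	int flushed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(tbl[i].flags & RTNH_F_EXTERNAL))
			continue;
		hw_del(&tbl[i]);
		tbl[i].flags &= ~RTNH_F_EXTERNAL;
		flushed++;
	}
	return flushed;
}
```

Calling flush_offloaded_routes() a second time on the same table does nothing,
because the RTNH_F_EXTERNAL mark is the only record of what was offloaded.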
v1:
This patch set adds L3 routing offload support for IPv4 routes. The idea is to
mirror routes installed in the kernel's FIB down to a hardware switch device to
offload the data forwarding path for L3. Only the data forwarding path is
intercepted. Control and management of the kernel's FIB remains with the
kernel.
Scott Feldman (4):
  rtnetlink: add RTNH_F_EXTERNAL flag for fib offload
  net: add IPv4 routing FIB support for switchdev
  rocker: implement IPv4 fib offloading
  switchdev: don't support custom ip rules, for now

 drivers/net/ethernet/rocker/rocker.c |  517 +++++++++++++++++++++++++++++++---
 include/linux/netdevice.h            |   22 ++
 include/net/ip_fib.h                 |    2 +
 include/net/switchdev.h              |   19 ++
 include/uapi/linux/rtnetlink.h       |    1 +
 net/ipv4/fib_frontend.c              |   13 +
 net/ipv4/fib_rules.c                 |    3 +
 net/ipv4/fib_trie.c                  |   60 +++-
 net/switchdev/switchdev.c            |   99 +++++++
 9 files changed, 688 insertions(+), 48 deletions(-)
--
1.7.10.4
* [PATCH net-next v2 1/4] rtnetlink: add RTNH_F_EXTERNAL flag for fib offload
From: sfeldma @ 2015-03-02 10:06 UTC
To: netdev, davem, jiri, roopa

From: Scott Feldman <sfeldma@gmail.com>

Add new RTNH_F_EXTERNAL flag to mark fib entries offloaded externally, for
example to a switchdev switch.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
---
 include/uapi/linux/rtnetlink.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 5cc5d66..b476e86 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -332,6 +332,7 @@ struct rtnexthop {
 #define RTNH_F_DEAD		1	/* Nexthop is dead (used by multipath)	*/
 #define RTNH_F_PERVASIVE	2	/* Do recursive gateway lookup	*/
 #define RTNH_F_ONLINK		4	/* Gateway is forced on link	*/
+#define RTNH_F_EXTERNAL		8	/* Route installed externally	*/

 /* Macros to handle hexthops */

--
1.7.10.4
* [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
From: sfeldma @ 2015-03-02 10:06 UTC
To: netdev, davem, jiri, roopa

From: Scott Feldman <sfeldma@gmail.com>

Add two new ndo ops (ndo_switch_fib_ipv4_add/del) for switchdev devices
capable of offloading IPv4 L3 routing function from the kernel.  The ops are
called by the core IPv4 FIB code when installing/removing/modifying FIB entries
in the kernel FIB.  On install/modify, the driver should return 0 if FIB entry
(route) can be installed/modified to device; -EOPNOTSUPP if route cannot be
installed/modified due to device limitations; and any other negative error code
on failure to install route to device.  On failure error code, the route is not
installed to device, and not installed in kernel FIB, and the return code is
propagated back to the user-space caller (via netlink).  On -EOPNOTSUPP, the
route is skipped for the device but still installed in the kernel FIB.

The FIB entry (route) nexthop list is used to find the switchdev netdev to
anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
to find the switchdev netdev by recursively traversing the fib_dev's lower_dev
list until a switchdev netdev is found.  The ndo op is called on this switchdev
netdev.  This downward traversal is necessary for switchdev ports stacked under
bonds and/or bridges, where the bond or bridge has the L3 interface.

The switchdev driver can monitor netevent notifier NETEVENT_NEIGH_UPDATE to
know neighbor IP addresses which are resolved to a MAC address.  In the case
where the route's nexthops list contains unresolved neighbor IP addresses, the
driver can ask the kernel to resolve the neighbor.  As route nexthops are
resolved, the driver has enough information to program the device for
L3 forwarding offload.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |   22 +++++++++++
 include/net/switchdev.h   |   19 +++++++++
 net/ipv4/fib_trie.c       |   33 ++++++++++++++--
 net/switchdev/switchdev.c |   95 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5897b4e..73b2766 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -769,6 +769,8 @@ struct netdev_phys_item_id {
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);

+struct fib_info;
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1032,6 +1034,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
  * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
  *	Called to notify switch device port of bridge port STP
  *	state change.
+ * int (*ndo_sw_parent_fib_ipv4_add)(struct net_device *dev, __be32 dst,
+ *				      int dst_len, struct fib_info *fi,
+ *				      u8 tos, u8 type, u32 tb_id);
+ *	Called to add/modify IPv4 route to switch device.
+ * int (*ndo_sw_parent_fib_ipv4_del)(struct net_device *dev, __be32 dst,
+ *				      int dst_len, struct fib_info *fi,
+ *				      u8 tos, u8 type, u32 tb_id);
+ *	Called to delete IPv4 route from switch device.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1193,6 +1203,18 @@ struct net_device_ops {
 							struct netdev_phys_item_id *psid);
 	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
 							      u8 state);
+	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
+							   __be32 dst,
+							   int dst_len,
+							   struct fib_info *fi,
+							   u8 tos, u8 type,
+							   u32 tb_id);
+	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
+							   __be32 dst,
+							   int dst_len,
+							   struct fib_info *fi,
+							   u8 tos, u8 type,
+							   u32 tb_id);
 #endif
 };

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index cfcdac2..4b2fc3f2 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -51,6 +51,11 @@ int ndo_dflt_netdev_switch_port_bridge_dellink(struct net_device *dev,
 					       struct nlmsghdr *nlh, u16 flags);
 int ndo_dflt_netdev_switch_port_bridge_setlink(struct net_device *dev,
 					       struct nlmsghdr *nlh, u16 flags);
+int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id);
+int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id);
+
 #else

 static inline int netdev_switch_parent_id_get(struct net_device *dev,
@@ -109,6 +114,20 @@ static inline int ndo_dflt_netdev_switch_port_bridge_setlink(struct net_device *
 	return 0;
 }

+static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
+					     struct fib_info *fi,
+					     u8 tos, u8 type, u32 tb_id)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
+					     struct fib_info *fi,
+					     u8 tos, u8 type, u32 tb_id)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif

 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index f485345..b834e9c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -79,6 +79,7 @@
 #include <net/tcp.h>
 #include <net/sock.h>
 #include <net/ip_fib.h>
+#include <net/switchdev.h>
 #include "fib_lookup.h"

 #define MAX_STAT_DEPTH 32
@@ -1161,7 +1162,17 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 			new_fa->fa_state = state & ~FA_S_ACCESSED;
 			new_fa->fa_slen = fa->fa_slen;

+			err = netdev_switch_fib_ipv4_add(key, plen, fi,
+							 new_fa->fa_tos,
+							 cfg->fc_type,
+							 tb->tb_id);
+			if (err && err != -EOPNOTSUPP) {
+				kmem_cache_free(fn_alias_kmem, new_fa);
+				goto out;
+			}
+
 			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
+
 			alias_free_mem_rcu(fa);

 			fib_release_info(fi_drop);
@@ -1197,12 +1208,18 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_state = 0;
 	new_fa->fa_slen = slen;

+	/* (Optionally) offload fib entry to switch hardware. */
+	err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
+					 cfg->fc_type, tb->tb_id);
+	if (err && err != -EOPNOTSUPP)
+		goto out_free_new_fa;
+
 	/* Insert new entry to the list. */
 	if (!l) {
 		l = fib_insert_node(t, key, plen);
 		if (unlikely(!l)) {
 			err = -ENOMEM;
-			goto out_free_new_fa;
+			goto out_sw_fib_del;
 		}
 	}

@@ -1217,6 +1234,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 succeeded:
 	return 0;

+out_sw_fib_del:
+	netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
 out_free_new_fa:
 	kmem_cache_free(fn_alias_kmem, new_fa);
 out:
@@ -1475,6 +1494,10 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 		return -ESRCH;

 	fa = fa_to_delete;
+
+	netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
+				   cfg->fc_type, tb->tb_id);
+
 	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
 		  &cfg->fc_nlinfo, 0);

@@ -1494,7 +1517,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }

-static int trie_flush_leaf(struct tnode *l)
+static int trie_flush_leaf(struct fib_table *tb, struct tnode *l)
 {
 	struct hlist_node *tmp;
 	unsigned char slen = 0;
@@ -1505,6 +1528,10 @@ static int trie_flush_leaf(struct tnode *l)
 		struct fib_info *fi = fa->fa_info;

 		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+			netdev_switch_fib_ipv4_del(l->key,
+						   KEYLENGTH - fa->fa_slen,
+						   fi, fa->fa_tos,
+						   fa->fa_type, tb->tb_id);
 			hlist_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
@@ -1593,7 +1620,7 @@ int fib_table_flush(struct fib_table *tb)
 	int found = 0;

 	for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) {
-		found += trie_flush_leaf(l);
+		found += trie_flush_leaf(tb, l);

 		if (ll) {
 			if (hlist_empty(&ll->leaf))
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 8c1e558..a84bdb4 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -14,6 +14,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
+#include <net/ip_fib.h>
 #include <net/switchdev.h>

 /**
@@ -225,3 +226,97 @@ int ndo_dflt_netdev_switch_port_bridge_dellink(struct net_device *dev,
 	return ret;
 }
 EXPORT_SYMBOL(ndo_dflt_netdev_switch_port_bridge_dellink);
+
+static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct net_device *lower_dev;
+	struct net_device *port_dev;
+	struct list_head *iter;
+
+	/* Recusively search from fib_dev down until we find
+	 * a sw port dev.  (A sw port dev supports
+	 * ndo_switch_parent_id_get).
+	 */
+
+	if (ops->ndo_switch_parent_id_get)
+		return dev;
+
+	netdev_for_each_lower_dev(dev, lower_dev, iter) {
+		port_dev = netdev_switch_get_by_fib_dev(lower_dev);
+		if (port_dev)
+			return port_dev;
+	}
+
+	return NULL;
+}
+
+/**
+ *	netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
+ *
+ *	@dst: route's IPv4 destination address
+ *	@dst_len: destination address length (prefix length)
+ *	@fi: route FIB info structure
+ *	@tos: route TOS
+ *	@type: route type
+ *	@tb_id: route table ID
+ *
+ *	Add IPv4 route entry to switch device.
+ */
+int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id)
+{
+	struct net_device *dev;
+	const struct net_device_ops *ops;
+	int err = -EOPNOTSUPP;
+
+	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
+	if (!dev)
+		return -EOPNOTSUPP;
+	ops = dev->netdev_ops;
+
+	if (ops->ndo_switch_fib_ipv4_add)
+		err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
+						   fi, tos, type, tb_id);
+
+	if (!err)
+		fi->fib_flags |= RTNH_F_EXTERNAL;
+
+	return err;
+}
+EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
+
+/**
+ *	netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
+ *
+ *	@dst: route's IPv4 destination address
+ *	@dst_len: destination address length (prefix length)
+ *	@fi: route FIB info structure
+ *	@tos: route TOS
+ *	@type: route type
+ *	@tb_id: route table ID
+ *
+ *	Delete IPv4 route entry from switch device.
+ */
+int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
+			       u8 tos, u8 type, u32 tb_id)
+{
+	struct net_device *dev;
+	const struct net_device_ops *ops;
+	int err = -EOPNOTSUPP;
+
+	if (!(fi->fib_flags & RTNH_F_EXTERNAL))
+		return -EOPNOTSUPP;
+
+	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
+	if (!dev)
+		return -EOPNOTSUPP;
+	ops = dev->netdev_ops;
+
+	if (ops->ndo_switch_fib_ipv4_del)
+		err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
+						   fi, tos, type, tb_id);
+
+	return err;
+}
+EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);
--
1.7.10.4
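
The downward search described in patch 2/4 (from the route's fib_dev through
its lower devices until a switch port is found) can be tried outside the
kernel with a small model.  This is a sketch under assumptions: struct dev,
is_switch_port and find_switch_port() are hypothetical stand-ins for
struct net_device, the "has ndo_switch_parent_id_get" test and
netdev_switch_get_by_fib_dev(); it is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical stand-in for struct net_device; not the kernel type. */
struct dev {
	const char *name;
	int is_switch_port;		/* models "has ndo_switch_parent_id_get" */
	struct dev *lower[MAX_LOWER];	/* models the lower_dev list */
	size_t n_lower;
};

/*
 * Depth-first search downward from the route's device.  Returns the first
 * switch port found, or NULL if nothing below this device is a switch port.
 */
static struct dev *find_switch_port(struct dev *d)
{
	size_t i;

	assert(d != NULL);
	if (d->is_switch_port)
		return d;

	for (i = 0; i < d->n_lower; i++) {
		struct dev *port = find_switch_port(d->lower[i]);

		if (port)
			return port;
	}
	return NULL;
}
```

With a bridge stacked on a bond stacked on two switch ports, the search started
at the bridge returns the first port under the bond, which is the device the
ndo op would be anchored on in this model.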
* Re: [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
From: roopa @ 2015-03-02 14:30 UTC
To: sfeldma; +Cc: netdev, davem, jiri

On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add two new ndo ops (ndo_switch_fib_ipv4_add/del) for switchdev devices
> capable of offloading IPv4 L3 routing function from the kernel.  The ops are
> called by the core IPv4 FIB code when installing/removing/modifying FIB entries
> in the kernel FIB.  On install/modify, the driver should return 0 if FIB entry
> (route) can be installed/modified to device; -EOPNOTSUPP if route cannot be
> installed/modified due to device limitations; and any other negative error code
> on failure to install route to device.  On failure error code, the route is not
> installed to device, and not installed in kernel FIB, and the return code is
> propagated back to the user-space caller (via netlink).  On -EOPNOTSUPP, the
> route is skipped for the device but still installed in the kernel FIB.
>
> The FIB entry (route) nexthop list is used to find the switchdev netdev to
> anchor the ndo op call.  The route's fib_dev (the first nexthop's dev) is used
> to find the switchdev netdev by recursively traversing the fib_dev's lower_dev
> list until a switchdev netdev is found.  The ndo op is called on this switchdev
> netdev.  This downward traversal is necessary for switchdev ports stacked under
> bonds and/or bridges, where the bond or bridge has the L3 interface.
>
> The switchdev driver can monitor netevent notifier NETEVENT_NEIGH_UPDATE to
> know neighbor IP addresses which are resolved to a MAC address.  In the case
> where the route's nexthops list contains unresolved neighbor IP addresses, the
> driver can ask the kernel to resolve the neighbor.  As route nexthops are
> resolved, the driver has enough information to program the device for
> L3 forwarding offload.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/netdevice.h |   22 +++++++++++
>  include/net/switchdev.h   |   19 +++++++++
>  net/ipv4/fib_trie.c       |   33 ++++++++++++++--
>  net/switchdev/switchdev.c |   95 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 166 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5897b4e..73b2766 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -769,6 +769,8 @@ struct netdev_phys_item_id {
>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>  				       struct sk_buff *skb);
>
> +struct fib_info;
> +
>  /*
>   * This structure defines the management hooks for network devices.
>   * The following hooks can be defined; unless noted otherwise, they are
> @@ -1032,6 +1034,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>   *	Called to notify switch device port of bridge port STP
>   *	state change.
> + * int (*ndo_sw_parent_fib_ipv4_add)(struct net_device *dev, __be32 dst,
> + *				      int dst_len, struct fib_info *fi,
> + *				      u8 tos, u8 type, u32 tb_id);
> + *	Called to add/modify IPv4 route to switch device.
> + * int (*ndo_sw_parent_fib_ipv4_del)(struct net_device *dev, __be32 dst,
> + *				      int dst_len, struct fib_info *fi,
> + *				      u8 tos, u8 type, u32 tb_id);
> + *	Called to delete IPv4 route from switch device.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1193,6 +1203,18 @@ struct net_device_ops {
>  							struct netdev_phys_item_id *psid);
>  	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
>  							      u8 state);
> +	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
> +	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
>  #endif
>  };
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index cfcdac2..4b2fc3f2 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -51,6 +51,11 @@ int ndo_dflt_netdev_switch_port_bridge_dellink(struct net_device *dev,
>  					       struct nlmsghdr *nlh, u16 flags);
>  int ndo_dflt_netdev_switch_port_bridge_setlink(struct net_device *dev,
>  					       struct nlmsghdr *nlh, u16 flags);
> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
> +			       u8 tos, u8 type, u32 tb_id);
> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
> +			       u8 tos, u8 type, u32 tb_id);
> +
>  #else
>
>  static inline int netdev_switch_parent_id_get(struct net_device *dev,
> @@ -109,6 +114,20 @@ static inline int ndo_dflt_netdev_switch_port_bridge_setlink(struct net_device *
>  	return 0;
>  }
>
> +static inline int netdev_switch_fib_ipv4_add(u32 dst, int dst_len,
> +					     struct fib_info *fi,
> +					     u8 tos, u8 type, u32 tb_id)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline int netdev_switch_fib_ipv4_del(u32 dst, int dst_len,
> +					     struct fib_info *fi,
> +					     u8 tos, u8 type, u32 tb_id)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  #endif
>
>  #endif /* _LINUX_SWITCHDEV_H_ */
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index f485345..b834e9c 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -79,6 +79,7 @@
>  #include <net/tcp.h>
>  #include <net/sock.h>
>  #include <net/ip_fib.h>
> +#include <net/switchdev.h>
>  #include "fib_lookup.h"
>
>  #define MAX_STAT_DEPTH 32
> @@ -1161,7 +1162,17 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  			new_fa->fa_state = state & ~FA_S_ACCESSED;
>  			new_fa->fa_slen = fa->fa_slen;
>
> +			err = netdev_switch_fib_ipv4_add(key, plen, fi,
> +							 new_fa->fa_tos,
> +							 cfg->fc_type,
> +							 tb->tb_id);
> +			if (err && err != -EOPNOTSUPP) {
> +				kmem_cache_free(fn_alias_kmem, new_fa);
> +				goto out;
> +			}
> +
>  			hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
> +
>  			alias_free_mem_rcu(fa);
>
>  			fib_release_info(fi_drop);

This looks like the replace case: will it need a netdev_switch_fib_ipv4_del
for fi_drop?

> @@ -1197,12 +1208,18 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  	new_fa->fa_state = 0;
>  	new_fa->fa_slen = slen;
>
> +	/* (Optionally) offload fib entry to switch hardware. */
> +	err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
> +					 cfg->fc_type, tb->tb_id);

This could be an NLM_F_APPEND case. It would be better for the switchdev API
to also take nlflags as an argument, to inform the switch driver of replace
and append cases.

> +	if (err && err != -EOPNOTSUPP)
> +		goto out_free_new_fa;
> +
>  	/* Insert new entry to the list. */
>  	if (!l) {
>  		l = fib_insert_node(t, key, plen);
>  		if (unlikely(!l)) {
>  			err = -ENOMEM;
> -			goto out_free_new_fa;
> +			goto out_sw_fib_del;
>  		}
>  	}
>
> @@ -1217,6 +1234,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>  succeeded:
>  	return 0;
>
> +out_sw_fib_del:
> +	netdev_switch_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
>  out_free_new_fa:
>  	kmem_cache_free(fn_alias_kmem, new_fa);
>  out:
> @@ -1475,6 +1494,10 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>  		return -ESRCH;
>
>  	fa = fa_to_delete;
> +
> +	netdev_switch_fib_ipv4_del(key, plen, fa->fa_info, tos,
> +				   cfg->fc_type, tb->tb_id);
> +
>  	rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb->tb_id,
>  		  &cfg->fc_nlinfo, 0);
>
> @@ -1494,7 +1517,7 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>  	return 0;
>  }
>
> -static int trie_flush_leaf(struct tnode *l)
> +static int trie_flush_leaf(struct fib_table *tb, struct tnode *l)
>  {
>  	struct hlist_node *tmp;
>  	unsigned char slen = 0;
> @@ -1505,6 +1528,10 @@ static int trie_flush_leaf(struct tnode *l)
>  		struct fib_info *fi = fa->fa_info;
>
>  		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
> +			netdev_switch_fib_ipv4_del(l->key,
> +						   KEYLENGTH - fa->fa_slen,
> +						   fi, fa->fa_tos,
> +						   fa->fa_type, tb->tb_id);
>  			hlist_del_rcu(&fa->fa_list);
>  			fib_release_info(fa->fa_info);
>  			alias_free_mem_rcu(fa);
> @@ -1593,7 +1620,7 @@ int fib_table_flush(struct fib_table *tb)
>  	int found = 0;
>
>  	for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) {
> -		found += trie_flush_leaf(l);
> +		found += trie_flush_leaf(tb, l);
>
>  		if (ll) {
>  			if (hlist_empty(&ll->leaf))
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 8c1e558..a84bdb4 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -14,6 +14,7 @@
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
>  #include <linux/netdevice.h>
> +#include <net/ip_fib.h>
>  #include <net/switchdev.h>
>
>  /**
> @@ -225,3 +226,97 @@ int ndo_dflt_netdev_switch_port_bridge_dellink(struct net_device *dev,
>  	return ret;
>  }
>  EXPORT_SYMBOL(ndo_dflt_netdev_switch_port_bridge_dellink);
> +
> +static struct net_device *netdev_switch_get_by_fib_dev(struct net_device *dev)
> +{
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +	struct net_device *lower_dev;
> +	struct net_device *port_dev;
> +	struct list_head *iter;
> +
> +	/* Recusively search from fib_dev down until we find
> +	 * a sw port dev.  (A sw port dev supports
> +	 * ndo_switch_parent_id_get).
> +	 */
> +
> +	if (ops->ndo_switch_parent_id_get)
> +		return dev;

Maybe we can just check for NETIF_F_HW_SWITCH_OFFLOAD here, similar to
netdev_switch_port_bridge_newlink/dellink?

> +
> +	netdev_for_each_lower_dev(dev, lower_dev, iter) {
> +		port_dev = netdev_switch_get_by_fib_dev(lower_dev);
> +		if (port_dev)
> +			return port_dev;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + *	netdev_switch_fib_ipv4_add - Add IPv4 route entry to switch
> + *
> + *	@dst: route's IPv4 destination address
> + *	@dst_len: destination address length (prefix length)
> + *	@fi: route FIB info structure
> + *	@tos: route TOS
> + *	@type: route type
> + *	@tb_id: route table ID
> + *
> + *	Add IPv4 route entry to switch device.
> + */
> +int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
> +			       u8 tos, u8 type, u32 tb_id)
> +{
> +	struct net_device *dev;
> +	const struct net_device_ops *ops;
> +	int err = -EOPNOTSUPP;
> +
> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
> +	if (!dev)
> +		return -EOPNOTSUPP;
> +	ops = dev->netdev_ops;
> +
> +	if (ops->ndo_switch_fib_ipv4_add)
> +		err = ops->ndo_switch_fib_ipv4_add(dev, htonl(dst), dst_len,
> +						   fi, tos, type, tb_id);
> +
> +	if (!err)
> +		fi->fib_flags |= RTNH_F_EXTERNAL;
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_add);
> +
> +/**
> + *	netdev_switch_fib_ipv4_del - Delete IPv4 route entry from switch
> + *
> + *	@dst: route's IPv4 destination address
> + *	@dst_len: destination address length (prefix length)
> + *	@fi: route FIB info structure
> + *	@tos: route TOS
> + *	@type: route type
> + *	@tb_id: route table ID
> + *
> + *	Delete IPv4 route entry from switch device.
> + */
> +int netdev_switch_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
> +			       u8 tos, u8 type, u32 tb_id)
> +{
> +	struct net_device *dev;
> +	const struct net_device_ops *ops;
> +	int err = -EOPNOTSUPP;
> +
> +	if (!(fi->fib_flags & RTNH_F_EXTERNAL))
> +		return -EOPNOTSUPP;
> +
> +	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
> +	if (!dev)
> +		return -EOPNOTSUPP;
> +	ops = dev->netdev_ops;
> +
> +	if (ops->ndo_switch_fib_ipv4_del)
> +		err = ops->ndo_switch_fib_ipv4_del(dev, htonl(dst), dst_len,
> +						   fi, tos, type, tb_id);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(netdev_switch_fib_ipv4_del);

Rest looks great! We can extend the switchdev API as needed in the future.

Thanks, Scott.
* Re: [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
From: Scott Feldman @ 2015-03-02 17:10 UTC
To: roopa; +Cc: Netdev, David S. Miller, Jiří Pírko

On Mon, Mar 2, 2015 at 6:30 AM, roopa <roopa@cumulusnetworks.com> wrote:
> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>
>> From: Scott Feldman <sfeldma@gmail.com>
>> +                       err = netdev_switch_fib_ipv4_add(key, plen, fi,
>> +                                                        new_fa->fa_tos,
>> +                                                        cfg->fc_type,
>> +                                                        tb->tb_id);
>> +                       if (err && err != -EOPNOTSUPP) {
>> +                               kmem_cache_free(fn_alias_kmem, new_fa);
>> +                               goto out;
>> +                       }
>> +
>>                         hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
>> +
>>                         alias_free_mem_rcu(fa);
>>                         fib_release_info(fi_drop);
>
>
> This looks like the replace case: It will need a netdev_switch_fib_ipv4_del
> for fi_drop ?

It is the replace case.  No del needed.  Failure to replace in HW
results in replace not proceeding in SW and err return to user.  So HW
and SW stay coherent and user can react to failure case.

>>
>> @@ -1197,12 +1208,18 @@ int fib_table_insert(struct fib_table *tb, struct
>> fib_config *cfg)
>>         new_fa->fa_state = 0;
>>         new_fa->fa_slen = slen;
>> +       /* (Optionally) offload fib entry to switch hardware. */
>> +       err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>> +                                        cfg->fc_type, tb->tb_id);
>
>
> This could be an NLM_F_APPEND case. Would be better for the switchdev API to
> also take
> nlflags as argument, to inform the switch driver of replace and append
> cases.

No, we can keep it simple.  The ndo add op is used for adds or mods.
The driver/device will know if operation is a mod vs. add if object
exists (it's an add if object doesn't exist).

>> +       /* Recusively search from fib_dev down until we find
>> +        * a sw port dev.  (A sw port dev supports
>> +        * ndo_switch_parent_id_get).
>> +        */
>> +
>> +       if (ops->ndo_switch_parent_id_get)
>> +               return dev;
>
>
> Maybe we can just check for NETIF_F_HW_SWITCH_OFFLOAD here ?
> similar to netdev_switch_port_bridge_newlink/dellink

Probably, yes.

-scott
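
The "one op for adds and mods" position in the reply above, together with the
rule that software proceeds only when hardware returned 0 or -EOPNOTSUPP, can
be modelled in user space.  This is a sketch under assumptions: hw_table,
hw_fib_add() and sw_install_allowed() are hypothetical names, and treating a
full table as a "device limitation" that returns -EOPNOTSUPP is an assumption
made for illustration, not something the thread states about any driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define HW_TABLE_SIZE 2

/* Hypothetical device route table; not a real driver structure. */
struct hw_route {
	unsigned int dst;
	int dst_len;
	unsigned int nexthop;
	int in_use;
};

static struct hw_route hw_table[HW_TABLE_SIZE];

/*
 * Single entry point for add and modify.  If the prefix is already present
 * the entry is modified in place; otherwise it is added.  Returns 0 on
 * success, or -EOPNOTSUPP when the (assumed) device limit is hit.
 */
static int hw_fib_add(unsigned int dst, int dst_len, unsigned int nexthop)
{
	struct hw_route *free_slot = NULL;
	size_t i;

	assert(dst_len >= 0 && dst_len <= 32);

	for (i = 0; i < HW_TABLE_SIZE; i++) {
		struct hw_route *r = &hw_table[i];

		if (r->in_use && r->dst == dst && r->dst_len == dst_len) {
			r->nexthop = nexthop;	/* object exists: modify */
			return 0;
		}
		if (!r->in_use && !free_slot)
			free_slot = r;
	}
	if (!free_slot)
		return -EOPNOTSUPP;	/* leave this route to the kernel */

	free_slot->dst = dst;
	free_slot->dst_len = dst_len;
	free_slot->nexthop = nexthop;
	free_slot->in_use = 1;		/* object did not exist: add */
	return 0;
}

/*
 * Mirrors the "if (err && err != -EOPNOTSUPP)" check in fib_table_insert():
 * the software FIB update may proceed only on 0 or -EOPNOTSUPP.
 */
static int sw_install_allowed(int err)
{
	return !err || err == -EOPNOTSUPP;
}
```

Because a failed hardware replace stops the software replace, the old entry
stays in both tables, which is the coherence argument made in the reply.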
* Re: [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
From: roopa @ 2015-03-02 19:24 UTC
To: Scott Feldman; +Cc: Netdev, David S. Miller, Jiří Pírko

On 3/2/15, 9:10 AM, Scott Feldman wrote:
> On Mon, Mar 2, 2015 at 6:30 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>> +                       err = netdev_switch_fib_ipv4_add(key, plen, fi,
>>> +                                                        new_fa->fa_tos,
>>> +                                                        cfg->fc_type,
>>> +                                                        tb->tb_id);
>>> +                       if (err && err != -EOPNOTSUPP) {
>>> +                               kmem_cache_free(fn_alias_kmem, new_fa);
>>> +                               goto out;
>>> +                       }
>>> +
>>>                         hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
>>> +
>>>                         alias_free_mem_rcu(fa);
>>>                         fib_release_info(fi_drop);
>>
>> This looks like the replace case: It will need a netdev_switch_fib_ipv4_del
>> for fi_drop ?
> It is the replace case.  No del needed.  Failure to replace in HW
> results in replace not proceeding in SW and err return to user.  So HW
> and SW stay coherent and user can react to failure case.

Sure, if your driver can handle it. I am not sure that always doing a replace
in HW will work best in all cases. I can't think of a definite test case
right now. AFAICT it becomes necessary to indicate a replace in the IPv6
multipath case.

>
>>> @@ -1197,12 +1208,18 @@ int fib_table_insert(struct fib_table *tb, struct
>>> fib_config *cfg)
>>>         new_fa->fa_state = 0;
>>>         new_fa->fa_slen = slen;
>>> +       /* (Optionally) offload fib entry to switch hardware. */
>>> +       err = netdev_switch_fib_ipv4_add(key, plen, fi, tos,
>>> +                                        cfg->fc_type, tb->tb_id);
>>
>> This could be an NLM_F_APPEND case. Would be better for the switchdev API to
>> also take
>> nlflags as argument, to inform the switch driver of replace and append
>> cases.
> No, we can keep it simple.  The ndo add op is used for adds or mods.
> The driver/device will know if operation is a mod vs. add if object
> exists (it's an add if object doesn't exist).

Again, this will become apparent in the IPv6 multipath route case when you
come to it. The append flag is required to insert at the tail.

Thanks,
Roopa
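
The append point raised above can be illustrated with a tiny model of an
ordered entry list.  This is a sketch under assumptions: struct entry_list and
list_add() are hypothetical, the nlflags parameter is the extension proposed
in the review and not part of the posted API, and inserting at the head when
the flag is absent is a simplification chosen for illustration.  The constant
has the same value as NLM_F_APPEND in <linux/netlink.h> and is redefined
locally only so the sketch builds on any platform.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as NLM_F_APPEND in <linux/netlink.h>; local copy for portability. */
#define SKETCH_NLM_F_APPEND 0x800

#define MAX_ENTRIES 8

/* Hypothetical ordered list of entries for one prefix (gateways here). */
struct entry_list {
	unsigned int gw[MAX_ENTRIES];
	size_t n;
};

/*
 * Add an entry, honouring the append flag: with SKETCH_NLM_F_APPEND the entry
 * goes to the tail, otherwise (simplified) to the head.  Returns 0 on
 * success, -1 if the list is full.
 */
static int list_add(struct entry_list *l, unsigned int gw, unsigned int nlflags)
{
	size_t i;

	assert(l != NULL);
	if (l->n == MAX_ENTRIES)
		return -1;

	if (nlflags & SKETCH_NLM_F_APPEND) {
		l->gw[l->n++] = gw;		/* insert at tail */
		return 0;
	}

	for (i = l->n; i > 0; i--)		/* shift right, insert at head */
		l->gw[i] = l->gw[i - 1];
	l->gw[0] = gw;
	l->n++;
	return 0;
}
```

Without the flag, a driver that only sees "add this entry" cannot tell which of
the two orderings user space asked for, which is the concern behind passing
nlflags down.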
* Re: [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
2015-03-02 10:06 ` [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev sfeldma
2015-03-02 14:30 ` roopa
@ 2015-03-02 22:27 ` Samudrala, Sridhar
2015-03-02 22:31 ` David Miller
1 sibling, 1 reply; 16+ messages in thread
From: Samudrala, Sridhar @ 2015-03-02 22:27 UTC (permalink / raw)
To: sfeldma, netdev, davem, jiri, roopa

On 3/2/2015 2:06 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add two new ndo ops (ndo_switch_fib_ipv4_add/del) for switchdev devices
> capable of offloading IPv4 L3 routing function from the kernel. The ops are
> called by the core IPv4 FIB code when installing/removing/modifying FIB entries
> in the kernel FIB. On install/modify, the driver should return 0 if FIB entry
> (route) can be installed/modified to device; -EOPNOTSUPP if route cannot be
> installed/modified due to device limitations; and any other negative error code
> on failure to install route to device. On failure error code, the route is not
> installed to device, and not installed in kernel FIB, and the return code is
> propagated back to the user-space caller (via netlink). On an -EOPNOTSUPP error
> code, the route is skipped for the device but installed in the kernel FIB.
>
> The FIB entry (route) nexthop list is used to find the switchdev netdev to
> anchor the ndo op call. The route's fib_dev (the first nexthop's dev) is used
> to find the switchdev netdev by recursively traversing the fib_dev's lower_dev
> list until a switchdev netdev is found. The ndo op is called on this switchdev
> netdev. This downward traversal is necessary for switchdev ports stacked under
> bonds and/or bridges, where the bond or bridge has the L3 interface.
>
> The switchdev driver can monitor netevent notifier NETEVENT_NEIGH_UPDATE to
> know neighbor IP addresses which are resolved to a MAC address. In the case
> where the route's nexthops list contains unresolved neighbor IP addresses, the
> driver can ask the kernel to resolve the neighbor. As route nexthops are
> resolved, the driver has enough information to program the device for
> L3 forwarding offload.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/netdevice.h |   22 +++++++++++
>  include/net/switchdev.h   |   19 +++++++++
>  net/ipv4/fib_trie.c       |   33 ++++++++++++++--
>  net/switchdev/switchdev.c |   95 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 166 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5897b4e..73b2766 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -769,6 +769,8 @@ struct netdev_phys_item_id {
>  typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>  				       struct sk_buff *skb);
>
> +struct fib_info;
> +
>  /*
>   * This structure defines the management hooks for network devices.
>   * The following hooks can be defined; unless noted otherwise, they are
> @@ -1032,6 +1034,14 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
>   * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>   *	Called to notify switch device port of bridge port STP
>   *	state change.
> + * int (*ndo_switch_fib_ipv4_add)(struct net_device *dev, __be32 dst,
> + *				  int dst_len, struct fib_info *fi,
> + *				  u8 tos, u8 type, u32 tb_id);
> + *	Called to add/modify IPv4 route to switch device.
> + * int (*ndo_switch_fib_ipv4_del)(struct net_device *dev, __be32 dst,
> + *				  int dst_len, struct fib_info *fi,
> + *				  u8 tos, u8 type, u32 tb_id);
> + *	Called to delete IPv4 route from switch device.
>   */
>  struct net_device_ops {
>  	int			(*ndo_init)(struct net_device *dev);
> @@ -1193,6 +1203,18 @@ struct net_device_ops {
>  							struct netdev_phys_item_id *psid);
>  	int			(*ndo_switch_port_stp_update)(struct net_device *dev,
>  							      u8 state);
> +	int			(*ndo_switch_fib_ipv4_add)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
> +	int			(*ndo_switch_fib_ipv4_del)(struct net_device *dev,
> +							   __be32 dst,
> +							   int dst_len,
> +							   struct fib_info *fi,
> +							   u8 tos, u8 type,
> +							   u32 tb_id);
>  #endif
>  };
>

Don't we need ndo's to offload adding/deleting neighbor table entries
and router interface or gateway addresses?
Is it expected that the switch driver can program the hardware in
response to the neighbor update events?

Thanks
Sridhar
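The downward traversal described in the quoted commit message (start at the
route's fib_dev, descend through lower devices until a switchdev port is
found) can be sketched in userspace C roughly as follows. This is only an
illustration under stated assumptions: struct toy_dev, its fields and
toy_find_switchdev() are made-up names, not kernel types or APIs, and the
real lookup in net/switchdev/switchdev.c is not quoted in this message.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOWER 4

/* Hypothetical stand-in for struct net_device; not the kernel type. */
struct toy_dev {
	const char *name;
	bool is_switchdev;			/* models "implements the switch ndo ops" */
	struct toy_dev *lower[TOY_MAX_LOWER];	/* models the lower_dev list */
	size_t n_lower;
};

/*
 * Depth-first walk down the device stack. Returns the first switchdev
 * port found at or below dev, or NULL if there is none.
 */
static struct toy_dev *toy_find_switchdev(struct toy_dev *dev)
{
	size_t i;

	if (!dev)
		return NULL;
	if (dev->is_switchdev)
		return dev;

	for (i = 0; i < dev->n_lower; i++) {
		struct toy_dev *found = toy_find_switchdev(dev->lower[i]);

		if (found)
			return found;
	}

	return NULL;
}
```

With a bridge stacked on a bond stacked on a switch port, calling
toy_find_switchdev() on the bridge returns the port, which is where the ndo
op would be anchored in this model.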
* Re: [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev
2015-03-02 22:27 ` Samudrala, Sridhar
@ 2015-03-02 22:31 ` David Miller
0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2015-03-02 22:31 UTC (permalink / raw)
To: sridhar.samudrala; +Cc: sfeldma, netdev, jiri, roopa

From: "Samudrala, Sridhar" <sridhar.samudrala@intel.com>
Date: Mon, 02 Mar 2015 14:27:04 -0800

> Don't we need ndo's to offload adding/deleting neighbor table entries
> and router interface or gateway addresses?
> Is it expected that the switch driver can program the hardware in
> response to the neighbor update events?

The neighbour layer is used to translate IPv4 nexthop addresses into
MAC addresses.

The neighbour layer sends notifications when neigh entries change, and
that is what the driver listens to in order to reprogram the hardware
when necessary.

The rocker driver implementation of these callbacks should have made
this clear.
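To make the notification-driven model above concrete, here is a small
userspace sketch of what a driver might do on each neighbour update: install
or refresh the MAC for a valid entry, and drop the entry once it is no longer
valid. The names (toy_hw, toy_neigh_event, and so on) are hypothetical and
the "hardware" is just an array; it only mirrors the shape of the rocker
handler in patch 3/4, not its actual data structures or locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN	6
#define TOY_NEIGH_MAX	8

/* Hypothetical device-side neighbour entry, keyed by IPv4 address. */
struct toy_neigh {
	uint32_t ip;
	uint8_t mac[TOY_ETH_ALEN];
	bool in_use;
};

struct toy_hw {
	struct toy_neigh tbl[TOY_NEIGH_MAX];
};

static struct toy_neigh *toy_hw_find(struct toy_hw *hw, uint32_t ip)
{
	size_t i;

	for (i = 0; i < TOY_NEIGH_MAX; i++)
		if (hw->tbl[i].in_use && hw->tbl[i].ip == ip)
			return &hw->tbl[i];

	return NULL;
}

/*
 * Models one neighbour update notification. "valid" plays the role of
 * (nud_state & NUD_VALID). Returns 0 on success, -1 if a removal finds
 * no entry or the table is full.
 */
static int toy_neigh_event(struct toy_hw *hw, uint32_t ip,
			   const uint8_t *mac, bool valid)
{
	struct toy_neigh *n = toy_hw_find(hw, ip);
	size_t i;

	if (!valid) {
		if (!n)
			return -1;
		n->in_use = false;	/* neighbour went away: remove */
		return 0;
	}

	if (!n) {
		/* First time we see this neighbour: grab a free slot. */
		for (i = 0; i < TOY_NEIGH_MAX; i++) {
			if (!hw->tbl[i].in_use) {
				n = &hw->tbl[i];
				break;
			}
		}
		if (!n)
			return -1;
		n->ip = ip;
		n->in_use = true;
	}

	/* New or changed MAC: (re)program the entry. */
	memcpy(n->mac, mac, TOY_ETH_ALEN);
	return 0;
}
```

No extra ndo is needed in this model; the event itself carries everything
required to keep the device table in sync.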
* [PATCH net-next v2 3/4] rocker: implement IPv4 fib offloading
2015-03-02 10:06 [PATCH net-next v2 0/4] switchdev: add IPv4 routing offload sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 1/4] rtnetlink: add RTNH_F_EXTERNAL flag for fib offload sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev sfeldma
@ 2015-03-02 10:06 ` sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now sfeldma
3 siblings, 0 replies; 16+ messages in thread
From: sfeldma @ 2015-03-02 10:06 UTC (permalink / raw)
To: netdev, davem, jiri, roopa

From: Scott Feldman <sfeldma@gmail.com>

The driver implements ndo_switch_fib_ipv4_add/del ops to add/del/mod IPv4
routes to/from switchdev device. Once a route is added to the device, and the
route's nexthops are resolved to neighbor MAC addresses, the device will
forward matching pkts rather than the kernel. This offloads the L3 forwarding
path from the kernel to the device. Note that control and management planes
are still managed by Linux; only the data plane is offloaded. Standard
routing control protocols such as OSPF and BGP run on Linux and manage the
kernel's FIB via standard rtm netlink msgs...nothing changes here.

A new hash table is added to rocker to track neighbors. The driver listens
for neighbor update events using netevent notifier NETEVENT_NEIGH_UPDATE.
Any ARP table updates for ports on this device are recorded in this table.
Routes installed to the device with nexthops that reference neighbors in this
table are "qualified". In the case of a route with nexthops not resolved in
the table, the kernel is asked to resolve the nexthop.

The driver uses fib_info->fib_priority for the priority field in rocker's
unicast routing table.

The device can only forward pkts matching a route's dst to resolved nexthops.
Currently, the device only supports single-path routes (i.e. routes with one
nexthop).
Equal Cost Multipath (ECMP) route support will be added in followup patches. This patch is driver support for unicast IPv4 routing only. Followup patches will add driver and infrastructure for IPv6 routing and multicast routing. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> --- drivers/net/ethernet/rocker/rocker.c | 517 +++++++++++++++++++++++++++++++--- net/ipv4/fib_trie.c | 2 +- 2 files changed, 473 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index e5a15a4..f8d8a54 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -32,6 +32,9 @@ #include <linux/bitops.h> #include <net/switchdev.h> #include <net/rtnetlink.h> +#include <net/ip_fib.h> +#include <net/netevent.h> +#include <net/arp.h> #include <asm-generic/io-64-nonatomic-lo-hi.h> #include <generated/utsrelease.h> @@ -111,9 +114,10 @@ struct rocker_flow_tbl_key { struct rocker_flow_tbl_entry { struct hlist_node entry; - u32 ref_count; + u32 cmd; u64 cookie; struct rocker_flow_tbl_key key; + size_t key_len; u32 key_crc32; /* key */ }; @@ -161,6 +165,16 @@ struct rocker_internal_vlan_tbl_entry { __be16 vlan_id; }; +struct rocker_neigh_tbl_entry { + struct hlist_node entry; + __be32 ip_addr; /* key */ + struct net_device *dev; + u32 ref_count; + u32 index; + u8 eth_dst[ETH_ALEN]; + bool ttl_check; +}; + struct rocker_desc_info { char *data; /* mapped */ size_t data_size; @@ -234,6 +248,9 @@ struct rocker { unsigned long internal_vlan_bitmap[ROCKER_INTERNAL_VLAN_BITMAP_LEN]; DECLARE_HASHTABLE(internal_vlan_tbl, 8); spinlock_t internal_vlan_tbl_lock; + DECLARE_HASHTABLE(neigh_tbl, 16); + spinlock_t neigh_tbl_lock; + u32 neigh_tbl_next_index; }; static const u8 zero_mac[ETH_ALEN] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; @@ -256,7 +273,6 @@ enum { ROCKER_PRIORITY_VLAN = 1, ROCKER_PRIORITY_TERM_MAC_UCAST = 0, ROCKER_PRIORITY_TERM_MAC_MCAST = 1, - 
ROCKER_PRIORITY_UNICAST_ROUTING = 1, ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_EXACT = 1, ROCKER_PRIORITY_BRIDGING_VLAN_DFLT_WILD = 2, ROCKER_PRIORITY_BRIDGING_VLAN = 3, @@ -1940,8 +1956,7 @@ static int rocker_cmd_flow_tbl_add(struct rocker *rocker, struct rocker_tlv *cmd_info; int err = 0; - if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, - ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD)) + if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd)) return -EMSGSIZE; cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO); if (!cmd_info) @@ -1998,8 +2013,7 @@ static int rocker_cmd_flow_tbl_del(struct rocker *rocker, const struct rocker_flow_tbl_entry *entry = priv; struct rocker_tlv *cmd_info; - if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, - ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL)) + if (rocker_tlv_put_u16(desc_info, ROCKER_TLV_CMD_TYPE, entry->cmd)) return -EMSGSIZE; cmd_info = rocker_tlv_nest_start(desc_info, ROCKER_TLV_CMD_INFO); if (!cmd_info) @@ -2168,9 +2182,9 @@ static int rocker_cmd_group_tbl_del(struct rocker *rocker, return 0; } -/***************************************** - * Flow, group, FDB, internal VLAN tables - *****************************************/ +/*************************************************** + * Flow, group, FDB, internal VLAN and neigh tables + ***************************************************/ static int rocker_init_tbls(struct rocker *rocker) { @@ -2186,6 +2200,9 @@ static int rocker_init_tbls(struct rocker *rocker) hash_init(rocker->internal_vlan_tbl); spin_lock_init(&rocker->internal_vlan_tbl_lock); + hash_init(rocker->neigh_tbl); + spin_lock_init(&rocker->neigh_tbl_lock); + return 0; } @@ -2196,6 +2213,7 @@ static void rocker_free_tbls(struct rocker *rocker) struct rocker_group_tbl_entry *group_entry; struct rocker_fdb_tbl_entry *fdb_entry; struct rocker_internal_vlan_tbl_entry *internal_vlan_entry; + struct rocker_neigh_tbl_entry *neigh_entry; struct hlist_node *tmp; int bkt; @@ -2219,16 +2237,22 @@ static void 
rocker_free_tbls(struct rocker *rocker) tmp, internal_vlan_entry, entry) hash_del(&internal_vlan_entry->entry); spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, flags); + + spin_lock_irqsave(&rocker->neigh_tbl_lock, flags); + hash_for_each_safe(rocker->neigh_tbl, bkt, tmp, neigh_entry, entry) + hash_del(&neigh_entry->entry); + spin_unlock_irqrestore(&rocker->neigh_tbl_lock, flags); } static struct rocker_flow_tbl_entry * rocker_flow_tbl_find(struct rocker *rocker, struct rocker_flow_tbl_entry *match) { struct rocker_flow_tbl_entry *found; + size_t key_len = match->key_len ? match->key_len : sizeof(found->key); hash_for_each_possible(rocker->flow_tbl, found, entry, match->key_crc32) { - if (memcmp(&found->key, &match->key, sizeof(found->key)) == 0) + if (memcmp(&found->key, &match->key, key_len) == 0) return found; } @@ -2241,42 +2265,34 @@ static int rocker_flow_tbl_add(struct rocker_port *rocker_port, { struct rocker *rocker = rocker_port->rocker; struct rocker_flow_tbl_entry *found; + size_t key_len = match->key_len ? 
match->key_len : sizeof(found->key); unsigned long flags; - bool add_to_hw = false; - int err = 0; - match->key_crc32 = crc32(~0, &match->key, sizeof(match->key)); + match->key_crc32 = crc32(~0, &match->key, key_len); spin_lock_irqsave(&rocker->flow_tbl_lock, flags); found = rocker_flow_tbl_find(rocker, match); if (found) { - kfree(match); + match->cookie = found->cookie; + hash_del(&found->entry); + kfree(found); + found = match; + found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_MOD; } else { found = match; found->cookie = rocker->flow_tbl_next_cookie++; - hash_add(rocker->flow_tbl, &found->entry, found->key_crc32); - add_to_hw = true; + found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_ADD; } - found->ref_count++; + hash_add(rocker->flow_tbl, &found->entry, found->key_crc32); spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags); - if (add_to_hw) { - err = rocker_cmd_exec(rocker, rocker_port, - rocker_cmd_flow_tbl_add, - found, NULL, NULL, nowait); - if (err) { - spin_lock_irqsave(&rocker->flow_tbl_lock, flags); - hash_del(&found->entry); - spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags); - kfree(found); - } - } - - return err; + return rocker_cmd_exec(rocker, rocker_port, + rocker_cmd_flow_tbl_add, + found, NULL, NULL, nowait); } static int rocker_flow_tbl_del(struct rocker_port *rocker_port, @@ -2285,29 +2301,26 @@ static int rocker_flow_tbl_del(struct rocker_port *rocker_port, { struct rocker *rocker = rocker_port->rocker; struct rocker_flow_tbl_entry *found; + size_t key_len = match->key_len ? 
match->key_len : sizeof(found->key); unsigned long flags; - bool del_from_hw = false; int err = 0; - match->key_crc32 = crc32(~0, &match->key, sizeof(match->key)); + match->key_crc32 = crc32(~0, &match->key, key_len); spin_lock_irqsave(&rocker->flow_tbl_lock, flags); found = rocker_flow_tbl_find(rocker, match); if (found) { - found->ref_count--; - if (found->ref_count == 0) { - hash_del(&found->entry); - del_from_hw = true; - } + hash_del(&found->entry); + found->cmd = ROCKER_TLV_CMD_TYPE_OF_DPA_FLOW_DEL; } spin_unlock_irqrestore(&rocker->flow_tbl_lock, flags); kfree(match); - if (del_from_hw) { + if (found) { err = rocker_cmd_exec(rocker, rocker_port, rocker_cmd_flow_tbl_del, found, NULL, NULL, nowait); @@ -2467,6 +2480,31 @@ static int rocker_flow_tbl_bridge(struct rocker_port *rocker_port, return rocker_flow_tbl_do(rocker_port, flags, entry); } +static int rocker_flow_tbl_ucast4_routing(struct rocker_port *rocker_port, + __be16 eth_type, __be32 dst, + __be32 dst_mask, u32 priority, + enum rocker_of_dpa_table_id goto_tbl, + u32 group_id, int flags) +{ + struct rocker_flow_tbl_entry *entry; + + entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags)); + if (!entry) + return -ENOMEM; + + entry->key.tbl_id = ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING; + entry->key.priority = priority; + entry->key.ucast_routing.eth_type = eth_type; + entry->key.ucast_routing.dst4 = dst; + entry->key.ucast_routing.dst4_mask = dst_mask; + entry->key.ucast_routing.goto_tbl = goto_tbl; + entry->key.ucast_routing.group_id = group_id; + entry->key_len = offsetof(struct rocker_flow_tbl_key, + ucast_routing.group_id); + + return rocker_flow_tbl_do(rocker_port, flags, entry); +} + static int rocker_flow_tbl_acl(struct rocker_port *rocker_port, int flags, u32 in_pport, u32 in_pport_mask, @@ -2554,7 +2592,6 @@ static int rocker_group_tbl_add(struct rocker_port *rocker_port, struct rocker *rocker = rocker_port->rocker; struct rocker_group_tbl_entry *found; unsigned long flags; - int err = 0; 
spin_lock_irqsave(&rocker->group_tbl_lock, flags); @@ -2574,12 +2611,9 @@ static int rocker_group_tbl_add(struct rocker_port *rocker_port, spin_unlock_irqrestore(&rocker->group_tbl_lock, flags); - if (found->cmd) - err = rocker_cmd_exec(rocker, rocker_port, - rocker_cmd_group_tbl_add, - found, NULL, NULL, nowait); - - return err; + return rocker_cmd_exec(rocker, rocker_port, + rocker_cmd_group_tbl_add, + found, NULL, NULL, nowait); } static int rocker_group_tbl_del(struct rocker_port *rocker_port, @@ -2675,6 +2709,244 @@ static int rocker_group_l2_flood(struct rocker_port *rocker_port, group_id); } +static int rocker_group_l3_unicast(struct rocker_port *rocker_port, + int flags, u32 index, u8 *src_mac, + u8 *dst_mac, __be16 vlan_id, + bool ttl_check, u32 pport) +{ + struct rocker_group_tbl_entry *entry; + + entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags)); + if (!entry) + return -ENOMEM; + + entry->group_id = ROCKER_GROUP_L3_UNICAST(index); + if (src_mac) + ether_addr_copy(entry->l3_unicast.eth_src, src_mac); + if (dst_mac) + ether_addr_copy(entry->l3_unicast.eth_dst, dst_mac); + entry->l3_unicast.vlan_id = vlan_id; + entry->l3_unicast.ttl_check = ttl_check; + entry->l3_unicast.group_id = ROCKER_GROUP_L2_INTERFACE(vlan_id, pport); + + return rocker_group_tbl_do(rocker_port, flags, entry); +} + +static struct rocker_neigh_tbl_entry * + rocker_neigh_tbl_find(struct rocker *rocker, __be32 ip_addr) +{ + struct rocker_neigh_tbl_entry *found; + + hash_for_each_possible(rocker->neigh_tbl, found, entry, ip_addr) + if (found->ip_addr == ip_addr) + return found; + + return NULL; +} + +static void _rocker_neigh_add(struct rocker *rocker, + struct rocker_neigh_tbl_entry *entry) +{ + entry->index = rocker->neigh_tbl_next_index++; + entry->ref_count++; + hash_add(rocker->neigh_tbl, &entry->entry, entry->ip_addr); +} + +static void _rocker_neigh_del(struct rocker *rocker, + struct rocker_neigh_tbl_entry *entry) +{ + if (--entry->ref_count == 0) { + 
hash_del(&entry->entry); + kfree(entry); + } +} + +static void _rocker_neigh_update(struct rocker *rocker, + struct rocker_neigh_tbl_entry *entry, + u8 *eth_dst, bool ttl_check) +{ + if (eth_dst) { + ether_addr_copy(entry->eth_dst, eth_dst); + entry->ttl_check = ttl_check; + } else { + entry->ref_count++; + } +} + +static int rocker_port_ipv4_neigh(struct rocker_port *rocker_port, + int flags, __be32 ip_addr, u8 *eth_dst) +{ + struct rocker *rocker = rocker_port->rocker; + struct rocker_neigh_tbl_entry *entry; + struct rocker_neigh_tbl_entry *found; + unsigned long lock_flags; + __be16 eth_type = htons(ETH_P_IP); + enum rocker_of_dpa_table_id goto_tbl = + ROCKER_OF_DPA_TABLE_ID_ACL_POLICY; + u32 group_id; + u32 priority = 0; + bool adding = !(flags & ROCKER_OP_FLAG_REMOVE); + bool updating; + bool removing; + int err = 0; + + entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags)); + if (!entry) + return -ENOMEM; + + spin_lock_irqsave(&rocker->neigh_tbl_lock, lock_flags); + + found = rocker_neigh_tbl_find(rocker, ip_addr); + + updating = found && adding; + removing = found && !adding; + adding = !found && adding; + + if (adding) { + entry->ip_addr = ip_addr; + entry->dev = rocker_port->dev; + ether_addr_copy(entry->eth_dst, eth_dst); + entry->ttl_check = true; + _rocker_neigh_add(rocker, entry); + } else if (removing) { + memcpy(entry, found, sizeof(*entry)); + _rocker_neigh_del(rocker, found); + } else if (updating) { + _rocker_neigh_update(rocker, found, eth_dst, true); + memcpy(entry, found, sizeof(*entry)); + } else { + err = -ENOENT; + } + + spin_unlock_irqrestore(&rocker->neigh_tbl_lock, lock_flags); + + if (err) + goto err_out; + + /* For each active neighbor, we have an L3 unicast group and + * a /32 route to the neighbor, which uses the L3 unicast + * group. The L3 unicast group can also be referred to by + * other routes' nexthops. 
+ */ + + err = rocker_group_l3_unicast(rocker_port, flags, + entry->index, + rocker_port->dev->dev_addr, + entry->eth_dst, + rocker_port->internal_vlan_id, + entry->ttl_check, + rocker_port->pport); + if (err) { + netdev_err(rocker_port->dev, + "Error (%d) L3 unicast group index %d\n", + err, entry->index); + goto err_out; + } + + if (adding || removing) { + group_id = ROCKER_GROUP_L3_UNICAST(entry->index); + err = rocker_flow_tbl_ucast4_routing(rocker_port, + eth_type, ip_addr, + inet_make_mask(32), + priority, goto_tbl, + group_id, flags); + + if (err) + netdev_err(rocker_port->dev, + "Error (%d) /32 unicast route %pI4 group 0x%08x\n", + err, &entry->ip_addr, group_id); + } + +err_out: + if (!adding) + kfree(entry); + + return err; +} + +static int rocker_port_ipv4_resolve(struct rocker_port *rocker_port, + __be32 ip_addr) +{ + struct net_device *dev = rocker_port->dev; + struct neighbour *n = __ipv4_neigh_lookup(dev, ip_addr); + int err = 0; + + if (!n) + n = neigh_create(&arp_tbl, &ip_addr, dev); + if (!n) + return -ENOMEM; + + /* If the neigh is already resolved, then go ahead and + * install the entry, otherwise start the ARP process to + * resolve the neigh. 
+ */ + + if (n->nud_state & NUD_VALID) + err = rocker_port_ipv4_neigh(rocker_port, 0, ip_addr, n->ha); + else + neigh_event_send(n, NULL); + + return err; +} + +static int rocker_port_ipv4_nh(struct rocker_port *rocker_port, int flags, + __be32 ip_addr, u32 *index) +{ + struct rocker *rocker = rocker_port->rocker; + struct rocker_neigh_tbl_entry *entry; + struct rocker_neigh_tbl_entry *found; + unsigned long lock_flags; + bool adding = !(flags & ROCKER_OP_FLAG_REMOVE); + bool updating; + bool removing; + bool resolved = true; + int err = 0; + + entry = kzalloc(sizeof(*entry), rocker_op_flags_gfp(flags)); + if (!entry) + return -ENOMEM; + + spin_lock_irqsave(&rocker->neigh_tbl_lock, lock_flags); + + found = rocker_neigh_tbl_find(rocker, ip_addr); + if (found) + *index = found->index; + + updating = found && adding; + removing = found && !adding; + adding = !found && adding; + + if (adding) { + entry->ip_addr = ip_addr; + entry->dev = rocker_port->dev; + _rocker_neigh_add(rocker, entry); + *index = entry->index; + resolved = false; + } else if (removing) { + _rocker_neigh_del(rocker, found); + } else if (updating) { + _rocker_neigh_update(rocker, found, NULL, false); + resolved = !is_zero_ether_addr(found->eth_dst); + } else { + err = -ENOENT; + } + + spin_unlock_irqrestore(&rocker->neigh_tbl_lock, lock_flags); + + if (!adding) + kfree(entry); + + if (err) + return err; + + /* Resolved means neigh ip_addr is resolved to neigh mac. 
*/ + + if (!resolved) + err = rocker_port_ipv4_resolve(rocker_port, ip_addr); + + return err; +} + static int rocker_port_vlan_flood_group(struct rocker_port *rocker_port, int flags, __be16 vlan_id) { @@ -3429,6 +3701,84 @@ not_found: spin_unlock_irqrestore(&rocker->internal_vlan_tbl_lock, lock_flags); } +static int rocker_port_fib_ipv4(struct rocker_port *rocker_port, __be32 dst, + int dst_len, struct fib_info *fi, u32 tb_id, + int flags) +{ + struct fib_nh *nh; + __be16 eth_type = htons(ETH_P_IP); + __be32 dst_mask = inet_make_mask(dst_len); + __be16 internal_vlan_id = rocker_port->internal_vlan_id; + u32 priority = fi->fib_priority; + enum rocker_of_dpa_table_id goto_tbl = + ROCKER_OF_DPA_TABLE_ID_ACL_POLICY; + u32 group_id; + bool nh_on_port; + bool has_gw; + u32 index; + int err; + + /* XXX support ECMP */ + + nh = fi->fib_nh; + nh_on_port = (fi->fib_dev == rocker_port->dev); + has_gw = !!nh->nh_gw; + + if (has_gw && nh_on_port) { + err = rocker_port_ipv4_nh(rocker_port, flags, + nh->nh_gw, &index); + if (err) + return err; + + group_id = ROCKER_GROUP_L3_UNICAST(index); + } else { + /* Send to CPU for processing */ + group_id = ROCKER_GROUP_L2_INTERFACE(internal_vlan_id, 0); + } + + err = rocker_flow_tbl_ucast4_routing(rocker_port, eth_type, dst, + dst_mask, priority, goto_tbl, + group_id, flags); + if (err) + netdev_err(rocker_port->dev, "Error (%d) IPv4 route %pI4\n", + err, &dst); + + return err; +} + +static int rocker_port_fib_ipv4_skip(struct net_device *dev, + __be32 dst, int dst_len, + struct fib_info *fi, + u8 tos, u8 type, u32 tb_id) +{ + if (fi->fib_flags & RTM_F_CLONED) + return -EOPNOTSUPP; + + if (tb_id != RT_TABLE_MAIN && tb_id != RT_TABLE_LOCAL) + return -EOPNOTSUPP; + + if (type != RTN_UNICAST && type != RTN_BLACKHOLE && + type != RTN_UNREACHABLE && type != RTN_LOCAL && + type != RTN_BROADCAST) + return -EOPNOTSUPP; + + if (tb_id == RT_TABLE_MAIN && type != RTN_UNICAST && + type != RTN_BLACKHOLE && type != RTN_UNREACHABLE) + return 
-EOPNOTSUPP; + + if (tos != 0) + return -EOPNOTSUPP; + + if (ipv4_is_loopback(dst)) + return -EOPNOTSUPP; + + /* XXX not handling ECMP right now */ + if (fi->fib_nhs != 1) + return -EOPNOTSUPP; + + return 0; +} + /***************** * Net device ops *****************/ @@ -3830,6 +4180,36 @@ static int rocker_port_switch_port_stp_update(struct net_device *dev, u8 state) return rocker_port_stp_update(rocker_port, state); } +static int rocker_port_switch_fib_ipv4_add(struct net_device *dev, + __be32 dst, int dst_len, + struct fib_info *fi, + u8 tos, u8 type, u32 tb_id) +{ + struct rocker_port *rocker_port = netdev_priv(dev); + int flags = 0; + int err; + + err = rocker_port_fib_ipv4_skip(dev, dst, dst_len, fi, + tos, type, tb_id); + if (err) + return err; + + return rocker_port_fib_ipv4(rocker_port, dst, dst_len, + fi, tb_id, flags); +} + +static int rocker_port_switch_fib_ipv4_del(struct net_device *dev, + __be32 dst, int dst_len, + struct fib_info *fi, + u8 tos, u8 type, u32 tb_id) +{ + struct rocker_port *rocker_port = netdev_priv(dev); + int flags = ROCKER_OP_FLAG_REMOVE; + + return rocker_port_fib_ipv4(rocker_port, dst, dst_len, + fi, tb_id, flags); +} + static const struct net_device_ops rocker_port_netdev_ops = { .ndo_open = rocker_port_open, .ndo_stop = rocker_port_stop, @@ -3844,6 +4224,8 @@ static const struct net_device_ops rocker_port_netdev_ops = { .ndo_bridge_getlink = rocker_port_bridge_getlink, .ndo_switch_parent_id_get = rocker_port_switch_parent_id_get, .ndo_switch_port_stp_update = rocker_port_switch_port_stp_update, + .ndo_switch_fib_ipv4_add = rocker_port_switch_fib_ipv4_add, + .ndo_switch_fib_ipv4_del = rocker_port_switch_fib_ipv4_del, }; /******************** @@ -4544,6 +4926,48 @@ static struct notifier_block rocker_netdevice_nb __read_mostly = { .notifier_call = rocker_netdevice_event, }; +/************************************ + * Net event notifier event handler + ************************************/ + +static int rocker_neigh_update(struct 
net_device *dev, struct neighbour *n)
+{
+	struct rocker_port *rocker_port = netdev_priv(dev);
+	int flags = (n->nud_state & NUD_VALID) ? 0 : ROCKER_OP_FLAG_REMOVE;
+	__be32 ip_addr = *(__be32 *)n->primary_key;
+
+	return rocker_port_ipv4_neigh(rocker_port, flags, ip_addr, n->ha);
+}
+
+static int rocker_netevent_event(struct notifier_block *unused,
+				 unsigned long event, void *ptr)
+{
+	struct net_device *dev;
+	struct neighbour *n = ptr;
+	int err;
+
+	switch (event) {
+	case NETEVENT_NEIGH_UPDATE:
+		if (n->tbl != &arp_tbl)
+			return NOTIFY_DONE;
+		dev = n->dev;
+		if (!rocker_port_dev_check(dev))
+			return NOTIFY_DONE;
+		err = rocker_neigh_update(dev, n);
+		if (err)
+			netdev_warn(dev,
+				    "failed to handle neigh update (err %d)\n",
+				    err);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block rocker_netevent_nb __read_mostly = {
+	.notifier_call = rocker_netevent_event,
+};
+
 /***********************
  * Module init and exit
  ***********************/
@@ -4553,18 +4977,21 @@ static int __init rocker_module_init(void)
 	int err;

 	register_netdevice_notifier(&rocker_netdevice_nb);
+	register_netevent_notifier(&rocker_netevent_nb);
 	err = pci_register_driver(&rocker_pci_driver);
 	if (err)
 		goto err_pci_register_driver;
 	return 0;

 err_pci_register_driver:
+	unregister_netevent_notifier(&rocker_netevent_nb);
 	unregister_netdevice_notifier(&rocker_netdevice_nb);
 	return err;
 }

 static void __exit rocker_module_exit(void)
 {
+	unregister_netevent_notifier(&rocker_netevent_nb);
 	unregister_netdevice_notifier(&rocker_netdevice_nb);
 	pci_unregister_driver(&rocker_pci_driver);
 }
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index b834e9c..90cd812 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1166,7 +1166,7 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
 						new_fa->fa_tos,
 						cfg->fc_type,
 						tb->tb_id);
-		if (err && err != -EOPNOTSUPP) {
+		if (err) {
 			kmem_cache_free(fn_alias_kmem, new_fa);
 			goto out;
 		}
-- 
1.7.10.4
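The v2 behaviour of using the same op for add and modify shows up in
rocker_flow_tbl_add() above: if an entry with the same key already exists,
its cookie is reused and the command becomes a flow MOD, otherwise a new
cookie is allocated and the command is a flow ADD. In the patch, key_len
stops before ucast_routing.group_id, so a route whose nexthop group changes
still matches the existing entry. A minimal userspace sketch of that decision
follows; the types and names (toy_flow_tbl, toy_flow_upsert) are hypothetical,
and a linear scan stands in for the driver's hash table and locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLOW_SLOTS 8

enum toy_cmd { TOY_FLOW_ADD, TOY_FLOW_MOD };

/* Hypothetical flow entry; "key" stands in for the hashed match fields. */
struct toy_flow {
	uint32_t key;
	uint32_t group_id;	/* deliberately not part of the key */
	uint64_t cookie;
	bool in_use;
};

struct toy_flow_tbl {
	struct toy_flow slot[TOY_FLOW_SLOTS];
	uint64_t next_cookie;
};

/*
 * Insert or replace the flow for "key". On success returns 0 and stores
 * in *cmd the command that would be sent to the device. Returns -1 if
 * the table is full.
 */
static int toy_flow_upsert(struct toy_flow_tbl *tbl, uint32_t key,
			   uint32_t group_id, enum toy_cmd *cmd)
{
	struct toy_flow *free_slot = NULL;
	size_t i;

	for (i = 0; i < TOY_FLOW_SLOTS; i++) {
		struct toy_flow *f = &tbl->slot[i];

		if (f->in_use && f->key == key) {
			/* Object exists: keep its cookie, modify in place. */
			f->group_id = group_id;
			*cmd = TOY_FLOW_MOD;
			return 0;
		}
		if (!f->in_use && !free_slot)
			free_slot = f;
	}

	if (!free_slot)
		return -1;

	/* Object does not exist: this is an add with a fresh cookie. */
	free_slot->key = key;
	free_slot->group_id = group_id;
	free_slot->cookie = tbl->next_cookie++;
	free_slot->in_use = true;
	*cmd = TOY_FLOW_ADD;
	return 0;
}
```

Keeping the cookie stable across a modify is what would let a device update
the entry in place instead of deleting and re-adding it.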
* [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now 2015-03-02 10:06 [PATCH net-next v2 0/4] switchdev: add IPv4 routing offload sfeldma ` (2 preceding siblings ...) 2015-03-02 10:06 ` [PATCH net-next v2 3/4] rocker: implement IPv4 fib offloading sfeldma @ 2015-03-02 10:06 ` sfeldma 2015-03-02 14:36 ` roopa 3 siblings, 1 reply; 16+ messages in thread From: sfeldma @ 2015-03-02 10:06 UTC (permalink / raw) To: netdev, davem, jiri, roopa From: Scott Feldman <sfeldma@gmail.com> Keep switchdev FIB offload model simple for now and don't allow custom ip rules. Signed-off-by: Scott Feldman <sfeldma@gmail.com> --- include/net/ip_fib.h | 2 ++ net/ipv4/fib_frontend.c | 13 +++++++++++++ net/ipv4/fib_rules.c | 3 +++ net/ipv4/fib_trie.c | 27 +++++++++++++++++++++++++++ net/switchdev/switchdev.c | 4 ++++ 5 files changed, 49 insertions(+) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index cba4b7c..894a75c 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -195,6 +195,7 @@ int fib_table_delete(struct fib_table *, struct fib_config *); int fib_table_dump(struct fib_table *table, struct sk_buff *skb, struct netlink_callback *cb); int fib_table_flush(struct fib_table *table); +void fib_table_flush_external(struct fib_table *table); void fib_free_table(struct fib_table *tb); @@ -294,6 +295,7 @@ static inline int fib_num_tclassid_users(struct net *net) return 0; } #endif +void fib_flush_external(struct net *net); /* Exported by fib_semantics.c */ int ip_fib_check_default(__be32 gw, struct net_device *dev); diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 57be71d..c33c19a 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -146,6 +146,19 @@ static void fib_flush(struct net *net) rt_cache_flush(net); } +void fib_flush_external(struct net *net) +{ + struct fib_table *tb; + struct hlist_head *head; + unsigned int h; + + for (h = 0; h < FIB_TABLE_HASHSZ; h++) { + head = &net->ipv4.fib_table_hash[h]; + 
hlist_for_each_entry(tb, head, tb_hlist) + fib_table_flush_external(tb); + } +} + /* * Find address type as if only "dev" was present in the system. If * on_dev is NULL then all interfaces are taken into consideration. diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c index d3db718..190d0d0 100644 --- a/net/ipv4/fib_rules.c +++ b/net/ipv4/fib_rules.c @@ -209,6 +209,8 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb, rule4->tos = frh->tos; net->ipv4.fib_has_custom_rules = true; + fib_flush_external(rule->fr_net); + err = 0; errout: return err; @@ -224,6 +226,7 @@ static void fib4_rule_delete(struct fib_rule *rule) net->ipv4.fib_num_tclassid_users--; #endif net->ipv4.fib_has_custom_rules = true; + fib_flush_external(rule->fr_net); } static int fib4_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh, diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 90cd812..5a487a7 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1517,6 +1517,23 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg) return 0; } +static void trie_flush_leaf_external(struct fib_table *tb, struct tnode *l) +{ + struct hlist_node *tmp; + struct fib_alias *fa; + + hlist_for_each_entry_safe(fa, tmp, &l->leaf, fa_list) { + struct fib_info *fi = fa->fa_info; + + if (fi && (fi->fib_flags & RTNH_F_EXTERNAL)) { + netdev_switch_fib_ipv4_del(l->key, + KEYLENGTH - fa->fa_slen, + fi, fa->fa_tos, + fa->fa_type, tb->tb_id); + } + } +} + static int trie_flush_leaf(struct fib_table *tb, struct tnode *l) { struct hlist_node *tmp; @@ -1643,6 +1660,16 @@ int fib_table_flush(struct fib_table *tb) return found; } +/* Caller must hold RTNL */ +void fib_table_flush_external(struct fib_table *tb) +{ + struct trie *t = (struct trie *)tb->tb_data; + struct tnode *l; + + for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) + trie_flush_leaf_external(tb, l); +} + void fib_free_table(struct fib_table *tb) { #ifdef CONFIG_IP_FIB_TRIE_STATS diff 
--git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index a84bdb4..9f87f3f 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -270,6 +270,10 @@ int netdev_switch_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	const struct net_device_ops *ops;
 	int err = -EOPNOTSUPP;

+	/* Don't offload route if using custom ip rules */
+	if (fi->fib_net->ipv4.fib_has_custom_rules)
+		return -EOPNOTSUPP;
+
 	dev = netdev_switch_get_by_fib_dev(fi->fib_dev);
 	if (!dev)
 		return -EOPNOTSUPP;
-- 
1.7.10.4
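The policy in this patch has two halves: refuse to offload new routes once
custom IP rules are in use, and flush routes already marked RTNH_F_EXTERNAL
when a rule is configured. A rough userspace model of that policy is sketched
below. Everything in it is illustrative: toy_net, toy_route, TOY_F_EXTERNAL
and the function names are made up, -1 stands in for -EOPNOTSUPP, and
clearing the flag on flush is an assumption of the sketch (in the patch that
work would happen inside netdev_switch_fib_ipv4_del(), whose body is not
shown in this message).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_EXTERNAL	0x1u	/* models RTNH_F_EXTERNAL; value is illustrative */
#define TOY_MAX_ROUTES	8

struct toy_route {
	uint32_t dst;
	unsigned int flags;
};

struct toy_net {
	bool has_custom_rules;
	struct toy_route routes[TOY_MAX_ROUTES];
	size_t n_routes;
};

/* Returns 0 if the route was offloaded, -1 if it was skipped. */
static int toy_offload_add(const struct toy_net *net, struct toy_route *r)
{
	/* Don't offload the route if custom rules are in use. */
	if (net->has_custom_rules)
		return -1;

	r->flags |= TOY_F_EXTERNAL;
	return 0;
}

/* Pull every offloaded route back out of the device; returns the count. */
static size_t toy_flush_external(struct toy_net *net)
{
	size_t i, flushed = 0;

	for (i = 0; i < net->n_routes; i++) {
		if (net->routes[i].flags & TOY_F_EXTERNAL) {
			net->routes[i].flags &= ~TOY_F_EXTERNAL;
			flushed++;
		}
	}

	return flushed;
}

/* Models configuring a custom rule: set the flag, then flush. */
static size_t toy_rule_configure(struct toy_net *net)
{
	net->has_custom_rules = true;
	return toy_flush_external(net);
}
```

In this model the routes themselves stay in the software table after the
flush; only the offload marking goes away, so forwarding would fall back to
the kernel.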
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 10:06 ` [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now sfeldma
@ 2015-03-02 14:36 ` roopa
2015-03-02 17:00 ` Scott Feldman
2015-03-02 20:10 ` David Miller
0 siblings, 2 replies; 16+ messages in thread
From: roopa @ 2015-03-02 14:36 UTC (permalink / raw)
To: sfeldma; +Cc: netdev, davem, jiri

On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Keep switchdev FIB offload model simple for now and don't allow custom ip
> rules.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>

I don't see a need to do this. And seems very aggressive.
Also, note that the rules in a system can be on non-hw accelerated ports.
Until rules are hw accelerated, it's ok to document this limitation for hw
accelerated routes in my opinion.
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 14:36 ` roopa
@ 2015-03-02 17:00 ` Scott Feldman
2015-03-02 19:09 ` roopa
2015-03-02 20:30 ` David Miller
2015-03-02 20:10 ` David Miller
1 sibling, 2 replies; 16+ messages in thread
From: Scott Feldman @ 2015-03-02 17:00 UTC (permalink / raw)
To: roopa; +Cc: Netdev, David S. Miller, Jiří Pírko

On Mon, Mar 2, 2015 at 6:36 AM, roopa <roopa@cumulusnetworks.com> wrote:
> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> Keep switchdev FIB offload model simple for now and don't allow custom ip
>> rules.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>
>
> I don't see a need to do this. And seems very aggressive.
> Also, note that the rules in a system can be on non-hw accelerated ports.

It is aggressive but it's the safest choice for the first pass on this
L3 offload. Without, switchdev has no way to model custom ip rules
down to hardware. Let's start with this aggressive case, and that'll
force us to work on relaxing it.

> Until rules are hw accelerated, its ok to document this limitation for hw
> accelerated routes in my opinion.
>
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 17:00 ` Scott Feldman
@ 2015-03-02 19:09 ` roopa
2015-03-02 20:40 ` David Miller
2015-03-02 20:30 ` David Miller
1 sibling, 1 reply; 16+ messages in thread
From: roopa @ 2015-03-02 19:09 UTC (permalink / raw)
To: Scott Feldman; +Cc: Netdev, David S. Miller, Jiří Pírko

On 3/2/15, 9:00 AM, Scott Feldman wrote:
> On Mon, Mar 2, 2015 at 6:36 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> Keep switchdev FIB offload model simple for now and don't allow custom ip
>>> rules.
>>>
>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>
>> I don't see a need to do this. And seems very aggressive.
>> Also, note that the rules in a system can be on non-hw accelerated ports.
> It is aggressive but it's the safest choice for the first pass on this
> L3 offload. Without, switchdev has no way to model custom ip rules
> down to hardware. Let's start with this aggressive case, and that'll
> force us to work on relaxing it.
>
>
But, this will not allow hw acceleration of routes even if you had a
rule for the management traffic for example.
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 19:09 ` roopa
@ 2015-03-02 20:40 ` David Miller
0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2015-03-02 20:40 UTC (permalink / raw)
To: roopa; +Cc: sfeldma, netdev, jiri

From: roopa <roopa@cumulusnetworks.com>
Date: Mon, 02 Mar 2015 11:09:32 -0800

> On 3/2/15, 9:00 AM, Scott Feldman wrote:
>> On Mon, Mar 2, 2015 at 6:36 AM, roopa <roopa@cumulusnetworks.com>
>> wrote:
>>> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>>> From: Scott Feldman <sfeldma@gmail.com>
>>>>
>>>> Keep switchdev FIB offload model simple for now and don't allow custom
>>>> ip
>>>> rules.
>>>>
>>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>>
>>> I don't see a need to do this. And seems very aggressive.
>>> Also, note that the rules in a system can be on non-hw accelerated
>>> ports.
>> It is aggressive but it's the safest choice for the first pass on this
>> L3 offload. Without, switchdev has no way to model custom ip rules
>> down to hardware. Let's start with this aggressive case, and that'll
>> force us to work on relaxing it.
>>
>>
> But, this will not allow hw acceleration of routes even if you had a
> rule for the management traffic for example.

We start simple, add complexity later.
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 17:00 ` Scott Feldman
2015-03-02 19:09 ` roopa
@ 2015-03-02 20:30 ` David Miller
1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2015-03-02 20:30 UTC (permalink / raw)
To: sfeldma; +Cc: roopa, netdev, jiri

From: Scott Feldman <sfeldma@gmail.com>
Date: Mon, 2 Mar 2015 09:00:58 -0800

> On Mon, Mar 2, 2015 at 6:36 AM, roopa <roopa@cumulusnetworks.com> wrote:
>> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>>>
>>> From: Scott Feldman <sfeldma@gmail.com>
>>>
>>> Keep switchdev FIB offload model simple for now and don't allow custom ip
>>> rules.
>>>
>>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>>
>>
>> I don't see a need to do this. And seems very aggressive.
>> Also, note that the rules in a system can be on non-hw accelerated ports.
>
> It is aggressive but it's the safest choice for the first pass on this
> L3 offload. Without, switchdev has no way to model custom ip rules
> down to hardware. Let's start with this aggressive case, and that'll
> force us to work on relaxing it.

+1
* Re: [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now
2015-03-02 14:36 ` roopa
2015-03-02 17:00 ` Scott Feldman
@ 2015-03-02 20:10 ` David Miller
1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2015-03-02 20:10 UTC (permalink / raw)
To: roopa; +Cc: sfeldma, netdev, jiri

From: roopa <roopa@cumulusnetworks.com>
Date: Mon, 02 Mar 2015 06:36:15 -0800

> On 3/2/15, 2:06 AM, sfeldma@gmail.com wrote:
>> From: Scott Feldman <sfeldma@gmail.com>
>>
>> Keep switchdev FIB offload model simple for now and don't allow custom
>> ip
>> rules.
>>
>> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
>
> I don't see a need to do this. And seems very aggressive.
> Also, note that the rules in a system can be on non-hw accelerated
> ports.
>
> Until rules are hw accelerated, its ok to document this limitation for
> hw accelerated routes in my opinion.

No, we absolutely must start with a simple model that doesn't try at all
to offload if any FIB rules are installed.
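The policy settled on in this subthread (no offload at all once any custom rule exists, and a flush of anything already offloaded) can be sketched as a small userspace model. This is illustrative only and assumes a good deal: `struct model_net`, `model_fib_add()` and `model_rule_configure()` are hypothetical names, a plain counter stands in for the device's route table, and the real kernel paths carry locking and per-route state that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace state; not the kernel's struct net. */
struct model_net {
	bool has_custom_rules; /* models net->ipv4.fib_has_custom_rules */
	int offloaded;         /* routes currently mirrored to the device */
};

/*
 * Models the gate added to netdev_switch_fib_ipv4_add(): refuse to
 * offload while custom rules are in use, otherwise count the route
 * as offloaded.
 */
static int model_fib_add(struct model_net *net)
{
	assert(net != NULL);

	if (net->has_custom_rules)
		return -EOPNOTSUPP;

	net->offloaded++;
	return 0;
}

/*
 * Models the rule-configure path: mark the namespace as using custom
 * rules, then flush everything that was offloaded before that point.
 */
static void model_rule_configure(struct model_net *net)
{
	assert(net != NULL);

	net->has_custom_rules = true;
	net->offloaded = 0; /* stands in for fib_flush_external() */
}
```

The trade-off roopa raises is visible in the model: a single rule, even one that only concerns management traffic on a non-accelerated port, sets the flag for the whole namespace and stops every later offload attempt.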
end of thread, other threads:[~2015-03-02 22:31 UTC | newest]
Thread overview: 16+ messages
2015-03-02 10:06 [PATCH net-next v2 0/4] switchdev: add IPv4 routing offload sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 1/4] rtnetlink: add RTNH_F_EXTERNAL flag for fib offload sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 2/4] net: add IPv4 routing FIB support for switchdev sfeldma
2015-03-02 14:30 ` roopa
2015-03-02 17:10 ` Scott Feldman
2015-03-02 19:24 ` roopa
2015-03-02 22:27 ` Samudrala, Sridhar
2015-03-02 22:31 ` David Miller
2015-03-02 10:06 ` [PATCH net-next v2 3/4] rocker: implement IPv4 fib offloading sfeldma
2015-03-02 10:06 ` [PATCH net-next v2 4/4] switchdev: don't support custom ip rules, for now sfeldma
2015-03-02 14:36 ` roopa
2015-03-02 17:00 ` Scott Feldman
2015-03-02 19:09 ` roopa
2015-03-02 20:40 ` David Miller
2015-03-02 20:30 ` David Miller
2015-03-02 20:10 ` David Miller