Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next v2 08/10] net: sched: take reference to action dev before calling offloads
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock when calling hardware offload
API, take reference to action mirred dev when initializing flow_action
structure in tc_setup_flow_action(). Implement function
tc_cleanup_flow_action(), use it to release the device after hardware
offload API is done using it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h  |  2 ++
 net/sched/cls_api.c    | 32 ++++++++++++++++++++++++++++++++
 net/sched/cls_flower.c |  2 ++
 3 files changed, 36 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a48824bc1489..e553fc80eb23 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -505,6 +505,8 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held);
+void tc_cleanup_flow_action(struct flow_action *flow_action);
+
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop, bool rtnl_held);
 int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cb835d581b77..622146aafb06 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3260,6 +3260,27 @@ int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
 }
 EXPORT_SYMBOL(tc_setup_cb_reoffload);
 
+void tc_cleanup_flow_action(struct flow_action *flow_action)
+{
+	struct flow_action_entry *entry;
+	int i;
+
+	flow_action_for_each(i, entry, flow_action) {
+		switch (entry->id) {
+		case FLOW_ACTION_REDIRECT:
+		case FLOW_ACTION_MIRRED:
+		case FLOW_ACTION_REDIRECT_INGRESS:
+		case FLOW_ACTION_MIRRED_INGRESS:
+			if (entry->dev)
+				dev_put(entry->dev);
+			break;
+		default:
+			break;
+		}
+	}
+}
+EXPORT_SYMBOL(tc_cleanup_flow_action);
+
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held)
 {
@@ -3289,15 +3310,23 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 		} else if (is_tcf_mirred_egress_redirect(act)) {
 			entry->id = FLOW_ACTION_REDIRECT;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_egress_mirror(act)) {
 			entry->id = FLOW_ACTION_MIRRED;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_ingress_redirect(act)) {
 			entry->id = FLOW_ACTION_REDIRECT_INGRESS;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_ingress_mirror(act)) {
 			entry->id = FLOW_ACTION_MIRRED_INGRESS;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_vlan(act)) {
 			switch (tcf_vlan_action(act)) {
 			case TCA_VLAN_ACT_PUSH:
@@ -3405,6 +3434,9 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 	if (!rtnl_held)
 		rtnl_unlock();
 
+	if (err)
+		tc_cleanup_flow_action(flow_action);
+
 	return err;
 }
 EXPORT_SYMBOL(tc_setup_flow_action);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index d47d4e84d4e5..df141a67c17f 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -465,6 +465,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 
 	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
 			      skip_sw, &f->flags, &f->in_hw_count, true);
+	tc_cleanup_flow_action(&cls_flower.rule->action);
 	kfree(cls_flower.rule);
 
 	if (err) {
@@ -1838,6 +1839,7 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 					    TC_SETUP_CLSFLOWER, &cls_flower,
 					    cb_priv, &f->flags,
 					    &f->in_hw_count);
+		tc_cleanup_flow_action(&cls_flower.rule->action);
 		kfree(cls_flower.rule);
 
 		if (err && add && tc_skip_sw(f->flags)) {
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 05/10] net: sched: add API for registering unlocked offload block callbacks
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.

Register mlx5 tc block offload callbacks as "unlocked".

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 3 +++
 include/net/flow_offload.h                        | 1 +
 include/net/sch_generic.h                         | 1 +
 net/sched/cls_api.c                               | 6 ++++++
 5 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index fa4bf2d4bcd4..8592b98d0e70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3470,10 +3470,12 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			  void *type_data)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct flow_block_offload *f = type_data;
 
 	switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
 	case TC_SETUP_BLOCK:
+		f->unlocked_driver_cb = true;
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_block_cb_list,
 						  mlx5e_setup_tc_block_cb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 3c0d36b2b91c..e7ac6233037d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -763,6 +763,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	f->unlocked_driver_cb = true;
 	f->driver_block_list = &mlx5e_block_cb_list;
 
 	switch (f->command) {
@@ -1245,9 +1246,11 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			      void *type_data)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct flow_block_offload *f = type_data;
 
 	switch (type) {
 	case TC_SETUP_BLOCK:
+		f->unlocked_driver_cb = true;
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_rep_block_cb_list,
 						  mlx5e_rep_setup_tc_cb,
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 757fa84de654..fc881875f856 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -284,6 +284,7 @@ struct flow_block_offload {
 	enum flow_block_command command;
 	enum flow_block_binder_type binder_type;
 	bool block_shared;
+	bool unlocked_driver_cb;
 	struct net *net;
 	struct flow_block *block;
 	struct list_head cb_list;
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c4fbbaff30a2..43f5b7ed02bd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -408,6 +408,7 @@ struct tcf_block {
 	bool keep_dst;
 	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
+	unsigned int lockeddevcnt; /* Number of devs that require rtnl lock. */
 	struct {
 		struct tcf_chain *chain;
 		struct list_head filter_chain_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index dd16cf171f51..87954f5370a4 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1418,6 +1418,8 @@ static int tcf_block_bind(struct tcf_block *block,
 						  bo->extack);
 		if (err)
 			goto err_unroll;
+		if (!bo->unlocked_driver_cb)
+			block->lockeddevcnt++;
 
 		i++;
 	}
@@ -1433,6 +1435,8 @@ static int tcf_block_bind(struct tcf_block *block,
 						    block_cb->cb_priv, false,
 						    tcf_block_offload_in_use(block),
 						    NULL);
+			if (!bo->unlocked_driver_cb)
+				block->lockeddevcnt--;
 		}
 		flow_block_cb_free(block_cb);
 	}
@@ -1454,6 +1458,8 @@ static void tcf_block_unbind(struct tcf_block *block,
 					    NULL);
 		list_del(&block_cb->list);
 		flow_block_cb_free(block_cb);
+		if (!bo->unlocked_driver_cb)
+			block->lockeddevcnt--;
 	}
 }
 
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 09/10] net: sched: copy tunnel info when setting flow_action entry->tunnel
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock, modify tc_setup_flow_action()
to copy tunnel info, instead of just saving pointer to tunnel_key action
tunnel info. This is necessary to prevent concurrent action overwrite from
releasing tunnel info while it is being used by rtnl-unlocked driver.

Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info
with all its options to dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free dynamically allocated tunnel info.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h | 17 +++++++++++++++++
 net/sched/cls_api.c                |  9 ++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h
index 7c3f777c168c..0689d9bcdf84 100644
--- a/include/net/tc_act/tc_tunnel_key.h
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -59,4 +59,21 @@ static inline struct ip_tunnel_info *tcf_tunnel_info(const struct tc_action *a)
 	return NULL;
 #endif
 }
+
+static inline struct ip_tunnel_info *
+tcf_tunnel_info_copy(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	struct ip_tunnel_info *tun = tcf_tunnel_info(a);
+
+	if (tun) {
+		size_t tun_size = sizeof(*tun) + tun->options_len;
+		struct ip_tunnel_info *tun_copy = kmemdup(tun, tun_size,
+							  GFP_KERNEL);
+
+		return tun_copy;
+	}
+#endif
+	return NULL;
+}
 #endif /* __NET_TC_TUNNEL_KEY_H */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 622146aafb06..c00c836cd155 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3274,6 +3274,9 @@ void tc_cleanup_flow_action(struct flow_action *flow_action)
 			if (entry->dev)
 				dev_put(entry->dev);
 			break;
+		case FLOW_ACTION_TUNNEL_ENCAP:
+			kfree(entry->tunnel);
+			break;
 		default:
 			break;
 		}
@@ -3350,7 +3353,11 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			}
 		} else if (is_tcf_tunnel_set(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_ENCAP;
-			entry->tunnel = tcf_tunnel_info(act);
+			entry->tunnel = tcf_tunnel_info_copy(act);
+			if (!entry->tunnel) {
+				err = -ENOMEM;
+				goto err_out;
+			}
 		} else if (is_tcf_tunnel_release(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_DECAP;
 		} else if (is_tcf_pedit(act)) {
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 10/10] net: sched: flower: don't take rtnl lock for cls hw offloads API
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

Don't manually take rtnl lock in flower classifier before calling cls
hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
parameter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 53 +++++++++++++-----------------------------
 1 file changed, 16 insertions(+), 37 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index df141a67c17f..fa1de7080c65 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -412,18 +412,13 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_DESTROY;
 	cls_flower.cookie = (unsigned long) f;
 
 	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
-			    &f->flags, &f->in_hw_count, true);
+			    &f->flags, &f->in_hw_count, rtnl_held);
 
-	if (!rtnl_held)
-		rtnl_unlock();
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -435,14 +430,9 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	bool skip_sw = tc_skip_sw(f->flags);
 	int err = 0;
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
-	if (!cls_flower.rule) {
-		err = -ENOMEM;
-		goto errout;
-	}
+	if (!cls_flower.rule)
+		return -ENOMEM;
 
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_REPLACE;
@@ -453,36 +443,30 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	cls_flower.classid = f->res.classid;
 
 	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
-				   true);
+				   rtnl_held);
 	if (err) {
 		kfree(cls_flower.rule);
-		if (skip_sw)
+		if (skip_sw) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
-		else
-			err = 0;
-		goto errout;
+			return err;
+		}
+		return 0;
 	}
 
 	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
-			      skip_sw, &f->flags, &f->in_hw_count, true);
+			      skip_sw, &f->flags, &f->in_hw_count, rtnl_held);
 	tc_cleanup_flow_action(&cls_flower.rule->action);
 	kfree(cls_flower.rule);
 
 	if (err) {
-		fl_hw_destroy_filter(tp, f, true, NULL);
-		goto errout;
-	}
-
-	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
-		err = -EINVAL;
-		goto errout;
+		fl_hw_destroy_filter(tp, f, rtnl_held, NULL);
+		return err;
 	}
 
-errout:
-	if (!rtnl_held)
-		rtnl_unlock();
+	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
+		return -EINVAL;
 
-	return err;
+	return 0;
 }
 
 static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
@@ -491,22 +475,17 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, NULL);
 	cls_flower.command = FLOW_CLS_STATS;
 	cls_flower.cookie = (unsigned long) f;
 	cls_flower.classid = f->res.classid;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false,
+			 rtnl_held);
 
 	tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
 			      cls_flower.stats.pkts,
 			      cls_flower.stats.lastused);
-
-	if (!rtnl_held)
-		rtnl_unlock();
 }
 
 static void __fl_put(struct cls_fl_filter *f)
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 02/10] net: sched: change tcf block offload counter type to atomic_t
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

As a preparation for running proto ops functions without rtnl lock, change
offload counter type to atomic. This is necessary to allow updating the
counter by multiple concurrent users when offloading filters to hardware
from unlocked classifiers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 7 ++++---
 net/sched/cls_api.c       | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a3eaf5f9d28f..d778c502decd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -14,6 +14,7 @@
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
 #include <linux/rwsem.h>
+#include <linux/atomic.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
@@ -401,7 +402,7 @@ struct tcf_block {
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
-	unsigned int offloadcnt; /* Number of oddloaded filters */
+	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
 	struct {
 		struct tcf_chain *chain;
@@ -443,7 +444,7 @@ static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
 		return;
 	*flags |= TCA_CLS_FLAGS_IN_HW;
-	block->offloadcnt++;
+	atomic_inc(&block->offloadcnt);
 }
 
 static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
@@ -451,7 +452,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
 	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
 		return;
 	*flags &= ~TCA_CLS_FLAGS_IN_HW;
-	block->offloadcnt--;
+	atomic_dec(&block->offloadcnt);
 }
 
 static inline void
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 959b7ca1ca02..f2c2f8159e35 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -629,7 +629,7 @@ static void tc_indr_block_call(struct tcf_block *block,
 
 static bool tcf_block_offload_in_use(struct tcf_block *block)
 {
-	return block->offloadcnt;
+	return atomic_read(&block->offloadcnt);
 }
 
 static int tcf_block_offload_cmd(struct tcf_block *block,
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 04/10] net: sched: notify classifier on successful offload add/delete
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

To remove dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
holding cb_lock every time filter if successfully added to or deleted from
hardware.

Implement the new API in flower classifier. Use it to manage hw_filters
list under cb_lock protection, instead of relying on rtnl lock to
synchronize with concurrent fl_reoffload() call.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes from V1 to V2:
  - Keep success flow unindented in tc_setup_cb_{add|replace}().

 include/net/sch_generic.h |  4 ++++
 net/sched/cls_api.c       | 19 +++++++++++++++++--
 net/sched/cls_flower.c    | 33 ++++++++++++++++++++++++++-------
 3 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f90e3b2a3065..c4fbbaff30a2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -312,6 +312,10 @@ struct tcf_proto_ops {
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
 					     flow_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
+	void			(*hw_add)(struct tcf_proto *tp,
+					  void *type_data);
+	void			(*hw_del)(struct tcf_proto *tp,
+					  void *type_data);
 	void			(*bind_class)(void *, u32, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
 						struct tcf_chain *chain,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 157bf3599ce1..dd16cf171f51 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3099,6 +3099,11 @@ int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
 	}
 
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count < 0)
+		goto err_unlock;
+
+	if (tp->ops->hw_add)
+		tp->ops->hw_add(tp, type_data);
 	if (ok_count > 0)
 		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
 					  ok_count, true);
@@ -3130,11 +3135,18 @@ int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
 	}
 
 	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
+	if (tp->ops->hw_del)
+		tp->ops->hw_del(tp, type_data);
 
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count < 0)
+		goto err_unlock;
+
+	if (tp->ops->hw_add)
+		tp->ops->hw_add(tp, type_data);
 	if (ok_count > 0)
-		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
-					  ok_count, true);
+		tc_cls_offload_cnt_update(block, tp, new_in_hw_count,
+					  new_flags, ok_count, true);
 err_unlock:
 	up_read(&block->cb_lock);
 	return ok_count < 0 ? ok_count : 0;
@@ -3155,6 +3167,9 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
 
 	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
+	if (tp->ops->hw_del)
+		tp->ops->hw_del(tp, type_data);
+
 	up_read(&block->cb_lock);
 	return ok_count < 0 ? ok_count : 0;
 }
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0978ae4a1428..9a7fd6bcd0a5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -421,9 +421,6 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 
 	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
 			    &f->flags, &f->in_hw_count, true);
-	spin_lock(&tp->lock);
-	list_del_init(&f->hw_list);
-	spin_unlock(&tp->lock);
 
 	if (!rtnl_held)
 		rtnl_unlock();
@@ -433,7 +430,6 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 				struct cls_fl_filter *f, bool rtnl_held,
 				struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 	bool skip_sw = tc_skip_sw(f->flags);
@@ -480,9 +476,6 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		goto errout;
 	}
 
-	spin_lock(&tp->lock);
-	list_add(&f->hw_list, &head->hw_filters);
-	spin_unlock(&tp->lock);
 errout:
 	if (!rtnl_held)
 		rtnl_unlock();
@@ -1856,6 +1849,30 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 	return 0;
 }
 
+static void fl_hw_add(struct tcf_proto *tp, void *type_data)
+{
+	struct flow_cls_offload *cls_flower = type_data;
+	struct cls_fl_filter *f =
+		(struct cls_fl_filter *) cls_flower->cookie;
+	struct cls_fl_head *head = fl_head_dereference(tp);
+
+	spin_lock(&tp->lock);
+	list_add(&f->hw_list, &head->hw_filters);
+	spin_unlock(&tp->lock);
+}
+
+static void fl_hw_del(struct tcf_proto *tp, void *type_data)
+{
+	struct flow_cls_offload *cls_flower = type_data;
+	struct cls_fl_filter *f =
+		(struct cls_fl_filter *) cls_flower->cookie;
+
+	spin_lock(&tp->lock);
+	if (!list_empty(&f->hw_list))
+		list_del_init(&f->hw_list);
+	spin_unlock(&tp->lock);
+}
+
 static int fl_hw_create_tmplt(struct tcf_chain *chain,
 			      struct fl_flow_tmplt *tmplt)
 {
@@ -2516,6 +2533,8 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.delete		= fl_delete,
 	.walk		= fl_walk,
 	.reoffload	= fl_reoffload,
+	.hw_add		= fl_hw_add,
+	.hw_del		= fl_hw_del,
 	.dump		= fl_dump,
 	.bind_class	= fl_bind_class,
 	.tmplt_create	= fl_tmplt_create,
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 03/10] net: sched: refactor block offloads counter usage
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.

Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:

  tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
  already in hardware is successfully offloaded, increment block offloads
  counter, set filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be intact and offloads counter is not
  decremented.

  tc_setup_cb_replace() - destructive filter replace. Release existing
  filter block offload counter and reset its in hardware counter and flag.
  Set new filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be destroyed and offload counter is
  decremented.

  tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
  offloads counter.

  tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
  call tc_cls_offload_cnt_update() if cb() didn't return an error.

Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes from V1 to V2:
  - Remove redundant brackets around cnt.
  - Rename 'errout' label to 'err_unlock'.
  - Revert changing oldprog to cls_bpf.oldprog in conditional.
  - Add new helper tc_setup_cb_reoffload().
  - Make tc_cls_offload_cnt_update() static and remove its export.
  - Change tc_setup_cb_*() helpers to only return zero or error code.

 include/net/pkt_cls.h     |  17 +++-
 include/net/sch_generic.h |  31 -------
 net/sched/cls_api.c       | 171 ++++++++++++++++++++++++++++++++++----
 net/sched/cls_bpf.c       |  38 ++++-----
 net/sched/cls_flower.c    |  40 ++++-----
 net/sched/cls_matchall.c  |  27 +++---
 net/sched/cls_u32.c       |  29 +++----
 7 files changed, 229 insertions(+), 124 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 64999ffcb486..612232492f67 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -506,7 +506,22 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts);
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
-		     void *type_data, bool err_stop);
+		     void *type_data, bool err_stop, bool rtnl_held);
+int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
+		    enum tc_setup_type type, void *type_data, bool err_stop,
+		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
+int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *old_flags, unsigned int *old_in_hw_count,
+			u32 *new_flags, unsigned int *new_in_hw_count,
+			bool rtnl_held);
+int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
+int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
+			  bool add, flow_setup_cb_t *cb,
+			  enum tc_setup_type type, void *type_data,
+			  void *cb_priv, u32 *flags, unsigned int *in_hw_count);
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
 struct tc_cls_u32_knode {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d778c502decd..f90e3b2a3065 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -439,37 +439,6 @@ static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
 #define tcf_proto_dereference(p, tp)					\
 	rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
 
-static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
-{
-	if (*flags & TCA_CLS_FLAGS_IN_HW)
-		return;
-	*flags |= TCA_CLS_FLAGS_IN_HW;
-	atomic_inc(&block->offloadcnt);
-}
-
-static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
-{
-	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
-		return;
-	*flags &= ~TCA_CLS_FLAGS_IN_HW;
-	atomic_dec(&block->offloadcnt);
-}
-
-static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
-			  u32 *flags, bool add)
-{
-	if (add) {
-		if (!*cnt)
-			tcf_block_offload_inc(block, flags);
-		(*cnt)++;
-	} else {
-		(*cnt)--;
-		if (!*cnt)
-			tcf_block_offload_dec(block, flags);
-	}
-}
-
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
 	struct qdisc_skb_cb *qcb;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f2c2f8159e35..157bf3599ce1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3000,37 +3000,180 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_dump_stats);
 
-int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
-		     void *type_data, bool err_stop)
+static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
+{
+	if (*flags & TCA_CLS_FLAGS_IN_HW)
+		return;
+	*flags |= TCA_CLS_FLAGS_IN_HW;
+	atomic_inc(&block->offloadcnt);
+}
+
+static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
+{
+	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
+		return;
+	*flags &= ~TCA_CLS_FLAGS_IN_HW;
+	atomic_dec(&block->offloadcnt);
+}
+
+static void tc_cls_offload_cnt_update(struct tcf_block *block,
+				      struct tcf_proto *tp, u32 *cnt,
+				      u32 *flags, u32 diff, bool add)
+{
+	lockdep_assert_held(&block->cb_lock);
+
+	spin_lock(&tp->lock);
+	if (add) {
+		if (!*cnt)
+			tcf_block_offload_inc(block, flags);
+		*cnt += diff;
+	} else {
+		*cnt -= diff;
+		if (!*cnt)
+			tcf_block_offload_dec(block, flags);
+	}
+	spin_unlock(&tp->lock);
+}
+
+static void
+tc_cls_offload_cnt_reset(struct tcf_block *block, struct tcf_proto *tp,
+			 u32 *cnt, u32 *flags)
+{
+	lockdep_assert_held(&block->cb_lock);
+
+	spin_lock(&tp->lock);
+	tcf_block_offload_dec(block, flags);
+	*cnt = 0;
+	spin_unlock(&tp->lock);
+}
+
+static int
+__tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
+		   void *type_data, bool err_stop)
 {
 	struct flow_block_cb *block_cb;
 	int ok_count = 0;
 	int err;
 
-	down_read(&block->cb_lock);
-	/* Make sure all netdevs sharing this block are offload-capable. */
-	if (block->nooffloaddevcnt && err_stop) {
-		ok_count = -EOPNOTSUPP;
-		goto err_unlock;
-	}
-
 	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
-			if (err_stop) {
-				ok_count = err;
-				goto err_unlock;
-			}
+			if (err_stop)
+				return err;
 		} else {
 			ok_count++;
 		}
 	}
-err_unlock:
+	return ok_count;
+}
+
+int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
+		     void *type_data, bool err_stop, bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
 	up_read(&block->cb_lock);
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
 
+/* Non-destructive filter add. If filter that wasn't already in hardware is
+ * successfully offloaded, increment block offloads counter. On failure,
+ * previously offloaded filter is considered to be intact and offloads counter
+ * is not decremented.
+ */
+
+int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
+		    enum tc_setup_type type, void *type_data, bool err_stop,
+		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	/* Make sure all netdevs sharing this block are offload-capable. */
+	if (block->nooffloaddevcnt && err_stop) {
+		ok_count = -EOPNOTSUPP;
+		goto err_unlock;
+	}
+
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count > 0)
+		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
+					  ok_count, true);
+err_unlock:
+	up_read(&block->cb_lock);
+	return ok_count < 0 ? ok_count : 0;
+}
+EXPORT_SYMBOL(tc_setup_cb_add);
+
+/* Destructive filter replace. If filter that wasn't already in hardware is
+ * successfully offloaded, increment block offload counter. On failure,
+ * previously offloaded filter is considered to be destroyed and offload counter
+ * is decremented.
+ */
+
+int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *old_flags, unsigned int *old_in_hw_count,
+			u32 *new_flags, unsigned int *new_in_hw_count,
+			bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	/* Make sure all netdevs sharing this block are offload-capable. */
+	if (block->nooffloaddevcnt && err_stop) {
+		ok_count = -EOPNOTSUPP;
+		goto err_unlock;
+	}
+
+	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
+
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count > 0)
+		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
+					  ok_count, true);
+err_unlock:
+	up_read(&block->cb_lock);
+	return ok_count < 0 ? ok_count : 0;
+}
+EXPORT_SYMBOL(tc_setup_cb_replace);
+
+/* Destroy filter and decrement block offload counter, if filter was previously
+ * offloaded.
+ */
+
+int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+
+	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
+	up_read(&block->cb_lock);
+	return ok_count < 0 ? ok_count : 0;
+}
+EXPORT_SYMBOL(tc_setup_cb_destroy);
+
+int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
+			  bool add, flow_setup_cb_t *cb,
+			  enum tc_setup_type type, void *type_data,
+			  void *cb_priv, u32 *flags, unsigned int *in_hw_count)
+{
+	int err = cb(type, type_data, cb_priv);
+
+	if (!err)
+		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags, 1,
+					  add);
+	return err;
+}
+EXPORT_SYMBOL(tc_setup_cb_reoffload);
+
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts)
 {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 3f7a9c02b70c..9eef65a1b683 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -163,17 +163,19 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	cls_bpf.exts_integrated = obj->exts_integrated;
 
 	if (oldprog)
-		tcf_block_offload_dec(block, &oldprog->gen_flags);
+		err = tc_setup_cb_replace(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
+					  skip_sw, &oldprog->gen_flags,
+					  &oldprog->in_hw_count,
+					  &prog->gen_flags, &prog->in_hw_count,
+					  true);
+	else
+		err = tc_setup_cb_add(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
+				      skip_sw, &prog->gen_flags,
+				      &prog->in_hw_count, true);
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
-	if (prog) {
-		if (err < 0) {
-			cls_bpf_offload_cmd(tp, oldprog, prog, extack);
-			return err;
-		} else if (err > 0) {
-			prog->in_hw_count = err;
-			tcf_block_offload_inc(block, &prog->gen_flags);
-		}
+	if (prog && err) {
+		cls_bpf_offload_cmd(tp, oldprog, prog, extack);
+		return err;
 	}
 
 	if (prog && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
@@ -230,7 +232,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
 	cls_bpf.name = prog->bpf_name;
 	cls_bpf.exts_integrated = prog->exts_integrated;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false, true);
 }
 
 static int cls_bpf_init(struct tcf_proto *tp)
@@ -673,15 +675,11 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
 		cls_bpf.name = prog->bpf_name;
 		cls_bpf.exts_integrated = prog->exts_integrated;
 
-		err = cb(TC_SETUP_CLSBPF, &cls_bpf, cb_priv);
-		if (err) {
-			if (add && tc_skip_sw(prog->gen_flags))
-				return err;
-			continue;
-		}
-
-		tc_cls_offload_cnt_update(block, &prog->in_hw_count,
-					  &prog->gen_flags, add);
+		err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSBPF,
+					    &cls_bpf, cb_priv, &prog->gen_flags,
+					    &prog->in_hw_count);
+		if (err && add && tc_skip_sw(prog->gen_flags))
+			return err;
 	}
 
 	return 0;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 054123742e32..0978ae4a1428 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -419,10 +419,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	cls_flower.command = FLOW_CLS_DESTROY;
 	cls_flower.cookie = (unsigned long) f;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
+			    &f->flags, &f->in_hw_count, true);
 	spin_lock(&tp->lock);
 	list_del_init(&f->hw_list);
-	tcf_block_offload_dec(block, &f->flags);
 	spin_unlock(&tp->lock);
 
 	if (!rtnl_held)
@@ -466,18 +466,13 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		goto errout;
 	}
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
+			      skip_sw, &f->flags, &f->in_hw_count, true);
 	kfree(cls_flower.rule);
 
-	if (err < 0) {
+	if (err) {
 		fl_hw_destroy_filter(tp, f, true, NULL);
 		goto errout;
-	} else if (err > 0) {
-		f->in_hw_count = err;
-		err = 0;
-		spin_lock(&tp->lock);
-		tcf_block_offload_inc(block, &f->flags);
-		spin_unlock(&tp->lock);
 	}
 
 	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
@@ -509,7 +504,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
 	cls_flower.cookie = (unsigned long) f;
 	cls_flower.classid = f->res.classid;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 
 	tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
 			      cls_flower.stats.pkts,
@@ -1844,21 +1839,16 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 
 		cls_flower.classid = f->res.classid;
 
-		err = cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+		err = tc_setup_cb_reoffload(block, tp, add, cb,
+					    TC_SETUP_CLSFLOWER, &cls_flower,
+					    cb_priv, &f->flags,
+					    &f->in_hw_count);
 		kfree(cls_flower.rule);
 
-		if (err) {
-			if (add && tc_skip_sw(f->flags)) {
-				__fl_put(f);
-				return err;
-			}
-			goto next_flow;
+		if (err && add && tc_skip_sw(f->flags)) {
+			__fl_put(f);
+			return err;
 		}
-
-		spin_lock(&tp->lock);
-		tc_cls_offload_cnt_update(block, &f->in_hw_count, &f->flags,
-					  add);
-		spin_unlock(&tp->lock);
 next_flow:
 		__fl_put(f);
 	}
@@ -1886,7 +1876,7 @@ static int fl_hw_create_tmplt(struct tcf_chain *chain,
 	/* We don't care if driver (any of them) fails to handle this
 	 * call. It serves just as a hint for it.
 	 */
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 	kfree(cls_flower.rule);
 
 	return 0;
@@ -1902,7 +1892,7 @@ static void fl_hw_destroy_tmplt(struct tcf_chain *chain,
 	cls_flower.command = FLOW_CLS_TMPLT_DESTROY;
 	cls_flower.cookie = (unsigned long) tmplt;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 }
 
 static void *fl_tmplt_create(struct net *net, struct tcf_chain *chain,
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 455ea2793f9b..7801851b55d5 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -75,8 +75,8 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_DESTROY;
 	cls_mall.cookie = cookie;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false);
-	tcf_block_offload_dec(block, &head->flags);
+	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSMATCHALL, &cls_mall, false,
+			    &head->flags, &head->in_hw_count, true);
 }
 
 static int mall_replace_hw_filter(struct tcf_proto *tp,
@@ -109,15 +109,13 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 		return err;
 	}
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, skip_sw);
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSMATCHALL, &cls_mall,
+			      skip_sw, &head->flags, &head->in_hw_count, true);
 	kfree(cls_mall.rule);
 
-	if (err < 0) {
+	if (err) {
 		mall_destroy_hw_filter(tp, head, cookie, NULL);
 		return err;
-	} else if (err > 0) {
-		head->in_hw_count = err;
-		tcf_block_offload_inc(block, &head->flags);
 	}
 
 	if (skip_sw && !(head->flags & TCA_CLS_FLAGS_IN_HW))
@@ -312,16 +310,13 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		return 0;
 	}
 
-	err = cb(TC_SETUP_CLSMATCHALL, &cls_mall, cb_priv);
+	err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSMATCHALL,
+				    &cls_mall, cb_priv, &head->flags,
+				    &head->in_hw_count);
 	kfree(cls_mall.rule);
 
-	if (err) {
-		if (add && tc_skip_sw(head->flags))
-			return err;
-		return 0;
-	}
-
-	tc_cls_offload_cnt_update(block, &head->in_hw_count, &head->flags, add);
+	if (err && add && tc_skip_sw(head->flags))
+		return err;
 
 	return 0;
 }
@@ -337,7 +332,7 @@ static void mall_stats_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_STATS;
 	cls_mall.cookie = cookie;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true);
 
 	tcf_exts_stats_update(&head->exts, cls_mall.stats.bytes,
 			      cls_mall.stats.pkts, cls_mall.stats.lastused);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 8614088edd1b..a1672b180b19 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -480,7 +480,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	cls_u32.hnode.handle = h->handle;
 	cls_u32.hnode.prio = h->prio;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false, true);
 }
 
 static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
@@ -498,7 +498,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	cls_u32.hnode.handle = h->handle;
 	cls_u32.hnode.prio = h->prio;
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw);
+	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw, true);
 	if (err < 0) {
 		u32_clear_hw_hnode(tp, h, NULL);
 		return err;
@@ -522,8 +522,8 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	cls_u32.command = TC_CLSU32_DELETE_KNODE;
 	cls_u32.knode.handle = n->handle;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false);
-	tcf_block_offload_dec(block, &n->flags);
+	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSU32, &cls_u32, false,
+			    &n->flags, &n->in_hw_count, true);
 }
 
 static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
@@ -552,13 +552,11 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	if (n->ht_down)
 		cls_u32.knode.link_handle = ht->handle;
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw);
-	if (err < 0) {
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSU32, &cls_u32, skip_sw,
+			      &n->flags, &n->in_hw_count, true);
+	if (err) {
 		u32_remove_hw_knode(tp, n, NULL);
 		return err;
-	} else if (err > 0) {
-		n->in_hw_count = err;
-		tcf_block_offload_inc(block, &n->flags);
 	}
 
 	if (skip_sw && !(n->flags & TCA_CLS_FLAGS_IN_HW))
@@ -1201,14 +1199,11 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 			cls_u32.knode.link_handle = ht->handle;
 	}
 
-	err = cb(TC_SETUP_CLSU32, &cls_u32, cb_priv);
-	if (err) {
-		if (add && tc_skip_sw(n->flags))
-			return err;
-		return 0;
-	}
-
-	tc_cls_offload_cnt_update(block, &n->in_hw_count, &n->flags, add);
+	err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
+				    &cls_u32, cb_priv, &n->flags,
+				    &n->in_hw_count);
+	if (err && add && tc_skip_sw(n->flags))
+		return err;
 
 	return 0;
 }
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 01/10] net: sched: protect block offload-related fields with rw_semaphore
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock'
rwsem and use it to protect flow_block->cb_list and related counters from
concurrent modification. The lock is taken in read mode for read-only
traversal of cb_list in tc_setup_cb_call() and write mode in all other
cases. This approach ensures that:

- cb_list is not changed concurrently while filters is being offloaded on
  block.

- block->nooffloaddevcnt is checked while holding the lock in read mode,
  but is only changed by bind/unbind code when holding the cb_lock in write
  mode to prevent concurrent modification.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes from V1 to V2:
  - Rename 'errout' label to 'err_unlock'.

 include/net/sch_generic.h |  2 ++
 net/sched/cls_api.c       | 45 +++++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d9f359af0b93..a3eaf5f9d28f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -13,6 +13,7 @@
 #include <linux/refcount.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
@@ -396,6 +397,7 @@ struct tcf_block {
 	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
+	struct rw_semaphore cb_lock; /* protects cb_list and offload counters */
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e0d8b456e9f5..959b7ca1ca02 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -568,9 +568,11 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 
 	bo.block = &block->flow_block;
 
+	down_write(&block->cb_lock);
 	cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
 
 	tcf_block_setup(block, &bo);
+	up_write(&block->cb_lock);
 }
 
 static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
@@ -661,6 +663,7 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
+	down_write(&block->cb_lock);
 	if (!dev->netdev_ops->ndo_setup_tc)
 		goto no_offload_dev_inc;
 
@@ -669,24 +672,31 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	 */
 	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block)) {
 		NL_SET_ERR_MSG(extack, "Bind to offloaded block failed as dev has offload disabled");
-		return -EOPNOTSUPP;
+		err = -EOPNOTSUPP;
+		goto err_unlock;
 	}
 
 	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_BIND, extack);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_inc;
 	if (err)
-		return err;
+		goto err_unlock;
 
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
+	up_write(&block->cb_lock);
 	return 0;
 
 no_offload_dev_inc:
-	if (tcf_block_offload_in_use(block))
-		return -EOPNOTSUPP;
+	if (tcf_block_offload_in_use(block)) {
+		err = -EOPNOTSUPP;
+		goto err_unlock;
+	}
+	err = 0;
 	block->nooffloaddevcnt++;
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
-	return 0;
+err_unlock:
+	up_write(&block->cb_lock);
+	return err;
 }
 
 static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
@@ -695,6 +705,7 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
+	down_write(&block->cb_lock);
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 
 	if (!dev->netdev_ops->ndo_setup_tc)
@@ -702,10 +713,12 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_dec;
+	up_write(&block->cb_lock);
 	return;
 
 no_offload_dev_dec:
 	WARN_ON(block->nooffloaddevcnt-- == 0);
+	up_write(&block->cb_lock);
 }
 
 static int
@@ -820,6 +833,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 		return ERR_PTR(-ENOMEM);
 	}
 	mutex_init(&block->lock);
+	init_rwsem(&block->cb_lock);
 	flow_block_init(&block->flow_block);
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->owner_list);
@@ -1355,6 +1369,8 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
 	struct tcf_proto *tp, *tp_prev;
 	int err;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	for (chain = __tcf_get_next_chain(block, NULL);
 	     chain;
 	     chain_prev = chain,
@@ -1393,6 +1409,8 @@ static int tcf_block_bind(struct tcf_block *block,
 	struct flow_block_cb *block_cb, *next;
 	int err, i = 0;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	list_for_each_entry(block_cb, &bo->cb_list, list) {
 		err = tcf_block_playback_offloads(block, block_cb->cb,
 						  block_cb->cb_priv, true,
@@ -1427,6 +1445,8 @@ static void tcf_block_unbind(struct tcf_block *block,
 {
 	struct flow_block_cb *block_cb, *next;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	list_for_each_entry_safe(block_cb, next, &bo->cb_list, list) {
 		tcf_block_playback_offloads(block, block_cb->cb,
 					    block_cb->cb_priv, false,
@@ -2987,19 +3007,26 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	int ok_count = 0;
 	int err;
 
+	down_read(&block->cb_lock);
 	/* Make sure all netdevs sharing this block are offload-capable. */
-	if (block->nooffloaddevcnt && err_stop)
-		return -EOPNOTSUPP;
+	if (block->nooffloaddevcnt && err_stop) {
+		ok_count = -EOPNOTSUPP;
+		goto err_unlock;
+	}
 
 	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
-			if (err_stop)
-				return err;
+			if (err_stop) {
+				ok_count = err;
+				goto err_unlock;
+			}
 		} else {
 			ok_count++;
 		}
 	}
+err_unlock:
+	up_read(&block->cb_lock);
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 07/10] net: sched: take rtnl lock in tc_setup_flow_action()
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko
In-Reply-To: <20190823185056.12536-1-vladbu@mellanox.com>

In order to allow using new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
argument. Take rtnl lock before accessing tc_action data. This is necessary
to protect from concurrent action replace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h    |  2 +-
 net/sched/cls_api.c      | 17 +++++++++++++----
 net/sched/cls_flower.c   |  6 ++++--
 net/sched/cls_matchall.c |  4 ++--
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 612232492f67..a48824bc1489 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -504,7 +504,7 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 
 int tc_setup_flow_action(struct flow_action *flow_action,
-			 const struct tcf_exts *exts);
+			 const struct tcf_exts *exts, bool rtnl_held);
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop, bool rtnl_held);
 int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f2dcecf34c6f..cb835d581b77 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3261,14 +3261,17 @@ int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
 EXPORT_SYMBOL(tc_setup_cb_reoffload);
 
 int tc_setup_flow_action(struct flow_action *flow_action,
-			 const struct tcf_exts *exts)
+			 const struct tcf_exts *exts, bool rtnl_held)
 {
 	const struct tc_action *act;
-	int i, j, k;
+	int i, j, k, err = 0;
 
 	if (!exts)
 		return 0;
 
+	if (!rtnl_held)
+		rtnl_lock();
+
 	j = 0;
 	tcf_exts_for_each_action(i, act, exts) {
 		struct flow_action_entry *entry;
@@ -3313,6 +3316,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 				entry->vlan.prio = tcf_vlan_push_prio(act);
 				break;
 			default:
+				err = -EOPNOTSUPP;
 				goto err_out;
 			}
 		} else if (is_tcf_tunnel_set(act)) {
@@ -3330,6 +3334,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 					entry->id = FLOW_ACTION_ADD;
 					break;
 				default:
+					err = -EOPNOTSUPP;
 					goto err_out;
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
@@ -3388,15 +3393,19 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			entry->id = FLOW_ACTION_PTYPE;
 			entry->ptype = tcf_skbedit_ptype(act);
 		} else {
+			err = -EOPNOTSUPP;
 			goto err_out;
 		}
 
 		if (!is_tcf_pedit(act))
 			j++;
 	}
-	return 0;
+
 err_out:
-	return -EOPNOTSUPP;
+	if (!rtnl_held)
+		rtnl_unlock();
+
+	return err;
 }
 EXPORT_SYMBOL(tc_setup_flow_action);
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9a7fd6bcd0a5..d47d4e84d4e5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -452,7 +452,8 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	cls_flower.rule->match.key = &f->mkey;
 	cls_flower.classid = f->res.classid;
 
-	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
+	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
+				   true);
 	if (err) {
 		kfree(cls_flower.rule);
 		if (skip_sw)
@@ -1819,7 +1820,8 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		cls_flower.rule->match.mask = &f->mask->key;
 		cls_flower.rule->match.key = &f->mkey;
 
-		err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
+		err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
+					   true);
 		if (err) {
 			kfree(cls_flower.rule);
 			if (tc_skip_sw(f->flags)) {
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 7801851b55d5..d53d81dfee03 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -97,7 +97,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_REPLACE;
 	cls_mall.cookie = cookie;
 
-	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts);
+	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts, true);
 	if (err) {
 		kfree(cls_mall.rule);
 		mall_destroy_hw_filter(tp, head, cookie, NULL);
@@ -300,7 +300,7 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		TC_CLSMATCHALL_REPLACE : TC_CLSMATCHALL_DESTROY;
 	cls_mall.cookie = (unsigned long)head;
 
-	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts);
+	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts, true);
 	if (err) {
 		kfree(cls_mall.rule);
 		if (add && tc_skip_sw(head->flags)) {
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next v2 00/10] Refactor cls hardware offload API to support rtnl-independent drivers
From: Vlad Buslov @ 2019-08-23 18:50 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov

Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. This patch set introduces new API
that allows drivers to register callbacks that are not dependent on rtnl
lock and unlocked classifiers to offload filters without obtaining rtnl
lock first, which is intended to allow offloading tc rules in parallel.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when qdisc or
classifier requires it. Classifiers can indicate that their ops
callbacks don't require caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. Unlocked implementation of flower
classifier is now upstreamed. However, this implementation still obtains
rtnl lock before calling hardware offloads API.

Implement following cls API changes:

- Introduce new "unlocked_driver_cb" flag to struct flow_block_offload
  to allow registering and unregistering block hardware offload
  callbacks that do not require caller to hold rtnl lock. Drivers that
  doesn't require users of its tc offload callbacks to hold rtnl lock
  sets the flag to true on block bind/unbind. Internally tcf_block is
  extended with additional lockeddevcnt counter that is used to count
  number of devices that require rtnl lock that block is bound to. When
  this counter is zero, tc_setup_cb_*() functions execute callbacks
  without obtaining rtnl lock.

- Extend cls API single hardware rule update tc_setup_cb_call() function
  with tc_setup_cb_add(), tc_setup_cb_replace(), tc_setup_cb_destroy()
  and tc_setup_cb_reoffload() functions. These new APIs are needed to
  move management of block offload counter, filter in hardware counter
  and flag from classifier implementations to cls API, which is now
  responsible for managing them in concurrency-safe manner. Access to
  cb_list from callback execution code is synchronized by obtaining new
  'cb_lock' rw_semaphore in read mode, which allows executing callbacks
  in parallel, but excludes any modifications of data from
  register/unregister code. tcf_block offloads counter type is changed
  to atomic integer to allow updating the counter concurrently.

- Extend classifier ops with new ops->hw_add() and ops->hw_del()
  callbacks which are used to notify unlocked classifiers when filter is
  successfully added or deleted to hardware without releasing cb_lock.
  This is necessary to update classifier state atomically with callback
  list traversal and updating of all relevant counters and allows
  unlocked classifiers to synchronize with concurrent reoffload without
  requiring any changes to driver callback API implementations.

New tc flow_action infrastructure is also modified to allow its user to
execute without rtnl lock protection. Function tc_setup_flow_action() is
modified to conditionally obtain rtnl lock before accessing action
state. Action data that is accessed by reference is either copied or
reference counted to prevent concurrent action overwrite from
deallocating it. New function tc_cleanup_flow_action() is introduced to
cleanup/release all such data obtained by tc_setup_flow_action().

Flower classifier (only unlocked classifier at the moment) is modified
to use new cls hardware offloads API and no longer obtains rtnl lock
before calling it.

Vlad Buslov (10):
  net: sched: protect block offload-related fields with rw_semaphore
  net: sched: change tcf block offload counter type to atomic_t
  net: sched: refactor block offloads counter usage
  net: sched: notify classifier on successful offload add/delete
  net: sched: add API for registering unlocked offload block callbacks
  net: sched: conditionally obtain rtnl lock in cls hw offloads API
  net: sched: take rtnl lock in tc_setup_flow_action()
  net: sched: take reference to action dev before calling offloads
  net: sched: copy tunnel info when setting flow_action entry->tunnel
  net: sched: flower: don't take rtnl lock for cls hw offloads API

 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   3 +
 include/net/flow_offload.h                    |   1 +
 include/net/pkt_cls.h                         |  21 +-
 include/net/sch_generic.h                     |  41 +--
 include/net/tc_act/tc_tunnel_key.h            |  17 +
 net/sched/cls_api.c                           | 338 +++++++++++++++++-
 net/sched/cls_bpf.c                           |  38 +-
 net/sched/cls_flower.c                        | 126 +++----
 net/sched/cls_matchall.c                      |  31 +-
 net/sched/cls_u32.c                           |  29 +-
 11 files changed, 474 insertions(+), 173 deletions(-)

-- 
2.21.0


^ permalink raw reply

* Re: [RFC 3/4] Move ACPI functionality out of r8152 driver
From: Bjørn Mork @ 2019-08-23 18:34 UTC (permalink / raw)
  To: Charles.Hyde
  Cc: linux-usb, linux-acpi, gregkh, Mario.Limonciello, oliver, netdev,
	nic_swsd
In-Reply-To: <1566339738195.2913@Dellteam.com>

<Charles.Hyde@dellteam.com> writes:

> This change moves ACPI functionality out of the Realtek r8152 driver to
> its own source and header file, making it available to other drivers as
> needed now and into the future.  At the time this ACPI snippet was
> introduced in 2016, only the Realtek driver made use of it in support of
> Dell's enterprise IT policy efforts.  There comes now a need for this
> same support in a different driver, also in support of Dell's enterprise
> IT policy efforts.

Yes, and we told you so.


Bjørn

^ permalink raw reply

* Re: [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
From: Jakub Kicinski @ 2019-08-23 18:26 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	pablo@netfilter.org, Jiri Pirko, linux-doc
In-Reply-To: <vbfftls17yl.fsf@mellanox.com>

On Fri, 23 Aug 2019 10:39:50 +0000, Vlad Buslov wrote:
> >> +/* Destroy filter and decrement block offload counter, if filter was previously
> >> + * offloaded.
> >> + */
> >> +  
> >
> > hm.. is this gap between comment and function it pertains to
> > intentional?  
> 
> Majority of function comments in cls_api.c have newline after them (not
> all of them though). I don't have any strong opinions regarding this.
> You suggest it is better not to have blank lines after function
> comments?

Ah, you're right. I think it's pretty strange to have a new line after
a comment which pertains only to the function which is immediately
following it. Often the new line is used as a separation, when the
comment describes whole section of the file..

I kind of wish kdoc allowed none of the parameters to be described.
Often you want to document the function but the parameters are kind 
of obvious.

Anyway... feel free to leave this as is.

^ permalink raw reply

* Re: [PATCH net-next] net/rds: Whitelist rdma_cookie and rx_tstamp for usercopy
From: santosh.shilimkar @ 2019-08-23 18:17 UTC (permalink / raw)
  To: Dag Moxnes, netdev, linux-rdma, rds-devel; +Cc: davem
In-Reply-To: <1566568998-26222-1-git-send-email-dag.moxnes@oracle.com>

On 8/23/19 7:03 AM, Dag Moxnes wrote:
> Add the RDMA cookie and RX timestamp to the usercopy whitelist.
> 
> After the introduction of hardened usercopy whitelisting
> (https://lwn.net/Articles/727322/), a warning is displayed when the
> RDMA cookie or RX timestamp is copied to userspace:
> 
> kernel: WARNING: CPU: 3 PID: 5750 at
> mm/usercopy.c:81 usercopy_warn+0x8e/0xa6
> [...]
> kernel: Call Trace:
> kernel: __check_heap_object+0xb8/0x11b
> kernel: __check_object_size+0xe3/0x1bc
> kernel: put_cmsg+0x95/0x115
> kernel: rds_recvmsg+0x43d/0x620 [rds]
> kernel: sock_recvmsg+0x43/0x4a
> kernel: ___sys_recvmsg+0xda/0x1e6
> kernel: ? __handle_mm_fault+0xcae/0xf79
> kernel: __sys_recvmsg+0x51/0x8a
> kernel: SyS_recvmsg+0x12/0x1c
> kernel: do_syscall_64+0x79/0x1ae
> 
> When the whitelisting feature was introduced, the memory for the RDMA
> cookie and RX timestamp in RDS was not added to the whitelist, causing
> the warning above.
> 
> Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
> Tested-by: jenny.x.xu@oracle.com
> ---
Thanks Dag to get this out on list.
You might have to fix the Tested-by tag.
Tested-by: Jenny <jenny.x.xu@oracle.com

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>




^ permalink raw reply

* Re: [net-next 4/8] net/mlx5e: Add device out of buffer counter
From: Jakub Kicinski @ 2019-08-23 18:16 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: davem@davemloft.net, netdev@vger.kernel.org, Moshe Shemesh
In-Reply-To: <27f7cfa13d1b5e7717e2d75595ab453951b18a96.camel@mellanox.com>

On Fri, 23 Aug 2019 06:00:45 +0000, Saeed Mahameed wrote:
> On Thu, 2019-08-22 at 18:33 -0700, Jakub Kicinski wrote:
> > On Thu, 22 Aug 2019 23:35:52 +0000, Saeed Mahameed wrote:  
> > > From: Moshe Shemesh <moshe@mellanox.com>
> > > 
> > > Added the following packets drop counter:
> > > Device out of buffer - counts packets which were dropped due to
> > > full
> > > device internal receive queue.
> > > This counter will be shown on ethtool as a new counter called
> > > dev_out_of_buffer.
> > > The counter is read from FW by command QUERY_VNIC_ENV.
> > > 
> > > Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>  
> > 
> > Sounds like rx_fifo_errors, no? Doesn't rx_fifo_errors count RX
> > overruns?  
> 
> No, that is port buffer you are looking for and we got that fully
> covered in mlx5. this is different.
> 
> This new counter is deep into the HW data path pipeline and it covers
> very rare and complex scenarios that got only recently introduced with
> swichdev mode and "some" lately added tunnels offloads that are routed
> between VFs/PFs.
> 
> Normally the HW is lossless once the packet passes port buffers into
> the data plane pipeline, let's call that "fast lane", BUT for sriov
> configurations with switchdev mode enabled and some special hand
> crafted tc tunnel offloads that requires hairpin between VFs/PFs, the
> hw might decide to send some traffic to a "service lane" which is still
> fast path but unlike the "fast lane" it handles traffic through "HW
> internal" receive and send queues (just like we do with hairpin) that
> might drop packets. the whole thing is transparent to driver and it is
> HW implementation specific.

I see thanks for the explanation and sorry for the delayed response.
Would it perhaps make sense to indicate the hairpin in the name?
dev_out_of_buffer is quite a generic name, and there seems to be no
doc, nor does the commit message explains it as well as you have..

^ permalink raw reply

* Re: r8169: regression on MIPS/Loongson
From: Heiner Kallweit @ 2019-08-23 18:08 UTC (permalink / raw)
  To: Aaro Koskinen; +Cc: netdev, linux-mips
In-Reply-To: <20190822233854.GG30291@darkstar.musicnaut.iki.fi>

On 23.08.2019 01:38, Aaro Koskinen wrote:
> Hi,
> 
> On Fri, Aug 23, 2019 at 12:52:34AM +0200, Heiner Kallweit wrote:
>> On 23.08.2019 00:25, Aaro Koskinen wrote:
>>> After upgrading from v5.2 to v5.3-rc5 on MIPS/Loongson board, copying
>>> large files from network with scp started to fail with "Integrity error".
>>> Bisected to:
>>>
>>> f072218cca5b076dd99f3dfa3aaafedfd0023a51 is the first bad commit
>>> commit f072218cca5b076dd99f3dfa3aaafedfd0023a51
>>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date:   Thu Jun 27 23:19:09 2019 +0200
>>>
>>>     r8169: remove not needed call to dma_sync_single_for_device
>>>
>>> Any idea what goes wrong? Should this change be reverted?
>>>



>> Typically the Realtek chips are used on Intel platforms and I haven't
>> seen any such report yet, so it seems to be platform-specific.
> 
> Probably. On my AMD x86_64 box r8169 works fine.
> 
>> Which board (DT config) is it, and can you provide a full dmesg?
> 
> This board does not use DT (support files are under arch/mips/loongson64).
> dmesg is below:
> 
> [    0.000000] Linux version 5.3.0-rc4-lemote-los_1bf0c (aakoskin@amd-fx-6350) (gcc version 8.3.0 (GCC)) #1 Fri Aug 23 01:01:45 EEST 2019
> [    0.000000] memsize=256, highmemsize=256
> [    0.000000] CpuClock = 797800000
> [    0.000000] printk: bootconsole [early0] enabled
> [    0.000000] CPU0 revision is: 00006303 (ICT Loongson-2)
> [    0.000000] FPU revision is: 00000501
> [    0.000000] Checking for the multiply/shift bug... no.
> [    0.000000] Checking for the daddiu bug... no.
> [    0.000000] Determined physical RAM map:
> [    0.000000]  memory: 0000000010000000 @ 0000000000000000 (usable)
> [    0.000000]  memory: 0000000030000000 @ 0000000010000000 (reserved)
> [    0.000000]  memory: 0000000010000000 @ 0000000090000000 (usable)
> [    0.000000]  memory: 0000000010000000 @ 0000000080000000 (reserved)
> [    0.000000] Initrd not found or empty - disabling initrd
> [    0.000000] Primary instruction cache 64kB, VIPT, direct mapped, linesize 32 bytes.
> [    0.000000] Primary data cache 64kB, 4-way, VIPT, no aliases, linesize 32 bytes
> [    0.000000] Unified secondary cache 512kB 4-way, linesize 32 bytes.
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000009fffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000003fffffff]
> [    0.000000]   node   0: [mem 0x0000000080000000-0x000000009fffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000009fffffff]
> [    0.000000] On node 0 totalpages: 98304
> [    0.000000]   Normal zone: 336 pages used for memmap
> [    0.000000]   Normal zone: 0 pages reserved
> [    0.000000]   Normal zone: 98304 pages, LIFO batch:15
> [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> [    0.000000] pcpu-alloc: [0] 0 
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 97968
> [    0.000000] Kernel command line: console=tty console=ttyS0,115200
> [    0.000000] Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes, linear)
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 489664K/1572864K available (4863K kernel code, 467K rwdata, 876K rodata, 1968K init, 16616K bss, 1083200K reserved, 0K cma-reserved)
> [    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> [    0.000000] NR_IRQS: 128
> [    0.000000] random: get_random_bytes called from start_kernel+0x368/0x620 with crng_init=0
> [    0.000000] Console: colour dummy device 80x25
> [    0.000000] printk: console [tty0] enabled
> [    0.000000] sched_clock: 64 bits at 250 Hz, resolution 4000000ns, wraps every 9007199254000000ns
> [    0.004000] Calibrating delay loop... 528.38 BogoMIPS (lpj=1056768)
> [    0.040000] pid_max: default: 32768 minimum: 301
> [    0.044000] Mount-cache hash table entries: 4096 (order: 1, 32768 bytes, linear)
> [    0.048000] Mountpoint-cache hash table entries: 4096 (order: 1, 32768 bytes, linear)
> [    0.052000] *** VALIDATE proc ***
> [    0.056000] Checking for the daddi bug... no.
> [    0.064000] devtmpfs: initialized
> [    0.068000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> [    0.072000] futex hash table entries: 256 (order: -2, 6144 bytes, linear)
> [    0.076000] NET: Registered protocol family 16
> [    0.080000] clocksource: mfgpt: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133486551712 ns
> [    0.124000] SCSI subsystem initialized
> [    0.128000] usbcore: registered new interface driver usbfs
> [    0.132000] usbcore: registered new interface driver hub
> [    0.136000] usbcore: registered new device driver usb
> [    0.140000] PCI host bridge to bus 0000:00
> [    0.148000] pci_bus 0000:00: root bus resource [mem 0x40000000-0x7fffffff]
> [    0.152000] pci_bus 0000:00: root bus resource [io  0x4000-0xffff]
> [    0.156000] pci_bus 0000:00: root bus resource [??? 0x00000000 flags 0x0]
> [    0.160000] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
> [    0.164000] pci 0000:00:06.0: [10ec:8169] type 00 class 0x020000
> [    0.168000] pci 0000:00:06.0: reg 0x10: [io  0xb100-0xb1ff]
> [    0.172000] pci 0000:00:06.0: reg 0x14: [mem 0x04075000-0x040750ff]
> [    0.176000] pci 0000:00:06.0: reg 0x30: [mem 0x04040000-0x0405ffff pref]
> [    0.180000] pci 0000:00:06.0: supports D1 D2
> [    0.184000] pci 0000:00:06.0: PME# supported from D1 D2 D3hot
> [    0.188000] pci 0000:00:08.0: [1039:0325] type 00 class 0x030000
> [    0.192000] pci 0000:00:08.0: reg 0x10: [mem 0x40000000-0x4fffffff pref]
> [    0.196000] pci 0000:00:08.0: reg 0x14: [mem 0x04000000-0x0403ffff]
> [    0.200000] pci 0000:00:08.0: reg 0x18: [io  0xb300-0xb37f]
> [    0.204000] pci 0000:00:08.0: reg 0x30: [mem 0x04060000-0x0406ffff pref]
> [    0.208000] pci 0000:00:08.0: supports D1 D2
> [    0.212000] pci 0000:00:0e.0: [1022:2090] type 00 class 0x060100
> [    0.216000] pci 0000:00:0e.0: reg 0x10: [io  0xb410-0xb417]
> [    0.220000] pci 0000:00:0e.0: reg 0x14: [io  0xb000-0xb0ff]
> [    0.224000] pci 0000:00:0e.0: reg 0x18: [io  0xb380-0xb3bf]
> [    0.228000] pci 0000:00:0e.0: reg 0x20: [io  0xb280-0xb2ff]
> [    0.232000] pci 0000:00:0e.0: reg 0x24: [io  0xb3c0-0xb3df]
> [    0.240000] pci 0000:00:0e.2: [1022:209a] type 00 class 0x010180
> [    0.244000] pci 0000:00:0e.2: reg 0x20: [io  0xb400-0xb40f]
> [    0.252000] pci 0000:00:0e.2: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
> [    0.256000] pci 0000:00:0e.2: legacy IDE quirk: reg 0x14: [io  0x03f6]
> [    0.260000] pci 0000:00:0e.2: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
> [    0.264000] pci 0000:00:0e.2: legacy IDE quirk: reg 0x1c: [io  0x0376]
> [    0.268000] pci 0000:00:0e.3: [1022:2093] type 00 class 0x040100
> [    0.272000] pci 0000:00:0e.3: reg 0x10: [io  0xb200-0xb27f]
> [    0.276000] pci 0000:00:0e.4: [1022:2094] type 00 class 0x0c0310
> [    0.280000] pci 0000:00:0e.4: reg 0x10: [mem 0x04074000-0x04074fff]
> [    0.288000] pci 0000:00:0e.5: [1022:2095] type 00 class 0x0c0320
> [    0.292000] pci 0000:00:0e.5: reg 0x10: [mem 0x04073000-0x04073fff]
> [    0.296000] pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
> [    0.300000] pci 0000:00:08.0: BAR 0: assigned [mem 0x40000000-0x4fffffff pref]
> [    0.304000] pci 0000:00:08.0: BAR 1: assigned [mem 0x50000000-0x5003ffff]
> [    0.308000] pci 0000:00:06.0: BAR 6: assigned [mem 0x50040000-0x5005ffff pref]
> [    0.312000] pci 0000:00:08.0: BAR 6: assigned [mem 0x50060000-0x5006ffff pref]
> [    0.316000] pci 0000:00:0e.4: BAR 0: assigned [mem 0x50070000-0x50070fff]
> [    0.320000] pci 0000:00:0e.5: BAR 0: assigned [mem 0x50071000-0x50071fff]
> [    0.324000] pci 0000:00:06.0: BAR 0: assigned [io  0x4000-0x40ff]
> [    0.328000] pci 0000:00:06.0: BAR 1: assigned [mem 0x50072000-0x500720ff]
> [    0.332000] pci 0000:00:0e.0: BAR 1: assigned [io  0x4400-0x44ff]
> [    0.336000] pci 0000:00:08.0: BAR 2: assigned [io  0x4800-0x487f]
> [    0.340000] pci 0000:00:0e.0: BAR 4: assigned [io  0x4880-0x48ff]
> [    0.344000] pci 0000:00:0e.3: BAR 0: assigned [io  0x4c00-0x4c7f]
> [    0.348000] pci 0000:00:0e.0: BAR 2: assigned [io  0x4c80-0x4cbf]
> [    0.352000] pci 0000:00:0e.0: BAR 5: assigned [io  0x4cc0-0x4cdf]
> [    0.356000] pci 0000:00:0e.2: BAR 4: assigned [io  0x4ce0-0x4cef]
> [    0.360000] pci 0000:00:0e.0: BAR 0: assigned [io  0x4cf0-0x4cf7]
> [    0.364000] clocksource: Switched to clocksource mfgpt
> [    0.392000] NET: Registered protocol family 2
> [    0.396000] tcp_listen_portaddr_hash hash table entries: 1024 (order: 0, 16384 bytes, linear)
> [    0.400000] TCP established hash table entries: 16384 (order: 3, 131072 bytes, linear)
> [    0.404000] TCP bind hash table entries: 16384 (order: 3, 131072 bytes, linear)
> [    0.408000] TCP: Hash tables configured (established 16384 bind 16384)
> [    0.412000] UDP hash table entries: 1024 (order: 1, 32768 bytes, linear)
> [    0.416000] UDP-Lite hash table entries: 1024 (order: 1, 32768 bytes, linear)
> [    0.420000] NET: Registered protocol family 1
> [    0.424000] pci 0000:00:0e.4: enabling device (0000 -> 0002)
> [    0.428000] PCI: CLS 32 bytes, default 32
> [    0.888000] random: fast init done
> [    1.804000] workingset: timestamp_bits=62 max_order=15 bucket_order=0
> [    1.820000] NET: Registered protocol family 38
> [    1.824000] io scheduler bfq registered
> [    1.828000] slot: 8, pin: 1, irq: 38
> [    1.832000] sisfb 0000:00:08.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x3030
> [    1.836000] sisfb: Video ROM not found
> [    1.840000] sisfb: Video RAM at 0x40000000, mapped to 0x9000000040000000, size 32768k
> [    1.844000] sisfb: MMIO at 0x50000000, mapped to 0x9000000050000000, size 256k
> [    1.848000] sisfb: Memory heap starting at 32160K, size 32K
> [    3.140000] sisfb: Detected SiS301C video bridge
> [    3.220000] sisfb: Detected 1280x1024 flat panel
> [    3.304000] sisfb: CRT2 DDC supported
> [    3.304000] sisfb: CRT2 DDC level: 2 
> [    3.512000] sisfb: Monitor range H 30-81KHz, V 56-76Hz, Max. dotclock 140MHz
> [    3.516000] sisfb: Default mode is 1280x1024x8 (60Hz)
> [    3.520000] sisfb: Initial vbflags 0x10000022
> [    4.008000] Console: switching to colour frame buffer device 160x64
> [    4.068000] sisfb: 2D acceleration is enabled, y-panning enabled (auto-max)
> [    4.072000] fb0: SiS 315PRO frame buffer device version 1.8.9
> [    4.076000] sisfb: Copyright (C) 2001-2005 Thomas Winischhofer
> [    4.156000] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> [    4.308000] printk: console [ttyS0] disabled
> [    4.316000] serial8250.0: ttyS0 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
> [    4.320000] printk: console [ttyS0] enabled
> [    4.324000] printk: bootconsole [early0] disabled
> [    4.808000] brd: module loaded
> [    5.036000] loop: module loaded
> [    5.040000] Uniform Multi-Platform E-IDE driver
> [    5.044000] amd74xx 0000:00:0e.2: UDMA100 controller
> [    5.048000] amd74xx 0000:00:0e.2: IDE controller (0x1022:0x209a rev 0x01)
> [    5.052000] amd74xx 0000:00:0e.2: IDE port disabled
> [    5.056000] amd74xx 0000:00:0e.2: not 100% native mode: will probe irqs later
> [    5.060000] legacy IDE will be removed in 2021, please switch to libata
> [    5.060000] Report any missing HW support to linux-ide@vger.kernel.org
> [    5.064000]     ide0: BM-DMA at 0x4ce0-0x4ce7
> [    5.068000] Probing IDE interface ide0...
> [    5.428000] hda: WDC WD1600BEVS-00VAT0, ATA DISK drive
> [    6.152000] hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4
> [    6.152000] hda: UDMA/100 mode selected
> [    6.156000] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> [    6.160000] ide-gd driver 1.18
> [    6.164000] hda: max request size: 1024KiB
> [    6.244000] hda: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63
> [    6.248000] hda: cache flushes supported
> [    6.268000]  hda: hda1
> [    6.272000] slot: 6, pin: 1, irq: 36
> [    6.276000] libphy: r8169: probed
> [    6.280000] r8169 0000:00:06.0 eth0: RTL8169sc/8110sc, 00:23:9e:00:0f:54, XID 980, IRQ 36
> [    6.284000] r8169 0000:00:06.0 eth0: jumbo features [frames: 7152 bytes, tx checksumming: ok]
> [    6.288000] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [    6.292000] ehci-pci: EHCI PCI platform driver
> [    6.296000] ehci-pci 0000:00:0e.5: EHCI Host Controller
> [    6.300000] ehci-pci 0000:00:0e.5: new USB bus registered, assigned bus number 1
> [    6.304000] ehci-pci 0000:00:0e.5: irq 11, io mem 0x50071000
> [    6.440000] ehci-pci 0000:00:0e.5: USB 0.0 started, EHCI 1.00
> [    6.448000] hub 1-0:1.0: USB hub found
> [    6.460000] hub 1-0:1.0: 4 ports detected
> [    6.468000] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [    6.472000] ohci-pci: OHCI PCI platform driver
> [    6.476000] ohci-pci 0000:00:0e.4: OHCI PCI host controller
> [    6.480000] ohci-pci 0000:00:0e.4: new USB bus registered, assigned bus number 2
> [    6.484000] ohci-pci 0000:00:0e.4: irq 11, io mem 0x50070000
> [    6.580000] hub 2-0:1.0: USB hub found
> [    6.592000] hub 2-0:1.0: 4 ports detected
> [    6.600000] usbcore: registered new interface driver usb-storage
> [    6.604000] loongson2_cpufreq: Loongson-2F CPU frequency driver
> [    6.608000] usbcore: registered new interface driver usbhid
> [    6.612000] usbhid: USB HID core driver
> [    6.620000] NET: Registered protocol family 17
> [    6.700000] Freeing unused kernel memory: 1968K
> [    6.704000] This architecture does not have kernel memory protection.
> [    6.708000] Run /init as init process
> [   10.868000] EXT4-fs (hda1): mounting ext3 file system using the ext4 subsystem
> [   12.756000] EXT4-fs (hda1): mounted filesystem with ordered data mode. Opts: (null)
> [   15.800000] RTL8211B Gigabit Ethernet r8169-30:00: attached PHY driver [RTL8211B Gigabit Ethernet] (mii_bus:phy_addr=r8169-30:00, irq=IGNORE)
> [   15.908000] r8169 0000:00:06.0 eth0: Link is Down
> [   18.424000] r8169 0000:00:06.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> A.
> 
Thanks for reporting and the log. I'll revert the patch.

Heiner

^ permalink raw reply

* [PATCH net-next] r8169: fix DMA issue on MIPS platform
From: Heiner Kallweit @ 2019-08-23 18:07 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller
  Cc: Aaro Koskinen, netdev@vger.kernel.org

As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.

Fixes: f072218cca5b ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
---
 drivers/net/ethernet/realtek/r8169_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 910944120..6182e7d33 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5822,6 +5822,10 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 			skb->tail += pkt_size;
 			skb->len = pkt_size;
 
+			dma_sync_single_for_device(tp_to_dev(tp),
+						   le64_to_cpu(desc->addr),
+						   pkt_size, DMA_FROM_DEVICE);
+
 			rtl8169_rx_csum(skb, status);
 			skb->protocol = eth_type_trans(skb, dev);
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH net] Revert "r8169: remove not needed call to dma_sync_single_for_device"
From: Heiner Kallweit @ 2019-08-23 17:57 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Aaro Koskinen

This reverts commit f072218cca5b076dd99f3dfa3aaafedfd0023a51.

As reported by Aaro this patch causes network problems on
MIPS Loongson platform. Therefore revert it.

Fixes: f072218cca5b ("r8169: remove not needed call to dma_sync_single_for_device")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
---
This fix doesn't apply cleanly on net-next, I'll provide a separate one for net-next.
---
 drivers/net/ethernet/realtek/r8169_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index e1dd6ea60..bae0074ab 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5921,6 +5921,7 @@ static struct sk_buff *rtl8169_try_rx_copy(void *data,
 	skb = napi_alloc_skb(&tp->napi, pkt_size);
 	if (skb)
 		skb_copy_to_linear_data(skb, data, pkt_size);
+	dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
 
 	return skb;
 }
-- 
2.23.0


^ permalink raw reply related

* RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Parav Pandit @ 2019-08-23 18:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <20190823111641.7f928917@x1.home>



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, August 23, 2019 10:47 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Fri, 23 Aug 2019 16:14:04 +0000
> Parav Pandit <parav@mellanox.com> wrote:
> 
> > > > Idea is to have mdev alias as optional.
> > > > Each mdev_parent says whether it wants mdev_core to generate an
> > > > alias or not. So only networking device drivers would set it to true.
> > > > For rest, alias won't be generated, and won't be compared either
> > > > during creation time. User continue to provide only uuid.
> > >
> > > Ok
> > >
> > > > I am tempted to have alias collision detection only within
> > > > children mdevs of the same parent, but doing so will always
> > > > mandate to prefix in netdev name. And currently we are left with
> > > > only 3 characters to prefix it, so that may not be good either.
> > > > Hence, I think mdev core wide alias is better with 12 characters.
> > >
> > > I suppose it depends on the API, if the vendor driver can ask the
> > > mdev core for an alias as part of the device creation process, then
> > > it could manage the netdev namespace for all its devices, choosing
> > > how many characters to use, and fail the creation if it can't meet a
> > > uniqueness requirement.  IOW, mdev-core would always provide a full
> > > sha1 and therefore gets itself out of the uniqueness/collision aspects.
> > >
> > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > mdev core allowed to create a mdev.
> 
> The mdev vendor driver has the opportunity to fail the device creation in
> mdev_parent_ops.create().
> 
That is not helpful for below reasons.
1. vendor driver doesn't have visibility in other vendor's alias.
2. Even for single vendor, it needs to maintain global list of devices to see collision.
3. multiple vendors needs to implement same scheme.

Mdev core should be the owner. Shifting ownership from one layer to a lower layer in vendor driver doesn't solve the problem
(if there is one, which I think doesn't exist).

> > And then devlink core chooses
> > only 6 bytes (12 characters) and there is collision. Things fall
> > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > core's ownership to provide unique aliases.
> 
> You're suggesting/contemplating multiple solutions here, 3-char prefix + 12-
> char sha1 vs <parent netdev> + ?-char sha1.  Also, the 15-char total limit is
> imposed by an external subsystem, where the vendor driver is the gateway
> between that subsystem and mdev.  How would mdev integrate with another
> subsystem that maybe only has 9-chars available?  Would the vendor driver API
> specify "I need an alias" or would it specify "I need an X-char length alias"?
Yes, Vendor driver should say how long the alias it wants.
However before we implement that, I suggest let such vendor/user/driver arrive which needs that.
Such variable length alias can be added at that time and even with that alias collision can be detected by single mdev module.

> Does it make sense that mdev-core would fail creation of a device if there's a
> collision in the 12-char address space between different subsystems?  For
> example, does enm0123456789ab really collide with xyz0123456789ab? 
I think so, because at mdev level its 12-char alias matters.
Choosing the prefix not adding prefix is really a user space choice.

>  So if
> mdev were to provided a 40-char sha1, is it possible that the vendor driver
> could consume this in its create callback, truncate it to the number of chars
> required by the vendor driver's subsystem, and determine whether a collision
> exists?
We shouldn't shift the problem from mdev to multiple vendor drivers to detect collision.

I still think that user providing alias is better because it knows the use-case system in use, and eliminates these collision issue.

> 
> > > > I do not understand how an extra character reduces collision, if
> > > > that's what you meant.
> > >
> > > If the default were for example 3-chars, we might already have
> > > device 'abc'.  A collision would expose one more char of the new
> > > device, so we might add device with alias 'abcd'.  I mentioned
> > > previously that this leaves an issue for userspace that we can't
> > > change the alias of device abc, so without additional information,
> > > userspace can only determine via elimination the mapping of alias to
> > > device, but userspace has more information available to it in the
> > > form of sysfs links.
> > > > Module options are almost not encouraged anymore with other
> > > > subsystems/drivers.
> > >
> > > We don't live in a world of absolutes.  I agree that the defaults
> > > should work in the vast majority of cases.  Requiring a user to
> > > twiddle module options to make things work is undesirable, verging
> > > on a bug.  A module option to enable some specific feature, unsafe
> > > condition, or test that is outside of the typical use case is
> > > reasonable, imo.
> > > > For testing collision rate, a sample user space script and sample
> > > > mtty is easy and get us collision count too. We shouldn't put that
> > > > using module option in production kernel. I practically have the
> > > > code ready to play with; Changing 12 to smaller value is easy with
> > > > module reload.
> > > >
> > > > #define MDEV_ALIAS_LEN 12
> > >
> > > If it can't be tested with a shipping binary, it probably won't be
> > > tested.  Thanks,
> > It is not the role of mdev core to expose collision
> > efficiency/deficiency of the sha1. It can be tested outside before
> > mdev choose to use it.
> 
> The testing I'm considering is the user and kernel response to a collision.
> 
> > I am saying we should test with 12 characters with 10,000 or more
> > devices and see how collision occurs. Even if collision occurs, mdev
> > returns EEXIST status indicating user to pick a different UUID for
> > those rare conditions.
> 
> The only way we're going to see collision with a 12-char sha1 is if we burn the
> CPU cycles to find uuids that collide in that space.  10,000 devices is not
> remotely enough to generate a collision in that address space.  That puts a
> prerequisite in place that in order to test collision, someone needs to know
> certain magic inputs.  OTOH, if we could use a shorter abbreviation, collisions
> are trivial to test experimentally.  Thanks,
> 
Yes, and therefore a sane user who wants to create more mdevs, wouldn't intentionally stress it to see failures.

> Alex

^ permalink raw reply

* Re: [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: David Ahern @ 2019-08-23 17:59 UTC (permalink / raw)
  To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1566582703-26567-1-git-send-email-alexey.kodanev@oracle.com>

On 8/23/19 1:51 PM, Alexey Kodanev wrote:
> When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
> same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
> in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
> doesn't work anymore.
> 
> This issue was found with LTP mpls03 tests.
> 
> Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/mpls/mpls_iptunnel.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

Thanks for the report and patch. On first glance, it seems odd that
neigh_xmit could be called with rt_gw4 (formerly rt_gateway) set to 0
and if it is non-zero, why isn't family set.

I am traveling today and doubt I will be able to take a deep look at
this until Monday.

^ permalink raw reply

* Aw: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
From: Frank Wunderlich @ 2019-08-23 17:56 UTC (permalink / raw)
  To: "René van Dorst"
  Cc: John Crispin, Sean Wang, Nelson Chang, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	linux-mips, Russell King, Stefan Roese,
	"René van Dorst"
In-Reply-To: <20190823134516.27559-1-opensource@vdorst.com>

tested on bpi-r2 (mt7623/mt7530) and bpi-r64 (mt7622/rtl8367)

as reported to rene directly rx-path needs some rework because current rx-speed
on bpi-r2 is 865 Mbits/sec instead of ~940 Mbits/sec

Tested-by: Frank Wunderlich <frank-w@public-files.de>

regards Frank


> Gesendet: Freitag, 23. August 2019 um 15:45 Uhr
> Von: "René van Dorst" <opensource@vdorst.com>
> An: "John Crispin" <john@phrozen.org>, "Sean Wang" <sean.wang@mediatek.com>, "Nelson Chang" <nelson.chang@mediatek.com>, "David S . Miller" <davem@davemloft.net>, "Matthias Brugger" <matthias.bgg@gmail.com>
> Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-mips@vger.kernel.org, "Russell King" <linux@armlinux.org.uk>, "Frank Wunderlich" <frank-w@public-files.de>, "Stefan Roese" <sr@denx.de>, "René van Dorst" <opensource@vdorst.com>
> Betreff: [PATCH net-next v3 0/3] net: ethernet: mediatek: convert to PHYLINK
>
> These patches converts mediatek driver to PHYLINK API.
> 
> v2->v3:
> * Phylink improvements and clean-ups after review
> v1->v2:
> * Rebase for mt76x8 changes
> * Phylink improvements and clean-ups after review
> * SGMII port doesn't support 2.5Gbit in SGMII mode only in BASE-X mode.
>   Refactor the code.
> 
> René van Dorst (3):
>   net: ethernet: mediatek: Add basic PHYLINK support
>   net: ethernet: mediatek: Re-add support SGMII
>   dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the
>     new phylink API
> 
>  .../arm/mediatek/mediatek,sgmiisys.txt        |   2 -
>  .../dts/mediatek/mt7622-bananapi-bpi-r64.dts  |  28 +-
>  arch/arm64/boot/dts/mediatek/mt7622.dtsi      |   1 -
>  drivers/net/ethernet/mediatek/Kconfig         |   2 +-
>  drivers/net/ethernet/mediatek/mtk_eth_path.c  |  75 +--
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c   | 529 ++++++++++++------
>  drivers/net/ethernet/mediatek/mtk_eth_soc.h   |  68 ++-
>  drivers/net/ethernet/mediatek/mtk_sgmii.c     |  65 ++-
>  8 files changed, 477 insertions(+), 293 deletions(-)
> 
> -- 
> 2.20.1
> 
>

^ permalink raw reply

* [PATCH net] ipv4: mpls: fix mpls_xmit for iptunnel
From: Alexey Kodanev @ 2019-08-23 17:51 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, David Miller, Alexey Kodanev

When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
doesn't work anymore.

This issue was found with LTP mpls03 tests.

Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/mpls/mpls_iptunnel.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index d25e91d..44b6750 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -133,12 +133,12 @@ static int mpls_xmit(struct sk_buff *skb)
 	mpls_stats_inc_outucastpkts(out_dev, skb);
 
 	if (rt) {
-		if (rt->rt_gw_family == AF_INET)
-			err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gw4,
-					 skb);
-		else if (rt->rt_gw_family == AF_INET6)
+		if (rt->rt_gw_family == AF_INET6)
 			err = neigh_xmit(NEIGH_ND_TABLE, out_dev, &rt->rt_gw6,
 					 skb);
+		else
+			err = neigh_xmit(NEIGH_ARP_TABLE, out_dev, &rt->rt_gw4,
+					 skb);
 	} else if (rt6) {
 		if (ipv6_addr_v4mapped(&rt6->rt6i_gateway)) {
 			/* 6PE (RFC 4798) */
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v4 00/14] NFC: nxp-nci: clean up and new device support
From: Andy Shevchenko @ 2019-08-23 17:20 UTC (permalink / raw)
  To: Sedat Dilek
  Cc: Clément Perrochaud, Charles Gorand, netdev, David S. Miller,
	Sedat Dilek
In-Reply-To: <892584913.468.1566336479573@ox.credativ.com>

On Tue, Aug 20, 2019 at 11:27:59PM +0200, Sedat Dilek wrote:
> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> hat am 29. Juli 2019 15:35 geschrieben:

> I gave that patchset v4 a try against Linux v5.3-rc5.
> 
> And played with neard and neard-tools v0.16-0.1 from Debian/buster AMD64.
> 
> # nfctool --list
> 
> # nfctool --enable --device=nfc0
> 
> # nfctool --list --device=nfc0
> nfc0:
>           Tags: [ tag11 ]
>           Devices: [ ]
>           Protocols: [ Felica MIFARE Jewel ISO-DEP NFC-DEP ]
>           Powered: Yes
>           RF Mode: Initiator
>           lto: 0
>           rw: 0
>           miux: 0
> 
> # nfctool --device=nfc0 --poll=Both --sniff --dump-symm
> Start sniffer on nfc0
> 
> Start polling on nfc0 as both initiator and target
> 
> Targets found for nfc0
>   Tags: [ tag11 ]
>   Devices: [ ]
> 
> But I see in the logs:
> 
> # journalctl -u neard.service -f
> Aug 20 23:01:15 iniza neard[6158]: Error while reading NFC bytes
> 
> What does this error mean?
> How can I get more informations?
> Can you aid with debugging help?

Unfortunately I have no idea about user space tools.
But I think neard has to be updated to match kernel somehow.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Alex Williamson @ 2019-08-23 17:16 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiri Pirko, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB4866E33AF7203DE47F713FAAD1A40@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Fri, 23 Aug 2019 16:14:04 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, August 23, 2019 9:22 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > <davem@davemloft.net>; Kirti Wankhede <kwankhede@nvidia.com>; Cornelia
> > Huck <cohuck@redhat.com>; kvm@vger.kernel.org; linux-
> > kernel@vger.kernel.org; cjia <cjia@nvidia.com>; netdev@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > 
> > On Fri, 23 Aug 2019 14:53:06 +0000
> > Parav Pandit <parav@mellanox.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Friday, August 23, 2019 7:58 PM
> > > > To: Parav Pandit <parav@mellanox.com>
> > > > Cc: Jiri Pirko <jiri@resnulli.us>; Jiri Pirko <jiri@mellanox.com>;
> > > > David S . Miller <davem@davemloft.net>; Kirti Wankhede
> > > > <kwankhede@nvidia.com>; Cornelia Huck <cohuck@redhat.com>;
> > > > kvm@vger.kernel.org; linux- kernel@vger.kernel.org; cjia
> > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Fri, 23 Aug 2019 08:14:39 +0000
> > > > Parav Pandit <parav@mellanox.com> wrote:
> > > >  
> > > > > Hi Alex,
> > > > >
> > > > >  
> > > > > > -----Original Message-----
> > > > > > From: Jiri Pirko <jiri@resnulli.us>
> > > > > > Sent: Friday, August 23, 2019 1:42 PM
> > > > > > To: Parav Pandit <parav@mellanox.com>
> > > > > > Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > > <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > > Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck  
> > > > <cohuck@redhat.com>;  
> > > > > > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > > >
> > > > > > Thu, Aug 22, 2019 at 03:33:30PM CEST, parav@mellanox.com wrote:  
> > > > > > >
> > > > > > >  
> > > > > > >> -----Original Message-----
> > > > > > >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> Sent: Thursday, August 22, 2019 5:50 PM
> > > > > > >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri Pirko
> > > > > > >> <jiri@mellanox.com>; David S . Miller <davem@davemloft.net>;
> > > > > > >> Kirti Wankhede <kwankhede@nvidia.com>; Cornelia Huck  
> > > > > > <cohuck@redhat.com>;  
> > > > > > >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > > >> core
> > > > > > >>
> > > > > > >> Thu, Aug 22, 2019 at 12:04:02PM CEST, parav@mellanox.com wrote:  
> > > > > > >> >
> > > > > > >> >  
> > > > > > >> >> -----Original Message-----
> > > > > > >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> >> Sent: Thursday, August 22, 2019 3:28 PM
> > > > > > >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > > >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> <kwankhede@nvidia.com>; Cornelia Huck  
> > > > > > >> <cohuck@redhat.com>;  
> > > > > > >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev
> > > > > > >> >> core
> > > > > > >> >>
> > > > > > >> >> Thu, Aug 22, 2019 at 11:42:13AM CEST, parav@mellanox.com  
> > wrote:  
> > > > > > >> >> >
> > > > > > >> >> >  
> > > > > > >> >> >> -----Original Message-----
> > > > > > >> >> >> From: Jiri Pirko <jiri@resnulli.us>
> > > > > > >> >> >> Sent: Thursday, August 22, 2019 2:59 PM
> > > > > > >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> >> Cc: Alex Williamson <alex.williamson@redhat.com>; Jiri
> > > > > > >> >> >> Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> >> <kwankhede@nvidia.com>; Cornelia Huck  
> > > > > > >> >> <cohuck@redhat.com>;  
> > > > > > >> >> >> kvm@vger.kernel.org; linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > > >> >> >> mdev core
> > > > > > >> >> >>
> > > > > > >> >> >> Wed, Aug 21, 2019 at 08:23:17AM CEST,
> > > > > > >> >> >> parav@mellanox.com  
> > > > wrote:  
> > > > > > >> >> >> >
> > > > > > >> >> >> >  
> > > > > > >> >> >> >> -----Original Message-----
> > > > > > >> >> >> >> From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > >> >> >> >> Sent: Wednesday, August 21, 2019 10:56 AM
> > > > > > >> >> >> >> To: Parav Pandit <parav@mellanox.com>
> > > > > > >> >> >> >> Cc: Jiri Pirko <jiri@mellanox.com>; David S . Miller
> > > > > > >> >> >> >> <davem@davemloft.net>; Kirti Wankhede
> > > > > > >> >> >> >> <kwankhede@nvidia.com>; Cornelia Huck
> > > > > > >> >> >> >> <cohuck@redhat.com>; kvm@vger.kernel.org;
> > > > > > >> >> >> >> linux-kernel@vger.kernel.org; cjia
> > > > > > >> >> >> >> <cjia@nvidia.com>; netdev@vger.kernel.org
> > > > > > >> >> >> >> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and
> > > > > > >> >> >> >> mdev core
> > > > > > >> >> >> >>  
> > > > > > >> >> >> >> > > > > Just an example of the alias, not proposing how it's  
> > set.  
> > > > > > >> >> >> >> > > > > In fact, proposing that the user does not
> > > > > > >> >> >> >> > > > > set it, mdev-core provides one  
> > > > > > >> >> >> >> > > automatically.  
> > > > > > >> >> >> >> > > > >  
> > > > > > >> >> >> >> > > > > > > Since there seems to be some prefix
> > > > > > >> >> >> >> > > > > > > overhead, as I ask about above in how
> > > > > > >> >> >> >> > > > > > > many characters we actually have to work
> > > > > > >> >> >> >> > > > > > > with in IFNAMESZ, maybe we start with
> > > > > > >> >> >> >> > > > > > > 8 characters (matching your "index"
> > > > > > >> >> >> >> > > > > > > namespace) and expand as necessary for  
> > > > > > >> >> >> disambiguation.  
> > > > > > >> >> >> >> > > > > > > If we can eliminate overhead in
> > > > > > >> >> >> >> > > > > > > IFNAMESZ, let's start with  
> > > > > > >> 12.  
> > > > > > >> >> >> >> > > > > > > Thanks,
> > > > > > >> >> >> >> > > > > > >  
> > > > > > >> >> >> >> > > > > > If user is going to choose the alias, why
> > > > > > >> >> >> >> > > > > > does it have to be limited to  
> > > > > > >> >> >> >> sha1?  
> > > > > > >> >> >> >> > > > > > Or you just told it as an example?
> > > > > > >> >> >> >> > > > > >
> > > > > > >> >> >> >> > > > > > It can be an alpha-numeric string.  
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > No, I'm proposing a different solution where
> > > > > > >> >> >> >> > > > > mdev-core creates an alias based on an
> > > > > > >> >> >> >> > > > > abbreviated sha1.  The user does not provide
> > > > > > >> >> >> >> > > > > the  
> > > > > > >> >> >> >> alias.  
> > > > > > >> >> >> >> > > > >  
> > > > > > >> >> >> >> > > > > > Instead of mdev imposing number of
> > > > > > >> >> >> >> > > > > > characters on the alias, it should be best  
> > > > > > >> >> >> >> > > > > left to the user.  
> > > > > > >> >> >> >> > > > > > Because in future if netdev improves on
> > > > > > >> >> >> >> > > > > > the naming scheme, mdev will be  
> > > > > > >> >> >> >> > > > > limiting it, which is not right.  
> > > > > > >> >> >> >> > > > > > So not restricting alias size seems right to me.
> > > > > > >> >> >> >> > > > > > User configuring mdev for networking
> > > > > > >> >> >> >> > > > > > devices in a given kernel knows what  
> > > > > > >> >> >> >> > > > > user is doing.  
> > > > > > >> >> >> >> > > > > > So user can choose alias name size as it finds  
> > suitable.  
> > > > > > >> >> >> >> > > > >
> > > > > > >> >> >> >> > > > > That's not what I'm proposing, please read again.
> > > > > > >> >> >> >> > > > > Thanks,  
> > > > > > >> >> >> >> > > >
> > > > > > >> >> >> >> > > > I understood your point. But mdev doesn't know
> > > > > > >> >> >> >> > > > how user is going to use  
> > > > > > >> >> >> >> > > udev/systemd to name the netdev.  
> > > > > > >> >> >> >> > > > So even if mdev chose to pick 12 characters,
> > > > > > >> >> >> >> > > > it could result in  
> > > > > > >> >> collision.  
> > > > > > >> >> >> >> > > > Hence the proposal to provide the alias by the
> > > > > > >> >> >> >> > > > user, as user know the best  
> > > > > > >> >> >> >> > > policy for its use case in the environment its using.  
> > > > > > >> >> >> >> > > > So 12 character sha1 method will still work by user.  
> > > > > > >> >> >> >> > >
> > > > > > >> >> >> >> > > Haven't you already provided examples where
> > > > > > >> >> >> >> > > certain drivers or subsystems have unique netdev  
> > prefixes?  
> > > > > > >> >> >> >> > > If mdev provides a unique alias within the
> > > > > > >> >> >> >> > > subsystem, couldn't we simply define a netdev
> > > > > > >> >> >> >> > > prefix for the mdev subsystem and avoid all
> > > > > > >> >> >> >> > > other collisions?  I'm not in favor of the user
> > > > > > >> >> >> >> > > providing both a uuid and an alias/instance.
> > > > > > >> >> >> >> > > Thanks,
> > > > > > >> >> >> >> > >  
> > > > > > >> >> >> >> > For a given prefix, say ens2f0, can two UUID->sha1
> > > > > > >> >> >> >> > first 9 characters have  
> > > > > > >> >> >> >> collision?
> > > > > > >> >> >> >>
> > > > > > >> >> >> >> I think it would be a mistake to waste so many chars
> > > > > > >> >> >> >> on a prefix, but
> > > > > > >> >> >> >> 9 characters of sha1 likely wouldn't have a
> > > > > > >> >> >> >> collision before we have 10s of thousands of
> > > > > > >> >> >> >> devices.  Thanks,
> > > > > > >> >> >> >>
> > > > > > >> >> >> >> Alex  
> > > > > > >> >> >> >
> > > > > > >> >> >> >Jiri, Dave,
> > > > > > >> >> >> >Are you ok with it for devlink/netdev part?
> > > > > > >> >> >> >Mdev core will create an alias from a UUID.
> > > > > > >> >> >> >
> > > > > > >> >> >> >This will be supplied during devlink port attr set
> > > > > > >> >> >> >such as,
> > > > > > >> >> >> >
> > > > > > >> >> >> >devlink_port_attrs_mdev_set(struct devlink_port *port,
> > > > > > >> >> >> >const char *mdev_alias);
> > > > > > >> >> >> >
> > > > > > >> >> >> >This alias is used to generate representor netdev's  
> > > > phys_port_name.  
> > > > > > >> >> >> >This alias from the mdev device's sysfs will be used
> > > > > > >> >> >> >by the udev/systemd to  
> > > > > > >> >> >> generate predicable netdev's name.  
> > > > > > >> >> >> >Example: enm<mdev_alias_first_12_chars>  
> > > > > > >> >> >>
> > > > > > >> >> >> What happens in unlikely case of 2 UUIDs collide?
> > > > > > >> >> >>  
> > > > > > >> >> >Since users sees two devices with same phys_port_name,
> > > > > > >> >> >user should destroy  
> > > > > > >> >> recently created mdev and recreate mdev with different UUID?
> > > > > > >> >>
> > > > > > >> >> Driver should make sure phys port name wont collide,  
> > > > > > >> >So when mdev creation is initiated, mdev core calculates the
> > > > > > >> >alias and if there  
> > > > > > >> is any other mdev with same alias exist, it returns -EEXIST
> > > > > > >> error before progressing further.  
> > > > > > >> >This way user will get to know upfront in event of collision
> > > > > > >> >before the mdev  
> > > > > > >> device gets created.  
> > > > > > >> >How about that?  
> > > > > > >>
> > > > > > >> Sounds fine to me. Now the question is how many chars do we
> > > > > > >> want to  
> > > > have.  
> > > > > > >>  
> > > > > > >12 characters from Alex's suggestion similar to git?  
> > > > > >
> > > > > > Ok.
> > > > > >  
> > > > >
> > > > > Can you please confirm this scheme looks good now? I like to get
> > > > > patches  
> > > > started.
> > > >
> > > > My only concern is your comment that in the event of an abbreviated
> > > > sha1 collision (as exceptionally rare as that might be at 12-chars),
> > > > we'd fail the device create, while my original suggestion was that
> > > > vfio-core would add an extra character to the alias.  For
> > > > non-networking devices, the sha1 is unnecessary, so the extension
> > > > behavior seems preferred.  The user is only responsible to provide a
> > > > unique uuid.  Perhaps the failure behavior could be applied based on
> > > > the mdev device_api.  A module option on mdev to specify the default
> > > > number of alias chars would also be useful for testing so that we
> > > > can set it low enough to validate the collision behavior.  Thanks,
> > > >  
> > >
> > > Idea is to have mdev alias as optional.
> > > Each mdev_parent says whether it wants mdev_core to generate an alias
> > > or not. So only networking device drivers would set it to true.
> > > For rest, alias won't be generated, and won't be compared either
> > > during creation time. User continue to provide only uuid.  
> > 
> > Ok
> >   
> > > I am tempted to have alias collision detection only within children
> > > mdevs of the same parent, but doing so will always mandate to prefix
> > > in netdev name. And currently we are left with only 3 characters to
> > > prefix it, so that may not be good either. Hence, I think mdev core
> > > wide alias is better with 12 characters.  
> > 
> > I suppose it depends on the API, if the vendor driver can ask the mdev core for
> > an alias as part of the device creation process, then it could manage the netdev
> > namespace for all its devices, choosing how many characters to use, and fail
> > the creation if it can't meet a uniqueness requirement.  IOW, mdev-core would
> > always provide a full sha1 and therefore gets itself out of the
> > uniqueness/collision aspects.
> >   
> This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> mdev core allowed to create a mdev.

The mdev vendor driver has the opportunity to fail the device creation
in mdev_parent_ops.create().

> And then devlink core chooses
> only 6 bytes (12 characters) and there is collision. Things fall
> apart. Since mdev provides unique uuid based scheme, it's the mdev
> core's ownership to provide unique aliases.

You're suggesting/contemplating multiple solutions here, 3-char prefix +
12-char sha1 vs <parent netdev> + ?-char sha1.  Also, the 15-char
total limit is imposed by an external subsystem, where the vendor
driver is the gateway between that subsystem and mdev.  How would mdev
integrate with another subsystem that maybe only has 9-chars
available?  Would the vendor driver API specify "I need an alias" or
would it specify "I need an X-char length alias"?  Does it make sense
that mdev-core would fail creation of a device if there's a collision
in the 12-char address space between different subsystems?  For
example, does enm0123456789ab really collide with xyz0123456789ab?  So
if mdev were to provided a 40-char sha1, is it possible that the vendor
driver could consume this in its create callback, truncate it to the
number of chars required by the vendor driver's subsystem, and
determine whether a collision exists?

> > > I do not understand how an extra character reduces collision, if
> > > that's what you meant.  
> > 
> > If the default were for example 3-chars, we might already have
> > device 'abc'.  A collision would expose one more char of the new
> > device, so we might add device with alias 'abcd'.  I mentioned
> > previously that this leaves an issue for userspace that we can't
> > change the alias of device abc, so without additional information,
> > userspace can only determine via elimination the mapping of alias
> > to device, but userspace has more information available to it in
> > the form of sysfs links. 
> > > Module options are almost not encouraged anymore with other
> > > subsystems/drivers.  
> > 
> > We don't live in a world of absolutes.  I agree that the defaults
> > should work in the vast majority of cases.  Requiring a user to
> > twiddle module options to make things work is undesirable, verging
> > on a bug.  A module option to enable some specific feature, unsafe
> > condition, or test that is outside of the typical use case is
> > reasonable, imo. 
> > > For testing collision rate, a sample user space script and sample
> > > mtty is easy and get us collision count too. We shouldn't put
> > > that using module option in production kernel. I practically have
> > > the code ready to play with; Changing 12 to smaller value is easy
> > > with module reload.
> > >
> > > #define MDEV_ALIAS_LEN 12  
> > 
> > If it can't be tested with a shipping binary, it probably won't be
> > tested.  Thanks, 
> It is not the role of mdev core to expose collision
> efficiency/deficiency of the sha1. It can be tested outside before
> mdev choose to use it.

The testing I'm considering is the user and kernel response to a
collision.
 
> I am saying we should test with 12 characters with 10,000 or more
> devices and see how collision occurs. Even if collision occurs, mdev
> returns EEXIST status indicating user to pick a different UUID for
> those rare conditions.

The only way we're going to see collision with a 12-char sha1 is if we
burn the CPU cycles to find uuids that collide in that space.  10,000
devices is not remotely enough to generate a collision in that address
space.  That puts a prerequisite in place that in order to test
collision, someone needs to know certain magic inputs.  OTOH, if we
could use a shorter abbreviation, collisions are trivial to test
experimentally.  Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net-next 6/6] net: dsa: clear VLAN flags for CPU port
From: Florian Fainelli @ 2019-08-23 17:00 UTC (permalink / raw)
  To: Vladimir Oltean, Vivien Didelot, netdev; +Cc: davem, andrew
In-Reply-To: <aee63928-a99e-3849-c8b4-dee9b660247c@gmail.com>

On 8/22/19 4:51 PM, Vladimir Oltean wrote:
> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>> When the bridge offloads a VLAN on a slave port, we also need to
>> program its dedicated CPU port as a member of the VLAN.
>>
>> Drivers may handle the CPU port's membership as they want. For example,
>> Marvell as a special "Unmodified" mode to pass frames as is through
>> such ports.
>>
>> Even though DSA expects the drivers to handle the CPU port membership,
>> they are unlikely to program such VLANs untagged, and certainly not as
>> PVID. This patch clears the VLAN flags before programming the CPU port.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
>> Suggested-by: Vladimir Oltean <olteanv@gmail.com>
>> ---
>>   net/dsa/slave.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index 8267c156a51a..48df48f76c67 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -332,6 +332,12 @@ static int dsa_slave_vlan_add(struct net_device
>> *dev,
>>       if (err)
>>           return err;
>>   +    /* We need the dedicated CPU port to be a member of the VLAN as
>> well.
>> +     * Even though drivers often handle CPU membership in special ways,
>> +     * CPU ports are likely to be tagged, so clear the VLAN flags.
>> +     */
>> +    vlan.flags = 0;
>> +
> 
> How does this work exactly?
> If I run 'sudo bridge vlan add vid 1 dev swp4 pvid untagged', then the
> CPU port starts sending VLAN-tagged traffic. I see this in tcpdump on
> the DSA master port, but if I tcpdump on swp4, the VLAN tag is removed.
> Who is doing that?

If vlan.flags = 0, then it does not have either BRIDGE_VLAN_INFO_PVID or
BRIDGE_VLAN_INFO_UNTAGGED which means the VLAN should be programmed
tagged on the CPU.

Since swp4 is part of the same VLAN, but has it configured PVID
untagged, the tag is removed, that sounds about what I would expect to
see...
-- 
Florian

^ permalink raw reply

* Re: [PATCH net-next 2/6] net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
From: Florian Fainelli @ 2019-08-23 16:49 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Vivien Didelot, netdev, David S. Miller, Andrew Lunn
In-Reply-To: <CA+h21hpzSNZTK6-wQJkJCC9vs0hao12_tpQRLM5JLXXD_26c_w@mail.gmail.com>

On 8/23/19 9:32 AM, Vladimir Oltean wrote:
> Hi Florian,
> 
> On Fri, 23 Aug 2019 at 19:23, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 8/22/19 4:44 PM, Vladimir Oltean wrote:
>>> On Fri, 23 Aug 2019 at 02:43, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>>>>
>>>> Hi Vladimir,
>>>>
>>>> On Fri, 23 Aug 2019 01:06:58 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
>>>>> Hi Vivien,
>>>>>
>>>>> On 8/22/19 11:13 PM, Vivien Didelot wrote:
>>>>>> Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.
>>>>>>
>>>>>> This function is used in the tag_8021q.c code to offload the PVID of
>>>>>> ports, which would simply not work if .port_vlan_add is not supported
>>>>>> by the underlying switch.
>>>>>>
>>>>>> Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
>>>>>> that is to say in dsa_slave_vlan_rx_add_vid.
>>>>>>
>>>>>
>>>>> Do you know why Florian suppressed -EOPNOTSUPP in 061f6a505ac3 ("net:
>>>>> dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")?
>>>>> I forced a return value of -EOPNOTSUPP here and when I create a VLAN
>>>>> sub-interface nothing breaks, it just says:
>>>>> RTNETLINK answers: Operation not supported
>>>>> which IMO is expected.
>>>>
>>>> I do not know what you mean. This patch does not change the behavior of
>>>> dsa_slave_vlan_rx_add_vid, which returns 0 if -EOPNOTSUPP is caught.
>>>>
>>>
>>> Yes, but what's wrong with just forwarding -EOPNOTSUPP?
>>
>> It makes us fail adding the VLAN to the slave network device, which
>> sounds silly, if we can't offload it in HW, that's fine, we can still do
>> a SW VLAN instead, see net/8021q/vlan_core.c::vlan_add_rx_filter_info().
>>
>> Maybe a more correct solution is to set the NETIF_F_HW_VLAN_CTAG_FILTER
>> feature bit only if we have the ability to offload, now that I think
>> about it. Would you want me to cook a patch doing that?
> 
> sja1105 doesn't support offloading NETIF_F_HW_VLAN_CTAG_FILTER even
> though it does support programming VLANs.

The additional of the ndo_vlan_rx_{add,kill}_vid() is such that
standalone DSA ports continue to work while there is a bridge with
vlan_filtering=1 spanning other ports. In order for that ndo operation
to be called, we need to advertise the NETIF_F_HW_VLAN_CTAG_FILTER feature.

> Adding an offloaded VLAN sub-interface on a standalone switch port
> (vlan_filtering=0, uses dsa_8021q) would make the driver insert a VLAN
> entry whilst the TPID is ETH_P_DSA_8021Q.
> Maybe just let the driver set the netdev features, similar to how it
> does for the number of TX queues?

Why should we bend the framework because sja1105 and dsa_8021q are
special? Let me counter the argument: if the tagging is DSA_8021Q, why
not clear the feature then?
-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox