Netdev List

Netdev List
 help / color / mirror / Atom feed

* [patch net] devlink: convert occ_get op to separate registration
From: Jiri Pirko @ 2018-04-05 20:13 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, jakub.kicinski, dsahern, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

This resolves race during initialization where the resources with
ops are registered before driver and the structures used by occ_get
op is initialized. So keep occ_get callbacks registered only when
all structs are initialized.

The example flows, as it is in mlxsw:
1) driver load/asic probe:
   mlxsw_core
      -> mlxsw_sp_resources_register
        -> mlxsw_sp_kvdl_resources_register
          -> devlink_resource_register IDX
   mlxsw_spectrum
      -> mlxsw_sp_kvdl_init
        -> mlxsw_sp_kvdl_parts_init
          -> mlxsw_sp_kvdl_part_init
            -> devlink_resource_size_get IDX (to get the current setup
                                              size from devlink)
        -> devlink_resource_occ_get_register IDX (register current
                                                  occupancy getter)
2) reload triggered by devlink command:
  -> mlxsw_devlink_core_bus_device_reload
    -> mlxsw_sp_fini
      -> mlxsw_sp_kvdl_fini
	-> devlink_resource_occ_get_unregister IDX
    (struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
     which is using mlxsw_sp would cause use-after free)
    -> mlxsw_sp_init
      -> mlxsw_sp_kvdl_init
        -> mlxsw_sp_kvdl_parts_init
          -> mlxsw_sp_kvdl_part_init
            -> devlink_resource_size_get IDX (to get the current setup
                                              size from devlink)
        -> devlink_resource_occ_get_register IDX (register current
                                                  occupancy getter)

Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2:
- rebased on top or netdevsim changes
- improved description
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 24 ++-----
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  1 -
 .../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c    | 67 ++++++++++++--------
 drivers/net/netdevsim/devlink.c                    | 65 +++++++++----------
 include/net/devlink.h                              | 40 ++++++++----
 net/core/devlink.c                                 | 74 +++++++++++++++++++---
 6 files changed, 165 insertions(+), 106 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 53fffd09d133..ca38a30fbe91 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3805,18 +3805,6 @@ static const struct mlxsw_config_profile mlxsw_sp_config_profile = {
 	},
 };
 
-static u64 mlxsw_sp_resource_kvd_linear_occ_get(struct devlink *devlink)
-{
-	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
-
-	return mlxsw_sp_kvdl_occ_get(mlxsw_sp);
-}
-
-static const struct devlink_resource_ops mlxsw_sp_resource_kvd_linear_ops = {
-	.occ_get = mlxsw_sp_resource_kvd_linear_occ_get,
-};
-
 static void
 mlxsw_sp_resource_size_params_prepare(struct mlxsw_core *mlxsw_core,
 				      struct devlink_resource_size_params *kvd_size_params,
@@ -3877,8 +3865,7 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
 	err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD,
 					kvd_size, MLXSW_SP_RESOURCE_KVD,
 					DEVLINK_RESOURCE_ID_PARENT_TOP,
-					&kvd_size_params,
-					NULL);
+					&kvd_size_params);
 	if (err)
 		return err;
 
@@ -3887,8 +3874,7 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
 					linear_size,
 					MLXSW_SP_RESOURCE_KVD_LINEAR,
 					MLXSW_SP_RESOURCE_KVD,
-					&linear_size_params,
-					&mlxsw_sp_resource_kvd_linear_ops);
+					&linear_size_params);
 	if (err)
 		return err;
 
@@ -3905,8 +3891,7 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
 					double_size,
 					MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
 					MLXSW_SP_RESOURCE_KVD,
-					&hash_double_size_params,
-					NULL);
+					&hash_double_size_params);
 	if (err)
 		return err;
 
@@ -3915,8 +3900,7 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
 					single_size,
 					MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
 					MLXSW_SP_RESOURCE_KVD,
-					&hash_single_size_params,
-					NULL);
+					&hash_single_size_params);
 	if (err)
 		return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 82820ba43728..804d4d2c8031 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -442,7 +442,6 @@ void mlxsw_sp_kvdl_free(struct mlxsw_sp *mlxsw_sp, int entry_index);
 int mlxsw_sp_kvdl_alloc_size_query(struct mlxsw_sp *mlxsw_sp,
 				   unsigned int entry_count,
 				   unsigned int *p_alloc_size);
-u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp);
 int mlxsw_sp_kvdl_resources_register(struct mlxsw_core *mlxsw_core);
 
 struct mlxsw_sp_acl_rule_info {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
index 8796db44dcc3..fe4327f547d2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
@@ -315,8 +315,9 @@ static u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part)
 	return occ;
 }
 
-u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp)
+static u64 mlxsw_sp_kvdl_occ_get(void *priv)
 {
+	const struct mlxsw_sp *mlxsw_sp = priv;
 	u64 occ = 0;
 	int i;
 
@@ -326,48 +327,33 @@ u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp)
 	return occ;
 }
 
-static u64 mlxsw_sp_kvdl_single_occ_get(struct devlink *devlink)
+static u64 mlxsw_sp_kvdl_single_occ_get(void *priv)
 {
-	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+	const struct mlxsw_sp *mlxsw_sp = priv;
 	struct mlxsw_sp_kvdl_part *part;
 
 	part = mlxsw_sp->kvdl->parts[MLXSW_SP_KVDL_PART_ID_SINGLE];
 	return mlxsw_sp_kvdl_part_occ(part);
 }
 
-static u64 mlxsw_sp_kvdl_chunks_occ_get(struct devlink *devlink)
+static u64 mlxsw_sp_kvdl_chunks_occ_get(void *priv)
 {
-	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+	const struct mlxsw_sp *mlxsw_sp = priv;
 	struct mlxsw_sp_kvdl_part *part;
 
 	part = mlxsw_sp->kvdl->parts[MLXSW_SP_KVDL_PART_ID_CHUNKS];
 	return mlxsw_sp_kvdl_part_occ(part);
 }
 
-static u64 mlxsw_sp_kvdl_large_chunks_occ_get(struct devlink *devlink)
+static u64 mlxsw_sp_kvdl_large_chunks_occ_get(void *priv)
 {
-	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
-	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+	const struct mlxsw_sp *mlxsw_sp = priv;
 	struct mlxsw_sp_kvdl_part *part;
 
 	part = mlxsw_sp->kvdl->parts[MLXSW_SP_KVDL_PART_ID_LARGE_CHUNKS];
 	return mlxsw_sp_kvdl_part_occ(part);
 }
 
-static const struct devlink_resource_ops mlxsw_sp_kvdl_single_ops = {
-	.occ_get = mlxsw_sp_kvdl_single_occ_get,
-};
-
-static const struct devlink_resource_ops mlxsw_sp_kvdl_chunks_ops = {
-	.occ_get = mlxsw_sp_kvdl_chunks_occ_get,
-};
-
-static const struct devlink_resource_ops mlxsw_sp_kvdl_chunks_large_ops = {
-	.occ_get = mlxsw_sp_kvdl_large_chunks_occ_get,
-};
-
 int mlxsw_sp_kvdl_resources_register(struct mlxsw_core *mlxsw_core)
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_core);
@@ -386,8 +372,7 @@ int mlxsw_sp_kvdl_resources_register(struct mlxsw_core *mlxsw_core)
 					MLXSW_SP_KVDL_SINGLE_SIZE,
 					MLXSW_SP_RESOURCE_KVD_LINEAR_SINGLE,
 					MLXSW_SP_RESOURCE_KVD_LINEAR,
-					&size_params,
-					&mlxsw_sp_kvdl_single_ops);
+					&size_params);
 	if (err)
 		return err;
 
@@ -398,8 +383,7 @@ int mlxsw_sp_kvdl_resources_register(struct mlxsw_core *mlxsw_core)
 					MLXSW_SP_KVDL_CHUNKS_SIZE,
 					MLXSW_SP_RESOURCE_KVD_LINEAR_CHUNKS,
 					MLXSW_SP_RESOURCE_KVD_LINEAR,
-					&size_params,
-					&mlxsw_sp_kvdl_chunks_ops);
+					&size_params);
 	if (err)
 		return err;
 
@@ -410,13 +394,13 @@ int mlxsw_sp_kvdl_resources_register(struct mlxsw_core *mlxsw_core)
 					MLXSW_SP_KVDL_LARGE_CHUNKS_SIZE,
 					MLXSW_SP_RESOURCE_KVD_LINEAR_LARGE_CHUNKS,
 					MLXSW_SP_RESOURCE_KVD_LINEAR,
-					&size_params,
-					&mlxsw_sp_kvdl_chunks_large_ops);
+					&size_params);
 	return err;
 }
 
 int mlxsw_sp_kvdl_init(struct mlxsw_sp *mlxsw_sp)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
 	struct mlxsw_sp_kvdl *kvdl;
 	int err;
 
@@ -429,6 +413,23 @@ int mlxsw_sp_kvdl_init(struct mlxsw_sp *mlxsw_sp)
 	if (err)
 		goto err_kvdl_parts_init;
 
+	devlink_resource_occ_get_register(devlink,
+					  MLXSW_SP_RESOURCE_KVD_LINEAR,
+					  mlxsw_sp_kvdl_occ_get,
+					  mlxsw_sp);
+	devlink_resource_occ_get_register(devlink,
+					  MLXSW_SP_RESOURCE_KVD_LINEAR_SINGLE,
+					  mlxsw_sp_kvdl_single_occ_get,
+					  mlxsw_sp);
+	devlink_resource_occ_get_register(devlink,
+					  MLXSW_SP_RESOURCE_KVD_LINEAR_CHUNKS,
+					  mlxsw_sp_kvdl_chunks_occ_get,
+					  mlxsw_sp);
+	devlink_resource_occ_get_register(devlink,
+					  MLXSW_SP_RESOURCE_KVD_LINEAR_LARGE_CHUNKS,
+					  mlxsw_sp_kvdl_large_chunks_occ_get,
+					  mlxsw_sp);
+
 	return 0;
 
 err_kvdl_parts_init:
@@ -438,6 +439,16 @@ int mlxsw_sp_kvdl_init(struct mlxsw_sp *mlxsw_sp)
 
 void mlxsw_sp_kvdl_fini(struct mlxsw_sp *mlxsw_sp)
 {
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	devlink_resource_occ_get_unregister(devlink,
+					    MLXSW_SP_RESOURCE_KVD_LINEAR_LARGE_CHUNKS);
+	devlink_resource_occ_get_unregister(devlink,
+					    MLXSW_SP_RESOURCE_KVD_LINEAR_CHUNKS);
+	devlink_resource_occ_get_unregister(devlink,
+					    MLXSW_SP_RESOURCE_KVD_LINEAR_SINGLE);
+	devlink_resource_occ_get_unregister(devlink,
+					    MLXSW_SP_RESOURCE_KVD_LINEAR);
 	mlxsw_sp_kvdl_parts_fini(mlxsw_sp);
 	kfree(mlxsw_sp->kvdl);
 }
diff --git a/drivers/net/netdevsim/devlink.c b/drivers/net/netdevsim/devlink.c
index 1dba47936456..bef7db5d129a 100644
--- a/drivers/net/netdevsim/devlink.c
+++ b/drivers/net/netdevsim/devlink.c
@@ -30,52 +30,36 @@ static struct net *nsim_devlink_net(struct devlink *devlink)
 
 /* IPv4
  */
-static u64 nsim_ipv4_fib_resource_occ_get(struct devlink *devlink)
+static u64 nsim_ipv4_fib_resource_occ_get(void *priv)
 {
-	struct net *net = nsim_devlink_net(devlink);
+	struct net *net = priv;
 
 	return nsim_fib_get_val(net, NSIM_RESOURCE_IPV4_FIB, false);
 }
 
-static struct devlink_resource_ops nsim_ipv4_fib_res_ops = {
-	.occ_get = nsim_ipv4_fib_resource_occ_get,
-};
-
-static u64 nsim_ipv4_fib_rules_res_occ_get(struct devlink *devlink)
+static u64 nsim_ipv4_fib_rules_res_occ_get(void *priv)
 {
-	struct net *net = nsim_devlink_net(devlink);
+	struct net *net = priv;
 
 	return nsim_fib_get_val(net, NSIM_RESOURCE_IPV4_FIB_RULES, false);
 }
 
-static struct devlink_resource_ops nsim_ipv4_fib_rules_res_ops = {
-	.occ_get = nsim_ipv4_fib_rules_res_occ_get,
-};
-
 /* IPv6
  */
-static u64 nsim_ipv6_fib_resource_occ_get(struct devlink *devlink)
+static u64 nsim_ipv6_fib_resource_occ_get(void *priv)
 {
-	struct net *net = nsim_devlink_net(devlink);
+	struct net *net = priv;
 
 	return nsim_fib_get_val(net, NSIM_RESOURCE_IPV6_FIB, false);
 }
 
-static struct devlink_resource_ops nsim_ipv6_fib_res_ops = {
-	.occ_get = nsim_ipv6_fib_resource_occ_get,
-};
-
-static u64 nsim_ipv6_fib_rules_res_occ_get(struct devlink *devlink)
+static u64 nsim_ipv6_fib_rules_res_occ_get(void *priv)
 {
-	struct net *net = nsim_devlink_net(devlink);
+	struct net *net = priv;
 
 	return nsim_fib_get_val(net, NSIM_RESOURCE_IPV6_FIB_RULES, false);
 }
 
-static struct devlink_resource_ops nsim_ipv6_fib_rules_res_ops = {
-	.occ_get = nsim_ipv6_fib_rules_res_occ_get,
-};
-
 static int devlink_resources_register(struct devlink *devlink)
 {
 	struct devlink_resource_size_params params = {
@@ -91,7 +75,7 @@ static int devlink_resources_register(struct devlink *devlink)
 	err = devlink_resource_register(devlink, "IPv4", (u64)-1,
 					NSIM_RESOURCE_IPV4,
 					DEVLINK_RESOURCE_ID_PARENT_TOP,
-					&params, NULL);
+					&params);
 	if (err) {
 		pr_err("Failed to register IPv4 top resource\n");
 		goto out;
@@ -100,8 +84,7 @@ static int devlink_resources_register(struct devlink *devlink)
 	n = nsim_fib_get_val(net, NSIM_RESOURCE_IPV4_FIB, true);
 	err = devlink_resource_register(devlink, "fib", n,
 					NSIM_RESOURCE_IPV4_FIB,
-					NSIM_RESOURCE_IPV4,
-					&params, &nsim_ipv4_fib_res_ops);
+					NSIM_RESOURCE_IPV4, &params);
 	if (err) {
 		pr_err("Failed to register IPv4 FIB resource\n");
 		return err;
@@ -110,8 +93,7 @@ static int devlink_resources_register(struct devlink *devlink)
 	n = nsim_fib_get_val(net, NSIM_RESOURCE_IPV4_FIB_RULES, true);
 	err = devlink_resource_register(devlink, "fib-rules", n,
 					NSIM_RESOURCE_IPV4_FIB_RULES,
-					NSIM_RESOURCE_IPV4,
-					&params, &nsim_ipv4_fib_rules_res_ops);
+					NSIM_RESOURCE_IPV4, &params);
 	if (err) {
 		pr_err("Failed to register IPv4 FIB rules resource\n");
 		return err;
@@ -121,7 +103,7 @@ static int devlink_resources_register(struct devlink *devlink)
 	err = devlink_resource_register(devlink, "IPv6", (u64)-1,
 					NSIM_RESOURCE_IPV6,
 					DEVLINK_RESOURCE_ID_PARENT_TOP,
-					&params, NULL);
+					&params);
 	if (err) {
 		pr_err("Failed to register IPv6 top resource\n");
 		goto out;
@@ -130,8 +112,7 @@ static int devlink_resources_register(struct devlink *devlink)
 	n = nsim_fib_get_val(net, NSIM_RESOURCE_IPV6_FIB, true);
 	err = devlink_resource_register(devlink, "fib", n,
 					NSIM_RESOURCE_IPV6_FIB,
-					NSIM_RESOURCE_IPV6,
-					&params, &nsim_ipv6_fib_res_ops);
+					NSIM_RESOURCE_IPV6, &params);
 	if (err) {
 		pr_err("Failed to register IPv6 FIB resource\n");
 		return err;
@@ -140,12 +121,28 @@ static int devlink_resources_register(struct devlink *devlink)
 	n = nsim_fib_get_val(net, NSIM_RESOURCE_IPV6_FIB_RULES, true);
 	err = devlink_resource_register(devlink, "fib-rules", n,
 					NSIM_RESOURCE_IPV6_FIB_RULES,
-					NSIM_RESOURCE_IPV6,
-					&params, &nsim_ipv6_fib_rules_res_ops);
+					NSIM_RESOURCE_IPV6, &params);
 	if (err) {
 		pr_err("Failed to register IPv6 FIB rules resource\n");
 		return err;
 	}
+
+	devlink_resource_occ_get_register(devlink,
+					  NSIM_RESOURCE_IPV4_FIB,
+					  nsim_ipv4_fib_resource_occ_get,
+					  net);
+	devlink_resource_occ_get_register(devlink,
+					  NSIM_RESOURCE_IPV4_FIB_RULES,
+					  nsim_ipv4_fib_rules_res_occ_get,
+					  net);
+	devlink_resource_occ_get_register(devlink,
+					  NSIM_RESOURCE_IPV6_FIB,
+					  nsim_ipv6_fib_resource_occ_get,
+					  net);
+	devlink_resource_occ_get_register(devlink,
+					  NSIM_RESOURCE_IPV6_FIB_RULES,
+					  nsim_ipv6_fib_rules_res_occ_get,
+					  net);
 out:
 	return err;
 }
diff --git a/include/net/devlink.h b/include/net/devlink.h
index e21d8cadd480..2e4f71e16e95 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -231,14 +231,6 @@ struct devlink_dpipe_headers {
 	unsigned int headers_count;
 };
 
-/**
- * struct devlink_resource_ops - resource ops
- * @occ_get: get the occupied size
- */
-struct devlink_resource_ops {
-	u64 (*occ_get)(struct devlink *devlink);
-};
-
 /**
  * struct devlink_resource_size_params - resource's size parameters
  * @size_min: minimum size which can be set
@@ -265,6 +257,8 @@ devlink_resource_size_params_init(struct devlink_resource_size_params *size_para
 	size_params->unit = unit;
 }
 
+typedef u64 devlink_resource_occ_get_t(void *priv);
+
 /**
  * struct devlink_resource - devlink resource
  * @name: name of the resource
@@ -277,7 +271,6 @@ devlink_resource_size_params_init(struct devlink_resource_size_params *size_para
  * @size_params: size parameters
  * @list: parent list
  * @resource_list: list of child resources
- * @resource_ops: resource ops
  */
 struct devlink_resource {
 	const char *name;
@@ -289,7 +282,8 @@ struct devlink_resource {
 	struct devlink_resource_size_params size_params;
 	struct list_head list;
 	struct list_head resource_list;
-	const struct devlink_resource_ops *resource_ops;
+	devlink_resource_occ_get_t *occ_get;
+	void *occ_get_priv;
 };
 
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
@@ -409,8 +403,7 @@ int devlink_resource_register(struct devlink *devlink,
 			      u64 resource_size,
 			      u64 resource_id,
 			      u64 parent_resource_id,
-			      const struct devlink_resource_size_params *size_params,
-			      const struct devlink_resource_ops *resource_ops);
+			      const struct devlink_resource_size_params *size_params);
 void devlink_resources_unregister(struct devlink *devlink,
 				  struct devlink_resource *resource);
 int devlink_resource_size_get(struct devlink *devlink,
@@ -419,6 +412,12 @@ int devlink_resource_size_get(struct devlink *devlink,
 int devlink_dpipe_table_resource_set(struct devlink *devlink,
 				     const char *table_name, u64 resource_id,
 				     u64 resource_units);
+void devlink_resource_occ_get_register(struct devlink *devlink,
+				       u64 resource_id,
+				       devlink_resource_occ_get_t *occ_get,
+				       void *occ_get_priv);
+void devlink_resource_occ_get_unregister(struct devlink *devlink,
+					 u64 resource_id);
 
 #else
 
@@ -562,8 +561,7 @@ devlink_resource_register(struct devlink *devlink,
 			  u64 resource_size,
 			  u64 resource_id,
 			  u64 parent_resource_id,
-			  const struct devlink_resource_size_params *size_params,
-			  const struct devlink_resource_ops *resource_ops)
+			  const struct devlink_resource_size_params *size_params)
 {
 	return 0;
 }
@@ -589,6 +587,20 @@ devlink_dpipe_table_resource_set(struct devlink *devlink,
 	return -EOPNOTSUPP;
 }
 
+static inline void
+devlink_resource_occ_get_register(struct devlink *devlink,
+				  u64 resource_id,
+				  devlink_resource_occ_get_t *occ_get,
+				  void *occ_get_priv)
+{
+}
+
+static inline void
+devlink_resource_occ_get_unregister(struct devlink *devlink,
+				    u64 resource_id)
+{
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9236e421bd62..ad1317376798 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2405,6 +2405,16 @@ devlink_resource_size_params_put(struct devlink_resource *resource,
 	return 0;
 }
 
+static int devlink_resource_occ_put(struct devlink_resource *resource,
+				    struct sk_buff *skb)
+{
+	if (!resource->occ_get)
+		return 0;
+	return nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_OCC,
+				 resource->occ_get(resource->occ_get_priv),
+				 DEVLINK_ATTR_PAD);
+}
+
 static int devlink_resource_put(struct devlink *devlink, struct sk_buff *skb,
 				struct devlink_resource *resource)
 {
@@ -2425,11 +2435,8 @@ static int devlink_resource_put(struct devlink *devlink, struct sk_buff *skb,
 	if (resource->size != resource->size_new)
 		nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_SIZE_NEW,
 				  resource->size_new, DEVLINK_ATTR_PAD);
-	if (resource->resource_ops && resource->resource_ops->occ_get)
-		if (nla_put_u64_64bit(skb, DEVLINK_ATTR_RESOURCE_OCC,
-				      resource->resource_ops->occ_get(devlink),
-				      DEVLINK_ATTR_PAD))
-			goto nla_put_failure;
+	if (devlink_resource_occ_put(resource, skb))
+		goto nla_put_failure;
 	if (devlink_resource_size_params_put(resource, skb))
 		goto nla_put_failure;
 	if (list_empty(&resource->resource_list))
@@ -3162,15 +3169,13 @@ EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
  *	@resource_id: resource's id
  *	@parent_reosurce_id: resource's parent id
  *	@size params: size parameters
- *	@resource_ops: resource ops
  */
 int devlink_resource_register(struct devlink *devlink,
 			      const char *resource_name,
 			      u64 resource_size,
 			      u64 resource_id,
 			      u64 parent_resource_id,
-			      const struct devlink_resource_size_params *size_params,
-			      const struct devlink_resource_ops *resource_ops)
+			      const struct devlink_resource_size_params *size_params)
 {
 	struct devlink_resource *resource;
 	struct list_head *resource_list;
@@ -3213,7 +3218,6 @@ int devlink_resource_register(struct devlink *devlink,
 	resource->size = resource_size;
 	resource->size_new = resource_size;
 	resource->id = resource_id;
-	resource->resource_ops = resource_ops;
 	resource->size_valid = true;
 	memcpy(&resource->size_params, size_params,
 	       sizeof(resource->size_params));
@@ -3315,6 +3319,58 @@ int devlink_dpipe_table_resource_set(struct devlink *devlink,
 }
 EXPORT_SYMBOL_GPL(devlink_dpipe_table_resource_set);
 
+/**
+ *	devlink_resource_occ_get_register - register occupancy getter
+ *
+ *	@devlink: devlink
+ *	@resource_id: resource id
+ *	@occ_get: occupancy getter callback
+ *	@occ_get_priv: occupancy getter callback priv
+ */
+void devlink_resource_occ_get_register(struct devlink *devlink,
+				       u64 resource_id,
+				       devlink_resource_occ_get_t *occ_get,
+				       void *occ_get_priv)
+{
+	struct devlink_resource *resource;
+
+	mutex_lock(&devlink->lock);
+	resource = devlink_resource_find(devlink, NULL, resource_id);
+	if (WARN_ON(!resource))
+		goto out;
+	WARN_ON(resource->occ_get);
+
+	resource->occ_get = occ_get;
+	resource->occ_get_priv = occ_get_priv;
+out:
+	mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_resource_occ_get_register);
+
+/**
+ *	devlink_resource_occ_get_unregister - unregister occupancy getter
+ *
+ *	@devlink: devlink
+ *	@resource_id: resource id
+ */
+void devlink_resource_occ_get_unregister(struct devlink *devlink,
+					 u64 resource_id)
+{
+	struct devlink_resource *resource;
+
+	mutex_lock(&devlink->lock);
+	resource = devlink_resource_find(devlink, NULL, resource_id);
+	if (WARN_ON(!resource))
+		goto out;
+	WARN_ON(!resource->occ_get);
+
+	resource->occ_get = NULL;
+	resource->occ_get_priv = NULL;
+out:
+	mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_resource_occ_get_unregister);
+
 static int __init devlink_module_init(void)
 {
 	return genl_register_family(&devlink_nl_family);
-- 
2.14.3

^ permalink raw reply related

* Enable and configure storm prevention in a network device
From: Murali Karicheri @ 2018-04-05 20:14 UTC (permalink / raw)
  To: open list:TI NETCP ETHERNET DRIVER

Hello Netdev experts,

Is there a standard way to implement and configure storm prevention in a Linux
network device?

Our NIC firmware has the capability to enable storm prevention which is implemented
using a credit based scheme. The configuration is how many number of multicast +
broadcast packets allowed in a certain period of time. Is there a standard way
of passing this information from user space to driver? 

Thanks in advance for your input!

-- 
Murali Karicheri
Linux Kernel, Keystone

^ permalink raw reply

* Re: [PATCH net-next 6/6] netdevsim: Add simple FIB resource controller via devlink
From: David Ahern @ 2018-04-05 20:10 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	andy.roulin
In-Reply-To: <20180405172718.GA9125@nanopsycho>

On 4/5/18 11:27 AM, Jiri Pirko wrote:
> Wed, Mar 28, 2018 at 03:22:00AM CEST, dsa@cumulusnetworks.com wrote:
>> Add devlink support to netdevsim and use it to implement a simple,
>> profile based resource controller. Only one controller is needed
>> per namespace, so the first netdevsim netdevice in a namespace
>> registers with devlink. If that device is deleted, the resource
>> settings are deleted.
> 
> I don't understand why you add 1:1 fixed relationship between
> netnamespace and devlink instance. That is highly misleading and reader
> might think that those 2 are somehow related. They are not. You can have
> multiple devlink instances for many ports in a single namespace.

The netdevsim devlink instance is an example of limiting the number of
FIB entries and FIB rules for a namespace. It is currently limited to
the init_net based on past discussion.

It does not make sense to have multiple resource controllers for the
same network namespace, hence the limit of only registering with devlink
on the first device create.

> 
> Could you please clarify?
> 
> Also, to see the relationship between individual netdevsim netdevices
> and the parent devlink instance, we should use devlink_port
> instances, like this: 
> 
>       devlink1              devlink2
>        |    |                |    |
>  dl_port1_1 dlport1_2   dlport2_1 dlport2_2
>        |    |                |    |
>      eth0  eth1             eth2 eth3
> 
> Note that "devlink instance" reprensents one ASIC.
> The address of the devlink instance is the bus address of the ASIC.
> Here, you use address of some/first netdevsim netdev instance.

The ASIC here is the kernel tables in a namespace. It does not make
sense to have 2 devlink instances for a single namespace.

> 
> The way it is implemented in netdevsim by this patch is wrong on
> so many levels :(
> 
> Could you please fix this? I'm more than happy to help you with this,
> please say so. Thanks!

What is there to fix?

Not creating a netdevsim device per netdevsim netdevice? That is
completely unrelated to the devlink change.

^ permalink raw reply

* Re: [PATCH iproute2] l2tp: no need to export session offsets in JSON output
From: Stephen Hemminger @ 2018-04-05 19:44 UTC (permalink / raw)
  To: Guillaume Nault; +Cc: netdev, James Chapman
In-Reply-To: <6e2c59a6757414fa22b5927f248749efe34825d5.1522948642.git.g.nault@alphalink.fr>

On Thu, 5 Apr 2018 19:24:17 +0200
Guillaume Nault <g.nault@alphalink.fr> wrote:

> The offset and peer_offset parameters are only printed to avoid
> confusing external scripts that may parse "ip l2tp show session"
> output. There's no reason to keep them in JSON.
> 
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
> ---

Applied thanks.

^ permalink raw reply

* Re: [PATCH net 0/6] net: better validate user provided tunnel names
From: Eric Dumazet @ 2018-04-05 19:40 UTC (permalink / raw)
  To: David Miller, edumazet; +Cc: netdev, steffen.klassert, eric.dumazet
In-Reply-To: <20180405.152111.2211916878014180558.davem@davemloft.net>



On 04/05/2018 12:21 PM, David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Thu,  5 Apr 2018 06:39:25 -0700
> 
>> This series changes dev_valid_name() to not attempt reading
>> a possibly too long user-provided device name, then use
>> this helper in five different tunnel providers.
> 
> Series applied and queued up for -stable, thanks Eric.
> 
> Reading over this series makes me wonder if we generally have an
> off-by-one bug for device names which are exactly IFNAMSIZ.
> 
> We validate the size using the test:
> 
> 	if (strlen(name) >= IFNAMSIZ)
> 		return ERROR;
> 
> and thusly after Eric's changes:
> 
> 	if (strnlen(name, IFNAMSIZ) == IFNAMSIZ)
> 		return ERROR;
> 
> This value computed by str{,n}len() doesn't include the trailing null
> byte.
> 
> So we will accept a name that has exactly IFNAMSIZ bytes long not
> including the trailing null.

In this case strnlen(name, IFNAMSIZ) returns IFNAMSIZ.

So  (strnlen(name, IFNAMSIZ) == IFNAMSIZ) would definitely be true.

The only effect of the change is that strlen() would read 1000 bytes of a
malicious string before we reached the test on the length to reject such name.

While strnlen() is guaranteed to not read more than IFNAMSIZ bytes.

^ permalink raw reply

* Re: [PATCH net 0/6] net: better validate user provided tunnel names
From: David Miller @ 2018-04-05 19:21 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, steffen.klassert, eric.dumazet
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu,  5 Apr 2018 06:39:25 -0700

> This series changes dev_valid_name() to not attempt reading
> a possibly too long user-provided device name, then use
> this helper in five different tunnel providers.

Series applied and queued up for -stable, thanks Eric.

Reading over this series makes me wonder if we generally have an
off-by-one bug for device names which are exactly IFNAMSIZ.

We validate the size using the test:

	if (strlen(name) >= IFNAMSIZ)
		return ERROR;

and thusly after Eric's changes:

	if (strnlen(name, IFNAMSIZ) == IFNAMSIZ)
		return ERROR;

This value computed by str{,n}len() doesn't include the trailing null
byte.

So we will accept a name that has exactly IFNAMSIZ bytes long not
including the trailing null.

Then we will copy IFNAMSIZ bytes, minus 1, into the device name and
then tack on the trailling null byte.

So essentially we will set the final non-null byte in the string to
a null byte.

Am I misreading things?

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] dt: bindings: add new dt entries for brcmfmac
From: Arend van Spriel @ 2018-04-05 19:11 UTC (permalink / raw)
  To: Kalle Valo, Ulf Hansson
  Cc: Florian Fainelli, Alexey Roslyakov, Andrew Lunn, Rob Herring,
	Mark Rutland, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, netdev, linux-wireless, devicetree,
	Linux Kernel Mailing List, brcm80211-dev-list.pdl,
	brcm80211-dev-list
In-Reply-To: <87lge1er31.fsf@kamboji.qca.qualcomm.com>

On 4/5/2018 3:10 PM, Kalle Valo wrote:
> Ulf Hansson <ulf.hansson@linaro.org> writes:
>
>> On 20 March 2018 at 10:55, Kalle Valo <kvalo@codeaurora.org> wrote:
>>> Arend van Spriel <arend.vanspriel@broadcom.com> writes:
>>>
>>>>>> If I get it right, you mean something like this:
>>>>>>
>>>>>> mmc3: mmc@1c12000 {
>>>>>> ...
>>>>>>           broken-sg-support;
>>>>>>           sd-head-align = 4;
>>>>>>           sd-sgentry-align = 512;
>>>>>>
>>>>>>           brcmf: wifi@1 {
>>>>>>                   ...
>>>>>>           };
>>>>>> };
>>>>>>
>>>>>> Where dt: bindings documentation for these entries should reside?
>>>>>> In generic MMC bindings? Well, this is the very special case and
>>>>>> mmc-linux maintainer will unlikely to accept these changes.
>>>>>> Also, extra kernel code modification might be required. It could make
>>>>>> quite trivial change much more complex.
>>>>>
>>>>> If the MMC maintainers are not copied on this patch series, it will
>>>>> likely be hard for them to identify this patch series and chime in...
>>>>
>>>> The main question is whether this is indeed a "very special case" as
>>>> Alexey claims it to be or that it is likely to be applicable to other
>>>> device and host combinations as you are suggesting.
>>>>
>>>> If these properties are imposed by the host or host controller it
>>>> would make sense to have these in the mmc bindings.
>>>
>>> BTW, last year we were discussing something similar (I mean related to
>>> alignment requirements) with ath10k SDIO patches and at the time the
>>> patch submitter was proposing to have a bounce buffer in ath10k to
>>> workaround that. I don't remember the details anymore, they are on the
>>> ath10k mailing list archive if anyone is curious to know, but I would
>>> not be surprised if they are similar as here. So there might be a need
>>> to solve this in a generic way (but not sure of course as I haven't
>>> checked the details).
>>
>> I re-call something about these as well, here are the patches. Perhaps
>> I should pick some of them up...
>>
>> https://patchwork.kernel.org/patch/10123137/
>> https://patchwork.kernel.org/patch/10123139/
>> https://patchwork.kernel.org/patch/10123141/
>> https://patchwork.kernel.org/patch/10123143/
>
> Actually I was talking about a different patch, found it now:
>
> ath10k_sdio: DMA bounce buffers for read write
>
> https://patchwork.kernel.org/patch/9979543/

The patches Uffe mentions are related to this. In brcmfmac we simply 
implemented functionality to create a scatter list and send it through 
the MMC stack using SDIO CMD53. It is in the creation of the scatter 
list that these alignment properties are needed. Moving the whole 
functionality to the MMC stack will also move those properties into 
their right context, ie. SDIO host controller.

Our broker_sg_support property is basically doing the bounce buffer 
thing if I understand it correctly.

Regards,
Arend

^ permalink raw reply

* [PATCH 4/4] hv_netvsc: Pass net_device parameter to revoke and teardown functions
From: Mohammed Gamal @ 2018-04-05 19:09 UTC (permalink / raw)
  To: netdev, sthemmin
  Cc: devel, linux-kernel, kys, haiyangz, vkuznets, otubo,
	Mohammed Gamal
In-Reply-To: <1522955361-14704-1-git-send-email-mgamal@redhat.com>

The callers to netvsc_revoke_*_buf() and netvsc_teardown_*_gpadl()
already have their net_device instances. Pass them as a paramaeter to
the function instead of obtaining them from netvsc_device struct
everytime

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index df92c2f..04f611e 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -110,9 +110,9 @@ static void free_netvsc_device_rcu(struct netvsc_device *nvdev)
 }
 
 static void netvsc_revoke_recv_buf(struct hv_device *device,
-				   struct netvsc_device *net_device)
+				   struct netvsc_device *net_device,
+				   struct net_device *ndev)
 {
-	struct net_device *ndev = hv_get_drvdata(device);
 	struct nvsp_message *revoke_packet;
 	int ret;
 
@@ -160,9 +160,9 @@ static void netvsc_revoke_recv_buf(struct hv_device *device,
 }
 
 static void netvsc_revoke_send_buf(struct hv_device *device,
-				   struct netvsc_device *net_device)
+				   struct netvsc_device *net_device,
+				   struct net_device *ndev)
 {
-	struct net_device *ndev = hv_get_drvdata(device);
 	struct nvsp_message *revoke_packet;
 	int ret;
 
@@ -211,9 +211,9 @@ static void netvsc_revoke_send_buf(struct hv_device *device,
 }
 
 static void netvsc_teardown_recv_gpadl(struct hv_device *device,
-				       struct netvsc_device *net_device)
+				       struct netvsc_device *net_device,
+				       struct net_device *ndev)
 {
-	struct net_device *ndev = hv_get_drvdata(device);
 	int ret;
 
 	if (net_device->recv_buf_gpadl_handle) {
@@ -233,9 +233,9 @@ static void netvsc_teardown_recv_gpadl(struct hv_device *device,
 }
 
 static void netvsc_teardown_send_gpadl(struct hv_device *device,
-				       struct netvsc_device *net_device)
+				       struct netvsc_device *net_device,
+				       struct net_device *ndev)
 {
-	struct net_device *ndev = hv_get_drvdata(device);
 	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
@@ -452,10 +452,10 @@ static int netvsc_init_buf(struct hv_device *device,
 	goto exit;
 
 cleanup:
-	netvsc_revoke_recv_buf(device, net_device);
-	netvsc_revoke_send_buf(device, net_device);
-	netvsc_teardown_recv_gpadl(device, net_device);
-	netvsc_teardown_send_gpadl(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device, ndev);
+	netvsc_revoke_send_buf(device, net_device, ndev);
+	netvsc_teardown_recv_gpadl(device, net_device, ndev);
+	netvsc_teardown_send_gpadl(device, net_device, ndev);
 
 exit:
 	return ret;
@@ -474,7 +474,6 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 	init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
 	init_packet->msg.init_msg.init.min_protocol_ver = nvsp_ver;
 	init_packet->msg.init_msg.init.max_protocol_ver = nvsp_ver;
-
 	trace_nvsp_send(ndev, init_packet);
 
 	/* Send the init request */
@@ -596,13 +595,13 @@ void netvsc_device_remove(struct hv_device *device)
 	 * Revoke receive buffer. If host is pre-Win2016 then tear down
 	 * receive buffer GPADL. Do the same for send buffer.
 	 */
-	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device, ndev);
 	if (vmbus_proto_version < VERSION_WIN10)
-		netvsc_teardown_recv_gpadl(device, net_device);
+		netvsc_teardown_recv_gpadl(device, net_device, ndev);
 
-	netvsc_revoke_send_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device, ndev);
 	if (vmbus_proto_version < VERSION_WIN10)
-		netvsc_teardown_send_gpadl(device, net_device);
+		netvsc_teardown_send_gpadl(device, net_device, ndev);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -624,8 +623,8 @@ void netvsc_device_remove(struct hv_device *device)
 	 * here after VMBus is closed.
 	*/
 	if (vmbus_proto_version >= VERSION_WIN10) {
-		netvsc_teardown_recv_gpadl(device, net_device);
-		netvsc_teardown_send_gpadl(device, net_device);
+		netvsc_teardown_recv_gpadl(device, net_device, ndev);
+		netvsc_teardown_send_gpadl(device, net_device, ndev);
 	}
 
 	/* Release all resources */
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 3/4] hv_netvsc: Ensure correct teardown message sequence order
From: Mohammed Gamal @ 2018-04-05 19:09 UTC (permalink / raw)
  To: netdev, sthemmin
  Cc: otubo, Mohammed Gamal, haiyangz, linux-kernel, devel, vkuznets
In-Reply-To: <1522955361-14704-1-git-send-email-mgamal@redhat.com>

Prior to commit 0cf737808ae7 ("hv_netvsc: netvsc_teardown_gpadl() split")
the call sequence in netvsc_device_remove() was as follows (as
implemented in netvsc_destroy_buf()):
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Teardown receive buffer GPADL
3- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
4- Teardown send buffer GPADL
5- Close vmbus

This didn't work for WS2016 hosts. Commit 0cf737808ae7
("hv_netvsc: netvsc_teardown_gpadl() split") rearranged the
teardown sequence as follows:
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Close vmbus
4- Teardown receive buffer GPADL
5- Teardown send buffer GPADL

That worked well for WS2016 hosts, but it prevented guests on older hosts from
shutting down after changing network settings. Commit 0ef58b0a05c1
("hv_netvsc: change GPAD teardown order on older versions") ensured the
following message sequence for older hosts
1- Send NVSP_MSG1_TYPE_REVOKE_RECV_BUF message
2- Send NVSP_MSG1_TYPE_REVOKE_SEND_BUF message
3- Teardown receive buffer GPADL
4- Teardown send buffer GPADL
5- Close vmbus

However, with this sequence calling `ip link set eth0 mtu 1000` hangs and the
process becomes uninterruptible. On futher analysis it turns out that on tearing
down the receive buffer GPADL the kernel is waiting indefinitely
in vmbus_teardown_gpadl() for a completion to be signaled.

Here is a snippet of where this occurs:
int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
{
        struct vmbus_channel_gpadl_teardown *msg;
        struct vmbus_channel_msginfo *info;
        unsigned long flags;
        int ret;

        info = kmalloc(sizeof(*info) +
                       sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
        if (!info)
                return -ENOMEM;

        init_completion(&info->waitevent);
        info->waiting_channel = channel;
[....]
        ret = vmbus_post_msg(msg, sizeof(struct vmbus_channel_gpadl_teardown),
                             true);

        if (ret)
                goto post_msg_err;

        wait_for_completion(&info->waitevent);
[....]
}

The completion is signaled from vmbus_ongpadl_torndown(), which gets called when
the corresponding message is received from the host, which apparently never happens
in that case.
This patch works around the issue by restoring the first mentioned message sequence
for older hosts

Fixes: 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions")

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index f4df5de..df92c2f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -592,8 +592,17 @@ void netvsc_device_remove(struct hv_device *device)
 		= rtnl_dereference(net_device_ctx->nvdev);
 	int i;
 
+	/*
+	 * Revoke receive buffer. If host is pre-Win2016 then tear down
+	 * receive buffer GPADL. Do the same for send buffer.
+	 */
 	netvsc_revoke_recv_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_recv_gpadl(device, net_device);
+
 	netvsc_revoke_send_buf(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10)
+		netvsc_teardown_send_gpadl(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -607,15 +616,13 @@ void netvsc_device_remove(struct hv_device *device)
 	 */
 	netdev_dbg(ndev, "net device safe to remove\n");
 
-	/* older versions require that buffer be revoked before close */
-	if (vmbus_proto_version < VERSION_WIN10) {
-		netvsc_teardown_recv_gpadl(device, net_device);
-		netvsc_teardown_send_gpadl(device, net_device);
-	}
-
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
+	/*
+	 * If host is Win2016 or higher then we do the GPADL tear down
+	 * here after VMBus is closed.
+	*/
 	if (vmbus_proto_version >= VERSION_WIN10) {
 		netvsc_teardown_recv_gpadl(device, net_device);
 		netvsc_teardown_send_gpadl(device, net_device);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 2/4] hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
From: Mohammed Gamal @ 2018-04-05 19:09 UTC (permalink / raw)
  To: netdev, sthemmin
  Cc: otubo, Mohammed Gamal, haiyangz, linux-kernel, devel, vkuznets
In-Reply-To: <1522955361-14704-1-git-send-email-mgamal@redhat.com>

Split each of the functions into two for each of send/recv buffers.
This will be needed in order to implement a fine-grained messaging
sequence to the host so tht we accommodate the requirements of
different Windows versions

Fixes: 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions")

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 46 +++++++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index d65b7fc..f4df5de 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -109,11 +109,11 @@ static void free_netvsc_device_rcu(struct netvsc_device *nvdev)
 	call_rcu(&nvdev->rcu, free_netvsc_device);
 }
 
-static void netvsc_revoke_buf(struct hv_device *device,
-			      struct netvsc_device *net_device)
+static void netvsc_revoke_recv_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
 {
-	struct nvsp_message *revoke_packet;
 	struct net_device *ndev = hv_get_drvdata(device);
+	struct nvsp_message *revoke_packet;
 	int ret;
 
 	/*
@@ -157,6 +157,14 @@ static void netvsc_revoke_buf(struct hv_device *device,
 		}
 		net_device->recv_section_cnt = 0;
 	}
+}
+
+static void netvsc_revoke_send_buf(struct hv_device *device,
+				   struct netvsc_device *net_device)
+{
+	struct net_device *ndev = hv_get_drvdata(device);
+	struct nvsp_message *revoke_packet;
+	int ret;
 
 	/* Deal with the send buffer we may have setup.
 	 * If we got a  send section size, it means we received a
@@ -202,8 +210,8 @@ static void netvsc_revoke_buf(struct hv_device *device,
 	}
 }
 
-static void netvsc_teardown_gpadl(struct hv_device *device,
-				  struct netvsc_device *net_device)
+static void netvsc_teardown_recv_gpadl(struct hv_device *device,
+				       struct netvsc_device *net_device)
 {
 	struct net_device *ndev = hv_get_drvdata(device);
 	int ret;
@@ -222,6 +230,13 @@ static void netvsc_teardown_gpadl(struct hv_device *device,
 		}
 		net_device->recv_buf_gpadl_handle = 0;
 	}
+}
+
+static void netvsc_teardown_send_gpadl(struct hv_device *device,
+				       struct netvsc_device *net_device)
+{
+	struct net_device *ndev = hv_get_drvdata(device);
+	int ret;
 
 	if (net_device->send_buf_gpadl_handle) {
 		ret = vmbus_teardown_gpadl(device->channel,
@@ -437,8 +452,10 @@ static int netvsc_init_buf(struct hv_device *device,
 	goto exit;
 
 cleanup:
-	netvsc_revoke_buf(device, net_device);
-	netvsc_teardown_gpadl(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
+	netvsc_teardown_recv_gpadl(device, net_device);
+	netvsc_teardown_send_gpadl(device, net_device);
 
 exit:
 	return ret;
@@ -575,7 +592,8 @@ void netvsc_device_remove(struct hv_device *device)
 		= rtnl_dereference(net_device_ctx->nvdev);
 	int i;
 
-	netvsc_revoke_buf(device, net_device);
+	netvsc_revoke_recv_buf(device, net_device);
+	netvsc_revoke_send_buf(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -590,14 +608,18 @@ void netvsc_device_remove(struct hv_device *device)
 	netdev_dbg(ndev, "net device safe to remove\n");
 
 	/* older versions require that buffer be revoked before close */
-	if (vmbus_proto_version < VERSION_WIN10)
-		netvsc_teardown_gpadl(device, net_device);
+	if (vmbus_proto_version < VERSION_WIN10) {
+		netvsc_teardown_recv_gpadl(device, net_device);
+		netvsc_teardown_send_gpadl(device, net_device);
+	}
 
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	if (vmbus_proto_version >= VERSION_WIN10)
-		netvsc_teardown_gpadl(device, net_device);
+	if (vmbus_proto_version >= VERSION_WIN10) {
+		netvsc_teardown_recv_gpadl(device, net_device);
+		netvsc_teardown_send_gpadl(device, net_device);
+	}
 
 	/* Release all resources */
 	free_netvsc_device_rcu(net_device);
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 1/4] hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown
From: Mohammed Gamal @ 2018-04-05 19:09 UTC (permalink / raw)
  To: netdev, sthemmin
  Cc: otubo, Mohammed Gamal, haiyangz, linux-kernel, devel, vkuznets
In-Reply-To: <1522955361-14704-1-git-send-email-mgamal@redhat.com>

When changing network interface settings, Windows guests
older than WS2016 can no longer shutdown. This was addressed
by commit 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order
on older versions"), however the issue also occurs on WS2012
guests that share NVSP protocol versions with WS2016 guests.
Hence we use Windows version directly to differentiate them.

Fixes: 0ef58b0a05c12 ("hv_netvsc: change GPAD teardown order on older versions")

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index c9910c3..d65b7fc 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -590,13 +590,13 @@ void netvsc_device_remove(struct hv_device *device)
 	netdev_dbg(ndev, "net device safe to remove\n");
 
 	/* older versions require that buffer be revoked before close */
-	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_4)
+	if (vmbus_proto_version < VERSION_WIN10)
 		netvsc_teardown_gpadl(device, net_device);
 
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
 
-	if (net_device->nvsp_version >= NVSP_PROTOCOL_VERSION_4)
+	if (vmbus_proto_version >= VERSION_WIN10)
 		netvsc_teardown_gpadl(device, net_device);
 
 	/* Release all resources */
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 0/4] hv_netvsc: Fix shutdown issues on older Windows hosts
From: Mohammed Gamal @ 2018-04-05 19:09 UTC (permalink / raw)
  To: netdev, sthemmin
  Cc: otubo, Mohammed Gamal, haiyangz, linux-kernel, devel, vkuznets

Guests running on WS2012 hosts would not shutdown when changing network
interface setting (e.g. Number of channels, MTU ... etc). 

This patch series addresses these shutdown issues we enecountered with WS2012
hosts. It's essentialy a rework of the series sent in 
https://lkml.org/lkml/2018/1/23/111 on top of latest upstream

Fixes: 0ef58b0a05c1 ("hv_netvsc: change GPAD teardown order on older versions")

Mohammed Gamal (4):
  hv_netvsc: Use Windows version instead of NVSP version on GPAD
    teardown
  hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
  hv_netvsc: Ensure correct teardown message sequence order
  hv_netvsc: Pass net_device parameter to revoke and teardown functions

 drivers/net/hyperv/netvsc.c | 60 +++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 16 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] dt: bindings: add new dt entries for brcmfmac
From: Ulf Hansson @ 2018-04-05 19:09 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Arend van Spriel, Florian Fainelli, Alexey Roslyakov, Andrew Lunn,
	Rob Herring, Mark Rutland, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, netdev, linux-wireless, devicetree,
	Linux Kernel Mailing List, brcm80211-dev-list.pdl,
	brcm80211-dev-list
In-Reply-To: <87lge1er31.fsf@kamboji.qca.qualcomm.com>

On 5 April 2018 at 15:10, Kalle Valo <kvalo@codeaurora.org> wrote:
> Ulf Hansson <ulf.hansson@linaro.org> writes:
>
>> On 20 March 2018 at 10:55, Kalle Valo <kvalo@codeaurora.org> wrote:
>>> Arend van Spriel <arend.vanspriel@broadcom.com> writes:
>>>
>>>>>> If I get it right, you mean something like this:
>>>>>>
>>>>>> mmc3: mmc@1c12000 {
>>>>>> ...
>>>>>>          broken-sg-support;
>>>>>>          sd-head-align = 4;
>>>>>>          sd-sgentry-align = 512;
>>>>>>
>>>>>>          brcmf: wifi@1 {
>>>>>>                  ...
>>>>>>          };
>>>>>> };
>>>>>>
>>>>>> Where dt: bindings documentation for these entries should reside?
>>>>>> In generic MMC bindings? Well, this is the very special case and
>>>>>> mmc-linux maintainer will unlikely to accept these changes.
>>>>>> Also, extra kernel code modification might be required. It could make
>>>>>> quite trivial change much more complex.
>>>>>
>>>>> If the MMC maintainers are not copied on this patch series, it will
>>>>> likely be hard for them to identify this patch series and chime in...
>>>>
>>>> The main question is whether this is indeed a "very special case" as
>>>> Alexey claims it to be or that it is likely to be applicable to other
>>>> device and host combinations as you are suggesting.
>>>>
>>>> If these properties are imposed by the host or host controller it
>>>> would make sense to have these in the mmc bindings.
>>>
>>> BTW, last year we were discussing something similar (I mean related to
>>> alignment requirements) with ath10k SDIO patches and at the time the
>>> patch submitter was proposing to have a bounce buffer in ath10k to
>>> workaround that. I don't remember the details anymore, they are on the
>>> ath10k mailing list archive if anyone is curious to know, but I would
>>> not be surprised if they are similar as here. So there might be a need
>>> to solve this in a generic way (but not sure of course as I haven't
>>> checked the details).
>>
>> I re-call something about these as well, here are the patches. Perhaps
>> I should pick some of them up...
>>
>> https://patchwork.kernel.org/patch/10123137/
>> https://patchwork.kernel.org/patch/10123139/
>> https://patchwork.kernel.org/patch/10123141/
>> https://patchwork.kernel.org/patch/10123143/
>
> Actually I was talking about a different patch, found it now:
>
> ath10k_sdio: DMA bounce buffers for read write
>
> https://patchwork.kernel.org/patch/9979543/

Ah, yes. This is about buffer alignment, particularly when using DMA.

Normally there should be no constraint on the alignment, if the
mmc/sdio controller driver would implement a fallback mechanism from
DMA to PIO mode, in case the buffer can't be used for DMA.

However, I know about cases where simply PIO doesn't work because of
broken HW and in many cases the mmc drivers don't implement the
fallback to PIO even if they could.

Moreover, it seems reasonable to anyway have a way for mmc driver to
inform upper layers about DMA buffer alignment constraints, as to be
able to use DMA as long as possible.

Thoughts?

Kind regards
Uffe

^ permalink raw reply

* Re: [PATCH 07/12] brcmfmac: Convert ALLFFMAC to ether_broadcast_addr
From: Arend van Spriel @ 2018-04-05 19:00 UTC (permalink / raw)
  To: Joe Perches, linux-kernel
  Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Kalle Valo, linux-wireless, brcm80211-dev-list.pdl,
	brcm80211-dev-list, netdev
In-Reply-To: <f60d14eaf9027774cff547b82cfb01d0904f6391.1522479608.git.joe@perches.com>

On 3/31/2018 9:05 AM, Joe Perches wrote:
> Remove the local ALLFFMAC extern array and use the new global instead.

I stumbled upon this one couple of weeks ago. I moved the definition to 
flowring.c although I considered for a moment to pick up the task you 
took here valiantly.

> Miscellanea:
>
> o Convert char *mac to const char *mac as it can't be modified

The real reason is off course that your new global is const and thus mac 
variable need to be const as well to avoid compiler warning.

I have to agree with Kalle regarding the upstream logistics, but for 
what it is worth...

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c   | 2 --
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.h   | 2 --
>   drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c | 8 ++++----
>   3 files changed, 4 insertions(+), 8 deletions(-)

^ permalink raw reply

* Re: [PATCH 3/9] kbuild: Do not pass arguments to link-vmlinux.sh
From: Jiri Olsa @ 2018-04-05 18:59 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, lkml, netdev,
	Linux Kbuild mailing list, Quentin Monnet, Eugene Syromiatnikov,
	Jiri Benc, Stanislav Kozina, Jerome Marchand,
	Arnaldo Carvalho de Melo, Michal Marek, Jiri Kosina
In-Reply-To: <CAK7LNAQHEWAk_J8uZ4_ETR=JzBU9szDCVNxmUvY447uMj1XLVA@mail.gmail.com>

On Fri, Apr 06, 2018 at 12:50:00AM +0900, Masahiro Yamada wrote:
> 2018-04-06 0:16 GMT+09:00 Jiri Olsa <jolsa@kernel.org>:
> > There's no need to pass LD* arguments to link-vmlinux.sh,
> > because they are passed as variables. The only argument
> > the link-vmlinux.sh supports is the 'clean' argument.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> 
> Wrong.
> 
> $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux)
> exist here so that any change in them
> invokes scripts/linkk-vmlinux.sh

sry, I can't see that.. but it's just a side fix,
which is actually not needed for the rest

I'll check on more and address this separately

thanks,
jirka

> 
> 
> 
> 
> >  Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Makefile b/Makefile
> > index d3300e46f925..a65a3919c6ad 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1027,7 +1027,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
> >
> >  # Final link of vmlinux with optional arch pass after final link
> >  cmd_link-vmlinux =                                                 \
> > -       $(CONFIG_SHELL) $< $(LD) $(LDFLAGS) $(LDFLAGS_vmlinux) ;    \
> > +       $(CONFIG_SHELL) $< ;                                       \
> >         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
> >
> >  vmlinux: scripts/link-vmlinux.sh vmlinux_prereq $(vmlinux-deps) FORCE
> 
> 
> 
> 
> 
> 
> -- 
> Best Regards
> Masahiro Yamada

^ permalink raw reply

* [PATCH net-next] hv_netvsc: Add NetVSP v6 into version negotiation
From: Haiyang Zhang @ 2018-04-05 18:42 UTC (permalink / raw)
  To: davem, netdev; +Cc: olaf, sthemmin, haiyangz, linux-kernel, devel, vkuznets

From: Haiyang Zhang <haiyangz@microsoft.com>

This patch adds the NetVSP v6 message structures, and includes this
version into NetVSC/NetVSP version negotiation process.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h | 33 +++++++++++++++++++++++++++++++++
 drivers/net/hyperv/netvsc.c     |  3 ++-
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 960f06141472..036cd55c66fe 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -237,6 +237,7 @@ void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 #define NVSP_PROTOCOL_VERSION_2		0x30002
 #define NVSP_PROTOCOL_VERSION_4		0x40000
 #define NVSP_PROTOCOL_VERSION_5		0x50000
+#define NVSP_PROTOCOL_VERSION_6		0x60000
 
 enum {
 	NVSP_MSG_TYPE_NONE = 0,
@@ -308,6 +309,11 @@ enum {
 	NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE,
 
 	NVSP_MSG5_MAX = NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE,
+
+	/* Version 6 messages */
+	NVSP_MSG6_TYPE_PD_API,
+
+	NVSP_MSG6_MAX = NVSP_MSG6_TYPE_PD_API
 };
 
 enum {
@@ -619,12 +625,39 @@ union nvsp_5_message_uber {
 	struct nvsp_5_send_indirect_table send_table;
 } __packed;
 
+enum nvsp6_pd_api_op {
+	PD_API_OP_NOTIFY_VSP = 0,
+	PD_API_OP_CONFIG,
+	PD_API_OP_MAX
+};
+
+struct nvsp_6_pd_api_req {
+	u32 op;
+	u64 mmio_pa; /* MMIO Physical Address */
+	u32 mmio_len;
+	u32 num_subchn; /* Number of PD subchannels */
+} __packed;
+
+struct nvsp_6_pd_api_comp {
+	u32 op;
+	u32 status;
+	u32 num_subchn; /* Number of PD subchannels */
+	u8 is_supported; /* Is supported by VSP */
+	u8 is_enabled; /* Is enabled by VSP */
+} __packed;
+
+union nvsp_6_message_uber {
+	struct nvsp_6_pd_api_req pd_req;
+	struct nvsp_6_pd_api_comp pd_comp;
+} __packed;
+
 union nvsp_all_messages {
 	union nvsp_message_init_uber init_msg;
 	union nvsp_1_message_uber v1_msg;
 	union nvsp_2_message_uber v2_msg;
 	union nvsp_4_message_uber v4_msg;
 	union nvsp_5_message_uber v5_msg;
+	union nvsp_6_message_uber v6_msg;
 } __packed;
 
 /* ALL Messages */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index c9910c33e671..3abe57bd85bb 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -509,7 +509,8 @@ static int netvsc_connect_vsp(struct hv_device *device,
 	struct net_device *ndev = hv_get_drvdata(device);
 	static const u32 ver_list[] = {
 		NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
-		NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5
+		NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5,
+		NVSP_PROTOCOL_VERSION_6
 	};
 	struct nvsp_message *init_packet;
 	int ndis_version, i, ret;
-- 
2.15.1

^ permalink raw reply related

* Re: marvell switch
From: Andrew Lunn @ 2018-04-05 18:04 UTC (permalink / raw)
  To: Ran Shalit; +Cc: netdev
In-Reply-To: <CAJ2oMhL5APVabJuk9R6Bp17VpEg0HZqdkA88tNhajZumsB+f6g@mail.gmail.com>

On Thu, Apr 05, 2018 at 05:26:37PM +0000, Ran Shalit wrote:
> בתאריך יום ה׳, 5 באפר׳ 2018, 19:09, מאת Andrew Lunn ‏<andrew@lunn.ch>:
> 
> > > Is there a wiki which explains switch configuration ?
> >
> > Nope. The whole idea is that they behave like normal linux
> > interfaces. So there is no need to document them. You already know how
> > to use them.
> >
> > > Is it possible to open socket and send/recieve on switch ports (lan0
> > > for example) ?
> >
> > Sure. It is as normal interface. Give is an IP address and then do
> > TCP/IP as usual.
> >
> 
> There is something I don't fully understand.
> I thought that ports in switch which are not connected to the manage cpu
> are only configured by cpu and then can coomunicate with each other.
> 
> What does it mean to open socket in such port (not the cpu port) and send
> data (ethernet frames) ? Does it mean that the cpu port is sending data
> directly to that port ?

The CPU port is configured to use {E}DSA tagging. Frames ingressing on
the CPU port contain an extra header. That header tells the switch
which port to send the frame out of. The switch can also send frames
received on a port out the CPU port to the host, again, with a
header. The host can tell from the header which port the frame
ingressed.

Using this, we can make ports look like normal Linux interfaces.

      Andrew

^ permalink raw reply

* RE: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC address support for ethtool nftuple filters
From: Vinicius Costa Gomes @ 2018-04-05 17:59 UTC (permalink / raw)
  To: Brown, Aaron F, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Sanchez-Palencia, Jesus
In-Reply-To: <309B89C4C689E141A5FF6A0C5FB2118B8C824FA7@ORSMSX101.amr.corp.intel.com>

Hi,

"Brown, Aaron F" <aaron.f.brown@intel.com> writes:

>> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
>> Behalf Of Vinicius Costa Gomes
>> Sent: Thursday, March 29, 2018 2:08 PM
>> To: intel-wired-lan@lists.osuosl.org
>> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus <jesus.sanchez-
>> palencia@intel.com>
>> Subject: [Intel-wired-lan] [next-queue PATCH v6 08/10] igb: Add MAC
>> address support for ethtool nftuple filters
>> 
>> This adds the capability of configuring the queue steering of arriving
>> packets based on their source and destination MAC addresses.
>> 
>> In practical terms this adds support for the following use cases,
>> characterized by these examples:
>> 
>> $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0
>> (this will direct packets with destination address "aa:aa:aa:aa:aa:aa"
>> to the RX queue 0)
>
> This is now working for me, testing with the dst MAC being the MAC on the i210.  I set the filter and all the traffic to the destination MAC address gets routed to the chosen RX queue.
>
>> $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3
>> (this will direct packets with source address "44:44:44:44:44:44" to
>> the RX queue 3)
>
> However, I am still not getting the raw ethernet source filter to
> work.  Even back to back with no other system to "confuse" the stream,
> I set the filter so the source MAC is the same as the MAC on the link
> partner, send traffic and the traffic bounces around the queues as if
> the filter is not set.

It seems there is at least a documentation issue in the i210 datasheet,
steering (placing traffic into a specific queue) by source address
doesn't work, filtering (accepting the traffic based on some rule) does
work. I pointed this out in the cover letter of v5 as a known issue, but
forgot to repeat it for v6, sorry about the confusion.

But only the filtering part is useful, I think, it enables cases like
this:

$ ethtool -N enp2s0 flow-type ether src 68:05:ca:4a:c9:73 proto 0x22f0 action 3

I added that note in the hope that someone else would have an stronger
opinion about what to do.

Anyway, my plan for now will be to document this better and turn the
case that only the source address is specified into an error.

>
>> 
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 35
>> ++++++++++++++++++++++++----
>>  1 file changed, 31 insertions(+), 4 deletions(-)


Cheers,

^ permalink raw reply

* Re: [PATCH net-next 6/6] netdevsim: Add simple FIB resource controller via devlink
From: Jiri Pirko @ 2018-04-05 17:27 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, roopa, shm, jiri, idosch, jakub.kicinski,
	andy.roulin
In-Reply-To: <20180328012200.15175-7-dsa@cumulusnetworks.com>

Wed, Mar 28, 2018 at 03:22:00AM CEST, dsa@cumulusnetworks.com wrote:
>Add devlink support to netdevsim and use it to implement a simple,
>profile based resource controller. Only one controller is needed
>per namespace, so the first netdevsim netdevice in a namespace
>registers with devlink. If that device is deleted, the resource
>settings are deleted.

I don't understand why you add 1:1 fixed relationship between
netnamespace and devlink instance. That is highly misleading and reader
might think that those 2 are somehow related. They are not. You can have
multiple devlink instances for many ports in a single namespace.

Could you please clarify?

Also, to see the relationship between individual netdevsim netdevices
and the parent devlink instance, we should use devlink_port
instances, like this: 

      devlink1              devlink2
       |    |                |    |
 dl_port1_1 dlport1_2   dlport2_1 dlport2_2
       |    |                |    |
     eth0  eth1             eth2 eth3

Note that "devlink instance" reprensents one ASIC.
The address of the devlink instance is the bus address of the ASIC.
Here, you use address of some/first netdevsim netdev instance.

The way it is implemented in netdevsim by this patch is wrong on
so many levels :(

Could you please fix this? I'm more than happy to help you with this,
please say so. Thanks!

[...]

>+	err = devlink_resource_register(devlink, "IPv4", (u64)-1,
>+					NSIM_RESOURCE_IPV4,
>+					DEVLINK_RESOURCE_ID_PARENT_TOP,
>+					&params, NULL);
>+	if (err) {
>+		pr_err("Failed to register IPv4 top resource\n");
>+		goto out;

this goto is pointless. Just return.

^ permalink raw reply

* Re: Kernel bug from adding bpf actions in tc
From: Davide Caratti @ 2018-04-05 17:27 UTC (permalink / raw)
  To: Lucas Bates; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAMDBHY+SBEUS=-XCWugKeyRd99HDi9fT+jEFs2iYF-oueuBjRg@mail.gmail.com>

On Thu, 2018-04-05 at 11:23 -0400, Lucas Bates wrote:
> Hi Davide,
> 
> Our overnight tc test runs of net-next revealed a kernel bug on one of
> the BPF tests you submitted, d959.  The add action completes
> successfully, but the bug occurs on the verify when tdc does a get of
> the action that was just added.  Here's the text of the dump:
> 

looking at the call trace, I think cfg->filter is NULL when
tcf_bpf_cleanup() is called, and apparently we are in the error path of
tcf_bpf_init(), when 

	prog->bpf_ops = cfg.bpf_ops;
	...
	rcu_assign_pointer(prog->filter, cfg.filter);

have not been executed yet.

If tcf_idr_release() is called in this situation, cfg->is_ebpf is assigned
to true, and bpf_prog_put() can dereference a NULL pointer.

I will try reproducing in the next hours, and eventually followup with a
patch.

thanks!
regards,
-- 
davide

^ permalink raw reply

* [PATCH iproute2] l2tp: no need to export session offsets in JSON output
From: Guillaume Nault @ 2018-04-05 17:24 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman, Stephen Hemminger

The offset and peer_offset parameters are only printed to avoid
confusing external scripts that may parse "ip l2tp show session"
output. There's no reason to keep them in JSON.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 ip/ipl2tp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 427e0249..05e96387 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -306,8 +306,9 @@ static void print_session(struct l2tp_data *data)
 		print_string(PRINT_FP, NULL, "%s", _SL_);
 	}
 
-	print_uint(PRINT_ANY, "offset", "  offset %u,", 0);
-	print_uint(PRINT_ANY, "peer_offset", " peer offset %u\n", 0);
+	/* Show offsets only for plain console output (for legacy scripts) */
+	print_uint(PRINT_FP, "offset", "  offset %u,", 0);
+	print_uint(PRINT_FP, "peer_offset", " peer offset %u\n", 0);
 
 	if (p->cookie_len > 0)
 		print_cookie("cookie", "cookie",
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH iproute2] ip/l2tp: remove offset and peer-offset options
From: Guillaume Nault @ 2018-04-05 17:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, James Chapman
In-Reply-To: <20180404164310.43259e82@xeon-e3>

On Wed, Apr 04, 2018 at 04:43:10PM -0700, Stephen Hemminger wrote:
> On Tue, 3 Apr 2018 17:39:54 +0200
> Guillaume Nault <g.nault@alphalink.fr> wrote:
> 
> > Ignore options "peer-offset" and "offset" when creating sessions. Keep
> > them when dumping sessions in order to avoid breaking external scripts.
> > 
> > "peer-offset" has always been a noop in iproute2. "offset" is now
> > ignored in Linux 4.16 (and was broken before that).
> > 
> > Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
> 
> Sure, this makes sense applied.
> In theory, you could have just dropped them from the JSON output.

Indeed, I'll followup on this.

^ permalink raw reply

* Re: [PATCH v3 2/4] bus: fsl-mc: add restool userspace support
From: Andrew Lunn @ 2018-04-05 16:56 UTC (permalink / raw)
  To: gregkh
  Cc: Ruxandra Ioana Ciocoi Radulescu, Laurentiu Tudor, Stuart Yoder,
	Arnd Bergmann, Ioana Ciornei, Linux Kernel Mailing List,
	Razvan Stefanescu, Roy Pledge, Networking
In-Reply-To: <20180405161214.GB9976@kroah.com>

> > Hi Andrew,
> > 
> > We're waiting for the DPIO driver (which we depend on) to be moved
> > out of staging first, it's currently under review:
> > https://lkml.org/lkml/2018/3/27/1086
> 
> That's stalled on my side right now as the merge window is open and I
> can't do any new stuff until after 4.17-rc1 is out.  So everyone please
> be patient a bit...

I took a quick look.

There are a few inline functions in .c files which is generally
frowned upon. Let the compiler decide.

e.g:

static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d,	
                                                     int cpu)
static inline struct dpaa2_io *service_select(struct dpaa2_io *d)

dpaa2_io_down() seems to be too simple. dpaa2_io_create() sets up
interrupt triggers, notifications, and adds the new object to the
dpio_list. dpaa2_io_down() seems to just free the memory. Do
notifications need to be disabled, the object taken off the list?

dpaa2_io_store_create() allocates memory using kzalloc() and then uses
dma_map_single(,,DMA_FROM_DEVICE). The documentation says:

   DMA_FROM_DEVICE synchronisation must be done before the driver
   accesses data that may be changed by the device.  This memory
   should be treated as read-only by the driver.  If the driver needs
   to write to it at any point, it should be DMA_BIDIRECTIONAL (see
   below).

Since it has just been allocated, this seems questionable.

I'm also not sure where the correct call to
dma_map_single(,,DMA_FROM_DEVICE) is?  Should dpaa2_io_store_next()
doing this?

The DMA API usage might need a closer review.

    Andrew

^ permalink raw reply

* Re: [PATCH net-next] pptp: remove a buggy dst release in pptp_connect()
From: Sasha Levin @ 2018-04-05 16:42 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180403014837.56377-1-edumazet@google.com>

Hi.

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 30.1120)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

Please let us know if you'd like to have this patch included in a stable tree.

^ permalink raw reply

* Re: [PATCH] qtnfmac: pearl: pcie: fix memory leak in qtnf_fw_work_handler
From: Gustavo A. R. Silva @ 2018-04-05 16:33 UTC (permalink / raw)
  To: Igor Mitsyanko, Avinash Patil, Sergey Matyukevich, Kalle Valo,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20180405163100.cuaedftds47pxdrn@bars>

Hi Sergey,

On 04/05/2018 11:31 AM, Sergey Matyukevich wrote:
> Hello Gustavo,
> 
>> In case memory resources for fw were succesfully allocated, release
>> them before jumping to fw_load_fail.
>>
>> Addresses-Coverity-ID: 1466092 ("Resource leak")
>> Fixes: c3b2f7ca4186 ("qtnfmac: implement asynchronous firmware loading")
>> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
>> ---
>>   drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c | 4 ++++
>>   1 file changed, 4 insertions(+)
> 
> Thanks for the patch!
> 

Glad to help. :)

> Reviewed-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
> 

Thanks
--
Gustavo

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox