Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 02/13] phy: add the mvebu cp110 comphy driver
From: Antoine Tenart @ 2017-08-24 13:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, kishon, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux, linux-kernel,
	mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170824133922.GC8022@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

Hi Andrew,

On Thu, Aug 24, 2017 at 03:39:22PM +0200, Andrew Lunn wrote:
> > +static const struct mvebu_comhy_conf mvebu_comphy_modes[] = {
> > +	/* lane 0 */
> > +	MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1),
> > +	/* lane 1 */
> > +	MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1),
> > +	/* lane 2 */
> > +	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
> > +	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
> > +	/* lane 3 */
> > +	MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
> > +	/* lane 4 */
> > +	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2),
> > +	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2),
> > +	MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1),
> > +	/* lane 5 */
> > +	MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1),
> > +};
> 
> Do other Marvell SoCs re-use this IP? Maybe add cp110 to the name here
> to indicate what SoC this configuration belongs to? I guess at some
> point, the compatible string will be used to select the correct table
> for the hardware variant.

OK, I'll rename the variable mvebu_comphy_cp110_modes.

> > +static const struct of_device_id mvebu_comphy_of_match_table[] = {
> > +	{ .compatible = "marvell,comphy-cp110" },
> 
> Is that specific enough? It seems like this table is easy to change in
> the VHDL. Could there be another cp110 with a different configuration?

As far as I understand CP110 is the name of the CP, should there be
another one it should be named differently. But I can't be 100% sure,
you never know what comes next :)

How would you name it if not "comphy-cp110"?

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 02/13] phy: add the mvebu cp110 comphy driver
From: Andrew Lunn @ 2017-08-24 13:39 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, jason, sebastian.hesselbarth, gregory.clement,
	thomas.petazzoni, nadavh, linux, linux-kernel, mw, stefanc,
	miquel.raynal, netdev
In-Reply-To: <20170824083823.16826-3-antoine.tenart@free-electrons.com>

> +static const struct mvebu_comhy_conf mvebu_comphy_modes[] = {
> +	/* lane 0 */
> +	MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1),
> +	/* lane 1 */
> +	MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1),
> +	/* lane 2 */
> +	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
> +	MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
> +	/* lane 3 */
> +	MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
> +	/* lane 4 */
> +	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2),
> +	MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2),
> +	MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1),
> +	/* lane 5 */
> +	MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1),
> +};

Do other Marvell SoCs re-use this IP? Maybe add cp110 to the name here
to indicate what SoC this configuration belongs to? I guess at some
point, the compatible string will be used to select the correct table
for the hardware variant.

> +static const struct of_device_id mvebu_comphy_of_match_table[] = {
> +	{ .compatible = "marvell,comphy-cp110" },

Is that specific enough? It seems like this table is easy to change in
the VHDL. Could there be another cp110 with a different configuration?

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 01/13] phy: add sgmii and 10gkr modes to the phy_mode enum
From: Antoine Tenart @ 2017-08-24 13:31 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, davem, kishon, jason, sebastian.hesselbarth,
	gregory.clement, thomas.petazzoni, nadavh, linux, linux-kernel,
	mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170824131938.GB8022@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 741 bytes --]

Hi Andrew,

On Thu, Aug 24, 2017 at 03:19:38PM +0200, Andrew Lunn wrote:
> On Thu, Aug 24, 2017 at 10:38:11AM +0200, Antoine Tenart wrote:
> > This patch adds more PHY modes to the phy_mode enum, to allow
> > configuring PHYs to the SGMII and/or the 10GKR mode by using the
> > set_mode callback.
> 
> You had me confused for a while here. If there is need for a respin,
> could you change the commit log and prefix PHY with either generic or
> Ethernet, just to make it clear which PHY subsystem we are talking
> about.

I understand the confusion :) I'll update the commit log if there is a
respin.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [net-next 7/8] net/mlx5: Add hash table for flow groups in flow table
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Matan Barak, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

When adding a flow table entry (fte) to a flow table (ft), we first
need to find its flow group (fg). Currently, this is done by
traversing a linear list of all flow groups in the flow table.
Furthermore, since multiple flow groups which correspond to the same
fte mask may exist in the same ft, we can't just stop at the first
match. Converting the linear list to rhltable in order to speed things
up.

The last four patches increases the steering rules update rate by a
factor of more than 7 (for insertion of 50K steering rules).

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 187 +++++++++++++++++-----
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   2 +
 2 files changed, 152 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index d8d45b006996..9704c2eb82a1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -158,6 +158,15 @@ static const struct rhashtable_params rhash_fte = {
 	.min_size = 1,
 };
 
+static const struct rhashtable_params rhash_fg = {
+	.key_len = FIELD_SIZEOF(struct mlx5_flow_group, mask),
+	.key_offset = offsetof(struct mlx5_flow_group, mask),
+	.head_offset = offsetof(struct mlx5_flow_group, hash),
+	.automatic_shrinking = true,
+	.min_size = 1,
+
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -318,12 +327,22 @@ static bool check_valid_mask(u8 match_criteria_enable, const u32 *match_criteria
 	return check_last_reserved(match_criteria);
 }
 
-static bool compare_match_criteria(u8 match_criteria_enable1,
-				   u8 match_criteria_enable2,
-				   void *mask1, void *mask2)
+static bool check_valid_spec(const struct mlx5_flow_spec *spec)
 {
-	return match_criteria_enable1 == match_criteria_enable2 &&
-		!memcmp(mask1, mask2, MLX5_ST_SZ_BYTES(fte_match_param));
+	int i;
+
+	if (!check_valid_mask(spec->match_criteria_enable, spec->match_criteria)) {
+		pr_warn("mlx5_core: Match criteria given mismatches match_criteria_enable\n");
+		return false;
+	}
+
+	for (i = 0; i < MLX5_ST_SZ_DW_MATCH_PARAM; i++)
+		if (spec->match_value[i] & ~spec->match_criteria[i]) {
+			pr_warn("mlx5_core: match_value differs from match_criteria\n");
+			return false;
+		}
+
+	return check_last_reserved(spec->match_value);
 }
 
 static struct mlx5_flow_root_namespace *find_root(struct fs_node *node)
@@ -365,6 +384,7 @@ static void del_flow_table(struct fs_node *node)
 	if (err)
 		mlx5_core_warn(dev, "flow steering can't destroy ft\n");
 	ida_destroy(&ft->fte_allocator);
+	rhltable_destroy(&ft->fgs_hash);
 	fs_get_obj(prio, ft->node.parent);
 	prio->num_ft--;
 }
@@ -454,6 +474,7 @@ static void del_flow_group(struct fs_node *node)
 	struct mlx5_flow_group *fg;
 	struct mlx5_flow_table *ft;
 	struct mlx5_core_dev *dev;
+	int err;
 
 	fs_get_obj(fg, node);
 	fs_get_obj(ft, fg->node.parent);
@@ -463,6 +484,10 @@ static void del_flow_group(struct fs_node *node)
 		ft->autogroup.num_groups--;
 
 	rhashtable_destroy(&fg->ftes_hash);
+	err = rhltable_remove(&ft->fgs_hash,
+			      &fg->hash,
+			      rhash_fg);
+	WARN_ON(err);
 	if (mlx5_cmd_destroy_flow_group(dev, ft, fg->id))
 		mlx5_core_warn(dev, "flow steering can't destroy fg %d of ft %d\n",
 			       fg->id, ft->id);
@@ -525,10 +550,17 @@ static struct mlx5_flow_table *alloc_flow_table(int level, u16 vport, int max_ft
 						u32 flags)
 {
 	struct mlx5_flow_table *ft;
+	int ret;
 
 	ft  = kzalloc(sizeof(*ft), GFP_KERNEL);
 	if (!ft)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
+
+	ret = rhltable_init(&ft->fgs_hash, &rhash_fg);
+	if (ret) {
+		kfree(ft);
+		return ERR_PTR(ret);
+	}
 
 	ft->level = level;
 	ft->node.type = FS_TYPE_FLOW_TABLE;
@@ -829,8 +861,8 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 			      ft_attr->max_fte ? roundup_pow_of_two(ft_attr->max_fte) : 0,
 			      root->table_type,
 			      op_mod, ft_attr->flags);
-	if (!ft) {
-		err = -ENOMEM;
+	if (IS_ERR(ft)) {
+		err = PTR_ERR(ft);
 		goto unlock_root;
 	}
 
@@ -942,10 +974,14 @@ static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table *
 	if (IS_ERR(fg))
 		return fg;
 
-	err = mlx5_cmd_create_flow_group(dev, ft, fg_in, &fg->id);
+	err = rhltable_insert(&ft->fgs_hash, &fg->hash, rhash_fg);
 	if (err)
 		goto err_free_fg;
 
+	err = mlx5_cmd_create_flow_group(dev, ft, fg_in, &fg->id);
+	if (err)
+		goto err_remove_fg;
+
 	if (ft->autogroup.active)
 		ft->autogroup.num_groups++;
 	/* Add node to tree */
@@ -956,6 +992,10 @@ static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table *
 
 	return fg;
 
+err_remove_fg:
+	WARN_ON(rhltable_remove(&ft->fgs_hash,
+				&fg->hash,
+				rhash_fg));
 err_free_fg:
 	rhashtable_destroy(&fg->ftes_hash);
 	kfree(fg);
@@ -1291,18 +1331,13 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 					    u32 *match_value,
 					    struct mlx5_flow_act *flow_act,
 					    struct mlx5_flow_destination *dest,
-					    int dest_num)
+					    int dest_num,
+					    struct fs_fte *fte)
 {
-	u32 masked_val[sizeof(fg->mask.match_criteria)];
 	struct mlx5_flow_handle *handle;
 	struct mlx5_flow_table *ft;
-	struct fs_fte *fte;
 	int i;
 
-	nested_lock_ref_node(&fg->node, FS_MUTEX_PARENT);
-	for (i = 0; i < sizeof(masked_val); i++)
-		masked_val[i] = match_value[i] & fg->mask.match_criteria[i];
-	fte = rhashtable_lookup_fast(&fg->ftes_hash, masked_val, rhash_fte);
 	if (fte) {
 		int old_action;
 		int ret;
@@ -1324,15 +1359,12 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 		} else {
 			goto add_rules;
 		}
-		unlock_ref_node(&fte->node);
 	}
 	fs_get_obj(ft, fg->node.parent);
 
 	fte = create_fte(fg, match_value, flow_act);
-	if (IS_ERR(fte)) {
-		handle = (void *)fte;
-		goto unlock_fg;
-	}
+	if (IS_ERR(fte))
+		return (void *)fte;
 	tree_init_node(&fte->node, 0, del_fte);
 	nested_lock_ref_node(&fte->node, FS_MUTEX_CHILD);
 	handle = add_rule_fte(fte, fg, dest, dest_num, false);
@@ -1340,7 +1372,7 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 		unlock_ref_node(&fte->node);
 		destroy_fte(fte, fg);
 		kfree(fte);
-		goto unlock_fg;
+		return handle;
 	}
 
 	tree_add_node(&fte->node, &fg->node);
@@ -1353,8 +1385,6 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 	}
 unlock_fte:
 	unlock_ref_node(&fte->node);
-unlock_fg:
-	unlock_ref_node(&fg->node);
 	return handle;
 }
 
@@ -1403,6 +1433,96 @@ static bool dest_is_valid(struct mlx5_flow_destination *dest,
 }
 
 static struct mlx5_flow_handle *
+try_add_to_existing_fg(struct mlx5_flow_table *ft,
+		       struct mlx5_flow_spec *spec,
+		       struct mlx5_flow_act *flow_act,
+		       struct mlx5_flow_destination *dest,
+		       int dest_num)
+{
+	struct mlx5_flow_group *g;
+	struct mlx5_flow_handle *rule = ERR_PTR(-ENOENT);
+	struct rhlist_head *tmp, *list;
+	struct match_list {
+		struct list_head	list;
+		struct mlx5_flow_group *g;
+	} match_list, *iter;
+	LIST_HEAD(match_head);
+
+	rcu_read_lock();
+	/* Collect all fgs which has a matching match_criteria */
+	list = rhltable_lookup(&ft->fgs_hash, spec, rhash_fg);
+	rhl_for_each_entry_rcu(g, tmp, list, hash) {
+		struct match_list *curr_match;
+
+		if (likely(list_empty(&match_head))) {
+			match_list.g = g;
+			list_add_tail(&match_list.list, &match_head);
+			continue;
+		}
+		curr_match = kmalloc(sizeof(*curr_match), GFP_ATOMIC);
+
+		if (!curr_match) {
+			rcu_read_unlock();
+			rule = ERR_PTR(-ENOMEM);
+			goto free_list;
+		}
+		curr_match->g = g;
+		list_add_tail(&curr_match->list, &match_head);
+	}
+	rcu_read_unlock();
+
+	/* Try to find a fg that already contains a matching fte */
+	list_for_each_entry(iter, &match_head, list) {
+		struct fs_fte *fte;
+
+		g = iter->g;
+		nested_lock_ref_node(&g->node, FS_MUTEX_PARENT);
+		fte = rhashtable_lookup_fast(&g->ftes_hash, spec->match_value,
+					     rhash_fte);
+		if (fte) {
+			rule = add_rule_fg(g, spec->match_value,
+					   flow_act, dest, dest_num, fte);
+			unlock_ref_node(&g->node);
+			goto free_list;
+		}
+		unlock_ref_node(&g->node);
+	}
+
+	/* No group with matching fte found. Try to add a new fte to any
+	 * matching fg.
+	 */
+	list_for_each_entry(iter, &match_head, list) {
+		g = iter->g;
+
+		nested_lock_ref_node(&g->node, FS_MUTEX_PARENT);
+		rule = add_rule_fg(g, spec->match_value,
+				   flow_act, dest, dest_num, NULL);
+		if (!IS_ERR(rule) || PTR_ERR(rule) != -ENOSPC) {
+			unlock_ref_node(&g->node);
+			goto free_list;
+		}
+		unlock_ref_node(&g->node);
+	}
+
+free_list:
+	if (!list_empty(&match_head)) {
+		struct match_list *match_tmp;
+
+		/* The most common case is having one FG. Since we want to
+		 * optimize this case, we save the first on the stack.
+		 * Therefore, no need to free it.
+		 */
+		list_del(&list_first_entry(&match_head, typeof(*iter), list)->list);
+		list_for_each_entry_safe(iter, match_tmp, &match_head, list) {
+			list_del(&iter->list);
+			kfree(iter);
+		}
+	}
+
+	return rule;
+}
+
+static struct mlx5_flow_handle *
 _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 		     struct mlx5_flow_spec *spec,
 		     struct mlx5_flow_act *flow_act,
@@ -1414,8 +1534,7 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 	struct mlx5_flow_handle *rule;
 	int i;
 
-	if (!check_valid_mask(spec->match_criteria_enable,
-			      spec->match_criteria))
+	if (!check_valid_spec(spec))
 		return ERR_PTR(-EINVAL);
 
 	for (i = 0; i < dest_num; i++) {
@@ -1424,16 +1543,9 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 	}
 
 	nested_lock_ref_node(&ft->node, FS_MUTEX_GRANDPARENT);
-	fs_for_each_fg(g, ft)
-		if (compare_match_criteria(g->mask.match_criteria_enable,
-					   spec->match_criteria_enable,
-					   g->mask.match_criteria,
-					   spec->match_criteria)) {
-			rule = add_rule_fg(g, spec->match_value,
-					   flow_act, dest, dest_num);
-			if (!IS_ERR(rule) || PTR_ERR(rule) != -ENOSPC)
-				goto unlock;
-		}
+	rule = try_add_to_existing_fg(ft, spec, flow_act, dest, dest_num);
+	if (!IS_ERR(rule))
+		goto unlock;
 
 	g = create_autogroup(ft, spec->match_criteria_enable,
 			     spec->match_criteria);
@@ -1442,7 +1554,8 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 		goto unlock;
 	}
 
-	rule = add_rule_fg(g, spec->match_value, flow_act, dest, dest_num);
+	rule = add_rule_fg(g, spec->match_value, flow_act, dest,
+			   dest_num, NULL);
 	if (IS_ERR(rule)) {
 		/* Remove assumes refcount > 0 and autogroup creates a group
 		 * with a refcount = 0.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 62709a3865d2..5509a752f98e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -120,6 +120,7 @@ struct mlx5_flow_table {
 	struct list_head		fwd_rules;
 	u32				flags;
 	struct ida			fte_allocator;
+	struct rhltable			fgs_hash;
 };
 
 struct mlx5_fc_cache {
@@ -200,6 +201,7 @@ struct mlx5_flow_group {
 	u32				max_ftes;
 	u32				id;
 	struct rhashtable		ftes_hash;
+	struct rhlist_head		hash;
 };
 
 struct mlx5_flow_root_namespace {
-- 
2.13.0

^ permalink raw reply related

* [net-next 8/8] net/mlx5: Add tracepoints
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Matan Barak, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

Add a tracepoint infrastructure for mlx5_core driver.
Implemented flow steering tracepoints:
1. Add flow group
2. Remove flow group
3. Add flow table entry
4. Remove flow table entry
5. Add flow table rule
6. Remove flow table rule

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/diag/Makefile  |   1 +
 .../mellanox/mlx5/core/diag/fs_tracepoint.c        | 261 +++++++++++++++++++
 .../mellanox/mlx5/core/diag/fs_tracepoint.h        | 282 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  11 +-
 5 files changed, 558 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 22ed657d263a..87a3099808f3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -4,7 +4,8 @@ subdir-ccflags-y += -I$(src)
 mlx5_core-y :=	main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
 		health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
 		mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
-		fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o
+		fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o \
+		diag/fs_tracepoint.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o
 
@@ -25,3 +26,5 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o
 
 mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
 		en_accel/ipsec_stats.o
+
+CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/diag/Makefile
new file mode 100644
index 000000000000..d8e17110f25d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/Makefile
@@ -0,0 +1 @@
+subdir-ccflags-y += -I$(src)/..
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
new file mode 100644
index 000000000000..0be4575b58a2
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
@@ -0,0 +1,261 @@
+/*
+ * Copyright (c) 2017, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#define CREATE_TRACE_POINTS
+
+#include "fs_tracepoint.h"
+#include <linux/stringify.h>
+
+#define DECLARE_MASK_VAL(type, name) struct {type m; type v; } name
+#define MASK_VAL(type, spec, name, mask, val, fld)	\
+		DECLARE_MASK_VAL(type, name) =		\
+			{.m = MLX5_GET(spec, mask, fld),\
+			 .v = MLX5_GET(spec, val, fld)}
+#define MASK_VAL_BE(type, spec, name, mask, val, fld)	\
+		    DECLARE_MASK_VAL(type, name) =	\
+			{.m = MLX5_GET_BE(type, spec, mask, fld),\
+			 .v = MLX5_GET_BE(type, spec, val, fld)}
+#define GET_MASKED_VAL(name) (name.m & name.v)
+
+#define GET_MASK_VAL(name, type, mask, val, fld)	\
+		(name.m = MLX5_GET(type, mask, fld),	\
+		 name.v = MLX5_GET(type, val, fld),	\
+		 name.m & name.v)
+#define PRINT_MASKED_VAL(name, p, format) {		\
+	if (name.m)			\
+		trace_seq_printf(p, __stringify(name) "=" format " ", name.v); \
+	}
+#define PRINT_MASKED_VALP(name, cast, p, format) {	\
+	if (name.m)			\
+		trace_seq_printf(p, __stringify(name) "=" format " ",	       \
+				 (cast)&name.v);\
+	}
+
+static void print_lyr_2_4_hdrs(struct trace_seq *p,
+			       const u32 *mask, const u32 *value)
+{
+#define MASK_VAL_L2(type, name, fld) \
+	MASK_VAL(type, fte_match_set_lyr_2_4, name, mask, value, fld)
+	DECLARE_MASK_VAL(u64, smac) = {
+		.m = MLX5_GET(fte_match_set_lyr_2_4, mask, smac_47_16) << 16 |
+		     MLX5_GET(fte_match_set_lyr_2_4, mask, smac_15_0),
+		.v = MLX5_GET(fte_match_set_lyr_2_4, value, smac_47_16) << 16 |
+		     MLX5_GET(fte_match_set_lyr_2_4, value, smac_15_0)};
+	DECLARE_MASK_VAL(u64, dmac) = {
+		.m = MLX5_GET(fte_match_set_lyr_2_4, mask, dmac_47_16) << 16 |
+		     MLX5_GET(fte_match_set_lyr_2_4, mask, dmac_15_0),
+		.v = MLX5_GET(fte_match_set_lyr_2_4, value, dmac_47_16) << 16 |
+		     MLX5_GET(fte_match_set_lyr_2_4, value, dmac_15_0)};
+	MASK_VAL_L2(u16, ethertype, ethertype);
+
+	PRINT_MASKED_VALP(smac, u8 *, p, "%pM");
+	PRINT_MASKED_VALP(dmac, u8 *, p, "%pM");
+	PRINT_MASKED_VAL(ethertype, p, "%04x");
+
+	if (ethertype.m == 0xffff) {
+		if (ethertype.v == ETH_P_IP) {
+#define MASK_VAL_L2_BE(type, name, fld) \
+	MASK_VAL_BE(type, fte_match_set_lyr_2_4, name, mask, value, fld)
+			MASK_VAL_L2_BE(u32, src_ipv4,
+				       src_ipv4_src_ipv6.ipv4_layout.ipv4);
+			MASK_VAL_L2_BE(u32, dst_ipv4,
+				       dst_ipv4_dst_ipv6.ipv4_layout.ipv4);
+
+			PRINT_MASKED_VALP(src_ipv4, typeof(&src_ipv4.v), p,
+					  "%pI4");
+			PRINT_MASKED_VALP(dst_ipv4, typeof(&dst_ipv4.v), p,
+					  "%pI4");
+		} else if (ethertype.v == ETH_P_IPV6) {
+			static const struct in6_addr full_ones = {
+				.in6_u.u6_addr32 = {htonl(0xffffffff),
+						    htonl(0xffffffff),
+						    htonl(0xffffffff),
+						    htonl(0xffffffff)},
+			};
+			DECLARE_MASK_VAL(struct in6_addr, src_ipv6);
+			DECLARE_MASK_VAL(struct in6_addr, dst_ipv6);
+
+			memcpy(src_ipv6.m.in6_u.u6_addr8,
+			       MLX5_ADDR_OF(fte_match_set_lyr_2_4, mask,
+					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+			       sizeof(src_ipv6.m));
+			memcpy(dst_ipv6.m.in6_u.u6_addr8,
+			       MLX5_ADDR_OF(fte_match_set_lyr_2_4, mask,
+					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+			       sizeof(dst_ipv6.m));
+			memcpy(src_ipv6.v.in6_u.u6_addr8,
+			       MLX5_ADDR_OF(fte_match_set_lyr_2_4, value,
+					    src_ipv4_src_ipv6.ipv6_layout.ipv6),
+			       sizeof(src_ipv6.v));
+			memcpy(dst_ipv6.v.in6_u.u6_addr8,
+			       MLX5_ADDR_OF(fte_match_set_lyr_2_4, value,
+					    dst_ipv4_dst_ipv6.ipv6_layout.ipv6),
+			       sizeof(dst_ipv6.v));
+
+			if (!memcmp(&src_ipv6.m, &full_ones, sizeof(full_ones)))
+				trace_seq_printf(p, "src_ipv6=%pI6 ",
+						 &src_ipv6.v);
+			if (!memcmp(&dst_ipv6.m, &full_ones, sizeof(full_ones)))
+				trace_seq_printf(p, "dst_ipv6=%pI6 ",
+						 &dst_ipv6.v);
+		}
+	}
+
+#define PRINT_MASKED_VAL_L2(type, name, fld, p, format) {\
+	MASK_VAL_L2(type, name, fld);		         \
+	PRINT_MASKED_VAL(name, p, format);		 \
+}
+
+	PRINT_MASKED_VAL_L2(u8, ip_protocol, ip_protocol, p, "%02x");
+	PRINT_MASKED_VAL_L2(u16, tcp_flags, tcp_flags, p, "%x");
+	PRINT_MASKED_VAL_L2(u16, tcp_sport, tcp_sport, p, "%u");
+	PRINT_MASKED_VAL_L2(u16, tcp_dport, tcp_dport, p, "%u");
+	PRINT_MASKED_VAL_L2(u16, udp_sport, udp_sport, p, "%u");
+	PRINT_MASKED_VAL_L2(u16, udp_dport, udp_dport, p, "%u");
+	PRINT_MASKED_VAL_L2(u16, first_vid, first_vid, p, "%04x");
+	PRINT_MASKED_VAL_L2(u8, first_prio, first_prio, p, "%x");
+	PRINT_MASKED_VAL_L2(u8, first_cfi, first_cfi, p, "%d");
+	PRINT_MASKED_VAL_L2(u8, ip_dscp, ip_dscp, p, "%02x");
+	PRINT_MASKED_VAL_L2(u8, ip_ecn, ip_ecn, p, "%x");
+	PRINT_MASKED_VAL_L2(u8, cvlan_tag, cvlan_tag, p, "%d");
+	PRINT_MASKED_VAL_L2(u8, svlan_tag, svlan_tag, p, "%d");
+	PRINT_MASKED_VAL_L2(u8, frag, frag, p, "%d");
+}
+
+static void print_misc_parameters_hdrs(struct trace_seq *p,
+				       const u32 *mask, const u32 *value)
+{
+#define MASK_VAL_MISC(type, name, fld) \
+	MASK_VAL(type, fte_match_set_misc, name, mask, value, fld)
+#define PRINT_MASKED_VAL_MISC(type, name, fld, p, format) {\
+	MASK_VAL_MISC(type, name, fld);			   \
+	PRINT_MASKED_VAL(name, p, format);		   \
+}
+	DECLARE_MASK_VAL(u64, gre_key) = {
+		.m = MLX5_GET(fte_match_set_misc, mask, gre_key_h) << 8 |
+		     MLX5_GET(fte_match_set_misc, mask, gre_key_l),
+		.v = MLX5_GET(fte_match_set_misc, value, gre_key_h) << 8 |
+		     MLX5_GET(fte_match_set_misc, value, gre_key_l)};
+
+	PRINT_MASKED_VAL(gre_key, p, "%llu");
+	PRINT_MASKED_VAL_MISC(u32, source_sqn, source_sqn, p, "%u");
+	PRINT_MASKED_VAL_MISC(u16, source_port, source_port, p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, outer_second_prio, outer_second_prio,
+			      p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, outer_second_cfi, outer_second_cfi, p, "%u");
+	PRINT_MASKED_VAL_MISC(u16, outer_second_vid, outer_second_vid, p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, inner_second_prio, inner_second_prio,
+			      p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, inner_second_cfi, inner_second_cfi, p, "%u");
+	PRINT_MASKED_VAL_MISC(u16, inner_second_vid, inner_second_vid, p, "%u");
+
+	PRINT_MASKED_VAL_MISC(u8, outer_second_cvlan_tag,
+			      outer_second_cvlan_tag, p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, inner_second_cvlan_tag,
+			      inner_second_cvlan_tag, p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, outer_second_svlan_tag,
+			      outer_second_svlan_tag, p, "%u");
+	PRINT_MASKED_VAL_MISC(u8, inner_second_svlan_tag,
+			      inner_second_svlan_tag, p, "%u");
+
+	PRINT_MASKED_VAL_MISC(u8, gre_protocol, gre_protocol, p, "%u");
+
+	PRINT_MASKED_VAL_MISC(u32, vxlan_vni, vxlan_vni, p, "%u");
+	PRINT_MASKED_VAL_MISC(u32, outer_ipv6_flow_label, outer_ipv6_flow_label,
+			      p, "%x");
+	PRINT_MASKED_VAL_MISC(u32, inner_ipv6_flow_label, inner_ipv6_flow_label,
+			      p, "%x");
+}
+
+const char *parse_fs_hdrs(struct trace_seq *p,
+			  u8 match_criteria_enable,
+			  const u32 *mask_outer,
+			  const u32 *mask_misc,
+			  const u32 *mask_inner,
+			  const u32 *value_outer,
+			  const u32 *value_misc,
+			  const u32 *value_inner)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	if (match_criteria_enable &
+	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS) {
+		trace_seq_printf(p, "[outer] ");
+		print_lyr_2_4_hdrs(p, mask_outer, value_outer);
+	}
+
+	if (match_criteria_enable &
+	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS) {
+		trace_seq_printf(p, "[misc] ");
+		print_misc_parameters_hdrs(p, mask_misc, value_misc);
+	}
+	if (match_criteria_enable &
+	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS) {
+		trace_seq_printf(p, "[inner] ");
+		print_lyr_2_4_hdrs(p, mask_inner, value_inner);
+	}
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+const char *parse_fs_dst(struct trace_seq *p,
+			 const struct mlx5_flow_destination *dst,
+			 u32 counter_id)
+{
+	const char *ret = trace_seq_buffer_ptr(p);
+
+	switch (dst->type) {
+	case MLX5_FLOW_DESTINATION_TYPE_VPORT:
+		trace_seq_printf(p, "vport=%u\n", dst->vport_num);
+		break;
+	case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE:
+		trace_seq_printf(p, "ft=%p\n", dst->ft);
+		break;
+	case MLX5_FLOW_DESTINATION_TYPE_TIR:
+		trace_seq_printf(p, "tir=%u\n", dst->tir_num);
+		break;
+	case MLX5_FLOW_DESTINATION_TYPE_COUNTER:
+		trace_seq_printf(p, "counter_id=%u\n", counter_id);
+		break;
+	}
+
+	trace_seq_putc(p, 0);
+	return ret;
+}
+
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_add_fg);
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_del_fg);
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_set_fte);
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_del_fte);
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_add_rule);
+EXPORT_TRACEPOINT_SYMBOL(mlx5_fs_del_rule);
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
new file mode 100644
index 000000000000..1e3a6c3e4132
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h
@@ -0,0 +1,282 @@
+/*
+ * Copyright (c) 2017, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#if !defined(_MLX5_FS_TP_) || defined(TRACE_HEADER_MULTI_READ)
+#define _MLX5_FS_TP_
+
+#include <linux/tracepoint.h>
+#include <linux/trace_seq.h>
+#include "../fs_core.h"
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mlx5
+
+#define __parse_fs_hdrs(match_criteria_enable, mouter, mmisc, minner, vouter, \
+			vinner, vmisc)					      \
+	parse_fs_hdrs(p, match_criteria_enable, mouter, mmisc, minner, vouter,\
+		      vinner, vmisc)
+
+const char *parse_fs_hdrs(struct trace_seq *p,
+			  u8 match_criteria_enable,
+			  const u32 *mask_outer,
+			  const u32 *mask_misc,
+			  const u32 *mask_inner,
+			  const u32 *value_outer,
+			  const u32 *value_misc,
+			  const u32 *value_inner);
+
+#define __parse_fs_dst(dst, counter_id) \
+	parse_fs_dst(p, (const struct mlx5_flow_destination *)dst, counter_id)
+
+const char *parse_fs_dst(struct trace_seq *p,
+			 const struct mlx5_flow_destination *dst,
+			 u32 counter_id);
+
+TRACE_EVENT(mlx5_fs_add_fg,
+	    TP_PROTO(const struct mlx5_flow_group *fg),
+	    TP_ARGS(fg),
+	    TP_STRUCT__entry(
+		__field(const struct mlx5_flow_group *, fg)
+		__field(const struct mlx5_flow_table *, ft)
+		__field(u32, start_index)
+		__field(u32, end_index)
+		__field(u32, id)
+		__field(u8, mask_enable)
+		__array(u32, mask_outer, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, mask_inner, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, mask_misc, MLX5_ST_SZ_DW(fte_match_set_misc))
+	    ),
+	    TP_fast_assign(
+			   __entry->fg = fg;
+			   fs_get_obj(__entry->ft, fg->node.parent);
+			   __entry->start_index = fg->start_index;
+			   __entry->end_index = fg->start_index + fg->max_ftes;
+			   __entry->id = fg->id;
+			   __entry->mask_enable = fg->mask.match_criteria_enable;
+			   memcpy(__entry->mask_outer,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fg->mask.match_criteria,
+					       outer_headers),
+				  sizeof(__entry->mask_outer));
+			   memcpy(__entry->mask_inner,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fg->mask.match_criteria,
+					       inner_headers),
+				  sizeof(__entry->mask_inner));
+			   memcpy(__entry->mask_misc,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fg->mask.match_criteria,
+					       misc_parameters),
+				  sizeof(__entry->mask_misc));
+
+	    ),
+	    TP_printk("fg=%p ft=%p id=%u start=%u end=%u bit_mask=%02x %s\n",
+		      __entry->fg, __entry->ft, __entry->id,
+		      __entry->start_index, __entry->end_index,
+		      __entry->mask_enable,
+		      __parse_fs_hdrs(__entry->mask_enable,
+				      __entry->mask_outer,
+				      __entry->mask_misc,
+				      __entry->mask_inner,
+				      __entry->mask_outer,
+				      __entry->mask_misc,
+				      __entry->mask_inner))
+	    );
+
+TRACE_EVENT(mlx5_fs_del_fg,
+	    TP_PROTO(const struct mlx5_flow_group *fg),
+	    TP_ARGS(fg),
+	    TP_STRUCT__entry(
+		__field(const struct mlx5_flow_group *, fg)
+		__field(u32, id)
+	    ),
+	    TP_fast_assign(
+			   __entry->fg = fg;
+			   __entry->id = fg->id;
+
+	    ),
+	    TP_printk("fg=%p id=%u\n",
+		      __entry->fg, __entry->id)
+	    );
+
+#define ACTION_FLAGS \
+	{MLX5_FLOW_CONTEXT_ACTION_ALLOW,	 "ALLOW"},\
+	{MLX5_FLOW_CONTEXT_ACTION_DROP,		 "DROP"},\
+	{MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,	 "FWD"},\
+	{MLX5_FLOW_CONTEXT_ACTION_COUNT,	 "CNT"},\
+	{MLX5_FLOW_CONTEXT_ACTION_ENCAP,	 "ENCAP"},\
+	{MLX5_FLOW_CONTEXT_ACTION_DECAP,	 "DECAP"},\
+	{MLX5_FLOW_CONTEXT_ACTION_MOD_HDR,	 "MOD_HDR"},\
+	{MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO, "NEXT_PRIO"}
+
+TRACE_EVENT(mlx5_fs_set_fte,
+	    TP_PROTO(const struct fs_fte *fte, bool new_fte),
+	    TP_ARGS(fte, new_fte),
+	    TP_STRUCT__entry(
+		__field(const struct fs_fte *, fte)
+		__field(const struct mlx5_flow_group *, fg)
+		__field(u32, group_index)
+		__field(u32, index)
+		__field(u32, action)
+		__field(u32, flow_tag)
+		__field(u8,  mask_enable)
+		__field(bool, new_fte)
+		__array(u32, mask_outer, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, mask_inner, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, mask_misc, MLX5_ST_SZ_DW(fte_match_set_misc))
+		__array(u32, value_outer, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, value_inner, MLX5_ST_SZ_DW(fte_match_set_lyr_2_4))
+		__array(u32, value_misc, MLX5_ST_SZ_DW(fte_match_set_misc))
+	    ),
+	    TP_fast_assign(
+			   __entry->fte = fte;
+			   __entry->new_fte = new_fte;
+			   fs_get_obj(__entry->fg, fte->node.parent);
+			   __entry->group_index = __entry->fg->id;
+			   __entry->index = fte->index;
+			   __entry->action = fte->action;
+			   __entry->mask_enable = __entry->fg->mask.match_criteria_enable;
+			   __entry->flow_tag = fte->flow_tag;
+			   memcpy(__entry->mask_outer,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &__entry->fg->mask.match_criteria,
+					       outer_headers),
+				  sizeof(__entry->mask_outer));
+			   memcpy(__entry->mask_inner,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &__entry->fg->mask.match_criteria,
+					       inner_headers),
+				  sizeof(__entry->mask_inner));
+			   memcpy(__entry->mask_misc,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &__entry->fg->mask.match_criteria,
+					       misc_parameters),
+				  sizeof(__entry->mask_misc));
+			   memcpy(__entry->value_outer,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fte->val,
+					       outer_headers),
+				  sizeof(__entry->value_outer));
+			   memcpy(__entry->value_inner,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fte->val,
+					       inner_headers),
+				  sizeof(__entry->value_inner));
+			   memcpy(__entry->value_misc,
+				  MLX5_ADDR_OF(fte_match_param,
+					       &fte->val,
+					       misc_parameters),
+				  sizeof(__entry->value_misc));
+
+	    ),
+	    TP_printk("op=%s fte=%p fg=%p index=%u group_index=%u action=<%s> flow_tag=%x %s\n",
+		      __entry->new_fte ? "add" : "set",
+		      __entry->fte, __entry->fg, __entry->index,
+		      __entry->group_index, __print_flags(__entry->action, "|",
+							  ACTION_FLAGS),
+		      __entry->flow_tag,
+		      __parse_fs_hdrs(__entry->mask_enable,
+				      __entry->mask_outer,
+				      __entry->mask_misc,
+				      __entry->mask_inner,
+				      __entry->value_outer,
+				      __entry->value_misc,
+				      __entry->value_inner))
+	    );
+
+TRACE_EVENT(mlx5_fs_del_fte,
+	    TP_PROTO(const struct fs_fte *fte),
+	    TP_ARGS(fte),
+	    TP_STRUCT__entry(
+		__field(const struct fs_fte *, fte)
+		__field(u32, index)
+	    ),
+	    TP_fast_assign(
+			   __entry->fte = fte;
+			   __entry->index = fte->index;
+
+	    ),
+	    TP_printk("fte=%p index=%u\n",
+		      __entry->fte, __entry->index)
+	    );
+
+TRACE_EVENT(mlx5_fs_add_rule,
+	    TP_PROTO(const struct mlx5_flow_rule *rule),
+	    TP_ARGS(rule),
+	    TP_STRUCT__entry(
+		__field(const struct mlx5_flow_rule *, rule)
+		__field(const struct fs_fte *, fte)
+		__field(u32, sw_action)
+		__field(u32, index)
+		__field(u32, counter_id)
+		__array(u8, destination, sizeof(struct mlx5_flow_destination))
+	    ),
+	    TP_fast_assign(
+			   __entry->rule = rule;
+			   fs_get_obj(__entry->fte, rule->node.parent);
+			   __entry->index = __entry->fte->dests_size - 1;
+			   __entry->sw_action = rule->sw_action;
+			   memcpy(__entry->destination,
+				  &rule->dest_attr,
+				  sizeof(__entry->destination));
+			   if (rule->dest_attr.type & MLX5_FLOW_DESTINATION_TYPE_COUNTER &&
+			       rule->dest_attr.counter)
+				__entry->counter_id =
+				rule->dest_attr.counter->id;
+	    ),
+	    TP_printk("rule=%p fte=%p index=%u sw_action=<%s> [dst] %s\n",
+		      __entry->rule, __entry->fte, __entry->index,
+		      __print_flags(__entry->sw_action, "|", ACTION_FLAGS),
+		      __parse_fs_dst(__entry->destination, __entry->counter_id))
+	    );
+
+TRACE_EVENT(mlx5_fs_del_rule,
+	    TP_PROTO(const struct mlx5_flow_rule *rule),
+	    TP_ARGS(rule),
+	    TP_STRUCT__entry(
+		__field(const struct mlx5_flow_rule *, rule)
+		__field(const struct fs_fte *, fte)
+	    ),
+	    TP_fast_assign(
+			   __entry->rule = rule;
+			   fs_get_obj(__entry->fte, rule->node.parent);
+	    ),
+	    TP_printk("rule=%p fte=%p\n",
+		      __entry->rule, __entry->fte)
+	    );
+#endif
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ./diag
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE fs_tracepoint
+#include <trace/define_trace.h>
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 9704c2eb82a1..d731d57a996a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -36,6 +36,7 @@
 #include "mlx5_core.h"
 #include "fs_core.h"
 #include "fs_cmd.h"
+#include "diag/fs_tracepoint.h"
 
 #define INIT_TREE_NODE_ARRAY_SIZE(...)	(sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
 					 sizeof(struct init_tree_node))
@@ -404,6 +405,7 @@ static void del_rule(struct fs_node *node)
 	fs_get_obj(fte, rule->node.parent);
 	fs_get_obj(fg, fte->node.parent);
 	fs_get_obj(ft, fg->node.parent);
+	trace_mlx5_fs_del_rule(rule);
 	list_del(&rule->node.list);
 	if (rule->sw_action == MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO) {
 		mutex_lock(&rule->dest_attr.ft->lock);
@@ -457,6 +459,7 @@ static void del_fte(struct fs_node *node)
 	fs_get_obj(fte, node);
 	fs_get_obj(fg, fte->node.parent);
 	fs_get_obj(ft, fg->node.parent);
+	trace_mlx5_fs_del_fte(fte);
 
 	dev = get_dev(&ft->node);
 	err = mlx5_cmd_delete_fte(dev, ft,
@@ -479,6 +482,7 @@ static void del_flow_group(struct fs_node *node)
 	fs_get_obj(fg, node);
 	fs_get_obj(ft, fg->node.parent);
 	dev = get_dev(&ft->node);
+	trace_mlx5_fs_del_fg(fg);
 
 	if (ft->autogroup.active)
 		ft->autogroup.num_groups--;
@@ -990,6 +994,7 @@ static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table *
 	/* Add node to group list */
 	list_add(&fg->node.list, prev_fg);
 
+	trace_mlx5_fs_add_fg(fg);
 	return fg;
 
 err_remove_fg:
@@ -1357,6 +1362,7 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 			fte->action = old_action;
 			goto unlock_fte;
 		} else {
+			trace_mlx5_fs_set_fte(fte, false);
 			goto add_rules;
 		}
 	}
@@ -1378,10 +1384,13 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 	tree_add_node(&fte->node, &fg->node);
 	/* fte list isn't sorted */
 	list_add_tail(&fte->node.list, &fg->node.children);
+	trace_mlx5_fs_set_fte(fte, true);
 add_rules:
 	for (i = 0; i < handle->num_rules; i++) {
-		if (atomic_read(&handle->rule[i]->node.refcount) == 1)
+		if (atomic_read(&handle->rule[i]->node.refcount) == 1) {
 			tree_add_node(&handle->rule[i]->node, &fte->node);
+			trace_mlx5_fs_add_rule(handle->rule[i]);
+		}
 	}
 unlock_fte:
 	unlock_ref_node(&fte->node);
-- 
2.13.0

^ permalink raw reply related

* [net-next 6/8] net/mlx5: Add hash table to search FTEs in a flow-group
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Matan Barak, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

When adding a flow table entry (fte) to a flow group (fg), we first
need to check whether this fte exist. In such a case we just merge
the destinations (if possible). Currently, this is done by traversing
the fte list available in a fg. This could take a lot of time when
using large flow groups. Speeding this up by using rhashtable, which
is much faster.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 228 +++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   3 +
 2 files changed, 160 insertions(+), 71 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 986c5e50e872..d8d45b006996 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -150,6 +150,14 @@ enum fs_i_mutex_lock_class {
 	FS_MUTEX_CHILD
 };
 
+static const struct rhashtable_params rhash_fte = {
+	.key_len = FIELD_SIZEOF(struct fs_fte, val),
+	.key_offset = offsetof(struct fs_fte, val),
+	.head_offset = offsetof(struct fs_fte, hash),
+	.automatic_shrinking = true,
+	.min_size = 1,
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -255,63 +263,59 @@ static struct fs_prio *find_prio(struct mlx5_flow_namespace *ns,
 	return NULL;
 }
 
-static bool masked_memcmp(void *mask, void *val1, void *val2, size_t size)
+static bool check_last_reserved(const u32 *match_criteria)
 {
-	unsigned int i;
-
-	for (i = 0; i < size; i++, mask++, val1++, val2++)
-		if ((*((u8 *)val1) & (*(u8 *)mask)) !=
-		    ((*(u8 *)val2) & (*(u8 *)mask)))
-			return false;
+	char *match_criteria_reserved =
+		MLX5_ADDR_OF(fte_match_param, match_criteria, MLX5_FTE_MATCH_PARAM_RESERVED);
 
-	return true;
+	return	!match_criteria_reserved[0] &&
+		!memcmp(match_criteria_reserved, match_criteria_reserved + 1,
+			MLX5_FLD_SZ_BYTES(fte_match_param,
+					  MLX5_FTE_MATCH_PARAM_RESERVED) - 1);
 }
 
-static bool compare_match_value(struct mlx5_flow_group_mask *mask,
-				void *fte_param1, void *fte_param2)
+static bool check_valid_mask(u8 match_criteria_enable, const u32 *match_criteria)
 {
-	if (mask->match_criteria_enable &
-	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS) {
-		void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
-						fte_param1, outer_headers);
-		void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
-						fte_param2, outer_headers);
-		void *fte_mask = MLX5_ADDR_OF(fte_match_param,
-					      mask->match_criteria, outer_headers);
+	if (match_criteria_enable & ~(
+		(1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS)   |
+		(1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS) |
+		(1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS)))
+		return false;
 
-		if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
-				   MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4)))
+	if (!(match_criteria_enable &
+	      1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_OUTER_HEADERS)) {
+		char *fg_type_mask = MLX5_ADDR_OF(fte_match_param,
+						  match_criteria, outer_headers);
+
+		if (fg_type_mask[0] ||
+		    memcmp(fg_type_mask, fg_type_mask + 1,
+			   MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4) - 1))
 			return false;
 	}
 
-	if (mask->match_criteria_enable &
-	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS) {
-		void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
-						fte_param1, misc_parameters);
-		void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
-						fte_param2, misc_parameters);
-		void *fte_mask = MLX5_ADDR_OF(fte_match_param,
-					  mask->match_criteria, misc_parameters);
+	if (!(match_criteria_enable &
+	      1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_MISC_PARAMETERS)) {
+		char *fg_type_mask = MLX5_ADDR_OF(fte_match_param,
+						  match_criteria, misc_parameters);
 
-		if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
-				   MLX5_ST_SZ_BYTES(fte_match_set_misc)))
+		if (fg_type_mask[0] ||
+		    memcmp(fg_type_mask, fg_type_mask + 1,
+			   MLX5_ST_SZ_BYTES(fte_match_set_misc) - 1))
 			return false;
 	}
 
-	if (mask->match_criteria_enable &
-	    1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS) {
-		void *fte_match1 = MLX5_ADDR_OF(fte_match_param,
-						fte_param1, inner_headers);
-		void *fte_match2 = MLX5_ADDR_OF(fte_match_param,
-						fte_param2, inner_headers);
-		void *fte_mask = MLX5_ADDR_OF(fte_match_param,
-					  mask->match_criteria, inner_headers);
+	if (!(match_criteria_enable &
+	      1 << MLX5_CREATE_FLOW_GROUP_IN_MATCH_CRITERIA_ENABLE_INNER_HEADERS)) {
+		char *fg_type_mask = MLX5_ADDR_OF(fte_match_param,
+						  match_criteria, inner_headers);
 
-		if (!masked_memcmp(fte_mask, fte_match1, fte_match2,
-				   MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4)))
+		if (fg_type_mask[0] ||
+		    memcmp(fg_type_mask, fg_type_mask + 1,
+			   MLX5_ST_SZ_BYTES(fte_match_set_lyr_2_4) - 1))
 			return false;
 	}
-	return true;
+
+	return check_last_reserved(match_criteria);
 }
 
 static bool compare_match_criteria(u8 match_criteria_enable1,
@@ -410,6 +414,18 @@ static void del_rule(struct fs_node *node)
 	}
 }
 
+static void destroy_fte(struct fs_fte *fte, struct mlx5_flow_group *fg)
+{
+	struct mlx5_flow_table *ft;
+	int ret;
+
+	ret = rhashtable_remove_fast(&fg->ftes_hash, &fte->hash, rhash_fte);
+	WARN_ON(ret);
+	fte->status = 0;
+	fs_get_obj(ft, fg->node.parent);
+	ida_simple_remove(&ft->fte_allocator, fte->index);
+}
+
 static void del_fte(struct fs_node *node)
 {
 	struct mlx5_flow_table *ft;
@@ -430,8 +446,7 @@ static void del_fte(struct fs_node *node)
 			       "flow steering can't delete fte in index %d of flow group id %d\n",
 			       fte->index, fg->id);
 
-	ida_simple_remove(&ft->fte_allocator, fte->index);
-	fte->status = 0;
+	destroy_fte(fte, fg);
 }
 
 static void del_flow_group(struct fs_node *node)
@@ -447,6 +462,7 @@ static void del_flow_group(struct fs_node *node)
 	if (ft->autogroup.active)
 		ft->autogroup.num_groups--;
 
+	rhashtable_destroy(&fg->ftes_hash);
 	if (mlx5_cmd_destroy_flow_group(dev, ft, fg->id))
 		mlx5_core_warn(dev, "flow steering can't destroy fg %d of ft %d\n",
 			       fg->id, ft->id);
@@ -481,10 +497,17 @@ static struct mlx5_flow_group *alloc_flow_group(u32 *create_fg_in)
 	u8 match_criteria_enable = MLX5_GET(create_flow_group_in,
 					    create_fg_in,
 					    match_criteria_enable);
+	int ret;
+
 	fg = kzalloc(sizeof(*fg), GFP_KERNEL);
 	if (!fg)
 		return ERR_PTR(-ENOMEM);
 
+	ret = rhashtable_init(&fg->ftes_hash, &rhash_fte);
+	if (ret) {
+		kfree(fg);
+		return ERR_PTR(ret);
+	}
 	fg->mask.match_criteria_enable = match_criteria_enable;
 	memcpy(&fg->mask.match_criteria, match_criteria,
 	       sizeof(fg->mask.match_criteria));
@@ -920,10 +943,8 @@ static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table *
 		return fg;
 
 	err = mlx5_cmd_create_flow_group(dev, ft, fg_in, &fg->id);
-	if (err) {
-		kfree(fg);
-		return ERR_PTR(err);
-	}
+	if (err)
+		goto err_free_fg;
 
 	if (ft->autogroup.active)
 		ft->autogroup.num_groups++;
@@ -934,13 +955,27 @@ static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table *
 	list_add(&fg->node.list, prev_fg);
 
 	return fg;
+
+err_free_fg:
+	rhashtable_destroy(&fg->ftes_hash);
+	kfree(fg);
+
+	return ERR_PTR(err);
 }
 
 struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
 					       u32 *fg_in)
 {
+	void *match_criteria = MLX5_ADDR_OF(create_flow_group_in,
+					    fg_in, match_criteria);
+	u8 match_criteria_enable = MLX5_GET(create_flow_group_in,
+					    fg_in,
+					    match_criteria_enable);
 	struct mlx5_flow_group *fg;
 
+	if (!check_valid_mask(match_criteria_enable, match_criteria))
+		return ERR_PTR(-EINVAL);
+
 	if (ft->autogroup.active)
 		return ERR_PTR(-EPERM);
 
@@ -1104,6 +1139,7 @@ static struct fs_fte *create_fte(struct mlx5_flow_group *fg,
 	struct mlx5_flow_table *ft;
 	struct fs_fte *fte;
 	int index;
+	int ret;
 
 	fs_get_obj(ft, fg->node.parent);
 	index = ida_simple_get(&ft->fte_allocator, fg->start_index,
@@ -1114,11 +1150,20 @@ static struct fs_fte *create_fte(struct mlx5_flow_group *fg,
 
 	fte = alloc_fte(flow_act, match_value, index);
 	if (IS_ERR(fte)) {
-		ida_simple_remove(&ft->fte_allocator, index);
-		return fte;
+		ret = PTR_ERR(fte);
+		goto err_alloc;
 	}
+	ret = rhashtable_insert_fast(&fg->ftes_hash, &fte->hash, rhash_fte);
+	if (ret)
+		goto err_hash;
 
 	return fte;
+
+err_hash:
+	kfree(fte);
+err_alloc:
+	ida_simple_remove(&ft->fte_allocator, index);
+	return ERR_PTR(ret);
 }
 
 static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft,
@@ -1206,42 +1251,78 @@ static struct mlx5_flow_rule *find_flow_rule(struct fs_fte *fte,
 	return NULL;
 }
 
+static bool check_conflicting_actions(u32 action1, u32 action2)
+{
+	u32 xored_actions = action1 ^ action2;
+
+	/* if one rule only wants to count, it's ok */
+	if (action1 == MLX5_FLOW_CONTEXT_ACTION_COUNT ||
+	    action2 == MLX5_FLOW_CONTEXT_ACTION_COUNT)
+		return false;
+
+	if (xored_actions & (MLX5_FLOW_CONTEXT_ACTION_DROP  |
+			     MLX5_FLOW_CONTEXT_ACTION_ENCAP |
+			     MLX5_FLOW_CONTEXT_ACTION_DECAP))
+		return true;
+
+	return false;
+}
+
+static int check_conflicting_ftes(struct fs_fte *fte, const struct mlx5_flow_act *flow_act)
+{
+	if (check_conflicting_actions(flow_act->action, fte->action)) {
+		mlx5_core_warn(get_dev(&fte->node),
+			       "Found two FTEs with conflicting actions\n");
+		return -EEXIST;
+	}
+
+	if (fte->flow_tag != flow_act->flow_tag) {
+		mlx5_core_warn(get_dev(&fte->node),
+			       "FTE flow tag %u already exists with different flow tag %u\n",
+			       fte->flow_tag,
+			       flow_act->flow_tag);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
 static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 					    u32 *match_value,
 					    struct mlx5_flow_act *flow_act,
 					    struct mlx5_flow_destination *dest,
 					    int dest_num)
 {
+	u32 masked_val[sizeof(fg->mask.match_criteria)];
 	struct mlx5_flow_handle *handle;
 	struct mlx5_flow_table *ft;
 	struct fs_fte *fte;
 	int i;
 
 	nested_lock_ref_node(&fg->node, FS_MUTEX_PARENT);
-	fs_for_each_fte(fte, fg) {
+	for (i = 0; i < sizeof(masked_val); i++)
+		masked_val[i] = match_value[i] & fg->mask.match_criteria[i];
+	fte = rhashtable_lookup_fast(&fg->ftes_hash, masked_val, rhash_fte);
+	if (fte) {
+		int old_action;
+		int ret;
+
 		nested_lock_ref_node(&fte->node, FS_MUTEX_CHILD);
-		if (compare_match_value(&fg->mask, match_value, &fte->val) &&
-		    (flow_act->action & fte->action)) {
-			int old_action = fte->action;
-
-			if (fte->flow_tag != flow_act->flow_tag) {
-				mlx5_core_warn(get_dev(&fte->node),
-					       "FTE flow tag %u already exists with different flow tag %u\n",
-					       fte->flow_tag,
-					       flow_act->flow_tag);
-				handle = ERR_PTR(-EEXIST);
-				goto unlock_fte;
-			}
+		ret = check_conflicting_ftes(fte, flow_act);
+		if (ret) {
+			handle = ERR_PTR(ret);
+			goto unlock_fte;
+		}
 
-			fte->action |= flow_act->action;
-			handle = add_rule_fte(fte, fg, dest, dest_num,
-					      old_action != flow_act->action);
-			if (IS_ERR(handle)) {
-				fte->action = old_action;
-				goto unlock_fte;
-			} else {
-				goto add_rules;
-			}
+		old_action = fte->action;
+		fte->action |= flow_act->action;
+		handle = add_rule_fte(fte, fg, dest, dest_num,
+				      old_action != flow_act->action);
+		if (IS_ERR(handle)) {
+			fte->action = old_action;
+			goto unlock_fte;
+		} else {
+			goto add_rules;
 		}
 		unlock_ref_node(&fte->node);
 	}
@@ -1257,6 +1338,7 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 	handle = add_rule_fte(fte, fg, dest, dest_num, false);
 	if (IS_ERR(handle)) {
 		unlock_ref_node(&fte->node);
+		destroy_fte(fte, fg);
 		kfree(fte);
 		goto unlock_fg;
 	}
@@ -1332,6 +1414,10 @@ _mlx5_add_flow_rules(struct mlx5_flow_table *ft,
 	struct mlx5_flow_handle *rule;
 	int i;
 
+	if (!check_valid_mask(spec->match_criteria_enable,
+			      spec->match_criteria))
+		return ERR_PTR(-EINVAL);
+
 	for (i = 0; i < dest_num; i++) {
 		if (!dest_is_valid(&dest[i], flow_act->action, ft))
 			return ERR_PTR(-EINVAL);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index bfbc081d17dc..62709a3865d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -34,6 +34,7 @@
 #define _MLX5_FS_CORE_
 
 #include <linux/mlx5/fs.h>
+#include <linux/rhashtable.h>
 
 enum fs_node_type {
 	FS_TYPE_NAMESPACE,
@@ -168,6 +169,7 @@ struct fs_fte {
 	u32				modify_id;
 	enum fs_fte_status		status;
 	struct mlx5_fc			*counter;
+	struct rhash_head		hash;
 };
 
 /* Type of children is mlx5_flow_table/namespace */
@@ -197,6 +199,7 @@ struct mlx5_flow_group {
 	u32				start_index;
 	u32				max_ftes;
 	u32				id;
+	struct rhashtable		ftes_hash;
 };
 
 struct mlx5_flow_root_namespace {
-- 
2.13.0

^ permalink raw reply related

* [net-next 5/8] net/mlx5: Don't store reserved part in FTEs and FGs
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Matan Barak, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

The current code stores fte_match_param in the software representation
of FTEs and FGs. fte_match_param contains a large reserved area at the
bottom of the struct. Since downstream patches are going to hash this
part, we would like to avoid doing so on a reserved part.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  8 --------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 16 ++++++++++++++--
 include/linux/mlx5/device.h                       |  2 +-
 4 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 16b32f31d691..e0d0efd903bc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -263,7 +263,7 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
 	MLX5_SET(flow_context, in_flow_context, modify_header_id, fte->modify_id);
 	in_match_value = MLX5_ADDR_OF(flow_context, in_flow_context,
 				      match_value);
-	memcpy(in_match_value, &fte->val, MLX5_ST_SZ_BYTES(fte_match_param));
+	memcpy(in_match_value, &fte->val, sizeof(fte->val));
 
 	in_dests = MLX5_ADDR_OF(flow_context, in_flow_context, destination);
 	if (fte->action & MLX5_FLOW_CONTEXT_ACTION_FWD_DEST) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index af245fcdba70..986c5e50e872 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -371,21 +371,14 @@ static void del_rule(struct fs_node *node)
 	struct mlx5_flow_table *ft;
 	struct mlx5_flow_group *fg;
 	struct fs_fte *fte;
-	u32	*match_value;
 	int modify_mask;
 	struct mlx5_core_dev *dev = get_dev(node);
-	int match_len = MLX5_ST_SZ_BYTES(fte_match_param);
 	int err;
 	bool update_fte = false;
 
-	match_value = kvzalloc(match_len, GFP_KERNEL);
-	if (!match_value)
-		return;
-
 	fs_get_obj(rule, node);
 	fs_get_obj(fte, rule->node.parent);
 	fs_get_obj(fg, fte->node.parent);
-	memcpy(match_value, fte->val, sizeof(fte->val));
 	fs_get_obj(ft, fg->node.parent);
 	list_del(&rule->node.list);
 	if (rule->sw_action == MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO) {
@@ -415,7 +408,6 @@ static void del_rule(struct fs_node *node)
 				       "%s can't del rule fg id=%d fte_index=%d\n",
 				       __func__, fg->id, fte->index);
 	}
-	kvfree(match_value);
 }
 
 static void del_fte(struct fs_node *node)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 5fbae885558f..bfbc081d17dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -144,10 +144,22 @@ struct mlx5_fc {
 	struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
 };
 
+#define MLX5_FTE_MATCH_PARAM_RESERVED	reserved_at_600
+/* Calculate the fte_match_param length and without the reserved length.
+ * Make sure the reserved field is the last.
+ */
+#define MLX5_ST_SZ_DW_MATCH_PARAM					    \
+	((MLX5_BYTE_OFF(fte_match_param, MLX5_FTE_MATCH_PARAM_RESERVED) / sizeof(u32)) + \
+	 BUILD_BUG_ON_ZERO(MLX5_ST_SZ_BYTES(fte_match_param) !=		     \
+			   MLX5_FLD_SZ_BYTES(fte_match_param,		     \
+					     MLX5_FTE_MATCH_PARAM_RESERVED) +\
+			   MLX5_BYTE_OFF(fte_match_param,		     \
+					 MLX5_FTE_MATCH_PARAM_RESERVED)))
+
 /* Type of children is mlx5_flow_rule */
 struct fs_fte {
 	struct fs_node			node;
-	u32				val[MLX5_ST_SZ_DW(fte_match_param)];
+	u32				val[MLX5_ST_SZ_DW_MATCH_PARAM];
 	u32				dests_size;
 	u32				flow_tag;
 	u32				index;
@@ -175,7 +187,7 @@ struct mlx5_flow_namespace {
 
 struct mlx5_flow_group_mask {
 	u8	match_criteria_enable;
-	u32	match_criteria[MLX5_ST_SZ_DW(fte_match_param)];
+	u32	match_criteria[MLX5_ST_SZ_DW_MATCH_PARAM];
 };
 
 /* Type of children is fs_fte */
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index f31a0b5377e1..3c7442b56460 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -48,7 +48,7 @@
 /* helper macros */
 #define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
 #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
-#define __mlx5_bit_off(typ, fld) ((unsigned)(unsigned long)(&(__mlx5_nullp(typ)->fld)))
+#define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
 #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
 #define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
 #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
-- 
2.13.0

^ permalink raw reply related

* [net-next 2/8] net/mlx5: Remove a leftover unused variable
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Gal Pressman, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Gal Pressman <galp@mellanox.com>

mlx5_core_wq is no longer being used and should be removed
from the code.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 include/linux/mlx5/driver.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d26f18b39c4a..d5b6f6a9fcc5 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -890,8 +890,6 @@ static inline void *mlx5_buf_offset(struct mlx5_buf *buf, int offset)
 		return buf->direct.buf + offset;
 }
 
-extern struct workqueue_struct *mlx5_core_wq;
-
 #define STRUCT_FIELD(header, field) \
 	.struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field),      \
 	.struct_size_bytes   = sizeof((struct ib_unpacked_ ## header *)0)->field
-- 
2.13.0

^ permalink raw reply related

* [net-next 4/8] net/mlx5: Convert linear search for free index to ida
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Matan Barak, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

When allocating a flow table entry, we need to allocate a free index
in the flow group. Currently, this is done by traversing the existing
flow table entries in the flow group, until a free index is found.
Replacing this by using a ida, which allows us to find a free index
much faster.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 56 ++++++++---------------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |  2 +-
 2 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e8690fe46bf2..af245fcdba70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -360,6 +360,7 @@ static void del_flow_table(struct fs_node *node)
 	err = mlx5_cmd_destroy_flow_table(dev, ft);
 	if (err)
 		mlx5_core_warn(dev, "flow steering can't destroy ft\n");
+	ida_destroy(&ft->fte_allocator);
 	fs_get_obj(prio, ft->node.parent);
 	prio->num_ft--;
 }
@@ -437,8 +438,8 @@ static void del_fte(struct fs_node *node)
 			       "flow steering can't delete fte in index %d of flow group id %d\n",
 			       fte->index, fg->id);
 
+	ida_simple_remove(&ft->fte_allocator, fte->index);
 	fte->status = 0;
-	fg->num_ftes--;
 }
 
 static void del_flow_group(struct fs_node *node)
@@ -523,6 +524,7 @@ static struct mlx5_flow_table *alloc_flow_table(int level, u16 vport, int max_ft
 	ft->flags = flags;
 	INIT_LIST_HEAD(&ft->fwd_rules);
 	mutex_init(&ft->lock);
+	ida_init(&ft->fte_allocator);
 
 	return ft;
 }
@@ -839,6 +841,7 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa
 destroy_ft:
 	mlx5_cmd_destroy_flow_table(root->dev, ft);
 free_ft:
+	ida_destroy(&ft->fte_allocator);
 	kfree(ft);
 unlock_root:
 	mutex_unlock(&root->chain_lock);
@@ -1102,41 +1105,26 @@ add_rule_fte(struct fs_fte *fte,
 	return ERR_PTR(err);
 }
 
-/* Assumed fg is locked */
-static unsigned int get_free_fte_index(struct mlx5_flow_group *fg,
-				       struct list_head **prev)
-{
-	struct fs_fte *fte;
-	unsigned int start = fg->start_index;
-
-	if (prev)
-		*prev = &fg->node.children;
-
-	/* assumed list is sorted by index */
-	fs_for_each_fte(fte, fg) {
-		if (fte->index != start)
-			return start;
-		start++;
-		if (prev)
-			*prev = &fte->node.list;
-	}
-
-	return start;
-}
-
-/* prev is output, prev->next = new_fte */
 static struct fs_fte *create_fte(struct mlx5_flow_group *fg,
 				 u32 *match_value,
-				 struct mlx5_flow_act *flow_act,
-				 struct list_head **prev)
+				 struct mlx5_flow_act *flow_act)
 {
+	struct mlx5_flow_table *ft;
 	struct fs_fte *fte;
 	int index;
 
-	index = get_free_fte_index(fg, prev);
+	fs_get_obj(ft, fg->node.parent);
+	index = ida_simple_get(&ft->fte_allocator, fg->start_index,
+			       fg->start_index + fg->max_ftes,
+			       GFP_KERNEL);
+	if (index < 0)
+		return ERR_PTR(index);
+
 	fte = alloc_fte(flow_act, match_value, index);
-	if (IS_ERR(fte))
+	if (IS_ERR(fte)) {
+		ida_simple_remove(&ft->fte_allocator, index);
 		return fte;
+	}
 
 	return fte;
 }
@@ -1234,7 +1222,6 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 {
 	struct mlx5_flow_handle *handle;
 	struct mlx5_flow_table *ft;
-	struct list_head *prev;
 	struct fs_fte *fte;
 	int i;
 
@@ -1267,12 +1254,8 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 		unlock_ref_node(&fte->node);
 	}
 	fs_get_obj(ft, fg->node.parent);
-	if (fg->num_ftes >= fg->max_ftes) {
-		handle = ERR_PTR(-ENOSPC);
-		goto unlock_fg;
-	}
 
-	fte = create_fte(fg, match_value, flow_act, &prev);
+	fte = create_fte(fg, match_value, flow_act);
 	if (IS_ERR(fte)) {
 		handle = (void *)fte;
 		goto unlock_fg;
@@ -1286,10 +1269,9 @@ static struct mlx5_flow_handle *add_rule_fg(struct mlx5_flow_group *fg,
 		goto unlock_fg;
 	}
 
-	fg->num_ftes++;
-
 	tree_add_node(&fte->node, &fg->node);
-	list_add(&fte->node.list, prev);
+	/* fte list isn't sorted */
+	list_add_tail(&fte->node.list, &fg->node.children);
 add_rules:
 	for (i = 0; i < handle->num_rules; i++) {
 		if (atomic_read(&handle->rule[i]->node.refcount) == 1)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 9fb5a333df52..5fbae885558f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -118,6 +118,7 @@ struct mlx5_flow_table {
 	/* FWD rules that point on this flow table */
 	struct list_head		fwd_rules;
 	u32				flags;
+	struct ida			fte_allocator;
 };
 
 struct mlx5_fc_cache {
@@ -183,7 +184,6 @@ struct mlx5_flow_group {
 	struct mlx5_flow_group_mask	mask;
 	u32				start_index;
 	u32				max_ftes;
-	u32				num_ftes;
 	u32				id;
 };
 
-- 
2.13.0

^ permalink raw reply related

* [net-next 1/8] net/mlx5: Add a blank line after declarations V2
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

The blank line should be after u32 val = ...
and not after __be32 __iomem *addr = ...

Fixes: ad5b39a95c83 ("net/mlx5: Add a blank line after declarations")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index a08027b8f3ce..02da96f98352 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -188,8 +188,8 @@ static enum mlx5_dev_event port_subtype_event(u8 subtype)
 static void eq_update_ci(struct mlx5_eq *eq, int arm)
 {
 	__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
-
 	u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
+
 	__raw_writel((__force u32)cpu_to_be32(val), addr);
 	/* We still want ordering, just not swabbing, so add a barrier */
 	mb();
-- 
2.13.0

^ permalink raw reply related

* [net-next 3/8] net/mlx5e: Fix wrong code indentation in conditional statement
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Gal Pressman, Saeed Mahameed
In-Reply-To: <20170824132152.29538-1-saeedm@mellanox.com>

From: Gal Pressman <galp@mellanox.com>

Fix the following checkpatch warning in en_ethtool.c:
WARNING: suspect code indent for conditional statements (8, 9)
+	for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+	 strcpy(data + (idx++) * ETH_GSTRING_LEN,

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index c30cf6b4736f..0dd7e9caf150 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -254,8 +254,8 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv *priv, u8 *data)
 		       pcie_perf_stats_desc64[i].format);
 
 	for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
-	 strcpy(data + (idx++) * ETH_GSTRING_LEN,
-		pcie_perf_stall_stats_desc[i].format);
+		strcpy(data + (idx++) * ETH_GSTRING_LEN,
+		       pcie_perf_stall_stats_desc[i].format);
 
 	for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
 		for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
-- 
2.13.0

^ permalink raw reply related

* [pull request][net-next 0/8] Mellanox, mlx5updates 2017-08-24
From: Saeed Mahameed @ 2017-08-24 13:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Leon Romanovsky, Saeed Mahameed

Hi Dave,


Tthe following changes provide updates to mlx5 core driver.

For more details please see tag log message below.
Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit a5e2da6e9787187ff104c34aa048419703c1f9cb:

  bpf: netdev is never null in __dev_map_flush (2017-08-23 22:43:40 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-08-24

for you to fetch changes up to 4c03e69ab1ef31877cef63575a1869b130f9c5ce:

  net/mlx5: Add tracepoints (2017-08-24 16:02:58 +0300)

----------------------------------------------------------------
mlx5-updates-2017-08-24

This series includes updates to mlx5 core driver.

>From Gal and Saeed, three cleanup patches.
>From Matan, Low level flow steering improvements and optimizations,
 - Use more efficient data structures for flow steering objects handling.
 - Add tracepoints to flow steering operations.
 - Overall these patches improve flow steering rule insertion rate by a
   factor of seven in large scales (~50K rules or more).

-Saeed.

----------------------------------------------------------------
Gal Pressman (2):
      net/mlx5: Remove a leftover unused variable
      net/mlx5e: Fix wrong code indentation in conditional statement

Matan Barak (5):
      net/mlx5: Convert linear search for free index to ida
      net/mlx5: Don't store reserved part in FTEs and FGs
      net/mlx5: Add hash table to search FTEs in a flow-group
      net/mlx5: Add hash table for flow groups in flow table
      net/mlx5: Add tracepoints

Saeed Mahameed (1):
      net/mlx5: Add a blank line after declarations V2

 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   5 +-
 .../net/ethernet/mellanox/mlx5/core/diag/Makefile  |   1 +
 .../mellanox/mlx5/core/diag/fs_tracepoint.c        | 261 ++++++++++++
 .../mellanox/mlx5/core/diag/fs_tracepoint.h        | 282 ++++++++++++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 472 ++++++++++++++-------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |  23 +-
 include/linux/mlx5/device.h                        |   2 +-
 include/linux/mlx5/driver.h                        |   2 -
 11 files changed, 900 insertions(+), 156 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/diag/fs_tracepoint.h

^ permalink raw reply

* [PATCH net-next] net-next/hinic: Fix MTU limitation
From: Aviad Krawczyk @ 2017-08-24 13:21 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, netdev, zhaochen6, Aviad Krawczyk

Fix the hw MTU limitation by setting min/max_mtu

Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index ae7ad48..7a14963 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -980,6 +980,9 @@ static int nic_dev_init(struct pci_dev *pdev)
 	hinic_hwdev_cb_register(nic_dev->hwdev, HINIC_MGMT_MSG_CMD_LINK_STATUS,
 				nic_dev, link_status_event_handler);
 
+	netdev->min_mtu = ETH_MIN_MTU;
+	netdev->max_mtu = ETH_MAX_MTU;
+
 	err = register_netdev(netdev);
 	if (err) {
 		dev_err(&pdev->dev, "Failed to register netdev\n");
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next 01/13] phy: add sgmii and 10gkr modes to the phy_mode enum
From: Andrew Lunn @ 2017-08-24 13:19 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, jason, sebastian.hesselbarth, gregory.clement,
	thomas.petazzoni, nadavh, linux, linux-kernel, mw, stefanc,
	miquel.raynal, netdev
In-Reply-To: <20170824083823.16826-2-antoine.tenart@free-electrons.com>

On Thu, Aug 24, 2017 at 10:38:11AM +0200, Antoine Tenart wrote:
> This patch adds more PHY modes to the phy_mode enum, to allow
> configuring PHYs to the SGMII and/or the 10GKR mode by using the
> set_mode callback.

Hi Antoine

You had me confused for a while here. If there is need for a respin,
could you change the commit log and prefix PHY with either generic or
Ethernet, just to make it clear which PHY subsystem we are talking
about.

	Andrew

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Florian Westphal @ 2017-08-24 13:17 UTC (permalink / raw)
  To: Davide Caratti
  Cc: Michal Kubecek, Florian Westphal, Pablo Neira Ayuso,
	Jozsef Kadlecsik, netfilter-devel, coreteam, netdev, linux-kernel,
	Michael S. Tsirkin, Markos Chandras
In-Reply-To: <1503580122.2958.37.camel@redhat.com>

Davide Caratti <dcaratti@redhat.com> wrote:
> Small nit: may I suggest you to call skb_csum_hwoffload_help() instead of
> skb_checksum_help(), so that we avoid corrupting SCTP packets in case they
> hit xt_CHECKSUM target?

Alternatively we could restrict the target to udp only.

AFAIU the only reason this thing exists is to fix up udp checksum
for old dhcp clients that use AF_PACKET without evaluating the extra
metadata that indicates when a 'bad' checksum is in fact ok because it
is supposed to be filled in by hardware later.

This can happen in virtual environemnt when such skb is directly passed
to vm.

^ permalink raw reply

* Re: [PATCH v2 3/4] net: phy: realtek: add disable RX internal delay mode
From: Andrew Lunn @ 2017-08-24 13:12 UTC (permalink / raw)
  To: icenowy-h8G6r0blFSE
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Florian Fainelli,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-sunxi-/JYPxA39Uh5TLH3MbocFFw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Chen-Yu Tsai, Corentin Labbe,
	Icenowy Zheng, Maxime Ripard,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <7c43220aef447e5cf21e48a9eca18535-h8G6r0blFSE@public.gmane.org>

> >Do we know that the TX lines do have a delay after making this change?
> >We need to be careful here. We will get into a mess if this is
> >actually RGMII not RGMII_TXID, and somebody designs a board which
> >needs RGMII-TXID.
> 
> In fact I'm not sure -- I'm even not sure that after making this change
> there's no TX delay. The change is totally not documented, so I choose
> to use a custom property in v1.

Can you get in touch with Realtek and ask? Or can you get a hardware
guy to do some measurements?

Thanks
    Andrew

^ permalink raw reply

* Re: [PATCH] tipc: Fix tipc_sk_reinit handling of -EAGAIN
From: Herbert Xu @ 2017-08-24 13:10 UTC (permalink / raw)
  To: Bob Peterson; +Cc: Phil Sutter, davem, netdev, LKML
In-Reply-To: <25145358.1155254.1503499382947.JavaMail.zimbra@redhat.com>

On Wed, Aug 23, 2017 at 10:43:02AM -0400, Bob Peterson wrote:
> Hi,
> 
> In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
> had additional logic added to loop in the event that function
> rhashtable_walk_next() returned -EAGAIN. No worries.
> 
> However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
> and therefore skips the call to rhashtable_walk_stop(). That has
> the effect of calling rcu_read_lock() without its paired call to
> rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
> may not be apparent for a while, especially since resize events may
> be rare. But the comments to rhashtable_walk_start() state:
> 
>  * ...Note that we take the RCU lock in all
>  * cases including when we return an error.  So you must always call
>  * rhashtable_walk_stop to clean up.
> 
> This patch replaces the continue with a goto and label to ensure a
> matching call to rhashtable_walk_stop().
> 
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for catching this!
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Davide Caratti @ 2017-08-24 13:08 UTC (permalink / raw)
  To: Michal Kubecek, Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, netfilter-devel, coreteam,
	netdev, linux-kernel, Michael S. Tsirkin, Markos Chandras
In-Reply-To: <20170824110742.qby3yoz3emf6pr5i@unicorn.suse.cz>

On Thu, 2017-08-24 at 13:07 +0200, Michal Kubecek wrote:
> On Thu, Aug 24, 2017 at 12:51:18PM +0200, Florian Westphal wrote:
> > Michal Kubecek <mkubecek@suse.cz> wrote:
> > > When --checksum_fill action is applied to a GSO packet, checksum_tg() calls
> > > skb_checksum_help() which is only meant to be applied to non-GSO packets so
> > > that it issues a warning.
> > > 
> > > This can be easily triggered by using e.g.
> > > 
> > >   iptables -t mangle -A OUTPUT -j CHECKSUM --checksum-fill
> > > 
> > > and sending TCP stream via a device with GSO enabled.
> > > 
> > > While this can be considered a misconfiguration, I believe the bad offload
> > > warning is supposed to catch bugs in drivers and networking stack, not
> > > misconfigured firewalls. So let's ignore such packets and only issue a one
> > > time warning with pr_warn_once() rather than a WARN with stack trace and
> > > tainted kernel.
> > 
> > Why issue a warning at all?
> > What kind of action should be taken upon seeing such warning?
> 
> Check and fix the configuration. The reason why I left at least some
> kind of warning is that the module does something that is unexpected as
> the checksum is not calculated (this module is often used in
> virtualization environments where "hardware checksum offload" in fact
> means the checksum is not computed at all).
> 

hello Michal,

GSO should be capable of computing the checksum on individual segments
later, so I also think the warning can be removed.

Small nit: may I suggest you to call skb_csum_hwoffload_help() instead of
skb_checksum_help(), so that we avoid corrupting SCTP packets in case they
hit xt_CHECKSUM target?

thank you in advance,
regards
-- 
davide

^ permalink raw reply

* Re: [net-next 10/15] net/mlx5: Add a blank line after declarations
From: Saeed Mahameed @ 2017-08-24 12:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: Saeed Mahameed, David S. Miller, Linux Netdev List,
	Leon Romanovsky, Or Gerlitz
In-Reply-To: <1503527941.27867.2.camel@perches.com>

On Thu, Aug 24, 2017 at 1:39 AM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2017-08-17 at 16:29 +0300, Saeed Mahameed wrote:
>> From: Or Gerlitz <ogerlitz@mellanox.com>
> []
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> []
>> @@ -188,6 +188,7 @@ static enum mlx5_dev_event port_subtype_event(u8 subtype)
>>  static void eq_update_ci(struct mlx5_eq *eq, int arm)
>>  {
>>       __be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
>> +
>>       u32 val = (eq->cons_index & 0xffffff) | (eq->eqn << 24);
>>       __raw_writel((__force u32)cpu_to_be32(val), addr);
>>       /* We still want ordering, just not swabbing, so add a barrier */
>
> checkpatch is stupid.
>
> The blank line should be after u32 val = ...
> and not after __be32 __iomem *addr = ...
>
>

You are completely right ! will fix this, Thanks !

^ permalink raw reply

* Re: [PATCH net-next] hinic: uninitialized variable in hinic_api_cmd_init()
From: Aviad Krawczyk @ 2017-08-24 12:24 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: netdev, kernel-janitors
In-Reply-To: <20170824104739.gw422ufydwmiu4gi@mwanda>

On 8/24/2017 1:47 PM, Dan Carpenter wrote:
> We never set the error code in this function.
> 
> Fixes: eabf0fad81d5 ("net-next/hinic: Initialize api cmd resources")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
> index 8901801fe426..c40603a183df 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
> @@ -941,6 +941,7 @@ int hinic_api_cmd_init(struct hinic_api_cmd_chain **chain,
>  		if (IS_ERR(chain[chain_type])) {
>  			dev_err(&pdev->dev, "Failed to create chain %d\n",
>  				chain_type);
> +			err = PTR_ERR(chain[chain_type]);
>  			goto err_create_chain;
>  		}
>  	}
> 
> 

Thanks a lot, Reviewed-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Michal Kubecek @ 2017-08-24 11:07 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, netfilter-devel, coreteam,
	netdev, linux-kernel, Michael S. Tsirkin, Markos Chandras
In-Reply-To: <20170824105118.GA15739@breakpoint.cc>

On Thu, Aug 24, 2017 at 12:51:18PM +0200, Florian Westphal wrote:
> Michal Kubecek <mkubecek@suse.cz> wrote:
> > When --checksum_fill action is applied to a GSO packet, checksum_tg() calls
> > skb_checksum_help() which is only meant to be applied to non-GSO packets so
> > that it issues a warning.
> > 
> > This can be easily triggered by using e.g.
> > 
> >   iptables -t mangle -A OUTPUT -j CHECKSUM --checksum-fill
> > 
> > and sending TCP stream via a device with GSO enabled.
> > 
> > While this can be considered a misconfiguration, I believe the bad offload
> > warning is supposed to catch bugs in drivers and networking stack, not
> > misconfigured firewalls. So let's ignore such packets and only issue a one
> > time warning with pr_warn_once() rather than a WARN with stack trace and
> > tainted kernel.
> 
> Why issue a warning at all?
> What kind of action should be taken upon seeing such warning?

Check and fix the configuration. The reason why I left at least some
kind of warning is that the module does something that is unexpected as
the checksum is not calculated (this module is often used in
virtualization environments where "hardware checksum offload" in fact
means the checksum is not computed at all).

But maybe it would suffice to add a note in iptables-extensions(8) man
page explicitely saying that it doesn't work with GSO packets (and is of
questionable usefulness for TCP in general).

                                                         Michal Kubecek

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 10/18] net: mvpp2: use the GoP interrupt for link status changes
From: Antoine Tenart @ 2017-08-24 10:59 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: Antoine Tenart, Stefan Chulski, Andrew Lunn, davem@davemloft.net,
	jason@lakedaemon.net, gregory.clement@free-electrons.com,
	sebastian.hesselbarth@gmail.com,
	thomas.petazzoni@free-electrons.com, Nadav Haklai,
	linux@armlinux.org.uk, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAPv3WKcSFcd4kBS_vpTsbf789xsLatuT2TBfZZbiu16sRauk7A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1965 bytes --]

Hi Marcin,

On Wed, Aug 23, 2017 at 11:05:33PM +0200, Marcin Wojtas wrote:
> 2017-08-23 18:04 GMT+02:00 Antoine Tenart <antoine.tenart@free-electrons.com>:
> > On Wed, Aug 23, 2017 at 03:24:55PM +0000, Stefan Chulski wrote:
> >> > When the cable is connected (there is signal) and the serdes is in sync and AN
> >> > succeeded.
> >> >
> >> > > With SFF/SFP ports, you generally need a gpio line the fibre module
> >> > > can use to indicate if it has link. Fixed-phy has such support, and
> >> > > your link_change function will get called when the link changes.
> >> >
> >> > So that would work when using SFP modules but I wonder if the GoP irq isn't
> >> > needed when using passive cable, in which case this patch would still be needed
> >> > (and of course we should support the new Russell phylib capabilities).
> >>
> >> Even if new phylib driver supports passive direct cables connection,
> >> GoP IRQ required for SOHO/Peridot external switch support.
> >> SOHO/Peridot external switch could be connected directly to Serdes line.
> >
> > So I guess the GoP link irq patches are needed. Should I resend them
> > then?
> >
> > We'll have to discuss how to handle fixed-phy vs GoP IRQ, but I guess we
> > can do this when adding the fixed-phy support later.
> >
> 
> Please check mvneta.c - you can find there coexistence of normal
> libphy support, fixed phy and link irq for the inband management mode.
> IMO it's exactly, what you need here.

From what I see it should be pretty easy to support phy, fixed-phy and
the GoP irq without breaking anything: if no phy is found in the dt,
call of_phy_register_fixed_link() and if the fixed-phy property isn't
found either use the GoP link IRQ. This way all the possibilities would
be supported.

So we should be able to support fixed-phy in a future series.

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Florian Westphal @ 2017-08-24 10:51 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	netfilter-devel, coreteam, netdev, linux-kernel,
	Michael S. Tsirkin, Markos Chandras
In-Reply-To: <20170824104824.2C318A0F3A@unicorn.suse.cz>

Michal Kubecek <mkubecek@suse.cz> wrote:
> When --checksum_fill action is applied to a GSO packet, checksum_tg() calls
> skb_checksum_help() which is only meant to be applied to non-GSO packets so
> that it issues a warning.
> 
> This can be easily triggered by using e.g.
> 
>   iptables -t mangle -A OUTPUT -j CHECKSUM --checksum-fill
> 
> and sending TCP stream via a device with GSO enabled.
> 
> While this can be considered a misconfiguration, I believe the bad offload
> warning is supposed to catch bugs in drivers and networking stack, not
> misconfigured firewalls. So let's ignore such packets and only issue a one
> time warning with pr_warn_once() rather than a WARN with stack trace and
> tainted kernel.

Why issue a warning at all?
What kind of action should be taken upon seeing such warning?

^ permalink raw reply

* [PATCH nf-next] netfilter: xt_CHECKSUM: avoid bad offload warnings on GSO packets
From: Michal Kubecek @ 2017-08-24 10:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	netdev, linux-kernel, Michael S. Tsirkin, Markos Chandras

When --checksum_fill action is applied to a GSO packet, checksum_tg() calls
skb_checksum_help() which is only meant to be applied to non-GSO packets so
that it issues a warning.

This can be easily triggered by using e.g.

  iptables -t mangle -A OUTPUT -j CHECKSUM --checksum-fill

and sending TCP stream via a device with GSO enabled.

While this can be considered a misconfiguration, I believe the bad offload
warning is supposed to catch bugs in drivers and networking stack, not
misconfigured firewalls. So let's ignore such packets and only issue a one
time warning with pr_warn_once() rather than a WARN with stack trace and
tainted kernel.

Reported-by: Markos Chandras <markos.chandras@suse.com>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 net/netfilter/xt_CHECKSUM.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_CHECKSUM.c b/net/netfilter/xt_CHECKSUM.c
index 0f642ef8cd26..9a007153ba7f 100644
--- a/net/netfilter/xt_CHECKSUM.c
+++ b/net/netfilter/xt_CHECKSUM.c
@@ -25,8 +25,12 @@ MODULE_ALIAS("ip6t_CHECKSUM");
 static unsigned int
 checksum_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
-	if (skb->ip_summed == CHECKSUM_PARTIAL)
-		skb_checksum_help(skb);
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		if (unlikely(skb_is_gso(skb)))
+			pr_warn_once("cannot mangle checksum of a GSO packet\n");
+		else
+			skb_checksum_help(skb);
+	}
 
 	return XT_CONTINUE;
 }
-- 
2.14.1


^ permalink raw reply related

* [PATCH net-next] hinic: uninitialized variable in hinic_api_cmd_init()
From: Dan Carpenter @ 2017-08-24 10:47 UTC (permalink / raw)
  To: Aviad Krawczyk; +Cc: netdev, kernel-janitors

We never set the error code in this function.

Fixes: eabf0fad81d5 ("net-next/hinic: Initialize api cmd resources")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
index 8901801fe426..c40603a183df 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_api_cmd.c
@@ -941,6 +941,7 @@ int hinic_api_cmd_init(struct hinic_api_cmd_chain **chain,
 		if (IS_ERR(chain[chain_type])) {
 			dev_err(&pdev->dev, "Failed to create chain %d\n",
 				chain_type);
+			err = PTR_ERR(chain[chain_type]);
 			goto err_create_chain;
 		}
 	}

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox