netdev.vger.kernel.org archive mirror
* [pull request][net 00/12] mlx5 fixes 2022-12-28
@ 2022-12-28 19:43 Saeed Mahameed
  2022-12-28 19:43 ` [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST Saeed Mahameed
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan

From: Saeed Mahameed <saeedm@nvidia.com>

This series provides bug fixes to the mlx5 driver.
Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit 40cab44b9089a41f71bbd0eff753eb91d5dafd68:

  net/sched: fix retpoline wrapper compilation on configs without tc filters (2022-12-28 12:11:32 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2022-12-28

for you to fetch changes up to 4d1c1379d71777ddeda3e54f8fc26e9ecbfd1009:

  net/mlx5: Lag, fix failure to cancel delayed bond work (2022-12-28 11:38:51 -0800)

----------------------------------------------------------------
mlx5-fixes-2022-12-28

----------------------------------------------------------------
Adham Faris (1):
      net/mlx5e: Fix hw mtu initializing at XDP SQ allocation

Chris Mi (2):
      net/mlx5e: CT: Fix ct debugfs folder name
      net/mlx5e: Always clear dest encap in neigh-update-del

Dragos Tatulea (1):
      net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default

Eli Cohen (1):
      net/mlx5: Lag, fix failure to cancel delayed bond work

Jiri Pirko (1):
      net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path

Maor Dickman (1):
      net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option

Moshe Shemesh (1):
      net/mlx5: E-Switch, properly handle ingress tagged packets on VST

Shay Drory (3):
      net/mlx5: Fix io_eq_size and event_eq_size params validation
      net/mlx5: Avoid recovery in probe flows
      net/mlx5: Fix RoCE setting at HCA level

Tariq Toukan (1):
      net/mlx5e: Fix RX reporter for XSK RQs

 drivers/net/ethernet/mellanox/mlx5/core/devlink.c  |  4 +--
 .../ethernet/mellanox/mlx5/core/en/reporter_rx.c   |  6 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |  7 +----
 .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c  |  9 +++++-
 .../ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c |  5 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  2 +-
 .../mellanox/mlx5/core/esw/acl/egress_lgcy.c       |  7 ++++-
 .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c      | 33 ++++++++++++++++++----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 30 ++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  6 ++++
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  6 ++++
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  4 +++
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c  |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  4 ++-
 include/linux/mlx5/device.h                        |  5 ++++
 include/linux/mlx5/mlx5_ifc.h                      |  3 +-
 16 files changed, 104 insertions(+), 28 deletions(-)


* [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-30  7:40   ` patchwork-bot+netdevbpf
  2022-12-28 19:43 ` [net 02/12] net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path Saeed Mahameed
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Moshe Shemesh, Mark Bloch

From: Moshe Shemesh <moshe@nvidia.com>

Fix SR-IOV VST mode behavior to insert a cvlan even when a guest tag is
already present in the frame. Previously, VST mode either dropped such
packets or overrode the existing tag, depending on the device version.

This patch fixes the behavior by building the HW steering rule with a
push-VLAN action or, on older devices, by asking the FW to stack the
VLAN when one is already present.
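The capability-driven choices made by this patch can be summarized as a small decision model. This is a user-space sketch only: `struct esw_caps_model` and all helper names are hypothetical stand-ins for the MLX5_CAP_ESW*() lookups in the diff, while the 0x1/0x3 values mirror the MLX5_VPORT_CVLAN_INSERT_* enum the patch adds to device.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the capability bits consulted by the patch. */
struct esw_caps_model {
	bool egress_acl_pop_vlan;       /* MLX5_CAP_ESW_EGRESS_ACL(dev, pop_vlan) */
	bool ingress_acl_push_vlan;     /* MLX5_CAP_ESW_INGRESS_ACL(dev, push_vlan) */
	bool vport_cvlan_insert_always; /* MLX5_CAP_ESW(dev, vport_cvlan_insert_always) */
};

/* Values mirroring the enum added to include/linux/mlx5/device.h. */
enum {
	CVLAN_INSERT_WHEN_NO_CVLAN = 0x1,
	CVLAN_INSERT_ALWAYS        = 0x3,
};

/* Mirrors esw_vst_mode_is_steering(): VST is done by ACL steering only
 * when both egress pop-VLAN and ingress push-VLAN actions are available. */
static bool vst_mode_is_steering(const struct esw_caps_model *c)
{
	return c->egress_acl_pop_vlan && c->ingress_acl_push_vlan;
}

/* Ingress ACL decisions for a vport with a VLAN or QoS configured:
 * push the cvlan in steering, or (devices without insert-always) match
 * on cvlan_tag so that only untagged frames hit the allow rule. */
static void ingress_vst_decision(const struct esw_caps_model *c,
				 bool *push_cvlan, bool *check_cvlan)
{
	*push_cvlan = false;
	*check_cvlan = false;
	if (vst_mode_is_steering(c))
		*push_cvlan = true;
	else if (!c->vport_cvlan_insert_always)
		*check_cvlan = true;
}

/* FW strip/insert is still programmed in offloads mode, or in legacy
 * mode when VST is not implemented by steering rules. */
static bool use_fw_cvlan(const struct esw_caps_model *c, bool offloads_mode)
{
	return offloads_mode || !vst_mode_is_steering(c);
}

/* Value written to vport_cvlan_insert when the FW inserts the tag. */
static int fw_cvlan_insert_mode(const struct esw_caps_model *c)
{
	return c->vport_cvlan_insert_always ? CVLAN_INSERT_ALWAYS
					    : CVLAN_INSERT_WHEN_NO_CVLAN;
}
```

In this model, a device with both VLAN ACL actions never touches the FW insert path in legacy mode, which matches the `esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering` guards in the eswitch.c hunks.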

Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../mellanox/mlx5/core/esw/acl/egress_lgcy.c  |  7 +++-
 .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 33 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 30 ++++++++++++-----
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  6 ++++
 include/linux/mlx5/device.h                   |  5 +++
 include/linux/mlx5/mlx5_ifc.h                 |  3 +-
 6 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
index 60a73990017c..6b4c9ffad95b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c
@@ -67,6 +67,7 @@ static void esw_acl_egress_lgcy_groups_destroy(struct mlx5_vport *vport)
 int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw,
 			      struct mlx5_vport *vport)
 {
+	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
 	struct mlx5_flow_destination drop_ctr_dst = {};
 	struct mlx5_flow_destination *dst = NULL;
 	struct mlx5_fc *drop_counter = NULL;
@@ -77,6 +78,7 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw,
 	 */
 	int table_size = 2;
 	int dest_num = 0;
+	int actions_flag;
 	int err = 0;
 
 	if (vport->egress.legacy.drop_counter) {
@@ -119,8 +121,11 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw,
 		  vport->vport, vport->info.vlan, vport->info.qos);
 
 	/* Allowed vlan rule */
+	actions_flag = MLX5_FLOW_CONTEXT_ACTION_ALLOW;
+	if (vst_mode_steering)
+		actions_flag |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP;
 	err = esw_egress_acl_vlan_create(esw, vport, NULL, vport->info.vlan,
-					 MLX5_FLOW_CONTEXT_ACTION_ALLOW);
+					 actions_flag);
 	if (err)
 		goto out;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
index b1a5199260f6..093ed86a0acd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c
@@ -139,11 +139,14 @@ static void esw_acl_ingress_lgcy_groups_destroy(struct mlx5_vport *vport)
 int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 			       struct mlx5_vport *vport)
 {
+	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
 	struct mlx5_flow_destination drop_ctr_dst = {};
 	struct mlx5_flow_destination *dst = NULL;
 	struct mlx5_flow_act flow_act = {};
 	struct mlx5_flow_spec *spec = NULL;
 	struct mlx5_fc *counter = NULL;
+	bool vst_check_cvlan = false;
+	bool vst_push_cvlan = false;
 	/* The ingress acl table contains 4 groups
 	 * (2 active rules at the same time -
 	 *      1 allow rule from one of the first 3 groups.
@@ -203,7 +206,26 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 		goto out;
 	}
 
-	if (vport->info.vlan || vport->info.qos)
+	if ((vport->info.vlan || vport->info.qos)) {
+		if (vst_mode_steering)
+			vst_push_cvlan = true;
+		else if (!MLX5_CAP_ESW(esw->dev, vport_cvlan_insert_always))
+			vst_check_cvlan = true;
+	}
+
+	if (vst_check_cvlan || vport->info.spoofchk)
+		spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
+
+	/* Create ingress allow rule */
+	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW;
+	if (vst_push_cvlan) {
+		flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH;
+		flow_act.vlan[0].prio = vport->info.qos;
+		flow_act.vlan[0].vid = vport->info.vlan;
+		flow_act.vlan[0].ethtype = ETH_P_8021Q;
+	}
+
+	if (vst_check_cvlan)
 		MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
 				 outer_headers.cvlan_tag);
 
@@ -218,9 +240,6 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 		ether_addr_copy(smac_v, vport->info.mac);
 	}
 
-	/* Create ingress allow rule */
-	spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
-	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW;
 	vport->ingress.allow_rule = mlx5_add_flow_rules(vport->ingress.acl, spec,
 							&flow_act, NULL, 0);
 	if (IS_ERR(vport->ingress.allow_rule)) {
@@ -232,6 +251,9 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 		goto out;
 	}
 
+	if (!vst_check_cvlan && !vport->info.spoofchk)
+		goto out;
+
 	memset(&flow_act, 0, sizeof(flow_act));
 	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP;
 	/* Attach drop flow counter */
@@ -257,7 +279,8 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw,
 	return 0;
 
 out:
-	esw_acl_ingress_lgcy_cleanup(esw, vport);
+	if (err)
+		esw_acl_ingress_lgcy_cleanup(esw, vport);
 	kvfree(spec);
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 527e4bffda8d..0dfd5742c6fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -161,10 +161,17 @@ static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u16 vport,
 			 esw_vport_context.vport_cvlan_strip, 1);
 
 	if (set_flags & SET_VLAN_INSERT) {
-		/* insert only if no vlan in packet */
-		MLX5_SET(modify_esw_vport_context_in, in,
-			 esw_vport_context.vport_cvlan_insert, 1);
-
+		if (MLX5_CAP_ESW(dev, vport_cvlan_insert_always)) {
+			/* insert either if vlan exist in packet or not */
+			MLX5_SET(modify_esw_vport_context_in, in,
+				 esw_vport_context.vport_cvlan_insert,
+				 MLX5_VPORT_CVLAN_INSERT_ALWAYS);
+		} else {
+			/* insert only if no vlan in packet */
+			MLX5_SET(modify_esw_vport_context_in, in,
+				 esw_vport_context.vport_cvlan_insert,
+				 MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN);
+		}
 		MLX5_SET(modify_esw_vport_context_in, in,
 			 esw_vport_context.cvlan_pcp, qos);
 		MLX5_SET(modify_esw_vport_context_in, in,
@@ -809,6 +816,7 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport *
 
 static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 {
+	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
 	u16 vport_num = vport->vport;
 	int flags;
 	int err;
@@ -839,8 +847,9 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 
 	flags = (vport->info.vlan || vport->info.qos) ?
 		SET_VLAN_STRIP | SET_VLAN_INSERT : 0;
-	modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan,
-			       vport->info.qos, flags);
+	if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering)
+		modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan,
+				       vport->info.qos, flags);
 
 	return 0;
 
@@ -1848,6 +1857,7 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 				  u16 vport, u16 vlan, u8 qos, u8 set_flags)
 {
 	struct mlx5_vport *evport = mlx5_eswitch_get_vport(esw, vport);
+	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
 	int err = 0;
 
 	if (IS_ERR(evport))
@@ -1855,9 +1865,11 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 	if (vlan > 4095 || qos > 7)
 		return -EINVAL;
 
-	err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags);
-	if (err)
-		return err;
+	if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering) {
+		err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags);
+		if (err)
+			return err;
+	}
 
 	evport->info.vlan = vlan;
 	evport->info.qos = qos;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 5a85a5d32be7..92644fbb5081 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -527,6 +527,12 @@ int mlx5_eswitch_del_vlan_action(struct mlx5_eswitch *esw,
 int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
 				  u16 vport, u16 vlan, u8 qos, u8 set_flags);
 
+static inline bool esw_vst_mode_is_steering(struct mlx5_eswitch *esw)
+{
+	return (MLX5_CAP_ESW_EGRESS_ACL(esw->dev, pop_vlan) &&
+		MLX5_CAP_ESW_INGRESS_ACL(esw->dev, push_vlan));
+}
+
 static inline bool mlx5_eswitch_vlan_actions_supported(struct mlx5_core_dev *dev,
 						       u8 vlan_depth)
 {
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 5fe5d198b57a..29d4b201c7b2 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1090,6 +1090,11 @@ enum {
 	MLX5_VPORT_ADMIN_STATE_AUTO  = 0x2,
 };
 
+enum {
+	MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN  = 0x1,
+	MLX5_VPORT_CVLAN_INSERT_ALWAYS         = 0x3,
+};
+
 enum {
 	MLX5_L3_PROT_TYPE_IPV4		= 0,
 	MLX5_L3_PROT_TYPE_IPV6		= 1,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f3d1c62c98dd..a9ee7bc59c90 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -913,7 +913,8 @@ struct mlx5_ifc_e_switch_cap_bits {
 	u8         vport_svlan_insert[0x1];
 	u8         vport_cvlan_insert_if_not_exist[0x1];
 	u8         vport_cvlan_insert_overwrite[0x1];
-	u8         reserved_at_5[0x2];
+	u8         reserved_at_5[0x1];
+	u8         vport_cvlan_insert_always[0x1];
 	u8         esw_shared_ingress_acl[0x1];
 	u8         esw_uplink_ingress_acl[0x1];
 	u8         root_ft_on_other_esw[0x1];
-- 
2.38.1



* [net 02/12] net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
  2022-12-28 19:43 ` [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 03/12] net/mlx5: Fix io_eq_size and event_eq_size params validation Saeed Mahameed
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Jiri Pirko

From: Jiri Pirko <jiri@nvidia.com>

Two cleanup calls are missing from the mlx5_init_once() error path.
Add them so that the error path flow matches mlx5_cleanup_once().
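The convention this fix restores is that an init function's error labels undo the completed steps in exactly the reverse order used by the matching cleanup function. A minimal user-space sketch of that goto-unwinding pattern follows; the resource names, `fake_init_once()`, `fake_cleanup_once()` and the `fail_at` fault-injection parameter are all hypothetical and are not the real mlx5 helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resources; true means "currently initialized". */
struct fake_dev {
	bool clock, gids, tables;
};

/* Initialize one resource, or fail if this is the injected failure step. */
static int step(bool *res, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	*res = true;
	return 0;
}

/* Init clock -> gids -> tables; on failure, unwind in reverse so the
 * error path undoes exactly what fake_cleanup_once() would. */
static int fake_init_once(struct fake_dev *d, int fail_at)
{
	int err;

	err = step(&d->clock, 1, fail_at);
	if (err)
		return err;
	err = step(&d->gids, 2, fail_at);
	if (err)
		goto err_clock;
	err = step(&d->tables, 3, fail_at);
	if (err)
		goto err_gids;
	return 0;

err_gids:
	d->gids = false;
err_clock:
	d->clock = false;
	return err;
}

/* Full teardown, in reverse order of initialization. */
static void fake_cleanup_once(struct fake_dev *d)
{
	d->tables = false;
	d->gids = false;
	d->clock = false;
}
```

Forgetting one label's cleanup call in such a chain leaks exactly the resources this patch adds back (the clock and the reserved GIDs, in the real driver).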

Fixes: 52ec462eca9b ("net/mlx5: Add reserved-gids support")
Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 7f5db13e3550..ec5652f31dda 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1050,6 +1050,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 err_tables_cleanup:
 	mlx5_geneve_destroy(dev->geneve);
 	mlx5_vxlan_destroy(dev->vxlan);
+	mlx5_cleanup_clock(dev);
+	mlx5_cleanup_reserved_gids(dev);
 	mlx5_cq_debugfs_cleanup(dev);
 	mlx5_fw_reset_cleanup(dev);
 err_events_cleanup:
-- 
2.38.1



* [net 03/12] net/mlx5: Fix io_eq_size and event_eq_size params validation
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
  2022-12-28 19:43 ` [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST Saeed Mahameed
  2022-12-28 19:43 ` [net 02/12] net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 04/12] net/mlx5: Avoid recovery in probe flows Saeed Mahameed
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

io_eq_size and event_eq_size params are of type
DEVLINK_PARAM_TYPE_U32, but the validation callback accesses them
as DEVLINK_PARAM_TYPE_U16.

This causes a validation mismatch on big-endian systems, where
in-range values were rejected while 268500991 was accepted.
Fix it by checking the U32 value in the validation callback.
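Why 268500991 slipped through: that value is 0x1000FFFF. The members of the devlink param value union share offset 0, so on a big-endian host the 16-bit member overlaps the most significant half of the 32-bit value, 0x1000 = 4096, which is the top of the allowed range. An in-range value such as 1024 has a zero upper half and reads back as 0. The portable sketch below models this by laying out the bytes explicitly; `u16_alias_of_u32()` and `eq_depth_ok()` are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Lay a u32 out in memory as a host of the given endianness would, then
 * read the first two bytes back as that host's u16. This models reading
 * the .vu16 member of a union whose .vu32 member was written. */
static uint16_t u16_alias_of_u32(uint32_t stored, bool big_endian)
{
	uint8_t mem[4];

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? (3 - i) * 8 : i * 8;
		mem[i] = (uint8_t)(stored >> shift);
	}
	return big_endian ? (uint16_t)((mem[0] << 8) | mem[1])
			  : (uint16_t)((mem[1] << 8) | mem[0]);
}

/* Same range check as mlx5_devlink_eq_depth_validate(). */
static int eq_depth_ok(uint32_t v)
{
	return (v >= 64 && v <= 4096) ? 0 : -EINVAL;
}
```

On a little-endian host the 16-bit view keeps the low half instead, so small in-range values happen to validate correctly, which is why the bug only shows up on big-endian systems.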

Fixes: 0844fa5f7b89 ("net/mlx5: Let user configure io_eq_size param")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index ddb197970c22..be59bb35d795 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -563,7 +563,7 @@ static int mlx5_devlink_eq_depth_validate(struct devlink *devlink, u32 id,
 					  union devlink_param_value val,
 					  struct netlink_ext_ack *extack)
 {
-	return (val.vu16 >= 64 && val.vu16 <= 4096) ? 0 : -EINVAL;
+	return (val.vu32 >= 64 && val.vu32 <= 4096) ? 0 : -EINVAL;
 }
 
 static const struct devlink_param mlx5_devlink_params[] = {
-- 
2.38.1



* [net 04/12] net/mlx5: Avoid recovery in probe flows
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2022-12-28 19:43 ` [net 03/12] net/mlx5: Fix io_eq_size and event_eq_size params validation Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-29  6:33   ` Leon Romanovsky
  2022-12-28 19:43 ` [net 05/12] net/mlx5: Fix RoCE setting at HCA level Saeed Mahameed
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

Currently, recovery is performed without considering whether the device
is still in the probe flow.
This may lead to recovery starting before the device has finished
probing, e.g. while mlx5_init_one() is running. The recovery flow uses
functionality that is loaded only by mlx5_init_one(), so there is no
point in running recovery before mlx5_init_one() has finished
successfully.

Fix it by waiting for the probe flow to finish and checking whether the
device is probed before trying to perform recovery.
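The "check the state under the interface mutex before recovering" pattern can be modeled in user space with POSIX threads. This is a sketch only: `struct dev_state_model` and its `drop_new_health_work` field are hypothetical stand-ins for `dev->intf_state_mutex` and the `MLX5_DROP_NEW_HEALTH_WORK` bit. The property it illustrates is that every exit path, including the early "not permitted" one, must release the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device state guarded by an interface-state mutex. */
struct dev_state_model {
	pthread_mutex_t intf_state_mutex;
	bool drop_new_health_work; /* models MLX5_DROP_NEW_HEALTH_WORK */
};

/* Returns true if recovery may proceed. The flag is sampled under the
 * mutex, and the mutex is released before either outcome is acted on. */
static bool health_work_permitted(struct dev_state_model *d)
{
	bool permitted;

	pthread_mutex_lock(&d->intf_state_mutex);
	permitted = !d->drop_new_health_work;
	pthread_mutex_unlock(&d->intf_state_mutex);

	if (!permitted)
		fprintf(stderr, "health works are not permitted at this stage\n");
	return permitted;
}
```

Sampling the flag into a local and unlocking once keeps a single unlock site, which makes it hard to return with the lock still held.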

Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 86ed87d704f7..96417c5feed7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -674,6 +674,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
 	dev = container_of(priv, struct mlx5_core_dev, priv);
 	devlink = priv_to_devlink(dev);
 
+	mutex_lock(&dev->intf_state_mutex);
+	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
+		mlx5_core_err(dev, "health works are not permitted at this stage\n");
+		return;
+	}
+	mutex_unlock(&dev->intf_state_mutex);
 	enter_error_state(dev, false);
 	if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
 		devl_lock(devlink);
-- 
2.38.1



* [net 05/12] net/mlx5: Fix RoCE setting at HCA level
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2022-12-28 19:43 ` [net 04/12] net/mlx5: Avoid recovery in probe flows Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 06/12] net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default Saeed Mahameed
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

From: Shay Drory <shayd@nvidia.com>

An mlx5 PF can disable RoCE for its VFs and SFs. In that case, RoCE is
marked as unsupported on those VFs/SFs.
The cited commit added an option to disable (and enable) RoCE at the HCA
level. However, it didn't check whether RoCE is supported on the HCA,
and allowed the user to try to set RoCE on.
Fix it by checking whether the HCA supports RoCE.
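The change to the devlink validator (applied only when the user asks to turn RoCE on) can be compared side by side as two predicates. In this sketch `struct roce_caps_model` is a hypothetical snapshot of the three capability reads in the diff, and the assumption that the maximum `roce` capability is what reads as 0 on a VF/SF whose PF disabled RoCE is inferred from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the capability bits the validator reads. */
struct roce_caps_model {
	bool roce_cur;          /* MLX5_CAP_GEN(dev, roce) */
	bool roce_rw_supported; /* MLX5_CAP_GEN(dev, roce_rw_supported) */
	bool roce_max;          /* MLX5_CAP_GEN_MAX(dev, roce) */
};

/* Before the fix: enabling was rejected only if RoCE is currently off
 * and the RoCE bit is not writable. */
static bool reject_enable_old(const struct roce_caps_model *c)
{
	return !c->roce_cur && !c->roce_rw_supported;
}

/* After the fix: being writable is not enough, the maximum capability
 * must also report RoCE as supported. */
static bool reject_enable_new(const struct roce_caps_model *c)
{
	return !c->roce_cur && !(c->roce_rw_supported && c->roce_max);
}
```

The case that changes is a function where RoCE is off and writable but unsupported at the maximum capability level: the old predicate let the request through, the new one rejects it with "Device doesn't support RoCE".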

Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index be59bb35d795..5bd83c0275f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -468,7 +468,7 @@ static int mlx5_devlink_enable_roce_validate(struct devlink *devlink, u32 id,
 	bool new_state = val.vbool;
 
 	if (new_state && !MLX5_CAP_GEN(dev, roce) &&
-	    !MLX5_CAP_GEN(dev, roce_rw_supported)) {
+	    !(MLX5_CAP_GEN(dev, roce_rw_supported) && MLX5_CAP_GEN_MAX(dev, roce))) {
 		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support RoCE");
 		return -EOPNOTSUPP;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ec5652f31dda..df134f6d32dc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -613,7 +613,7 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 		MLX5_SET(cmd_hca_cap, set_hca_cap, num_total_dynamic_vf_msix,
 			 MLX5_CAP_GEN_MAX(dev, num_total_dynamic_vf_msix));
 
-	if (MLX5_CAP_GEN(dev, roce_rw_supported))
+	if (MLX5_CAP_GEN(dev, roce_rw_supported) && MLX5_CAP_GEN_MAX(dev, roce))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, roce,
 			 mlx5_is_roce_on(dev));
 
-- 
2.38.1



* [net 06/12] net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2022-12-28 19:43 ` [net 05/12] net/mlx5: Fix RoCE setting at HCA level Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 07/12] net/mlx5e: Fix RX reporter for XSK RQs Saeed Mahameed
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Dragos Tatulea,
	Gal Pressman

From: Dragos Tatulea <dtatulea@nvidia.com>

mlx5e_build_nic_params will turn CQE compression on if the hardware
capability is enabled and the slow_pci_heuristic condition is detected.
As IPoIB doesn't support CQE compression, make sure to disable the
feature in the IPoIB profile init.

Please note that the feature is not exposed to the user for IPoIB
interfaces, so it can't be subsequently turned on.
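The ordering matters: the generic NIC defaults may switch compression on, so the IPoIB code has to override both the stored default and the private flag afterwards. The sketch below models that sequence under the assumption that the generic builder runs first; `struct params_model`, `PFLAG_RX_CQE_COMPRESS` and the helper functions are hypothetical and only stand in for `params->rx_cqe_compress_def` and `MLX5E_SET_PFLAG()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for MLX5E_PFLAG_RX_CQE_COMPRESS. */
#define PFLAG_RX_CQE_COMPRESS (1u << 0)

struct params_model {
	bool rx_cqe_compress_def;
	uint32_t pflags;
};

/* Set or clear one private-flag bit. */
static void set_pflag(struct params_model *p, uint32_t flag, bool on)
{
	if (on)
		p->pflags |= flag;
	else
		p->pflags &= ~flag;
}

/* Generic defaults: compression on when HW supports it and PCI is slow. */
static void build_nic_params_model(struct params_model *p,
				   bool hw_cap, bool slow_pci)
{
	p->rx_cqe_compress_def = hw_cap && slow_pci;
	set_pflag(p, PFLAG_RX_CQE_COMPRESS, p->rx_cqe_compress_def);
}

/* IPoIB override, applied after the generic defaults. */
static void ipoib_override_model(struct params_model *p)
{
	p->rx_cqe_compress_def = false;
	set_pflag(p, PFLAG_RX_CQE_COMPRESS, p->rx_cqe_compress_def);
}
```

Clearing both fields keeps the stored default and the active flag consistent, so nothing later re-derives "on" from a stale default.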

Fixes: b797a684b0dd ("net/mlx5e: Enable CQE compression when PCI is slower than link")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 7c5c500fd215..2c73c8445e63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -71,6 +71,10 @@ static void mlx5i_build_nic_params(struct mlx5_core_dev *mdev,
 	params->packet_merge.type = MLX5E_PACKET_MERGE_NONE;
 	params->hard_mtu = MLX5_IB_GRH_BYTES + MLX5_IPOIB_HARD_LEN;
 	params->tunneled_offload_en = false;
+
+	/* CQE compression is not supported for IPoIB */
+	params->rx_cqe_compress_def = false;
+	MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS, params->rx_cqe_compress_def);
 }
 
 /* Called directly after IPoIB netdevice was created to initialize SW structs */
-- 
2.38.1



* [net 07/12] net/mlx5e: Fix RX reporter for XSK RQs
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2022-12-28 19:43 ` [net 06/12] net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 08/12] net/mlx5e: CT: Fix ct debugfs folder name Saeed Mahameed
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman

From: Tariq Toukan <tariqt@nvidia.com>

The RX reporter mistakenly reads from the regular (inactive) RQ when
the XSK RQ is active. Fix it by diagnosing the XSK RQ whenever the
channel's XSK state bit is set.
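The fix boils down to choosing, per channel, the RQ that is actually receiving. A trimmed-down model follows; `struct channel_model` and `active_rq()` are hypothetical, with a plain `xsk_active` boolean standing in for the `MLX5E_CHANNEL_STATE_XSK` bit tested in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down RQ and channel structures. */
struct rq_model {
	int id;
};

struct channel_model {
	bool xsk_active;        /* models MLX5E_CHANNEL_STATE_XSK */
	struct rq_model rq;     /* regular RQ, inactive while XSK is on */
	struct rq_model xskrq;  /* XSK RQ */
};

/* Return the RQ the reporter should diagnose for this channel. */
static struct rq_model *active_rq(struct channel_model *c)
{
	return c->xsk_active ? &c->xskrq : &c->rq;
}
```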

Fixes: 3db4c85cde7a ("net/mlx5e: xsk: Use queue indices starting from 0 for XSK queues")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 5f6f95ad6888..1ae15b8536a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -459,7 +459,11 @@ static int mlx5e_rx_reporter_diagnose(struct devlink_health_reporter *reporter,
 		goto unlock;
 
 	for (i = 0; i < priv->channels.num; i++) {
-		struct mlx5e_rq *rq = &priv->channels.c[i]->rq;
+		struct mlx5e_channel *c = priv->channels.c[i];
+		struct mlx5e_rq *rq;
+
+		rq = test_bit(MLX5E_CHANNEL_STATE_XSK, c->state) ?
+			&c->xskrq : &c->rq;
 
 		err = mlx5e_rx_reporter_build_diagnose_output(rq, fmsg);
 		if (err)
-- 
2.38.1



* [net 08/12] net/mlx5e: CT: Fix ct debugfs folder name
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2022-12-28 19:43 ` [net 07/12] net/mlx5e: Fix RX reporter for XSK RQs Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 09/12] net/mlx5e: Always clear dest encap in neigh-update-del Saeed Mahameed
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Chris Mi, Roi Dayan

From: Chris Mi <cmi@nvidia.com>

sprintf() should have been used to build the string instead of
sscanf(). As written, sscanf() parses the empty dirname rather than
writing to it, so the function returns early and neither "ct_nic" nor
"ct_fdb" is created. Per-namespace names are redundant anyway, as the
driver can be in switchdev mode and still add NIC rules. So use "ct" as
the folder name.
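The difference between the two libc calls is easy to check in user space: sscanf() reads from its first argument, so on the zero-initialized dirname it hits end of input before any conversion and returns EOF, which the original `< 0` test treated as failure. snprintf() is the call that builds a string. In this sketch `parse_empty_dirname()` and `build_ct_dirname()` are hypothetical helpers; the former uses a writable scratch buffer instead of the string literal the original passed as a destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mimics the buggy call: sscanf() parses an empty dirname and writes
 * nothing into it, returning EOF because input ends immediately. */
static int parse_empty_dirname(void)
{
	char dirname[16] = {0};
	char scratch[16];

	return sscanf(dirname, "ct_%15s", scratch);
}

/* What the original code presumably intended: format "ct_fdb" or
 * "ct_nic" into a caller-provided buffer. Returns the formatted length. */
static int build_ct_dirname(char *buf, size_t len, bool is_fdb)
{
	return snprintf(buf, len, "ct_%s", is_fdb ? "fdb" : "nic");
}
```

The merged fix avoids the formatting question entirely by using the fixed name "ct".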

Fixes: 77422a8f6f61 ("net/mlx5e: CT: Add ct driver counters")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index a69849e0deed..313df8232db7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -2103,14 +2103,9 @@ mlx5_tc_ct_init_check_support(struct mlx5e_priv *priv,
 static void
 mlx5_ct_tc_create_dbgfs(struct mlx5_tc_ct_priv *ct_priv)
 {
-	bool is_fdb = ct_priv->ns_type == MLX5_FLOW_NAMESPACE_FDB;
 	struct mlx5_tc_ct_debugfs *ct_dbgfs = &ct_priv->debugfs;
-	char dirname[16] = {};
 
-	if (sscanf(dirname, "ct_%s", is_fdb ? "fdb" : "nic") < 0)
-		return;
-
-	ct_dbgfs->root = debugfs_create_dir(dirname, mlx5_debugfs_get_dev_root(ct_priv->dev));
+	ct_dbgfs->root = debugfs_create_dir("ct", mlx5_debugfs_get_dev_root(ct_priv->dev));
 	debugfs_create_atomic_t("offloaded", 0400, ct_dbgfs->root,
 				&ct_dbgfs->stats.offloaded);
 	debugfs_create_atomic_t("rx_dropped", 0400, ct_dbgfs->root,
-- 
2.38.1



* [net 09/12] net/mlx5e: Always clear dest encap in neigh-update-del
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2022-12-28 19:43 ` [net 08/12] net/mlx5e: CT: Fix ct debugfs folder name Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 10/12] net/mlx5e: Fix hw mtu initializing at XDP SQ allocation Saeed Mahameed
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Chris Mi, Roi Dayan

From: Chris Mi <cmi@nvidia.com>

The cited commit introduced a bug for flows with multiple encapsulation
destinations. If one destination's encap becomes invalid, the flow gets
the slow path flag. But when the encaps of other destinations become
invalid, they are not cleared because the flow already has the slow path
flag set. When neigh-update-add then runs, it uses the invalid encap.

Fix it by checking the slow path flag after clearing the destination encap.
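The ordering bug can be reproduced with a two-destination model. Everything here is hypothetical and simplified: `struct flow_model` stands in for the flow and its esw_attr destinations, `slow` for the SLOW flow flag, and setting `slow` for the call to mlx5e_tc_offload_to_slow_path().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DESTS 2

/* Hypothetical per-destination encap state. */
struct dest_model {
	bool encap_valid;
	const void *pkt_reformat; /* non-NULL while the encap is usable */
};

/* Hypothetical flow with several encap destinations. */
struct flow_model {
	bool slow; /* models the SLOW flow flag */
	struct dest_model dests[MAX_DESTS];
};

/* Old ordering: a flow already on the slow path was skipped before its
 * destination's encap state was cleared. */
static void encap_del_old(struct flow_model *f, int idx)
{
	if (f->slow)
		return;
	f->dests[idx].encap_valid = false;
	f->dests[idx].pkt_reformat = NULL;
	f->slow = true;
}

/* Fixed ordering: always clear the destination first, then skip only
 * the slow-path transition if the flow is already there. */
static void encap_del_fixed(struct flow_model *f, int idx)
{
	f->dests[idx].encap_valid = false;
	f->dests[idx].pkt_reformat = NULL;
	if (f->slow)
		return;
	f->slow = true;
}
```

With the old ordering, invalidating destination 0 and then destination 1 leaves destination 1's stale pkt_reformat behind; with the fixed ordering both are cleared.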

Fixes: 9a5f9cc794e1 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c    | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
index ff73d25bc6eb..2aaf8ab857b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
@@ -222,7 +222,7 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 	int err;
 
 	list_for_each_entry(flow, flow_list, tmp_list) {
-		if (!mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, SLOW))
+		if (!mlx5e_is_offloaded_flow(flow))
 			continue;
 
 		attr = mlx5e_tc_get_encap_attr(flow);
@@ -231,6 +231,13 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
 		esw_attr->dests[flow->tmp_entry_index].flags &= ~MLX5_ESW_DEST_ENCAP_VALID;
 		esw_attr->dests[flow->tmp_entry_index].pkt_reformat = NULL;
 
+		/* Clear pkt_reformat before checking slow path flag. Because
+		 * in next iteration, the same flow is already set slow path
+		 * flag, but still need to clear the pkt_reformat.
+		 */
+		if (flow_flag_test(flow, SLOW))
+			continue;
+
 		/* update from encap rule to slow path rule */
 		spec = &flow->attr->parse_attr->spec;
 		rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [net 10/12] net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2022-12-28 19:43 ` [net 09/12] net/mlx5e: Always clear dest encap in neigh-update-del Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 11/12] net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option Saeed Mahameed
  2022-12-28 19:43 ` [net 12/12] net/mlx5: Lag, fix failure to cancel delayed bond work Saeed Mahameed
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Adham Faris

From: Adham Faris <afaris@nvidia.com>

The current XDP xmit functions (mlx5e_xmit_xdp_frame_mpwqe and
mlx5e_xmit_xdp_frame) validate the XDP packet length by comparing it to
the HW MTU (configured at XDP SQ allocation) before transmitting the
packet. This check does not account for the Ethernet FCS length (the
FCS is calculated and filled in by the NIC). Hence, when we try to send
packets with length > (hw-mtu - ethernet-fcs-size), the device port
drops them and tx_errors_phy is incremented. The desired behavior is
for the driver to catch and drop these packets itself.

Fix this in the XDP SQ allocation function (mlx5e_alloc_xdpsq) by
subtracting the Ethernet FCS size (ETH_FCS_LEN, 4 bytes) from the
current HW MTU value, since the Ethernet FCS is calculated and appended
to Ethernet frames by the NIC.
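A minimal userspace sketch of the length check, assuming (as described above) that the value from MLX5E_SW2HW_MTU() is the on-wire limit that also covers the 4-byte FCS the NIC appends. The helper names and the SKETCH_ETH_FCS_LEN constant are illustrative stand-ins (the kernel constant is ETH_FCS_LEN); the 1522-byte limit in the note below is a hypothetical value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's ETH_FCS_LEN (4), so this builds in userspace. */
#define SKETCH_ETH_FCS_LEN 4

/* 'hw_mtu_incl_fcs' stands for MLX5E_SW2HW_MTU(params, params->sw_mtu),
 * assumed to be the port limit including the NIC-appended FCS.
 * fixed == true applies the subtraction done by the patch. */
static uint32_t sketch_xdpsq_hw_mtu(uint32_t hw_mtu_incl_fcs, bool fixed)
{
	assert(hw_mtu_incl_fcs > SKETCH_ETH_FCS_LEN);
	return fixed ? hw_mtu_incl_fcs - SKETCH_ETH_FCS_LEN : hw_mtu_incl_fcs;
}

/* Driver-side check before transmitting an XDP frame: frames longer
 * than the SQ limit are dropped by the driver, not by the port. */
static bool sketch_xdp_len_ok(uint32_t len, uint32_t sq_hw_mtu)
{
	return len <= sq_hw_mtu;
}
```

With a hypothetical limit of 1522, a 1520-byte frame passes the old check but becomes 1524 bytes on the wire and is dropped by the port; with the FCS subtracted, the driver rejects it and 1518 bytes is the largest accepted frame.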

Fixes: d8bec2b29a82 ("net/mlx5e: Support bpf_xdp_adjust_head()")
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8d36e2de53a9..cff5f2e29e1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1305,7 +1305,7 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c,
 	sq->channel   = c;
 	sq->uar_map   = mdev->mlx5e_res.hw_objs.bfreg.map;
 	sq->min_inline_mode = params->tx_min_inline_mode;
-	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu);
+	sq->hw_mtu    = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN;
 	sq->xsk_pool  = xsk_pool;
 
 	sq->stats = sq->xsk_pool ?
-- 
2.38.1



* [net 11/12] net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2022-12-28 19:43 ` [net 10/12] net/mlx5e: Fix hw mtu initializing at XDP SQ allocation Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  2022-12-28 19:43 ` [net 12/12] net/mlx5: Lag, fix failure to cancel delayed bond work Saeed Mahameed
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Maor Dickman, Roi Dayan

From: Maor Dickman <maord@nvidia.com>

The cited patch added support for matching on a geneve option by
setting the geneve_tlv_option_0_data mask and key, but it didn't set
the geneve_tlv_option_0_exist bit, which some HWs require when matching
on the geneve_tlv_option_0_data parameter. In some cases, this may
cause packets to wrongly match rules with a different geneve option.

For example, a packet with geneve_tlv_object class=789 and data=456
will wrongly match a rule that matches geneve_tlv_object class=123 and
data=456.

Fix it by setting the geneve_tlv_option_0_exist bit, when supported by
the HW, whenever matching on the geneve_tlv_option_0_data parameter.

Fixes: 9272e3df3023 ("net/mlx5e: Geneve, Add support for encap/decap flows offload")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c
index f5b26f5a7de4..054d80c4e65c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_geneve.c
@@ -273,6 +273,11 @@ static int mlx5e_tc_tun_parse_geneve_options(struct mlx5e_priv *priv,
 		 geneve_tlv_option_0_data, be32_to_cpu(opt_data_key));
 	MLX5_SET(fte_match_set_misc3, misc_3_c,
 		 geneve_tlv_option_0_data, be32_to_cpu(opt_data_mask));
+	if (MLX5_CAP_ESW_FLOWTABLE_FDB(priv->mdev,
+				       ft_field_support.geneve_tlv_option_0_exist)) {
+		MLX5_SET_TO_ONES(fte_match_set_misc, misc_c, geneve_tlv_option_0_exist);
+		MLX5_SET_TO_ONES(fte_match_set_misc, misc_v, geneve_tlv_option_0_exist);
+	}
 
 	spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS_3;
 
-- 
2.38.1



* [net 12/12] net/mlx5: Lag, fix failure to cancel delayed bond work
  2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
                   ` (10 preceding siblings ...)
  2022-12-28 19:43 ` [net 11/12] net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option Saeed Mahameed
@ 2022-12-28 19:43 ` Saeed Mahameed
  11 siblings, 0 replies; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-28 19:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Eli Cohen, Maor Dickman

From: Eli Cohen <elic@nvidia.com>

Commit 0d4e8ed139d8 ("net/mlx5: Lag, avoid lockdep warnings")
accidentally removed the call that cancels the delayed bond work. As a
result, the queued delayed work may expire and be queued onto an
already destroyed workqueue.

Fix by restoring the call to cancel_delayed_work_sync() before
destroying the workqueue.

This prevents call traces such as this one:

[  329.230417] BUG: kernel NULL pointer dereference, address: 0000000000000000
 [  329.231444] #PF: supervisor write access in kernel mode
 [  329.232233] #PF: error_code(0x0002) - not-present page
 [  329.233007] PGD 0 P4D 0
 [  329.233476] Oops: 0002 [#1] SMP
 [  329.234012] CPU: 5 PID: 145 Comm: kworker/u20:4 Tainted: G OE      6.0.0-rc5_mlnx #1
 [  329.235282] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [  329.236868] Workqueue: mlx5_cmd_0000:08:00.1 cmd_work_handler [mlx5_core]
 [  329.237886] RIP: 0010:_raw_spin_lock+0xc/0x20
 [  329.238585] Code: f0 0f b1 17 75 02 f3 c3 89 c6 e9 6f 3c 5f ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 f3 c3 89 c6 e9 45 3c 5f ff 0f 1f 44 00 00 0f 1f
 [  329.241156] RSP: 0018:ffffc900001b0e98 EFLAGS: 00010046
 [  329.241940] RAX: 0000000000000000 RBX: ffffffff82374ae0 RCX: 0000000000000000
 [  329.242954] RDX: 0000000000000001 RSI: 0000000000000014 RDI: 0000000000000000
 [  329.243974] RBP: ffff888106ccf000 R08: ffff8881004000c8 R09: ffff888100400000
 [  329.244990] R10: 0000000000000000 R11: ffffffff826669f8 R12: 0000000000002000
 [  329.246009] R13: 0000000000000005 R14: ffff888100aa7ce0 R15: ffff88852ca80000
 [  329.247030] FS:  0000000000000000(0000) GS:ffff88852ca80000(0000) knlGS:0000000000000000
 [  329.248260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  329.249111] CR2: 0000000000000000 CR3: 000000016d675001 CR4: 0000000000770ee0
 [  329.250133] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  329.251152] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  329.252176] PKRU: 55555554

Fixes: 0d4e8ed139d8 ("net/mlx5: Lag, avoid lockdep warnings")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 32c3e0a649a7..ad32b80e8501 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -228,6 +228,7 @@ static void mlx5_ldev_free(struct kref *ref)
 	if (ldev->nb.notifier_call)
 		unregister_netdevice_notifier_net(&init_net, &ldev->nb);
 	mlx5_lag_mp_cleanup(ldev);
+	cancel_delayed_work_sync(&ldev->bond_work);
 	destroy_workqueue(ldev->wq);
 	mlx5_lag_mpesw_cleanup(ldev);
 	mutex_destroy(&ldev->lock);
-- 
2.38.1



* Re: [net 04/12] net/mlx5: Avoid recovery in probe flows
  2022-12-28 19:43 ` [net 04/12] net/mlx5: Avoid recovery in probe flows Saeed Mahameed
@ 2022-12-29  6:33   ` Leon Romanovsky
  2022-12-29 18:29     ` Saeed Mahameed
  0 siblings, 1 reply; 17+ messages in thread
From: Leon Romanovsky @ 2022-12-29  6:33 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Wed, Dec 28, 2022 at 11:43:23AM -0800, Saeed Mahameed wrote:
> From: Shay Drory <shayd@nvidia.com>
> 
> Currently, recovery is done without considering whether the device is
> still in probe flow.
> This may lead to recovery before device have finished probed
> successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
> using functionality that is loaded only by mlx5_init_one(), and there
> is no point in running recovery without mlx5_init_one() finished
> successfully.
> 
> Fix it by waiting for probe flow to finish and checking whether the
> device is probed before trying to perform recovery.
> 
> Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> index 86ed87d704f7..96417c5feed7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> @@ -674,6 +674,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
>  	dev = container_of(priv, struct mlx5_core_dev, priv);
>  	devlink = priv_to_devlink(dev);
>  
> +	mutex_lock(&dev->intf_state_mutex);
> +	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
> +		mlx5_core_err(dev, "health works are not permitted at this stage\n");
> +		return;
> +	}

This bit is already checked when health recovery is queued in mlx5_trigger_health_work().

  void mlx5_trigger_health_work(struct mlx5_core_dev *dev)
  {
          struct mlx5_core_health *health = &dev->priv.health;
          unsigned long flags;

          spin_lock_irqsave(&health->wq_lock, flags);
          if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
                  queue_work(health->wq, &health->fatal_report_work);
          else
                  mlx5_core_err(dev, "new health works are not permitted at this stage\n");
          spin_unlock_irqrestore(&health->wq_lock, flags);
  }

You probably need to move this check up into the poll_health() routine
and change intf_state_mutex to a spinlock.

Another solution is to start health polling only when init is complete.

Thanks


> +	mutex_unlock(&dev->intf_state_mutex);
>  	enter_error_state(dev, false);
>  	if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) {
>  		devl_lock(devlink);
> -- 
> 2.38.1
> 


* Re: [net 04/12] net/mlx5: Avoid recovery in probe flows
  2022-12-29  6:33   ` Leon Romanovsky
@ 2022-12-29 18:29     ` Saeed Mahameed
  2023-01-01  6:52       ` Leon Romanovsky
  0 siblings, 1 reply; 17+ messages in thread
From: Saeed Mahameed @ 2022-12-29 18:29 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On 29 Dec 08:33, Leon Romanovsky wrote:
>On Wed, Dec 28, 2022 at 11:43:23AM -0800, Saeed Mahameed wrote:
>> From: Shay Drory <shayd@nvidia.com>
>>
>> Currently, recovery is done without considering whether the device is
>> still in probe flow.
>> This may lead to recovery before device have finished probed
>> successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
>> using functionality that is loaded only by mlx5_init_one(), and there
>> is no point in running recovery without mlx5_init_one() finished
>> successfully.
>>
>> Fix it by waiting for probe flow to finish and checking whether the
>> device is probed before trying to perform recovery.
>>
>> Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
>> Signed-off-by: Shay Drory <shayd@nvidia.com>
>> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
>> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
>> index 86ed87d704f7..96417c5feed7 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
>> @@ -674,6 +674,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
>>  	dev = container_of(priv, struct mlx5_core_dev, priv);
>>  	devlink = priv_to_devlink(dev);
>>
>> +	mutex_lock(&dev->intf_state_mutex);
>> +	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
>> +		mlx5_core_err(dev, "health works are not permitted at this stage\n");
>> +		return;
>> +	}
>
>This bit is already checked when health recovery is queued in mlx5_trigger_health_work().
>
>  void mlx5_trigger_health_work(struct mlx5_core_dev *dev)
>  {
>          struct mlx5_core_health *health = &dev->priv.health;
>          unsigned long flags;
>
>          spin_lock_irqsave(&health->wq_lock, flags);
>          if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
>                  queue_work(health->wq, &health->fatal_report_work);
>          else
>                  mlx5_core_err(dev, "new health works are not permitted at this stage\n");
>          spin_unlock_irqrestore(&health->wq_lock, flags);
>  }
>
>You probably need to elevate this check to poll_health() routine and
>change intf_state_mutex to be spinlock.

Not possible, that would be a big design change to the driver.

>
>Or another solution is to start health polling only when init complete.
>

Also very complex and very risky to do in an rc.
Health polling should be running during dynamic driver reloads,
for example devlink reload, but not on first probe.
If we start it only after probe, then we will have to stop (sync) any
health work before .remove, which is a locking nightmare; we've been
there before.



* Re: [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST
  2022-12-28 19:43 ` [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST Saeed Mahameed
@ 2022-12-30  7:40   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-12-30  7:40 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: davem, kuba, pabeni, edumazet, saeedm, netdev, tariqt, moshe,
	mbloch

Hello:

This series was applied to netdev/net.git (master)
by Saeed Mahameed <saeedm@nvidia.com>:

On Wed, 28 Dec 2022 11:43:20 -0800 you wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
> 
> Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already
> present in the frame. Previous VST mode behavior was to drop packets or
> override existing tag, depending on the device version.
> 
> In this patch we fix this behavior by correctly building the HW steering
> rule with a push vlan action, or for older devices we ask the FW to stack
> the vlan when a vlan is already present.
> 
> [...]
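For readers less familiar with VLAN stacking, here is a minimal userspace sketch of pushing an 802.1Q tag in front of an existing one, assuming an untruncated Ethernet frame in a buffer with spare room. The function and constant names are hypothetical; in the driver the push is done by a HW steering action or by the FW, not by code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN     6
#define SKETCH_VLAN_HLEN    4
#define SKETCH_ETH_P_8021Q  0x8100

/* Push an 802.1Q tag right after the MAC addresses. Any tag already
 * present (e.g. a guest tag) is shifted inward and kept, so the frame
 * ends up carrying both tags. DEI is left at 0.
 * Returns the new frame length, or 0 if the buffer is too small. */
static size_t sketch_push_vlan(uint8_t *frame, size_t len, size_t cap,
			       uint16_t vid, uint8_t pcp)
{
	const size_t off = 2 * SKETCH_ETH_ALEN;   /* after dst + src MAC */
	uint16_t tci = (uint16_t)(((pcp & 0x7) << 13) | (vid & 0x0fff));

	assert(frame != NULL);
	if (len < off + 2 || cap < len + SKETCH_VLAN_HLEN)
		return 0;

	/* Make room for the new tag; memmove handles the overlap. */
	memmove(frame + off + SKETCH_VLAN_HLEN, frame + off, len - off);
	frame[off + 0] = SKETCH_ETH_P_8021Q >> 8;
	frame[off + 1] = SKETCH_ETH_P_8021Q & 0xff;
	frame[off + 2] = (uint8_t)(tci >> 8);
	frame[off + 3] = (uint8_t)(tci & 0xff);
	return len + SKETCH_VLAN_HLEN;
}
```

Pushing VID 100 onto a frame that already carries guest VID 10 yields an outer 0x8100 tag with VID 100 followed by the original guest tag and EtherType.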

Here is the summary with links:
  - [net,01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST
    https://git.kernel.org/netdev/net/c/1f0ae22ab470
  - [net,02/12] net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
    https://git.kernel.org/netdev/net/c/2a35b2c2e6a2
  - [net,03/12] net/mlx5: Fix io_eq_size and event_eq_size params validation
    https://git.kernel.org/netdev/net/c/44aee8ea15ac
  - [net,04/12] net/mlx5: Avoid recovery in probe flows
    https://git.kernel.org/netdev/net/c/9078e843efec
  - [net,05/12] net/mlx5: Fix RoCE setting at HCA level
    https://git.kernel.org/netdev/net/c/c4ad5f2bdad5
  - [net,06/12] net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
    https://git.kernel.org/netdev/net/c/b12d581e83e3
  - [net,07/12] net/mlx5e: Fix RX reporter for XSK RQs
    https://git.kernel.org/netdev/net/c/f8c18a5749cf
  - [net,08/12] net/mlx5e: CT: Fix ct debugfs folder name
    https://git.kernel.org/netdev/net/c/849190e3e4cc
  - [net,09/12] net/mlx5e: Always clear dest encap in neigh-update-del
    https://git.kernel.org/netdev/net/c/2951b2e142ec
  - [net,10/12] net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
    https://git.kernel.org/netdev/net/c/1e267ab88dc4
  - [net,11/12] net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option
    https://git.kernel.org/netdev/net/c/e54638a8380b
  - [net,12/12] net/mlx5: Lag, fix failure to cancel delayed bond work
    https://git.kernel.org/netdev/net/c/4d1c1379d717

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [net 04/12] net/mlx5: Avoid recovery in probe flows
  2022-12-29 18:29     ` Saeed Mahameed
@ 2023-01-01  6:52       ` Leon Romanovsky
  0 siblings, 0 replies; 17+ messages in thread
From: Leon Romanovsky @ 2023-01-01  6:52 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Shay Drory, Moshe Shemesh

On Thu, Dec 29, 2022 at 10:29:58AM -0800, Saeed Mahameed wrote:
> On 29 Dec 08:33, Leon Romanovsky wrote:
> > On Wed, Dec 28, 2022 at 11:43:23AM -0800, Saeed Mahameed wrote:
> > > From: Shay Drory <shayd@nvidia.com>
> > > 
> > > Currently, recovery is done without considering whether the device is
> > > still in probe flow.
> > > This may lead to recovery before device have finished probed
> > > successfully. e.g.: while mlx5_init_one() is running. Recovery flow is
> > > using functionality that is loaded only by mlx5_init_one(), and there
> > > is no point in running recovery without mlx5_init_one() finished
> > > successfully.
> > > 
> > > Fix it by waiting for probe flow to finish and checking whether the
> > > device is probed before trying to perform recovery.
> > > 
> > > Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
> > > Signed-off-by: Shay Drory <shayd@nvidia.com>
> > > Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > index 86ed87d704f7..96417c5feed7 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > @@ -674,6 +674,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
> > >  	dev = container_of(priv, struct mlx5_core_dev, priv);
> > >  	devlink = priv_to_devlink(dev);
> > > 
> > > +	mutex_lock(&dev->intf_state_mutex);
> > > +	if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) {
> > > +		mlx5_core_err(dev, "health works are not permitted at this stage\n");
> > > +		return;
> > > +	}
> > 

<...>

> > Or another solution is to start health polling only when init complete.
> > 
> 
> Also very complex and very risky to do in rc.
> Health poll should be running on dynamic driver reloads,
> for example devlink reload, but not on first probe.. if we are going to
> start after probe then we will have to stop (sync) any
> health work before .remove, which is a locking nightmare.. we've been there
> before.

I am afraid that my proposed solution distracted you. The real issue is
that this patch can't be correct.

Let's focus on the MLX5_DROP_NEW_HEALTH_WORK bit. It is checked while
holding different locks, so one of the locks is wrong and not needed.

If the MLX5_DROP_NEW_HEALTH_WORK bit can't be changed during or after
queuing the work, the newly added check in
mlx5_fw_fatal_reporter_err_work() is redundant.

If the MLX5_DROP_NEW_HEALTH_WORK bit can be changed after queuing the
work, the check is racy: its result can change immediately after
intf_state_mutex is released.

Thanks
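One way to avoid the redundant-or-racy dilemma is to guard the drop flag and the queued work with a single lock, and to cancel pending work in the same critical section that sets the flag. Below is a minimal pthread sketch of that pattern, assuming a single work item; the names (sketch_health, sketch_trigger, sketch_drain, sketch_run_pending) are hypothetical and this is not the mlx5 implementation. A real kernel version would also need to wait for a work item that is already running, as cancel_work_sync() does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Flag and pending state share ONE lock, so they never disagree. */
struct sketch_health {
	pthread_mutex_t lock;
	bool drop_new_work;  /* models MLX5_DROP_NEW_HEALTH_WORK */
	bool work_pending;   /* models the queued fatal-report work */
};

/* Queue work only while new work is allowed; the lock is released on
 * every path, including the refusal path. */
static bool sketch_trigger(struct sketch_health *h)
{
	bool queued = false;

	pthread_mutex_lock(&h->lock);
	if (!h->drop_new_work) {
		h->work_pending = true;
		queued = true;
	}
	pthread_mutex_unlock(&h->lock);
	return queued;
}

/* Forbid new work and discard pending work in one critical section.
 * (Simplification: a real drain must also wait for running work.) */
static void sketch_drain(struct sketch_health *h)
{
	pthread_mutex_lock(&h->lock);
	h->drop_new_work = true;
	h->work_pending = false;
	pthread_mutex_unlock(&h->lock);
}

/* Take and "run" the pending work, if any; returns whether it ran. */
static bool sketch_run_pending(struct sketch_health *h)
{
	bool ran;

	pthread_mutex_lock(&h->lock);
	/* Invariant kept by the single lock: no work pending once dropped. */
	assert(!(h->drop_new_work && h->work_pending));
	ran = h->work_pending;
	h->work_pending = false;
	pthread_mutex_unlock(&h->lock);
	return ran;
}
```

Because the flag and the pending state change together, the worker never needs to re-check the flag, and every path through sketch_trigger() releases the lock.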


end of thread

Thread overview: 17+ messages
2022-12-28 19:43 [pull request][net 00/12] mlx5 fixes 2022-12-28 Saeed Mahameed
2022-12-28 19:43 ` [net 01/12] net/mlx5: E-Switch, properly handle ingress tagged packets on VST Saeed Mahameed
2022-12-30  7:40   ` patchwork-bot+netdevbpf
2022-12-28 19:43 ` [net 02/12] net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path Saeed Mahameed
2022-12-28 19:43 ` [net 03/12] net/mlx5: Fix io_eq_size and event_eq_size params validation Saeed Mahameed
2022-12-28 19:43 ` [net 04/12] net/mlx5: Avoid recovery in probe flows Saeed Mahameed
2022-12-29  6:33   ` Leon Romanovsky
2022-12-29 18:29     ` Saeed Mahameed
2023-01-01  6:52       ` Leon Romanovsky
2022-12-28 19:43 ` [net 05/12] net/mlx5: Fix RoCE setting at HCA level Saeed Mahameed
2022-12-28 19:43 ` [net 06/12] net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default Saeed Mahameed
2022-12-28 19:43 ` [net 07/12] net/mlx5e: Fix RX reporter for XSK RQs Saeed Mahameed
2022-12-28 19:43 ` [net 08/12] net/mlx5e: CT: Fix ct debugfs folder name Saeed Mahameed
2022-12-28 19:43 ` [net 09/12] net/mlx5e: Always clear dest encap in neigh-update-del Saeed Mahameed
2022-12-28 19:43 ` [net 10/12] net/mlx5e: Fix hw mtu initializing at XDP SQ allocation Saeed Mahameed
2022-12-28 19:43 ` [net 11/12] net/mlx5e: Set geneve_tlv_option_0_exist when matching on geneve option Saeed Mahameed
2022-12-28 19:43 ` [net 12/12] net/mlx5: Lag, fix failure to cancel delayed bond work Saeed Mahameed
