netdev.vger.kernel.org archive mirror
* [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28
@ 2017-06-30  7:12 Saeed Mahameed
  2017-06-30  7:12 ` [net 1/3] net/mlx5: Fix driver load error flow when firmware is stuck Saeed Mahameed
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Saeed Mahameed @ 2017-06-30  7:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Saeed Mahameed

This series contains some fixes for the mlx5 core and netdev driver.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix TX carrier errors report in get stats ndo") Kernels >= v4.7

("net/mlx5: Cancel delayed recovery work when unloading the driver") Kernels >= v4.10
* When applied to net-next, this will introduce a contextual conflict. It
should be easy to resolve (a spin_lock was changed to spin_lock_irqsave in net-next);
if you need any help with it, please let me know.

("net/mlx5: Fix driver load error flow when firmware is stuck") Kernels >= v4.4*
* This patch fixes 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface"),
which was submitted two weeks ago and queued up for -stable v4.4.

Sorry about the mess, but other than the above, this series doesn't introduce
any conflict with the current mlx5 IPSec offload series.

Thanks,
Saeed.

---

The following changes since commit d747a7a51b00984127a88113cdbbc26f91e9d815:

  tcp: reset sk_rx_dst in tcp_disconnect() (2017-06-25 12:23:07 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2017-06-28

for you to fetch changes up to 8ff93de7668bd81bc8efa819d1184ebd48fae72d:

  net/mlx5e: Fix TX carrier errors report in get stats ndo (2017-06-27 14:49:57 +0300)

----------------------------------------------------------------
mlx5-fixes-2017-06-28

----------------------------------------------------------------
Gal Pressman (2):
      net/mlx5: Fix driver load error flow when firmware is stuck
      net/mlx5e: Fix TX carrier errors report in get stats ndo

Mohamad Haj Yahia (1):
      net/mlx5: Cancel delayed recovery work when unloading the driver

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  2 --
 drivers/net/ethernet/mellanox/mlx5/core/health.c  | 15 ++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  4 ++--
 include/linux/mlx5/driver.h                       |  1 +
 4 files changed, 17 insertions(+), 5 deletions(-)


* [net 1/3] net/mlx5: Fix driver load error flow when firmware is stuck
  2017-06-30  7:12 [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 Saeed Mahameed
@ 2017-06-30  7:12 ` Saeed Mahameed
  2017-06-30  7:12 ` [net 2/3] net/mlx5: Cancel delayed recovery work when unloading the driver Saeed Mahameed
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Saeed Mahameed @ 2017-06-30  7:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Gal Pressman, Saeed Mahameed

From: Gal Pressman <galp@mellanox.com>

When waiting for firmware initialization fails, the previous code jumped to
the "out" label, which returns success, leaving the driver in an inconsistent
state. Jump to "out_err" instead so that the error is propagated.

Fixes: 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 13be264587f1..fd47b5134841 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1020,7 +1020,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	if (err) {
 		dev_err(&dev->pdev->dev, "Firmware over %d MS in pre-initializing state, aborting\n",
 			FW_PRE_INIT_TIMEOUT_MILI);
-		goto out;
+		goto out_err;
 	}
 
 	err = mlx5_cmd_init(dev);
-- 
2.11.0
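To make the error flow concrete, here is a simplified userspace sketch in C. It assumes, as the diff implies, that the `out` label in mlx5_load_one() is the success exit that returns 0 and that `out_err` marks the device as failed and returns `err`. The `fake_dev` type, its states, and the `wait_fw_init()` and `load_one()` stand-ins are hypothetical and only approximate the driver; returning -EBUSY on a stuck firmware is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device states, loosely modeled on the mlx5 driver. */
enum dev_state {
	DEV_STATE_DOWN,
	DEV_STATE_UP,
	DEV_STATE_INTERNAL_ERROR,
};

struct fake_dev {
	enum dev_state state;
	bool fw_ready;   /* simulates firmware leaving its pre-init state */
	bool cmd_ready;  /* set once the simulated command interface is up */
};

/* Stand-in for waiting on firmware: fails if the firmware is stuck. */
static int wait_fw_init(const struct fake_dev *dev)
{
	return dev->fw_ready ? 0 : -EBUSY;
}

/*
 * Two exits: "out" is the success path and always returns 0, while
 * "out_err" records the failure and propagates err. The bug fixed in
 * this patch was a "goto out" on the firmware-timeout path.
 */
static int load_one(struct fake_dev *dev)
{
	int err = 0;

	assert(dev != NULL);

	if (dev->state == DEV_STATE_UP)
		goto out;               /* already loaded: nothing to do */

	err = wait_fw_init(dev);
	if (err)
		goto out_err;           /* was "goto out" before the fix */

	dev->cmd_ready = true;      /* stands in for mlx5_cmd_init() and the rest */
	dev->state = DEV_STATE_UP;

out:
	return 0;

out_err:
	dev->state = DEV_STATE_INTERNAL_ERROR;
	return err;
}
```

With `fw_ready` false, load_one() returns the error and leaves the device in the error state; with the old `goto out`, the same call would have returned 0 although the command interface was never brought up.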


* [net 2/3] net/mlx5: Cancel delayed recovery work when unloading the driver
  2017-06-30  7:12 [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 Saeed Mahameed
  2017-06-30  7:12 ` [net 1/3] net/mlx5: Fix driver load error flow when firmware is stuck Saeed Mahameed
@ 2017-06-30  7:12 ` Saeed Mahameed
  2017-06-30  7:12 ` [net 3/3] net/mlx5e: Fix TX carrier errors report in get stats ndo Saeed Mahameed
  2017-07-01 21:17 ` [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Saeed Mahameed @ 2017-06-30  7:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Mohamad Haj Yahia, Moshe Shemesh, Saeed Mahameed

From: Mohamad Haj Yahia <mohamad@mellanox.com>

Draining the health workqueue makes the driver ignore all future health
works, including the one that reports a hardware failure, so the device can
no longer enter the error state. Instead, cancel only the recovery flow and
make sure that no new recovery work will be scheduled.

Fixes: 5e44fca50470 ("net/mlx5: Only cancel recovery work when cleaning up device")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/health.c | 15 ++++++++++++++-
 drivers/net/ethernet/mellanox/mlx5/core/main.c   |  2 +-
 include/linux/mlx5/driver.h                      |  1 +
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index f27f84ffbc85..8a8b5f0e497c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -67,6 +67,7 @@ enum {
 
 enum {
 	MLX5_DROP_NEW_HEALTH_WORK,
+	MLX5_DROP_NEW_RECOVERY_WORK,
 };
 
 static u8 get_nic_state(struct mlx5_core_dev *dev)
@@ -193,7 +194,7 @@ static void health_care(struct work_struct *work)
 	mlx5_handle_bad_state(dev);
 
 	spin_lock(&health->wq_lock);
-	if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
+	if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags))
 		schedule_delayed_work(&health->recover_work, recover_delay);
 	else
 		dev_err(&dev->pdev->dev,
@@ -313,6 +314,7 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev)
 	init_timer(&health->timer);
 	health->sick = 0;
 	clear_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
+	clear_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
 	health->health = &dev->iseg->health;
 	health->health_counter = &dev->iseg->health_counter;
 
@@ -335,11 +337,22 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
 
 	spin_lock(&health->wq_lock);
 	set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
+	set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
 	spin_unlock(&health->wq_lock);
 	cancel_delayed_work_sync(&health->recover_work);
 	cancel_work_sync(&health->work);
 }
 
+void mlx5_drain_health_recovery(struct mlx5_core_dev *dev)
+{
+	struct mlx5_core_health *health = &dev->priv.health;
+
+	spin_lock(&health->wq_lock);
+	set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
+	spin_unlock(&health->wq_lock);
+	cancel_delayed_work_sync(&dev->priv.health.recover_work);
+}
+
 void mlx5_health_cleanup(struct mlx5_core_dev *dev)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index fd47b5134841..524c16f72e83 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1228,7 +1228,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
 	int err = 0;
 
 	if (cleanup)
-		mlx5_drain_health_wq(dev);
+		mlx5_drain_health_recovery(dev);
 
 	mutex_lock(&dev->intf_state_mutex);
 	if (test_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state)) {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 93273d9ea4d1..ba260330ce5e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -925,6 +925,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev);
 void mlx5_start_health_poll(struct mlx5_core_dev *dev);
 void mlx5_stop_health_poll(struct mlx5_core_dev *dev);
 void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
+void mlx5_drain_health_recovery(struct mlx5_core_dev *dev);
 int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
 			struct mlx5_buf *buf, int node);
 int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_buf *buf);
-- 
2.11.0
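The flag logic can be modeled in userspace. This is a minimal sketch, assuming a pthread mutex in place of the kernel spinlock `health->wq_lock` and plain counters in place of the queued work items; the `health_sketch` type and all function names are hypothetical. Each queueing path checks its own drop bit under the lock, so setting only the recovery bit (the behavior of the new mlx5_drain_health_recovery()) still lets a hardware failure be reported while keeping recovery from being scheduled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical bits mirroring MLX5_DROP_NEW_HEALTH_WORK and
 * MLX5_DROP_NEW_RECOVERY_WORK. */
enum {
	DROP_NEW_HEALTH_WORK   = 1u << 0,
	DROP_NEW_RECOVERY_WORK = 1u << 1,
};

struct health_sketch {
	pthread_mutex_t wq_lock;   /* plays the role of health->wq_lock */
	unsigned int flags;
	int health_work_queued;    /* counters stand in for queued work items */
	int recovery_work_queued;
};

static void health_init(struct health_sketch *h)
{
	int rc = pthread_mutex_init(&h->wq_lock, NULL);

	assert(rc == 0);
	(void)rc;                  /* avoid an unused warning under NDEBUG */
	h->flags = 0;              /* cf. the clear_bit() calls at poll start */
	h->health_work_queued = 0;
	h->recovery_work_queued = 0;
}

/* Queue work of one kind unless its drop bit is set; checked under the lock. */
static bool queue_if_allowed(struct health_sketch *h, unsigned int drop_bit,
			     int *counter)
{
	bool queued = false;

	pthread_mutex_lock(&h->wq_lock);
	if (!(h->flags & drop_bit)) {
		(*counter)++;
		queued = true;
	}
	pthread_mutex_unlock(&h->wq_lock);
	return queued;
}

/* A hardware failure was detected: try to queue the health work. */
static bool trigger_health_work(struct health_sketch *h)
{
	return queue_if_allowed(h, DROP_NEW_HEALTH_WORK, &h->health_work_queued);
}

/* The health work wants to schedule recovery (cf. health_care()). */
static bool schedule_recovery(struct health_sketch *h)
{
	return queue_if_allowed(h, DROP_NEW_RECOVERY_WORK,
				&h->recovery_work_queued);
}

/* Set the drop bits under the lock, then "cancel" the pending work. */
static void drain(struct health_sketch *h, unsigned int bits)
{
	pthread_mutex_lock(&h->wq_lock);
	h->flags |= bits;
	pthread_mutex_unlock(&h->wq_lock);

	/* No new work of these kinds can be queued now; the driver cancels here. */
	if (bits & DROP_NEW_HEALTH_WORK)
		h->health_work_queued = 0;
	if (bits & DROP_NEW_RECOVERY_WORK)
		h->recovery_work_queued = 0;
}

/* Like mlx5_drain_health_wq(): drops both kinds (the pre-fix unload path). */
static void drain_health_wq(struct health_sketch *h)
{
	drain(h, DROP_NEW_HEALTH_WORK | DROP_NEW_RECOVERY_WORK);
}

/* Like mlx5_drain_health_recovery(): only recovery is stopped. */
static void drain_health_recovery(struct health_sketch *h)
{
	drain(h, DROP_NEW_RECOVERY_WORK);
}
```

Setting the bit under the lock before cancelling is what closes the race: once the lock is released, no path can queue new work of that kind, so the subsequent cancel (cancel_delayed_work_sync() in the driver) cannot be defeated by a concurrent re-queue.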


* [net 3/3] net/mlx5e: Fix TX carrier errors report in get stats ndo
  2017-06-30  7:12 [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 Saeed Mahameed
  2017-06-30  7:12 ` [net 1/3] net/mlx5: Fix driver load error flow when firmware is stuck Saeed Mahameed
  2017-06-30  7:12 ` [net 2/3] net/mlx5: Cancel delayed recovery work when unloading the driver Saeed Mahameed
@ 2017-06-30  7:12 ` Saeed Mahameed
  2017-07-01 21:17 ` [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: Saeed Mahameed @ 2017-06-30  7:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Gal Pressman, Saeed Mahameed

From: Gal Pressman <galp@mellanox.com>

The symbol-error-during-carrier counter from PPCNT was mistakenly reported as
TX carrier errors in the get_stats ndo, although it is an RX counter.

Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 277f4de30375..7819fe9ede22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3053,8 +3053,6 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		PPORT_802_3_GET(pstats, a_frame_check_sequence_errors);
 	stats->rx_frame_errors = PPORT_802_3_GET(pstats, a_alignment_errors);
 	stats->tx_aborted_errors = PPORT_2863_GET(pstats, if_out_discards);
-	stats->tx_carrier_errors =
-		PPORT_802_3_GET(pstats, a_symbol_error_during_carrier);
 	stats->rx_errors = stats->rx_length_errors + stats->rx_crc_errors +
 			   stats->rx_frame_errors;
 	stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors;
-- 
2.11.0


* Re: [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28
  2017-06-30  7:12 [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2017-06-30  7:12 ` [net 3/3] net/mlx5e: Fix TX carrier errors report in get stats ndo Saeed Mahameed
@ 2017-07-01 21:17 ` David Miller
  3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2017-07-01 21:17 UTC (permalink / raw)
  To: saeedm; +Cc: netdev

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 30 Jun 2017 10:12:26 +0300

> This series contains some fixes for the mlx5 core and netdev driver.
> 
> Please pull and let me know if there's any problem.
> 
> For -stable:
> ("net/mlx5e: Fix TX carrier errors report in get stats ndo") Kernels >= v4.7
> 
> ("net/mlx5: Cancel delayed recovery work when unloading the driver") Kernels >= v4.10
> * When applied to net-next, this will introduce a contextual conflict. It
> should be easy to resolve (a spin_lock was changed to spin_lock_irqsave in net-next);
> if you need any help with it, please let me know.
> 
> ("net/mlx5: Fix driver load error flow when firmware is stuck") Kernels >= v4.4*
> * This patch fixes 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface"),
> which was submitted two weeks ago and queued up for -stable v4.4.
> 
> Sorry about the mess, but other than the above, this series doesn't introduce
> any conflict with the current mlx5 IPSec offload series.

Pulled and queued up for -stable, thanks.

I should be able to resolve the merge conflicts, thanks for letting
me know about it.

