* [net 1/3] net/mlx5: Fix driver load error flow when firmware is stuck
2017-06-30 7:12 [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28 Saeed Mahameed
@ 2017-06-30 7:12 ` Saeed Mahameed
2017-06-30 7:12 ` [net 2/3] net/mlx5: Cancel delayed recovery work when unloading the driver Saeed Mahameed
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Saeed Mahameed @ 2017-06-30 7:12 UTC
To: David S. Miller; +Cc: netdev, Gal Pressman, Saeed Mahameed
From: Gal Pressman <galp@mellanox.com>
When waiting for firmware initialization fails, the previous code would
mistakenly return success, leaving the driver in an inconsistent state.
Fixes: 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 13be264587f1..fd47b5134841 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1020,7 +1020,7 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
if (err) {
dev_err(&dev->pdev->dev, "Firmware over %d MS in pre-initializing state, aborting\n",
FW_PRE_INIT_TIMEOUT_MILI);
- goto out;
+ goto out_err;
}
err = mlx5_cmd_init(dev);
--
2.11.0
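For readers following the error flow, here is a minimal user-space sketch of
the bug class this patch fixes. It is not the mlx5 code: fake_dev,
fake_wait_fw_init and fake_load_one are hypothetical names, and the layout of
the two exit labels is an assumption made for illustration. It only shows how
jumping to a success label discards a non-zero err.

```c
#include <errno.h>

/* Hypothetical stand-in for the device; not the real mlx5 structures. */
struct fake_dev {
	int fw_stuck;	/* simulate firmware that never leaves pre-init */
	int in_error;	/* set when the load path records a failure */
};

/* Pretend to poll the firmware; fail with a timeout if it is stuck. */
static int fake_wait_fw_init(const struct fake_dev *dev)
{
	return dev->fw_stuck ? -ETIMEDOUT : 0;
}

/*
 * use_fix == 0 mimics the old "goto out",
 * use_fix != 0 mimics the new "goto out_err".
 */
static int fake_load_one(struct fake_dev *dev, int use_fix)
{
	int err;

	err = fake_wait_fw_init(dev);
	if (err) {
		if (use_fix)
			goto out_err;
		goto out;
	}

	/* ... the remaining init steps would go here ... */

out:
	return 0;	/* success label: any earlier err is discarded */

out_err:
	dev->in_error = 1;
	return err;	/* failure label: err reaches the caller */
}
```

With fw_stuck set, the old flow returns 0 and leaves in_error clear, while the
new flow returns -ETIMEDOUT and marks the device as failed.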
* [net 2/3] net/mlx5: Cancel delayed recovery work when unloading the driver
From: Saeed Mahameed @ 2017-06-30 7:12 UTC
To: David S. Miller; +Cc: netdev, Mohamad Haj Yahia, Moshe Shemesh, Saeed Mahameed
From: Mohamad Haj Yahia <mohamad@mellanox.com>
Draining the health workqueue causes all future health work to be dropped,
including the work that reports a hardware failure, so the driver can never
enter the error state. Instead, cancel only the recovery flow and make sure
that only new recovery work is prevented from being scheduled.
Fixes: 5e44fca50470 ('net/mlx5: Only cancel recovery work when cleaning up device')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/health.c | 15 ++++++++++++++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 +-
include/linux/mlx5/driver.h | 1 +
3 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index f27f84ffbc85..8a8b5f0e497c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -67,6 +67,7 @@ enum {
enum {
MLX5_DROP_NEW_HEALTH_WORK,
+ MLX5_DROP_NEW_RECOVERY_WORK,
};
static u8 get_nic_state(struct mlx5_core_dev *dev)
@@ -193,7 +194,7 @@ static void health_care(struct work_struct *work)
mlx5_handle_bad_state(dev);
spin_lock(&health->wq_lock);
- if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags))
+ if (!test_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags))
schedule_delayed_work(&health->recover_work, recover_delay);
else
dev_err(&dev->pdev->dev,
@@ -313,6 +314,7 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev)
init_timer(&health->timer);
health->sick = 0;
clear_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
+ clear_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
health->health = &dev->iseg->health;
health->health_counter = &dev->iseg->health_counter;
@@ -335,11 +337,22 @@ void mlx5_drain_health_wq(struct mlx5_core_dev *dev)
spin_lock(&health->wq_lock);
set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
+ set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
spin_unlock(&health->wq_lock);
cancel_delayed_work_sync(&health->recover_work);
cancel_work_sync(&health->work);
}
+void mlx5_drain_health_recovery(struct mlx5_core_dev *dev)
+{
+ struct mlx5_core_health *health = &dev->priv.health;
+
+ spin_lock(&health->wq_lock);
+ set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
+ spin_unlock(&health->wq_lock);
+ cancel_delayed_work_sync(&dev->priv.health.recover_work);
+}
+
void mlx5_health_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_core_health *health = &dev->priv.health;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index fd47b5134841..524c16f72e83 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1228,7 +1228,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
int err = 0;
if (cleanup)
- mlx5_drain_health_wq(dev);
+ mlx5_drain_health_recovery(dev);
mutex_lock(&dev->intf_state_mutex);
if (test_bit(MLX5_INTERFACE_STATE_DOWN, &dev->intf_state)) {
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 93273d9ea4d1..ba260330ce5e 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -925,6 +925,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev);
void mlx5_start_health_poll(struct mlx5_core_dev *dev);
void mlx5_stop_health_poll(struct mlx5_core_dev *dev);
void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
+void mlx5_drain_health_recovery(struct mlx5_core_dev *dev);
int mlx5_buf_alloc_node(struct mlx5_core_dev *dev, int size,
struct mlx5_buf *buf, int node);
int mlx5_buf_alloc(struct mlx5_core_dev *dev, int size, struct mlx5_buf *buf);
--
2.11.0
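The split between the two drop flags can be sketched in user space as follows.
This is an illustration under stated assumptions, not the driver code: the
fake_* names are hypothetical, the flags are a plain bitmask instead of kernel
bitops, and the wq_lock spinlock and the actual work cancellation are left out.

```c
#include <stdbool.h>

/* Hypothetical flag bits, mirroring the two enum values in the patch. */
enum {
	FAKE_DROP_NEW_HEALTH_WORK   = 1 << 0,
	FAKE_DROP_NEW_RECOVERY_WORK = 1 << 1,
};

struct fake_health {
	unsigned int flags;
};

/* Unload path after the patch: block only new recovery work. */
static void fake_drain_recovery(struct fake_health *h)
{
	h->flags |= FAKE_DROP_NEW_RECOVERY_WORK;
}

/* Full drain: block both health work and recovery work. */
static void fake_drain_wq(struct fake_health *h)
{
	h->flags |= FAKE_DROP_NEW_HEALTH_WORK | FAKE_DROP_NEW_RECOVERY_WORK;
}

/* Health work (for example a hardware failure report) may still be queued? */
static bool fake_may_queue_health(const struct fake_health *h)
{
	return !(h->flags & FAKE_DROP_NEW_HEALTH_WORK);
}

/* Delayed recovery work may still be scheduled? */
static bool fake_may_queue_recovery(const struct fake_health *h)
{
	return !(h->flags & FAKE_DROP_NEW_RECOVERY_WORK);
}
```

After fake_drain_recovery() health work can still be queued, so a failure
report is not lost, while recovery is blocked. After fake_drain_wq() both are
blocked.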
* [net 3/3] net/mlx5e: Fix TX carrier errors report in get stats ndo
From: Saeed Mahameed @ 2017-06-30 7:12 UTC
To: David S. Miller; +Cc: netdev, Gal Pressman, Saeed Mahameed
From: Gal Pressman <galp@mellanox.com>
The a_symbol_error_during_carrier counter from PPCNT was mistakenly reported
as tx_carrier_errors in the get_stats ndo, although it is an RX counter.
Fixes: 269e6b3af3bf ("net/mlx5e: Report additional error statistics in get stats ndo")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 277f4de30375..7819fe9ede22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3053,8 +3053,6 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
PPORT_802_3_GET(pstats, a_frame_check_sequence_errors);
stats->rx_frame_errors = PPORT_802_3_GET(pstats, a_alignment_errors);
stats->tx_aborted_errors = PPORT_2863_GET(pstats, if_out_discards);
- stats->tx_carrier_errors =
- PPORT_802_3_GET(pstats, a_symbol_error_during_carrier);
stats->rx_errors = stats->rx_length_errors + stats->rx_crc_errors +
stats->rx_frame_errors;
stats->tx_errors = stats->tx_aborted_errors + stats->tx_carrier_errors;
--
2.11.0
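The effect on the aggregated tx_errors value can be sketched like this. The
fake_* types are hypothetical and carry only the fields needed here, and the
sketch assumes the stats structure starts out zeroed, so a field that is no
longer assigned reads as 0.

```c
#include <stdint.h>

/* Hypothetical subset of the port counters read from PPCNT. */
struct fake_pport {
	uint64_t if_out_discards;		/* TX side */
	uint64_t a_symbol_error_during_carrier;	/* RX side */
};

/* Hypothetical subset of the reported link statistics. */
struct fake_stats {
	uint64_t tx_aborted_errors;
	uint64_t tx_carrier_errors;
	uint64_t tx_errors;
};

/*
 * old_behaviour != 0 copies the RX symbol-error counter into
 * tx_carrier_errors, as the code did before the patch;
 * old_behaviour == 0 leaves tx_carrier_errors at its zero-initialized value.
 */
static struct fake_stats fake_get_stats(const struct fake_pport *p,
					int old_behaviour)
{
	struct fake_stats s = { 0 };

	s.tx_aborted_errors = p->if_out_discards;
	if (old_behaviour)
		s.tx_carrier_errors = p->a_symbol_error_during_carrier;
	s.tx_errors = s.tx_aborted_errors + s.tx_carrier_errors;
	return s;
}
```

With 3 output discards and 5 symbol errors, the old behaviour reports
tx_errors as 8, and the patched behaviour reports 3.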
* Re: [pull request][net 0/3] Mellanox, mlx5 fixes 2017-06-28
From: David Miller @ 2017-07-01 21:17 UTC
To: saeedm; +Cc: netdev
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 30 Jun 2017 10:12:26 +0300
> This series contains some fixes for the mlx5 core and netdev driver.
>
> Please pull and let me know if there's any problem.
>
> For -stable:
> ("net/mlx5e: Fix TX carrier errors report in get stats ndo") Kernels >= v4.7
>
> ("net/mlx5: Cancel delayed recovery work when unloading the driver") Kernels >= v4.10
> * When applied to net-next, this will introduce a contextual conflict that
> should be easy to resolve (a spin_lock was changed to spin_lock_irqsave in
> net-next). If you need any help with this, please let me know.
>
> ("net/mlx5: Fix driver load error flow when firmware is stuck") Kernels >= v4.4*
> * This patch fixes: 6c780a0267b8 ("net/mlx5: Wait for FW readiness before initializing command interface")
> which was submitted two weeks ago and queued up for v4.4.
>
> Sorry about the mess, but other than the above, this series doesn't introduce
> any conflict with the current mlx5 IPSec offload series.
Pulled and queued up for -stable, thanks.
I should be able to resolve the merge conflicts, thanks for letting
me know about it.