All of lore.kernel.org
 help / color / mirror / Atom feed
From: Saeed Mahameed <saeedm@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, Moshe Shemesh <moshe@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Shay Drory <shayd@nvidia.com>, Saeed Mahameed <saeedm@nvidia.com>
Subject: [net 14/15] net/mlx5: Avoid double clear or set of sync reset requested
Date: Wed,  4 May 2022 00:02:55 -0700	[thread overview]
Message-ID: <20220504070256.694458-15-saeedm@nvidia.com> (raw)
In-Reply-To: <20220504070256.694458-1-saeedm@nvidia.com>

From: Moshe Shemesh <moshe@nvidia.com>

Double clear of reset requested state can lead to NULL pointer as it
will try to delete the timer twice. This can happen for example on a
race between abort from FW and pci error or reset. Avoid such case using
test_and_clear_bit() to verify only one time reset requested state clear
flow. Similarly use test_and_set_bit() to verify only one time reset
requested state set flow.

Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/fw_reset.c    | 28 +++++++++++++------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index ec18d4ccbc11..ca1aba845dd6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -162,14 +162,19 @@ static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev)
 	del_timer_sync(&fw_reset->timer);
 }
 
-static void mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health)
+static int mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool poll_health)
 {
 	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
 
+	if (!test_and_clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) {
+		mlx5_core_warn(dev, "Reset request was already cleared\n");
+		return -EALREADY;
+	}
+
 	mlx5_stop_sync_reset_poll(dev);
-	clear_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags);
 	if (poll_health)
 		mlx5_start_health_poll(dev);
+	return 0;
 }
 
 static void mlx5_sync_reset_reload_work(struct work_struct *work)
@@ -229,13 +234,17 @@ static int mlx5_fw_reset_set_reset_sync_nack(struct mlx5_core_dev *dev)
 	return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false);
 }
 
-static void mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev)
+static int mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev)
 {
 	struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
 
+	if (test_and_set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags)) {
+		mlx5_core_warn(dev, "Reset request was already set\n");
+		return -EALREADY;
+	}
 	mlx5_stop_health_poll(dev, true);
-	set_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags);
 	mlx5_start_sync_reset_poll(dev);
+	return 0;
 }
 
 static void mlx5_fw_live_patch_event(struct work_struct *work)
@@ -264,7 +273,9 @@ static void mlx5_sync_reset_request_event(struct work_struct *work)
 			       err ? "Failed" : "Sent");
 		return;
 	}
-	mlx5_sync_reset_set_reset_requested(dev);
+	if (mlx5_sync_reset_set_reset_requested(dev))
+		return;
+
 	err = mlx5_fw_reset_set_reset_sync_ack(dev);
 	if (err)
 		mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack Failed. Error code: %d\n", err);
@@ -362,7 +373,8 @@ static void mlx5_sync_reset_now_event(struct work_struct *work)
 	struct mlx5_core_dev *dev = fw_reset->dev;
 	int err;
 
-	mlx5_sync_reset_clear_reset_requested(dev, false);
+	if (mlx5_sync_reset_clear_reset_requested(dev, false))
+		return;
 
 	mlx5_core_warn(dev, "Sync Reset now. Device is going to reset.\n");
 
@@ -391,10 +403,8 @@ static void mlx5_sync_reset_abort_event(struct work_struct *work)
 						      reset_abort_work);
 	struct mlx5_core_dev *dev = fw_reset->dev;
 
-	if (!test_bit(MLX5_FW_RESET_FLAGS_RESET_REQUESTED, &fw_reset->reset_flags))
+	if (mlx5_sync_reset_clear_reset_requested(dev, true))
 		return;
-
-	mlx5_sync_reset_clear_reset_requested(dev, true);
 	mlx5_core_warn(dev, "PCI Sync FW Update Reset Aborted.\n");
 }
 
-- 
2.35.1


  parent reply	other threads:[~2022-05-04  7:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04  7:02 [pull request][net 00/15] mlx5 fixes 2022-05-03 Saeed Mahameed
2022-05-04  7:02 ` [net 01/15] net/mlx5e: Fix wrong source vport matching on tunnel rule Saeed Mahameed
2022-05-04 10:10   ` patchwork-bot+netdevbpf
2022-05-04  7:02 ` [net 02/15] net/mlx5: Fix slab-out-of-bounds while reading resource dump menu Saeed Mahameed
2022-05-04  7:02 ` [net 03/15] net/mlx5e: Don't match double-vlan packets if cvlan is not set Saeed Mahameed
2022-05-04  7:02 ` [net 04/15] net/mlx5e: Fix the calling of update_buffer_lossy() API Saeed Mahameed
2022-05-04  7:02 ` [net 05/15] net/mlx5e: Lag, Fix use-after-free in fib event handler Saeed Mahameed
2022-05-04  7:02 ` [net 06/15] net/mlx5e: Lag, Fix fib_info pointer assignment Saeed Mahameed
2022-05-04  7:02 ` [net 07/15] net/mlx5e: Lag, Don't skip fib events on current dst Saeed Mahameed
2022-05-04  7:02 ` [net 08/15] net/mlx5e: TC, Fix ct_clear overwriting ct action metadata Saeed Mahameed
2022-05-04  7:02 ` [net 09/15] net/mlx5e: TC, fix decap fallback to uplink when int port not supported Saeed Mahameed
2022-05-04  7:02 ` [net 10/15] net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release Saeed Mahameed
2022-05-04  7:02 ` [net 11/15] net/mlx5e: Avoid checking offload capability in post_parse action Saeed Mahameed
2022-05-04  7:02 ` [net 12/15] net/mlx5e: Fix trust state reset in reload Saeed Mahameed
2022-05-04  7:02 ` [net 13/15] net/mlx5: Fix deadlock in sync reset flow Saeed Mahameed
2022-05-04  7:02 ` Saeed Mahameed [this message]
2022-05-04  7:02 ` [net 15/15] net/mlx5: Fix matching on inner TTC Saeed Mahameed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220504070256.694458-15-saeedm@nvidia.com \
    --to=saeedm@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=moshe@nvidia.com \
    --cc=msanalla@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=shayd@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.