From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.linuxfoundation.org ([140.211.169.12]:38018 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759841AbdKPRuM (ORCPT ); Thu, 16 Nov 2017 12:50:12 -0500 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Moshe Shemesh , Saeed Mahameed , "David S. Miller" Subject: [PATCH 4.13 24/44] net/mlx5: Fix health work queue spin lock to IRQ safe Date: Thu, 16 Nov 2017 18:42:48 +0100 Message-Id: <20171116172824.732968599@linuxfoundation.org> In-Reply-To: <20171116172823.336649076@linuxfoundation.org> References: <20171116172823.336649076@linuxfoundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: stable-owner@vger.kernel.org List-ID: 4.13-stable review patch. If anyone has any objections, please let me know. ------------------ From: Moshe Shemesh [ Upstream commit 6377ed0bbae6fa28853e1679d068a9106c8a8908 ] spin_lock/unlock of health->wq_lock should be IRQ safe. It was changed to spin_lock_irqsave since adding commit 0179720d6be2 ("net/mlx5: Introduce trigger_health_work function") which uses spin_lock from asynchronous event (IRQ) context. Thus, all spin_lock/unlock of health->wq_lock should have been moved to IRQ safe mode. However, one occurrence on new code using this lock missed that change, resulting in possible deadlock: kernel: Possible unsafe locking scenario: kernel: CPU0 kernel: ---- kernel: lock(&(&health->wq_lock)->rlock); kernel: kernel: lock(&(&health->wq_lock)->rlock); kernel: #012 *** DEADLOCK *** Fixes: 2a0165a034ac ("net/mlx5: Cancel delayed recovery work when unloading the driver") Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -356,10 +356,11 @@ void mlx5_drain_health_wq(struct mlx5_co void mlx5_drain_health_recovery(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; + unsigned long flags; - spin_lock(&health->wq_lock); + spin_lock_irqsave(&health->wq_lock, flags); set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags); - spin_unlock(&health->wq_lock); + spin_unlock_irqrestore(&health->wq_lock, flags); cancel_delayed_work_sync(&dev->priv.health.recover_work); }