linux-scsi.vger.kernel.org archive mirror
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 06/26] lpfc: Fix SCSI I/O completion and abort handler deadlock
Date: Tue, 12 Apr 2022 15:19:48 -0700
Message-ID: <20220412222008.126521-7-jsmart2021@gmail.com>
In-Reply-To: <20220412222008.126521-1-jsmart2021@gmail.com>

During stress I/O tests with 500+ vports, hard LOCKUP call traces are
observed.

CPU A:
 native_queued_spin_lock_slowpath+0x192
 _raw_spin_lock_irqsave+0x32
 lpfc_handle_fcp_err+0x4c6
 lpfc_fcp_io_cmd_wqe_cmpl+0x964
 lpfc_sli4_fp_handle_cqe+0x266
 __lpfc_sli4_process_cq+0x105
 __lpfc_sli4_hba_process_cq+0x3c
 lpfc_cq_poll_hdler+0x16
 irq_poll_softirq+0x76
 __softirqentry_text_start+0xe4
 irq_exit+0xf7
 do_IRQ+0x7f

CPU B:
 native_queued_spin_lock_slowpath+0x5b
 _raw_spin_lock+0x1c
 lpfc_abort_handler+0x13e
 scmd_eh_abort_handler+0x85
 process_one_work+0x1a7
 worker_thread+0x30
 kthread+0x112
 ret_from_fork+0x1f

Diagram of lockup:

CPUA                            CPUB
----                            ----
lpfc_cmd->buf_lock
                            phba->hbalock
                            lpfc_cmd->buf_lock
phba->hbalock

Fix by reordering the acquisition of lpfc_cmd->buf_lock and phba->hbalock
in the lpfc_abort_handler() routine so that it takes lpfc_cmd->buf_lock
first and phba->hbalock second, matching the order used on the I/O
completion path.
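
For illustration only, the lock-ordering rule behind this fix can be sketched
in user space. This is not driver code: it uses pthread mutexes instead of
kernel spinlocks with IRQ save/restore, and the names completion_path(),
abort_path() and run_both() are hypothetical. The two mutexes merely stand in
for lpfc_cmd->buf_lock and phba->hbalock. Because both threads take the locks
in the same order, neither can hold one lock while waiting on a thread that
holds the other.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for lpfc_cmd->buf_lock and phba->hbalock. */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hbalock = PTHREAD_MUTEX_INITIALIZER;

/* Counters, only modified while both locks are held. */
static int completions;
static int aborts;

/* Completion-like path: buf_lock first, then hbalock. */
static void *completion_path(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&buf_lock);
		pthread_mutex_lock(&hbalock);
		completions++;
		pthread_mutex_unlock(&hbalock);
		pthread_mutex_unlock(&buf_lock);
	}
	return NULL;
}

/*
 * Abort-like path with the corrected ordering: buf_lock first, then
 * hbalock. Taking hbalock first here would recreate the ABBA pattern
 * shown in the diagram above.
 */
static void *abort_path(void *arg)
{
	int i, n = *(int *)arg;

	for (i = 0; i < n; i++) {
		pthread_mutex_lock(&buf_lock);
		pthread_mutex_lock(&hbalock);
		aborts++;
		pthread_mutex_unlock(&hbalock);
		pthread_mutex_unlock(&buf_lock);
	}
	return NULL;
}

/*
 * Run both paths concurrently and return the total number of critical
 * sections executed, or -1 if a thread could not be created.
 */
static int run_both(int iterations)
{
	pthread_t a, b;

	assert(iterations >= 0);
	completions = 0;
	aborts = 0;

	if (pthread_create(&a, NULL, completion_path, &iterations))
		return -1;
	if (pthread_create(&b, NULL, abort_path, &iterations)) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return completions + aborts;
}
```

Calling run_both(n) from a test program should return 2 * n once both threads
finish. Swapping the two pthread_mutex_lock() calls in abort_path() is one way
to reproduce the hang in this sketch, although whether and when it hangs
depends on scheduling.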

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_scsi.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index ae340850d94f..c3daf7a3e123 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5865,25 +5865,25 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 	if (!lpfc_cmd)
 		return ret;
 
-	spin_lock_irqsave(&phba->hbalock, flags);
+	/* Guard against IO completion being called at same time */
+	spin_lock_irqsave(&lpfc_cmd->buf_lock, flags);
+
+	spin_lock(&phba->hbalock);
 	/* driver queued commands are in process of being flushed */
 	if (phba->hba_flag & HBA_IOQ_FLUSH) {
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			"3168 SCSI Layer abort requested I/O has been "
 			"flushed by LLD.\n");
 		ret = FAILED;
-		goto out_unlock;
+		goto out_unlock_hba;
 	}
 
-	/* Guard against IO completion being called at same time */
-	spin_lock(&lpfc_cmd->buf_lock);
-
 	if (!lpfc_cmd->pCmd) {
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			 "2873 SCSI Layer I/O Abort Request IO CMPL Status "
 			 "x%x ID %d LUN %llu\n",
 			 SUCCESS, cmnd->device->id, cmnd->device->lun);
-		goto out_unlock_buf;
+		goto out_unlock_hba;
 	}
 
 	iocb = &lpfc_cmd->cur_iocbq;
@@ -5891,7 +5891,7 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 		pring_s4 = phba->sli4_hba.hdwq[iocb->hba_wqidx].io_wq->pring;
 		if (!pring_s4) {
 			ret = FAILED;
-			goto out_unlock_buf;
+			goto out_unlock_hba;
 		}
 		spin_lock(&pring_s4->ring_lock);
 	}
@@ -5924,8 +5924,8 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 			 "3389 SCSI Layer I/O Abort Request is pending\n");
 		if (phba->sli_rev == LPFC_SLI_REV4)
 			spin_unlock(&pring_s4->ring_lock);
-		spin_unlock(&lpfc_cmd->buf_lock);
-		spin_unlock_irqrestore(&phba->hbalock, flags);
+		spin_unlock(&phba->hbalock);
+		spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 		goto wait_for_cmpl;
 	}
 
@@ -5946,15 +5946,13 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 	if (ret_val != IOCB_SUCCESS) {
 		/* Indicate the IO is not being aborted by the driver. */
 		lpfc_cmd->waitq = NULL;
-		spin_unlock(&lpfc_cmd->buf_lock);
-		spin_unlock_irqrestore(&phba->hbalock, flags);
 		ret = FAILED;
-		goto out;
+		goto out_unlock_hba;
 	}
 
 	/* no longer need the lock after this point */
-	spin_unlock(&lpfc_cmd->buf_lock);
-	spin_unlock_irqrestore(&phba->hbalock, flags);
+	spin_unlock(&phba->hbalock);
+	spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 
 	if (phba->cfg_poll & DISABLE_FCP_RING_INT)
 		lpfc_sli_handle_fast_ring_event(phba,
@@ -5989,10 +5987,9 @@ lpfc_abort_handler(struct scsi_cmnd *cmnd)
 out_unlock_ring:
 	if (phba->sli_rev == LPFC_SLI_REV4)
 		spin_unlock(&pring_s4->ring_lock);
-out_unlock_buf:
-	spin_unlock(&lpfc_cmd->buf_lock);
-out_unlock:
-	spin_unlock_irqrestore(&phba->hbalock, flags);
+out_unlock_hba:
+	spin_unlock(&phba->hbalock);
+	spin_unlock_irqrestore(&lpfc_cmd->buf_lock, flags);
 out:
 	lpfc_printf_vlog(vport, KERN_WARNING, LOG_FCP,
 			 "0749 SCSI Layer I/O Abort Request Status x%x ID %d "
-- 
2.26.2

