Linux SCSI subsystem development
From: Justin Tee <justintee8345@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: jsmart833426@gmail.com, justin.tee@broadcom.com,
	Justin Tee <justintee8345@gmail.com>
Subject: [PATCH 07/14] lpfc: Rework I/O flush ordering when unloading driver
Date: Thu,  4 Jun 2026 12:29:30 -0700
Message-ID: <20260604192937.65605-8-justintee8345@gmail.com>
In-Reply-To: <20260604192937.65605-1-justintee8345@gmail.com>

The lpfc_els_abort routine has a code path that cancels outstanding
I/Os on the ELS ring when attempted aborts fail.  The failed aborts are
queued to a drv_cmpl_list and then cancelled after the ELS pring->txcmplq
is fully traversed.  However, if the abort attempt returns IOCB_ABORTING,
an abort is already in progress for that iocb and the driver should not
cancel it.  Doing so starts two threads working on the same iocb and ndlp,
which can race.

Fix by checking for the IOCB_ABORTING return value in lpfc_els_abort and
not adding such an iocb to the list of iocbs to cancel.  The iocb already
scheduled for abort is allowed to complete naturally.  This avoids
simultaneous threads acting on the same iocb and ndlp objects.

The lpfc_free_iocb_list call is moved from lpfc_pci_remove_one_s4 to the
end of lpfc_sli4_hba_unset, so that I/O is flushed before the iocb list is
freed.  Also, a call to flush phba->wq is added in lpfc_pci_remove_one_s4
before lpfc_sli4_hba_unset.  This makes the unload logic consistent with
the offline handling logic.

Signed-off-by: Justin Tee <justintee8345@gmail.com>
---
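Note for reviewers (not part of the commit message): the cancel decision
described above can be sketched as a standalone predicate.  This is only
an illustrative model under stated assumptions, not driver code.  The enum
values and the sketch_should_cancel() helper are hypothetical stand-ins;
the real return codes and the FC_UNLOADING test live in the lpfc sources.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the abort return codes; the numeric values
 * used by the real driver are not reproduced here.
 */
enum sketch_abort_rc {
	SKETCH_IOCB_SUCCESS = 0,
	SKETCH_IOCB_ERROR,
	SKETCH_IOCB_ABORTING,
};

/* Return true when a failed abort should be moved to the driver
 * completion list and cancelled.  An iocb whose abort is already in
 * progress is left alone so that it completes on its own.
 */
static bool sketch_should_cancel(bool unloading, enum sketch_abort_rc rc)
{
	if (!unloading)
		return false;	/* only cancel while the driver unloads */
	if (rc == SKETCH_IOCB_SUCCESS)
		return false;	/* abort was issued, nothing to cancel */
	if (rc == SKETCH_IOCB_ABORTING)
		return false;	/* already being aborted, do not cancel */
	return true;
}
```

With this model, only a failed abort that is not already in flight is
cancelled, and only during unload, which mirrors the condition changed in
lpfc_els_abort below.
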
 drivers/scsi/lpfc/lpfc_init.c      | 16 ++++++++++++++--
 drivers/scsi/lpfc/lpfc_nportdisc.c | 11 +++++++++--
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 968a25235a2d..44f213f42347 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13515,6 +13515,9 @@ lpfc_sli4_hba_unset(struct lpfc_hba *phba)
 	/* Stop the SLI4 device port */
 	if (phba->pport)
 		phba->pport->work_port_events = 0;
+
+	/* All IO completed and queues released. Free the IOCBs. */
+	lpfc_free_iocb_list(phba);
 }
 
 /*
@@ -14949,11 +14952,20 @@ lpfc_pci_remove_one_s4(struct pci_dev *pdev)
 
 	/* Perform scsi free before driver resource_unset since scsi
 	 * buffers are released to their corresponding pools here.
+	 * lpfc_sli4_hba_unset() issues aborts via lpfc_sli_hba_iocb_abort(),
+	 * which allocates abort IOCBs from phba->lpfc_iocb_list; the pool
+	 * must still exist, so lpfc_free_iocb_list() runs only after unset.
 	 */
 	lpfc_io_free(phba);
-	lpfc_free_iocb_list(phba);
-	lpfc_sli4_hba_unset(phba);
 
+	/* Flush the PHBA WQ - there could be a race with ELS IOs while lpfc
+	 * is unloading.  This stops a race between completions, aborts and
+	 * resource recovery.
+	 */
+	if (phba->wq)
+		flush_workqueue(phba->wq);
+
+	lpfc_sli4_hba_unset(phba);
 	lpfc_unset_driver_resource_phase2(phba);
 	lpfc_sli4_driver_resource_unset(phba);
 
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index 2c8d995a45bf..f917a5bcfd02 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -255,8 +255,9 @@ lpfc_els_abort(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
 	spin_lock_irq(&phba->hbalock);
 	if (phba->sli_rev == LPFC_SLI_REV4)
 		spin_lock(&pring->ring_lock);
+
 	list_for_each_entry_safe(iocb, next_iocb, &pring->txcmplq, list) {
-	/* Add to abort_list on on NDLP match. */
+		/* Add to abort_list on NDLP match. */
 		if (lpfc_check_sli_ndlp(phba, pring, iocb, ndlp))
 			list_add_tail(&iocb->dlist, &abort_list);
 	}
@@ -271,7 +272,13 @@ lpfc_els_abort(struct lpfc_hba *phba, struct lpfc_nodelist *ndlp)
 		retval = lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
 		spin_unlock_irq(&phba->hbalock);
 
-		if (retval && test_bit(FC_UNLOADING, &phba->pport->load_flag)) {
+		/* An abort that fails here is just cancelled when the driver is
+		 * going offline.  However, if the abort failure is because the
+		 * IOCB is already getting aborted, don't cancel.  Just let it
+		 * complete.
+		 */
+		if (test_bit(FC_UNLOADING, &phba->pport->load_flag) &&
+		    retval && retval != IOCB_ABORTING) {
 			list_del_init(&iocb->list);
 			list_add_tail(&iocb->list, &drv_cmpl_list);
 		}
-- 
2.38.0


Thread overview:
2026-06-04 19:29 [PATCH 00/14] Update lpfc to revision 15.0.0.1 Justin Tee
2026-06-04 19:29 ` [PATCH 01/14] lpfc: Fix use-after-free in lpfc_cmpl_ct_cmd_vmid Justin Tee
2026-06-04 19:29 ` [PATCH 02/14] lpfc: Early return out of lpfc_els_abort when HBA_SETUP flag is not set Justin Tee
2026-06-04 19:29 ` [PATCH 03/14] lpfc: Fix kernel oops when unmapping scsi dma buffers for an aborted cmd Justin Tee
2026-06-04 19:29 ` [PATCH 04/14] lpfc: Check fc4_xpt_flags before decrementing ndlp kref on FDISC error Justin Tee
2026-06-04 19:29 ` [PATCH 05/14] lpfc: Add handling for when PLOGI or PRLI is dropped during link failure Justin Tee
2026-06-04 19:29 ` [PATCH 06/14] lpfc: Fix ndlp use-after-free during repeated RSCN and rediscovery sequence Justin Tee
2026-06-04 19:29 ` Justin Tee [this message]
2026-06-04 19:29 ` [PATCH 08/14] lpfc: Improve PLOGI retry handling for large SAN configurations Justin Tee
2026-06-04 19:29 ` [PATCH 09/14] lpfc: Send inhibited ABORT_WQE when PLOGI CQE SEQUENCE_TMO is received Justin Tee
2026-06-04 19:29 ` [PATCH 10/14] lpfc: Remove slowpath cqe process limiter in slow ring event handler Justin Tee
2026-06-04 19:29 ` [PATCH 11/14] lpfc: Put iocbq on phba->txq when ELS WQ is full or ELS SGL unavailable Justin Tee
2026-06-04 19:29 ` [PATCH 12/14] lpfc: Update ELS ACC logging for diagnostic troubleshooting Justin Tee
2026-06-04 19:29 ` [PATCH 13/14] lpfc: Refactor calls on fc_disctmo to lpfc_set_disctmo in RSCN handler Justin Tee
2026-06-04 19:29 ` [PATCH 14/14] lpfc: Update lpfc version to 15.0.0.1 Justin Tee
