* [PATCH AUTOSEL 6.6 53/58] scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd()
[not found] <20241004182503.3672477-1-sashal@kernel.org>
@ 2024-10-04 18:24 ` Sasha Levin
2024-10-04 18:24 ` [PATCH AUTOSEL 6.6 54/58] scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2024-10-04 18:24 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Justin Tee, Martin K . Petersen, Sasha Levin, james.smart,
dick.kennedy, James.Bottomley, linux-scsi
From: Justin Tee <justin.tee@broadcom.com>
[ Upstream commit 93bcc5f3984bf4f51da1529700aec351872dbfff ]
During HBA stress testing, a spam of received PLOGIs exposes a resource
recovery bug causing leakage of lpfc_sqlq entries from the global
phba->sli4_hba.lpfc_els_sgl_list.
The issue is in lpfc_els_flush_cmd(), where the driver attempts to recover
outstanding ELS sgls when walking the txcmplq. Only CMD_ELS_REQUEST64_CRs
and CMD_GEN_REQUEST64_CRs are added to the abort and cancel lists. A check
for CMD_XMIT_ELS_RSP64_WQE is missing in order to recover LS_ACC usages of
the phba->sli4_hba.lpfc_els_sgl_list too.
Fix by adding CMD_XMIT_ELS_RSP64_WQE as part of the txcmplq walk when
adding WQEs to the abort and cancel list in lpfc_els_flush_cmd(). Also,
update naming convention from CRs to WQEs.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/scsi/lpfc/lpfc_els.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 44d3ada9fbbcb..739508e0a45f3 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -9644,11 +9644,12 @@ lpfc_els_flush_cmd(struct lpfc_vport *vport)
if (piocb->cmd_flag & LPFC_DRIVER_ABORTED && !mbx_tmo_err)
continue;
- /* On the ELS ring we can have ELS_REQUESTs or
- * GEN_REQUESTs waiting for a response.
+ /* On the ELS ring we can have ELS_REQUESTs, ELS_RSPs,
+ * or GEN_REQUESTs waiting for a CQE response.
*/
ulp_command = get_job_cmnd(phba, piocb);
- if (ulp_command == CMD_ELS_REQUEST64_CR) {
+ if (ulp_command == CMD_ELS_REQUEST64_WQE ||
+ ulp_command == CMD_XMIT_ELS_RSP64_WQE) {
list_add_tail(&piocb->dlist, &abort_list);
/* If the link is down when flushing ELS commands
--
2.43.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH AUTOSEL 6.6 54/58] scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance
[not found] <20241004182503.3672477-1-sashal@kernel.org>
2024-10-04 18:24 ` [PATCH AUTOSEL 6.6 53/58] scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd() Sasha Levin
@ 2024-10-04 18:24 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2024-10-04 18:24 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Justin Tee, Martin K . Petersen, Sasha Levin, james.smart,
dick.kennedy, James.Bottomley, linux-scsi
From: Justin Tee <justin.tee@broadcom.com>
[ Upstream commit 0a3c84f71680684c1d41abb92db05f95c09111e8 ]
Deleting an NPIV instance requires all fabric ndlps to be released before
an NPIV's resources can be torn down. Failure to release fabric ndlps
beforehand opens kref imbalance race conditions. Fix by forcing the DA_ID
to complete synchronously with usage of wait_queue.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20240912232447.45607-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/scsi/lpfc/lpfc_ct.c | 12 ++++++++++
drivers/scsi/lpfc/lpfc_disc.h | 7 ++++++
drivers/scsi/lpfc/lpfc_vport.c | 43 ++++++++++++++++++++++++++++------
3 files changed, 55 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_ct.c b/drivers/scsi/lpfc/lpfc_ct.c
index baae1f8279e0c..1775115239860 100644
--- a/drivers/scsi/lpfc/lpfc_ct.c
+++ b/drivers/scsi/lpfc/lpfc_ct.c
@@ -1671,6 +1671,18 @@ lpfc_cmpl_ct(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
}
out:
+ /* If the caller wanted a synchronous DA_ID completion, signal the
+ * wait obj and clear flag to reset the vport.
+ */
+ if (ndlp->save_flags & NLP_WAIT_FOR_DA_ID) {
+ if (ndlp->da_id_waitq)
+ wake_up(ndlp->da_id_waitq);
+ }
+
+ spin_lock_irq(&ndlp->lock);
+ ndlp->save_flags &= ~NLP_WAIT_FOR_DA_ID;
+ spin_unlock_irq(&ndlp->lock);
+
lpfc_ct_free_iocb(phba, cmdiocb);
lpfc_nlp_put(ndlp);
return;
diff --git a/drivers/scsi/lpfc/lpfc_disc.h b/drivers/scsi/lpfc/lpfc_disc.h
index f82615d87c4bb..f5ae8cc158205 100644
--- a/drivers/scsi/lpfc/lpfc_disc.h
+++ b/drivers/scsi/lpfc/lpfc_disc.h
@@ -90,6 +90,8 @@ enum lpfc_nlp_save_flags {
NLP_IN_RECOV_POST_DEV_LOSS = 0x1,
/* wait for outstanding LOGO to cmpl */
NLP_WAIT_FOR_LOGO = 0x2,
+ /* wait for outstanding DA_ID to finish */
+ NLP_WAIT_FOR_DA_ID = 0x4
};
struct lpfc_nodelist {
@@ -159,7 +161,12 @@ struct lpfc_nodelist {
uint32_t nvme_fb_size; /* NVME target's supported byte cnt */
#define NVME_FB_BIT_SHIFT 9 /* PRLI Rsp first burst in 512B units. */
uint32_t nlp_defer_did;
+
+ /* These wait objects are NPIV specific. These IOs must complete
+ * synchronously.
+ */
wait_queue_head_t *logo_waitq;
+ wait_queue_head_t *da_id_waitq;
};
struct lpfc_node_rrq {
diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c
index 9e0e9e02d2c47..256ee797adb30 100644
--- a/drivers/scsi/lpfc/lpfc_vport.c
+++ b/drivers/scsi/lpfc/lpfc_vport.c
@@ -633,6 +633,7 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
struct lpfc_hba *phba = vport->phba;
int rc;
+ DECLARE_WAIT_QUEUE_HEAD_ONSTACK(waitq);
if (vport->port_type == LPFC_PHYSICAL_PORT) {
lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
@@ -688,21 +689,49 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
if (!ndlp)
goto skip_logo;
+ /* Send the DA_ID and Fabric LOGO to cleanup the NPIV fabric entries. */
if (ndlp && ndlp->nlp_state == NLP_STE_UNMAPPED_NODE &&
phba->link_state >= LPFC_LINK_UP &&
phba->fc_topology != LPFC_TOPOLOGY_LOOP) {
if (vport->cfg_enable_da_id) {
- /* Send DA_ID and wait for a completion. */
+ /* Send DA_ID and wait for a completion. This is best
+ * effort. If the DA_ID fails, likely the fabric will
+ * "leak" NportIDs but at least the driver issued the
+ * command.
+ */
+ ndlp = lpfc_findnode_did(vport, NameServer_DID);
+ if (!ndlp)
+ goto issue_logo;
+
+ spin_lock_irq(&ndlp->lock);
+ ndlp->da_id_waitq = &waitq;
+ ndlp->save_flags |= NLP_WAIT_FOR_DA_ID;
+ spin_unlock_irq(&ndlp->lock);
+
rc = lpfc_ns_cmd(vport, SLI_CTNS_DA_ID, 0, 0);
- if (rc) {
- lpfc_printf_log(vport->phba, KERN_WARNING,
- LOG_VPORT,
- "1829 CT command failed to "
- "delete objects on fabric, "
- "rc %d\n", rc);
+ if (!rc) {
+ wait_event_timeout(waitq,
+ !(ndlp->save_flags & NLP_WAIT_FOR_DA_ID),
+ msecs_to_jiffies(phba->fc_ratov * 2000));
}
+
+ lpfc_printf_vlog(vport, KERN_INFO, LOG_VPORT | LOG_ELS,
+ "1829 DA_ID issue status %d. "
+ "SFlag x%x NState x%x, NFlag x%x "
+ "Rpi x%x\n",
+ rc, ndlp->save_flags, ndlp->nlp_state,
+ ndlp->nlp_flag, ndlp->nlp_rpi);
+
+ /* Remove the waitq and save_flags. It no
+ * longer matters if the wake happened.
+ */
+ spin_lock_irq(&ndlp->lock);
+ ndlp->da_id_waitq = NULL;
+ ndlp->save_flags &= ~NLP_WAIT_FOR_DA_ID;
+ spin_unlock_irq(&ndlp->lock);
}
+issue_logo:
/*
* If the vpi is not registered, then a valid FDISC doesn't
* exist and there is no need for a ELS LOGO. Just cleanup
--
2.43.0
^ permalink raw reply related [flat|nested] 2+ messages in thread