From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Justin Tee <justin.tee@broadcom.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>,
paul.ely@broadcom.com, linux-scsi@vger.kernel.org
Subject: [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Ensure PLOGI_ACC is sent prior to PRLI in Point to Point topology
Date: Sat, 25 Oct 2025 12:00:06 -0400 [thread overview]
Message-ID: <20251025160905.3857885-375-sashal@kernel.org> (raw)
In-Reply-To: <20251025160905.3857885-1-sashal@kernel.org>
From: Justin Tee <justin.tee@broadcom.com>
[ Upstream commit 2bf81856a403c92a4ce375288f33fba82ca2ccc6 ]
There is a timing race condition when a PRLI may be sent on the wire
before PLOGI_ACC in Point to Point topology. Fix by deferring REG_RPI
mbox completion handling to after PLOGI_ACC's CQE completion. Because
the discovery state machine only sends PRLI after REG_RPI mbox
completion, PRLI is now guaranteed to be sent after PLOGI_ACC.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-8-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- `drivers/scsi/lpfc/lpfc_nportdisc.c:329-353` now keeps the original
`REG_RPI` mailbox queued with the PLOGI ACC when `FC_PT2PT` is set, so
the driver no longer tells the discovery state machine that login
succeeded until the ACC CQE really arrives; this closes the race where
the state machine could transmit PRLI while the remote port was still
waiting for our PLOGI_ACC.
- The matching completion path in
`drivers/scsi/lpfc/lpfc_els.c:5341-5409` runs
`lpfc_mbx_cmpl_reg_login()` only after the ACC response finishes on a
point-to-point link, guaranteeing the required on-wire ordering
(PLOGI_ACC before PRLI) and keeping the `NLP_ACC_REGLOGIN` bookkeeping
consistent.
- The change is tightly scoped to lpfc point-to-point discovery, adds no
new features, and leaves fabric/NVMe paths untouched; failure paths
still fall back to the existing cleanup, so regression risk is low.
- Without this fix, direct-attach systems can intermittently fail to
establish sessions because the target sees PRLI before we have
acknowledged its login, which is a user-visible bug.
- Backporters should be aware that older stable trees still use
`login_mbox->context3` and bitmask-clear macros for `nlp_flag`; the
logic ports cleanly but needs those mechanical adjustments.
drivers/scsi/lpfc/lpfc_els.c | 10 +++++++---
drivers/scsi/lpfc/lpfc_nportdisc.c | 23 ++++++++++++++++++-----
2 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 3f703932b2f07..8762fb84f14f1 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -5339,12 +5339,12 @@ lpfc_cmpl_els_rsp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
ulp_status, ulp_word4, did);
/* ELS response tag <ulpIoTag> completes */
lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,
- "0110 ELS response tag x%x completes "
+ "0110 ELS response tag x%x completes fc_flag x%lx"
"Data: x%x x%x x%x x%x x%lx x%x x%x x%x %p %p\n",
- iotag, ulp_status, ulp_word4, tmo,
+ iotag, vport->fc_flag, ulp_status, ulp_word4, tmo,
ndlp->nlp_DID, ndlp->nlp_flag, ndlp->nlp_state,
ndlp->nlp_rpi, kref_read(&ndlp->kref), mbox, ndlp);
- if (mbox) {
+ if (mbox && !test_bit(FC_PT2PT, &vport->fc_flag)) {
if (ulp_status == 0 &&
test_bit(NLP_ACC_REGLOGIN, &ndlp->nlp_flag)) {
if (!lpfc_unreg_rpi(vport, ndlp) &&
@@ -5403,6 +5403,10 @@ lpfc_cmpl_els_rsp(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
}
out_free_mbox:
lpfc_mbox_rsrc_cleanup(phba, mbox, MBOX_THD_UNLOCKED);
+ } else if (mbox && test_bit(FC_PT2PT, &vport->fc_flag) &&
+ test_bit(NLP_ACC_REGLOGIN, &ndlp->nlp_flag)) {
+ lpfc_mbx_cmpl_reg_login(phba, mbox);
+ clear_bit(NLP_ACC_REGLOGIN, &ndlp->nlp_flag);
}
out:
if (ndlp && shost) {
diff --git a/drivers/scsi/lpfc/lpfc_nportdisc.c b/drivers/scsi/lpfc/lpfc_nportdisc.c
index a596b80d03d4d..3799bdf2f1b88 100644
--- a/drivers/scsi/lpfc/lpfc_nportdisc.c
+++ b/drivers/scsi/lpfc/lpfc_nportdisc.c
@@ -326,8 +326,14 @@ lpfc_defer_plogi_acc(struct lpfc_hba *phba, LPFC_MBOXQ_t *login_mbox)
/* Now that REG_RPI completed successfully,
* we can now proceed with sending the PLOGI ACC.
*/
- rc = lpfc_els_rsp_acc(login_mbox->vport, ELS_CMD_PLOGI,
- save_iocb, ndlp, NULL);
+ if (test_bit(FC_PT2PT, &ndlp->vport->fc_flag)) {
+ rc = lpfc_els_rsp_acc(login_mbox->vport, ELS_CMD_PLOGI,
+ save_iocb, ndlp, login_mbox);
+ } else {
+ rc = lpfc_els_rsp_acc(login_mbox->vport, ELS_CMD_PLOGI,
+ save_iocb, ndlp, NULL);
+ }
+
if (rc) {
lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
"4576 PLOGI ACC fails pt2pt discovery: "
@@ -335,9 +341,16 @@ lpfc_defer_plogi_acc(struct lpfc_hba *phba, LPFC_MBOXQ_t *login_mbox)
}
}
- /* Now process the REG_RPI cmpl */
- lpfc_mbx_cmpl_reg_login(phba, login_mbox);
- clear_bit(NLP_ACC_REGLOGIN, &ndlp->nlp_flag);
+ /* If this is a fabric topology, complete the reg_rpi and prli now.
+ * For Pt2Pt, the reg_rpi and PRLI are deferred until after the LS_ACC
+ * completes. This ensures, in Pt2Pt, that the PLOGI LS_ACC is sent
+ * before the PRLI.
+ */
+ if (!test_bit(FC_PT2PT, &ndlp->vport->fc_flag)) {
+ /* Now process the REG_RPI cmpl */
+ lpfc_mbx_cmpl_reg_login(phba, login_mbox);
+ clear_bit(NLP_ACC_REGLOGIN, &ndlp->nlp_flag);
+ }
kfree(save_iocb);
}
--
2.51.0
next prev parent reply other threads:[~2025-10-25 16:26 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20251025160905.3857885-1-sashal@kernel.org>
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-5.4] scsi: lpfc: Define size of debugfs entry for xri rebalancing Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17] scsi: ufs: ufs-qcom: Disable lane clocks during phy hibern8 Sasha Levin
2025-10-25 15:54 ` [PATCH AUTOSEL 6.17-6.12] PCI/ERR: Update device error_state already after reset Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17] scsi: ufs: core: Change MCQ interrupt enable flow Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-5.4] scsi: lpfc: Check return status of lpfc_reset_flush_io_context during TGT_RESET Sasha Levin
2025-10-25 15:55 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Fix invalid access in vccqx handling Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Change reset sequence for improved stability Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-5.15] scsi: mpi3mr: Fix controller init failure on fault during queue creation Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Disable auto-hibern8 during power mode changes Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Clean up allocated queues when queue setup mbox commands fail Sasha Levin
2025-10-25 15:56 ` [PATCH AUTOSEL 6.17] scsi: ufs: ufs-qcom: Align programming sequence of Shared ICE for UFS controller v5 Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Fix PWM mode switch issue Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Enhance recovery on hibernation exit failure Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-5.15] scsi: libfc: Fix potential buffer overflow in fc_ct_ms_fill() Sasha Levin
2025-10-25 15:57 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: exynos: fsd: Gate ref_clk and put UFS device in reset on suspend Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-5.10] scsi: pm80xx: Fix race condition caused by static variables Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17] scsi: ufs: host: mediatek: Fix adapt issue after PA_Init Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: core: Disable timestamp functionality if not supported Sasha Levin
2025-10-25 15:58 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Assign power mode userdata before FASTAUTO mode change Sasha Levin
2025-10-25 15:59 ` [PATCH AUTOSEL 6.17-6.12] scsi: mpi3mr: Fix I/O failures during controller reset Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17] scsi: mpi3mr: Fix device loss during enclosure reboot due to zero link speed Sasha Levin
2025-10-25 16:00 ` Sasha Levin [this message]
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.6] scsi: ufs: host: mediatek: Fix auto-hibern8 timer configuration Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Fix unbalanced IRQ enable issue Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] scsi: ufs: host: mediatek: Enhance recovery on resume failure Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.1] scsi: mpt3sas: Add support for 22.5 Gbps SAS link rate Sasha Levin
2025-10-25 16:00 ` [PATCH AUTOSEL 6.17-6.12] scsi: lpfc: Decrement ndlp kref after FDISC retries exhausted Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-5.4] scsi: pm8001: Use int instead of u32 to store error codes Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-6.12] scsi: ufs: host: mediatek: Correct system PM flow Sasha Levin
2025-10-25 16:01 ` [PATCH AUTOSEL 6.17-5.15] scsi: lpfc: Remove ndlp kref decrement clause for F_Port_Ctrl in lpfc_cleanup Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251025160905.3857885-375-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=justin.tee@broadcom.com \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=patches@lists.linux.dev \
--cc=paul.ely@broadcom.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox