From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Justin Tee <justin.tee@broadcom.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Sasha Levin <sashal@kernel.org>,
paul.ely@broadcom.com, linux-scsi@vger.kernel.org
Subject: [PATCH AUTOSEL 6.18-6.17] scsi: lpfc: Fix reusing an ndlp that is marked NLP_DROPPED during FLOGI
Date: Tue, 9 Dec 2025 22:48:59 -0500
Message-ID: <20251210034915.2268617-18-sashal@kernel.org>
In-Reply-To: <20251210034915.2268617-1-sashal@kernel.org>
From: Justin Tee <justin.tee@broadcom.com>
[ Upstream commit 07caedc6a3887938813727beafea40f07c497705 ]
It's possible for an unstable link to repeatedly bounce allowing a FLOGI
retry, but then bounce again forcing an abort of the FLOGI. Ensure that
the initial reference count on the FLOGI ndlp is restored in this faulty
link scenario.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20251106224639.139176-8-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Looking at this commit, I need to analyze it thoroughly to determine if
it's suitable for stable backporting.
## Commit Analysis
### 1. Problem Being Fixed
This commit fixes a **reference counting bug** in the lpfc (Broadcom
Fibre Channel) driver. The issue occurs when:
1. A FLOGI (Fabric Login) is issued on an unstable link
2. The link bounces, causing the FLOGI to abort
3. The ndlp (node list pointer) gets marked `NLP_DROPPED` and its
reference count is decremented via `lpfc_nlp_put()`
4. The link comes back and lpfc tries to reuse the same ndlp for a new
FLOGI retry
5. The ndlp is still marked `NLP_DROPPED` with a decremented refcount,
causing reference count imbalances
This can lead to **use-after-free** or **double-free** conditions -
serious kernel stability and security issues.
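The sequence above can be illustrated with a small user-space model. This is only a sketch, not driver code: `struct flogi_node` and every function name below are hypothetical, a plain `int` stands in for the kref, and the starting count of 2 is an assumption (the initial reference plus one held elsewhere, such as the reference the patch comment describes being held until the XRI_ABORTED CQE arrives).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct lpfc_nodelist (hypothetical, not driver code). */
struct flogi_node {
	int refcnt;   /* models the kref count */
	bool dropped; /* models the NLP_DROPPED flag */
};

/* Models lpfc_nlp_put(): going below zero would be a refcount bug. */
static void node_put(struct flogi_node *n)
{
	assert(n->refcnt > 0);
	n->refcnt--;
}

/* Aborted-FLOGI path without the flag check: puts on every abort. */
static void abort_unguarded(struct flogi_node *n)
{
	node_put(n);
}

/* Aborted-FLOGI path with the flag check: puts at most once per drop. */
static void abort_guarded(struct flogi_node *n)
{
	if (!n->dropped) {
		n->dropped = true;
		node_put(n);
	}
}

/* Reissue path: a node that was dropped gets its initial reference back. */
static void reissue_flogi(struct flogi_node *n)
{
	if (n->dropped) {
		n->dropped = false;
		n->refcnt++;
	}
}

/* Abort, retry, abort again, with no flag tracking. Returns the final count. */
int bounce_twice_unguarded(void)
{
	struct flogi_node n = { .refcnt = 2, .dropped = false };

	abort_unguarded(&n); /* 2 -> 1: initial reference dropped */
	/* the retry reuses the node without taking a reference back */
	abort_unguarded(&n); /* 1 -> 0: the other holder's reference is gone */
	return n.refcnt;
}

/* Same sequence with flag tracking on both the abort and reissue paths. */
int bounce_twice_guarded(void)
{
	struct flogi_node n = { .refcnt = 2, .dropped = false };

	abort_guarded(&n);  /* 2 -> 1, dropped set */
	reissue_flogi(&n);  /* 1 -> 2, dropped cleared */
	abort_guarded(&n);  /* 2 -> 1, dropped set */
	return n.refcnt;
}
```

In this model `bounce_twice_unguarded()` ends at 0 even though a second holder still expects its reference to be valid, while `bounce_twice_guarded()` ends at 1.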
### 2. Code Changes Analysis
The fix is surgical and well-designed:
**In `lpfc_issue_els_flogi()`:**
- Adds a check: if ndlp is marked `NLP_DROPPED`, clear the flag and
restore the reference count with `lpfc_nlp_get()`
- This is the core fix - properly restoring state when retrying with a
previously-dropped ndlp
**In `lpfc_cmpl_els_flogi()`:**
- Before calling `lpfc_nlp_put()`, now checks that `NLP_DROPPED` is not
already set (the link-down path also gains an `NLP_IN_DEV_LOSS` check)
- Sets `NLP_DROPPED` with `set_bit()` before decrementing to prevent a
double decrement
**In `lpfc_dev_loss_tmo_handler()`:**
- Uses `test_and_set_bit(NLP_DROPPED, ...)` to atomically check and set,
preventing races
**In `lpfc_check_nlp_post_devloss()`:**
- Clears `NLP_DROPPED` when restoring the ndlp reference
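The guard pattern used in these paths relies on the kernel's `test_and_set_bit()` and `test_and_clear_bit()` returning the previous value of the bit. A rough user-space analogue with C11 atomics is sketched below; the type, the bit position, and all function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_DROPPED (1UL << 0) /* hypothetical bit, not lpfc's actual value */

struct flag_node {
	atomic_ulong flags;
	atomic_int refcnt;
};

/* Analogue of test_and_set_bit(): set the bit, report whether it was set. */
static bool toy_test_and_set(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Analogue of test_and_clear_bit(): clear the bit, report whether it was set. */
static bool toy_test_and_clear(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Drop one reference unless a drop is already recorded. True if it dropped. */
bool toy_drop_once(struct flag_node *n)
{
	if (toy_test_and_set(&n->flags, TOY_DROPPED))
		return false;
	atomic_fetch_sub(&n->refcnt, 1);
	return true;
}

/* Take the reference back only if a drop was recorded. True if it restored. */
bool toy_restore(struct flag_node *n)
{
	if (!toy_test_and_clear(&n->flags, TOY_DROPPED))
		return false;
	atomic_fetch_add(&n->refcnt, 1);
	return true;
}

/* Drop twice, restore twice; the second call of each pair must be a no-op. */
int toy_demo(void)
{
	struct flag_node n;
	bool d1, d2, r1, r2;

	atomic_init(&n.flags, 0);
	atomic_init(&n.refcnt, 2);

	d1 = toy_drop_once(&n); /* 2 -> 1 */
	d2 = toy_drop_once(&n); /* no change */
	r1 = toy_restore(&n);   /* 1 -> 2 */
	r2 = toy_restore(&n);   /* no change */
	assert(d1 && !d2 && r1 && !r2);

	return atomic_load(&n.refcnt);
}
```

Because the test and the update happen in one atomic step, two callers racing on the same node cannot both observe the flag as clear, so at most one of them performs the decrement.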
### 3. Scope and Risk Assessment
- **Files changed:** 2 files in lpfc driver only
- **Lines changed:** 32 insertions, 8 deletions (per the diffstat)
- **Scope:** Limited to lpfc FLOGI handling and devloss timeout paths
- **Risk:** MEDIUM - Reference counting changes require care, but the
flag updates use atomic bit operations (`set_bit`, `clear_bit`,
`test_and_set_bit`, `test_and_clear_bit`)
### 4. Stable Criteria Evaluation
**Meets stable criteria:**
- ✅ Fixes a real bug (reference count corruption causing potential use-
after-free)
- ✅ Small and contained to specific driver (lpfc)
- ✅ No new features added
- ✅ Clear mechanism: adds proper NLP_DROPPED flag tracking to prevent
refcount imbalance
- ✅ Affects real-world scenarios: unstable FC links occur in enterprise
environments
**Concerns:**
- ❌ No "Fixes:" tag indicating when bug was introduced
- ❌ No "Cc: stable@vger.kernel.org" tag
- The `NLP_DROPPED` flag usage appears to exist in older kernels already
### 5. User Impact
- **Who is affected:** Users with Broadcom lpfc Fibre Channel HBAs in
enterprise storage environments
- **Severity:** HIGH - Reference count bugs can cause kernel crashes,
data corruption, or security vulnerabilities
- **Likelihood:** MODERATE - Requires unstable FC link conditions, but
these occur in real deployments
### 6. Backport Feasibility
The changes use standard kernel primitives (`test_and_set_bit`,
`test_and_clear_bit`). The `NLP_DROPPED` flag, the helpers the patch
calls (`lpfc_nlp_put`, `lpfc_nlp_get`), and the functions being modified
(`lpfc_cmpl_els_flogi`, `lpfc_issue_els_flogi`,
`lpfc_check_nlp_post_devloss`, `lpfc_dev_loss_tmo_handler`) appear to
exist in stable kernels. The patch should apply cleanly or with minor
adjustments.
## Conclusion
This is a legitimate bug fix for a reference counting issue that can
cause serious problems (use-after-free, crashes) in enterprise Fibre
Channel environments. The fix is well-designed with proper atomic
operations and is contained to the lpfc driver. While there's no
explicit stable tag, the nature of the bug (reference counting
corruption in a widely-used enterprise driver) makes it appropriate for
stable backporting.
The fix is small, surgical, and addresses a real stability/potential
security issue. Enterprise users with lpfc HBAs who experience link
instability would benefit from this fix.
**YES**
drivers/scsi/lpfc/lpfc_els.c | 36 +++++++++++++++++++++++++-------
drivers/scsi/lpfc/lpfc_hbadisc.c | 4 +++-
2 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index b71db7d7d747d..c08237f04bce2 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -934,10 +934,15 @@ lpfc_cmpl_els_flogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
/* Check to see if link went down during discovery */
if (lpfc_els_chk_latt(vport)) {
/* One additional decrement on node reference count to
- * trigger the release of the node
+ * trigger the release of the node. Make sure the ndlp
+ * is marked NLP_DROPPED.
*/
- if (!(ndlp->fc4_xpt_flags & SCSI_XPT_REGD))
+ if (!test_bit(NLP_IN_DEV_LOSS, &ndlp->nlp_flag) &&
+ !test_bit(NLP_DROPPED, &ndlp->nlp_flag) &&
+ !(ndlp->fc4_xpt_flags & SCSI_XPT_REGD)) {
+ set_bit(NLP_DROPPED, &ndlp->nlp_flag);
lpfc_nlp_put(ndlp);
+ }
goto out;
}
@@ -995,9 +1000,10 @@ lpfc_cmpl_els_flogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
IOERR_LOOP_OPEN_FAILURE)))
lpfc_vlog_msg(vport, KERN_WARNING, LOG_ELS,
"2858 FLOGI Status:x%x/x%x TMO"
- ":x%x Data x%lx x%x\n",
+ ":x%x Data x%lx x%x x%lx x%x\n",
ulp_status, ulp_word4, tmo,
- phba->hba_flag, phba->fcf.fcf_flag);
+ phba->hba_flag, phba->fcf.fcf_flag,
+ ndlp->nlp_flag, ndlp->fc4_xpt_flags);
/* Check for retry */
if (lpfc_els_retry(phba, cmdiocb, rspiocb)) {
@@ -1015,14 +1021,17 @@ lpfc_cmpl_els_flogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
* reference to trigger node release.
*/
if (!test_bit(NLP_IN_DEV_LOSS, &ndlp->nlp_flag) &&
- !(ndlp->fc4_xpt_flags & SCSI_XPT_REGD))
+ !test_bit(NLP_DROPPED, &ndlp->nlp_flag) &&
+ !(ndlp->fc4_xpt_flags & SCSI_XPT_REGD)) {
+ set_bit(NLP_DROPPED, &ndlp->nlp_flag);
lpfc_nlp_put(ndlp);
+ }
lpfc_printf_vlog(vport, KERN_WARNING, LOG_ELS,
"0150 FLOGI Status:x%x/x%x "
- "xri x%x TMO:x%x refcnt %d\n",
+ "xri x%x iotag x%x TMO:x%x refcnt %d\n",
ulp_status, ulp_word4, cmdiocb->sli4_xritag,
- tmo, kref_read(&ndlp->kref));
+ cmdiocb->iotag, tmo, kref_read(&ndlp->kref));
/* If this is not a loop open failure, bail out */
if (!(ulp_status == IOSTAT_LOCAL_REJECT &&
@@ -1279,6 +1288,19 @@ lpfc_issue_els_flogi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
uint32_t tmo, did;
int rc;
+ /* It's possible for lpfc to reissue a FLOGI on an ndlp that is marked
+ * NLP_DROPPED. This happens when the FLOGI completed with the XB bit
+ * set causing lpfc to reference the ndlp until the XRI_ABORTED CQE is
+ * issued. The time window for the XRI_ABORTED CQE can be as much as
+ * 2*2*RA_TOV allowing for ndlp reuse of this type when the link is
+ * cycling quickly. When true, restore the initial reference and remove
+ * the NLP_DROPPED flag as lpfc is retrying.
+ */
+ if (test_and_clear_bit(NLP_DROPPED, &ndlp->nlp_flag)) {
+ if (!lpfc_nlp_get(ndlp))
+ return 1;
+ }
+
cmdsize = (sizeof(uint32_t) + sizeof(struct serv_parm));
elsiocb = lpfc_prep_els_iocb(vport, 1, cmdsize, retry, ndlp,
ndlp->nlp_DID, ELS_CMD_FLOGI);
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 43d246c5c049c..717ae56c8e4bd 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -424,6 +424,7 @@ lpfc_check_nlp_post_devloss(struct lpfc_vport *vport,
struct lpfc_nodelist *ndlp)
{
if (test_and_clear_bit(NLP_IN_RECOV_POST_DEV_LOSS, &ndlp->save_flags)) {
+ clear_bit(NLP_DROPPED, &ndlp->nlp_flag);
lpfc_nlp_get(ndlp);
lpfc_printf_vlog(vport, KERN_INFO, LOG_DISCOVERY | LOG_NODE,
"8438 Devloss timeout reversed on DID x%x "
@@ -566,7 +567,8 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
return fcf_inuse;
}
- lpfc_nlp_put(ndlp);
+ if (!test_and_set_bit(NLP_DROPPED, &ndlp->nlp_flag))
+ lpfc_nlp_put(ndlp);
return fcf_inuse;
}
--
2.51.0