* [PATCH AUTOSEL 6.18-5.10] nvme-fc: don't hold rport lock when putting ctrl
[not found] <20251212020903.4153935-1-sashal@kernel.org>
@ 2025-12-12 2:08 ` Sasha Levin
2025-12-12 2:08 ` [PATCH AUTOSEL 6.18-6.12] nvme-fabrics: add ENOKEY to no retry criteria for authentication failures Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-12-12 2:08 UTC (permalink / raw)
To: patches, stable
Cc: Daniel Wagner, Justin Tee, Christoph Hellwig, Keith Busch,
Sasha Levin, nareshgottumukkala83, paul.ely, sagi, linux-nvme
From: Daniel Wagner <wagi@kernel.org>
[ Upstream commit b71cbcf7d170e51148d5467820ae8a72febcb651 ]
nvme_fc_ctrl_put can acquire the rport lock when freeing the
ctrl object:
nvme_fc_ctrl_put
nvme_fc_ctrl_free
spin_lock_irqsave(rport->lock)
Thus we can't hold the rport lock when calling nvme_fc_ctrl_put.
Justin suggested use the safe list iterator variant because
nvme_fc_ctrl_put will also modify the rport->list.
Cc: Justin Tee <justin.tee@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
This shows the affected function was introduced in v5.8-rc1
(`14fd1e98afafc`), meaning this deadlock bug has existed since **Linux
5.8** and affects all stable kernels from 5.8 onwards (5.10.y, 5.15.y,
6.1.y, 6.6.y, etc.).
### SUMMARY
**What the commit fixes:**
A **deadlock bug** in the NVMe-FC (Fibre Channel) driver where
`nvme_fc_match_disconn_ls()` holds `rport->lock` while calling
`nvme_fc_ctrl_put()`. When the reference count hits zero,
`nvme_fc_ctrl_free()` tries to acquire the same lock, causing a
deadlock.
**Stable kernel criteria:**
| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Uses canonical lock drop/reacquire pattern |
| Fixes real bug | ✅ Deadlock - system hang |
| Important issue | ✅ Deadlocks in storage paths are critical |
| Small and contained | ✅ ~6 lines in one function |
| No new features | ✅ Pure bug fix |
| Expert reviewed | ✅ Christoph Hellwig |
**Risk vs Benefit:**
- **Risk:** LOW - The fix uses a well-established kernel pattern
(`list_for_each_entry_safe` + lock release/reacquire)
- **Benefit:** HIGH - Prevents deadlock in NVMe-FC storage driver used
in enterprise environments
**Concerns:**
- No explicit `Cc: stable` tag, but this is not required for obvious bug
fixes
- No `Fixes:` tag, but we've identified the bug exists since v5.8
- The fix should apply cleanly to any kernel with the affected function
(5.8+)
### CONCLUSION
This commit fixes a clear deadlock bug in the NVMe-FC driver that has
existed since Linux 5.8. The fix is:
- Small and surgical (only ~6 lines changed)
- Uses well-understood, standard kernel locking patterns
- Has been reviewed by a respected kernel developer (Christoph Hellwig)
- Signed off by the NVMe maintainer (Keith Busch)
- Affects enterprise storage users who rely on NVMe over Fibre Channel
Deadlocks in storage drivers are serious issues that warrant stable
backporting. The minimal scope and established fix pattern make this a
low-risk, high-value backport.
**YES**
drivers/nvme/host/fc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 2c903729b0b90..8324230c53719 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1468,14 +1468,14 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport,
{
struct fcnvme_ls_disconnect_assoc_rqst *rqst =
&lsop->rqstbuf->rq_dis_assoc;
- struct nvme_fc_ctrl *ctrl, *ret = NULL;
+ struct nvme_fc_ctrl *ctrl, *tmp, *ret = NULL;
struct nvmefc_ls_rcv_op *oldls = NULL;
u64 association_id = be64_to_cpu(rqst->associd.association_id);
unsigned long flags;
spin_lock_irqsave(&rport->lock, flags);
- list_for_each_entry(ctrl, &rport->ctrl_list, ctrl_list) {
+ list_for_each_entry_safe(ctrl, tmp, &rport->ctrl_list, ctrl_list) {
if (!nvme_fc_ctrl_get(ctrl))
continue;
spin_lock(&ctrl->lock);
@@ -1488,7 +1488,9 @@ nvme_fc_match_disconn_ls(struct nvme_fc_rport *rport,
if (ret)
/* leave the ctrl get reference */
break;
+ spin_unlock_irqrestore(&rport->lock, flags);
nvme_fc_ctrl_put(ctrl);
+ spin_lock_irqsave(&rport->lock, flags);
}
spin_unlock_irqrestore(&rport->lock, flags);
--
2.51.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [PATCH AUTOSEL 6.18-6.12] nvme-fabrics: add ENOKEY to no retry criteria for authentication failures
[not found] <20251212020903.4153935-1-sashal@kernel.org>
2025-12-12 2:08 ` [PATCH AUTOSEL 6.18-5.10] nvme-fc: don't hold rport lock when putting ctrl Sasha Levin
@ 2025-12-12 2:08 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2025-12-12 2:08 UTC (permalink / raw)
To: patches, stable
Cc: Justin Tee, Daniel Wagner, Hannes Reinecke, Keith Busch,
Sasha Levin, sagi, linux-nvme
From: Justin Tee <justintee8345@gmail.com>
[ Upstream commit 13989207ee29c40501e719512e8dc90768325895 ]
With authentication, in addition to EKEYREJECTED there is also no point in
retrying reconnects when status is ENOKEY. Thus, add -ENOKEY as another
criteria to determine when to stop retries.
Cc: Daniel Wagner <wagi@kernel.org>
Cc: Hannes Reinecke <hare@suse.de>
Closes: https://lore.kernel.org/linux-nvme/20250829-nvme-fc-sync-v3-0-d69c87e63aee@kernel.org/
Signed-off-by: Justin Tee <justintee8345@gmail.com>
Tested-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Comprehensive Analysis
### 1. COMMIT MESSAGE ANALYSIS
- **Purpose**: Add `-ENOKEY` to prevent pointless reconnect retries when
authentication key is missing
- **Tags**: `Tested-by` (Daniel Wagner), `Reviewed-by` (Daniel Wagner,
Hannes Reinecke), `Closes:` (lore link)
- **Missing**: No explicit `Cc: stable@vger.kernel.org` or `Fixes:` tag
- **Maintainer signoff**: Keith Busch (NVMe maintainer)
### 2. CODE CHANGE ANALYSIS
The change is minimal - single line modification:
```c
- if (status == -EKEYREJECTED)
+ if (status == -EKEYREJECTED || status == -ENOKEY)
```
**Where `-ENOKEY` is returned:**
- `drivers/nvme/host/auth.c:720` - No session key negotiated
- `drivers/nvme/host/auth.c:973` - No host key (`ctrl->host_key` is
NULL)
- `drivers/nvme/host/auth.c:978` - Controller key configured but invalid
- `drivers/nvme/host/tcp.c:1698,2080,2112,2121` - Various TLS/PSK key
failures
All these represent "key does not exist" scenarios where retrying cannot
help.
**Function impact:** `nvmf_should_reconnect()` is called by all three
NVMe fabric transports (TCP, FC, RDMA) via
`nvme_tcp_reconnect_or_remove()`, `nvme_fc_reconnect_or_delete()`, and
`nvme_rdma_reconnect_or_remove()`.
### 3. CLASSIFICATION
- **Bug fix**: Yes - fixes futile retry behavior
- **New feature**: No - extends existing error handling pattern
- **Follows established pattern**: The `-EKEYREJECTED` check was added
in v6.10 (commit 0e34bd9605f6c) with identical logic
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 1
- **Files touched**: 1
- **Complexity**: Trivial
- **Risk**: Extremely low - change only affects reconnect decision for
an already-failed authentication
- **Regression potential**: Near zero - the code path only executes when
authentication already failed
### 5. USER IMPACT
- **Who is affected**: Users of NVMe Fabrics (TCP/RDMA/FC) with
authentication enabled
- **Severity without fix**: Wasteful reconnect retries, potential log
spam, resource consumption
- **Not a crash/data corruption**: This is a behavioral improvement, not
a critical fix
### 6. STABILITY INDICATORS
- Tested by Daniel Wagner (NVMe developer)
- Reviewed by Daniel Wagner and Hannes Reinecke (both storage/NVMe
experts)
- Clean, simple change with clear semantics
### 7. DEPENDENCY CHECK
- Requires commit `0e34bd9605f6c` ("nvme: do not retry authentication
failures") from v6.10
- NVMe authentication feature itself was added in v6.1 (`f50fff73d620c`)
- Backport applies cleanly to trees with the `-EKEYREJECTED` check
### Decision Rationale
**Pros for backporting:**
- Trivial one-line change with zero regression risk
- Fixes real wasteful behavior (pointless retries that can never
succeed)
- Follows existing code pattern already established
- Reviewed and tested by domain experts
- Semantically correct: `-ENOKEY` means "no key available" - retry won't
create one
**Cons for backporting:**
- No explicit `Cc: stable@vger.kernel.org` tag from maintainers
- Not a crash, security bug, or data corruption fix
- NVMe authentication is a relatively niche feature
- Bug impact is resource waste, not functional failure
**Conclusion:**
This is a low-risk, obviously correct bug fix that prevents wasteful
behavior. While it lacks explicit stable tags and isn't fixing a
critical bug, the change is so simple and safe that the benefit-to-risk
ratio strongly favors inclusion. The fix completes the authentication
error handling that was started with the `-EKEYREJECTED` check, making
it a natural complement to that existing code. Stable kernel users with
NVMe authentication would benefit from not having pointless reconnection
storms when their keys are missing.
**YES**
drivers/nvme/host/fabrics.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 2e58a7ce10905..55a8afd2efd50 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -592,7 +592,7 @@ bool nvmf_should_reconnect(struct nvme_ctrl *ctrl, int status)
if (status > 0 && (status & NVME_STATUS_DNR))
return false;
- if (status == -EKEYREJECTED)
+ if (status == -EKEYREJECTED || status == -ENOKEY)
return false;
if (ctrl->opts->max_reconnects == -1 ||
--
2.51.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-12-12 2:09 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20251212020903.4153935-1-sashal@kernel.org>
2025-12-12 2:08 ` [PATCH AUTOSEL 6.18-5.10] nvme-fc: don't hold rport lock when putting ctrl Sasha Levin
2025-12-12 2:08 ` [PATCH AUTOSEL 6.18-6.12] nvme-fabrics: add ENOKEY to no retry criteria for authentication failures Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox