[PATCH AUTOSEL 7.0-5.10] scsi: storvsc: Handle PERSISTENT_RESERVE

Linux SCSI subsystem development

* [PATCH AUTOSEL 7.0-5.10] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Remove unnecessary ndlp kref get in lpfc_check_nlp_post_devloss Sasha Levin
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC
  To: patches, stable
  Cc: Li Tian, Long Li, Laurence Oberman, Martin K. Petersen,
	Sasha Levin, kys, haiyangz, wei.liu, decui, jejb, linux-hyperv,
	linux-scsi, linux-kernel

From: Li Tian <litian@redhat.com>

[ Upstream commit 9cf351b289fb2be22491fa3964f99126db67aa08 ]

The storvsc driver has become stricter in handling SRB status codes
returned by the Hyper-V host. When using Virtual Fibre Channel (vFC)
passthrough, the host may return SRB_STATUS_DATA_OVERRUN for
PERSISTENT_RESERVE_IN commands if the allocation length in the CDB does
not match the host's expected response size.

Currently, this status is treated as a fatal error, propagating
Host_status=0x07 [DID_ERROR] to the SCSI mid-layer. This causes
userspace storage utilities (such as sg_persist) to fail with transport
errors, even when the host has actually returned the requested
reservation data in the buffer.

Refactor the existing command-specific workarounds into a new helper
function, storvsc_host_mishandles_cmd(), and add PERSISTENT_RESERVE_IN
to the list of commands where SRB status errors should be suppressed for
vFC devices. This ensures that the SCSI mid-layer processes the returned
data buffer instead of terminating the command.

Signed-off-by: Li Tian <litian@redhat.com>
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Link: https://patch.msgid.link/20260406015344.12566-1-litian@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information. Let me synthesize the full analysis.

---

## Phase 1: Commit Message Forensics

**Step 1.1 — Subject line parse**
Record: Subsystem `scsi: storvsc`, verb "Handle" (bug-fix indicator),
summary: treat SRB_STATUS_DATA_OVERRUN for PERSISTENT_RESERVE_IN on
Hyper-V vFC as success instead of DID_ERROR.

**Step 1.2 — Tags**
Record:
- Signed-off-by: Li Tian (Red Hat, author)
- Reviewed-by: Long Li (Microsoft, storvsc maintainer/co-maintainer)
- Reviewed-by: Laurence Oberman (Red Hat, storage expert)
- Link: patch.msgid.link (lore)
- Signed-off-by: Martin K. Petersen (SCSI maintainer)
- No Fixes:, no Cc: stable (expected for manual review candidates)

**Step 1.3 — Body analysis**
Record: Bug is concrete and user-visible. When the storvsc driver became
stricter (SRB_STATUS_DATA_OVERRUN treated as error), Hyper-V vFC started
failing PERSISTENT_RESERVE_IN with `Host_status=0x07 [DID_ERROR]`, which
causes `sg_persist` and other SCSI-3 PR utilities to fail with transport
errors even though the reservation data was correctly returned in the
buffer. Root cause: the host may return SRB_STATUS_DATA_OVERRUN when
the CDB's allocation length does not match the host's expected response
size.
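
The failure described above can be sketched as a toy model in userspace C. This is a deliberate simplification, not the driver's real error handling: it assumes any SRB status other than success ends up as DID_ERROR, and `struct completion_model` and `model_host_byte()` are hypothetical names used only for illustration. The SRB status values are assumed to match the driver's definitions; 0x07 for DID_ERROR is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the storvsc definitions; treat as illustrative. */
#define SRB_STATUS_SUCCESS      0x01
#define SRB_STATUS_DATA_OVERRUN 0x12

/* SCSI mid-layer host bytes; 0x07 is the value quoted in the commit. */
#define DID_OK    0x00
#define DID_ERROR 0x07

/* Hypothetical stand-in for the status fields of a completion packet. */
struct completion_model {
	uint8_t srb_status;
	uint8_t scsi_status;
};

/*
 * Toy model: if the command is on the "host mishandles this" list, the
 * status is overwritten with success before it is evaluated; otherwise
 * any non-success SRB status is reported as DID_ERROR.
 */
static uint8_t model_host_byte(struct completion_model c,
			       bool host_mishandles_cmd)
{
	if (host_mishandles_cmd) {
		c.scsi_status = 0;
		c.srb_status = SRB_STATUS_SUCCESS;
	}
	return c.srb_status == SRB_STATUS_SUCCESS ? DID_OK : DID_ERROR;
}
```

Under this model, a PERSISTENT_RESERVE_IN completion carrying SRB_STATUS_DATA_OVERRUN maps to DID_ERROR when the opcode is not on the list and to DID_OK once it is, which is the behavior change the commit message describes.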

**Step 1.4 — Hidden bug-fix detection**
Record: "Handle ... truncation" = explicit bug fix; the refactor is
incidental.

---

## Phase 2: Diff Analysis

**Step 2.1 — Inventory**
Record: Single file `drivers/scsi/storvsc_drv.c`, +21/-11, scope: one
new static helper (`storvsc_host_mishandles_cmd`) and a rewrite of one
`if` expression in `storvsc_on_io_completion`.

**Step 2.2 — Code flow change**
Record:
- BEFORE: `if ((cdb[0]==INQUIRY) || (cdb[0]==MODE_SENSE) ||
  (cdb[0]==MODE_SENSE_10) || (cdb[0]==MAINTENANCE_IN &&
  hv_dev_is_fc(device)))` → mask status to success
- AFTER: `if (storvsc_host_mishandles_cmd(cdb[0], device))` → same mask;
  helper adds `PERSISTENT_RESERVE_IN` under the FC-gated arm

**Step 2.3 — Bug mechanism**
Record: Hardware-workaround / status-filtering bug. Adds
PERSISTENT_RESERVE_IN to the same whitelist used for MAINTENANCE_IN —
i.e., commands for which Hyper-V vFC returns SRB_STATUS_DATA_OVERRUN
despite the data being correct.

**Step 2.4 — Fix quality**
Record: Obviously correct: the new `switch` is semantically identical to
the old `if` for existing opcodes; it only adds PERSISTENT_RESERVE_IN
under `hv_dev_is_fc(device)`. Minimal, surgical. Refactor into a helper
is cosmetic but slightly expands diff; no functional change for non-vFC
devices or for other opcodes.
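
The equivalence claim can be checked exhaustively outside the kernel. The sketch below copies the old condition and the new helper into userspace C, with a plain `bool is_fc` standing in for `hv_dev_is_fc(device)` (an assumption made so the code compiles standalone). The opcode values are the ones from `include/scsi/scsi_proto.h`; `old_predicate()`, `new_predicate()` and `count_differences()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCSI opcodes as defined in include/scsi/scsi_proto.h. */
enum {
	INQUIRY               = 0x12,
	MODE_SENSE            = 0x1a,
	MODE_SENSE_10         = 0x5a,
	PERSISTENT_RESERVE_IN = 0x5e,
	MAINTENANCE_IN        = 0xa3,
};

/* Old open-coded condition; is_fc stands in for hv_dev_is_fc(device). */
static bool old_predicate(uint8_t opcode, bool is_fc)
{
	return opcode == INQUIRY || opcode == MODE_SENSE ||
	       opcode == MODE_SENSE_10 ||
	       (opcode == MAINTENANCE_IN && is_fc);
}

/* Mirrors storvsc_host_mishandles_cmd() from the patch. */
static bool new_predicate(uint8_t opcode, bool is_fc)
{
	switch (opcode) {
	case INQUIRY:
	case MODE_SENSE:
	case MODE_SENSE_10:
		return true;
	case MAINTENANCE_IN:
	case PERSISTENT_RESERVE_IN:
		return is_fc;
	default:
		return false;
	}
}

/* Count the (opcode, is_fc) pairs on which the two predicates disagree. */
static int count_differences(void)
{
	int diffs = 0;

	for (int op = 0; op < 256; op++) {
		for (int fc = 0; fc <= 1; fc++) {
			if (old_predicate((uint8_t)op, fc) ==
			    new_predicate((uint8_t)op, fc))
				continue;
			/* Only the new vFC-gated opcode may differ. */
			assert(op == PERSISTENT_RESERVE_IN && fc == 1);
			diffs++;
		}
	}
	return diffs;
}
```

Calling `count_differences()` walks all 512 combinations; the two predicates disagree on exactly one of them, PERSISTENT_RESERVE_IN on an FC device.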

---

## Phase 3: Git History Investigation

**Step 3.1 — Blame**
Record: The stricter handling of SRB_STATUS_DATA_OVERRUN was introduced
by commit `812fe6420a6e7` ("scsi: storvsc: Handle additional SRB status
values"), which went into **v6.6-rc1**. Before that, unknown SRB
statuses were simply ignored, so the bug did not manifest.

**Step 3.2 — Fixes: follow-up**
Record: There is no Fixes: tag, but the root-cause commit is clearly
`812fe6420a6e7` (v6.6). That commit IS present in all stable trees from
6.6.y onward, so the bug it introduced is present in all those trees.

**Step 3.3 — File history / related commits**
Record:
- `b1aee7f034615` ("scsi: storvsc: Do not flag MAINTENANCE_IN return of
  SRB_STATUS_DATA_OVERRUN as an error", v6.13-rc2, Cathy Avery) —
  parallel fix for MAINTENANCE_IN/RTPG. **Confirmed via web search this
  was AUTOSEL'd to 6.6.y stable (`[PATCH AUTOSEL 6.6 17/23]` on lkml).**
- `9eacec5d18f98` ("scsi: storvsc: Process unsupported MODE_SENSE_10",
  2026) — also had `Cc: stable@kernel.org` and `Fixes:`.
- The current commit is effectively "MAINTENANCE_IN fix, part 2" for a
  different opcode.

**Step 3.4 — Author context**
Record: Li Tian (Red Hat) — regular contributor of Hyper-V /
virtualization fixes. Reviewed by Long Li (Microsoft), who is the
storvsc maintainer. Strong review signal.

**Step 3.5 — Dependencies**
Record: Uses `hv_dev_is_fc(device)`, `INQUIRY`, `MODE_SENSE`,
`MODE_SENSE_10`, `MAINTENANCE_IN`, `PERSISTENT_RESERVE_IN`, all of which
exist in every stable tree 6.6+. Patch is self-contained.

---

## Phase 4: Mailing List and External Research

**Step 4.1 — b4 dig**
Record: `b4 dig -c 9cf351b289fb2` matched by patch-id, single series,
only v1, URL
`https://lore.kernel.org/all/20260406015344.12566-1-litian@redhat.com`.
No v2 or further revisions — applied as submitted.

**Step 4.2 — b4 dig -w**
Record: Recipients included linux-scsi, linux-hyperv, Microsoft
maintainers (K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long
Li), James Bottomley, and Martin K. Petersen. The right audience
reviewed it.

**Step 4.3 — Thread contents (saved mbox)**
Record: Thread contains:
- Reviewed-by: Long Li: "Reviewed-by: Long Li <longli@microsoft.com>"
- Reviewed-by: Laurence Oberman: "Looks good, rewrite of how it was done
  before but will achieve the same behavior we wanted for the new
  addition for PR. Reviewed-by: Laurence Oberman."
- Martin K. Petersen applied to 7.1/scsi-staging and later
  7.1/scsi-queue.
No NAKs, no stable-suitability concerns raised. No explicit `Cc: stable`
nomination, but also no objections.

**Step 4.4 — Series**
Record: Standalone single-patch series, no dependencies.

**Step 4.5 — Stable list history**
Record: Could not access lore.kernel.org/stable (bot protection). Web
search confirms the sibling MAINTENANCE_IN fix `b1aee7f034615` was
selected by AUTOSEL for 6.6.y, validating the pattern for stable
backport.

---

## Phase 5: Code Semantic Analysis

**Step 5.1–5.2 — Callers**
Record: `storvsc_on_io_completion` is called from `storvsc_on_receive`
for `VSTOR_OPERATION_COMPLETE_IO`, which is the primary completion path
for every SCSI command issued through storvsc. Every `sg_persist` or
other userspace PR issuer for a vFC LUN reaches this code.

**Step 5.3 — Callees**
Record: `hv_dev_is_fc()` is a trivial GUID comparison
(`HV_SYNTHFC_GUID`); no side effects.

**Step 5.4 — Reachability**
Record: Yes, trivially reachable from userspace: `/dev/sgN` ioctl →
PERSISTENT_RESERVE_IN CDB → storvsc → this completion path. Used by
`sg_persist`, `multipath`, cluster software.
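
For readers who want to see where the host status surfaces in userspace, here is a minimal sketch of preparing an SG_IO request for a data-in command and checking the result. It assumes a Linux system with the `<scsi/sg.h>` header; `prepare_pr_in_request()` and `completed_with_did_error()` are hypothetical helper names, the 20-second timeout is an arbitrary choice, and the ioctl itself is left to the caller because it needs a real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <scsi/sg.h>

/* Host byte value quoted in the commit message. */
#define DID_ERROR 0x07

/* Fill an sg_io_hdr_t for a device-to-host (data-in) command. */
static void prepare_pr_in_request(sg_io_hdr_t *hdr,
				  unsigned char *cdb, unsigned char cdb_len,
				  unsigned char *data, unsigned int data_len,
				  unsigned char *sense,
				  unsigned char sense_len)
{
	assert(hdr && cdb && data && sense);

	memset(hdr, 0, sizeof(*hdr));
	hdr->interface_id = 'S';		/* required by the sg driver */
	hdr->dxfer_direction = SG_DXFER_FROM_DEV;
	hdr->cmdp = cdb;
	hdr->cmd_len = cdb_len;
	hdr->dxferp = data;
	hdr->dxfer_len = data_len;
	hdr->sbp = sense;
	hdr->mx_sb_len = sense_len;
	hdr->timeout = 20000;			/* milliseconds, arbitrary */
}

/* After the ioctl returns, host_status carries the mid-layer host byte. */
static bool completed_with_did_error(const sg_io_hdr_t *hdr)
{
	return hdr->host_status == DID_ERROR;
}
```

A caller would open the sg node, run `ioctl(fd, SG_IO, &hdr)`, and then inspect `hdr.host_status`; on an affected kernel the analysis above suggests it would read 0x07 for PERSISTENT_RESERVE_IN on a vFC LUN.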

**Step 5.5 — Similar patterns**
Record: Same pattern already applied for INQUIRY (pre-2014),
MODE_SENSE/MODE_SENSE_10 (`9eacec5d18f98`, Cc: stable), and
MAINTENANCE_IN (`b1aee7f034615`, AUTOSEL to 6.6.y).

---

## Phase 6: Stable Tree Analysis

**Step 6.1 — Bug presence in stable**
Record: Stricter SRB_STATUS_DATA_OVERRUN handling (812fe6420a6e7) is in
v6.6+. Therefore the PERSISTENT_RESERVE_IN regression exists in 6.6.y,
6.12.y, 6.15.y, 6.18.y and current trees.

**Step 6.2 — Backport complications**
Record: On stable trees that already have the MAINTENANCE_IN fix
(`b1aee7f034615`), the diff context matches and the patch should apply
cleanly. On any tree missing the MAINTENANCE_IN fix, the removed `if`
condition will not match and the refactor-into-helper form may need a
trivial adjustment. Given that `b1aee7f034615` was AUTOSEL'd to 6.6.y,
most 6.6+ stable trees should have it.

**Step 6.3 — Related fixes already in stable**
Record: `b1aee7f034615` (MAINTENANCE_IN) and `9eacec5d18f98`
(MODE_SENSE_10) are already in stable. This completes the triad for vFC
command-mishandle workarounds.

---

## Phase 7: Subsystem Context

**Step 7.1 — Criticality**
Record: `drivers/scsi/storvsc_drv.c` — Hyper-V synthetic SCSI/FC driver.
IMPORTANT (used by every Linux VM running on Hyper-V/Azure that uses vFC
for storage — enterprise and cloud).

**Step 7.2 — Activity**
Record: Actively maintained; recent commits (MODE_SENSE_10,
MAINTENANCE_IN, queue/channel work). Not a stale subsystem.

---

## Phase 8: Impact and Risk

**Step 8.1 — Affected population**
Record: Linux VMs on Hyper-V/Azure using vFC passthrough with SCSI-3
persistent reservations (HA clusters, shared-storage coordination,
`sg_persist` consumers). Significant enterprise user base; not a niche
driver.

**Step 8.2 — Trigger**
Record: Any userspace call issuing PERSISTENT_RESERVE_IN (service action
READ_KEYS, READ_RESERVATION, REPORT_CAPABILITIES, READ_FULL_STATUS) to a
vFC LUN. Reproduces via `sg_persist -k /dev/sdX`. Unprivileged trigger:
limited by device permissions, but not a kernel-level security issue.
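
To make the trigger concrete, the sketch below builds the 10-byte PERSISTENT RESERVE IN CDB for each of the service actions listed above. The layout (opcode in byte 0, service action in the low five bits of byte 1, big-endian allocation length in bytes 7 and 8) follows the SPC definition of the command; the enum and function names are hypothetical and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PERSISTENT_RESERVE_IN 0x5e
#define PR_IN_CDB_LEN         10

/* Service action codes for PERSISTENT RESERVE IN (illustrative names). */
enum pr_in_service_action {
	PR_IN_READ_KEYS           = 0x00,
	PR_IN_READ_RESERVATION    = 0x01,
	PR_IN_REPORT_CAPABILITIES = 0x02,
	PR_IN_READ_FULL_STATUS    = 0x03,
};

/*
 * Build a PERSISTENT RESERVE IN CDB. alloc_len is the allocation length
 * whose mismatch with the host's expected response size is described in
 * the commit message as the cause of SRB_STATUS_DATA_OVERRUN.
 */
static void build_pr_in_cdb(uint8_t cdb[PR_IN_CDB_LEN],
			    enum pr_in_service_action sa,
			    uint16_t alloc_len)
{
	assert(cdb != NULL);

	memset(cdb, 0, PR_IN_CDB_LEN);
	cdb[0] = PERSISTENT_RESERVE_IN;
	cdb[1] = (uint8_t)sa & 0x1f;		/* service action, bits 4..0 */
	cdb[7] = (uint8_t)(alloc_len >> 8);	/* allocation length, MSB */
	cdb[8] = (uint8_t)(alloc_len & 0xff);	/* allocation length, LSB */
	/* cdb[9] is the CONTROL byte, left at zero. */
}
```

For example, `build_pr_in_cdb(cdb, PR_IN_READ_KEYS, 8192)` yields a CDB starting with 0x5e 0x00 and carrying 0x20 0x00 in bytes 7 and 8; the allocation length here is only an example value.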

**Step 8.3 — Failure mode severity**
Record: User-visible I/O failure (sg_persist returns transport error,
multipath/cluster software fails to query/modify reservations). Severity
MEDIUM — no crash, no corruption, but broken functionality that blocks
clustering on Hyper-V vFC.

**Step 8.4 — Risk/benefit**
Record:
- Benefit: restores working SCSI-3 PR on Hyper-V vFC for all stable
  users on 6.6+.
- Risk: very low — change is gated on opcode == PERSISTENT_RESERVE_IN
  AND device is vFC. Non-vFC storvsc devices and non-PR opcodes are
  unaffected. The refactor is semantically equivalent for existing
  cases.
Ratio: favorable.

---

## Phase 9: Final Synthesis

**Step 9.1 — Evidence**
FOR: real user-visible bug (sg_persist failure), subsystem-maintainer +
independent reviewer sign-off, small single-file diff, follows
established pattern (MAINTENANCE_IN fix was AUTOSEL'd to 6.6.y), bug
present in all 6.6+ stable trees, no dependencies, low regression
surface.
AGAINST: includes a refactor-into-helper along with the fix (minor
cosmetic); missing explicit `Cc: stable` tag (but that's expected for
manually-reviewed candidates and also absent from the already-backported
MAINTENANCE_IN fix).

**Step 9.2 — Stable rules checklist**
1. Obviously correct & tested? YES (reviewed by MS + RH; switch is
   semantically equivalent for existing opcodes).
2. Fixes a real bug affecting users? YES (sg_persist on vFC).
3. Important issue? YES: breaks clustering/PR on a widely used
   virtualization platform (MEDIUM severity, per Step 8.3).
4. Small & contained? YES (+21/-11, one file).
5. No new features/APIs? YES (workaround only).
6. Applies to stable? YES (clean on 6.13+; likely clean on 6.6.y–6.12.y
   which already have the sibling MAINTENANCE_IN fix).

**Step 9.3 — Exception category**
Record: Falls under "hardware workaround / quirk" exception: the
host-side mishandling is effectively a device bug the driver compensates
for.

**Step 9.4 — Decision**
YES.

---

### Verification
- [Phase 1] Parsed tags via Read of commit message: found 2 Reviewed-by,
  Link:, Signed-off-by chain. No Fixes/Cc:stable.
- [Phase 2] `git show 9cf351b289fb2`: confirmed +21/-11, single file,
  refactor + PERSISTENT_RESERVE_IN addition gated on
  `hv_dev_is_fc(device)`.
- [Phase 3] `git log --oneline --grep="MAINTENANCE_IN" --
  drivers/scsi/storvsc_drv.c`: found sibling fix `b1aee7f034615`.
- [Phase 3] `git show 812fe6420a6e`: confirmed this is the commit that
  introduced the stricter handling. `git describe --contains
  812fe6420a6e` → `v6.6-rc1~11^2~9^2` (v6.6).
- [Phase 3] `git describe --contains b1aee7f034615` → v6.13-rc2~7^2~1
  (MAINTENANCE_IN fix landed in v6.13).
- [Phase 3] `git show 9eacec5d18f98`: confirmed MODE_SENSE_10 handling
  fix explicitly had `Cc: stable@kernel.org` and `Fixes:` tag (similar
  hardware-mishandling pattern).
- [Phase 4] `b4 dig -c 9cf351b289fb2`: matched by patch-id, single
  version v1, lore URL obtained.
- [Phase 4] `b4 dig -c 9cf351b289fb2 -a`: only v1 exists (no v2/v3;
  applied as submitted).
- [Phase 4] `b4 dig -c 9cf351b289fb2 -w`: correct recipients (Microsoft
  storvsc maintainers + linux-scsi + linux-hyperv + MKP).
- [Phase 4] Saved thread to /tmp/thread.mbox and read it: found Long
  Li's Reviewed-by, Laurence Oberman's Reviewed-by ("Looks good, rewrite
  of how it was done before..."), and Martin K. Petersen's "Applied to
  7.1/scsi-queue". No NAKs, no concerns.
- [Phase 4] WebSearch confirmed `b1aee7f034615` was listed as "[PATCH
  AUTOSEL 6.6 17/23]" on lkml, validating that the AUTOSEL process
  already backported the sibling fix.
- [Phase 5] Grep for `storvsc_on_io_completion`: confirmed it is called
  from `storvsc_on_receive` on `VSTOR_OPERATION_COMPLETE_IO`, i.e.,
  every SCSI completion path.
- [Phase 5] Read `hv_dev_is_fc` (line 1928): trivial GUID match against
  `HV_SYNTHFC_GUID` — zero side effects.
- [Phase 6] v6.6 contains 812fe6420a6e (bug root cause is present in
  6.6.y, 6.12.y, 6.15.y, 6.18.y).
- [Phase 8] Failure mode: sg_persist returns DID_ERROR even when host
  returned data; breaks SCSI-3 PR on vFC — MEDIUM severity, broad impact
  on enterprise Hyper-V users.
- UNVERIFIED: Could not fetch lore.kernel.org/stable directly (Anubis
  bot protection), so I could not confirm whether this specific commit
  is already under stable discussion. This does not change the YES
  decision; it is based on technical merit.
- UNVERIFIED: Did not run an actual `git am` against linux-stable
  branches to confirm clean apply, but the code context (lines 1131–1180
  in storvsc_drv.c) matches the upstream base `ae1abab97835` (index in
  diff) and all referenced symbols (`INQUIRY`, `MODE_SENSE_10`,
  `MAINTENANCE_IN`, `PERSISTENT_RESERVE_IN`, `hv_dev_is_fc`) exist in
  6.13+ trees with the MAINTENANCE_IN fix already present.

This is a small, well-reviewed hardware-workaround fix for a real
user-visible regression (sg_persist/SCSI-3 PR broken on Hyper-V vFC
since v6.6), following an established pattern whose sibling was already
AUTOSEL'd to stable. Benefit clearly outweighs risk.

**YES**

 drivers/scsi/storvsc_drv.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index ae1abab97835b..6977ca8a06582 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1131,6 +1131,26 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request,
 		kfree(payload);
 }

+/*
+ * The current SCSI handling on the host side does not correctly handle:
+ * INQUIRY with page code 0x80, MODE_SENSE / MODE_SENSE_10 with cmd[2] == 0x1c,
+ * and (for FC) MAINTENANCE_IN / PERSISTENT_RESERVE_IN passthrough.
+ */
+static bool storvsc_host_mishandles_cmd(u8 opcode, struct hv_device *device)
+{
+	switch (opcode) {
+	case INQUIRY:
+	case MODE_SENSE:
+	case MODE_SENSE_10:
+		return true;
+	case MAINTENANCE_IN:
+	case PERSISTENT_RESERVE_IN:
+		return hv_dev_is_fc(device);
+	default:
+		return false;
+	}
+}
+
 static void storvsc_on_io_completion(struct storvsc_device *stor_device,
 				  struct vstor_packet *vstor_packet,
 				  struct storvsc_cmd_request *request)
@@ -1141,22 +1161,12 @@ static void storvsc_on_io_completion(struct storvsc_device *stor_device,
 	stor_pkt = &request->vstor_packet;

 	/*
-	 * The current SCSI handling on the host side does
-	 * not correctly handle:
-	 * INQUIRY command with page code parameter set to 0x80
-	 * MODE_SENSE and MODE_SENSE_10 command with cmd[2] == 0x1c
-	 * MAINTENANCE_IN is not supported by HyperV FC passthrough
-	 *
 	 * Setup srb and scsi status so this won't be fatal.
 	 * We do this so we can distinguish truly fatal failues
 	 * (srb status == 0x4) and off-line the device in that case.
 	 */

-	if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
-	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE) ||
-	   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE_10) ||
-	   (stor_pkt->vm_srb.cdb[0] == MAINTENANCE_IN &&
-	   hv_dev_is_fc(device))) {
+	if (storvsc_host_mishandles_cmd(stor_pkt->vm_srb.cdb[0], device)) {
 		vstor_packet->vm_srb.scsi_status = 0;
 		vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
 	}
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Remove unnecessary ndlp kref get in lpfc_check_nlp_post_devloss
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-5.10] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC Sasha Levin
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: ufs: ufs-pci: Add support for Intel Nova Lake Sasha Levin
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC
  To: patches, stable
  Cc: Justin Tee, Martin K. Petersen, Sasha Levin, james.smart,
	dick.kennedy, jejb, linux-scsi, linux-kernel

From: Justin Tee <justin.tee@broadcom.com>

[ Upstream commit f6bfb8d149336661bb80e62980da9a45b920403c ]

When NLP_IN_RECOV_POST_DEV_LOSS is set, the initial node reference
remains held while recovery is in progress.  Taking a reference when
NLP_IN_RECOV_POST_DEV_LOSS is cleared results in an additional reference
being held.  This causes an extra reference when cleaning up lpfc_vport
instances.  Thus, remove the extraneous ndlp kref get in
lpfc_check_nlp_post_devloss.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20260212213008.149873-7-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my thorough investigation, I can now complete my analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Parse the Subject Line**
- Record: Subsystem `scsi: lpfc:`, action verb `Remove`, summary:
  removes an unnecessary `ndlp` kref get in
  `lpfc_check_nlp_post_devloss` that was leaving an extra reference
  after device-loss recovery.

**Step 1.2: Parse All Commit Message Tags**
- Record: `Signed-off-by: Justin Tee <justin.tee@broadcom.com>` (author
  is a Broadcom lpfc maintainer).
  `Link: https://patch.msgid.link/20260212213008.149873-7-justintee8345@gmail.com`.
  `Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>` (SCSI
  maintainer). NO `Fixes:` tag, NO `Reported-by:` tag, NO `Cc: stable`,
  NO `Reviewed-by`.

**Step 1.3: Analyze the Commit Body Text**
- Record: The commit describes a refcount leak - when
  `NLP_IN_RECOV_POST_DEV_LOSS` is set, the initial ndlp reference is
  *still held* (because current `lpfc_dev_loss_tmo_handler` does *not*
  put in the recovering path anymore). Taking another reference via
  `lpfc_nlp_get` in the "reverse" path therefore *adds* an extra
  reference. Symptom: "extra reference when cleaning up lpfc_vport
  instances". No crash/stack trace given.

**Step 1.4: Detect Hidden Bug Fixes**
- Record: Subject begins with "Remove unnecessary" (cleanup wording) but
  commit body explicitly says "extra reference when cleaning up
  lpfc_vport instances" - this IS a real bug fix for a ref leak (pattern
  7: reference counting bug).

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory the Changes**
- Record: One file changed: `drivers/scsi/lpfc/lpfc_hbadisc.c`. One line
  removed (`lpfc_nlp_get(ndlp);`). Scope: single-function surgical fix.

**Step 2.2: Understand the Code Flow Change**
- Record: Before - `lpfc_check_nlp_post_devloss` cleared the
  `NLP_IN_RECOV_POST_DEV_LOSS` flag, cleared `NLP_DROPPED`, then called
  `lpfc_nlp_get(ndlp)` to "restore" a put supposedly performed in
  `lpfc_dev_loss_tmo_handler`. After - the get is gone; the flag bits
  are still cleared; no refcount change is performed.

**Step 2.3: Identify the Bug Mechanism**
- Record: Category (c) reference counting fix - removing an extra
  `lpfc_nlp_get()`. The reason the get is wrong: commit `d1a2ef63fc8b3`
  ("scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo
  handler", merged in v6.12) added an early `return fcf_inuse;` in the
  `recovering` branch, so `lpfc_nlp_put(ndlp)` is no longer executed
  when `NLP_IN_RECOV_POST_DEV_LOSS` is set. The matching `lpfc_nlp_get`
  in `lpfc_check_nlp_post_devloss` was left behind and now grabs an
  extra reference every time a fabric ndlp transiently hits dev_loss and
  recovers.
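
The imbalance can be illustrated with a small counter model. This is only a sketch of the get/put arithmetic described above, not lpfc code: `struct node_model` and the three functions are hypothetical, the starting count of 2 is an arbitrary assumption, and nothing is freed when the count changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary starting reference count for the model. */
#define INITIAL_REFS 2

/* Hypothetical stand-in for the ndlp refcount and recovery flag. */
struct node_model {
	int refs;
	bool in_recov_post_dev_loss;
};

/* Models the recovering branch of the dev_loss_tmo handler. */
static void devloss_recovering(struct node_model *n, bool handler_puts)
{
	assert(n != 0);
	n->in_recov_post_dev_loss = true;
	if (handler_puts)
		n->refs--;	/* behavior before d1a2ef63fc8b3 */
	/* after d1a2ef63fc8b3: early return, no put */
}

/* Models lpfc_check_nlp_post_devloss(). */
static void check_post_devloss(struct node_model *n, bool takes_get)
{
	assert(n != 0);
	if (!n->in_recov_post_dev_loss)
		return;
	n->in_recov_post_dev_loss = false;
	if (takes_get)
		n->refs++;	/* the line this patch removes */
}

/* Run several dev_loss/recover cycles and report the final count. */
static int refs_after_cycles(int cycles, bool handler_puts, bool takes_get)
{
	struct node_model n = { INITIAL_REFS, false };

	for (int i = 0; i < cycles; i++) {
		devloss_recovering(&n, handler_puts);
		check_post_devloss(&n, takes_get);
	}
	return n.refs;
}
```

In this model, put plus get (the v5.17 pairing) and neither (after this patch) both leave the count unchanged, while no put plus get (v6.12 up to this patch) grows it by one per cycle.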

**Step 2.4: Assess the Fix Quality**
- Record: Minimal 1-line removal. Obvious correctness - just removes the
  stale counterpart of a no-longer-happening put. No new regression
  risk expected: the removed get was unbalanced, and dropping it cannot
  cause a UAF here because the ndlp reference held before dev_loss
  remains held (the put never happens in the recovering path).

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame the Changed Lines**
- Record: The `lpfc_nlp_get(ndlp);` being removed was originally
  introduced by `af984c87293b1` (Oct 2021, v5.17) as part of "scsi:
  lpfc: Allow fabric node recovery if recovery is in progress before
  devloss". At that time it was *correct* (paired with a put in the
  handler). The bug was introduced by `d1a2ef63fc8b3` (Sep 2024, v6.12)
  "scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo
  handler", which added the early `return fcf_inuse;` that skips the
  `lpfc_nlp_put` in the recovering path but left the `lpfc_nlp_get` in
  `lpfc_check_nlp_post_devloss` dangling.

**Step 3.2: Follow the Fixes: Tag**
- Record: No `Fixes:` tag given. Based on code analysis, the actual
  Fixes: target is `d1a2ef63fc8b3`, present in v6.12 and forward.

**Step 3.3: Check File History for Related Changes**
- Record: Related recent commits: `07caedc6a3887` (Nov 2025) added the
  `clear_bit(NLP_DROPPED, ...);` to `lpfc_check_nlp_post_devloss`;
  `3f8f9f16f844a` converted save_flags to bitmask; `e07ac2d2aa5fc`
  removed unnecessary relocking. This is a standalone patch, part of a
  13-patch lpfc-14.4.0.14 update series (PATCH 6/13).

**Step 3.4: Check the Author's Other Commits**
- Record: Justin Tee (Broadcom) is the primary lpfc
  maintainer/contributor. Wrote `d1a2ef63fc8b3` (which introduced this
  bug) and has 30+ lpfc commits recently. Subject-matter expert.

**Step 3.5: Check for Dependent/Prerequisite Commits**
- Record: This fix is completely standalone for trees that contain
  `d1a2ef63fc8b3` (v6.12+). The only dependency is that the buggy commit
  is present. Can apply standalone.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1: Find the Original Patch Discussion**
- Record: `b4 am` located the patch series at lore.kernel.org. Patch is
  "PATCH 6/13 lpfc: Remove unnecessary ndlp kref get in
  lpfc_check_nlp_post_devloss" from series "Update lpfc to revision
  14.4.0.14" posted 2026-02-12 against Martin's 6.20/scsi-queue tree. No
  review discussion found on this specific patch (no `Reviewed-by`, no
  NAKs, no stable suggestions).

**Step 4.2: Check Who Reviewed the Patch**
- Record: Series is a typical Broadcom/lpfc driver update going through
  SCSI maintainer Martin K. Petersen. No explicit per-patch review
  comments retrieved.

**Step 4.3: Search for the Bug Report**
- Record: No `Reported-by:` tag. The bug was author-discovered via code
  analysis, not a user report.

**Step 4.4: Check for Related Patches and Series**
- Record: This is patch 6 of a 13-patch series. Other patches include
  logging improvements, typecast changes, cleanup of `lpfc_fdmi_cmd`
  error paths, `txcmplq_cnt` fixes, NVMe abort cleanup on PCI reset, and
  version bump. This specific patch is self-contained.

**Step 4.5: Check Stable Mailing List History**
- Record: No prior discussion found.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Identify Key Functions in the Diff**
- Record: `lpfc_check_nlp_post_devloss()` only.

**Step 5.2: Trace Callers**
- Record: Callers verified via grep - 5 total call sites:
  `lpfc_mbx_cmpl_fc_reg_login`, `lpfc_nlp_reg_node` (in
  `lpfc_hbadisc.c`), and 3 sites in `lpfc_els.c` around FLOGI handling.
  These are mainline FC discovery/login completion paths - called
  routinely during normal operation, particularly after link events.

**Step 5.3: Trace Callees**
- Record: The function calls `test_and_clear_bit`, `clear_bit`,
  `lpfc_nlp_get` (being removed), and `lpfc_printf_vlog` (logging). No
  I/O, no complex state changes.

**Step 5.4: Follow the Call Chain**
- Record: Reachable from any FC link-bounce/devloss path. Real-world
  triggered every time a fabric ndlp (Fabric_DID, FDMI_DID,
  NameServer_DID, Fabric_Cntl_DID) dev_loss_tmo-fires while recovery is
  still in progress - a common event on FC fabrics.

**Step 5.5: Search for Similar Patterns**
- Record: N/A - this is a specific function.

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

**Step 6.1: Does the Buggy Code Exist in Stable Trees?**
- Record: Verified via `git show
  <branch>:drivers/scsi/lpfc/lpfc_hbadisc.c`:
  - stable/linux-6.12.y: HAS BUG - `recovering` path has `return
    fcf_inuse;` (no put); `lpfc_check_nlp_post_devloss` has
    `lpfc_nlp_get(ndlp)` (extra ref).
  - stable/linux-6.17.y, 6.18.y, 6.19.y: HAS BUG - same pattern.
  - stable/linux-6.6.y: NO BUG - the `recovering` path in 6.6.y does NOT
    have the early return; the `lpfc_nlp_put(ndlp)` still executes at
    the end of fabric-node handling, so the `lpfc_nlp_get` correctly
    balances it. Fix must NOT be backported to 6.6.y.

**Step 6.2: Check for Backport Complications**
- Record: For 6.18.y and 6.19.y: patch applies cleanly (same surrounding
  context with `clear_bit(NLP_DROPPED, ...)`). For 6.17.y: needs minor
  context adjustment (no `clear_bit(NLP_DROPPED, ...)` line above). For
  6.12.y: function still uses the older `spin_lock_irqsave`/`save_flags
  &=` form; a manual adjustment (simply removing the
  `lpfc_nlp_get(ndlp);` line amidst different surrounding context) is
  needed but trivial.

**Step 6.3: Check If Related Fixes Are Already in Stable**
- Record: None. Buggy commit `d1a2ef63fc8b3` went into v6.12 directly
  (not backported). No alternate fix present.

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1: Identify the Subsystem and Its Criticality**
- Record: `drivers/scsi/lpfc/` - Broadcom/Emulex Fibre Channel HBA
  driver. Criticality: IMPORTANT - used in enterprise storage/SAN
  setups. Reference leaks in a driver of this size matter to enterprise
  users running long-lived systems.

**Step 7.2: Assess Subsystem Activity**
- Record: Actively maintained (regular version bumps, multiple patches
  per quarter). Justin Tee / Broadcom are responsive maintainers.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1: Determine Who Is Affected**
- Record: Affected population - users of lpfc-driven FC HBAs running
  v6.12+ stable kernels who experience fabric link bounces,
  zoning/fabric reconfiguration, or transient device-loss events on
  Fabric_DID/FDMI/NameServer/Fabric_Cntl ndlps.

**Step 8.2: Determine the Trigger Conditions**
- Record: Triggered every time a fabric ndlp enters dev_loss_tmo while
  still in a recovering discovery state. On busy fabrics with many
  vports, this can happen repeatedly; each occurrence leaks one ndlp
  reference.

**Step 8.3: Determine the Failure Mode Severity**
- Record: Failure mode - ndlp kref leak; ndlp objects cannot be freed
  during `lpfc_vport` cleanup; potential memory accumulation over time;
  can cause WARN messages or stuck/hang behavior on vport
  teardown/module unload. Severity: MEDIUM-HIGH - not an immediate crash
  or security issue but a persistent resource leak in a commonly
  triggered code path in enterprise storage workloads.

**Step 8.4: Calculate Risk-Benefit Ratio**
- Record: Benefit - fixes a real ref leak introduced by a previous "fix"
  commit; benefit is MEDIUM-HIGH for lpfc users. Risk - 1-line removal,
  obvious correctness, and no new UAF expected because the removed get
  has no matching put in the recovering path; risk VERY LOW. Ratio:
  clearly favorable for backport.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Compile Evidence**
- FOR: Real kref leak bug in common code path; trivially small/obvious
  fix; author is subsystem maintainer who introduced the original bug;
  bug confirmed to exist in 6.12.y, 6.17.y, 6.18.y, 6.19.y via direct
  inspection; code analysis is unambiguous.
- AGAINST: No `Fixes:`, `Reported-by:`, `Cc: stable`; no user report of
  symptoms; comment language along the lines of "put the initial ref
  count back" is now stale but was not updated.
- UNRESOLVED: Whether this is already in linux-next / scsi-queue for
  v6.21 (not verified directly, but the cover letter says it targets
  6.20/scsi-queue so likely on its way to mainline imminently).

**Step 9.2: Apply Stable Rules Checklist**
1. Obviously correct and tested? YES - trivial to verify by reading the
   two functions together; author has build/test history.
2. Fixes real bug affecting users? YES - refcount leak on every
   fabric-ndlp recovery event.
3. Important issue? MEDIUM-HIGH (resource leak in enterprise storage
   driver).
4. Small and contained? YES - one line removal.
5. No new features or APIs? YES - pure bug fix.
6. Applies to stable? YES for 6.12.y (with trivial manual adjustment)
   and 6.17.y+; must NOT go to 6.6.y and earlier.

**Step 9.3: Check for Exception Categories**
- N/A - normal bug fix, not a quirk/device ID/DT update/build fix.

**Step 9.4: Decision**
- This is a genuine reference-counting bug fix for a regression
  introduced by `d1a2ef63fc8b3` in v6.12. The fix is minimal, obviously
  correct, and affects a common code path in the lpfc FC driver used by
  enterprise storage. Risk is essentially zero; benefit prevents a real
  ref leak. Should be backported to 6.12+ stable trees (NOT to 6.6.y and
  earlier where the bug does not exist).

## Verification

- [Phase 1] Parsed tags on the supplied commit message: Signed-off-by
  Justin Tee / Martin K. Petersen, Link to patch.msgid.link. No
  Fixes/Reported-by/Cc-stable/Reviewed-by.
- [Phase 2] Read the diff: single 1-line removal of
  `lpfc_nlp_get(ndlp);` in `lpfc_check_nlp_post_devloss` within
  `drivers/scsi/lpfc/lpfc_hbadisc.c`.
- [Phase 3] `git log -L:lpfc_check_nlp_post_devloss` confirmed the get
  was introduced in `af984c87293b1` (v5.17) where it was correct (paired
  with a put).
- [Phase 3] `git show d1a2ef63fc8b3` - confirmed this commit added the
  early `return fcf_inuse;` in the recovering branch, breaking the
  get/put pairing.
- [Phase 3] `git log --author="Justin Tee"` - confirmed author is active
  lpfc maintainer.
- [Phase 3] `git tag --contains d1a2ef63fc8b3` - buggy commit is in
  v6.12 and beyond.
- [Phase 4] `b4 am 20260212213008.149873-7-justintee8345@gmail.com` -
  downloaded the series, confirmed it is patch 6/13 in "Update lpfc to
  revision 14.4.0.14" against 6.20/scsi-queue. No per-patch review
  discussion.
- [Phase 5] `grep lpfc_check_nlp_post_devloss` - 5 callers:
  `lpfc_mbx_cmpl_fc_reg_login`, `lpfc_nlp_reg_node`, 3 in `lpfc_els.c`
  FLOGI paths - all routine discovery paths.
- [Phase 6] `git show stable/linux-6.12.y:...lpfc_hbadisc.c` - confirmed
  bug exists (function has lpfc_nlp_get, recovering path has `return
  fcf_inuse;`).
- [Phase 6] `git show
  stable/linux-6.17.y/6.18.y/6.19.y:...lpfc_hbadisc.c` - same bug
  pattern confirmed in each.
- [Phase 6] `git show stable/linux-6.6.y:...lpfc_hbadisc.c` - confirmed
  bug does NOT exist (recovering path has no early return; put still
  fires).
- [Phase 8] Failure mode: ndlp kref leak on every fabric-ndlp dev_loss
  recovery event, MEDIUM-HIGH severity.
- UNVERIFIED: Whether any user has reported hangs/OOM from this specific
  leak - no such report exists, but the mechanism is clear from code
  analysis.
- UNVERIFIED: Exact mainline SHA of the commit (not yet in my local
  `master`), but the patch content matches what is on lore/in the
  14.4.0.14 series.

The fix is small, surgical, and addresses a real regression that exists
in multiple active stable trees (6.12+). Meets all stable kernel rules.
The adaptation for 6.12.y requires a trivial manual edit (different
surrounding context) but the change itself is a single-line removal.

**YES**

 drivers/scsi/lpfc/lpfc_hbadisc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 8aaf05d7bb0af..d42b911a0aee1 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -425,7 +425,6 @@ lpfc_check_nlp_post_devloss(struct lpfc_vport *vport,
 {
 	if (test_and_clear_bit(NLP_IN_RECOV_POST_DEV_LOSS, &ndlp->save_flags)) {
 		clear_bit(NLP_DROPPED, &ndlp->nlp_flag);
-		lpfc_nlp_get(ndlp);
 		lpfc_printf_vlog(vport, KERN_INFO, LOG_DISCOVERY | LOG_NODE,
 				 "8438 Devloss timeout reversed on DID x%x "
 				 "refcnt %d ndlp %p flag x%lx "
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.18] scsi: ufs: ufs-pci: Add support for Intel Nova Lake
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-5.10] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC Sasha Levin
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Remove unnecessary ndlp kref get in lpfc_check_nlp_post_devloss Sasha Levin
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.1] scsi: lpfc: Fix incorrect txcmplq_cnt during cleanup in lpfc_sli_abort_ring() Sasha Levin
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC (permalink / raw)
  To: patches, stable
  Cc: Adrian Hunter, Bart Van Assche, Martin K. Petersen, Sasha Levin,
	jejb, linux-scsi, linux-kernel

From: Adrian Hunter <adrian.hunter@intel.com>

[ Upstream commit 096cd6b7adf21791827a045d464242d93a6fd54e ]

Add PCI ID to support Intel Nova Lake, same as Intel Meteor Lake (MTL).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260309085815.55216-1-adrian.hunter@intel.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to complete the analysis.

## Phase 1: Commit Message Forensics

**Step 1.1: Parse Subject Line**
- Record: subsystem `scsi: ufs: ufs-pci:`, action verb `Add`, summary:
  "Add support for Intel Nova Lake" — hardware enablement via PCI ID

**Step 1.2: Parse All Commit Message Tags**
- Record:
  - Signed-off-by: Adrian Hunter (Intel, author, UFS PCI driver
    maintainer-level contributor)
  - Reviewed-by: Bart Van Assche (prolific SCSI/UFS reviewer)
  - Link:
    https://patch.msgid.link/20260309085815.55216-1-adrian.hunter@intel.com
  - Signed-off-by: Martin K. Petersen (SCSI subsystem maintainer)
  - No `Fixes:` tag, no `Cc: stable` (absence is expected per
    guidelines)
  - No `Reported-by:`, no syzbot

**Step 1.3: Analyze Commit Body**
- Record: The body is one sentence: "Add PCI ID to support Intel Nova
  Lake, same as Intel Meteor Lake (MTL)." Explicitly states the new
  platform reuses the existing MTL variant ops. No bug described — this
  is hardware enablement.

**Step 1.4: Detect Hidden Bug Fixes**
- Record: Not a hidden bug fix. This is a straightforward new-hardware-
  enablement PCI ID addition — which is an explicit exception category
  in the stable rules.

## Phase 2: Diff Analysis

**Step 2.1: Inventory**
- Record: 1 file changed (`drivers/ufs/host/ufshcd-pci.c`), +1/-0 lines.
  Scope: single-file, single-line surgical addition. Modified table:
  `ufshcd_pci_tbl[]`.

**Step 2.2: Code Flow Change**
- Record: Before: PCI device 0xD335 (INTEL) not matched → driver would
  not bind. After: PCI device 0xD335 matches and uses
  `ufs_intel_mtl_hba_vops`. Only affects the specific new device.

**Step 2.3: Bug Mechanism**
- Record: Category (h) Hardware workarounds — device ID addition. No bug
  mechanism; enables existing driver logic for a new SKU.

**Step 2.4: Fix Quality**
- Record: Trivially correct — simply matches the vendor/device pair to
  an existing, tested vops struct (`ufs_intel_mtl_hba_vops`). Zero
  regression risk: entries are only evaluated for matching vendor/device
  PCI IDs, so no non-Nova-Lake system can be affected.

## Phase 3: Git History Investigation

**Step 3.1: Blame**
- Record: The MTL vops struct `ufs_intel_mtl_hba_vops` was introduced by
  commit `4049f7acef3eb` ("scsi: ufs: ufs-pci: Add support for Intel
  MTL", Apr 2022, v5.18), which carried `Cc: stable@vger.kernel.org #
  v5.15+`. The MTL infrastructure is therefore in every active stable
  tree (5.15.y and later).

**Step 3.2: Follow Fixes: Tag**
- Record: No Fixes: tag — N/A. This is an enablement, not a fix.

**Step 3.3: File History / Prerequisites**
- Record: The surrounding PCI table has accumulated several similar
  single-line additions: Arrow Lake (`51031cc3f903e`, v6.5), Lunar Lake
  (`0a07d3c7a1d20`, v6.4), Panther Lake (`bdee2f1dcd84d`, v6.11),
  Wildcat Lake (`823f95575d854`, 2025). Each is identical in structure:
  one PCI ID reusing MTL ops. This commit is self-contained and has no
  prerequisites.

**Step 3.4: Author's Other Commits**
- Record: Adrian Hunter (Intel) is the long-time author/maintainer of
  the Intel UFS PCI support code. All past Intel PCI ID additions for
  this driver are his. Strong authority signal.

**Step 3.5: Dependencies**
- Record: The only dependency is `ufs_intel_mtl_hba_vops`, which has
  existed in stable since v5.15+.

## Phase 4: Mailing List Research

**Step 4.1: Find Original Discussion**
- Record: `b4 dig -c 096cd6b7adf21` matched by patch-id and returned
  https://lore.kernel.org/all/20260309085815.55216-1-adrian.hunter@intel.com/
  (only a single v1, no iterations).

**Step 4.2: Reviewers**
- Record: Bart Van Assche reviewed (Reviewed-by), Martin K. Petersen
  applied (SCSI maintainer). Addressed to linux-scsi@vger.kernel.org.
  Appropriate maintainer chain.

**Step 4.3: Bug Report**
- Record: No Reported-by/bug report — N/A (enablement).

**Step 4.4: Related Patches**
- Record: `b4 dig -a` confirmed only v1; no multi-patch series.
  Standalone.

**Step 4.5: Stable-Specific Discussion**
- Record: No explicit Cc: stable request in thread, but thread is clean
  and contains a Reviewed-by from Bart and a clean apply message from
  Martin. No objections.

## Phase 5: Code Semantic Analysis

**Step 5.1: Functions Modified**
- Record: No functions modified — only the PCI device ID table
  `ufshcd_pci_tbl[]`.

**Step 5.2–5.4: Callers / Callees / Call Chain**
- Record: The table is consumed by the PCI core (`pci_match_id`) for
  driver binding. Reachability: only when a Nova Lake host with PCI ID
  8086:D335 is present. With no such device, the added row is dead data.
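
The table lookup described above can be sketched as a userspace model.
This is a simplified illustration, not the PCI core's implementation:
`struct id_entry`, `id_table` and `match_variant()` are hypothetical
names, a string stands in for the `driver_data` vops pointer, and the
subvendor/subdevice/class fields of the real `struct pci_device_id` are
omitted. The vendor value 0x8086 is Intel's PCI vendor ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced form of a PCI ID table row. */
struct id_entry {
	uint16_t vendor;
	uint16_t device;
	const char *variant;  /* stands in for the vops pointer */
};

static const struct id_entry id_table[] = {
	{ 0x8086, 0x7747, "mtl" },
	{ 0x8086, 0xE447, "mtl" },
	{ 0x8086, 0x4D47, "mtl" },
	{ 0x8086, 0xD335, "mtl" },  /* the row added by this patch */
	{ 0, 0, NULL }              /* terminate list */
};

/* Return the variant for a matching vendor:device pair, else NULL. */
static const char *match_variant(uint16_t vendor, uint16_t device)
{
	for (const struct id_entry *e = id_table; e->vendor; e++) {
		if (e->vendor == vendor && e->device == device)
			return e->variant;
	}
	return NULL;
}
```

A lookup for any pair other than the listed ones walks to the sentinel
and returns NULL, which is why the new row cannot affect hardware that
does not report 8086:D335.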

**Step 5.5: Similar Patterns**
- Record: Five identical prior commits add single MTL-compatible IDs
  (MTL itself, ARL, LNL, PTL, WCL). Consistent, well-established
  pattern.

## Phase 6: Cross-Referencing / Stable Tree Analysis

**Step 6.1: Buggy Code in Stable?**
- Record: The underlying driver + `ufs_intel_mtl_hba_vops` exist in all
  active stable trees (5.15.y and later, since v5.18 with Cc: stable #
  v5.15+).

**Step 6.2: Backport Complications**
- Record: Trivial clean apply expected — single-line addition to a table
  that exists in all active stable trees. Minor possibility of context
  fuzz if the table has slightly fewer entries in older branches (e.g.,
  pre-Wildcat Lake), but still trivial.

**Step 6.3: Related Fixes Already in Stable?**
- Record: Panther Lake and Wildcat Lake PCI ID additions already made it
  to autosel stable branches (per `git branch --contains`), confirming
  the stable trees routinely accept this class of single-line Intel UFS
  PCI ID enablement.

## Phase 7: Subsystem Context

**Step 7.1: Subsystem Criticality**
- Record: `drivers/ufs/host/` — device driver for UFS (Universal Flash
  Storage). Criticality: IMPORTANT — UFS is the primary storage on
  modern Intel mobile/client platforms, so without this ID the system's
  main storage doesn't work at all on Nova Lake.

**Step 7.2: Activity**
- Record: Actively maintained; multiple commits per release. Mature,
  stable interfaces.

## Phase 8: Impact and Risk Assessment

**Step 8.1: Affected Users**
- Record: Users of Intel Nova Lake systems running a stable kernel.
  Without this patch, the UFS controller simply won't bind → the system
  cannot use its UFS storage.

**Step 8.2: Trigger Conditions**
- Record: Universally triggered on Nova Lake hardware at boot/probe. No
  special conditions. For all other hardware the code is inert.

**Step 8.3: Failure Mode Severity**
- Record: Without the patch: UFS storage inoperative on Nova Lake. With
  the patch: storage works. Severity for affected platform: HIGH (no
  storage = unusable system); for all other systems: zero impact.

**Step 8.4: Benefit / Risk**
- Record: BENEFIT: High for target hardware users, zero-cost for others.
  RISK: Essentially zero — only a PCI ID entry with a pointer to an
  existing, heavily-tested vops struct already in stable. Ratio strongly
  favors backport.

## Phase 9: Final Synthesis

**Step 9.1: Evidence**
- FOR: One-line PCI ID addition (exception category #1 in stable rules);
  reuses existing in-tree and in-stable `ufs_intel_mtl_hba_vops`;
  reviewed by Bart Van Assche; applied by Martin K. Petersen; obvious
  correctness; zero regression risk for non-matching hardware; matches
  the pattern of prior backported Intel UFS PCI ID additions (Arrow
  Lake, Lunar Lake, Panther Lake, Wildcat Lake).
- AGAINST: None substantive. No explicit Cc: stable (expected; a commit
  that carried the tag would not need this review).

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? YES — reviewed and applied by the
   maintainer; trivial addition.
2. Fixes a real problem for users? YES — UFS storage non-functional on
   Nova Lake without it.
3. Important? YES — primary storage on target platform.
4. Small & contained? YES — 1 line.
5. No new features/APIs? Correct — no user-visible API or new behavior;
   enables existing code path for a new device.
6. Applies to stable? YES — clean apply expected.

**Step 9.3: Exception Category**
- Record: Falls squarely under Exception Category #1 "NEW DEVICE IDs" —
  adding a PCI ID to an existing driver where the driver and the reused
  vops are already in stable.

**Step 9.4: Decision**
- YES. This is a textbook hardware enablement PCI ID addition that meets
  the explicit stable rule exception.

---

## Verification

- [Phase 1] Parsed tags from commit message: Signed-off-by (Adrian
  Hunter, Martin K. Petersen), Reviewed-by (Bart Van Assche), Link to
  patch.msgid.link. No Fixes:, no Cc: stable, no Reported-by.
- [Phase 2] `git show 096cd6b7adf21` confirmed: 1 file, +1/-0 line,
  single new entry `{ PCI_VDEVICE(INTEL, 0xD335),
  (kernel_ulong_t)&ufs_intel_mtl_hba_vops }` in `ufshcd_pci_tbl[]`.
- [Phase 3] `git log -- drivers/ufs/host/ufshcd-pci.c` showed history of
  prior Intel PCI ID additions (MTL, ADL, ARL, LNL, PTL, WCL).
- [Phase 3] `git show 4049f7acef3eb` confirmed the MTL support
  (introducing `ufs_intel_mtl_hba_vops`) carried `Cc:
  stable@vger.kernel.org # v5.15+`, so the vops exist in every active
  stable tree.
- [Phase 3] `git branch --contains 823f95575d854` and `git branch
  --contains bdee2f1dcd84d` showed Wildcat Lake and Panther Lake PCI ID
  patches already in `for-greg/*-200` autosel branches — precedent for
  backporting this exact class of patch.
- [Phase 3] Read `drivers/ufs/host/ufshcd-pci.c` lines 510–719:
  confirmed `ufs_intel_mtl_hba_vops` is defined at line 525 and already
  used by multiple MTL-family device IDs.
- [Phase 4] `b4 dig -c 096cd6b7adf21`: matched by patch-id to
  https://lore.kernel.org/all/20260309085815.55216-1-adrian.hunter@intel.com/.
- [Phase 4] `b4 dig -c 096cd6b7adf21 -a`: confirmed only v1, no
  subsequent revisions.
- [Phase 4] Mbox inspection: Martin K. Petersen applied ("Applied to
  7.1/scsi-queue"), Bart Van Assche provided Reviewed-by. No NAKs, no
  objections.
- [Phase 5] Inspection of `ufshcd_pci_tbl[]` surroundings: no functions
  modified, only table data; table is consumed by PCI core during driver
  binding.
- [Phase 6] Confirmed `ufs_intel_mtl_hba_vops` is present in stable
  trees via the v5.15+ stable tag on commit 4049f7acef3eb.
- [Phase 8] Failure mode analysis: without patch, Nova Lake UFS
  controller does not bind → no storage on that platform; with patch, it
  binds using the already-in-stable MTL vops. No risk to unrelated
  systems because the PCI core only dispatches to this driver on
  matching vendor:device.
- UNVERIFIED: Did not fetch the lore.kernel.org thread via WebFetch
  (Anubis challenge page returned); mitigated by reading the thread
  contents directly from the b4-saved mbox
  (`/tmp/nova_lake_thread.mbox`), which contains the maintainer apply
  message and Bart's Reviewed-by.

The commit is a minimal, low-risk, standards-compliant PCI device ID
addition that enables UFS storage on a new Intel platform and reuses
infrastructure already in all active stable trees. It fits the explicit
"Adding PCI/USB/ACPI IDs to existing drivers" exception listed in the
stable kernel rules, and matches the precedent of prior autosel-ed Intel
UFS PCI ID additions (Panther Lake, Wildcat Lake).

**YES**

 drivers/ufs/host/ufshcd-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ufs/host/ufshcd-pci.c b/drivers/ufs/host/ufshcd-pci.c
index 5f65dfad1a71a..63f6b36b912fc 100644
--- a/drivers/ufs/host/ufshcd-pci.c
+++ b/drivers/ufs/host/ufshcd-pci.c
@@ -695,6 +695,7 @@ static const struct pci_device_id ufshcd_pci_tbl[] = {
 	{ PCI_VDEVICE(INTEL, 0x7747), (kernel_ulong_t)&ufs_intel_mtl_hba_vops },
 	{ PCI_VDEVICE(INTEL, 0xE447), (kernel_ulong_t)&ufs_intel_mtl_hba_vops },
 	{ PCI_VDEVICE(INTEL, 0x4D47), (kernel_ulong_t)&ufs_intel_mtl_hba_vops },
+	{ PCI_VDEVICE(INTEL, 0xD335), (kernel_ulong_t)&ufs_intel_mtl_hba_vops },
 	{ }	/* terminate list */
 };
 
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.1] scsi: lpfc: Fix incorrect txcmplq_cnt during cleanup in lpfc_sli_abort_ring()
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: ufs: ufs-pci: Add support for Intel Nova Lake Sasha Levin
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0] scsi: virtio_scsi: Move INIT_WORK calls to virtscsi_probe() Sasha Levin
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC (permalink / raw)
  To: patches, stable
  Cc: Justin Tee, Martin K. Petersen, Sasha Levin, james.smart,
	dick.kennedy, jejb, linux-scsi, linux-kernel

From: Justin Tee <justin.tee@broadcom.com>

[ Upstream commit 2da10bcaa58a389ca60f8e788180e0dca00739bc ]

When a port is offline in lpfc_sli_abort_ring, the phba->txcmplq is
cleared but the phba->txcmplq_cnt is not reset to zero.  This can
sometimes result in a phba->txcmplq_cnt that never reaches zero, which
hangs the cleanup process.

Update lpfc_sli_abort_ring so that txcmplq_cnt is reset to zero and also
ensure that the LPFC_IO_ON_TXCMPLQ flag is properly cleared.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20260212213008.149873-9-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to write a complete analysis.

# Analysis: scsi: lpfc: Fix incorrect txcmplq_cnt during cleanup in
lpfc_sli_abort_ring()

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Parse Subject Line
- Subsystem prefix: `scsi: lpfc:`
- Action verb: "Fix" (explicit bug fix)
- Record: SCSI/lpfc driver fix for incorrect `txcmplq_cnt` tracking
  during cleanup in `lpfc_sli_abort_ring()` (actual function is
  `lpfc_sli_abort_iocb_ring`)

### Step 1.2: Parse Commit Message Tags
- **Signed-off-by:** Justin Tee (author, Broadcom lpfc maintainer)
- **Link:**
  https://patch.msgid.link/20260212213008.149873-9-justintee8345@gmail.com
  (lore mailing list archive)
- **Signed-off-by:** Martin K. Petersen (SCSI maintainer)
- No Fixes: tag (expected for manual review candidates)
- No Cc: stable (expected for manual review candidates)
- No Reported-by tags
- Record: Standard upstream flow through SCSI maintainer; two SOB chain
  indicating proper review path.

### Step 1.3: Analyze Commit Body
- Bug: When port is offline (`pci_channel_offline`), `phba->txcmplq`
  list is cleared via `list_splice_init()` but `phba->txcmplq_cnt` is
  NOT reset to zero
- Symptom: "can sometimes result in a phba->txcmplq_cnt that never
  reaches zero, which hangs the cleanup process"
- Fix: Reset `txcmplq_cnt` to zero and clear `LPFC_IO_ON_TXCMPLQ` flag
  on iocbs
- Record: Bug causes cleanup hang during PCI channel offline (EEH error
  recovery); the author clearly understood the root cause

### Step 1.4: Hidden Bug Fix Detection
- This is an EXPLICIT bug fix ("Fix incorrect"), not disguised
- Record: Not a hidden fix; clearly labeled as bug fix

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory Changes
- 1 file: `drivers/scsi/lpfc/lpfc_sli.c`
- Net: -18 lines (24 insertions, 42 deletions)
- Only function modified: `lpfc_sli_abort_iocb_ring()`
- Record: Single-file surgical fix with refactoring consolidation

### Step 2.2: Code Flow Change
**BEFORE (offline path, both SLI_REV3 and SLI_REV4):**
- Held the appropriate lock and spliced `txcmplq` → local
  `txcmplq_completions`
- Did NOT reset `pring->txcmplq_cnt`
- Did NOT clear `LPFC_IO_ON_TXCMPLQ` flag on each iocb

**AFTER:**
- Single `plock` pointer (ring_lock or hbalock based on sli_rev)
- Consolidated SLI3/SLI4 duplicated blocks into one
- For offline: clears `LPFC_IO_ON_TXCMPLQ` flag on each iocb, splices to
  `tx_completions`, **resets `pring->txcmplq_cnt = 0`**

### Step 2.3: Bug Mechanism
Classification: **Logic/correctness fix + refactoring**
- Missing counter reset: `pring->txcmplq_cnt = 0` when list is cleared
- Missing flag clearing: `iocb->cmd_flag &= ~LPFC_IO_ON_TXCMPLQ`
- Record: Offline splice path never decremented counter or cleared per-
  iocb flag, causing stuck counter
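
The stale-counter mechanism can be sketched in userspace C. This is a
minimal model under stated assumptions, not the driver code:
`struct iocb_model`, `struct ring_model`, `ring_add()`,
`abort_ring_offline()` and `IO_ON_TXCMPLQ` are hypothetical stand-ins, a
singly linked list replaces the kernel list, and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define IO_ON_TXCMPLQ 0x1u  /* hypothetical stand-in for LPFC_IO_ON_TXCMPLQ */

struct iocb_model {
	unsigned int cmd_flag;
	struct iocb_model *next;
};

struct ring_model {
	struct iocb_model *txcmplq;  /* outstanding iocbs */
	int txcmplq_cnt;             /* must track the list length */
};

/* Queue an iocb: list, flag and counter are updated together. */
static void ring_add(struct ring_model *r, struct iocb_model *io)
{
	io->cmd_flag |= IO_ON_TXCMPLQ;
	io->next = r->txcmplq;
	r->txcmplq = io;
	r->txcmplq_cnt++;
}

/*
 * Offline path: hand the whole list to the caller for cancellation.
 * With fixed == false only the list is emptied (the pre-fix behaviour);
 * with fixed == true the per-iocb flag and the counter are reset too.
 */
static struct iocb_model *abort_ring_offline(struct ring_model *r, bool fixed)
{
	struct iocb_model *head = r->txcmplq;

	if (fixed) {
		for (struct iocb_model *io = head; io; io = io->next)
			io->cmd_flag &= ~IO_ON_TXCMPLQ;
	}
	r->txcmplq = NULL;  /* models list_splice_init() emptying the list */
	if (fixed)
		r->txcmplq_cnt = 0;
	return head;
}
```

In the pre-fix variant the ring list is empty but `txcmplq_cnt` still
reports the old length, so nothing remains on the list to decrement it
later.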

### Step 2.4: Fix Quality
- Follows identical pattern established in `lpfc_hba_down_post_s4()`
  lines 4705/4709 and `lpfc_hba_down_post_s3()` lines 4731/4735 which
  already do both (flag clear + count reset)
- Refactoring is mechanical - no change in lock semantics (still uses
  `pring->ring_lock` for SLI4, `phba->hbalock` for SLI3)
- Same `lpfc_sli_cancel_iocbs()` called on the iocbs as before
- Record: Fix quality high; pattern matches existing correct code

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame Analysis
```
a4691038b4071 (James Smart 2022-03-16) introduced the offline branch
```
- Buggy offline handling added in v5.18 (commit `a4691038b4071f` -
  "scsi: lpfc: Fix unload hang after back to back PCI EEH faults")
- Record: Bug present since v5.18; code in many stable trees (v5.18,
  v5.19, v6.0, v6.1.y, v6.6.y, v6.12.y)

### Step 3.2: Follow Fixes: Tag
- No Fixes: tag present
- Root cause commit identified via blame: `a4691038b4071f` is in v5.18
- Record: Original commit a4691038 went into v5.18 and IS present in
  stable trees

### Step 3.3: File History
- `lpfc_sli.c` actively developed; recent commits mostly lpfc version
  updates
- No intermediate fix attempts found for `txcmplq_cnt` issue
- Record: Standalone fix, not part of larger series

### Step 3.4: Author Context
- Justin Tee is the primary lpfc maintainer at Broadcom with many
  commits to this driver
- Record: Author is subsystem maintainer - strong credibility signal

### Step 3.5: Dependencies
- Self-contained change to one function
- Uses existing helpers (`list_splice_init`, `lpfc_sli_cancel_iocbs`,
  `lpfc_sli_issue_abort_iotag`) that exist in all stable trees
- Record: No dependencies; applies standalone

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1: Find Original Discussion
- `b4 dig` found:
  https://lore.kernel.org/all/20260212213008.149873-9-justintee8345@gmail.com/
- Subject: [PATCH 08/13] lpfc: Fix incorrect txcmplq_cnt during cleanup
  in lpfc_sli_abort_ring
- Part of series: "Update lpfc to revision 14.4.0.14"
- Record: Only v1 submitted; no review feedback or revisions

### Step 4.2: Reviewers
- `b4 dig -w` shows: linux-scsi@vger.kernel.org, jsmart833426@gmail.com
  (James Smart - original lpfc author), justin.tee@broadcom.com
- Applied by Martin K. Petersen (SCSI maintainer)
- Record: Proper review through SCSI subsystem

### Step 4.3: Bug Report
- No Reported-by or bug reports linked; found via internal
  testing/analysis
- Record: No external bug report

### Step 4.4: Related Patches
- Series "Update lpfc to revision 14.4.0.14" contains mix of fixes and
  improvements
- This specific patch (08/13) is an independent bug fix
- Record: Standalone bug fix within larger maintenance series

### Step 4.5: Stable Mailing List
- No stable-specific discussion found
- Not explicitly Cc'd to stable
- Record: Standard flow, no stable discussion

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: Key Functions
- Modified: `lpfc_sli_abort_iocb_ring()`
- Record: Single function modified

### Step 5.2: Callers
- `lpfc_sli_abort_iocb_ring` called from:
  - `lpfc_sli_abort_fcp_rings` (line 4643) — called from EEH/PCI error
    recovery: `lpfc_sli_prep_dev_for_recover` (line 14285),
    `lpfc_sli4_prep_dev_for_recover` (line 15105),
    `lpfc_handle_eratt_s3` at lpfc_init.c:1715 and 1830
  - `lpfc_sli_hba_iocb_abort` (line 12605) — called from controller
    fatal error handlers
  - `lpfc_hba_down_post_s3` (lpfc_init.c:1028 and 1046) — called during
    HBA shutdown
- Record: Called from critical error recovery paths and shutdown paths

### Step 5.3: Callees
- `lpfc_fabric_abort_hba` - aborts fabric commands
- `list_splice_init` - moves list elements
- `lpfc_sli_issue_abort_iotag` - issues ABTS
- `lpfc_sli_cancel_iocbs` - cancels iocbs on list (calls cmd_cmpl or
  releases)
- `lpfc_issue_hb_tmo` - heartbeat timer
- Record: Standard SLI cleanup primitives

### Step 5.4: Call Chain & Reachability
- Triggered by PCI EEH (Enhanced Error Handling) errors → common on IBM
  POWER systems, enterprise PCIe AER environments
- Also reachable via module unload, HBA controller reset, firmware
  errors
- `pci_channel_offline=true` triggers the buggy branch (used in PCI
  error recovery callbacks)
- Record: Reachable from real hardware error recovery paths on
  enterprise systems

### Step 5.5: Similar Patterns
- `lpfc_hba_down_post_s4()` at line 4700-4709: correctly does flag clear
  + `txcmplq_cnt = 0`
- `lpfc_hba_down_post_s3()` at line 4726-4735: correctly does flag clear
  + `txcmplq_cnt = 0`
- `__lpfc_nvme_ls_abort_outstanding_reqs`-style code in
  lpfc_nvme.c:2873-2878: clears flag and decrements `txcmplq_cnt` per-
  iocb
- Record: Correct pattern exists elsewhere; this fix brings
  `lpfc_sli_abort_iocb_ring` into consistency with established codebase
  patterns

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Buggy Code in Stable Trees
- Verified v6.6 has buggy code (same structure, missing txcmplq_cnt
  reset and flag clear)
- Verified v6.12 has buggy code
- Verified v6.1 has buggy code
- v5.15 did NOT yet have offline branch (introduced v5.18)
- Record: Bug present in v6.1.y, v6.6.y, v6.12.y, v6.18.y, v6.19.y and
  other active stable trees derived from v5.18+

### Step 6.2: Backport Complications
- Function signature and structure are nearly identical in v6.1 and v6.6
- The minimal bug fix (adding `txcmplq_cnt = 0` and flag clearing loop)
  would apply cleanly
- The full refactor (consolidating plock) may require small adjustments
  in older trees but is still straightforward
- Record: Clean apply expected; minor adjustments possible for older
  trees

### Step 6.3: Related Fixes in Stable
- No prior fix for this specific issue found in stable
- Record: First fix for this bug

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1: Subsystem Criticality
- `drivers/scsi/lpfc` - Emulex LightPulse Fibre Channel HBA driver
- Criticality: IMPORTANT - used widely in enterprise storage (SAN)
  deployments
- Common on enterprise servers; fibre channel storage is critical data
  path
- Record: IMPORTANT criticality for enterprise SCSI/SAN users

### Step 7.2: Activity Level
- Actively developed by Broadcom team, regular updates
- Record: Active, well-maintained driver with regular fixes

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Affected Population
- Users of Emulex/Broadcom LightPulse FC HBAs running in
  enterprise/datacenter environments
- Especially affected: systems using PCI EEH error recovery (IBM POWER,
  modern x86 with AER)
- Record: Enterprise SCSI/FC users; driver-specific

### Step 8.2: Trigger Conditions
- Primary: PCI channel goes offline (EEH/AER error recovery)
- Secondary: HBA controller hardware error during operation
- Cannot be triggered by unprivileged users (kernel-internal error path)
- Record: Error recovery path; infrequent but occurs on real enterprise
  hardware faults

### Step 8.3: Failure Mode Severity
- When triggered, `pring->txcmplq_cnt` remains positive indefinitely
- `lpfc_nvme_lport_unreg_wait` (lpfc_nvme.c:2252, confirmed) waits for
  this counter to reach 0
- The loop prints "wait timed out. Pending %d... Renewing" every 10
  seconds and never exits
- Effectively **hangs cleanup** (module unload, lport unregistration,
  recovery completion)
- Severity: **HIGH** — system task hang during error recovery, affects
  ability to recover from hardware faults
- Record: HIGH severity — cleanup hang during EEH recovery
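
Why a stale counter turns into a hang can be shown with a bounded model
of a wait-until-zero loop. This is a sketch only: `unreg_wait_model()`
and its parameters are hypothetical, each loop iteration stands for one
renewed timeout, and `max_renewals` exists purely so the model
terminates (the wait described above has no such cap).

```c
/*
 * Wait for `pending` to reach zero, with `completions_per_timeout`
 * items completing during each timeout period. Returns the number of
 * renewals needed, or -1 if the model's cap is hit first.
 */
static int unreg_wait_model(int pending, int completions_per_timeout,
			    int max_renewals)
{
	int renewals = 0;

	while (pending > 0) {
		if (renewals == max_renewals)
			return -1;  /* would keep renewing forever */
		pending -= completions_per_timeout;
		renewals++;
	}
	return renewals;
}
```

With a counter that nothing decrements (`completions_per_timeout` of 0),
the model only stops because of its artificial cap.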

### Step 8.4: Risk-Benefit
- **Benefit**: Fixes real hang in error recovery path on enterprise
  systems; aligns with established correct pattern
- **Risk**: Refactoring increases scope beyond minimum (24+/42- lines)
  but is clean and functionally equivalent apart from fix; lock usage
  preserved; no behavior change outside the bug fix
- Ratio: Favorable — meaningful fix, low regression risk

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary

**FOR Backporting:**
- Explicit bug fix with clear commit message explaining the hang
- Fix follows established correct pattern in sibling functions
  (`lpfc_hba_down_post_s3/s4`)
- Author is subsystem maintainer (Justin Tee at Broadcom)
- Applied through proper SCSI maintainer (Martin K. Petersen)
- Bug affects error recovery path → when triggered, causes system hang
- Reachable via common PCI EEH error recovery
- Contained to single file, single function
- Bug present in all active stable trees based on v5.18 or later
- No new APIs, no userspace-visible changes

**AGAINST Backporting:**
- Change is 66 lines in diff (larger than pure minimal fix would be)
- Refactoring consolidates SLI3/SLI4 paths beyond what's strictly needed
  for the fix
- No explicit stable nomination or Cc: stable
- No Fixes: tag (common for this reviewer workflow, not decisive)

### Step 9.2: Stable Rules Checklist
1. **Obviously correct**: YES — follows existing pattern in same file
2. **Fixes real bug**: YES — cleanup hang described
3. **Important issue**: YES — system hang during error recovery (HIGH
   severity)
4. **Small and contained**: MOSTLY — 66 lines, single function, but more
   than strictly minimal
5. **No new features/APIs**: YES — pure fix + refactor
6. **Applies to stable**: YES — expected to apply cleanly to v6.1.y,
   v6.6.y, v6.12.y, v6.18.y

### Step 9.3: Exception Categories
- Not a device ID addition
- Not a pure quirk (though it's a driver fix)
- Normal bug fix category

### Step 9.4: Decision
Benefits clearly outweigh risks. The bug is a legitimate system hang on
a real error recovery path, the fix is well-understood and pattern-
consistent with existing correct code. The refactoring is clean and
preserves lock semantics. Enterprise storage users affected by PCI error
recovery scenarios (IBM POWER, AER-enabled x86) will benefit.

## Verification

- **[Phase 1]** Parsed commit message tags: Signed-off-by chain (Justin
  Tee → Martin K. Petersen), Link: to patch.msgid.link/lore, no Fixes:
  or Cc: stable (expected)
- **[Phase 1]** Confirmed author is subsystem maintainer via `git log
  --author="Justin Tee"` showing many lpfc commits
- **[Phase 2]** Diff analysis: `git show 2da10bcaa58a3` confirmed 66
  lines changed in lpfc_sli_abort_iocb_ring, single function
- **[Phase 2]** Read current buggy code at
  `drivers/scsi/lpfc/lpfc_sli.c:4571-4631` — confirmed
  `pring->txcmplq_cnt` never reset in offline path
- **[Phase 3]** `git blame -L 4577,4630 drivers/scsi/lpfc/lpfc_sli.c` —
  buggy offline code introduced by `a4691038b4071f` (James Smart,
  2022-03-16)
- **[Phase 3]** `git describe --contains a4691038b4071f` →
  v5.18-rc2~14^2~11^2~20 (buggy code in v5.18+)
- **[Phase 3]** `git show --stat a4691038b4071f` confirmed original
  commit was "Fix unload hang after back to back PCI EEH faults"
- **[Phase 4]** `b4 dig -c 2da10bcaa58a3` found original submission
  lore.kernel.org/all/20260212213008.149873-9-justintee8345@gmail.com
- **[Phase 4]** `b4 dig -c 2da10bcaa58a3 -a` showed only v1 version, no
  revisions
- **[Phase 4]** `b4 dig -c 2da10bcaa58a3 -w` confirmed
  linux-scsi@vger.kernel.org and jsmart833426@gmail.com (James Smart)
  included
- **[Phase 4]** Read mbox thread /tmp/lpfc_thread.mbox — no reviewer
  replies on PATCH 08/13; no stable-related discussion (`grep -E
  "stable|backport" /tmp/lpfc_thread.mbox` returned nothing)
- **[Phase 5]** `grep txcmplq_cnt` confirmed counter used in
  lpfc_nvme.c:2252 for wait loop in lpfc_nvme_lport_unreg_wait; also
  used for watermarks (21704) and busy stats (21634)
- **[Phase 5]** Read `lpfc_nvme_lport_unreg_wait` at
  lpfc_nvme.c:2219-2280 — confirmed it loops forever printing "Renewing"
  if pending (txcmplq_cnt) never hits zero
- **[Phase 5]** Read lines 4690-4744 of lpfc_sli.c — confirmed identical
  pattern (flag clear + txcmplq_cnt=0) already exists in
  `lpfc_hba_down_post_s4` and `lpfc_hba_down_post_s3`
- **[Phase 5]** Read lpfc_nvme.c:2870-2880 confirmed similar pattern
  (flag clear + counter decrement) for NVMe LS abort
- **[Phase 5]** Confirmed callers via `grep lpfc_sli_abort_iocb_ring`
  and `grep lpfc_sli_abort_fcp_rings|lpfc_sli_hba_iocb_abort` — called
  from EEH recovery (`lpfc_sli_prep_dev_for_recover`,
  `lpfc_sli4_prep_dev_for_recover`), error handlers
  (`lpfc_handle_eratt_s3`), HBA abort
- **[Phase 5]** Verified `pci_channel_offline(phba->pcidev)` used in
  line 4582 triggers the buggy branch; it's true during PCI EEH error
  recovery
- **[Phase 6]** `git show v6.6:drivers/scsi/lpfc/lpfc_sli.c` confirmed
  buggy code present in v6.6
- **[Phase 6]** `git show v6.12:drivers/scsi/lpfc/lpfc_sli.c` confirmed
  buggy code present in v6.12
- **[Phase 6]** `git show v6.1:drivers/scsi/lpfc/lpfc_sli.c` confirmed
  buggy code present in v6.1
- **[Phase 6]** `git show v5.15:drivers/scsi/lpfc/lpfc_sli.c` confirmed
  offline branch NOT in v5.15 (bug introduced later in v5.18)
- **[Phase 7]** Subsystem identified as `drivers/scsi/lpfc` — enterprise
  FC HBA driver
- **[Phase 8]** Failure mode confirmed: cleanup path
  (`lpfc_nvme_lport_unreg_wait`) hangs indefinitely when `txcmplq_cnt`
  doesn't reach 0
- **UNVERIFIED**: Could not build the stable trees or execute runtime
  tests; relied on code structure analysis
- **UNVERIFIED**: Did not check every stable tree for the exact
  line-level conflict state (e.g., v5.18.y EOL); only checked v6.1, v6.6,
  v6.12 tags
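
The counter problem verified above can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions: `struct ring_model`, `abort_offline_buggy()`, `abort_offline_fixed()` and `unreg_wait_drains()` are hypothetical names, not lpfc code, and the wait is bounded so the model terminates where the real cleanup path is described as waiting indefinitely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the ring bookkeeping; NOT the real
 * struct lpfc_sli_ring. */
struct ring_model {
	int txcmplq_len;	/* entries actually linked on the txcmplq list */
	int txcmplq_cnt;	/* counter that the cleanup waiter polls */
};

/* Offline abort as described for the pre-fix code: the list is spliced
 * away, but the counter is left untouched. */
static void abort_offline_buggy(struct ring_model *r)
{
	assert(r != NULL);
	r->txcmplq_len = 0;
}

/* Offline abort following the fixed pattern: splice and reset together. */
static void abort_offline_fixed(struct ring_model *r)
{
	assert(r != NULL);
	r->txcmplq_len = 0;
	r->txcmplq_cnt = 0;
}

/* Hypothetical bounded version of the unregister wait: returns true if
 * the counter reads zero within max_polls, false if it would keep
 * waiting. */
static bool unreg_wait_drains(const struct ring_model *r, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (r->txcmplq_cnt == 0)
			return true;
	}
	return false;
}
```

In this model, a ring that starts with three outstanding entries never drains after `abort_offline_buggy()`, because nothing is left on the list to decrement the counter, while `abort_offline_fixed()` lets the waiter return on its first poll.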

The bug is a real hang in cleanup paths during PCI EEH error recovery on
enterprise systems using Broadcom Emulex FC HBAs. The fix follows the
established correct pattern already present in sibling functions in the
same file. While the change includes refactoring that goes slightly
beyond the minimal fix, the refactoring is clean and preserves lock
semantics. The fix has clear technical merit and real user impact on
enterprise storage environments.

**YES**

 drivers/scsi/lpfc/lpfc_sli.c | 66 +++++++++++++-----------------------
 1 file changed, 24 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 303523f754b86..ad5b0e60acc5a 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4572,59 +4572,41 @@ void
 lpfc_sli_abort_iocb_ring(struct lpfc_hba *phba, struct lpfc_sli_ring *pring)
 {
 	LIST_HEAD(tx_completions);
-	LIST_HEAD(txcmplq_completions);
+	spinlock_t *plock;		/* for transmit queue access */
 	struct lpfc_iocbq *iocb, *next_iocb;
 	int offline;
 
-	if (pring->ringno == LPFC_ELS_RING) {
+	if (phba->sli_rev >= LPFC_SLI_REV4)
+		plock = &pring->ring_lock;
+	else
+		plock = &phba->hbalock;
+
+	if (pring->ringno == LPFC_ELS_RING)
 		lpfc_fabric_abort_hba(phba);
-	}
+
 	offline = pci_channel_offline(phba->pcidev);
 
-	/* Error everything on txq and txcmplq
-	 * First do the txq.
-	 */
-	if (phba->sli_rev >= LPFC_SLI_REV4) {
-		spin_lock_irq(&pring->ring_lock);
-		list_splice_init(&pring->txq, &tx_completions);
-		pring->txq_cnt = 0;
+	/* Cancel everything on txq */
+	spin_lock_irq(plock);
+	list_splice_init(&pring->txq, &tx_completions);
+	pring->txq_cnt = 0;
 
-		if (offline) {
-			list_splice_init(&pring->txcmplq,
-					 &txcmplq_completions);
-		} else {
-			/* Next issue ABTS for everything on the txcmplq */
-			list_for_each_entry_safe(iocb, next_iocb,
-						 &pring->txcmplq, list)
-				lpfc_sli_issue_abort_iotag(phba, pring,
-							   iocb, NULL);
-		}
-		spin_unlock_irq(&pring->ring_lock);
+	if (offline) {
+		/* Cancel everything on txcmplq */
+		list_for_each_entry_safe(iocb, next_iocb, &pring->txcmplq, list)
+			iocb->cmd_flag &= ~LPFC_IO_ON_TXCMPLQ;
+		list_splice_init(&pring->txcmplq, &tx_completions);
+		pring->txcmplq_cnt = 0;
 	} else {
-		spin_lock_irq(&phba->hbalock);
-		list_splice_init(&pring->txq, &tx_completions);
-		pring->txq_cnt = 0;
-
-		if (offline) {
-			list_splice_init(&pring->txcmplq, &txcmplq_completions);
-		} else {
-			/* Next issue ABTS for everything on the txcmplq */
-			list_for_each_entry_safe(iocb, next_iocb,
-						 &pring->txcmplq, list)
-				lpfc_sli_issue_abort_iotag(phba, pring,
-							   iocb, NULL);
-		}
-		spin_unlock_irq(&phba->hbalock);
+		/* Issue ABTS for everything on the txcmplq */
+		list_for_each_entry_safe(iocb, next_iocb, &pring->txcmplq, list)
+			lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
 	}
+	spin_unlock_irq(plock);
 
-	if (offline) {
-		/* Cancel all the IOCBs from the completions list */
-		lpfc_sli_cancel_iocbs(phba, &txcmplq_completions,
-				      IOSTAT_LOCAL_REJECT, IOERR_SLI_ABORTED);
-	} else {
-		/* Make sure HBA is alive */
+	if (!offline)
 		lpfc_issue_hb_tmo(phba);
-	}
+
 	/* Cancel all the IOCBs from the completions list */
 	lpfc_sli_cancel_iocbs(phba, &tx_completions, IOSTAT_LOCAL_REJECT,
 			      IOERR_SLI_ABORTED);
-- 
2.53.0



* [PATCH AUTOSEL 7.0] scsi: virtio_scsi: Move INIT_WORK calls to virtscsi_probe()
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.1] scsi: lpfc: Fix incorrect txcmplq_cnt during cleanup in lpfc_sli_abort_ring() Sasha Levin
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.6] scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP Sasha Levin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC (permalink / raw)
  To: patches, stable
  Cc: Joshua Daley, Stefan Hajnoczi, Martin K. Petersen, Sasha Levin,
	mst, jasowang, jejb, virtualization, linux-scsi, linux-kernel

From: Joshua Daley <jdaley@linux.ibm.com>

[ Upstream commit da3159a3b3fdc05c6bdba2fd4f4802a6718d879a ]

The last step of virtscsi_handle_event() is to call
virtscsi_kick_event(), which calls INIT_WORK on its own work
item. INIT_WORK resets the work item's data bits to 0.

If this occurs while the work item is being flushed by
cancel_work_sync(), then kernel/workqueue.c/work_offqd_enable triggers a
kernel warning, as it expects the "disable" bit to be 1:

[   21.450115] workqueue: work disable count underflowed
[   21.450117] WARNING: CPU: 1 PID: 56 at kernel/workqueue.c:4328 enable_work+0x10a/0x120
...
[   21.450171] Call Trace:
[   21.450173]  [<000003db2e5bdc3e>] enable_work+0x10e/0x120
[   21.450176] ([<000003db2e5bdc3a>] enable_work+0x10a/0x120)
[   21.450178]  [<000003db2e5bdd86>] cancel_work_sync+0x86/0xa0
[   21.450181]  [<000003daae97d9e4>] virtscsi_remove+0xb4/0xd0 [virtio_scsi]
[   21.450184]  [<000003db2ef3b5ca>] virtio_dev_remove+0x6a/0xd0
[   21.450186]  [<000003db2ef9106c>] device_release_driver_internal+0x1ac/0x260
[   21.450190]  [<000003db2ef8edc8>] bus_remove_device+0xf8/0x190
[   21.450192]  [<000003db2ef88d72>] device_del+0x142/0x340
[   21.450194]  [<000003db2ef88fa0>] device_unregister+0x30/0xa0
[   21.450196]  [<000003db2ef3b2fa>] unregister_virtio_device+0x2a/0x40

This warning may occur if a controller is detached immediately following
a disk detach.

Move the INIT_WORK call to prevent this. Don't re-init event list work
items in virtscsi_kick_event(), init them only once in virtscsi_probe()
instead.

Signed-off-by: Joshua Daley <jdaley@linux.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://patch.msgid.link/20260325180857.3675854-2-jdaley@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to complete the analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Record: `scsi: virtio_scsi` "Move INIT_WORK calls to virtscsi_probe()"
  — restructures INIT_WORK placement to avoid race with
  `cancel_work_sync()` during controller removal.

**Step 1.2: Tags**
- Record:
  - `Signed-off-by: Joshua Daley <jdaley@linux.ibm.com>` (author)
  - `Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>` (subsystem
    expert)
  - `Link: https://patch.msgid.link/20260325180857.3675854-2-jdaley@linux.ibm.com`
    (mailing list)
  - `Signed-off-by: Martin K. Petersen` (SCSI maintainer, applied the
    patch)
  - No `Fixes:` tag, no `Cc: stable` (expected - this is why review is
    needed)
  - No `Reported-by:` but reproduction steps present in cover letter

**Step 1.3: Commit Body Analysis**
- Record: The commit describes a race where:
  - `virtscsi_handle_event()` (work function) calls
    `virtscsi_kick_event()` at the end
  - `virtscsi_kick_event()` calls `INIT_WORK` on the SAME work item that
    is currently executing
  - `INIT_WORK` resets work->data bits (including the workqueue disable
    count) to 0
  - If this happens while `cancel_work_sync()` is flushing the work,
    `work_offqd_enable` sees the disable count was cleared and triggers
    "work disable count underflowed" WARN
  - Includes a full stack trace on S390; trigger: "controller is
    detached immediately following a disk detach"

**Step 1.4: Hidden Bug Fix Detection**
- Record: Not hidden. The body clearly describes fixing a warning caused
  by a race condition, although the subject line ("Move INIT_WORK")
  reads like a refactor.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Record: 1 file (`drivers/scsi/virtio_scsi.c`), 4 insertions and 3
  deletions. Changed functions: `virtscsi_kick_event()` (INIT_WORK
  removed) and `virtscsi_probe()` (INIT_WORK loop added). Single-file
  surgical fix.

**Step 2.2: Code Flow**
- Record:
  - Before: `INIT_WORK(&event_node->work, virtscsi_handle_event)` called
    in `virtscsi_kick_event()`, which is invoked from both
    `virtscsi_kick_event_all()` (at probe/restore time) AND from
    `virtscsi_handle_event()` itself (re-queueing at end of event
    handling).
  - After: `INIT_WORK` called once in `virtscsi_probe()` inside a `for`
    loop over all 8 event_list entries (guarded by
    VIRTIO_SCSI_F_HOTPLUG). `virtscsi_kick_event()` no longer resets the
    work struct state.
  - Forward declaration of `virtscsi_handle_event` removed (probe is
    after the definition).

**Step 2.3: Bug Mechanism**
- Record: **Race condition fix** (category b from playbook). The issue
  is that `INIT_WORK` resets all state bits in `work->data` (including
  the disable count introduced in v6.10 by commit `86898fa6b8cd9`).
  Internally, `cancel_work_sync()` now calls `__cancel_work_sync(work,
  0)` → `__cancel_work(work, WORK_CANCEL_DISABLE)` which increments the
  disable count via `work_offqd_disable()`, then `__flush_work()` waits
  for the function to complete, then calls `enable_work()` to decrement.
  If the work function calls `INIT_WORK` during the flush, disable count
  goes 1→0; later `enable_work()` sees 0 and triggers `WARN_ONCE(true,
  "workqueue: work disable count underflowed\n")` at
  `kernel/workqueue.c:4422`.
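
The interleaving described in Step 2.3 can be replayed with a toy model. This is a sketch only: `struct work_model` and the `model_*` helpers are hypothetical stand-ins that track nothing but a disable count; they are not the kernel's `work_struct` layout or workqueue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a work item: only a disable count, standing in for the
 * bits the kernel packs into work->data. NOT the kernel's layout. */
struct work_model {
	unsigned int disable_cnt;
};

/* Models INIT_WORK: resets all state, including the disable count. */
static void model_init_work(struct work_model *w)
{
	assert(w != NULL);
	w->disable_cnt = 0;
}

/* Models the disable step taken at the start of cancel_work_sync(). */
static void model_disable(struct work_model *w)
{
	w->disable_cnt++;
}

/* Models the re-enable step: returns false on underflow, which is where
 * the kernel would warn "work disable count underflowed". */
static bool model_enable(struct work_model *w)
{
	if (w->disable_cnt == 0)
		return false;
	w->disable_cnt--;
	return true;
}

/* Replays the interleaving: cancel disables, the still-running handler
 * kicks (optionally re-initialising the work item), cancel re-enables.
 * Returns true if the enable step was balanced. */
static bool cancel_during_handler(bool kick_reinits)
{
	struct work_model w;

	model_init_work(&w);		/* one-time init, as in probe */
	model_disable(&w);		/* cancel begins: count 0 -> 1 */
	if (kick_reinits)
		model_init_work(&w);	/* kick clobbers count: 1 -> 0 */
	return model_enable(&w);	/* cancel finishes */
}
```

With `kick_reinits` set, the model reproduces the underflow; with it cleared, which corresponds to initialising the work item once and never again from the kick path, the disable and enable steps stay balanced.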

**Step 2.4: Fix Quality**
- Record: Obviously correct. The INIT_WORK was redundant after the first
  call (work's function pointer doesn't change between kicks). Moving it
  to probe() eliminates the race. Low regression risk: the work struct
  state is preserved across kicks (no need to re-init), and it persists
  through freeze/resume cycles (virtscsi_freeze doesn't cancel work, so
  state remains intact).

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame of Buggy Code**
- Record: `INIT_WORK(&event_node->work, virtscsi_handle_event)` in
  `virtscsi_kick_event()` was introduced by commit `365a715009411`
  "[SCSI] virtio-scsi: hotplug support for virtio-scsi" (v3.6-rc1,
  2012). The pattern has existed unchanged for 13+ years in all stable
  trees.

**Step 3.2: No Fixes: Tag to Follow**
- Record: No Fixes: tag present. The WARN symptom was enabled by commit
  `86898fa6b8cd9` "workqueue: Implement disable/enable for (delayed)
  work items" which landed in **v6.10-rc1**. Before v6.10 the same race
  existed but did not trigger this specific WARN (cancel_work_sync
  didn't use the disable count).

**Step 3.3: File History**
- Record: Recent virtio_scsi.c history shows a related commit
  `2678369e8efe0` "virtio_scsi: fix DMA cacheline issues for events" (by
  Michael Tsirkin, Dec 2025) which restructured the event buffers. The
  currently analyzed patch applies cleanly on top of that. No patch
  dependencies required beyond the usual.

**Step 3.4: Author Context**
- Record: Joshua Daley (IBM); this is their first virtio_scsi fix.
  However, the patch carries a Reviewed-by from Stefan Hajnoczi (original
  virtio-scsi author at IBM/Red Hat and primary reviewer for
  virtio_scsi), and was applied by Martin K. Petersen (SCSI maintainer).

**Step 3.5: Dependencies**
- Record: Standalone fix. A second patch in the series (2/2 "kick
  event_list unconditionally") is independent and addresses a different
  cleanup - not required for this one to work. This patch doesn't depend
  on the other.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1: Patch Discussion**
- Record: Retrieved full thread via `b4 mbox
  20260325180857.3675854-2-jdaley@linux.ibm.com`. Series is at v4.
  Previous versions (v1-v3) had different approaches (INIT_WORK moved to
  `virtscsi_init()` initially). Changelog notes v4 addresses bisection
  concerns (by placing this patch first in the series) and
  suspend/resume concerns (by choosing `virtscsi_probe()` rather than
  `virtscsi_init()`). **No stable nomination in the thread**, but the
  patch is clearly framed as a bug fix.

**Step 4.2: Reviewers**
- Record: Cc'd: linux-scsi, linux-kernel, virtualization list, MST,
  jasowang, pbonzini (QEMU/virtio maintainers), stefanha (virtio-scsi
  expert), eperezma, Martin Petersen (SCSI maintainer), and multiple IBM
  S390 engineers (mjrosato, farman, frankja). Stefan Hajnoczi's
  Reviewed-by tag confirms subsystem expert review.

**Step 4.3: Bug Report**
- Record: No syzbot report. The reporter is the author himself running
  tests on IBM S390 (evidenced by addresses in stack trace
  `000003db2e5...`). The cover letter documents that the warning is
  reliably reproducible by adding `msleep(1000)` before INIT_WORK and
  running `virsh detach-device disk; virsh detach-device controller`.

**Step 4.4: Related Patches**
- Record: The series "scsi: virtio_scsi: move INIT_WORK calls to
  virtscsi_probe" contains 2 patches, both applied by Martin K. Petersen
  to `7.1/scsi-queue` (`[1/2] da3159a3b3fd` and `[2/2] 0019a3a5756b`).

**Step 4.5: Stable-specific Discussion**
- Record: No explicit stable discussion in the thread. The v4 changelog
  does not mention stable.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Key Functions**
- Record: `virtscsi_kick_event` (INIT_WORK removed), `virtscsi_probe`
  (INIT_WORK loop added), `virtscsi_handle_event` (forward declaration
  removed since probe is below it).

**Step 5.2: Callers of `virtscsi_kick_event`**
- Record: `virtscsi_kick_event_all()` (called at probe and restore) and
  `virtscsi_handle_event()` (the work function itself, for re-queueing).
  `virtscsi_kick_event_all` is called from `virtscsi_probe()` and
  `virtscsi_restore()`.

**Step 5.3: Callees**
- Record: `virtscsi_kick_event` calls `sg_init_one`,
  `virtqueue_add_inbuf_cache_clean`, `virtqueue_kick`. None of these
  interact with work struct state.

**Step 5.4: Reachability**
- Record: The race path is reachable from userspace via standard device
  hotplug operations (virsh detach-device or equivalent QEMU API calls).
  Very common in cloud/virt environments.

**Step 5.5: Similar Patterns**
- Record: The anti-pattern of "calling INIT_WORK from within the work
  function on its own work_struct" is known to be racy with
  cancel_work_sync. This is why v6.10+ workqueue added the WARN to
  detect it.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: Code in Stable Trees**
- Record: Verified by reading
  `remotes/stable/linux-6.6.y:drivers/scsi/virtio_scsi.c` and
  `linux-6.12.y` — both have the exact same
  `INIT_WORK(&event_node->work, virtscsi_handle_event)` pattern in
  `virtscsi_kick_event()` and the same
  `virtscsi_probe()`/`virtscsi_remove()` structure. Code exists
  unchanged in all maintained stable trees (back to at least 5.15).

**Step 6.2: Backport Difficulty**
- Record: The patch should apply cleanly or with trivial adjustments.
  The surrounding code in `virtscsi_probe()` is similar across stable
  trees, though there was a recent reorganization (`2678369e8efe0`
  "virtio_scsi: fix DMA cacheline issues for events" in mainline, not in
  stable). In 6.12.y, `event_node->event` is still an inline struct (not
  a pointer); the patch's INIT_WORK change is independent of that.

**Step 6.3: Related Fixes in Stable**
- Record: No prior fix for this race in stable. The WARN_ONCE at
  kernel/workqueue.c:4422 was introduced in v6.10 (commit
  `86898fa6b8cd9`).

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem**
- Record: `drivers/scsi/virtio_scsi.c` — virtio-scsi driver.
  Criticality: **IMPORTANT**. Used by essentially every KVM/QEMU-based
  virtualization stack (including cloud providers using KVM, libvirt,
  AWS EC2, GCP GCE, OpenStack).

**Step 7.2: Subsystem Activity**
- Record: Moderately active (~20 commits in recent history, many
  cleanup/refactoring). Core logic unchanged since v3.6.

## PHASE 8: IMPACT AND RISK

**Step 8.1: Affected Users**
- Record: All users of virtio-scsi on v6.10+ kernels who perform hotplug
  operations (disk/controller detach). This is a massive user base in
  virtualization.

**Step 8.2: Trigger Conditions**
- Record: Normal administrative workflow: detach a disk, then detach the
  controller immediately. Reproducible with standard virsh commands. It
  is not triggerable from inside the guest; it is a host-side operation.

**Step 8.3: Failure Mode**
- Record: Kernel `WARN_ONCE` with stack trace in dmesg. **Severity:
  MEDIUM**. Not a crash, not data corruption, not security-relevant.
  But: user-visible warning that could trigger monitoring alerts, CI
  failures (many CI systems treat kernel WARN as failure), and user
  concern. The pre-v6.10 race still exists but without the visible WARN
  — potentially other subtle effects but not documented.

**Step 8.4: Risk vs Benefit**
- Record:
  - BENEFIT: Eliminates a user-visible kernel warning in a very common
    virtualization path. Moderate benefit.
  - RISK: Very low. The change is 4 insertions and 3 deletions, moves a
    single INIT_WORK call, was reviewed by the subsystem expert, and was
    applied by the maintainer. The work struct state is correctly
    preserved across freeze/resume cycles.
  - Ratio: Favorable for backporting.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Real race condition with reproducible user-visible WARN
- Small, surgical fix (4 insertions, 3 deletions)
- Reviewed by subsystem expert (Stefan Hajnoczi)
- Applied by SCSI maintainer (Martin K. Petersen)
- Code pattern identical in all stable trees v6.10+ where the WARN is
  triggered
- Common workflow (VM hotplug) triggers this bug
- Low regression risk

AGAINST backporting:
- Symptom is WARN, not crash/corruption
- No explicit Cc: stable nomination
- No Fixes: tag
- Race has existed since v3.6 without major user impact reports

**Step 9.2: Stable Rules**
1. Obviously correct: YES — trivially correct (just relocates INIT_WORK
   to run once)
2. Fixes real bug: YES — triggers WARN in v6.10+
3. Important issue: MEDIUM (WARN, not crash)
4. Small and contained: YES (7 lines changed, one file)
5. No new features: YES
6. Applies cleanly: YES for recent stable trees

**Step 9.3: Exception Categories**
- Record: Not an exception category (not device ID, quirk, DT, build, or
  doc fix). Regular bug fix.

**Step 9.4: Decision**
The fix addresses a reproducible race condition with a clear
user-visible symptom on v6.10+ kernels (6.12, 6.17, 6.18, 6.19 stable
trees).
It has been reviewed by the virtio-scsi expert, applied by the SCSI
maintainer, and affects a widely-deployed driver (used in virtually all
KVM/QEMU VMs). The patch is minimal, correct, and low-risk. The symptom
is only a WARN (not a crash), but it occurs in common hotplug workflows
and the fix is safe.

---

## Verification

- [Phase 1] Parsed tags: Reviewed-by Stefan Hajnoczi, Link to lore,
  Signed-off-by Martin K. Petersen; no Fixes/Cc stable tags
- [Phase 2] Diff analysis: 4 insertions and 3 deletions in
  `drivers/scsi/virtio_scsi.c`; INIT_WORK moved from
  `virtscsi_kick_event` to a loop in `virtscsi_probe` guarded by
  VIRTIO_SCSI_F_HOTPLUG
- [Phase 3] `git log -S"INIT_WORK(&event_node->work"`: pattern
  introduced by `365a715009411` in v3.6-rc1 (2012)
- [Phase 3] `git describe --contains 365a715009411`: v3.6-rc1, confirmed
  pattern has been stable for 13+ years
- [Phase 3] `git log -S"work disable count underflowed"`: WARN
  introduced by `86898fa6b8cd9` in v6.10-rc1 — this is why the visible
  symptom only exists v6.10+
- [Phase 3] Read kernel/workqueue.c lines 4407-4499 to verify
  `enable_work()`/`__cancel_work_sync()` logic and confirm the race
  mechanism
- [Phase 3] Read include/linux/workqueue.h: confirmed `INIT_WORK` →
  `__INIT_WORK_KEY` → resets `work->data = WORK_DATA_INIT()` (all bits
  zeroed, clobbering disable count)
- [Phase 4] `b4 mbox 20260325180857.3675854-2-jdaley@linux.ibm.com`:
  retrieved full thread (7 messages, 2 reviewers, 1 apply confirmation)
- [Phase 4] Thread content confirmed: v4 is latest, Reviewed-by tag
  added, Martin K. Petersen applied to 7.1/scsi-queue
  (git.kernel.org/mkp/scsi/c/da3159a3b3fd)
- [Phase 4] No explicit stable nomination in the mailing list discussion
- [Phase 5] `grep "virtscsi_kick_event\b"`: 3 hits — function
  definition, call from `virtscsi_kick_event_all()` (probe/restore),
  call from `virtscsi_handle_event()` (re-queue after event processing)
- [Phase 6] Read `remotes/stable/linux-6.6.y:drivers/scsi/virtio_scsi.c`
  and `linux-6.12.y`: confirmed identical `INIT_WORK(&event_node->work,
  ...)` pattern in `virtscsi_kick_event()`, so fix applies conceptually
  cleanly
- [Phase 6] Confirmed via search that recent commit `2678369e8efe0`
  "virtio_scsi: fix DMA cacheline issues for events" changed the event
  buffer layout in mainline but doesn't affect the INIT_WORK move
- [Phase 7] Subsystem: virtio-scsi — widely used in all KVM/QEMU
  virtualization
- [Phase 8] Failure mode: `WARN_ONCE` at kernel/workqueue.c:4422, stack
  trace visible in dmesg; severity MEDIUM (not a
  crash/corruption/security)
- UNVERIFIED: Whether the pre-v6.10 race has any user-visible effect
  beyond the post-v6.10 WARN. The fix is still logically correct for
  pre-v6.10 but the symptom documentation only covers v6.10+.
- UNVERIFIED: Whether the fix requires any adjustment for stable trees
  (the diff in the thread differs slightly from the presented diff —
  presented diff has a few whitespace/context differences from the v4
  mailed patch, but the functional change is identical)

The commit is a small, well-reviewed bug fix for a reproducible race
condition that triggers a kernel WARN in a widely-deployed driver
(virtio-scsi used in essentially all KVM/QEMU VMs) on v6.10+ kernels.
The fix is minimal (4 insertions and 3 deletions, one file), obviously
correct (the work struct doesn't need to be re-initialized on every
kick), reviewed by the subsystem expert (Stefan Hajnoczi), and applied
by the SCSI maintainer. Regression risk is very low.

**YES**

 drivers/scsi/virtio_scsi.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0ed8558dad724..64b6c942f5720 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -233,7 +233,6 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
 	virtscsi_vq_done(vscsi, &vscsi->ctrl_vq, virtscsi_complete_free);
 };
 
-static void virtscsi_handle_event(struct work_struct *work);
 
 static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 			       struct virtio_scsi_event_node *event_node)
@@ -242,7 +241,6 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 	struct scatterlist sg;
 	unsigned long flags;
 
-	INIT_WORK(&event_node->work, virtscsi_handle_event);
 	sg_init_one(&sg, event_node->event, sizeof(struct virtio_scsi_event));
 
 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
@@ -984,8 +982,11 @@ static int virtscsi_probe(struct virtio_device *vdev)
 
 	virtio_device_ready(vdev);
 
-	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
+	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
+		for (int i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++)
+			INIT_WORK(&vscsi->event_list[i].work, virtscsi_handle_event);
 		virtscsi_kick_event_all(vscsi);
+	}
 
 	scsi_scan_host(shost);
 	return 0;
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.6] scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0] scsi: virtio_scsi: Move INIT_WORK calls to virtscsi_probe() Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] scsi: esas2r: Fix __printf annotation on esas2r_log_master() Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Add PCI ID support for LPe42100 series adapters Sasha Levin
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Aaron Kling, Bart Van Assche, Martin K. Petersen, Sasha Levin,
	jejb, linux-scsi, linux-kernel

From: Aaron Kling <webgeek1234@gmail.com>

[ Upstream commit e423f1c7195645e18945fba0bd8f0a32e39286e7 ]

Kioxia has another product that does not support the qTimestamp
attribute.

Signed-off-by: Aaron Kling <webgeek1234@gmail.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260403-thgjfjt0e25baip-no-timestamp-v1-1-1ddb34225133@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my investigation, I'll now write up the complete analysis.

## Analysis

### PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line Parse**
- Record: Subsystem = `scsi: ufs: core`; action verb = "Disable"; intent
  = "Disable timestamp for Kioxia THGJFJT0E25BAIP" — adds a device quirk
  entry to disable qTimestamp handling for a specific Kioxia UFS device
  model.

**Step 1.2: Commit Message Tags**
- Record:
  - Author: Aaron Kling <webgeek1234@gmail.com>
  - Reviewed-by: Bart Van Assche <bvanassche@acm.org> (SCSI/UFS core
    reviewer who authored the original quirk infrastructure)
  - Link: https://patch.msgid.link/20260403-thgjfjt0e25baip-no-timestamp-v1-1-1ddb34225133@gmail.com
  - Signed-off-by: Martin K. Petersen (SCSI maintainer)
  - No Fixes:, no Reported-by, no Cc: stable. (Absence of stable tag is
    expected.)

**Step 1.3: Commit Body**
- Record: Very short body — "Kioxia has another product that does not
  support the qTimestamp attribute." The parent commit (fb1f4568346153)
  introduced `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT` to avoid log-error
  spam when the device rejects the SET_TIMESTAMP query; this commit just
  adds another affected device model.

**Step 1.4: Hidden Bug Fix Detection**
- Record: This IS effectively a bug fix — on the THGJFJT0E25BAIP, the
  current kernel calls `ufshcd_set_timestamp_attr()` periodically and at
  init. The device returns an error, which produces `dev_err()` log spam
  ("failed to set timestamp %d" / "Failed to update rtc %d"). The quirk
  bypasses the query entirely. Hidden-fix category: hardware workaround
  / quirk.

### PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Record: 1 file modified (`drivers/ufs/core/ufshcd.c`), +3/-0 lines.
  No function body touched: the change is a data-only addition to the
  static `ufs_fixups[]` table. Scope: trivial, surgical.

**Step 2.2: Code Flow Change**
- Record: Before — only `THGLF2G9C8KBADG`, `THGLF2G9D8KBADG`
  (PA_TACTIVATE) and `THGJFJT1E45BATP` (NO_TIMESTAMP_SUPPORT) were
  matched for Toshiba-ID devices. After — `THGJFJT0E25BAIP` is also
  matched and gets `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT` bit set via
  `ufshcd_fixup_dev_quirks()` at device probe. At runtime
  `ufshcd_set_timestamp_attr()` exits early (verified
  `ufshcd.c:8966-8968`).

**Step 2.3: Bug Mechanism**
- Record: Category (h) — Hardware workaround, device-ID/quirk-table
  addition. No logic changes, no synchronization change, no refcount
  change.

**Step 2.4: Fix Quality**
- Record: Obviously correct. Zero risk for any non-matching device
  (quirk table is a prefix-match on manufacturer+model, so only the
  Kioxia THGJFJT0E25BAIP is affected). Cannot regress any other device.
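
The table-driven behaviour described in Steps 2.2 and 2.4 can be sketched as a standalone C model. It is an illustration under assumptions, not the driver code: `struct fixup_model`, `lookup_quirks()` and `would_send_timestamp()` are hypothetical names, the manufacturer-ID half of the match is omitted, and the quirk bit value is the `(1 << 13)` quoted elsewhere in this analysis.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit value as quoted in this analysis; treated here as an assumption. */
#define QUIRK_NO_TIMESTAMP (1u << 13)

/* Simplified stand-in for a fixup-table entry; NOT the kernel struct.
 * The real table also matches on manufacturer ID, omitted here. */
struct fixup_model {
	const char *model;
	unsigned int quirk;
};

static const struct fixup_model fixups[] = {
	{ "THGJFJT1E45BATP", QUIRK_NO_TIMESTAMP },
	{ "THGJFJT0E25BAIP", QUIRK_NO_TIMESTAMP },	/* entry this patch adds */
	{ NULL, 0 },
};

/* Collect the quirks of every entry whose model string is a prefix of
 * the device's reported model. */
static unsigned int lookup_quirks(const char *dev_model)
{
	unsigned int quirks = 0;

	assert(dev_model != NULL);
	for (const struct fixup_model *f = fixups; f->model; f++) {
		if (strncmp(f->model, dev_model, strlen(f->model)) == 0)
			quirks |= f->quirk;
	}
	return quirks;
}

/* Returns 1 if a timestamp query would be sent, 0 if it is skipped by
 * the early exit that the quirk enables. */
static int would_send_timestamp(const char *dev_model)
{
	if (lookup_quirks(dev_model) & QUIRK_NO_TIMESTAMP)
		return 0;
	return 1;
}
```

In this model only the two listed model strings pick up the quirk; any other model string falls through with no quirk bits set, which mirrors the claim that non-matching devices are unaffected.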

### PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- Record: The table surrounding the addition was introduced over time;
  the specifically-referenced quirk
  `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT` was introduced by commit
  `fb1f4568346153d2f80fdb4ffcfa0cf4fb257d3c` ("scsi: ufs: core: Disable
  timestamp functionality if not supported", Bart Van Assche,
  2025-09-09), which also added the first device entry
  `THGJFJT1E45BATP`.

**Step 3.2: Fixes: Tag**
- Record: No Fixes: tag. Not applicable. The conceptual "Fixes" target
  is fb1f4568346153, already backported to stable (see Step 6.3).

**Step 3.3: Related File Changes**
- Record: Recent ufshcd.c traffic is mostly core refactors/fixes. Only
  two prior NO_TIMESTAMP-related commits (fb1f4568346153 and
  cb7cc0cfb38cf). This addition is standalone — no series, no
  prerequisites beyond fb1f4568346153 which already exists in stable.

**Step 3.4: Author**
- Record: Aaron Kling is a known Tegra/ARM contributor (`git log
  --author="Aaron Kling"` shows cpufreq, PCI tegra, irqdomain,
  arm64/tegra DT work). He almost certainly hit this on a Tegra board
  shipping with the Kioxia THGJFJT0E25BAIP. Reviewed-by comes from the
  original quirk author (Bart Van Assche) — ideal reviewer.

**Step 3.5: Dependencies**
- Record: Depends on commit fb1f4568346153 (defines the quirk macro and
  the dispatch in `ufshcd_set_timestamp_attr()`). Confirmed present in
  stable — see Phase 6.

### PHASE 4: MAILING LIST RESEARCH

**Step 4.1: Original Submission**
- Record: `b4 dig -c e423f1c719564` found the series at
  https://lore.kernel.org/all/20260403-thgjfjt0e25baip-no-timestamp-v1-1-1ddb34225133@gmail.com/ .
  Single version (v1), no respins.

**Step 4.2: Reviewers**
- Record: Patch went to Alim Akhtar, Avri Altman, Bart Van Assche, James
  Bottomley, Martin K. Petersen, linux-scsi. Bart Van Assche explicitly
  replied with `Reviewed-by:` (he is the author of the quirk
  infrastructure, so he is the domain expert on this). No NAKs, no
  concerns raised, no requests for changes. No explicit stable
  nomination in thread.

**Step 4.3: Bug Report**
- Record: No Reported-by, no external bug report cited. User-facing
  symptom is log-error spam on boot/resume/periodic RTC update — the
  kind of thing an engineer notices when bringing up the board and files
  a patch directly.

**Step 4.4: Series Context**
- Record: Single standalone patch. Not part of a larger series.

**Step 4.5: Stable Discussion**
- Record: No stable-list discussion specific to this commit. The
  precedent is well-established from the prior patch.

### PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Key Functions**
- Record: No function added/modified — only a data entry in the static
  `ufs_fixups[]` array.

**Step 5.2: Callers**
- Record: `ufs_fixups[]` is consumed by `ufshcd_fixup_dev_quirks(hba,
  ufs_fixups)` called from `ufs_fixup_device_setup()` at `ufshcd.c:8666`
  during normal device probe. Quirk bit
  (`UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT`) is consumed at
  `ufshcd.c:8966-8968` inside `ufshcd_set_timestamp_attr()`, which is
  called from `ufshcd_add_lus()` (init) and `ufshcd.c:10225` (resume
  path).

**Step 5.3: Callees**
- Record: N/A (data entry only).

**Step 5.4: Reachability**
- Record: Any boot or resume of a system with this Kioxia UFS storage
  triggers the code path. Fully reachable, real users.

**Step 5.5: Similar Patterns**
- Record: Entire `ufs_fixups[]` table is this pattern. The adjacent
  entry (THGJFJT1E45BATP) is the exact same fix for a sibling Kioxia
  product.

### PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: Code Exists in Stable?**
- Record: `ufshcd_set_timestamp_attr()` exists in all modern stable
  trees. The `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT` macro exists in
  6.6.y, 6.12.y, 6.18.y (verified by inspecting
  `include/ufs/ufs_quirks.h` on each branch — macro is defined as `(1 <<
  13)`). Not present in 6.17.y (EOL) or 6.1.y (infrastructure commit not
  backported).

**Step 6.2: Backport Complications**
- Record: None. Trivial 3-line addition to a static table. Will apply
  cleanly to 6.6.y, 6.12.y, 6.18.y. Not applicable to 6.1.y: the quirk
  macro and the `ufshcd_set_timestamp_attr()` gating do not exist
  there, so the new entry would reference an undefined macro.

**Step 6.3: Related Fixes in Stable**
- Record: Parent commit `fb1f4568346153` was backported (by the autosel
  pipeline) to:
  - 6.18.y as `fb1f456834615`
  - 6.12.y as `c6e1e2135d004`
  - 6.6.y as `88ac95b17a038`
  This establishes the precedent: the sibling "add Kioxia timestamp
  quirk" patch is already deemed stable-worthy.

### PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem / Criticality**
- Record: drivers/ufs/core — UFS (Universal Flash Storage) subsystem —
  the primary storage on most modern Android/Tegra/Snapdragon/MediaTek
  devices. Criticality: IMPORTANT (affects a specific storage device,
  not universal, but affects real deployed hardware).

**Step 7.2: Activity**
- Record: Active subsystem with regular fixes landing.

### PHASE 8: IMPACT AND RISK

**Step 8.1: Who Is Affected**
- Record: Users of devices with Kioxia THGJFJT0E25BAIP UFS storage (a
  specific hardware quirk — likely used in particular Tegra-based
  boards, given Aaron Kling's affiliation).

**Step 8.2: Trigger Conditions**
- Record: Every boot of an affected system triggers one "failed to set
  timestamp" dev_err. The periodic RTC update work (`ufshcd_rtc_work()`)
  also triggers "Failed to update rtc" repeatedly (every
  `rtc_update_period` ms). Also triggers on resume. No userspace trigger
  required.

**Step 8.3: Failure Mode Severity**
- Record: LOW severity (log noise, no functional impact). The UFS
  device rejects the query gracefully, nothing crashes, and no data is
  lost, but the dev_err output repeats on every run of the RTC update
  work.

**Step 8.4: Risk-Benefit**
- Record:
  - Benefit: Silences dev_err spam on a specific Kioxia product; affects
    only matching devices.
  - Risk: Essentially zero. Literal 3-line data entry. Prefix matching
    in `ufshcd_fixup_dev_quirks()` (`STR_PRFX_EQUAL`) only triggers on
    Toshiba-manufactured devices whose model starts with
    "THGJFJT0E25BAIP"; no other device is touched.
  - Ratio: Favorable.
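
The vendor-plus-prefix matching described under Risk can be illustrated with a minimal user-space sketch. This is not the kernel code: `struct dev_quirk`, `match_quirks()`, `should_set_timestamp()` and the `VENDOR_TOSHIBA` value are hypothetical stand-ins, and only the two model strings and the `(1 << 13)` bit are taken from the analysis above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel definitions; values are illustrative. */
#define VENDOR_TOSHIBA              0x198u      /* assumed vendor ID */
#define QUIRK_NO_TIMESTAMP_SUPPORT  (1u << 13)  /* bit quoted in Step 6.1 */

struct dev_quirk {
	uint16_t manufacturer_id;
	const char *model;      /* matched as a prefix of the reported model */
	unsigned int quirk;     /* 0 terminates the table */
};

static const struct dev_quirk fixups[] = {
	{ VENDOR_TOSHIBA, "THGJFJT0E25BAIP", QUIRK_NO_TIMESTAMP_SUPPORT },
	{ VENDOR_TOSHIBA, "THGJFJT1E45BATP", QUIRK_NO_TIMESTAMP_SUPPORT },
	{ 0, NULL, 0 },
};

/* Return the OR of all quirks whose vendor and model prefix both match. */
static unsigned int match_quirks(uint16_t manufacturer_id, const char *model)
{
	unsigned int quirks = 0;
	const struct dev_quirk *f;

	for (f = fixups; f->quirk; f++) {
		if (f->manufacturer_id != manufacturer_id)
			continue;
		if (strncmp(f->model, model, strlen(f->model)) == 0)
			quirks |= f->quirk;
	}
	return quirks;
}

/* Sketch of the gating idea: skip the timestamp query when the quirk is set. */
static int should_set_timestamp(unsigned int quirks)
{
	return !(quirks & QUIRK_NO_TIMESTAMP_SUPPORT);
}
```

In this sketch a device from another vendor, or a Toshiba device with a different model prefix, collects no quirk bits, which is the property the risk assessment relies on.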

### PHASE 9: SYNTHESIS

**Step 9.1: Evidence**
- FOR: Textbook hardware quirk / device-ID-table addition; explicitly
  listed as an "IMPORTANT EXCEPTION" for stable; trivial 3-line change;
  reviewed by the subsystem expert who authored the underlying quirk;
  the precedent commit adding the same quirk for a different Kioxia
  model was auto-backported to 6.6.y, 6.12.y, 6.18.y; infrastructure is
  already present in those trees; zero regression risk to non-matching
  hardware.
- AGAINST: Low severity (log noise, not functional); no Reported-by from
  multiple users.
- UNRESOLVED: None relevant.

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested — YES (static data entry, reviewed by
   core expert)
2. Fixes a real bug affecting users — YES (produces repeated dev_err on
   affected hardware)
3. Important — borderline, but falls into explicitly-allowed
   quirk/hardware-workaround category
4. Small and contained — YES (3 lines, one file)
5. No new features or APIs — YES
6. Applies to stable — YES to 6.6.y/6.12.y/6.18.y; N/A to 6.1.y
   (infrastructure missing)

**Step 9.3: Exception Category**
- Falls under the "HARDWARE QUIRK / DEVICE-ID ADDITION TO EXISTING
  DRIVER" exception. This is exactly the pattern the stable rules call
  out as acceptable.

**Step 9.4: Decision**
- This is a tiny hardware-quirk addition that matches a clear precedent
  already in stable trees. Zero regression risk. Should be backported.

## Verification

- [Phase 1] Read `git show e423f1c7195645e18945fba0bd8f0a32e39286e7` —
  confirmed commit details, Reviewed-by: Bart Van Assche, Link tag,
  Martin K. Petersen SOB.
- [Phase 2] Read the diff and `ufs_fixups[]` in
  `drivers/ufs/core/ufshcd.c` (lines 292-322): confirmed pure
  data-entry addition, 3 lines, 1 file.
- [Phase 2] Read `ufshcd_fixup_dev_quirks()` at `ufshcd.c:8430-8448` —
  confirmed strict manufacturer-ID + prefix model matching so only
  THGJFJT0E25BAIP-prefix Toshiba devices are affected.
- [Phase 2] Read `ufshcd_set_timestamp_attr()` at `ufshcd.c:8958-8988` —
  confirmed gate on `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT`.
- [Phase 3] `git show fb1f4568346153` — confirmed this is the commit
  introducing the quirk macro and the first Kioxia THGJFJT1E45BATP
  entry.
- [Phase 3] `git log --author="Aaron Kling" --oneline -10` — confirmed
  author is a long-time Tegra contributor.
- [Phase 4] `b4 dig -c e423f1c719564`: found lore thread
  https://lore.kernel.org/all/20260403-thgjfjt0e25baip-no-timestamp-v1-1-1ddb34225133@gmail.com/
- [Phase 4] `b4 dig -c e423f1c719564 -a` — confirmed only a v1 exists,
  no respins.
- [Phase 4] `b4 dig -c e423f1c719564 -m /tmp/thread_timestamp.mbox` and
  read mbox — confirmed Bart Van Assche gave Reviewed-by, Martin K.
  Petersen applied it to 7.1/scsi-staging then 7.1/scsi-queue. No NAKs,
  no stable discussion, no requested changes.
- [Phase 5] `grep ufshcd_fixup_dev_quirks` — confirmed `ufs_fixups[]` is
  consumed during normal device probe at `ufshcd.c:8666`.
- [Phase 5] Re-read call sites of `ufshcd_set_timestamp_attr` —
  confirmed called from init (`ufshcd_add_lus`) and resume
  (`ufshcd.c:10225`).
- [Phase 6] `git show stable-push/linux-6.18.y:include/ufs/ufs_quirks.h`
  — confirmed `UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT (1<<13)` exists.
- [Phase 6] Same for `stable-push/linux-6.12.y` and
  `stable-push/linux-6.6.y`: both contain the macro.
- [Phase 6] Same for `stable-push/linux-6.17.y` and
  `stable-push/linux-6.1.y`: macro NOT present; either EOL or
  infrastructure not backported.
- [Phase 6] `git show stable-push/linux-6.18.y:drivers/ufs/core/ufshcd.c
  | grep THGJFJT`: confirmed existing `THGJFJT1E45BATP` entry in stable,
  so the new entry will apply cleanly.
- [Phase 6] `git log --grep="Disable timestamp functionality"
  stable-push/linux-6.6.y`: found `88ac95b17a038` (backported by Sasha
  Levin autosel pipeline, marked `[ Upstream commit fb1f45683461… ]`),
  confirming the parent/infrastructure commit was deemed stable-worthy.
- UNVERIFIED: Whether 6.17.y is still receiving updates (treated as EOL
  based on no matching commit); this does not affect the decision since
  6.18.y, 6.12.y, 6.6.y are the active targets.

This commit adds a hardware-specific device quirk entry — a textbook
stable-tree exception. The fix is three lines, cannot affect other
hardware, was reviewed by the subsystem expert who wrote the underlying
quirk, and directly mirrors a sibling commit that is already in
6.6.y/6.12.y/6.18.y.

**YES**

 drivers/ufs/core/ufshcd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 9ceb6d6d479d0..9b77639f04535 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -315,6 +315,9 @@ static const struct ufs_dev_quirk ufs_fixups[] = {
 	{ .wmanufacturerid = UFS_VENDOR_TOSHIBA,
 	  .model = "THGLF2G9D8KBADG",
 	  .quirk = UFS_DEVICE_QUIRK_PA_TACTIVATE },
+	{ .wmanufacturerid = UFS_VENDOR_TOSHIBA,
+	  .model = "THGJFJT0E25BAIP",
+	  .quirk = UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT },
 	{ .wmanufacturerid = UFS_VENDOR_TOSHIBA,
 	  .model = "THGJFJT1E45BATP",
 	  .quirk = UFS_DEVICE_QUIRK_NO_TIMESTAMP_SUPPORT },
-- 
2.53.0



* [PATCH AUTOSEL 7.0-5.10] scsi: esas2r: Fix __printf annotation on esas2r_log_master()
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.6] scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Add PCI ID support for LPe42100 series adapters Sasha Levin
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Arnd Bergmann, Bart Van Assche, Martin K. Petersen, Sasha Levin,
	linuxdrivers, jejb, linux-scsi, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit 67557418905b103eaa7bacf81999be83accda334 ]

clang-22 started warning about functions that take printf format
strings:

drivers/scsi/esas2r/esas2r_log.c:160:50: error: diagnostic behavior may be improved by adding the 'format(printf, 3, 0)' attribute to the declaration of 'esas2r_log_master' [-Werror,-Wmissing-format-attribute]
  121 |                 retval = vsnprintf(buffer, buflen, format, args);
      |                                                                ^
drivers/scsi/esas2r/esas2r_log.c:121:12: note: 'esas2r_log_master' declared here
  121 | static int esas2r_log_master(const long level,
      |            ^

The warning already got silenced for gcc but not clang in the past.
Rather than modify that hack to turn it off for both, just add the
attribute as suggested and remove the pragma again.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20260323100027.1975646-1-arnd@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to complete my analysis.

## Phase 1: Commit Message Forensics

**Step 1.1 — Subject line parse**
- Subsystem prefix: `scsi: esas2r:`
- Action verb: "Fix"
- Record: [scsi/esas2r driver] [Fix] [Adjust `__printf` annotation on
  `esas2r_log_master()` to silence a clang-22
  `-Wmissing-format-attribute` compile error]

**Step 1.2 — Tags present**
- Signed-off-by: Arnd Bergmann (author), Martin K. Petersen (SCSI
  maintainer)
- Reviewed-by: Bart Van Assche (SCSI/block reviewer)
- Link:
  https://patch.msgid.link/20260323100027.1975646-1-arnd@kernel.org
- No `Fixes:` tag, no `Reported-by:`, no `Cc: stable@vger.kernel.org`
- Record: Reviewed by a well-known SCSI reviewer; no explicit stable
  nomination or Fixes reference.

**Step 1.3 — Commit body**
- clang-22 introduced a new diagnostic `-Wmissing-format-attribute`
  which is promoted to error by `-Werror` (e.g. `CONFIG_WERROR`). The
  message shows the exact error text referencing the `vsnprintf(buffer,
  buflen, format, args)` call inside `esas2r_log_master()`.
- A previous GCC-only workaround used `#pragma GCC diagnostic ignored
  "-Wsuggest-attribute=format"` guarded with `#ifndef __clang__`. That
  pragma silenced GCC but left clang with no annotation, and clang-22
  now emits an error.
- Fix: drop the pragma hack and add the real `__printf(3, 0)` attribute,
  which is the portable, compiler-correct solution.
- Record: Build-only change; no runtime behavior description; no user-
  visible symptom beyond compilation failure with clang-22.

**Step 1.4 — Hidden bug fix?**
- Not hiding any runtime bug. The fix is exactly what it appears to be:
  a compiler-attribute cleanup that also happens to be required for
  clang-22 builds.
- Record: Not a hidden runtime fix; it is a compilation/annotation fix.

## Phase 2: Diff Analysis

**Step 2.1 — Inventory**
- Single file: `drivers/scsi/esas2r/esas2r_log.c`, +3 / -11 lines
- Functions modified: `esas2r_log_master()` only (prototype annotation)
- Scope: single-file surgical annotation change.

**Step 2.2 — Code flow**
- Before: `static int esas2r_log_master(...)` with `#pragma GCC
  diagnostic push/pop` around it to hide `-Wsuggest-attribute=format`
  for GCC only.
- After: `static __printf(3, 0) int esas2r_log_master(...)` with no
  pragma wrappers.
- Execution flow is unchanged. `__printf(3, 0)` expands to
  `__attribute__((format(printf, 3, 0)))`, a compile-time hint to the
  format-string checker. It affects compiler diagnostics, not generated
  code.

**Step 2.3 — Bug mechanism**
- Category (h) hardware workaround: N/A
- Category closest fit: **build/annotation fix** (compiler-attribute
  correctness). No runtime resource leak, race, UAF, deref, etc.

**Step 2.4 — Fix quality**
- Obviously correct: `esas2r_log_master(level, dev, format, args)` —
  `format` is argument 3, `args` is `va_list`, so `__printf(3, 0)` is
  the textbook annotation for a vprintf-style function (second argument
  `0` for va_list variants).
- Minimal, surgical, zero regression risk; binary output is effectively
  unchanged.
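
The difference between the `(3, 0)` and `(3, 4)` forms can be shown with a small user-space sketch. It assumes GCC or clang; `my_printf_attr`, `log_master()` and `log_msg()` are hypothetical names standing in for the kernel's `__printf()` macro and the esas2r functions, and the buffer-based signature is chosen only so that the format string lands in position 3 as it does in the driver.

```c
#include <stdarg.h>
#include <stdio.h>

/* User-space stand-in for the kernel's __printf() macro (GCC/clang only). */
#define my_printf_attr(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))

/*
 * vprintf-style core: the format string is parameter 3 and the values
 * arrive as a va_list, so the "first argument to check" is 0.
 */
static my_printf_attr(3, 0)
int log_master(char *buf, size_t buflen, const char *format, va_list args)
{
	return vsnprintf(buf, buflen, format, args);
}

/*
 * printf-style wrapper: format is parameter 3 and the values to check
 * start at parameter 4, so the compiler can verify each call site.
 */
static my_printf_attr(3, 4)
int log_msg(char *buf, size_t buflen, const char *format, ...)
{
	va_list args;
	int ret;

	va_start(args, format);
	ret = log_master(buf, buflen, format, args);
	va_end(args);
	return ret;
}
```

With the annotations in place, a call such as `log_msg(buf, sizeof(buf), "%s", 42)` would be expected to draw a format warning at compile time, while the generated code for correct calls stays the same.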

## Phase 3: Git History

**Step 3.1 — blame / introduction**
- `git log` on `drivers/scsi/esas2r/esas2r_log.c` shows the pragma
  workaround was introduced in commit `1c666a3e0a54e` ("scsi: esas2r:
  Supply __printf(x, y) formatting for esas2r_log_master()", Lee Jones,
  2021-03-12), which first appeared in **v5.13-rc1**.
- Record: Pragma present since v5.13; the clang-specific gap has existed
  ever since.

**Step 3.2 — Fixes: target**
- No Fixes tag. Logically references `1c666a3e0a54e`, which is present
  in 5.15.y, 6.1.y, 6.6.y, 6.12.y. (5.10.y does not carry 1c666a3e0a54e
  — neither the pragma nor the warning baseline exist there.)
- Record: Implicit target is in stable trees ≥5.15.y.

**Step 3.3 — File history**
- Recent churn on the file is minimal; the only other commit touching it
  around the pragma is the original Lee Jones cleanup. No competing
  changes that would complicate backport.

**Step 3.4 — Author context**
- Arnd Bergmann — prolific kernel build-fix contributor; many of his
  compiler-warning fixes have been backported to stable (e.g.
  `5c3de2cae7ced`, `09dc5be323d4f`, `7ebd51c3f032d`, `81fdecac3f2c0`).
- Record: Author is a trusted build-fix maintainer.

**Step 3.5 — Dependencies**
- No prerequisite patch required. Standalone. `__printf` and friends are
  kernel-wide macros present in all supported trees.
- Record: Standalone; applies without dependencies.

## Phase 4: Mailing-list research

**Step 4.1 — Original submission**
- `b4 dig -c 67557418905b103eaa7bacf81999be83accda334` resolved to
  `https://lore.kernel.org/all/20260323100027.1975646-1-arnd@kernel.org/`,
  a single-version patch (no v2/v3).
- Thread pulled via mbox and inspected directly. Contents:
  - Bart Van Assche replied with `Reviewed-by:` immediately.
  - Martin K. Petersen replied first with "Applied to 7.1/scsi-staging"
    then "Applied to 7.1/scsi-queue" — no discussion about stable.
  - No NAKs, no alternative proposals, no stable request.

**Step 4.2 — Reviewers**
- `b4 dig -w`: To/Cc included Bradley Grove (driver author), James
  Bottomley, Martin K. Petersen, Nathan Chancellor, Nick Desaulniers,
  Bill Wendling, Justin Stitt, linux-scsi, linux-kernel, llvm list.
  Appropriate audience reviewed.

**Step 4.3 — Bug report**
- No Reported-by. The clang-22 diagnostic is self-reported by Arnd from
  his own build with clang-22.

**Step 4.4 — Series context**
- Single standalone patch; not part of a series.

**Step 4.5 — Stable mailing list**
- No stable-list discussion found via `b4 dig`. The SCSI maintainer
  explicitly queued to `7.1/scsi-queue`; no indication of stable intent.

## Phase 5: Code Semantic Analysis

**Step 5.1 — Functions in diff**
- Only `esas2r_log_master()` annotation changes.

**Step 5.2 — Callers**
- `esas2r_log_master()` is `static` in `esas2r_log.c`; callers are
  `esas2r_log()` and `esas2r_log_dev()` in the same file (visible in the
  full file read). These in turn are called from throughout the esas2r
  driver for logging. Reachability is normal driver code paths — all
  with constant format strings inside the module.

**Step 5.3 — Callees**
- `esas2r_log_master()` calls `spin_lock_irqsave`, `memset`, `snprintf`,
  `strlen`, `vsnprintf`, `printk` — standard kernel APIs, unchanged.

**Step 5.4 — Call chain**
- Logging path; nothing security-sensitive. Annotation change has no
  semantic effect on this path.

**Step 5.5 — Similar patterns**
- Similar clang-22 `-Wmissing-format-attribute` fixes exist in the same
  tree:
  - `d2fd4225d8de3` ("bug: avoid format attribute warning for clang as
    well")
  - `096abbb6682ee` ("clk: qoriq: avoid format string warning")
  - These confirm the clang-22 diagnostic is broadly hitting the kernel
    and is being addressed across subsystems the same way.

## Phase 6: Stable-tree cross-reference

**Step 6.1 — Does buggy code exist in stable?**
- The pragma `#pragma GCC diagnostic ignored
  "-Wsuggest-attribute=format"` (with the `#ifndef __clang__` guard)
  exists in 5.15.y, 6.1.y, 6.6.y, 6.12.y. Those trees will hit the
  clang-22 `-Werror=missing-format-attribute` diagnostic and fail to
  build with `CONFIG_WERROR=y` + clang-22.
- 5.10.y does NOT carry the pragma commit and is not affected.

**Step 6.2 — Backport complications**
- File has seen virtually no churn since 2021. Pre-change context
  matches exactly between mainline and 5.15/6.1/6.6/6.12. Patch applies
  cleanly with no rework.
- Record: Clean apply to 5.15.y, 6.1.y, 6.6.y, 6.12.y.

**Step 6.3 — Related fixes already in stable?**
- No prior version of this fix exists in stable. Companion commits
  (`d2fd4225d8de3`, `096abbb6682ee`) are recent mainline only at this
  point.

## Phase 7: Subsystem context

**Step 7.1 — Subsystem / criticality**
- `drivers/scsi/esas2r/` — ATTO ExpressSAS SAS/SATA RAID driver.
  PERIPHERAL criticality (specific hardware, still "Supported" per
  MAINTAINERS).

**Step 7.2 — Activity level**
- Very low activity; only treewide mechanical changes recently.

## Phase 8: Impact / Risk

**Step 8.1 — Who is affected**
- Only users building affected stable trees with clang-22 (and typically
  with `CONFIG_WERROR=y`, which defaults to `COMPILE_TEST`). This is a
  limited audience today (clang-22 is brand new) but will grow over the
  life of these LTS trees.

**Step 8.2 — Trigger conditions**
- Compile-time only; never triggered at runtime regardless of
  configuration.

**Step 8.3 — Severity**
- With `-Werror`: build failure (prevents module compilation with
  clang-22).
- Without `-Werror`: a warning only.
- No runtime severity.

**Step 8.4 — Risk/benefit**
- Benefit: Low but real. Keeps stable trees compilable with newer
  compilers (important for CI/distros that build with clang).
- Risk: Near-zero. The diff only adds a format-checking hint; it cannot
  cause a regression.

## Phase 9: Synthesis

**Evidence FOR backport**
- Explicitly falls under the "BUILD FIXES" exception in
  stable-kernel-rules.rst.
- Fix is trivial (+3/−11), obviously correct, self-contained, and
  applies cleanly to 5.15/6.1/6.6/6.12.
- Reviewed by a maintainer (Bart Van Assche) and applied by Martin K.
  Petersen.
- Arnd Bergmann has a well-established pattern of similar compiler-
  warning fixes being AUTOSEL'd into stable (e.g., `5c3de2cae7ced`
  gcc-16 warning, `09dc5be323d4f` hwmon `__printf`, `7ebd51c3f032d`
  gcc-16 Wnonnull, `81fdecac3f2c0` Wformat-security). Those were also
  small cleanups without Cc: stable that were nonetheless picked up.
- Zero runtime risk — the attribute affects only compiler diagnostics.

**Evidence AGAINST backport**
- No `Fixes:`, no `Cc: stable`, no reviewer asking for stable.
- No runtime bug is being fixed.
- The warning requires clang-22 + `CONFIG_WERROR` to become a hard
  failure. clang-22 users on older LTS kernels are a narrow audience.

**Stable rules checklist**
1. Obviously correct and tested? Yes — tested by build with clang-22
   (shown in commit message) and reviewed.
2. Fixes a real bug affecting users? Yes, a real build-failure bug for
   clang-22 + WERROR users.
3. Important issue? Medium — build break, no runtime impact.
4. Small and contained? Yes, +3/−11 in one file.
5. No new features/APIs? Correct — just adds an attribute hint.
6. Apply to stable? Yes, cleanly to 5.15.y/6.1.y/6.6.y/6.12.y; not
   relevant to 5.10.y.

**Exception category**: BUILD FIX — explicitly enumerated as acceptable
stable material.

**Decision rationale**: The commit is exactly the kind of trivial,
risk-free build fix that keeps stable trees compilable with current
toolchains. It mirrors other clang/gcc warning fixes already AUTOSEL'd
for LTS. The downside of backporting is effectively nil; the upside is
that 5.15+ LTS users can build this driver with clang-22 +
CONFIG_WERROR.

## Verification

- [Phase 1] Parsed tags via direct inspection of commit message and `git
  show 67557418905b1`: found Reviewed-by (Bart Van Assche), Link
  (patch.msgid.link), no Fixes, no Cc: stable, no Reported-by.
- [Phase 2] Diff inspection confirms: +3 lines (`__printf(3, 0)` +
  restructured prototype) / −11 lines (removed `#pragma GCC diagnostic
  push/ifndef __clang__/ignored/pop`). Only `esas2r_log_master()`
  prototype touched; function body unchanged.
- [Phase 3] `git log -- drivers/scsi/esas2r/esas2r_log.c` confirmed
  pragma workaround was added by `1c666a3e0a54e` (Lee Jones,
  2021-03-12).
- [Phase 3] `git describe --tags --contains 1c666a3e0a54e` →
  `v5.13-rc1~103^2~273`, confirming pragma first appeared in v5.13.
- [Phase 3] Verified `esas2r_log_master(level, dev, format, args)`
  argument numbering by reading the full function: `format` is the 3rd
  arg, `args` is `va_list`, so `__printf(3, 0)` is correct.
- [Phase 4] `b4 dig -c 67557418905b103eaa7bacf81999be83accda334` found
  the lore URL; saved full thread mbox to `/tmp/esas2r_thread.mbox` and
  read all messages. Only responses: Bart Van Assche's Reviewed-by and
  Martin K. Petersen's "Applied to 7.1/scsi-staging" then "Applied to
  7.1/scsi-queue". No stable nomination, no NAKs.
- [Phase 4] `b4 dig -w` confirmed Bradley Grove, James Bottomley, Martin
  K. Petersen, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
  Justin Stitt, linux-scsi, linux-kernel, llvm list were on the
  recipient list.
- [Phase 5] Read full `drivers/scsi/esas2r/esas2r_log.c`: confirmed only
  callers of `esas2r_log_master()` are the same-file wrappers
  `esas2r_log()` and `esas2r_log_dev()`; function body and callees are
  unchanged by the patch.
- [Phase 6] Verified with `git log --oneline
  stable-push/linux-{5.10,5.15,6.1,6.6,6.12}.y --
  drivers/scsi/esas2r/esas2r_log.c` that 5.15/6.1/6.6/6.12 all have
  `1c666a3e0a54e` (the pragma commit) and thus are affected; 5.10.y
  lacks it and is NOT affected.
- [Phase 6] Printed
  `stable-push/linux-6.12.y:drivers/scsi/esas2r/esas2r_log.c` lines
  98–125 to confirm pre-patch context matches mainline exactly; the diff
  will apply cleanly.
- [Phase 7] `grep -A5 "ATTO EXPRESSSAS" MAINTAINERS` confirmed the
  driver is "Supported" with Bradley Grove as maintainer.
- [Phase 8] Inspected `init/Kconfig` `config WERROR`: defaults to
  `COMPILE_TEST`, meaning builds that enable COMPILE_TEST (including
  many CI/distro builds) would see the clang-22 error as a build
  failure, not a mere warning.
- [Phase 8] Confirmed similar precedent: `git show 09dc5be323d4f` and
  `git show 5c3de2cae7ced` — both are Arnd Bergmann compiler-warning
  fixes that landed in `stable-push/linux-6.12.y` via AUTOSEL, both of
  similar trivial scope; this establishes the pattern.
- UNVERIFIED: Whether any specific distro/CI is already building these
  stable trees with clang-22 and hitting this break — but the risk of
  backporting is essentially zero regardless.

The fix is a textbook stable-eligible build fix: tiny, obviously
correct, no runtime impact, applies cleanly to all affected LTS trees,
written by a trusted build-fix author, and matches the pattern of
similar compiler-warning fixes already carried in stable.

**YES**

 drivers/scsi/esas2r/esas2r_log.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/esas2r/esas2r_log.c b/drivers/scsi/esas2r/esas2r_log.c
index d6c87a0bae098..46f489b2263cb 100644
--- a/drivers/scsi/esas2r/esas2r_log.c
+++ b/drivers/scsi/esas2r/esas2r_log.c
@@ -101,11 +101,6 @@ static const char *translate_esas2r_event_level_to_kernel(const long level)
 	}
 }
 
-#pragma GCC diagnostic push
-#ifndef __clang__
-#pragma GCC diagnostic ignored "-Wsuggest-attribute=format"
-#endif
-
 /*
  * the master logging function.  this function will format the message as
  * outlined by the formatting string, the input device information and the
@@ -118,10 +113,9 @@ static const char *translate_esas2r_event_level_to_kernel(const long level)
  *
  * @return 0 on success, or -1 if an error occurred.
  */
-static int esas2r_log_master(const long level,
-			     const struct device *dev,
-			     const char *format,
-			     va_list args)
+static __printf(3, 0)
+int esas2r_log_master(const long level, const struct device *dev,
+		      const char *format, va_list args)
 {
 	if (level <= event_log_level) {
 		unsigned long flags = 0;
@@ -175,8 +169,6 @@ static int esas2r_log_master(const long level,
 	return 0;
 }
 
-#pragma GCC diagnostic pop
-
 /*
  * formats and logs a message to the system log.
  *
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Add PCI ID support for LPe42100 series adapters
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
                   ` (6 preceding siblings ...)
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] scsi: esas2r: Fix __printf annotation on esas2r_log_master() Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  7 siblings, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Justin Tee, Martin K. Petersen, Sasha Levin, james.smart,
	dick.kennedy, jejb, linux-scsi, linux-kernel

From: Justin Tee <justin.tee@broadcom.com>

[ Upstream commit 49b9f31e52b2125125318cb60fe9f5e7fa9c6755 ]

Update supported pci_device_id table to include the values for the G8 ASIC
Device ID utilized by LPe42100 series of adapters.  The default reporting
string will be "LPe42100".

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://patch.msgid.link/20260331205928.119833-10-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to compile a complete analysis.

## Phase 1: Commit Message Forensics

**Step 1.1: Subject line parsing**
Record: Subsystem `scsi: lpfc:`. Action verb "Add" (hardware
enablement). Summary: Adds PCI device ID 0xd300 for the new LPe42100
Fibre Channel adapter (G8 ASIC).

**Step 1.2: Tag inventory**
Record: Tags present:
- `Signed-off-by: Justin Tee <justin.tee@broadcom.com>` (author,
  Broadcom developer for lpfc)
- `Link:
  https://patch.msgid.link/20260331205928.119833-10-justintee8345@gmail.com`
  (mailing list reference)
- `Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>` (SCSI
  subsystem maintainer)
- No `Cc: stable`, no `Fixes:`, no `Reported-by`, no `Reviewed-by`, no
  `Tested-by`, no syzbot, no `Link` to any bug report.

**Step 1.3: Body text analysis**
Record: Short three-sentence message. No bug description. No stack
trace. No symptom. No reproducer. Explicitly framed as hardware
enablement: "Update supported pci_device_id table to include the values
for the G8 ASIC Device ID utilized by LPe42100 series of adapters."
States the model name reported will be "LPe42100".

**Step 1.4: Hidden bug fix detection**
Record: Not a hidden fix. No "cleanup"/"improve"/"handle"/"ensure"
wording. This is explicitly and exclusively hardware enablement — a new
PCI ID addition.

## Phase 2: Diff Analysis

**Step 2.1: Change inventory**
Record: 3 files, 8 meaningful lines added (plus 2 copyright year bumps):
- `drivers/scsi/lpfc/lpfc_hw.h`: +1 line (`#define
  PCI_DEVICE_ID_LANCER_G8_FC 0xd300`)
- `drivers/scsi/lpfc/lpfc_ids.h`: +2 lines (entry in `lpfc_id_table[]`)
- `drivers/scsi/lpfc/lpfc_init.c`: +3 lines (new `case` in
  `lpfc_get_hba_model_desc()` returning model string "LPe42100")

Scope: single-driver, surgical addition following exact pattern of
existing G6/G7/G7P entries.

**Step 2.2: Code flow change**
Record: Before: `lpfc_id_table[]` did not match 0x10df:0xd300 → lpfc
driver would not bind to LPe42100 hardware. `lpfc_get_hba_model_desc()`
would emit "Unknown" for such a device. After: lpfc binds to
0x10df:0xd300, model string populated as "LPe42100".
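
The before/after behaviour can be sketched in user space as follows. This is a simplified illustration under stated assumptions, not the lpfc code: `struct id_entry`, `id_table_matches()` and `model_desc()` are hypothetical stand-ins for `struct pci_device_id`, the PCI core's table match and `lpfc_get_hba_model_desc()`; only the `0x10df:0xd300` pair and the two strings come from the analysis above.

```c
#include <stdint.h>

#define VENDOR_ID_EMULEX     0x10df  /* vendor half of 0x10df:0xd300 */
#define DEVICE_ID_LANCER_G8  0xd300  /* device ID added by this commit */

/* Hypothetical, simplified stand-in for struct pci_device_id. */
struct id_entry {
	uint16_t vendor;
	uint16_t device;
};

static const struct id_entry id_table[] = {
	{ VENDOR_ID_EMULEX, DEVICE_ID_LANCER_G8 },
	{ 0, 0 },  /* terminator */
};

/* Return 1 if a driver using id_table would bind to vendor:device. */
static int id_table_matches(uint16_t vendor, uint16_t device)
{
	const struct id_entry *e;

	for (e = id_table; e->vendor; e++)
		if (e->vendor == vendor && e->device == device)
			return 1;
	return 0;
}

/* Model string lookup, falling back to "Unknown" for unlisted IDs. */
static const char *model_desc(uint16_t device)
{
	switch (device) {
	case DEVICE_ID_LANCER_G8:
		return "LPe42100";
	default:
		return "Unknown";
	}
}
```

Removing the single table entry and the `case` reproduces the "before" state in this sketch: the match fails and the description falls through to "Unknown".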

**Step 2.3: Bug mechanism**
Record: Category (h) — Hardware workaround / device ID addition. No bug
being fixed; new hardware enablement.

**Step 2.4: Fix quality**
Record: Obviously correct. Pattern-identical to the existing
LANCER_G6_FC / LANCER_G7_FC / LANCER_G7P_FC entries. No new code paths,
no API change, no behavioural change for any existing device.
Essentially zero regression risk — new table entry and new switch case
are only reached when a 0xd300 device is present in the system.

## Phase 3: Git History Investigation

**Step 3.1: Blame**
Record: The `lpfc_id_table[]` and `lpfc_get_hba_model_desc()` code has
been in the tree since the lpfc driver's early days. Neighbouring G7P
entry was added by commit f449a3d7a1530 (James Smart, Jul 2021, "scsi:
lpfc: Add PCI ID support for LPe37000/LPe38000 series adapters") which
first appeared in v5.15. So the surrounding code exists in every active
stable tree from 5.15.y through 7.0.y.

**Step 3.2: Fixes tag follow-up**
Record: No `Fixes:` tag. Not applicable — this is a hardware enablement,
not a fix.

**Step 3.3: File history / series context**
Record: Part of the 10-patch series "Update lpfc to revision 15.0.0.0".
Adjacent commits in the series:
- 39d1d94166da3 — "scsi: lpfc: Introduce 128G link speed selection and
  support" (immediately before)
- 7f1e2c1cce1ca — "scsi: lpfc: Update lpfc version to 15.0.0.0"
  (immediately after)

The 128G commit is a feature addition (not a fix) that enables the
highest link speed the LPe42100 supports. **However**, I verified that
no other code in lpfc mainline references `PCI_DEVICE_ID_LANCER_G8_FC` —
only the three sites this commit touches — so binding and operation at
supported lower speeds does not require the 128G patch.

**Step 3.4: Author context**
Record: Justin Tee (Broadcom) is a regular lpfc contributor. SCSI
maintainer Martin K. Petersen signed off, indicating maintainer review.

**Step 3.5: Dependencies**
Record: No strict dependency on other patches in the series. G8 ASIC
reuses the existing LANCER_G6/G7/G7P code paths; there is no G8-specific
behaviour anywhere else in the driver. Full 128G link speed would
require the 128G patch, but the adapter binds, probes, and operates at
<=64G without it.

## Phase 4: Mailing List Research

**Step 4.1: Original submission**
Record: `b4 dig -c 49b9f31e52b21` located the original patch at
https://lore.kernel.org/all/20260331205928.119833-10-justintee8345@gmail.com/.
Part of series "[PATCH 00/10] Update lpfc to revision 15.0.0.0"
submitted 2026-03-31.

**Step 4.2: Reviewers**
Record: `b4 dig -a` shows only v1 of the series exists (no v2/v3
needed). Thread contains no Reviewed-by / Acked-by / Tested-by tags, no
NAKs, no `Cc: stable` suggestion. Martin K. Petersen accepted the
series.

**Step 4.3: Bug report**
Record: Not applicable — no bug report; new-hardware enablement.

**Step 4.4: Related series patches**
Record: The relevant companion is patch 08/10 (128G support, not a fix
and not for stable). Patch 10/10 is a version bump. No other companion
needed for the PCI ID to function.

**Step 4.5: Stable mailing list history**
Record: No stable list discussion about this commit (it is too recent —
merged early April 2026, well after v7.0).

## Phase 5: Code Semantic Analysis

**Step 5.1–5.4: Impact surface**
Record: Three touched sites:
- `lpfc_id_table[]` — consumed by the PCI core for driver match; no new
  code paths, just a new entry.
- `PCI_DEVICE_ID_LANCER_G8_FC` macro — used only in the new switch case
  in `lpfc_get_hba_model_desc()`.
- `lpfc_get_hba_model_desc()` — called during probe/ioctl to format a
  model string. Reached only when a device with the new ID is present.

`grep PCI_DEVICE_ID_LANCER_G8` across origin/master returns exactly
those three sites — no hidden dependencies.
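
As a rough illustration of why a new table entry adds no new code path
for existing devices, the following is a minimal userspace sketch of
ID-table matching. It is not the kernel's PCI core implementation: the
`sketch_*` names, the struct layout, the zero-vendor terminator, and the
Emulex vendor value are assumptions made for the example; only the four
device IDs are taken from the `lpfc_hw.h` hunk. The tables are
abbreviated to the LANCER entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins, not kernel definitions. Device IDs come from the
 * lpfc_hw.h hunk; the vendor value and struct layout are assumptions. */
#define SKETCH_VENDOR_EMULEX 0x10dfu
#define SKETCH_ANY_ID        0xffffffffu
#define SKETCH_ID_G6         0xe300u
#define SKETCH_ID_G7         0xf400u
#define SKETCH_ID_G7P        0xf500u
#define SKETCH_ID_G8         0xd300u

struct sketch_pci_id {
	uint32_t vendor, device, subvendor, subdevice;
};

/* Abbreviated table without the patch; a zero vendor terminates it. */
static const struct sketch_pci_id sketch_table_before[] = {
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G6,  SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G7,  SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G7P, SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ 0, 0, 0, 0 },
};

/* Same table with the one new G8 entry added. */
static const struct sketch_pci_id sketch_table_after[] = {
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G6,  SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G7,  SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G7P, SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ SKETCH_VENDOR_EMULEX, SKETCH_ID_G8,  SKETCH_ANY_ID, SKETCH_ANY_ID },
	{ 0, 0, 0, 0 },
};

/* A table field matches if it is the wildcard or equals the device's value. */
static int sketch_field_matches(uint32_t want, uint32_t have)
{
	return want == SKETCH_ANY_ID || want == have;
}

/* Return the first table entry matching the device, or NULL if none does. */
static const struct sketch_pci_id *
sketch_match(const struct sketch_pci_id *table, uint32_t vendor,
	     uint32_t device, uint32_t subvendor, uint32_t subdevice)
{
	assert(table != NULL);
	for (; table->vendor != 0; table++) {
		if (sketch_field_matches(table->vendor, vendor) &&
		    sketch_field_matches(table->device, device) &&
		    sketch_field_matches(table->subvendor, subvendor) &&
		    sketch_field_matches(table->subdevice, subdevice))
			return table;
	}
	return NULL;
}
```

In this sketch, a device reporting 0xd300 finds no entry in
`sketch_table_before` and one entry in `sketch_table_after`, while a
device reporting 0xf500 resolves to the same G7P entry in both tables.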

**Step 5.5: Similar patterns**
Record: Existing LANCER_G6/G7/G7P entries are structurally identical.
This patch is a literal template-follow-up.

## Phase 6: Cross-Referencing and Stable Tree Analysis

**Step 6.1: Does the buggy code exist in stable?**
Record: There is no buggy code. The driver and surrounding structures
(`lpfc_id_table[]`, `lpfc_get_hba_model_desc()` switch) are present in
every active stable tree:
- 5.15.y: confirmed `PCI_DEVICE_ID_LANCER_G7P_FC` at lpfc_ids.h:121,
  lpfc_init.c:2608 — full context present
- 6.1.y: confirmed at lpfc_ids.h:119, lpfc_init.c:2741
- 6.6.y: confirmed at lpfc_ids.h:119, lpfc_init.c:2743
- 6.12.y: confirmed at lpfc_ids.h:119, lpfc_init.c:2732
- 5.10.y: no G7P present; driver older, backport would likely still
  apply but requires verification

**Step 6.2: Backport complications**
Record: Expected clean apply on 5.15.y, 6.1.y, 6.6.y, 6.12.y, 6.18.y,
6.19.y, 7.0.y. The three hunks anchor on G7P/SKYHAWK lines that are
unchanged in all those trees. Copyright bumps may need trivial
adjustment.

**Step 6.3: Related fixes in stable**
Record: N/A — no related fix.

## Phase 7: Subsystem Context

**Step 7.1: Criticality**
Record: `drivers/scsi/lpfc` — Emulex/Broadcom enterprise Fibre Channel
HBA driver. IMPORTANT (used in data-centre storage deployments, often
via enterprise distros that track LTS stable trees).

**Step 7.2: Activity**
Record: Actively maintained by Broadcom with quarterly "Update lpfc to
revision X" series, and many bug fixes are routinely backported to all
recent stable trees.

## Phase 8: Impact and Risk Assessment

**Step 8.1: Affected users**
Record: Users of LPe42100 (and compatible LPe421xx) Fibre Channel HBAs
running a stable/LTS kernel. Without this patch, the HBA does not bind
to the `lpfc` driver — hardware is effectively unusable on those
kernels. Enterprise/distro users often run 6.1.y / 6.6.y / 6.12.y LTS.

**Step 8.2: Trigger**
Record: Device present → driver should bind. Without the patch: driver
does not claim the device on stable kernels. Unprivileged trigger: N/A
(hardware presence is the trigger).

**Step 8.3: Failure mode severity**
Record: On stable kernels lacking this patch, a correctly installed
LPe42100 is unsupported (device is recognized by PCI subsystem but
`lpfc` declines it). User-visible symptom: no FC connectivity. Severity
category: hardware enablement — MEDIUM-HIGH for affected users (full
feature loss of the purchased adapter).

**Step 8.4: Risk-benefit**
Record: Benefit — enables new hardware for stable users (distro
customers). Risk — essentially zero: all new code paths are gated on
matching the new PCI ID; no existing device can reach the added code. 8
lines, trivial content, maintainer-signed.
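
The gating argument can be sketched in isolation. The snippet below is a
simplified userspace stand-in for the model-string switch, not the
driver's `lpfc_get_hba_model_desc()`: the `sketch_*` helpers, the
"Unknown" fallback, and the output format are hypothetical, and only the
two device IDs and model names shown in the `lpfc_init.c` hunk are used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Device IDs as they appear in the lpfc_hw.h hunk. */
#define SKETCH_ID_G7P 0xf500
#define SKETCH_ID_G8  0xd300

/* Map a device ID to a model name; the fallback string is an assumption. */
static const char *sketch_model_name(uint16_t device_id)
{
	switch (device_id) {
	case SKETCH_ID_G7P:
		return "LPe38000";
	case SKETCH_ID_G8:	/* the only case this patch adds */
		return "LPe42100";
	default:
		return "Unknown";
	}
}

/* Report whether the switch has a dedicated case for this ID. */
static int sketch_is_known(uint16_t device_id)
{
	return strcmp(sketch_model_name(device_id), "Unknown") != 0;
}

/* Build a description string; this format is made up for the sketch. */
static int sketch_format_desc(uint16_t device_id, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s %s %s", sketch_model_name(device_id),
			"PCIe", "Fibre Channel Adapter");
}
```

Under these assumptions, the new `case` is reachable only when the
device ID is 0xd300; the result for 0xf500 is the same with or without
the added lines.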

## Phase 9: Final Synthesis

**Step 9.1: Evidence**
For: (a) Small, contained, obviously correct. (b) Follows the stable
rules' "NEW DEVICE IDs" exception verbatim. (c) Expected to apply
cleanly to 5.15.y through 7.0.y, based on source-line comparison rather
than cherry-pick dry-runs. (d) Near-zero regression risk. (e)
Broadcom-signed and maintainer-signed. (f) Strong historical precedent:
smartpqi, iwlwifi, arcmsr, k10temp, hid wacom, etc. all regularly get
new-PCI-ID additions into stable.

Against: (a) Not a bug fix. (b) Not tagged `Cc: stable`. (c) No reviewer
explicitly nominated for stable. (d) 128G link speed support is a
separate (non-stable) feature patch — but the adapter remains functional
at supported lower speeds without it.

**Step 9.2: Stable checklist**
1. Obviously correct and tested: YES for correctness (pattern-identical
   to 3 prior entries); no Tested-by tag appears on the thread.
2. Fixes a real bug: NO, but it falls under stable's explicit
   hardware-enablement exception.
3. Important issue: MEDIUM-HIGH for LPe42100 owners (no adapter
   operation without it).
4. Small and contained: YES (8 lines, 3 files).
5. No new features/APIs: Adds hardware support only — no new UAPI,
   sysfs, or module param.
6. Can apply to stable trees: YES, clean apply expected on 5.15.y
   through 7.0.y.

**Step 9.3: Exception category**
Record: YES — matches the "NEW DEVICE IDs" exception explicitly
documented in the stable rules. The `lpfc` driver already exists in
every active stable tree; only the ID and a model-string case are new.

**Step 9.4: Decision**
The commit meets the stable "new device ID for existing driver"
exception cleanly: driver pre-exists in all active stable trees, diff is
minimal and template-matches the G6/G7/G7P predecessors, regression risk
for non-G8 users is zero, and affected users (data-centre LPe42100
owners on LTS kernels) genuinely cannot use their hardware without it.

## Verification

- [Phase 1] Parsed commit message: confirmed no Fixes, Reported-by,
  `Cc: stable`, Reviewed-by, or Tested-by tags; only author SOB, Link to
  patch.msgid.link, and maintainer SOB.
- [Phase 2] Counted diff hunks: `git show 49b9f31e52b21` — 3 files, 8
  meaningful lines + copyright bumps.
- [Phase 3] `git log --oneline -- drivers/scsi/lpfc/lpfc_ids.h` —
  confirmed f449a3d7a1530 (G7P addition, 2021) as nearest precedent.
- [Phase 3] `git log --oneline 49b9f31e52b21~3..49b9f31e52b21` —
  confirmed series context (128G link speed is the immediate
  predecessor).
- [Phase 3] `git grep PCI_DEVICE_ID_LANCER_G8 origin/master --
  drivers/scsi/lpfc/` — exactly 3 hits, matching the 3 diff sites; no
  hidden dependencies on other G8-specific code.
- [Phase 3] `git merge-base HEAD 49b9f31e52b21` → 6de23f81a5e08 (Linux
  7.0-rc1) — commit is post-7.0 (targeting 7.1 window) and not yet in
  any stable tree.
- [Phase 4] `b4 dig -c 49b9f31e52b21` — found thread at
  lore.kernel.org/all/20260331205928.119833-10-justintee8345@gmail.com.
- [Phase 4] `b4 dig -c 49b9f31e52b21 -a` — confirmed only v1 exists (no
  v2/v3).
- [Phase 4] Downloaded thread via `b4 dig -m`; grep for
  "stable@|NAK|Reviewed-by|Tested-by" → none beyond maintainer SOB.
- [Phase 6] `git grep PCI_DEVICE_ID_LANCER_G7P_FC` on
  stable/linux-5.15.y, 6.1.y, 6.6.y, 6.12.y — all contain the anchor
  lines the diff applies against; 5.10.y does not have G7P (pre-dates
  it).
- [Phase 6] Verified context around `lpfc_get_hba_model_desc()` and
  `lpfc_id_table[]` is identical in 6.12.y to HEAD (sed -n '110,130p' on
  stable/linux-6.12.y copy) → clean apply expected.
- [Phase 8] Verified that full 128G operation depends on commit
  39d1d94166da3, but basic bind/operation at ≤64G does not — confirmed
  by absence of any other G8 reference in the driver.
- UNVERIFIED: Did not actually run the backport against each stable
  branch; clean-apply assertion is based on source-line comparison, not
  `git cherry-pick` dry-runs.
- UNVERIFIED: Did not confirm whether 5.10.y is still actively
  maintained enough to warrant backport (LPe42100 is new enough that
  users on very old LTS likely aren't the target).

The commit is a textbook application of the stable "new device IDs to
existing drivers" exception: small, surgical, maintainer-signed, zero
regression risk for existing users, and enables real hardware for stable
kernel users. The absence of `Cc: stable` is expected (this is the whole
reason for the review). The companion 128G feature patch would be nice
for full performance but is not required for the device to function.

**YES**

 drivers/scsi/lpfc/lpfc_hw.h   | 3 ++-
 drivers/scsi/lpfc/lpfc_ids.h  | 4 +++-
 drivers/scsi/lpfc/lpfc_init.c | 3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hw.h b/drivers/scsi/lpfc/lpfc_hw.h
index b2e353590ebb5..6326f7353dd68 100644
--- a/drivers/scsi/lpfc/lpfc_hw.h
+++ b/drivers/scsi/lpfc/lpfc_hw.h
@@ -1,7 +1,7 @@
 /*******************************************************************
  * This file is part of the Emulex Linux Device Driver for         *
  * Fibre Channel Host Bus Adapters.                                *
- * Copyright (C) 2017-2025 Broadcom. All Rights Reserved. The term *
+ * Copyright (C) 2017-2026 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.     *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.           *
  * EMULEX and SLI are trademarks of Emulex.                        *
@@ -1771,6 +1771,7 @@ struct lpfc_fdmi_reg_portattr {
 #define PCI_DEVICE_ID_LANCER_G6_FC  0xe300
 #define PCI_DEVICE_ID_LANCER_G7_FC  0xf400
 #define PCI_DEVICE_ID_LANCER_G7P_FC 0xf500
+#define PCI_DEVICE_ID_LANCER_G8_FC  0xd300
 #define PCI_DEVICE_ID_SAT_SMB       0xf011
 #define PCI_DEVICE_ID_SAT_MID       0xf015
 #define PCI_DEVICE_ID_RFLY          0xf095
diff --git a/drivers/scsi/lpfc/lpfc_ids.h b/drivers/scsi/lpfc/lpfc_ids.h
index 0b1616e93cf47..a0a6e2d379b86 100644
--- a/drivers/scsi/lpfc/lpfc_ids.h
+++ b/drivers/scsi/lpfc/lpfc_ids.h
@@ -1,7 +1,7 @@
 /*******************************************************************
  * This file is part of the Emulex Linux Device Driver for         *
  * Fibre Channel Host Bus Adapters.                                *
- * Copyright (C) 2017-2022 Broadcom. All Rights Reserved. The term *
+ * Copyright (C) 2017-2026 Broadcom. All Rights Reserved. The term *
  * “Broadcom” refers to Broadcom Inc. and/or its subsidiaries.     *
  * Copyright (C) 2004-2016 Emulex.  All rights reserved.           *
  * EMULEX and SLI are trademarks of Emulex.                        *
@@ -118,6 +118,8 @@ const struct pci_device_id lpfc_id_table[] = {
 		PCI_ANY_ID, PCI_ANY_ID, },
 	{PCI_VENDOR_ID_EMULEX, PCI_DEVICE_ID_LANCER_G7P_FC,
 		PCI_ANY_ID, PCI_ANY_ID, },
+	{PCI_VENDOR_ID_EMULEX, PCI_DEVICE_ID_LANCER_G8_FC,
+		PCI_ANY_ID, PCI_ANY_ID, },
 	{PCI_VENDOR_ID_EMULEX, PCI_DEVICE_ID_SKYHAWK,
 		PCI_ANY_ID, PCI_ANY_ID, },
 	{PCI_VENDOR_ID_EMULEX, PCI_DEVICE_ID_SKYHAWK_VF,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index e9d9ac7da485b..f29e4b8fd02f4 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -2752,6 +2752,9 @@ lpfc_get_hba_model_desc(struct lpfc_hba *phba, uint8_t *mdp, uint8_t *descp)
 	case PCI_DEVICE_ID_LANCER_G7P_FC:
 		m = (typeof(m)){"LPe38000", "PCIe", "Fibre Channel Adapter"};
 		break;
+	case PCI_DEVICE_ID_LANCER_G8_FC:
+		m = (typeof(m)){"LPe42100", "PCIe", "Fibre Channel Adapter"};
+		break;
 	case PCI_DEVICE_ID_SKYHAWK:
 	case PCI_DEVICE_ID_SKYHAWK_VF:
 		oneConnect = 1;
-- 
2.53.0

Thread overview: 8+ messages
     [not found] <20260428104133.2858589-1-sashal@kernel.org>
2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-5.10] scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC Sasha Levin
2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Remove unnecessary ndlp kref get in lpfc_check_nlp_post_devloss Sasha Levin
2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] scsi: ufs: ufs-pci: Add support for Intel Nova Lake Sasha Levin
2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.1] scsi: lpfc: Fix incorrect txcmplq_cnt during cleanup in lpfc_sli_abort_ring() Sasha Levin
2026-04-28 10:40 ` [PATCH AUTOSEL 7.0] scsi: virtio_scsi: Move INIT_WORK calls to virtscsi_probe() Sasha Levin
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.6] scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP Sasha Levin
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] scsi: esas2r: Fix __printf annotation on esas2r_log_master() Sasha Levin
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.18] scsi: lpfc: Add PCI ID support for LPe42100 series adapters Sasha Levin
