From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators
Date: Wed, 24 Jun 2026 05:18:38 +0000 [thread overview]
Message-ID: <20260624051839.51BA31F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260624044334.3079-6-kartilak@cisco.com>
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
New issues:
- [High] The NVMe PRLI payload is truncated because `frame_size` is calculated using the smaller FCP payload structure.
- [High] NVMe initiators incorrectly validate the FCP Target Function bit instead of the NVMe Target Function bit.
Pre-existing issues:
- [Critical] `fnic_cleanup_io` calls `schedule_timeout(100)` while executing in a hardirq context, causing a kernel panic.
- [Critical] The `notify_timer` unconditionally re-arms itself, bypassing `timer_delete_sync` during module teardown and causing a Use-After-Free.
- [High] Dropping the lock inside a `list_for_each_entry_safe` loop creates a Use-After-Free condition.
- [High] The `fnic` structure is freed without being removed from the global `reset_fnic_list`, resulting in a Use-After-Free in the worker thread.
- [High] The `fabric.retry_timer` is never cancelled before freeing the `fnic` structure, leading to a Use-After-Free.
- [High] The code restores a stale IRQ state by using the `fnic->lock_flags` struct member instead of the caller's saved flags.
--
commit 5d6ee3fb48072d18a71bfad0fac88f211a7027ca
Author: Karan Tilak Kumar <kartilak@cisco.com>
scsi: fnic: Add FDLS role handling for NVMe initiators
Modify FDLS registration and discovery flows to use NVMe FC-4 type,
features, PRLI service parameters, and FDMI attributes when the adapter
runs as an NVMe initiator. Limit SCSI host setup, teardown, rport reset,
and FC host notifications to FCP initiators.
> diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c
> index f665460ef62c5..e03256183ac63 100644
> --- a/drivers/scsi/fnic/fdls_disc.c
> +++ b/drivers/scsi/fnic/fdls_disc.c
[ ... ]
> @@ -1264,11 +1284,14 @@ bool fdls_delete_tport(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
> tport->timer_pending = 0;
> }
>
> - spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> - fnic_rport_exch_reset(iport->fnic, tport->fcid);
> - spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + if (IS_FNIC_FCP_INITIATOR(fnic)) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + fnic_rport_exch_reset(iport->fnic, tport->fcid);
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + }
[Severity: High]
This is a pre-existing issue, but is it safe to use the struct member
fnic->lock_flags to restore the lock state here?
In fnic_tport_event_handler(), the fnic_lock is acquired using a local flags
variable, so fnic->lock_flags is never updated. When fdls_delete_tport()
subsequently drops the lock here using fnic->lock_flags, it restores a
potentially stale IRQ state. This can corrupt the CPU's interrupt state and
lead to deadlocks.
[ ... ]
> @@ -1508,13 +1535,21 @@ fdls_send_tgt_prli(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
> .fchdr = {.fh_r_ctl = FC_RCTL_ELS_REQ, .fh_type = FC_TYPE_ELS,
> .fh_f_ctl = {FNIC_ELS_REQ_FCTL, 0, 0},
> .fh_rx_id = cpu_to_be16(FNIC_UNASSIGNED_RXID)},
> - .els_prli = {.prli_cmd = ELS_PRLI,
> - .prli_spp_len = 16,
> - .prli_len = cpu_to_be16(0x14)},
> - .sp = {.spp_type = 0x08, .spp_flags = 0x0020,
> - .spp_params = cpu_to_be32(0xA2)}
> + .els_prli = {.prli_cmd = ELS_PRLI},
> + .sp = {.spp_params = cpu_to_be32(iport->service_params)}
> };
>
> + fdls_set_frame_type(fnic, &pprli->sp.spp_type);
> + if (IS_FNIC_FCP_INITIATOR(fnic)) {
> + pprli->els_prli.prli_spp_len = 16;
> + pprli->els_prli.prli_len = cpu_to_be16(0x14);
> + pprli->sp.spp_flags = FC_SPP_EST_IMG_PAIR;
> + } else if (IS_FNIC_NVME_INITIATOR(fnic)) {
> + /* Per FC-NVMe, Establish Image Pair must not be set for NVMe PRLI. */
> + pprli->els_prli.prli_spp_len = 20;
> + pprli->els_prli.prli_len = cpu_to_be16(0x18);
> + }
[Severity: High]
Does this result in a truncated NVMe payload?
The NVMe PRLI length is correctly set to 24 bytes here (0x18), but the
frame_size allocated for the transmission at the beginning of
fdls_send_tgt_prli() is calculated using sizeof(struct fc_std_els_prli).
Because that struct only accommodates the smaller 16-byte FCP payload, the
NVMe request is sent with an incomplete payload, which violates the FC-NVMe
standard.
[ ... ]
> @@ -2852,14 +2893,22 @@ fdls_process_tgt_prli_rsp(struct fnic_iport_s *iport,
> tport->fcp_csp = be32_to_cpu(prli_rsp->sp.spp_params);
> tport->retry_counter = 0;
>
> - if (tport->fcp_csp & FCP_SPPF_RETRY)
> - tport->tgt_flags |= FNIC_FC_RP_FLAGS_RETRY;
> + if (IS_FNIC_FCP_INITIATOR(fnic)) {
> + if (tport->fcp_csp & FCP_SPPF_RETRY)
> + tport->tgt_flags |= FNIC_FC_RP_FLAGS_RETRY;
> + } else if (IS_FNIC_NVME_INITIATOR(fnic)) {
> + if (tport->fcp_csp & FNIC_NVME_SP_SLER)
> + tport->tgt_flags |= FNIC_FC_RP_FLAGS_RETRY;
> + }
>
> /* Check if the device plays Target Mode Function */
> - if (!(tport->fcp_csp & FCP_PRLI_FUNC_TARGET)) {
> + if ((IS_FNIC_FCP_INITIATOR(fnic) &&
> + !(tport->fcp_csp & FCP_PRLI_FUNC_TARGET)) ||
> + (IS_FNIC_NVME_INITIATOR(fnic) &&
> + !(tport->fcp_csp & FCP_PRLI_FUNC_TARGET))) {
[Severity: High]
Are we checking the correct bit for NVMe targets?
The condition explicitly checks the FCP_PRLI_FUNC_TARGET bit (bit 4) for both
FCP and NVMe initiators. NVMe targets use a different bit mask
(FNIC_NVME_FEAT_TARG, which is bit 0) to indicate target support. This causes
the driver to incorrectly evaluate target support when connecting to NVMe
endpoints.
[ ... ]
> @@ -5087,14 +5143,18 @@ void fnic_fdls_link_down(struct fnic_iport_s *iport)
> fdls_set_state((&iport->fabric), FDLS_STATE_LINKDOWN);
> iport->fabric.flags = 0;
>
> - spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> - fnic_fcpio_reset(iport->fnic);
> - spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> - list_for_each_entry_safe(tport, next, &iport->tport_list, links) {
> - FNIC_FCS_DBG(KERN_INFO, fnic,
> + if (IS_FNIC_FCP_INITIATOR(fnic) ||
> + IS_FNIC_NVME_INITIATOR(fnic)) {
> + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> + fnic_fcpio_reset(iport->fnic);
> + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> + list_for_each_entry_safe(tport, next, &iport->tport_list, links) {
> + FNIC_FCS_DBG(KERN_INFO, fnic,
> "removing rport: 0x%x", tport->fcid);
> - fdls_delete_tport(iport, tport);
> + fdls_delete_tport(iport, tport);
> + }
> }
[Severity: High]
This is a pre-existing issue, but does dropping the lock inside
list_for_each_entry_safe() defeat the safety of the loop iteration?
When fdls_delete_tport() is called inside the loop body, it explicitly drops
fnic->fnic_lock. While unlocked, another thread or workqueue can delete and
free the pre-fetched next element. This can lead to a use-after-free crash
when the loop iterates.
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
> index ca592bc3f6182..a67d1085d720b 100644
> --- a/drivers/scsi/fnic/fnic_fcs.c
> +++ b/drivers/scsi/fnic/fnic_fcs.c
[ ... ]
> @@ -1110,12 +1111,15 @@ void fnic_reset_work_handler(struct work_struct *work)
> spin_unlock_irqrestore(&reset_fnic_list_lock,
> reset_fnic_list_lock_flags);
>
> - dev_err(&cur_fnic->pdev->dev, "fnic: <%d>: issuing a host reset\n",
> - cur_fnic->fnic_num);
> - host_reset_ret_code = fnic_host_reset(cur_fnic->host);
> - dev_err(&cur_fnic->pdev->dev,
> - "fnic: <%d>: returned from host reset with status: %d\n",
> - cur_fnic->fnic_num, host_reset_ret_code);
> + if (IS_FNIC_FCP_INITIATOR(cur_fnic)) {
> + dev_err(&cur_fnic->pdev->dev,
> + "fnic: <%d>: issuing a host reset\n",
> + cur_fnic->fnic_num);
> + host_reset_ret_code = fnic_host_reset(cur_fnic->host);
> + dev_err(&cur_fnic->pdev->dev,
> + "fnic: <%d>: returned from host reset with status: %d\n",
> + cur_fnic->fnic_num, host_reset_ret_code);
> + }
[Severity: High]
This is a pre-existing issue, but does this leak the fnic structure onto a
global list when the device is removed?
When fdls_process_rscn() is called, it adds the fnic instance to the global
reset_fnic_list. If the device is subsequently removed, fnic_remove() frees
the fnic memory without deleting it from reset_fnic_list or flushing the
reset_fnic_work_queue. The background worker iterates over reset_fnic_list
here and will access the freed cur_fnic object, resulting in a
use-after-free.
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c
> index 5850d51b0e8f6..0d7828be244da 100644
> --- a/drivers/scsi/fnic/fnic_main.c
> +++ b/drivers/scsi/fnic/fnic_main.c
[ ... ]
> @@ -1184,7 +1188,8 @@ static void fnic_remove(struct pci_dev *pdev)
> */
> flush_workqueue(fnic_event_queue);
>
> - fnic_scsi_unload(fnic);
> + if (IS_FNIC_FCP_INITIATOR(fnic))
> + fnic_scsi_unload(fnic);
>
> if (vnic_dev_get_intr_mode(fnic->vdev) == VNIC_DEV_INTR_MODE_MSI)
> timer_delete_sync(&fnic->notify_timer);
[Severity: Critical]
This is a pre-existing issue, but can this lead to a use-after-free due to
a self-rearming timer?
The timer callback fnic_notify_timer() unconditionally re-arms itself with
mod_timer(). Calling timer_delete_sync() here will wait for the currently
executing callback to finish, but the callback will queue itself again. This
leaves a pending timer firing after the fnic structure is freed below.
[Severity: High]
This is a pre-existing issue, but is the fabric.retry_timer left active?
The discovery process arms iport->fabric.retry_timer, but it is not
cancelled in fnic_remove() alongside the other timers like fip_timer and
notify_timer. This can cause the timer callback to execute after the fnic
memory is freed.
[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index b92260583c67b..37d1e136b7b91 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
> @@ -732,7 +732,8 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
> atomic64_inc(&reset_stats->fw_reset_completions);
>
> /* Clean up all outstanding io requests */
> - fnic_cleanup_io(fnic, SCSI_NO_TAG);
> + if (IS_FNIC_FCP_INITIATOR(fnic))
> + fnic_cleanup_io(fnic, SCSI_NO_TAG);
[Severity: Critical]
This is a pre-existing issue, but does this lead to sleeping in atomic
context?
The fnic_fcpio_fw_reset_cmpl_handler() function executes in a hardirq context
(via an MSI-X interrupt). It calls fnic_cleanup_io() which contains a while
loop that unconditionally calls schedule_timeout(msecs_to_jiffies(100)).
Calling a sleeping function in a hardirq context causes a kernel panic.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=5
next prev parent reply other threads:[~2026-06-24 5:18 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 4:43 [PATCH v5 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-24 4:43 ` [PATCH v5 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-24 5:08 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-24 5:11 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-24 4:43 ` [PATCH v5 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-24 4:43 ` [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-24 5:18 ` sashiko-bot [this message]
2026-06-24 4:43 ` [PATCH v5 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-24 5:12 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-24 5:11 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-24 5:13 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-24 5:11 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-24 5:13 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-24 5:15 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-24 5:17 ` sashiko-bot
2026-06-24 4:43 ` [PATCH v5 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260624051839.51BA31F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=kartilak@cisco.com \
--cc=linux-scsi@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox