From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEF6F5478D for ; Wed, 24 Jun 2026 05:18:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782278321; cv=none; b=A6AB9RNmYrthaWYkEFX2ZlKo8sHqzYihpboPXiFVhZTwHx0xaroo/6q+FBQxXfOPcHIsjat8FSzv1dXFZw1jOqGG8JI/Q21DO+jR2gUZpDlBo3TxNRx4BGxHCVaCZF/0FDKev/3sO1cUECh6u6b6pWdF3uMvKIuaZc0xVdeBPBo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782278321; c=relaxed/simple; bh=Bjk8ZINCjVBPwQ4ozrH5/s/ZeEilrQChyk+sf3KNV5s=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=PBra/T8voXlkfUtxCB3cOt6ABxrEvhSi9ru/6X6Ja1O1D7HTL9jAXZHT2hX8Jks1JD7MGumot8w06cQ6+lWP+Owh7LQIps8zZpNH7mSLAO96PEwp+5RNcIVSX3k33tLIdTzJpg70FdcBEqk7mM6Os52pIIpEBqBHly4RgEnMmJ4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=NK0OsaB1; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="NK0OsaB1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 51BA31F000E9; Wed, 24 Jun 2026 05:18:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782278319; bh=O5UPikNuB52ULoSKUrP/IHz5ACOPMMwGGMu6Gb5IXkc=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=NK0OsaB1cUAvq/0V4TzI/35dUtXnKvcXvgzwKbM64M8EwuCcWyWPmgzOhKZ1x+tSq 5g3DboHyR7l1P5QRTFmDp4chx6kLDngoi5SE0P0Lo+ye9snXa/2GbdMJfWxJQknFBp kYlpyR/2X6KvyMne6Bgv2j14/+lNfp1afhCf9TZ0T07sk6Sg6LYN5LU0J3GpF/Gtt5 hXJCURgBgtbXxHaaeqaDnPX7PDcKdOVl4fUV8a6LerOuHXAnxrCxGVnueiKOLWQQ4Y cfgH2Q70NLNO/+4KPnSJRli6NRF6Y7zplWIiRINj4wVnVbOv+U9bf9/zivTZdDG5pD E5KygMNaeMN9g== From: sashiko-bot@kernel.org Subject: Re: [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Reply-To: sashiko-reviews@lists.linux.dev To: "Karan Tilak Kumar" Cc: linux-scsi@vger.kernel.org In-Reply-To: <20260624044334.3079-6-kartilak@cisco.com> References: <20260624044334.3079-1-kartilak@cisco.com> <20260624044334.3079-6-kartilak@cisco.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 24 Jun 2026 05:18:38 +0000 Message-Id: <20260624051839.51BA31F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 8 potential issue(= s) to consider: New issues: - [High] The NVMe PRLI payload is truncated because `frame_size` is calcula= ted using the smaller FCP payload structure. - [High] NVMe initiators incorrectly validate the FCP Target Function bit i= nstead of the NVMe Target Function bit. Pre-existing issues: - [Critical] `fnic_cleanup_io` calls `schedule_timeout(100)` while executin= g in a hardirq context, causing a kernel panic. - [Critical] The `notify_timer` unconditionally re-arms itself, bypassing `= timer_delete_sync` during module teardown and causing a Use-After-Free. - [High] Dropping the lock inside a `list_for_each_entry_safe` loop creates= a Use-After-Free condition. - [High] The `fnic` structure is freed without being removed from the globa= l `reset_fnic_list`, resulting in a Use-After-Free in the worker thread. - [High] The `fabric.retry_timer` is never cancelled before freeing the `fn= ic` structure, leading to a Use-After-Free. - [High] The code restores a stale IRQ state by using the `fnic->lock_flags= ` struct member instead of the caller's saved flags. -- commit 5d6ee3fb48072d18a71bfad0fac88f211a7027ca Author: Karan Tilak Kumar scsi: fnic: Add FDLS role handling for NVMe initiators Modify FDLS registration and discovery flows to use NVMe FC-4 type, features, PRLI service parameters, and FDMI attributes when the adapter runs as an NVMe initiator. Limit SCSI host setup, teardown, rport reset, and FC host notifications to FCP initiators. > diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c > index f665460ef62c5..e03256183ac63 100644 > --- a/drivers/scsi/fnic/fdls_disc.c > +++ b/drivers/scsi/fnic/fdls_disc.c [ ... ] > @@ -1264,11 +1284,14 @@ bool fdls_delete_tport(struct fnic_iport_s *iport= , struct fnic_tport_s *tport) > tport->timer_pending =3D 0; > } > =20 > - spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > - fnic_rport_exch_reset(iport->fnic, tport->fcid); > - spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > + if (IS_FNIC_FCP_INITIATOR(fnic)) { > + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > + fnic_rport_exch_reset(iport->fnic, tport->fcid); > + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > + } [Severity: High] This is a pre-existing issue, but is it safe to use the struct member fnic->lock_flags to restore the lock state here? In fnic_tport_event_handler(), the fnic_lock is acquired using a local flags variable, so fnic->lock_flags is never updated. When fdls_delete_tport() subsequently drops the lock here using fnic->lock_flags, it restores a potentially stale IRQ state. This can corrupt the CPU's interrupt state and lead to deadlocks. [ ... ] > @@ -1508,13 +1535,21 @@ fdls_send_tgt_prli(struct fnic_iport_s *iport, st= ruct fnic_tport_s *tport) > .fchdr =3D {.fh_r_ctl =3D FC_RCTL_ELS_REQ, .fh_type =3D FC_TYPE_ELS, > .fh_f_ctl =3D {FNIC_ELS_REQ_FCTL, 0, 0}, > .fh_rx_id =3D cpu_to_be16(FNIC_UNASSIGNED_RXID)}, > - .els_prli =3D {.prli_cmd =3D ELS_PRLI, > - .prli_spp_len =3D 16, > - .prli_len =3D cpu_to_be16(0x14)}, > - .sp =3D {.spp_type =3D 0x08, .spp_flags =3D 0x0020, > - .spp_params =3D cpu_to_be32(0xA2)} > + .els_prli =3D {.prli_cmd =3D ELS_PRLI}, > + .sp =3D {.spp_params =3D cpu_to_be32(iport->service_params)} > }; > =20 > + fdls_set_frame_type(fnic, &pprli->sp.spp_type); > + if (IS_FNIC_FCP_INITIATOR(fnic)) { > + pprli->els_prli.prli_spp_len =3D 16; > + pprli->els_prli.prli_len =3D cpu_to_be16(0x14); > + pprli->sp.spp_flags =3D FC_SPP_EST_IMG_PAIR; > + } else if (IS_FNIC_NVME_INITIATOR(fnic)) { > + /* Per FC-NVMe, Establish Image Pair must not be set for NVMe PRLI. */ > + pprli->els_prli.prli_spp_len =3D 20; > + pprli->els_prli.prli_len =3D cpu_to_be16(0x18); > + } [Severity: High] Does this result in a truncated NVMe payload? The NVMe PRLI length is correctly set to 24 bytes here (0x18), but the frame_size allocated for the transmission at the beginning of fdls_send_tgt_prli() is calculated using sizeof(struct fc_std_els_prli). Because that struct only accommodates the smaller 16-byte FCP payload, the NVMe request is sent with an incomplete payload, which violates the FC-NVMe standard. [ ... ] > @@ -2852,14 +2893,22 @@ fdls_process_tgt_prli_rsp(struct fnic_iport_s *ip= ort, > tport->fcp_csp =3D be32_to_cpu(prli_rsp->sp.spp_params); > tport->retry_counter =3D 0; > =20 > - if (tport->fcp_csp & FCP_SPPF_RETRY) > - tport->tgt_flags |=3D FNIC_FC_RP_FLAGS_RETRY; > + if (IS_FNIC_FCP_INITIATOR(fnic)) { > + if (tport->fcp_csp & FCP_SPPF_RETRY) > + tport->tgt_flags |=3D FNIC_FC_RP_FLAGS_RETRY; > + } else if (IS_FNIC_NVME_INITIATOR(fnic)) { > + if (tport->fcp_csp & FNIC_NVME_SP_SLER) > + tport->tgt_flags |=3D FNIC_FC_RP_FLAGS_RETRY; > + } > =20 > /* Check if the device plays Target Mode Function */ > - if (!(tport->fcp_csp & FCP_PRLI_FUNC_TARGET)) { > + if ((IS_FNIC_FCP_INITIATOR(fnic) && > + !(tport->fcp_csp & FCP_PRLI_FUNC_TARGET)) || > + (IS_FNIC_NVME_INITIATOR(fnic) && > + !(tport->fcp_csp & FCP_PRLI_FUNC_TARGET))) { [Severity: High] Are we checking the correct bit for NVMe targets? The condition explicitly checks the FCP_PRLI_FUNC_TARGET bit (bit 4) for bo= th FCP and NVMe initiators. NVMe targets use a different bit mask (FNIC_NVME_FEAT_TARG, which is bit 0) to indicate target support. This caus= es the driver to incorrectly evaluate target support when connecting to NVMe endpoints. [ ... ] > @@ -5087,14 +5143,18 @@ void fnic_fdls_link_down(struct fnic_iport_s *ipo= rt) > fdls_set_state((&iport->fabric), FDLS_STATE_LINKDOWN); > iport->fabric.flags =3D 0; > =20 > - spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > - fnic_fcpio_reset(iport->fnic); > - spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > - list_for_each_entry_safe(tport, next, &iport->tport_list, links) { > - FNIC_FCS_DBG(KERN_INFO, fnic, > + if (IS_FNIC_FCP_INITIATOR(fnic) || > + IS_FNIC_NVME_INITIATOR(fnic)) { > + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > + fnic_fcpio_reset(iport->fnic); > + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > + list_for_each_entry_safe(tport, next, &iport->tport_list, links) { > + FNIC_FCS_DBG(KERN_INFO, fnic, > "removing rport: 0x%x", tport->fcid); > - fdls_delete_tport(iport, tport); > + fdls_delete_tport(iport, tport); > + } > } [Severity: High] This is a pre-existing issue, but does dropping the lock inside list_for_each_entry_safe() defeat the safety of the loop iteration? When fdls_delete_tport() is called inside the loop body, it explicitly drops fnic->fnic_lock. While unlocked, another thread or workqueue can delete and free the pre-fetched next element. This can lead to a use-after-free crash when the loop iterates. [ ... ] > diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c > index ca592bc3f6182..a67d1085d720b 100644 > --- a/drivers/scsi/fnic/fnic_fcs.c > +++ b/drivers/scsi/fnic/fnic_fcs.c [ ... ] > @@ -1110,12 +1111,15 @@ void fnic_reset_work_handler(struct work_struct *= work) > spin_unlock_irqrestore(&reset_fnic_list_lock, > reset_fnic_list_lock_flags); > =20 > - dev_err(&cur_fnic->pdev->dev, "fnic: <%d>: issuing a host reset\n", > - cur_fnic->fnic_num); > - host_reset_ret_code =3D fnic_host_reset(cur_fnic->host); > - dev_err(&cur_fnic->pdev->dev, > - "fnic: <%d>: returned from host reset with status: %d\n", > - cur_fnic->fnic_num, host_reset_ret_code); > + if (IS_FNIC_FCP_INITIATOR(cur_fnic)) { > + dev_err(&cur_fnic->pdev->dev, > + "fnic: <%d>: issuing a host reset\n", > + cur_fnic->fnic_num); > + host_reset_ret_code =3D fnic_host_reset(cur_fnic->host); > + dev_err(&cur_fnic->pdev->dev, > + "fnic: <%d>: returned from host reset with status: %d\n", > + cur_fnic->fnic_num, host_reset_ret_code); > + } [Severity: High] This is a pre-existing issue, but does this leak the fnic structure onto a global list when the device is removed? When fdls_process_rscn() is called, it adds the fnic instance to the global reset_fnic_list. If the device is subsequently removed, fnic_remove() frees the fnic memory without deleting it from reset_fnic_list or flushing the reset_fnic_work_queue. The background worker iterates over reset_fnic_list here and will access the freed cur_fnic object, resulting in a use-after-free. [ ... ] > diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c > index 5850d51b0e8f6..0d7828be244da 100644 > --- a/drivers/scsi/fnic/fnic_main.c > +++ b/drivers/scsi/fnic/fnic_main.c [ ... ] > @@ -1184,7 +1188,8 @@ static void fnic_remove(struct pci_dev *pdev) > */ > flush_workqueue(fnic_event_queue); > =20 > - fnic_scsi_unload(fnic); > + if (IS_FNIC_FCP_INITIATOR(fnic)) > + fnic_scsi_unload(fnic); > =20 > if (vnic_dev_get_intr_mode(fnic->vdev) =3D=3D VNIC_DEV_INTR_MODE_MSI) > timer_delete_sync(&fnic->notify_timer); [Severity: Critical] This is a pre-existing issue, but can this lead to a use-after-free due to a self-rearming timer? The timer callback fnic_notify_timer() unconditionally re-arms itself with mod_timer(). Calling timer_delete_sync() here will wait for the currently executing callback to finish, but the callback will queue itself again. This leaves a pending timer firing after the fnic structure is freed below. [Severity: High] This is a pre-existing issue, but is the fabric.retry_timer left active? The discovery process arms iport->fabric.retry_timer, but it is not cancelled in fnic_remove() alongside the other timers like fip_timer and notify_timer. This can cause the timer callback to execute after the fnic memory is freed. [ ... ] > diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c > index b92260583c67b..37d1e136b7b91 100644 > --- a/drivers/scsi/fnic/fnic_scsi.c > +++ b/drivers/scsi/fnic/fnic_scsi.c > @@ -732,7 +732,8 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fn= ic *fnic, > atomic64_inc(&reset_stats->fw_reset_completions); > =20 > /* Clean up all outstanding io requests */ > - fnic_cleanup_io(fnic, SCSI_NO_TAG); > + if (IS_FNIC_FCP_INITIATOR(fnic)) > + fnic_cleanup_io(fnic, SCSI_NO_TAG); [Severity: Critical] This is a pre-existing issue, but does this lead to sleeping in atomic context? The fnic_fcpio_fw_reset_cmpl_handler() function executes in a hardirq conte= xt (via an MSI-X interrupt). It calls fnic_cleanup_io() which contains a while loop that unconditionally calls schedule_timeout(msecs_to_jiffies(100)). Calling a sleeping function in a hardirq context causes a kernel panic. --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260624044334.3079= -1-kartilak@cisco.com?part=3D5