From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F2698380FE6 for ; Wed, 24 Jun 2026 05:13:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782277982; cv=none; b=NVBm0YfDCEaQKYeG6LMuUH0klJWH5vy7QAJGr03wkEKq3LFpp0rOBbqXiXX+ZqaSo4w8NNpPOR2MwAEJc9H9GT6lCm9icKsO1LFq5JmGHBETK6EFG7k0kvmcqJe90d6o0nyrG1+Qq2SKkFSJGkYysTeNic5m583mdIAxjPpZheg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782277982; c=relaxed/simple; bh=6fbWd2k8xoONYDhHT72z+pjrn/xScTOYvBrtIHit/tA=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=bwrZ3w//7w1VWcEX3XVkGNUaLkCryeiE/PKL1nw9mYEq4L3BQN8fLqGhXowvcOGQxofCTNmJofgGeDtsLH4IfCxN7WDOYmzzo1t0SvCzirOPJQRPjUP/MHyCAQ9FS1kyiPY0e2febOA40lPd1WmQRAta81/R9AUcnjGQIeUHf9Q= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=dKR+YrSe; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="dKR+YrSe" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 707741F000E9; Wed, 24 Jun 2026 05:13:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782277980; bh=QJaYJe9iS//bFkyBVgN+uIBX0ErW+xDumojEMitUwm0=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=dKR+YrSespoPnfGAotONcv6wMA53KA7flCaeuCQNffKuffz0G6WV5l9ncHU5J/eb4 Ikx4cbJGaGX3vISf3GO5i52mzQM5T6C25k1dRcQ+q//P6H+2EqA5hEtqoor+5BH1Qx dMkWGLtzggKIbRl9KqX1iHSzfuUdrnjK0z5/RxyRZLPHJyJr9h6un2VDVervMjgMFH HO5SfDbZuUYjSGTogJ1IeF23ALQfRjvXQ3xUyasp/NSR67Xcn1XVp6TGQVmL3xT1EN x89Rd8lZEAy4dxA1aZ6bFHiX1HJ0yOkqUZ8cYaoE85q/9ZY/Oqj9AF+STap3Hj5upx 6ZGxcQOmkjlnA== From: sashiko-bot@kernel.org Subject: Re: [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Reply-To: sashiko-reviews@lists.linux.dev To: "Karan Tilak Kumar" Cc: linux-scsi@vger.kernel.org In-Reply-To: <20260624044334.3079-9-kartilak@cisco.com> References: <20260624044334.3079-1-kartilak@cisco.com> <20260624044334.3079-9-kartilak@cisco.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 24 Jun 2026 05:13:00 +0000 Message-Id: <20260624051300.707741F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 5 potential issue(= s) to consider: New issues: - [High] Using non-synchronous cancel_delayed_work() in fdls_reset_oxid_poo= l() allows concurrent lockless execution of the retry worker, leading to st= ate corruption and OXID double-allocations. - [High] Mismatched lock/unlock flags cause IRQ state corruption when acqui= ring fnic_lock with local flags and releasing it with the stale global fnic= ->lock_flags. - [High] Using non-sync timer_delete() in nvfnic_ls_rsp_recv() causes a UAF= in the timer callback. Pre-existing issues: - [High] Dropping a lock inside list_for_each_entry_safe() allows concurren= t deletion of the 'next' pointer. - [High] Module unload/teardown sequence fails to synchronously cancel the = oxid_pool delayed works, leading to a UAF crash. -- commit 04dceb559b15b01cb194adfb5c26b4d828e90ace Author: Karan Tilak Kumar scsi: fnic: Handle NVMe LS frames in FDLS Classify NVMe LS request OXIDs, route NVMe LS responses and ABTS frames through the FCS receive path, and reset NVMe exchanges when FDLS tears down target ports. > diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c > --- a/drivers/scsi/fnic/fdls_disc.c > +++ b/drivers/scsi/fnic/fdls_disc.c [ ... ] > @@ -387,10 +387,25 @@ static bool fdls_is_oxid_tgt_req(uint16_t oxid) > =20 > static void fdls_reset_oxid_pool(struct fnic_iport_s *iport) > { > struct fnic_oxid_pool_s *oxid_pool =3D &iport->oxid_pool; > + struct reclaim_entry_s *reclaim_entry, *next; > =20 > + cancel_delayed_work(&oxid_pool->oxid_reclaim_work); > + cancel_delayed_work(&oxid_pool->schedule_oxid_free_retry); > + list_for_each_entry_safe(reclaim_entry, next, > + &oxid_pool->oxid_reclaim_list, links) { > + list_del(&reclaim_entry->links); > + kfree(reclaim_entry); > + } > + bitmap_clear(oxid_pool->pending_schedule_free, 0, FNIC_OXID_POOL_SZ); > + bitmap_clear(oxid_pool->bitmap, 0, FNIC_OXID_POOL_SZ); [Severity: High] Can this race with the lockless retry worker? fdls_reset_oxid_pool() uses non-synchronous cancel_delayed_work() and immediately clears the bitmaps. If fdls_schedule_oxid_free_retry_work() is traversing pending_schedule_free locklessly on another CPU, could it read a set bit just before the reset clears the bitmap, allocate memory, and block on fnic_lock? Once the reset finishes and unlocks, the worker might inject a stale reclaim entry into the now empty list and schedule oxid_reclaim_work. When the reclaim work fires, could it clear a bit in oxid_pool->bitmap that may have been reallocated to a new active I/O request, causing an OXID collision? [Severity: High] This is a pre-existing issue, but does the module unload/teardown sequence fail to synchronously cancel these oxid_pool delayed works? If fnic_remove() initiates driver teardown and frees the fnic struct without calling cancel_delayed_work_sync(), flush_workqueue(fnic_event_queue) won't synchronize them because they are queued on the system workqueue. Could fdls_reclaim_oxid_handler() execute after the fnic structure is freed, causing a use-after-free crash? > oxid_pool->next_idx =3D 0; > } [ ... ] > @@ -1288,6 +1303,10 @@ bool fdls_delete_tport(struct fnic_iport_s *iport,= struct fnic_tport_s *tport) > spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > fnic_rport_exch_reset(iport->fnic, tport->fcid); > spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > + } else if (IS_FNIC_NVME_INITIATOR(fnic)) { > + spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); > + nvfnic_exch_reset(iport, tport); > + spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags); > } [Severity: High] This is a pre-existing issue, but does dropping fnic_lock here allow concurrent modification of the tport_list? When fnic_fdls_link_down() iterates over tport_list using list_for_each_entry_safe(), it caches the next pointer. Since fdls_delete_tport() unlocks the list briefly, could another context process a concurrent event and delete the cached next entry? If the cached next entry is freed while the lock is dropped, won't the loop dereference a freed pointer when it continues? [Severity: High] Does this sequence corrupt the hardware IRQ masking state? When callers like fdls_tport_timer_callback() or fnic_tport_event_handler() acquire the fnic_lock using a local flags variable: spin_lock_irqsave(&fnic->fnic_lock, flags); and then invoke helpers like fdls_delete_tport() or nvfnic_process_ls_abts_rsp() which temporarily drop and reacquire the lock using the struct member fnic->lock_flags: spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); Because fnic->lock_flags was never initialized by the original caller, does it restore an arbitrary stale IRQ state? Reacquiring the lock then overwrit= es fnic->lock_flags, causing the caller to unlock with the original local flags and severely corrupt the IRQ state. [ ... ] > diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c > --- a/drivers/scsi/fnic/fnic_fcs.c > +++ b/drivers/scsi/fnic/fnic_fcs.c [ ... ] > @@ -313,8 +319,14 @@ void fnic_handle_frame(struct work_struct *work) > fchdr_offset =3D (cur_frame->rx_ethhdr_stripped) ? > 0 : FNIC_ETH_FCOE_HDRS_OFFSET; > =20 > - fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp, > - cur_frame->frame_len, fchdr_offset); > + fchdr =3D (struct fc_frame_header *)((u8 *)cur_frame->fp + fchdr_offse= t); > + if (IS_FNIC_NVME_INITIATOR(fnic) && fnic_is_nvme_frame(fchdr)) { > + nvfnic_ls_rsp_recv(&fnic->iport, fchdr, > + cur_frame->frame_len - fchdr_offset); [Severity: High] Does calling nvfnic_ls_rsp_recv() here introduce a use-after-free risk in its timer callback? Inside nvfnic_ls_rsp_recv(), it executes with fnic_lock held and invokes timer_delete(&nvfnic_ls_req->ls_req_timer) without synchronization. If the timer callback nvfnic_ls_req_timeout() is concurrently executing on another CPU, it will spin-wait on fnic_lock. nvfnic_ls_rsp_recv() then drops the lock and calls lsreq->done(), which commonly causes the transport layer to free the request and its private payload. Once the lock is dropped, will the spinning timer callback unblock, acquire the lock, and inspect the now-freed nvfnic_ls_req memory? > + } else { > + fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp, > + cur_frame->frame_len, fchdr_offset); > + } --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260624044334.3079= -1-kartilak@cisco.com?part=3D8