Re: [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS

Linux SCSI subsystem development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS
Date: Wed, 24 Jun 2026 05:13:00 +0000	[thread overview]
Message-ID: <20260624051300.707741F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260624044334.3079-9-kartilak@cisco.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Using non-synchronous cancel_delayed_work() in fdls_reset_oxid_pool() allows concurrent lockless execution of the retry worker, leading to state corruption and OXID double-allocations.
- [High] Mismatched lock/unlock flags cause IRQ state corruption when acquiring fnic_lock with local flags and releasing it with the stale global fnic->lock_flags.
- [High] Using non-sync timer_delete() in nvfnic_ls_rsp_recv() causes a UAF in the timer callback.

Pre-existing issues:
- [High] Dropping a lock inside list_for_each_entry_safe() allows concurrent deletion of the 'next' pointer.
- [High] Module unload/teardown sequence fails to synchronously cancel the oxid_pool delayed works, leading to a UAF crash.
--

commit 04dceb559b15b01cb194adfb5c26b4d828e90ace
Author: Karan Tilak Kumar <kartilak@cisco.com>

scsi: fnic: Handle NVMe LS frames in FDLS

Classify NVMe LS request OXIDs, route NVMe LS responses and ABTS frames
through the FCS receive path, and reset NVMe exchanges when FDLS tears
down target ports.

> diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c
> --- a/drivers/scsi/fnic/fdls_disc.c
> +++ b/drivers/scsi/fnic/fdls_disc.c

[ ... ]

> @@ -387,10 +387,25 @@ static bool fdls_is_oxid_tgt_req(uint16_t oxid)
>  
>  static void fdls_reset_oxid_pool(struct fnic_iport_s *iport)
>  {
>  	struct fnic_oxid_pool_s *oxid_pool = &iport->oxid_pool;
> +	struct reclaim_entry_s *reclaim_entry, *next;
>  
> +	cancel_delayed_work(&oxid_pool->oxid_reclaim_work);
> +	cancel_delayed_work(&oxid_pool->schedule_oxid_free_retry);
> +	list_for_each_entry_safe(reclaim_entry, next,
> +				 &oxid_pool->oxid_reclaim_list, links) {
> +		list_del(&reclaim_entry->links);
> +		kfree(reclaim_entry);
> +	}
> +	bitmap_clear(oxid_pool->pending_schedule_free, 0, FNIC_OXID_POOL_SZ);
> +	bitmap_clear(oxid_pool->bitmap, 0, FNIC_OXID_POOL_SZ);

[Severity: High]
Can this race with the lockless retry worker? fdls_reset_oxid_pool() uses
non-synchronous cancel_delayed_work() and immediately clears the bitmaps.

If fdls_schedule_oxid_free_retry_work() is traversing
pending_schedule_free locklessly on another CPU, could it read a set bit
just before the reset clears the bitmap, allocate memory, and block on
fnic_lock?

Once the reset finishes and unlocks, the worker might inject a stale
reclaim entry into the now empty list and schedule oxid_reclaim_work. When
the reclaim work fires, could it clear a bit in oxid_pool->bitmap that may
have been reallocated to a new active I/O request, causing an OXID
collision?

[Severity: High]
This is a pre-existing issue, but does the module unload/teardown sequence
fail to synchronously cancel these oxid_pool delayed works?

If fnic_remove() initiates driver teardown and frees the fnic struct without
calling cancel_delayed_work_sync(), flush_workqueue(fnic_event_queue) won't
synchronize them because they are queued on the system workqueue. Could
fdls_reclaim_oxid_handler() execute after the fnic structure is freed,
causing a use-after-free crash?

>  	oxid_pool->next_idx = 0;
>  }

[ ... ]

> @@ -1288,6 +1303,10 @@ bool fdls_delete_tport(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
>  		spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
>  		fnic_rport_exch_reset(iport->fnic, tport->fcid);
>  		spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
> +	} else if (IS_FNIC_NVME_INITIATOR(fnic)) {
> +		spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);
> +		nvfnic_exch_reset(iport, tport);
> +		spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);
>  	}

[Severity: High]
This is a pre-existing issue, but does dropping fnic_lock here allow
concurrent modification of the tport_list?

When fnic_fdls_link_down() iterates over tport_list using
list_for_each_entry_safe(), it caches the next pointer. Since
fdls_delete_tport() unlocks the list briefly, could another context
process a concurrent event and delete the cached next entry?

If the cached next entry is freed while the lock is dropped, won't the
loop dereference a freed pointer when it continues?

[Severity: High]
Does this sequence corrupt the hardware IRQ masking state?

When callers like fdls_tport_timer_callback() or fnic_tport_event_handler()
acquire the fnic_lock using a local flags variable:

spin_lock_irqsave(&fnic->fnic_lock, flags);

and then invoke helpers like fdls_delete_tport() or
nvfnic_process_ls_abts_rsp() which temporarily drop and reacquire the lock
using the struct member fnic->lock_flags:

spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags);

Because fnic->lock_flags was never initialized by the original caller, does
it restore an arbitrary stale IRQ state? Reacquiring the lock then overwrites
fnic->lock_flags, causing the caller to unlock with the original local flags
and severely corrupt the IRQ state.

[ ... ]

> diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
> --- a/drivers/scsi/fnic/fnic_fcs.c
> +++ b/drivers/scsi/fnic/fnic_fcs.c

[ ... ]

> @@ -313,8 +319,14 @@ void fnic_handle_frame(struct work_struct *work)
>  		fchdr_offset = (cur_frame->rx_ethhdr_stripped) ?
>  			0 : FNIC_ETH_FCOE_HDRS_OFFSET;
>  
> -		fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp,
> -							 cur_frame->frame_len, fchdr_offset);
> +		fchdr = (struct fc_frame_header *)((u8 *)cur_frame->fp + fchdr_offset);
> +		if (IS_FNIC_NVME_INITIATOR(fnic) && fnic_is_nvme_frame(fchdr)) {
> +			nvfnic_ls_rsp_recv(&fnic->iport, fchdr,
> +					  cur_frame->frame_len - fchdr_offset);

[Severity: High]
Does calling nvfnic_ls_rsp_recv() here introduce a use-after-free risk
in its timer callback?

Inside nvfnic_ls_rsp_recv(), it executes with fnic_lock held and invokes
timer_delete(&nvfnic_ls_req->ls_req_timer) without synchronization. If the
timer callback nvfnic_ls_req_timeout() is concurrently executing on another
CPU, it will spin-wait on fnic_lock.

nvfnic_ls_rsp_recv() then drops the lock and calls lsreq->done(), which
commonly causes the transport layer to free the request and its private
payload. Once the lock is dropped, will the spinning timer callback unblock,
acquire the lock, and inspect the now-freed nvfnic_ls_req memory?

> +		} else {
> +			fnic_fdls_recv_frame(&fnic->iport, cur_frame->fp,
> +					     cur_frame->frame_len, fchdr_offset);
> +		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=8

next prev parent reply	other threads:[~2026-06-24  5:13 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24  4:43 [PATCH v5 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-24  5:08   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-24  5:18   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-24  5:12   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot [this message]
2026-06-24  4:43 ` [PATCH v5 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-24  5:15   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-24  5:17   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624051300.707741F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kartilak@cisco.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox