Re: [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent

Linux SCSI subsystem development

From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent
Date: Fri, 12 Jun 2026 18:42:32 +0000
Message-ID: <20260612184233.1FDAB1F000E9@smtp.kernel.org>
In-Reply-To: <20260612180918.8554-2-kartilak@cisco.com>

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:

Pre-existing issues:
- [High] Workqueue corruption due to dynamic INIT_WORK on a shared work_struct in timer callbacks.
- [High] Infinite loop in FIP VLAN parsing caused by zero-length descriptors.
- [High] Out-of-bounds read in fdls_process_gpn_ft_tgt_list due to unadjusted payload length.
- [High] Memory leak of control frames when fnic_send_fcoe_frame() fails.
- [High] OXID allocation permanently halts due to missing wrap-around search in find_next_zero_bit.
- [High] Use-after-free (UAF) due to missing cancellation of delayed works and timers in fnic_remove.
- [High] Use-after-free caused by dropping a spinlock inside list_for_each_entry_safe.
- [High] Out-of-bounds memory read in FIP parsing because actual packet length is discarded.
- [High] Out-of-bounds memory read when processing spoofed RSCN payload lengths.
- [Medium] Memory leak of target port (tport) when event allocation fails in fdls_delete_tport.
--

commit ef42b3354b5f1530ee27d88277ffc1381b470972
Author: Karan Tilak Kumar <kartilak@cisco.com>
Subject: scsi: fnic: Make debug logging protocol independent

This patch updates the fnic debug macros to take a fnic structure rather
than a Scsi_Host. This allows FCP and NVMe initiator roles to share the same
logging interface. It also adds role predicates and logging masks to properly
route output to the appropriate reporting mechanism.

> diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c
> index 554dea7678854..5880ca28a0ad9 100644
> --- a/drivers/scsi/fnic/fdls_disc.c
> +++ b/drivers/scsi/fnic/fdls_disc.c
[ ... ]
> @@ -136,7 +136,7 @@ uint16_t fdls_alloc_oxid(struct fnic_iport_s *iport, int oxid_frame_type,
>  	 */
>  	idx = find_next_zero_bit(oxid_pool->bitmap, FNIC_OXID_POOL_SZ, oxid_pool->next_idx);
>  	if (idx == FNIC_OXID_POOL_SZ) {
> -		FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +		FNIC_FCS_DBG(KERN_INFO, fnic,
>  			"Alloc oxid: all oxid slots are busy iport state:%d\n",
>  			iport->state);
>  		return FNIC_UNASSIGNED_OXID;

[Severity: High]
This is a pre-existing issue, but does this logic permanently halt allocation
once the search reaches the end of the pool? If find_next_zero_bit() starts
at oxid_pool->next_idx and finds no free bit before FNIC_OXID_POOL_SZ, the
function returns FNIC_UNASSIGNED_OXID rather than wrapping around to index 0
and searching the rest of the bitmap. Because next_idx is not advanced on
failure, all subsequent allocations might search the same fully busy tail
and fail indefinitely, even while slots below next_idx are free.
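
A minimal userspace sketch of a wrap-around search, for illustration only.
The names slot_pool, next_zero() and pool_alloc(), and the 64-slot pool, are
hypothetical stand-ins and are not the driver's code; next_zero() only
mimics the "return the end index when nothing is free" behaviour of
find_next_zero_bit():

```c
#include <stdint.h>

#define POOL_SZ 64u		/* hypothetical size, not FNIC_OXID_POOL_SZ */
#define NO_SLOT POOL_SZ		/* returned when every slot is busy */

struct slot_pool {
	uint64_t bitmap;	/* bit i set => slot i is busy */
	unsigned int next_idx;	/* where the next search starts */
};

/* Stand-in for find_next_zero_bit(): first clear bit in [start, end). */
static unsigned int next_zero(uint64_t bitmap, unsigned int end,
			      unsigned int start)
{
	for (unsigned int i = start; i < end; i++)
		if (!(bitmap & (UINT64_C(1) << i)))
			return i;
	return end;
}

/* Search [next_idx, POOL_SZ) first, then wrap and search [0, next_idx). */
static unsigned int pool_alloc(struct slot_pool *pool)
{
	unsigned int idx = next_zero(pool->bitmap, POOL_SZ, pool->next_idx);

	if (idx == POOL_SZ) {
		idx = next_zero(pool->bitmap, pool->next_idx, 0);
		if (idx == pool->next_idx)
			return NO_SLOT;	/* the pool really is full */
	}
	pool->bitmap |= UINT64_C(1) << idx;
	pool->next_idx = (idx + 1) % POOL_SZ;
	return idx;
}
```

With slots 60..63 busy and next_idx at 60, pool_alloc() falls through to the
second search and hands out slot 0 instead of reporting the pool as full.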

[ ... ]
> @@ -1272,9 +1272,9 @@ bool fdls_delete_tport(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
>  		tport_del_evt =
>  			kzalloc_obj(struct fnic_tport_event_s, GFP_ATOMIC);
>  		if (!tport_del_evt) {
> -			FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> -				 "Failed to allocate memory for tport fcid: 0x%0x\n",
> -				 tport->fcid);
> +			FNIC_FCS_DBG(KERN_INFO, fnic,
> +					"iport: 0x%x tport 0x%x: Failed to allocate memory\n",
> +					iport->fcid, tport->fcid);
>  			return false;
>  		}
>  		tport_del_evt->event = TGT_EV_RPORT_DEL;

[Severity: Medium]
This isn't a bug introduced by this patch, but does returning false here
leak the tport structure? If kzalloc_obj() fails, the target port remains
in the iport->tport_list despite being marked OFFLINING and TERMINATING.

[ ... ]
> @@ -1330,7 +1330,7 @@ fdls_send_tgt_plogi(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
>  	hton24(d_id, tport->fcid);
>  	FNIC_STD_SET_D_ID(pplogi->fchdr, d_id);
>  
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  				 "0x%x: FDLS send tgt PLOGI to tgt: 0x%x with oxid: 0x%x",
>  				 iport->fcid, tport->fcid, oxid);
>  

[Severity: High]
This is a pre-existing issue, but are we leaking control frames if the
transmission fails? In fdls_send_tgt_plogi(), fnic_send_fcoe_frame() is
called right after this debug statement, but its return value is ignored.
If the DMA mapping fails or the work queue is full, does the frame allocated
from the mempool stay permanently leaked?
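
A small userspace sketch of the ownership pattern in question. It assumes
the send routine leaves the frame with the caller when it fails, which would
need to be confirmed against fnic_send_fcoe_frame(). All names here
(frame_alloc(), frame_send(), send_plogi(), frames_outstanding) are
hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

static int frames_outstanding;	/* illustration only: live frame count */

static void *frame_alloc(size_t len)
{
	void *frame = calloc(1, len);

	if (frame)
		frames_outstanding++;
	return frame;
}

static void frame_free(void *frame)
{
	if (frame) {
		free(frame);
		frames_outstanding--;
	}
}

/*
 * Hypothetical transmit routine. On success it takes ownership of the
 * frame (freed here to stand in for the completion path). On failure the
 * frame still belongs to the caller.
 */
static int frame_send(void *frame, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	frame_free(frame);
	return 0;
}

/* Check the send result and release the frame on the error path. */
static int send_plogi(int simulate_failure)
{
	void *frame = frame_alloc(128);
	int ret;

	if (!frame)
		return -1;
	ret = frame_send(frame, simulate_failure);
	if (ret)
		frame_free(frame);
	return ret;
}
```

Whether the failure path should free, retry, or requeue the frame depends on
the driver's ownership rules; the sketch only shows that the return value
has to be looked at for any of those to happen.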

[ ... ]
> @@ -3232,7 +3244,7 @@ fdls_process_gpn_ft_tgt_list(struct fnic_iport_s *iport,
>  		rem_len -= sizeof(struct fc_gpn_ft_rsp_iu);
>  	}
>  	if (rem_len <= 0) {
> -		FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +		FNIC_FCS_DBG(KERN_INFO, fnic,
>  			 "GPN_FT response: malformed/corrupt frame rxlen: %d remlen: %d",
>  			 len, rem_len);
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the payload parsing read
out of bounds here? Earlier in fdls_process_gpn_ft_tgt_list(), len is
adjusted by subtracting the FC and CT headers, but rem_len is initialized to
the original unadjusted len. Does the while (rem_len > 0) loop process extra
bytes past the end of the packet if the FC_NS_FID_LAST bit is missing?
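
As a rough illustration of the bound being asked about, the userspace sketch
below derives the remaining length from the frame length minus the headers
and only touches an entry when a whole entry still fits. HDR_LEN, ENTRY_LEN,
LAST_FLAG and count_entries() are hypothetical and do not reflect the real
FC/CT header or GPN_FT entry layout:

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN   8u	/* hypothetical combined header size */
#define ENTRY_LEN 4u	/* hypothetical size of one response entry */
#define LAST_FLAG 0x80u	/* hypothetical "last entry" marker */

/* Returns the number of entries parsed, or -1 for a malformed frame. */
static int count_entries(const uint8_t *frame, size_t frame_len)
{
	const uint8_t *entry;
	size_t rem_len;
	int n = 0;

	if (frame_len < HDR_LEN)
		return -1;

	/* Start from the header-adjusted length, not the raw frame length. */
	rem_len = frame_len - HDR_LEN;
	entry = frame + HDR_LEN;

	while (rem_len >= ENTRY_LEN) {
		n++;
		if (entry[0] & LAST_FLAG)
			return n;
		entry += ENTRY_LEN;
		rem_len -= ENTRY_LEN;
	}
	return -1;	/* ran out of bytes before the last-entry marker */
}
```

A frame that never sets the marker is rejected once the adjusted length is
exhausted, rather than being walked for an extra header's worth of bytes.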

> @@ -3242,7 +3254,7 @@ fdls_process_gpn_ft_tgt_list(struct fnic_iport_s *iport,
>  		list_for_each_entry_safe(tport, next, &iport->tport_list, links) {
>  
>  			if (!(tport->flags & FNIC_FDLS_TPORT_IN_GPN_FT_LIST)) {
> -				FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +				FNIC_FCS_DBG(KERN_INFO, fnic,
>  					 "Remove port: 0x%x not found in GPN_FT list",
>  					 tport->fcid);
>  				fdls_delete_tport(iport, tport);

[Severity: High]
This is a pre-existing issue, but could this loop trigger a use-after-free?
The list_for_each_entry_safe() loop caches the next pointer. However,
fdls_delete_tport() intentionally drops the fnic->fnic_lock internally.
While the lock is dropped, can a concurrent event delete the cached next
element from the list?

[ ... ]
> @@ -4527,7 +4539,7 @@ fdls_process_rscn(struct fnic_iport_s *iport, struct fc_frame_header *fchdr)
>  		num_ports = (rscn_payload_len - 4) / rscn->els.rscn_page_len;
>  		rscn_port = (struct fc_els_rscn_page *)(rscn + 1);
>  	}
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  			 "RSCN received for num_ports: %d payload_len: %d page_len: %d ",
>  		     num_ports, rscn_payload_len, rscn->els.rscn_page_len);
>  

[Severity: High]
This is a pre-existing issue, but does this code trust the RSCN payload
length blindly? The rscn_payload_len variable is read directly from the
network packet without validating it against the actual received buffer
length. If a spoofed frame specifies a large length, does this loop read
out-of-bounds?
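
One possible shape for such a check, as a userspace sketch. The helper
rscn_num_pages() and its parameters are hypothetical; the 4-byte constant
mirrors the "- 4" in the quoted num_ports calculation. The sketch also
rejects a zero page length, since the quoted code divides by
rscn->els.rscn_page_len:

```c
#include <stddef.h>

/*
 * claimed_len:  payload length taken from the RSCN frame itself
 * page_len:     page length taken from the RSCN frame itself
 * received_len: number of payload bytes actually received
 *
 * Returns the number of pages, or -1 if the fields are inconsistent.
 */
static int rscn_num_pages(size_t claimed_len, size_t page_len,
			  size_t received_len)
{
	const size_t els_hdr = 4;	/* the 4 subtracted in the quoted code */

	if (page_len == 0)
		return -1;		/* would divide by zero */
	if (claimed_len < els_hdr || claimed_len > received_len)
		return -1;		/* claims more than was received */
	if ((claimed_len - els_hdr) % page_len)
		return -1;		/* trailing partial page */
	return (int)((claimed_len - els_hdr) / page_len);
}
```

For example, a 12-byte payload with 4-byte pages yields 2 pages, while a
frame claiming 1024 bytes when only 12 were received is rejected.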

> diff --git a/drivers/scsi/fnic/fip.c b/drivers/scsi/fnic/fip.c
> index 132f00512ee16..21d08a710f0df 100644
> --- a/drivers/scsi/fnic/fip.c
> +++ b/drivers/scsi/fnic/fip.c
[ ... ]
> @@ -155,14 +152,14 @@ void fnic_fcoe_process_vlan_resp(struct fnic *fnic, struct fip_header *fiph)
>  			list_add_tail(&vlan->list, &fnic->vlan_list);
>  			break;
>  		}
> -		FNIC_FIP_DBG(KERN_INFO, fnic->host,
> -			     fnic->fnic_num,
> -			     "Invalid descriptor type(%x) in VLan response\n",
> -			     vlan_desc->fd_desc.fip_dtype);
>  		/*
> -		 * Note : received a type=2 descriptor here i.e. FIP
> -		 * MAC Address Descriptor
> +		 * Note : skip any type=2 descriptor here
> +		 * (i.e. FIP MAC Address Descriptor)
>  		 */
> +		if (vlan_desc->fd_desc.fip_dtype != FIP_DT_MAC)
> +			FNIC_FIP_DBG(KERN_INFO, fnic,
> +				     "Invalid descriptor type(0x%x) in vlan response\n",
> +				     vlan_desc->fd_desc.fip_dtype);
>  		cur_desc += vlan_desc->fd_desc.fip_dlen;
>  		desc_len -= vlan_desc->fd_desc.fip_dlen;
>  	}

[Severity: High]
This is a pre-existing issue, but can this loop run infinitely if
vlan_desc->fd_desc.fip_dlen is 0? The while (desc_len > 0) loop would
fail to advance cur_desc or decrement desc_len. Because this function runs
with interrupts disabled (spin_lock_irqsave), could a maliciously crafted
zero-length descriptor cause a hard lockup?
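
A userspace sketch of a descriptor walk that cannot stall, for illustration.
It treats the length field as a byte count that includes a 2-byte type/length
header, which is a simplification and not a statement about the FIP wire
format; walk_descs() is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_HDR_LEN 2u	/* hypothetical: 1 byte type + 1 byte length */

/* Returns the number of descriptors walked, or -1 if one is malformed. */
static int walk_descs(const uint8_t *buf, size_t total)
{
	size_t off = 0;
	int n = 0;

	while (off < total) {
		size_t dlen;

		if (total - off < DESC_HDR_LEN)
			return -1;	/* truncated descriptor header */

		dlen = buf[off + 1];
		/*
		 * A length below the header size (including 0) would never
		 * advance the cursor; a length beyond the remaining bytes
		 * would run past the buffer. Reject both.
		 */
		if (dlen < DESC_HDR_LEN || dlen > total - off)
			return -1;

		off += dlen;
		n++;
	}
	return n;
}
```

Every iteration either advances by at least DESC_HDR_LEN bytes or returns,
so the loop terminates for any input.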

[ ... ]
> @@ -706,13 +706,13 @@ int fdls_fip_recv_frame(struct fnic *fnic, void *frame)
>  		return true;
>  	}
>  
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  		"Not a FIP Frame");
>  	return false;
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the FIP parsing read
out of bounds here? At the start of fdls_fip_recv_frame(), the packet length
appears to be hardcoded to 2048 instead of using the actual received frame
size. If a packet declares large descriptors but the actual payload is small,
would parsing read past the end of the buffer?

>  
>  void fnic_work_on_fip_timer(struct work_struct *work)
>  {
>  	struct fnic *fnic = container_of(work, struct fnic, fip_timer_work);
>  	struct fnic_iport_s *iport = &fnic->iport;
>  
> -	FNIC_FIP_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FIP_DBG(KERN_INFO, fnic,
>  		     "FIP timeout\n");

[Severity: High]
This is a pre-existing issue, but is the shared fip_timer_work susceptible
to workqueue corruption? It appears that fnic_handle_fip_timer() and
fnic_handle_fcs_ka_timer() unconditionally call INIT_WORK on
fnic->fip_timer_work before queuing it. If these fire concurrently, or if
one queues the work while it is already pending or executing, does INIT_WORK
overwrite the list pointers of a work_struct that is still in use?

General Observations
====================

[Severity: High]
This is a pre-existing issue, but does fnic_remove() tear down resources
safely? During driver teardown, fnic_remove() frees the fnic structure but
fails to cancel oxid_pool.oxid_reclaim_work,
oxid_pool.schedule_oxid_free_retry, and iport.fabric.retry_timer. If any of
these delayed works or timers fire after teardown, could they execute their
callbacks and access the freed fnic structure, causing a use-after-free
crash?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612180918.8554-1-kartilak@cisco.com?part=1


Thread overview: 24+ messages
2026-06-12 18:09 [PATCH v4 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-12 18:42   ` sashiko-bot [this message]
2026-06-12 18:09 ` [PATCH v4 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-12 18:57   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-12 21:42   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-12 21:59   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-12 22:15   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-12 22:29   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-12 22:45   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-12 22:57   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-12 23:16   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-12 23:22   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar
