Re: [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent
Date: Fri, 12 Jun 2026 18:42:32 +0000	[thread overview]
Message-ID: <20260612184233.1FDAB1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260612180918.8554-2-kartilak@cisco.com>

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:

Pre-existing issues:
- [High] Workqueue corruption due to dynamic INIT_WORK on a shared work_struct in timer callbacks.
- [High] Infinite loop in FIP VLAN parsing caused by zero-length descriptors.
- [High] Out-of-bounds read in fdls_process_gpn_ft_tgt_list due to unadjusted payload length.
- [High] Memory leak of control frames when fnic_send_fcoe_frame() fails.
- [High] OXID allocation permanently halts due to missing wrap-around search in find_next_zero_bit.
- [High] Use-after-free (UAF) due to missing cancellation of delayed works and timers in fnic_remove.
- [High] Use-after-free caused by dropping a spinlock inside list_for_each_entry_safe.
- [High] Out-of-bounds memory read in FIP parsing because actual packet length is discarded.
- [High] Out-of-bounds memory read when processing spoofed RSCN payload lengths.
- [Medium] Memory leak of target port (tport) when event allocation fails in fdls_delete_tport.
--

commit ef42b3354b5f1530ee27d88277ffc1381b470972
Author: Karan Tilak Kumar <kartilak@cisco.com>
Subject: scsi: fnic: Make debug logging protocol independent

This patch updates the fnic debug macros to take a fnic structure rather
than a Scsi_Host. This allows FCP and NVMe initiator roles to share the same
logging interface. It also adds role predicates and logging masks to properly
route output to the appropriate reporting mechanism.

> diff --git a/drivers/scsi/fnic/fdls_disc.c b/drivers/scsi/fnic/fdls_disc.c
> index 554dea7678854..5880ca28a0ad9 100644
> --- a/drivers/scsi/fnic/fdls_disc.c
> +++ b/drivers/scsi/fnic/fdls_disc.c
[ ... ]
> @@ -136,7 +136,7 @@ uint16_t fdls_alloc_oxid(struct fnic_iport_s *iport, int oxid_frame_type,
>  	 */
>  	idx = find_next_zero_bit(oxid_pool->bitmap, FNIC_OXID_POOL_SZ, oxid_pool->next_idx);
>  	if (idx == FNIC_OXID_POOL_SZ) {
> -		FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +		FNIC_FCS_DBG(KERN_INFO, fnic,
>  			"Alloc oxid: all oxid slots are busy iport state:%d\n",
>  			iport->state);
>  		return FNIC_UNASSIGNED_OXID;

[Severity: High]
This is a pre-existing issue, but does this logic permanently halt allocation 
if it reaches the end of the pool? If find_next_zero_bit() starts at
oxid_pool->next_idx and reaches FNIC_OXID_POOL_SZ, it appears to return an
error rather than wrapping around to index 0 to search the rest of the
bitmap. Because next_idx is not advanced on failure, all subsequent
allocations might search the same empty tail and fail indefinitely.

[ ... ]
> @@ -1272,9 +1272,9 @@ bool fdls_delete_tport(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
>  		tport_del_evt =
>  			kzalloc_obj(struct fnic_tport_event_s, GFP_ATOMIC);
>  		if (!tport_del_evt) {
> -			FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> -				 "Failed to allocate memory for tport fcid: 0x%0x\n",
> -				 tport->fcid);
> +			FNIC_FCS_DBG(KERN_INFO, fnic,
> +					"iport: 0x%x tport 0x%x: Failed to allocate memory\n",
> +					iport->fcid, tport->fcid);
>  			return false;
>  		}
>  		tport_del_evt->event = TGT_EV_RPORT_DEL;

[Severity: Medium]
This isn't a bug introduced by this patch, but does returning false here
leak the tport structure? If kzalloc_obj() fails, the target port remains
in the iport->tport_list despite being marked OFFLINING and TERMINATING.

[ ... ]
> @@ -1330,7 +1330,7 @@ fdls_send_tgt_plogi(struct fnic_iport_s *iport, struct fnic_tport_s *tport)
>  	hton24(d_id, tport->fcid);
>  	FNIC_STD_SET_D_ID(pplogi->fchdr, d_id);
>  
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  				 "0x%x: FDLS send tgt PLOGI to tgt: 0x%x with oxid: 0x%x",
>  				 iport->fcid, tport->fcid, oxid);
>  

[Severity: High]
This is a pre-existing issue, but are we leaking control frames if the
transmission fails? In fdls_send_tgt_plogi(), fnic_send_fcoe_frame() is
called right after this debug statement, but its return value is ignored.
If the underlying DMA mapping or workqueue is full, does the allocated
mempool frame stay permanently leaked?

[ ... ]
> @@ -3232,7 +3244,7 @@ fdls_process_gpn_ft_tgt_list(struct fnic_iport_s *iport,
>  		rem_len -= sizeof(struct fc_gpn_ft_rsp_iu);
>  	}
>  	if (rem_len <= 0) {
> -		FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +		FNIC_FCS_DBG(KERN_INFO, fnic,
>  			 "GPN_FT response: malformed/corrupt frame rxlen: %d remlen: %d",
>  			 len, rem_len);
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the payload parsing read
out of bounds here? Earlier in fdls_process_gpn_ft_tgt_list(), len is
adjusted by subtracting the FC and CT headers, but rem_len is initialized to
the original unadjusted len. Does the while (rem_len > 0) loop process extra
bytes past the end of the packet if the FC_NS_FID_LAST bit is missing?

> @@ -3242,7 +3254,7 @@ fdls_process_gpn_ft_tgt_list(struct fnic_iport_s *iport,
>  		list_for_each_entry_safe(tport, next, &iport->tport_list, links) {
>  
>  			if (!(tport->flags & FNIC_FDLS_TPORT_IN_GPN_FT_LIST)) {
> -				FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +				FNIC_FCS_DBG(KERN_INFO, fnic,
>  					 "Remove port: 0x%x not found in GPN_FT list",
>  					 tport->fcid);
>  				fdls_delete_tport(iport, tport);

[Severity: High]
This is a pre-existing issue, but could this loop trigger a use-after-free?
The list_for_each_entry_safe() loop caches the next pointer. However,
fdls_delete_tport() intentionally drops the fnic->fnic_lock internally.
While the lock is dropped, can a concurrent event delete the cached next
element from the list?

[ ... ]
> @@ -4527,7 +4539,7 @@ fdls_process_rscn(struct fnic_iport_s *iport, struct fc_frame_header *fchdr)
>  		num_ports = (rscn_payload_len - 4) / rscn->els.rscn_page_len;
>  		rscn_port = (struct fc_els_rscn_page *)(rscn + 1);
>  	}
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  			 "RSCN received for num_ports: %d payload_len: %d page_len: %d ",
>  		     num_ports, rscn_payload_len, rscn->els.rscn_page_len);
>  

[Severity: High]
This is a pre-existing issue, but does this code trust the RSCN payload
length blindly? The rscn_payload_len variable is read directly from the
network packet without validating it against the actual received buffer
length. If a spoofed frame specifies a large length, does this loop read
out-of-bounds?

> diff --git a/drivers/scsi/fnic/fip.c b/drivers/scsi/fnic/fip.c
> index 132f00512ee16..21d08a710f0df 100644
> --- a/drivers/scsi/fnic/fip.c
> +++ b/drivers/scsi/fnic/fip.c
[ ... ]
> @@ -155,14 +152,14 @@ void fnic_fcoe_process_vlan_resp(struct fnic *fnic, struct fip_header *fiph)
>  			list_add_tail(&vlan->list, &fnic->vlan_list);
>  			break;
>  		}
> -		FNIC_FIP_DBG(KERN_INFO, fnic->host,
> -			     fnic->fnic_num,
> -			     "Invalid descriptor type(%x) in VLan response\n",
> -			     vlan_desc->fd_desc.fip_dtype);
>  		/*
> -		 * Note : received a type=2 descriptor here i.e. FIP
> -		 * MAC Address Descriptor
> +		 * Note : skip any type=2 descriptor here
> +		 * (i.e. FIP MAC Address Descriptor)
>  		 */
> +		if (vlan_desc->fd_desc.fip_dtype != FIP_DT_MAC)
> +			FNIC_FIP_DBG(KERN_INFO, fnic,
> +				     "Invalid descriptor type(0x%x) in vlan response\n",
> +				     vlan_desc->fd_desc.fip_dtype);
>  		cur_desc += vlan_desc->fd_desc.fip_dlen;
>  		desc_len -= vlan_desc->fd_desc.fip_dlen;
>  	}

[Severity: High]
This is a pre-existing issue, but can this loop run infinitely if
vlan_desc->fd_desc.fip_dlen is 0? The while (desc_len > 0) loop would
fail to advance cur_desc or decrement desc_len. Because this function runs
with interrupts disabled (spin_lock_irqsave), could a maliciously crafted
zero-length descriptor cause a hard lockup?

[ ... ]
> @@ -706,13 +706,13 @@ int fdls_fip_recv_frame(struct fnic *fnic, void *frame)
>  		return true;
>  	}
>  
> -	FNIC_FCS_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FCS_DBG(KERN_INFO, fnic,
>  		"Not a FIP Frame");
>  	return false;
>  }

[Severity: High]
This isn't a bug introduced by this patch, but does the FIP parsing read
out of bounds here? At the start of fdls_fip_recv_frame(), the packet length
appears to be hardcoded to 2048 instead of using the actual received frame
size. If a packet declares large descriptors but the actual payload is small,
would parsing read past the end of the buffer?

>  
>  void fnic_work_on_fip_timer(struct work_struct *work)
>  {
>  	struct fnic *fnic = container_of(work, struct fnic, fip_timer_work);
>  	struct fnic_iport_s *iport = &fnic->iport;
>  
> -	FNIC_FIP_DBG(KERN_INFO, fnic->host, fnic->fnic_num,
> +	FNIC_FIP_DBG(KERN_INFO, fnic,
>  		     "FIP timeout\n");

[Severity: High]
This is a pre-existing issue, but is the shared fip_timer_work susceptible
to workqueue corruption? It appears that fnic_handle_fip_timer() and
fnic_handle_fcs_ka_timer() unconditionally call INIT_WORK on 
fnic->fip_timer_work before queuing it. If these fire concurrently or if one
queues the work while it is already pending/executing, does INIT_WORK
corrupt the active work_struct linked list pointers?

General Observations
====================

[Severity: High]
This is a pre-existing issue, but does fnic_remove() tear down resources
safely? During driver teardown, fnic_remove() frees the fnic structure but
fails to cancel oxid_pool.oxid_reclaim_work,
oxid_pool.schedule_oxid_free_retry, and iport.fabric.retry_timer. If any of
these delayed works or timers fire after teardown, could they execute their
callbacks and access the freed fnic structure, causing a use-after-free
crash?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260612180918.8554-1-kartilak@cisco.com?part=1

next prev parent reply	other threads:[~2026-06-12 18:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-12 18:09 [PATCH v4 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-12 18:42   ` sashiko-bot [this message]
2026-06-12 18:09 ` [PATCH v4 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-12 18:57   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-12 18:09 ` [PATCH v4 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-12 21:42   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-12 21:59   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-12 22:15   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-12 22:29   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-12 22:45   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-12 22:57   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-12 23:16   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-12 23:22   ` sashiko-bot
2026-06-12 18:09 ` [PATCH v4 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260612184233.1FDAB1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kartilak@cisco.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.