Linux SCSI subsystem development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role
Date: Wed, 24 Jun 2026 05:11:53 +0000	[thread overview]
Message-ID: <20260624051153.C379C1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260624044334.3079-8-kartilak@cisco.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Asynchronous NVMe IO completion work queued during driver teardown executes after the fnic structure is freed, leading to a Use-After-Free.

Pre-existing issues:
- [Critical] The firmware reset completion handler calls `schedule_timeout()` from a hardirq context, leading to a "scheduling while atomic" kernel panic.
- [High] Using a shared struct member (`fnic->lock_flags`) to store IRQ state for `spin_lock_irqsave` violates locking semantics and can corrupt IRQ state across CPUs.
--

commit 0746859ac4c31fbfb285f20a291564287cfa0119
Author: Karan Tilak Kumar <kartilak@cisco.com>

scsi: fnic: Route completions and resets by initiator role

Dispatch FCPIO command, response, and ITMF completions to the FCP or
NVMe handlers based on the configured role. Read the NVMe queue-depth
and timeout retry fields from firmware config, clean up NVMe I/O on
firmware reset, and skip SCSI-only cleanup for initiator roles.

[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 04ab384033b19..9607684bc6107 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
[ ... ]
> @@ -734,6 +735,8 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
>  	/* Clean up all outstanding io requests */
>  	if (IS_FNIC_FCP_INITIATOR(fnic))
>  		fnic_cleanup_io(fnic, SCSI_NO_TAG);

[Severity: Critical]
This is a pre-existing issue, but does calling fnic_cleanup_io() from a
hardware interrupt context trigger a "scheduling while atomic" panic?

If a hardware interrupt handler like fnic_isr_msix_wq_copy() processes
a FCPIO_RESET_CMPL event:

fnic_wq_copy_cmpl_handler() ->
    fnic_fcpio_fw_reset_cmpl_handler() ->
        fnic_cleanup_io() ->
            schedule_timeout(msecs_to_jiffies(100))

Calling schedule_timeout() yields the CPU to the scheduler, which is
invalid from hardirq context. Will this crash the system upon firmware
reset completion?

> +	else if (IS_FNIC_NVME_INITIATOR(fnic))
> +		nvfnic_cleanup_all_nvme_ios(fnic);
>  
>  	atomic64_set(&fnic->fnic_stats.fw_stats.active_fw_reqs, 0);
>  	atomic64_set(&fnic->fnic_stats.io_stats.active_ios, 0);
[ ... ]
> @@ -1457,11 +1461,22 @@ static int fnic_fcpio_cmpl_handler(struct vnic_dev *vdev,
>  		break;
>  
>  	case FCPIO_ICMND_CMPL: /* fw completed a command */
> -		fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> +		if (IS_FNIC_FCP_INITIATOR(fnic))
> +			fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> +		else if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_nvme_fast_cmpl_handler(fnic, desc);
> +		break;
> +
> +	case FCPIO_NVME_ERSP_HW_CMPL: /* fw completed NVMe ERSP */
> +		if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_ersp_cmpl_handler(fnic, desc, 1);

[Severity: High]
This is a pre-existing issue, but does calling these NVMe completion handlers
corrupt IRQ state across CPUs?

Looking at nvfnic_fcpio_nvme_fast_cmpl_handler() and
nvfnic_fcpio_ersp_cmpl_handler() in fnic_nvme.c, they both use a shared
structure member to save IRQ state:

    spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);

Because fnic->lock_flags is shared rather than a local stack variable,
concurrent threads acquiring the lock on different CPUs will overwrite
each other's saved IRQ states. When spin_unlock_irqrestore() is called,
the wrong IRQ state is restored.

>  		break;
>  
>  	case FCPIO_ITMF_CMPL: /* fw completed itmf (abort cmd, lun reset)*/
> -		fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> +		if (IS_FNIC_FCP_INITIATOR(fnic))
> +			fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> +		else if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_nvme_itmf_cmpl_handler(fnic, desc);
>  		break;
[ ... ]
> @@ -1650,6 +1665,15 @@ void fnic_wq_copy_cleanup_handler(struct vnic_wq_copy *wq,
>  	unsigned long start_time = 0;
>  	uint16_t hwq;
>  
> +	/*
> +	 * Clean up outstanding NVMe requests if firmware reset did not
> +	 * complete them before WQ copy cleanup.
> +	 */
> +	if (IS_FNIC_NVME_INITIATOR(fnic)) {
> +		nvfnic_cleanup_all_nvme_ios(fnic);
> +		return;
> +	}
> +

[Severity: High]
Can this unconditional cleanup lead to a use-after-free during driver
teardown?

The fnic_remove() sequence calls nvfnic_nvme_unload() which flushes
fnic->nvme_io_cmpl_work. It then calls fnic_cleanup() which invokes this
handler:

fnic_remove()
    nvfnic_nvme_unload()
        flush_work(&fnic->nvme_io_cmpl_work);
    fnic_cleanup()
        fnic_wq_copy_cleanup_handler()
            nvfnic_cleanup_all_nvme_ios()
                io_req->done()

If there are uncompleted requests, nvfnic_cleanup_all_nvme_ios() will
forcibly complete them, and io_req->done() queues new async work referencing
&fnic->nvme_io_cmpl_work.

Since fnic_remove() then frees the fnic structure without another flush,
won't the asynchronously executing work item dereference the freed fnic
pointer?

>  	/* get the tag reference */
>  	fcpio_tag_id_dec(&desc->hdr.tag, &id);
>  	id &= FNIC_TAG_MASK;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=7

  reply	other threads:[~2026-06-24  5:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24  4:43 [PATCH v5 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-24  5:08   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-24  5:18   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-24  5:12   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot [this message]
2026-06-24  4:43 ` [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-24  5:15   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-24  5:17   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624051153.C379C1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kartilak@cisco.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox