All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Karan Tilak Kumar" <kartilak@cisco.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role
Date: Wed, 24 Jun 2026 05:11:53 +0000	[thread overview]
Message-ID: <20260624051153.C379C1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260624044334.3079-8-kartilak@cisco.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] Asynchronous NVMe IO completion work queued during driver teardown executes after the fnic structure is freed, leading to a Use-After-Free.

Pre-existing issues:
- [Critical] The firmware reset completion handler calls `schedule_timeout()` from a hardirq context, leading to a "scheduling while atomic" kernel panic.
- [High] Using a shared struct member (`fnic->lock_flags`) to store IRQ state for `spin_lock_irqsave` violates locking semantics and can corrupt IRQ state across CPUs.
--

commit 0746859ac4c31fbfb285f20a291564287cfa0119
Author: Karan Tilak Kumar <kartilak@cisco.com>

scsi: fnic: Route completions and resets by initiator role

Dispatch FCPIO command, response, and ITMF completions to the FCP or
NVMe handlers based on the configured role. Read the NVMe queue-depth
and timeout retry fields from firmware config, clean up NVMe I/O on
firmware reset, and skip SCSI-only cleanup for initiator roles.

[ ... ]
> diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
> index 04ab384033b19..9607684bc6107 100644
> --- a/drivers/scsi/fnic/fnic_scsi.c
> +++ b/drivers/scsi/fnic/fnic_scsi.c
[ ... ]
> @@ -734,6 +735,8 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
>  	/* Clean up all outstanding io requests */
>  	if (IS_FNIC_FCP_INITIATOR(fnic))
>  		fnic_cleanup_io(fnic, SCSI_NO_TAG);

[Severity: Critical]
This is a pre-existing issue, but does calling fnic_cleanup_io() from a
hardware interrupt context trigger a "scheduling while atomic" panic?

If a hardware interrupt handler like fnic_isr_msix_wq_copy() processes
a FCPIO_RESET_CMPL event:

fnic_wq_copy_cmpl_handler() ->
    fnic_fcpio_fw_reset_cmpl_handler() ->
        fnic_cleanup_io() ->
            schedule_timeout(msecs_to_jiffies(100))

Calling schedule_timeout() yields the CPU to the scheduler, which is
invalid from hardirq context. Will this crash the system upon firmware
reset completion?

> +	else if (IS_FNIC_NVME_INITIATOR(fnic))
> +		nvfnic_cleanup_all_nvme_ios(fnic);
>  
>  	atomic64_set(&fnic->fnic_stats.fw_stats.active_fw_reqs, 0);
>  	atomic64_set(&fnic->fnic_stats.io_stats.active_ios, 0);
[ ... ]
> @@ -1457,11 +1461,22 @@ static int fnic_fcpio_cmpl_handler(struct vnic_dev *vdev,
>  		break;
>  
>  	case FCPIO_ICMND_CMPL: /* fw completed a command */
> -		fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> +		if (IS_FNIC_FCP_INITIATOR(fnic))
> +			fnic_fcpio_icmnd_cmpl_handler(fnic, cq_index, desc);
> +		else if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_nvme_fast_cmpl_handler(fnic, desc);
> +		break;
> +
> +	case FCPIO_NVME_ERSP_HW_CMPL: /* fw completed NVMe ERSP */
> +		if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_ersp_cmpl_handler(fnic, desc, 1);

[Severity: High]
This is a pre-existing issue, but does calling these NVMe completion handlers
corrupt IRQ state across CPUs?

Looking at nvfnic_fcpio_nvme_fast_cmpl_handler() and
nvfnic_fcpio_ersp_cmpl_handler() in fnic_nvme.c, they both use a shared
structure member to save IRQ state:

    spin_lock_irqsave(&fnic->fnic_lock, fnic->lock_flags);

Because fnic->lock_flags is shared rather than a local stack variable,
concurrent threads acquiring the lock on different CPUs will overwrite
each other's saved IRQ states. When spin_unlock_irqrestore() is called,
the wrong IRQ state is restored.

>  		break;
>  
>  	case FCPIO_ITMF_CMPL: /* fw completed itmf (abort cmd, lun reset)*/
> -		fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> +		if (IS_FNIC_FCP_INITIATOR(fnic))
> +			fnic_fcpio_itmf_cmpl_handler(fnic, cq_index, desc);
> +		else if (IS_FNIC_NVME_INITIATOR(fnic))
> +			nvfnic_fcpio_nvme_itmf_cmpl_handler(fnic, desc);
>  		break;
[ ... ]
> @@ -1650,6 +1665,15 @@ void fnic_wq_copy_cleanup_handler(struct vnic_wq_copy *wq,
>  	unsigned long start_time = 0;
>  	uint16_t hwq;
>  
> +	/*
> +	 * Clean up outstanding NVMe requests if firmware reset did not
> +	 * complete them before WQ copy cleanup.
> +	 */
> +	if (IS_FNIC_NVME_INITIATOR(fnic)) {
> +		nvfnic_cleanup_all_nvme_ios(fnic);
> +		return;
> +	}
> +

[Severity: High]
Can this unconditional cleanup lead to a use-after-free during driver
teardown?

The fnic_remove() sequence calls nvfnic_nvme_unload() which flushes
fnic->nvme_io_cmpl_work. It then calls fnic_cleanup() which invokes this
handler:

fnic_remove()
    nvfnic_nvme_unload()
        flush_work(&fnic->nvme_io_cmpl_work);
    fnic_cleanup()
        fnic_wq_copy_cleanup_handler()
            nvfnic_cleanup_all_nvme_ios()
                io_req->done()

If there are uncompleted requests, nvfnic_cleanup_all_nvme_ios() will
forcibly complete them, and io_req->done() queues new async work referencing
&fnic->nvme_io_cmpl_work.

Since fnic_remove() then frees the fnic structure without another flush,
won't the asynchronously executing work item dereference the freed fnic
pointer?

>  	/* get the tag reference */
>  	fcpio_tag_id_dec(&desc->hdr.tag, &id);
>  	id &= FNIC_TAG_MASK;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624044334.3079-1-kartilak@cisco.com?part=7

  reply	other threads:[~2026-06-24  5:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24  4:43 [PATCH v5 00/13] Introduce functionality for NVMe initiator Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 01/13] scsi: fnic: Make debug logging protocol independent Karan Tilak Kumar
2026-06-24  5:08   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 02/13] scsi: fnic: Use fnic_num for non-SCSI identifiers Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 03/13] scsi: fnic: Decode firmware role configuration Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 04/13] scsi: fnic: Advertise NVMe initiator service parameters Karan Tilak Kumar
2026-06-24  4:43 ` [PATCH v5 05/13] scsi: fnic: Add FDLS role handling for NVMe initiators Karan Tilak Kumar
2026-06-24  5:18   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 06/13] scsi: fnic: Add the NVMe/FC transport path Karan Tilak Kumar
2026-06-24  5:12   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 07/13] scsi: fnic: Route completions and resets by initiator role Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot [this message]
2026-06-24  4:43 ` [PATCH v5 08/13] scsi: fnic: Handle NVMe LS frames in FDLS Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 09/13] scsi: fnic: Send NVMe LS requests through FDLS Karan Tilak Kumar
2026-06-24  5:11   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 10/13] scsi: fnic: Abort timed-out NVMe LS requests Karan Tilak Kumar
2026-06-24  5:13   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 11/13] scsi: fnic: Track NVMe transport statistics Karan Tilak Kumar
2026-06-24  5:15   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 12/13] scsi: fnic: Expose NVMe transport state in debugfs Karan Tilak Kumar
2026-06-24  5:17   ` sashiko-bot
2026-06-24  4:43 ` [PATCH v5 13/13] scsi: fnic: Bump up version number Karan Tilak Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624051153.C379C1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=kartilak@cisco.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.