Linux SCSI subsystem development
From: sashiko-bot@kernel.org
To: "Tyrel Datwyler" <tyreld@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH 29/29] ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset
Date: Tue, 23 Jun 2026 02:04:39 +0000	[thread overview]
Message-ID: <20260623020439.5AF251F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623013035.3436640-30-tyreld@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Critical] Pooled event structures retain stale NVMe request pointers, leading to exploitable Use-After-Free and double completions when non-NVMe commands fail.
- [High] Target kref is permanently leaked when an NVMe LS request fails or is purged.

Pre-existing issues:
- [High] Returning SCSI_MLQUEUE_HOST_BUSY for NVMe requests on transport closure causes permanent IO failures instead of requeuing.
- [Medium] NVMe FCP requests are not purged during a transport reset, rendering the patch's FCP functionality dead code.
--

commit 88ce7ce7f11c11cfca700c6919820cf4d9865c8a
Author: Tyrel Datwyler <tyreld@linux.ibm.com>

    ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset
    
    The current purge code for flushing outstanding commands during a
    transport reset only deals with SCSI commands. Rename the
    ibmvfc_scsi_eh_done completion handler to ibmvfc_vfc_eh_done and wire it
    to correctly complete nvme fcp and ls commands when flushing the
    inflight command list during a reset.

> The current purge code for flushing outstanding commands during a
> transport reset only deals with SCSI commands. Rename the
> ibmvfc_scsi_eh_done completion handler to ibmvfc_vfc_eh_done and wire it
> to correctly complete nvme fcp and ls commands when flushing the
> inflight command list during a reset.

[Severity: Medium]
This isn't a bug introduced by this patch, but does the purge code actually
reach inflight NVMe FCP commands?

Looking at ibmvfc_purge_requests(), the loop iterates over vhost->crq and
vhost->scsi_scrqs.scrqs:

drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_purge_requests() {
    ...
    for (i = 0; i < hwqs; i++) {
        spin_lock_irqsave(queues[i].q_lock, flags);
    ...
}

It seems to entirely miss vhost->nvme_scrqs.scrqs where FCP requests are
queued. If so, inflight NVMe FCP commands might be orphaned during a reset,
making the new FCP completion handling unreachable.
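
To make the concern concrete, here is a minimal userspace sketch of a purge
that walks both protocol channel groups. It is not the driver code: every
model_* name is hypothetical, locking is omitted, and "purging" is reduced to
zeroing a counter. It only shows the shape of a loop that would also visit
the NVMe sub-queues.

```c
#include <stddef.h>

/* Hypothetical userspace model; names only loosely mirror the driver. */
struct model_queue {
	int inflight;			/* events still outstanding on this queue */
};

struct model_channels {
	struct model_queue *scrqs;	/* array of sub-queues */
	size_t active_queues;
};

struct model_host {
	struct model_queue crq;
	struct model_channels scsi_scrqs;
	struct model_channels nvme_scrqs;
};

/* Fail everything on one queue; returns how many events were purged. */
static int model_purge_queue(struct model_queue *q)
{
	int purged = q->inflight;

	q->inflight = 0;
	return purged;
}

/* Purge every active sub-queue in one channel group. */
static int model_purge_channels(struct model_channels *ch)
{
	int purged = 0;

	for (size_t i = 0; i < ch->active_queues; i++)
		purged += model_purge_queue(&ch->scrqs[i]);
	return purged;
}

/* Walk the CRQ and then both protocol groups, not only the SCSI one. */
static int model_purge_requests(struct model_host *h)
{
	int purged = model_purge_queue(&h->crq);

	purged += model_purge_channels(&h->scsi_scrqs);
	purged += model_purge_channels(&h->nvme_scrqs);
	return purged;
}
```

In this model, dropping the nvme_scrqs call leaves those counters non-zero
after a reset, which is the orphaning described above.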

> diff --git a/drivers/scsi/ibmvscsi/ibmvfc-core.c b/drivers/scsi/ibmvscsi/ibmvfc-core.c
> index a7183493cf965..aeb5b8902aa55 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc-core.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc-core.c

[ ... ]

> @@ -1092,20 +1092,28 @@ void ibmvfc_free_event(struct ibmvfc_event *evt)
>  }
>  
>  /**
> - * ibmvfc_scsi_eh_done - EH done function for queuecommand commands
> + * ibmvfc_vfc_eh_done - EH done function for queued IO
>   * @evt:	ibmvfc event struct
>   *
> - * This function does not setup any error status, that must be done
> - * before this function gets called.
> + * This function does not setup any error status for scsi commands, that must be
> + * done before this function gets called.
>   **/
> -static void ibmvfc_scsi_eh_done(struct ibmvfc_event *evt)
> +static void ibmvfc_vfc_eh_done(struct ibmvfc_event *evt)
>  {
>  	struct scsi_cmnd *cmnd = evt->cmnd;
> +	struct nvmefc_ls_req *ls_req = evt->ls_req;
> +	struct nvmefc_fcp_req *fcp_req = evt->fcp_req;
>  
>  	if (cmnd) {
>  		scsi_dma_unmap(cmnd);
>  		scsi_done(cmnd);
> -	}
> +	} else if (fcp_req) {
> +		fcp_req->rcv_rsplen = 0;
> +		fcp_req->transferred_length = 0;
> +		fcp_req->status = NVME_SC_INTERNAL;
> +		fcp_req->done(fcp_req);
> +	} else if (ls_req)
> +		ls_req->done(ls_req, -EIO);

[Severity: High]
Does this error handler leak the target kref for LS requests?

In ibmvfc_nvme_ls_req(), a reference is taken via kref_get(&tgt->kref) which
is typically dropped in the success path by ibmvfc_ls_req_done().

Since this patch replaces the completion handler with ibmvfc_vfc_eh_done(),
the transport request is completed here but kref_put(&evt->tgt->kref, ...)
is never called, so the target structure may never be released.
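
The pairing in question can be sketched in userspace as follows. This is a
model under stated assumptions, not the driver: the model_* names are
hypothetical, a plain integer stands in for the kref, and no release callback
is modelled. It only shows an error-path completion that drops the reference
taken at submission.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the target, LS request and event. */
struct model_target {
	int refcount;		/* plain counter standing in for a kref */
};

struct model_ls_req {
	int status;
	int done_calls;
};

struct model_event {
	struct model_target *tgt;
	struct model_ls_req *ls_req;
};

static void model_target_get(struct model_target *t) { t->refcount++; }
static void model_target_put(struct model_target *t) { t->refcount--; }

/* Submission takes a target reference on behalf of the request. */
static void model_ls_submit(struct model_event *evt, struct model_target *tgt,
			    struct model_ls_req *req)
{
	model_target_get(tgt);
	evt->tgt = tgt;
	evt->ls_req = req;
}

/* Error-path completion: fail the request, then drop that reference. */
static void model_ls_eh_done(struct model_event *evt)
{
	struct model_ls_req *req = evt->ls_req;
	struct model_target *tgt = evt->tgt;

	req->status = -EIO;
	req->done_calls++;
	evt->ls_req = NULL;
	evt->tgt = NULL;
	model_target_put(tgt);	/* without this put the count never returns */
}
```

After one submit and one failed completion, the counter is back at its
starting value, which is the invariant the question above is asking about.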

>  
>  	ibmvfc_free_event(evt);
>  }
> @@ -1146,8 +1154,10 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
>  	BUG_ON(!atomic_dec_and_test(&evt->active));
>  	if (evt->cmnd) {
>  		evt->cmnd->result = (error_code << 16);
> -		evt->done = ibmvfc_scsi_eh_done;
> -	} else
> +		evt->done = ibmvfc_vfc_eh_done;
> +	} else if (evt->fcp_req || evt->ls_req)
> +		evt->done = ibmvfc_vfc_eh_done;
> +	else

[Severity: Critical]
Can these checks be triggered by stale NVMe pointers on non-NVMe commands?

When ibmvfc_get_event() reallocates an event structure from the pool,
ibmvfc_init_event() doesn't clear evt->fcp_req or evt->ls_req to NULL:

drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_init_event() {
    ...
    evt->cmnd = NULL;
    evt->sync_iu = NULL;
    evt->eh_comp = NULL;
    ...
}

If a non-NVMe command (e.g., a MAD command) is allocated an event with stale
NVMe pointers and fails here, will it incorrectly replace the completion
handler with ibmvfc_vfc_eh_done() and later dereference the stale, possibly
freed, NVMe request (a use-after-free and a second completion)?
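
A minimal userspace sketch of the reuse hazard, assuming a pooled event that
carries one pointer per request type. The model_* names are hypothetical and
the handler choice is reduced to an enum; the branch structure mirrors the
hunk above, and the init function shows the reset that would prevent the
misclassification.

```c
#include <stddef.h>

/* Hypothetical pooled event with one pointer per request type. */
struct model_event {
	void *cmnd;
	void *fcp_req;
	void *ls_req;
};

enum model_handler {
	MODEL_MAD_DONE,		/* event keeps its own MAD completion */
	MODEL_EH_DONE,		/* event is routed to the shared EH handler */
};

/* Init that resets every per-request pointer before the event is reused. */
static void model_init_event(struct model_event *evt)
{
	evt->cmnd = NULL;
	evt->fcp_req = NULL;
	evt->ls_req = NULL;
}

/* Handler selection with the same outcome as the fail path in the patch. */
static enum model_handler model_pick_fail_handler(const struct model_event *evt)
{
	if (evt->cmnd || evt->fcp_req || evt->ls_req)
		return MODEL_EH_DONE;
	return MODEL_MAD_DONE;
}
```

An event recycled with a leftover fcp_req is classified as NVMe here until
model_init_event() has run on it.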

>  		evt->xfer_iu->mad_common.status = cpu_to_be16(IBMVFC_MAD_DRIVER_FAILED);
>  
>  	timer_delete(&evt->timer);

[ ... ]

> @@ -1816,8 +1826,10 @@ int ibmvfc_send_event(struct ibmvfc_event *evt,

[Severity: High]
This isn't a bug introduced by this patch, but earlier in ibmvfc_send_event(),
it hardcodes a return of SCSI_MLQUEUE_HOST_BUSY on H_CLOSED without
distinguishing between SCSI and NVMe commands:

drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_send_event() {
    ...
    if (rc == H_CLOSED) {
        ...
        return SCSI_MLQUEUE_HOST_BUSY;
    }
    ...
}

This value (0x1055) is propagated back up through ibmvfc_nvme_fcp_io()
directly to the NVMe FC transport layer. Since 0x1055 is neither 0 nor
-EBUSY, will the NVMe FC core treat it as a hard failure and abort the
request with BLK_STS_IOERR instead of handling the queue pressure
appropriately?
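
One possible shape for a fix is to translate the SCSI midlayer code before it
leaves the NVMe submission path. The sketch below is a userspace model, not a
proposed patch: model_nvme_submit_rc() and the MODEL_ constant are
hypothetical, and it assumes, as the question above implies, that -EBUSY is
the value the NVMe FC transport treats as retryable.

```c
#include <errno.h>

/* Same numeric value as SCSI_MLQUEUE_HOST_BUSY, redefined for this model. */
#define MODEL_SCSI_MLQUEUE_HOST_BUSY 0x1055

/*
 * Map the send-path return code to what an NVMe submission callback would
 * hand back: the SCSI busy code becomes -EBUSY, anything else passes through.
 */
static int model_nvme_submit_rc(int send_rc)
{
	if (send_rc == MODEL_SCSI_MLQUEUE_HOST_BUSY)
		return -EBUSY;
	return send_rc;
}
```

Whether the translation belongs in ibmvfc_send_event() or in the NVMe callers
is a design choice this sketch does not settle.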

>  		dev_err(vhost->dev, "Send error (rc=%d)\n", rc);
>  		if (evt->cmnd) {
>  			evt->cmnd->result = DID_ERROR << 16;
> -			evt->done = ibmvfc_scsi_eh_done;
> -		} else
> +			evt->done = ibmvfc_vfc_eh_done;
> +		} else if (evt->fcp_req || evt->ls_req)
> +			evt->done = ibmvfc_vfc_eh_done;
> +		else
>  			evt->xfer_iu->mad_common.status = cpu_to_be16(IBMVFC_MAD_CRQ_ERROR);
>  
>  		evt->done(evt);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623013035.3436640-1-tyreld@linux.ibm.com?part=29
