Linux SCSI subsystem development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Tyrel Datwyler" <tyreld@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH 28/29] ibmvfc: implement nvme-fc FCP abort callback
Date: Tue, 23 Jun 2026 02:05:04 +0000	[thread overview]
Message-ID: <20260623020505.042221F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623013035.3436640-29-tyreld@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Synchronous wait in `fcp_abort` violates the NVMe-FC transport API and blocks the block layer timeout handler.
- [High] Lack of synchronization against command completion allows `fcp_abort` to abort the wrong I/O request and causes a Use-After-Free of `ibmvfc_event`.
- [High] Indefinite hang in `wait_for_completion` when an NVMe MAD command times out, due to missing NVMe queue cleanup in the host reset path.
- [High] Lock inversion (AB-BA) deadlock between the hardware queue lock and the SCSI host lock during the failure path of `ibmvfc_send_event`.
- [High] Uninitialized `evt->tgt` pointer in FCP IO requests causes a NULL pointer dereference or Use-After-Free in the abort path.
- [Medium] Event pool exhaustion prevents aborts from issuing during hardware queue saturation, causing the transport to hang.
--

commit 0e079891124178111c0e0055ee9e67515aff453b
Author: Tyrel Datwyler <tyreld@linux.ibm.com>

ibmvfc: implement nvme-fc FCP abort callback

This commit implements the NVMe-FC FCP abort callback by issuing an NVMF
cancel MAD on the same submission queue used by the original FCP request.
It factors the TMF setup into a dedicated helper and sends the cancel
synchronously, waiting for completion.

> diff --git a/drivers/scsi/ibmvscsi/ibmvfc-nvme.c b/drivers/scsi/ibmvscsi/ibmvfc-nvme.c
> index 18e8657abc444..92937f9aa4649 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc-nvme.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc-nvme.c
> @@ -374,11 +374,75 @@ static int ibmvfc_nvme_fcp_io(struct nvme_fc_local_port *lport,
[ ... ]
> +static void ibmvfc_init_fcp_abort(struct ibmvfc_event *evt,
> +				  struct nvmefc_fcp_req *abort_req)
> +{
> +	struct ibmvfc_tmf *tmf;
> +	struct ibmvfc_event *abt_evt = abort_req->private;
> +	struct ibmvfc_target *tgt = abt_evt->tgt;
> +
> +	tmf = &evt->iu.tmf;
> +	memset(tmf, 0, sizeof(*tmf));
> +	tmf->common.version = cpu_to_be32(2);
> +	tmf->common.opcode = cpu_to_be32(IBMVFC_NVMF_TMF_MAD);
> +	tmf->common.length = cpu_to_be16(sizeof(*tmf));
> +	tmf->flags = cpu_to_be32(IBMVFC_TMF_ABORT_TASK | IBMVFC_TMF_NVMF_ASSOC);
> +	tmf->cancel_key = cpu_to_be32((u64)abt_evt);
> +	tmf->my_cancel_key = cpu_to_be32((u64)evt);
> +	tmf->target_wwpn = cpu_to_be64(tgt->wwpn);

[Severity: High]
Does this dereference a NULL or stale pointer?

It appears that when an NVMe FCP request is submitted via
ibmvfc_nvme_fcp_io(), an ibmvfc_event is allocated but its evt->tgt
field is never initialized. Because evt->tgt was never set during
submission, could it contain NULL (from pool zeroing) or a stale pointer
from a previous LS request when we extract it here?

> +	tmf->assoc_id = cpu_to_be64(tgt->assoc_id);
> +	tmf->task_tag = cpu_to_be64((u64)abt_evt);
> +
> +	init_completion(&evt->comp);
> +}
> +
> +static void ibmvfc_nvme_fcp_abort(struct nvme_fc_local_port *lport,
> +				  struct nvme_fc_remote_port *rport,
> +				  void *hw_queue_handle,
> +				  struct nvmefc_fcp_req *abort_req)
> +{
> +	struct ibmvfc_host *vhost = lport->private;
> +	struct ibmvfc_target *tgt = rport->private;
> +	struct ibmvfc_event *evt, *abt_evt = abort_req->private;
> +	struct ibmvfc_queue *queue;
> +	union ibmvfc_iu rsp;
> +	unsigned long flags;
> +	u16 status = 0;
> +
> +	if (!abt_evt)
> +		return;
> +
> +	queue = abt_evt->queue;

[Severity: High]
Is it possible for the original command to complete concurrently at the
hardware level before this code accesses abt_evt?

If the interrupt handler returns abt_evt to the free pool, could this
cause a use-after-free? Furthermore, if the event is reallocated to a
new I/O request on another CPU, could the TMF payload be populated with
the new event's tag, causing the VIOS to abort the wrong I/O request?

> +	if (!vhost->logged_in || !queue)
> +		return;
> +
> +	evt = ibmvfc_get_event(queue);
> +	if (!evt)
> +		return;

[Severity: Medium]
Will this silently drop the abort request if the standard event pool is
exhausted?

Since aborts are typically invoked when commands are timing out (often
due to submission queue saturation), should this use a reserved event?
If this returns NULL without sending the abort, could the command hang
indefinitely since the NVMe FC transport assumes the abort was initiated?

> +
> +	spin_lock_irqsave(queue->q_lock, flags);
> +	kref_get(&tgt->kref);
> +	ibmvfc_init_event(evt, ibmvfc_sync_nvme_completion, IBMVFC_MAD_FORMAT);
> +	ibmvfc_init_fcp_abort(evt, abort_req);
> +	evt->sync_iu = &rsp;
> +
> +	if (ibmvfc_send_event(evt, vhost, default_timeout))

[Severity: High]
Can this cause a lock inversion (AB-BA) deadlock?

We acquire queue->q_lock before calling ibmvfc_send_event(). If
ibmvfc_send_event() encounters an error, it calls evt->done(evt).
Because the event is initialized with IBMVFC_MAD_FORMAT, the callback is
ibmvfc_locked_done(), which unconditionally acquires host_lock.

This creates an acquisition order of queue->q_lock -> host_lock. However,
the driver's reset path in ibmvfc_reset_host() acquires host_lock first,
then iterates over the hardware queues to acquire queue->q_lock inside
ibmvfc_purge_requests().

> +		goto out;
> +
> +	spin_unlock_irqrestore(queue->q_lock, flags);
> +
> +	wait_for_completion(&evt->comp);

[Severity: High]
Does this synchronous wait violate the NVMe-FC transport API?

The NVMe FC transport requires fcp_abort to be an asynchronous operation.
Because fcp_abort is invoked by the block layer's timeout worker, waiting
synchronously for the abort MAD to complete could stall the timeout
processing thread for the system until the VIOS responds.

[Severity: High]
Can this wait_for_completion() hang indefinitely if the NVMe MAD command
times out?

The fcp_abort TMF event is allocated on the NVMe queue (nvme_scrqs).
If the VIOS fails to respond to the cancel MAD, ibmvfc_send_event() times
out and triggers a host reset, invoking ibmvfc_purge_requests().

However, ibmvfc_purge_requests() iterates only over the legacy CRQ and
vhost->scsi_scrqs, entirely ignoring vhost->nvme_scrqs. Because the event
is never failed by the purge logic, could this wait forever?

> +	status = be16_to_cpu(rsp.mad_common.status);
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623013035.3436640-1-tyreld@linux.ibm.com?part=28

  reply	other threads:[~2026-06-23  2:05 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23  1:30 [PATCH 00/29] ibmvfc: Add NVMe-FC support Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 01/29] ibmvfc: move target list from host to protocol specific channel groups Tyrel Datwyler
2026-06-23  1:49   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 02/29] ibmvfc: add NVMe/FC protocol interface definitions Tyrel Datwyler
2026-06-23  1:54   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 03/29] ibmvfc: split NVMe support into separate source file and add transport stubs Tyrel Datwyler
2026-06-23  1:50   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 04/29] ibmvfc: initialize NVMe channel configuration during driver probe Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 05/29] ibmvfc: alloc/dealloc sub-queues for nvme channels Tyrel Datwyler
2026-06-23  1:55   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 06/29] ibmvfc: add logic for protocol specific fabric logins Tyrel Datwyler
2026-06-23  1:50   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 07/29] ibmvfc: add wrapper to get vhost associated with a channel struct Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 08/29] ibmvfc: add helper for creating protocol specific discovery event Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 09/29] ibmvfc: add helper to check NVMe/FC support with active channels Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 10/29] ibmvfc: allocate and free NVMe channel group discover buffer Tyrel Datwyler
2026-06-23  1:30 ` [PATCH 11/29] ibmvfc: send NVMe target discovery MAD Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 12/29] ibmvfc: add NVMe/FC Implicit Logout and Move Login support Tyrel Datwyler
2026-06-23  1:49   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 13/29] ibmvfc: add NVMe/FC Port " Tyrel Datwyler
2026-06-23  1:53   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 14/29] ibmvfc: add NVMe/FC Process " Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 15/29] ibmvfc: add NVMe/FC Query Target support Tyrel Datwyler
2026-06-23  1:52   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 16/29] ibmvfc: allocate targets based on protocol Tyrel Datwyler
2026-06-23  1:56   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 17/29] ibmvfc: delete NVMe/FC targets as well as SCSI Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 18/29] ibmvfc: update state machine to process NVMe/FC targets Tyrel Datwyler
2026-06-23  1:55   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 19/29] ibmvfc: implement NVMe/FC stubs for local/remote port registration Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 20/29] ibmvfc: register local nvme fc port after fabric login Tyrel Datwyler
2026-06-23  1:57   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 21/29] ibmvfc: process NVMe/FC rports in work thread Tyrel Datwyler
2026-06-23  2:00   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 22/29] ibmvfc: extend ibmvfc_debug visibility to ibmvfc-nvme.h Tyrel Datwyler
2026-06-23  1:51   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 23/29] ibmvfc: declare global function definitions Tyrel Datwyler
2026-06-23  2:04   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 24/29] ibmvfc: implement LLDD callbacks for mapping nvme-fc queues Tyrel Datwyler
2026-06-23  2:05   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 25/29] ibmvfc: implement nvme-fc LS submission transport callback Tyrel Datwyler
2026-06-23  2:08   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 26/29] ibmvfc: implement nvme-fc IO command submission callback Tyrel Datwyler
2026-06-23  2:09   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 27/29] ibmvfc: implement nvme-fc LS abort handling callback Tyrel Datwyler
2026-06-23  2:09   ` sashiko-bot
2026-06-23  1:30 ` [PATCH 28/29] ibmvfc: implement nvme-fc FCP abort callback Tyrel Datwyler
2026-06-23  2:05   ` sashiko-bot [this message]
2026-06-23  1:30 ` [PATCH 29/29] ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset Tyrel Datwyler
2026-06-23  2:04   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623020505.042221F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tyreld@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox