From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D015313C9C4 for ; Tue, 23 Jun 2026 02:04:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782180281; cv=none; b=ZiQSHzqFvHLnsOWjnGRCCGn7f0Gu5Q0QN4KwcjT9EwDgqKAIVXLsp4bxsSsGVH381UoxcHnBsLX/X2vYLOyb/0gBjHz3BoVysLzkCAY++QqkbXMhowkVNvwJ8eNPwftc+wQoRdQsu3kGYUI0ai9X/ajnBwL74m5EM7N1QUPrvz8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782180281; c=relaxed/simple; bh=yMVBCIChQgFP06uPrcWg35V6HPAb1vKHQkqdeHGZ1vg=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=hZpRy5GPdEb7KJRBUECYk6gMp6/i6vI/ik4zy0rx9O/QcD2hCcDM3ZuImtqM9TfulNur0EaKA2cR9AL4n8vhPz8e3tOU5NCCA+5kZI9kBtlNs9enfV82f3p5JLZg7KKG2X8UgFl7dX7+yxNI10xbJOzktXLqppvGQrjyFygC2U8= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=YPRz/+IR; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="YPRz/+IR" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5AF251F000E9; Tue, 23 Jun 2026 02:04:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782180279; bh=C0USVOOIsCK4t8IBEbU5zMl8c9GgJHn0mTuK5oMKJag=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=YPRz/+IRQOqo/o2J1LAA2m6WnpE++6bcvXQOYuV48wMOHqNq+s5iN3x1WAcbEle/j eotYK8uKGLyxGoz9qIHs59FJpie4EJl1da4Ii3ne16DqhE6b0op8GKr4SxNJHf4TeV SrSThbCxM+fB1+HhwVfKXOrLencyy/hWfq+I9fosJwdVRtLzwtfe8Afbzhs6gNYiIm hLCfdFZObOvv7jbAV6orrWHleMRkzTyNUDbcRBpT9iQknsNSh4fOJHZ35Uxq9zlAMT 72BxJXtmygIv7FKqA1/EvJB+vD9Aw60SR+sPPHmo3ewPWydW9V7pL5WzIe4nGPH3vc 8ZJBMF5y+gZNQ== From: sashiko-bot@kernel.org Subject: Re: [PATCH 29/29] ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset Reply-To: sashiko-reviews@lists.linux.dev To: "Tyrel Datwyler" Cc: linux-scsi@vger.kernel.org In-Reply-To: <20260623013035.3436640-30-tyreld@linux.ibm.com> References: <20260623013035.3436640-1-tyreld@linux.ibm.com> <20260623013035.3436640-30-tyreld@linux.ibm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 23 Jun 2026 02:04:39 +0000 Message-Id: <20260623020439.5AF251F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: New issues: - [Critical] Pooled event structures retain stale NVMe request pointers, le= ading to exploitable Use-After-Free and double completions when non-NVMe co= mmands fail. - [High] Target kref is permanently leaked when an NVMe LS request fails or= is purged. Pre-existing issues: - [High] Returning SCSI_MLQUEUE_HOST_BUSY for NVMe requests on transport cl= osure causes permanent IO failures instead of requeuing. - [Medium] NVMe FCP requests are not purged during a transport reset, rende= ring the patch's FCP functionality dead code. -- commit 88ce7ce7f11c11cfca700c6919820cf4d9865c8a Author: Tyrel Datwyler ibmvfc: fail nvme-fc fcp-io and ls requests during transport reset =20 The current purge code for flushing outstanding commands during a transport reset only deals with SCSI commands. Rename the ibmvfc_scsi_eh_done completion handler to ibmvfc_vfc_eh_done and wire it to correctly complete nvme fcp and ls commands when flushing the inflight command list during a reset. > The current purge code for flushing outstanding commands during a > transport reset only deals with SCSI commands. Rename the > ibmvfc_scsi_eh_done completion handler to ibmvfc_vfc_eh_done and wire it > to correctly complete nvme fcp and ls commands when flushing the > inflight command list during a reset. [Severity: Medium] This isn't a bug introduced by this patch, but does the purge code actually evaluate inflight NVMe FCP commands? Looking at ibmvfc_purge_requests(), the loop iterates over vhost->crq and vhost->scsi_scrqs.scrqs: drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_purge_requests() { ... for (i =3D 0; i < hwqs; i++) { spin_lock_irqsave(queues[i].q_lock, flags); ... } It seems to entirely miss vhost->nvme_scrqs.scrqs where FCP requests are queued. If so, inflight NVMe FCP commands might be orphaned during a reset, making the new FCP completion handling unreachable. > diff --git a/drivers/scsi/ibmvscsi/ibmvfc-core.c b/drivers/scsi/ibmvscsi/= ibmvfc-core.c > index a7183493cf965..aeb5b8902aa55 100644 > --- a/drivers/scsi/ibmvscsi/ibmvfc-core.c > +++ b/drivers/scsi/ibmvscsi/ibmvfc-core.c [ ... ] > @@ -1092,20 +1092,28 @@ void ibmvfc_free_event(struct ibmvfc_event *evt) > } > =20 > /** > - * ibmvfc_scsi_eh_done - EH done function for queuecommand commands > + * ibmvfc_vfc_eh_done - EH done function for queued IO > * @evt: ibmvfc event struct > * > - * This function does not setup any error status, that must be done > - * before this function gets called. > + * This function does not setup any error status for scsi commands, that= must be > + * done before this function gets called. > **/ > -static void ibmvfc_scsi_eh_done(struct ibmvfc_event *evt) > +static void ibmvfc_vfc_eh_done(struct ibmvfc_event *evt) > { > struct scsi_cmnd *cmnd =3D evt->cmnd; > + struct nvmefc_ls_req *ls_req =3D evt->ls_req; > + struct nvmefc_fcp_req *fcp_req =3D evt->fcp_req; > =20 > if (cmnd) { > scsi_dma_unmap(cmnd); > scsi_done(cmnd); > - } > + } else if (fcp_req) { > + fcp_req->rcv_rsplen =3D 0; > + fcp_req->transferred_length =3D 0; > + fcp_req->status =3D NVME_SC_INTERNAL; > + fcp_req->done(fcp_req); > + } else if (ls_req) > + ls_req->done(ls_req, -EIO); [Severity: High] Does this error handler leak the target kref for LS requests? In ibmvfc_nvme_ls_req(), a reference is taken via kref_get(&tgt->kref) which is typically dropped in the success path by ibmvfc_ls_req_done(). Since this patch replaces the completion handler with ibmvfc_vfc_eh_done(), the transport request is completed here but kref_put(&evt->tgt->kref, ...) is never called, potentially leaving the target structure leaked. > =20 > ibmvfc_free_event(evt); > } > @@ -1146,8 +1154,10 @@ static void ibmvfc_fail_request(struct ibmvfc_even= t *evt, int error_code) > BUG_ON(!atomic_dec_and_test(&evt->active)); > if (evt->cmnd) { > evt->cmnd->result =3D (error_code << 16); > - evt->done =3D ibmvfc_scsi_eh_done; > - } else > + evt->done =3D ibmvfc_vfc_eh_done; > + } else if (evt->fcp_req || evt->ls_req) > + evt->done =3D ibmvfc_vfc_eh_done; > + else [Severity: Critical] Can these checks be triggered by stale NVMe pointers on non-NVMe commands? When ibmvfc_get_event() reallocates an event structure from the pool, ibmvfc_init_event() doesn't clear evt->fcp_req or evt->ls_req to NULL: drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_init_event() { ... evt->cmnd =3D NULL; evt->sync_iu =3D NULL; evt->eh_comp =3D NULL; ... } If a non-NVMe command (e.g., a MAD command) is allocated an event with stale NVMe pointers and fails here, will it incorrectly replace the completion handler with ibmvfc_vfc_eh_done() and later use-after-free the stale NVMe pointers? > evt->xfer_iu->mad_common.status =3D cpu_to_be16(IBMVFC_MAD_DRIVER_FAIL= ED); > =20 > timer_delete(&evt->timer); [ ... ] > @@ -1816,8 +1826,10 @@ int ibmvfc_send_event(struct ibmvfc_event *evt, [Severity: High] This isn't a bug introduced by this patch, but earlier in ibmvfc_send_event= (), it hardcodes a return of SCSI_MLQUEUE_HOST_BUSY on H_CLOSED without distinguishing between SCSI and NVMe commands: drivers/scsi/ibmvscsi/ibmvfc-core.c:ibmvfc_send_event() { ... if (rc =3D=3D H_CLOSED) { ... return SCSI_MLQUEUE_HOST_BUSY; } ... } This value (0x1055) is propagated back up through ibmvfc_nvme_fcp_io() directly to the NVMe FC transport layer. Since 0x1055 is neither 0 nor -EBUSY, will the NVMe FC core treat it as a hard failure and abort the request with BLK_STS_IOERR instead of handling the queue pressure appropriately? > dev_err(vhost->dev, "Send error (rc=3D%d)\n", rc); > if (evt->cmnd) { > evt->cmnd->result =3D DID_ERROR << 16; > - evt->done =3D ibmvfc_scsi_eh_done; > - } else > + evt->done =3D ibmvfc_vfc_eh_done; > + } else if (evt->fcp_req || evt->ls_req) > + evt->done =3D ibmvfc_vfc_eh_done; > + else > evt->xfer_iu->mad_common.status =3D cpu_to_be16(IBMVFC_MAD_CRQ_ERROR); > =20 > evt->done(evt); --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260623013035.3436= 640-1-tyreld@linux.ibm.com?part=3D29