From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 67C9D5478D for ; Wed, 24 Jun 2026 05:15:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782278140; cv=none; b=MYRN9sS5SUZ8omyk6/T0uD3OfJx2NR65QbL0mA2g7i7mO308nyv99iseJ7sCirJ/eUdHVkYHSNrn9ou3KcalL81Jl6VppwGZpPnYilEwBAekspBeVnmUA/mXWLgjPIfqw1D3bhPdr7jVQqCUXtPIJ2ZR9EF554WyfaNmMvXtl94= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782278140; c=relaxed/simple; bh=8JjrUODs5GfRovP7jc5jIZ4WqJEh2kMHThInF2i2z9M=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=S7ZvqSeUzn6HYC3LaDbG75xoh/M2qwx2hiVg5aZZFGyOcEzmQQKINJMOfJJhb++mYRs1Nx6UH11PpTFVawHE/mC9XB9BBLJcZODjQ2JgjPgLmToRr5HHeHv3gq/DUXVljvCrTJcVs4dkeOFtVrPuzASHSrlSQhvXpzPO41PxSYk= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kxIRq3vY; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kxIRq3vY" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 293A51F000E9; Wed, 24 Jun 2026 05:15:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782278139; bh=kSWlX+vgcuMzR3mTuGwMxBZDEmEaIeG2MdxlVwXGK24=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=kxIRq3vYHelrMW7Y24Hun8QPIRzerztb5a6DfnJwwsRvpIcXXPFnNzP9JCnhFpdQc cM7GDX7lqArl1EHQfO2mALivpQx1D3/cuzbooJ5RMnLsvkQ3Chu8ujqaxoFC6ftmD4 m2Y+19bbjXhpaVR+1+q56F0FhjQHwYnhYcRNPgLvlSNxca41pZzmQGhakjo5em27f7 c8oSAQHjteLkz3eZe41N4BNadab9Hvw80PR3586+zG1gxurhgdhqcoxGfdYWE2LNQ5 M71HdOtUV5ekfiWcPqzDY6+Bs9stQNjJQwFv6awGoVLpT13L0SK6vjDA2cF8/c6hp0 ji3sSdW8zDJVA== From: sashiko-bot@kernel.org Subject: Re: [PATCH v5 11/13] scsi: fnic: Track NVMe transport statistics Reply-To: sashiko-reviews@lists.linux.dev To: "Karan Tilak Kumar" Cc: linux-scsi@vger.kernel.org In-Reply-To: <20260624044334.3079-12-kartilak@cisco.com> References: <20260624044334.3079-1-kartilak@cisco.com> <20260624044334.3079-12-kartilak@cisco.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 24 Jun 2026 05:15:38 +0000 Message-Id: <20260624051539.293A51F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-scsi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 6 potential issue(= s) to consider: New issues: - [Medium] Concurrent completions can underflow `io_cmpl_skip`, permanently= breaking `io_completions` counting - [Medium] Unlocked read-modify-write race corrupting max statistics counte= rs - [Medium] The `active_fw_reqs` gauge is incremented but never decremented = in NVMe completion paths - [Low] Several newly added NVMe wait-queue statistics counters are dead co= de and never updated Pre-existing issues: - [High] Use-After-Free accessing IO request memory after hardware submissi= on - [High] Use-After-Free due to non-sync timer_delete() on NVMe LS requests -- commit 4e8ddd73c76a57b49825d3889a5531eb4520f10a Author: Karan Tilak Kumar scsi: fnic: Track NVMe transport statistics Add counters for NVMe requests, responses, LS handling, aborts, and wait-queue activity. Link: https://lore.kernel.org/oe-kbuild-all/202605280619.pmobiDWp-lkp@intel= .com/ Link: https://lore.kernel.org/oe-kbuild-all/202605280519.Jd4fmgAZ-lkp@intel= .com/ > diff --git a/drivers/scsi/fnic/fnic_nvme.c b/drivers/scsi/fnic/fnic_nvme.c > index c4de6606c0b2f..e322dd4dcb787 100644 > --- a/drivers/scsi/fnic/fnic_nvme.c > +++ b/drivers/scsi/fnic/fnic_nvme.c > @@ -71,6 +71,61 @@ static void nvfnic_update_io_bytes(struct fnic *fnic, > fnic->fcp_output_bytes +=3D io_req->fcp_req->transferred_length; > } > =20 > +static void nvfnic_update_io_stats(struct fnic *fnic, u8 opcode) > +{ [ ... ] > +static void nvfnic_update_cmpl_stats(struct fnic *fnic, > + struct fnic_io_req *io_req) > +{ > + struct io_path_stats *io_stats =3D &fnic->fnic_stats.io_stats; > + atomic64_t *duration_stat; > + unsigned long io_duration_time; > + > + atomic64_dec(&io_stats->active_ios); > + if (atomic64_read(&fnic->io_cmpl_skip)) > + atomic64_dec(&fnic->io_cmpl_skip); > + else > + atomic64_inc(&io_stats->io_completions); [Severity: Medium] Could concurrent completions underflow io_cmpl_skip? If multiple threads process completions concurrently in nvfnic_update_cmpl_stats(), they can both read a positive value and decrement the counter, causing it to fall below zero. Because atomic64_read() evaluates to true for negative values, all subsequent completions would continue to decrement io_cmpl_skip indefinitel= y. Will io_completions ever be incremented again if this happens? [ ... ] > @@ -198,6 +255,11 @@ inline int nvfnic_queue_wq_nvme_copy_desc(struct fni= c *fnic, > tport->max_payload_size, tport->r_a_tov, > tport->e_d_tov); > =20 > + atomic64_inc(&fnic->fnic_stats.fw_stats.active_fw_reqs); [Severity: Medium] Does active_fw_reqs ever get decremented? This gauge is incremented here in nvfnic_queue_wq_nvme_copy_desc() and also in nvfnic_queue_abort_io_req(), but there do not appear to be any corresponding atomic64_dec() calls added to the NVMe completion handlers. If it is never decremented when requests complete, will it act as a monotonically increasing counter rather than a gauge of in-flight requests? > + if (atomic64_read(&fnic->fnic_stats.fw_stats.active_fw_reqs) > > + atomic64_read(&fnic->fnic_stats.fw_stats.max_fw_reqs)) > + atomic64_set(&fnic->fnic_stats.fw_stats.max_fw_reqs, > + atomic64_read(&fnic->fnic_stats.fw_stats.active_fw_reqs)); [Severity: Medium] Can this read-modify-write pattern corrupt the maximum statistic? If active_fw_reqs drops between the first conditional check and the second atomic64_read() inside atomic64_set(), max_fw_reqs could be set to a lower value than the one that triggered the condition. Could this cause the maximum statistic to shrink over time under high concurrency? > =20 > spin_unlock_irqrestore(&fnic->wq_copy_lock[idx], intr_flags); > return 0; [ ... ] > @@ -296,6 +368,13 @@ int nvfnic_queuecommand(struct fnic_io_req *io_req) > (((u64)io_req->cmd_flags << 32) | io_req->cmd_state)); > return ret; > } > + > + atomic64_inc(&fnic_stats->io_stats.active_ios); > + atomic64_inc(&fnic_stats->io_stats.num_ios); > + if (atomic64_read(&fnic_stats->io_stats.active_ios) > > + atomic64_read(&fnic_stats->io_stats.max_active_ios)) > + atomic64_set(&fnic_stats->io_stats.max_active_ios, > + atomic64_read(&fnic_stats->io_stats.active_ios)); > io_req->cmd_flags |=3D FNIC_IO_ISSUED; > out: > lba =3D (char *)&cmdiu->sqe.rw.slba; [Severity: High] This isn't a bug introduced by this patch, but is there a use-after-free risk here? In nvfnic_queuecommand(), the IO request is submitted to the hardware via nvfnic_queue_wq_nvme_copy_desc(). Once the doorbell is written, the hardware can execute the NVMe command and post a completion immediately. If the thread executing nvfnic_queuecommand() is preempted after submitting the request, the completion handler could process the completion and free the fcp_req memory. When the preempted thread resumes, it writes to io_req->cmd_flags and reads from cmdiu->sqe.rw.slba: io_req->cmd_flags |=3D FNIC_IO_ISSUED; out: lba =3D (char *)&cmdiu->sqe.rw.slba; Should all local state modifications and tracing be performed before the hardware submission? [ ... ] > @@ -1231,6 +1365,7 @@ void nvfnic_ls_rsp_recv(struct fnic_iport_s *iport, > } > =20 > nvfnic_ls_req->state =3D FNIC_LS_REQ_CMD_COMPLETE; > + atomic64_inc(&fnic_stats->nvme_stats.nvme_ls_responses); > =20 > list_del_init(&nvfnic_ls_req->list); > lsreq->private =3D NULL; [Severity: High] This is a pre-existing issue, but does the timer cancellation create a use-after-free risk in nvfnic_ls_rsp_recv()? When timer_delete() is called without sync on nvfnic_ls_req->ls_req_timer: timer_delete(&nvfnic_ls_req->ls_req_timer); if (status =3D=3D 0) { ... spin_unlock_irqrestore(&fnic->fnic_lock, fnic->lock_flags); lsreq->done(lsreq, status); The timer callback nvfnic_ls_req_timeout() might be executing concurrently on another CPU. After the lock is dropped and lsreq->done() is called, the NVMe FC transport frees the lsreq structure.=20 If the concurrent timer callback resumes, it will dereference=20 nvfnic_ls_req->fnic. Is timer_delete_sync() required here? [ ... ] > @@ -1700,6 +1857,15 @@ void nvfnic_fcpio_abort(struct nvme_fc_local_port = *lport, > FNIC_NVME_DBG(KERN_INFO, fnic, > "cmd tag freed or not issued:0x%x sn:0x%08x\n", > io_req->tag, be32_to_cpu(cmd_iu->csn)); > + num_ios_waitq =3D > + atomic64_read(&fnic_stats->io_stats.nvme_num_ios_in_waitq); > + waitq_2sec =3D > + atomic64_read(&fnic_stats->io_stats.nvme_ios_in_waitq_3000_msec); > + waitq_max_time =3D > + atomic64_read(&fnic_stats->io_stats.nvme_ios_in_waitq_max_time); [Severity: Low] Are these wait-queue counters actually updated anywhere? The counters like nvme_ios_in_waitq_3000_msec and nvme_ios_in_waitq_max_time are read and logged here, but there does not appear to be any logic in the patch to increment them or measure the wait queue duration. Will these counters perpetually read as zero? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260624044334.3079= -1-kartilak@cisco.com?part=3D11