From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>, Alex Turin <alex@vastdata.com>,
Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
Sasha Levin <sashal@kernel.org>,
kch@nvidia.com, linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 6.6 10/18] nvmet: fix a possible leak when destroy a ctrl during qp establishment
Date: Wed, 5 Jun 2024 08:03:49 -0400
Message-ID: <20240605120409.2967044-10-sashal@kernel.org>
In-Reply-To: <20240605120409.2967044-1-sashal@kernel.org>
From: Sagi Grimberg <sagi@grimberg.me>
[ Upstream commit c758b77d4a0a0ed3a1292b3fd7a2aeccd1a169a4 ]
In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we
know that a ctrl was allocated (in the admin connect request handler)
and we need to release pending AERs, clear ctrl->sqs and sq->ctrl
(for nvme-loop primarily), and drop the final reference on the ctrl.
However, a small window is possible where nvmet_sq_destroy starts (as
a result of the client giving up and disconnecting) concurrently with
the nvme admin connect cmd (which may be in an early stage), but *before*
kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq
live reference). In this case, sq->ctrl was allocated, but only after it
had been captured in a local variable in nvmet_sq_destroy.
This prevented the final reference drop on the ctrl.
Solve this by re-capturing sq->ctrl after all inflight requests have
completed, at which point the sq->ctrl reference is guaranteed to be
final, and move forward based on that.
This issue was observed in an environment with many hosts connecting
multiple ctrls simultaneously, creating a delay in allocating a ctrl
and opening up this race window.
Reported-by: Alex Turin <alex@vastdata.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/target/core.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 3935165048e74..8af930e05d96c 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -803,6 +803,15 @@ void nvmet_sq_destroy(struct nvmet_sq *sq)
 	percpu_ref_exit(&sq->ref);
 	nvmet_auth_sq_free(sq);
 
+	/*
+	 * we must reference the ctrl again after waiting for inflight IO
+	 * to complete. Because admin connect may have sneaked in after we
+	 * store sq->ctrl locally, but before we killed the percpu_ref. the
+	 * admin connect allocates and assigns sq->ctrl, which now needs a
+	 * final ref put, as this ctrl is going away.
+	 */
+	ctrl = sq->ctrl;
+
 	if (ctrl) {
 		/*
 		 * The teardown flow may take some time, and the host may not
--
2.43.0
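
The race described in the commit message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel code: struct ctrl, struct sq, admin_connect(), sq_destroy() and
leaked_refs() are hypothetical stand-ins, the refcount is a plain int, and
the "wait for inflight requests" step is modelled by running the connect
handler synchronously in the middle of the teardown.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct nvmet_ctrl and struct nvmet_sq. */
struct ctrl { int refs; };
struct sq { struct ctrl *ctrl; };

/* Models the admin connect handler allocating and assigning a ctrl. */
static void admin_connect(struct sq *sq, struct ctrl *c)
{
	c->refs = 1;
	sq->ctrl = c;
}

/*
 * Models the teardown path. `racing` (may be NULL) is a ctrl that the
 * connect handler assigns while teardown waits for inflight requests.
 * `recapture` selects the behaviour with the fix applied.
 */
static void sq_destroy(struct sq *sq, struct ctrl *racing, int recapture)
{
	struct ctrl *ctrl = sq->ctrl;	/* early capture, may still be NULL */

	if (racing)			/* inflight connect completes here */
		admin_connect(sq, racing);

	if (recapture)
		ctrl = sq->ctrl;	/* read again after the wait */

	if (ctrl) {
		ctrl->refs--;		/* final reference drop */
		sq->ctrl = NULL;
	}
}

/* Returns the references left on the ctrl after a racing teardown. */
static int leaked_refs(int recapture)
{
	struct ctrl c = { 0 };
	struct sq sq = { NULL };

	sq_destroy(&sq, &c, recapture);
	return c.refs;
}
```

In this model, leaked_refs(0) leaves one reference behind because the early
capture saw NULL, while leaked_refs(1) drops it, which mirrors the effect of
the added `ctrl = sq->ctrl;` line.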