From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AAECDCCA47F for ; Wed, 6 Jul 2022 15:33:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=LvpFNt3+QUl3ywyg5WKldcwIOL4sGyHHPIIlvRLaOOk=; b=UpkC5f5pnre3rOOFYYP6lJD2HB RPkuuKJnMbV1ZSzcqQX2KTqeDNpTdtdaO1HBhAPLwGMOUb/ulzWM7wmGLeuwlUg7av+8KQq1phfq6 SZjQF1LkxTqo1DTudMHAfaqxBD51gj/+UwPlFdYaLU23bq6Dv25awEj46xRh2lsMQoHwEJGTrPxvH Kp49N9HWok1GF0ubCofayYJKBDjMTGP23fBImywZiUZrk2ijdRhRlTsvoZbkarOwkB2HIcrMUsY3c mlmpYLK9tupXl9hlPo4O+ZU98kRIYky0k5fCGpRTdYt2uWvG634YVNHZycKCtSE3RZ4VELykoiCoX ibAFzxyw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1o971S-00B2z6-FB; Wed, 06 Jul 2022 15:33:02 +0000 Received: from ams.source.kernel.org ([145.40.68.75]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1o971M-00B2vv-VY for linux-nvme@lists.infradead.org; Wed, 06 Jul 2022 15:32:58 +0000 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id B1045B81D8E; Wed, 6 Jul 2022 15:32:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8795EC341CD; Wed, 6 Jul 2022 15:32:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1657121574; bh=9mbzvjLbLLILSJPUT6/t8HLb6lxmwvL77vuZWExpcNw=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=neUhIfvHGfTwAqPaMpFN6jAg0TsKDURInnTe6kDwBvY6plqWEhGXWDhPk2LAz+tDN mbQM0IECh/zpzS4PakGIDq6pXJuc72VoKeM8JEmjAgrgnyZm8wHGZL0W93uPwKSPUA PtceFHl9Qd69wGkuMlJvp0AYVhmlVew9GJ1zQoFhfnLcSQ3NCVDUbnnXCrU/qluW6q raHd8UYsenzto+alOwAJVtx1qUPkBqGRNf+KWU4dwmDctY3pRvqpNNFD4oITWRDTjz QK1GVypAf1ir9uv2hgpTzRFc1tiA9BxG6Qw9Qqdl0xuTPoenFDEx7nmbGW+jBf7Byd 0gCHiGcq7taVg== From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Ruozhu Li , Sagi Grimberg , Christoph Hellwig , Sasha Levin , kbusch@kernel.org, axboe@fb.com, linux-nvme@lists.infradead.org Subject: [PATCH AUTOSEL 5.15 17/18] nvme: fix regression when disconnect a recovering ctrl Date: Wed, 6 Jul 2022 11:31:52 -0400 Message-Id: <20220706153153.1598076-17-sashal@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220706153153.1598076-1-sashal@kernel.org> References: <20220706153153.1598076-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220706_083257_342268_9561BBC3 X-CRM114-Status: GOOD ( 18.07 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org From: Ruozhu Li [ Upstream commit f7f70f4aa09dc43d7455c060143e86a017c30548 ] We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem. Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout") Signed-off-by: Ruozhu Li Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin --- drivers/nvme/host/core.c | 2 ++ drivers/nvme/host/nvme.h | 1 + drivers/nvme/host/rdma.c | 12 +++++++++--- drivers/nvme/host/tcp.c | 10 +++++++--- 4 files changed, 19 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 19054b791c67..29b56ea01132 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4385,6 +4385,8 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl) nvme_stop_failfast_work(ctrl); flush_work(&ctrl->async_event_work); cancel_work_sync(&ctrl->fw_act_work); + if (ctrl->ops->stop_ctrl) + ctrl->ops->stop_ctrl(ctrl); } EXPORT_SYMBOL_GPL(nvme_stop_ctrl); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 72bcd7e5716e..75a7e7baa1fc 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -495,6 +495,7 @@ struct nvme_ctrl_ops { void (*free_ctrl)(struct nvme_ctrl *ctrl); void (*submit_async_event)(struct nvme_ctrl *ctrl); void (*delete_ctrl)(struct nvme_ctrl *ctrl); + void (*stop_ctrl)(struct nvme_ctrl *ctrl); int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size); }; diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index d51f52e296f5..2db9c166a1b7 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1049,6 +1049,14 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl, } } +static void nvme_rdma_stop_ctrl(struct nvme_ctrl *nctrl) +{ + struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl); + + cancel_work_sync(&ctrl->err_work); + cancel_delayed_work_sync(&ctrl->reconnect_work); +} + static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl) { struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl); @@ -2230,9 +2238,6 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = { static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown) { - cancel_work_sync(&ctrl->err_work); - cancel_delayed_work_sync(&ctrl->reconnect_work); - nvme_rdma_teardown_io_queues(ctrl, shutdown); blk_mq_quiesce_queue(ctrl->ctrl.admin_q); if (shutdown) @@ -2282,6 +2287,7 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = { .submit_async_event = nvme_rdma_submit_async_event, .delete_ctrl = nvme_rdma_delete_ctrl, .get_address = nvmf_get_address, + .stop_ctrl = nvme_rdma_stop_ctrl, }; /* diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 1821d38e620e..20138e132558 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2163,9 +2163,6 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work) static void nvme_tcp_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown) { - cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work); - cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work); - nvme_tcp_teardown_io_queues(ctrl, shutdown); blk_mq_quiesce_queue(ctrl->admin_q); if (shutdown) @@ -2205,6 +2202,12 @@ static void nvme_reset_ctrl_work(struct work_struct *work) nvme_tcp_reconnect_or_remove(ctrl); } +static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl) +{ + cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work); + cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work); +} + static void nvme_tcp_free_ctrl(struct nvme_ctrl *nctrl) { struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl); @@ -2528,6 +2531,7 @@ static const struct nvme_ctrl_ops nvme_tcp_ctrl_ops = { .submit_async_event = nvme_tcp_submit_async_event, .delete_ctrl = nvme_tcp_delete_ctrl, .get_address = nvmf_get_address, + .stop_ctrl = nvme_tcp_stop_ctrl, }; static bool -- 2.35.1