From: Ming Lei
To: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme@lists.infradead.org
Cc: Yi Zhang, Chunguang Xu, Ming Lei, stable@vger.kernel.org
Subject: [PATCH V2 1/3] nvme: fix possible hang when removing a controller during error recovery
Date: Tue, 11 Jul 2023 17:40:39 +0800
Message-Id: <20230711094041.1819102-2-ming.lei@redhat.com>
In-Reply-To: <20230711094041.1819102-1-ming.lei@redhat.com>
References: <20230711094041.1819102-1-ming.lei@redhat.com>

Error recovery
can be interrupted by controller removal. In that case the controller
is left quiesced and IO can hang.

Fix the issue by unconditionally unquiescing the controller's IO queues
when removing namespaces. This is reasonable and safe, since forward
progress can be made while namespaces are being removed.

Reviewed-by: Keith Busch
Reviewed-by: Sagi Grimberg
Reported-by: Chunguang Xu
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei
---
 drivers/nvme/host/core.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 47d7ba2827ff..98fa8315bc65 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3903,6 +3903,12 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 */
 	nvme_mpath_clear_ctrl_paths(ctrl);
 
+	/*
+	 * Unquiesce io queues so any pending IO won't hang, especially
+	 * those submitted from scan work
+	 */
+	nvme_unquiesce_io_queues(ctrl);
+
 	/* prevent racing with ns scanning */
 	flush_work(&ctrl->scan_work);
 
@@ -3912,10 +3918,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 * removing the namespaces' disks; fail all the queues now to avoid
 	 * potentially having to clean up the failed sync later.
 	 */
-	if (ctrl->state == NVME_CTRL_DEAD) {
+	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_mark_namespaces_dead(ctrl);
-		nvme_unquiesce_io_queues(ctrl);
-	}
 
 	/* this is a no-op when called from the controller reset handler */
 	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
-- 
2.40.1
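The failure mode in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: `struct model_ctrl`, the `M_*` states, `model_remove_namespaces_old()`/`model_remove_namespaces_new()` and `model_hangs()` are hypothetical names that only mimic the control flow of `nvme_remove_namespaces()` before and after the patch. The model ignores locking, multipath and the real blk-mq quiesce machinery, and represents a "hang" simply as a request still pending after removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified controller states (not the kernel's enum). */
enum model_state { M_LIVE, M_RESETTING, M_DELETING, M_DELETING_NOIO, M_DEAD };

/* Hypothetical controller model; not struct nvme_ctrl. */
struct model_ctrl {
	enum model_state state;
	bool io_quiesced;  /* IO queues are stopped from dispatching */
	int pending_io;    /* requests (e.g. from scan work) waiting to dispatch */
};

/* Error recovery quiesces IO; the matching unquiesce would normally come
 * when recovery completes, but removal starts before that happens. */
static void model_recovery_interrupted_by_removal(struct model_ctrl *c)
{
	c->state = M_RESETTING;
	c->io_quiesced = true;
	c->pending_io = 1;      /* e.g. an IO issued by namespace scanning */
	c->state = M_DELETING;  /* removal begins; state is not DEAD */
}

/* Once the queues are unquiesced, pending requests get dispatched (or
 * failed on a dying controller), so nothing stays stuck. */
static void model_run_queues(struct model_ctrl *c)
{
	if (!c->io_quiesced)
		c->pending_io = 0;
}

/* Model of the removal path before the patch: only DEAD is unquiesced. */
static void model_remove_namespaces_old(struct model_ctrl *c)
{
	if (c->state == M_DEAD)
		c->io_quiesced = false;
	model_run_queues(c);
	c->state = M_DELETING_NOIO;
}

/* Model of the removal path after the patch: unquiesce unconditionally. */
static void model_remove_namespaces_new(struct model_ctrl *c)
{
	c->io_quiesced = false;
	model_run_queues(c);
	c->state = M_DELETING_NOIO;
}

/* Returns true if some IO is still stuck after the given removal path. */
static bool model_hangs(void (*remove)(struct model_ctrl *))
{
	struct model_ctrl c = { M_LIVE, false, 0 };

	model_recovery_interrupted_by_removal(&c);
	remove(&c);
	assert(c.state == M_DELETING_NOIO);
	return c.pending_io > 0;
}
```

In this model `model_hangs(model_remove_namespaces_old)` returns true, because a controller whose recovery was cut short is in a deleting state rather than DEAD, so its queues are never unquiesced; `model_hangs(model_remove_namespaces_new)` returns false.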