From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFFB01FA6 for ; Wed, 9 Aug 2023 10:49:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 13A0BC433C9; Wed, 9 Aug 2023 10:49:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1691578167; bh=RgI5jdTOnRVkueDhWlTHRh5eBIn4loITRH0kFpcrMZA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=G6NcEdexVDbvb6pUQs4OQLYFgmbtuHeBjZ0JQmzpPdAsTjeNUSqEXtBqhr4Rge7Vm o7JcyIvCerIDELzAnpOnaiIWBnfb3ovFIQngvnh1pBFDOVSFRY8jzh9Oboys66gfI8 fnx4hQ00V6prWcPflyYGVZ+jLOGKtZQT4aJjlX2M= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Ilya Dryomov , Dongsheng Yang , Xiubo Li Subject: [PATCH 6.4 100/165] libceph: fix potential hang in ceph_osdc_notify() Date: Wed, 9 Aug 2023 12:40:31 +0200 Message-ID: <20230809103646.048888231@linuxfoundation.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230809103642.720851262@linuxfoundation.org> References: <20230809103642.720851262@linuxfoundation.org> User-Agent: quilt/0.67 Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Ilya Dryomov commit e6e2843230799230fc5deb8279728a7218b0d63c upstream. If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang Reviewed-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman --- net/ceph/osd_client.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3334,17 +3334,24 @@ static int linger_reg_commit_wait(struct int ret; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->reg_commit_wait); + ret = wait_for_completion_killable(&lreq->reg_commit_wait); return ret ?: lreq->reg_commit_error; } -static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq) +static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq, + unsigned long timeout) { - int ret; + long left; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->notify_finish_wait); - return ret ?: lreq->notify_finish_error; + left = wait_for_completion_killable_timeout(&lreq->notify_finish_wait, + ceph_timeout_jiffies(timeout)); + if (left <= 0) + left = left ?: -ETIMEDOUT; + else + left = lreq->notify_finish_error; /* completed */ + + return left; } /* @@ -4896,7 +4903,8 @@ int ceph_osdc_notify(struct ceph_osd_cli linger_submit(lreq); ret = linger_reg_commit_wait(lreq); if (!ret) - ret = linger_notify_finish_wait(lreq); + ret = linger_notify_finish_wait(lreq, + msecs_to_jiffies(2 * timeout * MSEC_PER_SEC)); else dout("lreq %p failed to initiate notify %d\n", lreq, ret);