From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A4EC14F8E for ; Wed, 9 Aug 2023 11:10:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CBFD9C433C8; Wed, 9 Aug 2023 11:10:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1691579442; bh=b6D4oGDJPJ/t8wjEXFx6aE0D7uDPZU5QAoKaGZliVC4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BwfC4GlPv/3AIoKTZLSA6VjtAzpDDxIpdEpNUGMYF+zztoynY6WHEjHCyBgFeKLFp Rriv3BB0e2pMHnCJR+3CC9m9pNr/tmY5EflrWqysUWe/K8KBulVUjNbB1tp9TLPjru YwmXju/5SwVfxaXvqhmTeSMEKiWIX00QgANWW50I= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Ilya Dryomov , Dongsheng Yang , Xiubo Li Subject: [PATCH 4.14 197/204] libceph: fix potential hang in ceph_osdc_notify() Date: Wed, 9 Aug 2023 12:42:15 +0200 Message-ID: <20230809103649.081278214@linuxfoundation.org> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230809103642.552405807@linuxfoundation.org> References: <20230809103642.552405807@linuxfoundation.org> User-Agent: quilt/0.67 Precedence: bulk X-Mailing-List: patches@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Ilya Dryomov commit e6e2843230799230fc5deb8279728a7218b0d63c upstream. If the cluster becomes unavailable, ceph_osdc_notify() may hang even with osd_request_timeout option set because linger_notify_finish_wait() waits for MWatchNotify NOTIFY_COMPLETE message with no associated OSD request in flight -- it's completely asynchronous. Introduce an additional timeout, derived from the specified notify timeout. While at it, switch both waits to killable which is more correct. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov Reviewed-by: Dongsheng Yang Reviewed-by: Xiubo Li Signed-off-by: Greg Kroah-Hartman --- net/ceph/osd_client.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -3041,17 +3041,24 @@ static int linger_reg_commit_wait(struct int ret; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->reg_commit_wait); + ret = wait_for_completion_killable(&lreq->reg_commit_wait); return ret ?: lreq->reg_commit_error; } -static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq) +static int linger_notify_finish_wait(struct ceph_osd_linger_request *lreq, + unsigned long timeout) { - int ret; + long left; dout("%s lreq %p linger_id %llu\n", __func__, lreq, lreq->linger_id); - ret = wait_for_completion_interruptible(&lreq->notify_finish_wait); - return ret ?: lreq->notify_finish_error; + left = wait_for_completion_killable_timeout(&lreq->notify_finish_wait, + ceph_timeout_jiffies(timeout)); + if (left <= 0) + left = left ?: -ETIMEDOUT; + else + left = lreq->notify_finish_error; /* completed */ + + return left; } /* @@ -4666,7 +4673,8 @@ int ceph_osdc_notify(struct ceph_osd_cli ret = linger_reg_commit_wait(lreq); if (!ret) - ret = linger_notify_finish_wait(lreq); + ret = linger_notify_finish_wait(lreq, + msecs_to_jiffies(2 * timeout * MSEC_PER_SEC)); else dout("lreq %p failed to initiate notify %d\n", lreq, ret);