From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 05376C4742C for ; Fri, 13 Nov 2020 10:39:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A64342224B for ; Fri, 13 Nov 2020 10:39:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726290AbgKMKjv convert rfc822-to-8bit (ORCPT ); Fri, 13 Nov 2020 05:39:51 -0500 Received: from mx2.suse.de ([195.135.220.15]:36466 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726237AbgKMKju (ORCPT ); Fri, 13 Nov 2020 05:39:50 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E848FABD1; Fri, 13 Nov 2020 10:39:48 +0000 (UTC) Received: from localhost (brahms [local]) by brahms (OpenSMTPD) with ESMTPA id d98a3e14; Fri, 13 Nov 2020 10:40:01 +0000 (UTC) From: Luis Henriques To: Jeff Layton Cc: ceph-devel@vger.kernel.org, gengjichao@jd.com Subject: Re: [PATCH] ceph: when filling trace, do ceph_get_inode outside of mutexes References: <20201112181555.539948-1-jlayton@kernel.org> <720728ce40c340d5db2a4c003f89483a68876fe5.camel@kernel.org> Date: Fri, 13 Nov 2020 10:40:01 +0000 In-Reply-To: <720728ce40c340d5db2a4c003f89483a68876fe5.camel@kernel.org> (Jeff Layton's message of "Thu, 12 Nov 2020 14:24:10 -0500") Message-ID: <87361d8rr2.fsf@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org Jeff Layton writes: > On Thu, 2020-11-12 at 13:15 -0500, Jeff Layton wrote: >> Geng Jichao reported a rather complex deadlock involving several >> moving parts: >> >> 1) readahead is issued against an inode and some of its pages are locked >>    while the read is in flight >> >> 2) the same inode is evicted from the cache, and this task gets stuck >>    waiting for the page lock because of the above readahead >> >> 3) another task is processing a reply trace, and looks up the inode >>    being evicted while holding the s_mutex. That ends up waiting for the >>    eviction to complete >> >> 4) a write reply for an unrelated inode is then processed in the >>    ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer >>    caps, and that gets stuck waiting on the s_mutex held by 3. >> >> The reply to "1" is stuck behind the write reply in "4", so we deadlock >> at that point. >> >> This patch changes the trace processing to call ceph_get_inode outside >> of the s_mutex and snap_rwsem, which should break the cycle above. >> >> URL: https://tracker.ceph.com/issues/47998 >> Reported-by: Geng Jichao >> Signed-off-by: Jeff Layton >> --- >>  fs/ceph/inode.c | 13 ++++--------- >>  fs/ceph/mds_client.c | 15 +++++++++++++++ >>  2 files changed, 19 insertions(+), 9 deletions(-) >> > > My belief is that this patch probably won't hurt anything and might fix > this specific deadlock. I do however wonder if we might have a whole > class of deadlocks of this nature. I can't quite prove it by inspection > though, so testing with this would be a good datapoint. Yeah, I keep staring at this and can't decide either. But I agree with you: this may or may not fix the issue, but shouldn't hurt. Just a minor comment bellow: >> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c >> index 88bbeb05bd27..9b85d86d8efb 100644 >> --- a/fs/ceph/inode.c >> +++ b/fs/ceph/inode.c >> @@ -1315,15 +1315,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) >>   } >>   >> >> >> >>   if (rinfo->head->is_target) { >> - tvino.ino = le64_to_cpu(rinfo->targeti.in->ino); >> - tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid); >> - >> - in = ceph_get_inode(sb, tvino); >> - if (IS_ERR(in)) { >> - err = PTR_ERR(in); >> - goto done; >> - } >> + /* Should be filled in by handle_reply */ >> + BUG_ON(!req->r_target_inode); >>   >> >> >> >> + in = req->r_target_inode; >>   err = ceph_fill_inode(in, req->r_locked_page, &rinfo->targeti, >>   NULL, session, >>   (!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) && >> @@ -1333,13 +1328,13 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) >>   if (err < 0) { >>   pr_err("ceph_fill_inode badness %p %llx.%llx\n", >>   in, ceph_vinop(in)); >> + req->r_target_inode = NULL; >>   if (in->i_state & I_NEW) >>   discard_new_inode(in); >>   else >>   iput(in); >>   goto done; >>   } >> - req->r_target_inode = in; >>   if (in->i_state & I_NEW) >>   unlock_new_inode(in); >>   } >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c >> index b3c941503f8c..48db1f593d81 100644 >> --- a/fs/ceph/mds_client.c >> +++ b/fs/ceph/mds_client.c >> @@ -3179,6 +3179,21 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg) >>   err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features); >>   mutex_unlock(&mdsc->mutex); >>   >> >> >> >> + /* Must find target inode outside of mutexes to avoid deadlocks */ >> + if (rinfo->head->is_target) { This condition should probably be if ((err >= 0) && rinfo->head->is_target) { This would allow a faster exit path and eventually avoid masking this 'err' value with another one from ceph_get_inode(). Cheers, -- Luis >> + struct inode *in; >> + struct ceph_vino tvino = { .ino = le64_to_cpu(rinfo->targeti.in->ino), >> + .snap = le64_to_cpu(rinfo->targeti.in->snapid) }; >> + >> + in = ceph_get_inode(mdsc->fsc->sb, tvino); >> + if (IS_ERR(in)) { >> + err = PTR_ERR(in); >> + mutex_lock(&session->s_mutex); >> + goto out_err; >> + } >> + req->r_target_inode = in; >> + } >> + >>   mutex_lock(&session->s_mutex); >>   if (err < 0) { >>   pr_err("mdsc_handle_reply got corrupt reply mds%d(tid:%lld)\n", mds, tid); > > -- > Jeff Layton >