From: Luis Henriques <lhenriques@suse.de>
To: Jeff Layton <jlayton@kernel.org>
Cc: ceph-devel@vger.kernel.org, gengjichao@jd.com
Subject: Re: [PATCH v2] ceph: when filling trace, call ceph_get_inode outside of mutexes
Date: Fri, 13 Nov 2020 13:46:36 +0000 [thread overview]
Message-ID: <87tutt74jn.fsf@suse.de> (raw)
In-Reply-To: <20201113122142.11979-1-jlayton@kernel.org> (Jeff Layton's message of "Fri, 13 Nov 2020 07:21:42 -0500")
Jeff Layton <jlayton@kernel.org> writes:
> Geng Jichao reported a rather complex deadlock involving several
> moving parts:
>
> 1) readahead is issued against an inode and some of its pages are locked
> while the read is in flight.
>
> 2) the same inode is evicted from the cache, and this task gets stuck
> waiting for the page lock because of the above readahead.
>
> 3) another task is processing a reply trace, and looks up the inode
> being evicted while holding the s_mutex. That ends up waiting for the
> eviction to complete.
>
> 4) a write reply for an unrelated inode is then processed in the
> ceph_con_workfn job. It calls ceph_check_caps after putting wrbuffer
> caps, and that gets stuck waiting on the s_mutex held by 3.
>
> The reply to "1" is stuck behind the write reply in "4", so it deadlocks
> at that point.
>
> This patch changes the trace processing to call ceph_get_inode outside
> of the s_mutex and snap_rwsem, which should break the cycle above.
>
> URL: https://tracker.ceph.com/issues/47998
> Reported-by: Geng Jichao <gengjichao@jd.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/ceph/inode.c | 13 ++++---------
> fs/ceph/mds_client.c | 15 +++++++++++++++
> 2 files changed, 19 insertions(+), 9 deletions(-)
>
> v2: don't call ceph_get_inode if err < 0
Thanks, looks good (AFAICT!).
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Cheers,
--
Luis
>
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 88bbeb05bd27..9b85d86d8efb 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1315,15 +1315,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
> }
>
> if (rinfo->head->is_target) {
> - tvino.ino = le64_to_cpu(rinfo->targeti.in->ino);
> - tvino.snap = le64_to_cpu(rinfo->targeti.in->snapid);
> -
> - in = ceph_get_inode(sb, tvino);
> - if (IS_ERR(in)) {
> - err = PTR_ERR(in);
> - goto done;
> - }
> + /* Should be filled in by handle_reply */
> + BUG_ON(!req->r_target_inode);
>
> + in = req->r_target_inode;
> err = ceph_fill_inode(in, req->r_locked_page, &rinfo->targeti,
> NULL, session,
> (!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags) &&
> @@ -1333,13 +1328,13 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
> if (err < 0) {
> pr_err("ceph_fill_inode badness %p %llx.%llx\n",
> in, ceph_vinop(in));
> + req->r_target_inode = NULL;
> if (in->i_state & I_NEW)
> discard_new_inode(in);
> else
> iput(in);
> goto done;
> }
> - req->r_target_inode = in;
> if (in->i_state & I_NEW)
> unlock_new_inode(in);
> }
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index b3c941503f8c..4fab37d858ce 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3179,6 +3179,21 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
> err = parse_reply_info(session, msg, rinfo, session->s_con.peer_features);
> mutex_unlock(&mdsc->mutex);
>
> + /* Must find target inode outside of mutexes to avoid deadlocks */
> + if ((err >= 0) && rinfo->head->is_target) {
> + struct inode *in;
> + struct ceph_vino tvino = { .ino = le64_to_cpu(rinfo->targeti.in->ino),
> + .snap = le64_to_cpu(rinfo->targeti.in->snapid) };
> +
> + in = ceph_get_inode(mdsc->fsc->sb, tvino);
> + if (IS_ERR(in)) {
> + err = PTR_ERR(in);
> + mutex_lock(&session->s_mutex);
> + goto out_err;
> + }
> + req->r_target_inode = in;
> + }
> +
> mutex_lock(&session->s_mutex);
> if (err < 0) {
> pr_err("mdsc_handle_reply got corrupt reply mds%d(tid:%lld)\n", mds, tid);
> --
>
> 2.28.0
>
prev parent reply other threads:[~2020-11-13 13:46 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-13 12:21 [PATCH v2] ceph: when filling trace, call ceph_get_inode outside of mutexes Jeff Layton
2020-11-13 13:46 ` Luis Henriques [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tutt74jn.fsf@suse.de \
--to=lhenriques@suse.de \
--cc=ceph-devel@vger.kernel.org \
--cc=gengjichao@jd.com \
--cc=jlayton@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox