From: xiubli@redhat.com
To: jlayton@kernel.org, idryomov@gmail.com, zyan@redhat.com
Cc: pdonnell@redhat.com, ceph-devel@vger.kernel.org,
Xiubo Li <xiubli@redhat.com>
Subject: [PATCH] ceph: make sure the mdsc->mutex is nested in s->s_mutex to fix deadlock
Date: Wed, 20 May 2020 03:51:19 -0400
Message-ID: <1589961079-27932-1-git-send-email-xiubli@redhat.com>
From: Xiubo Li <xiubli@redhat.com>
From the two stack traces below: kworker/18:1 is in ceph_mdsc_handle_mdsmap() -> check_new_map(), holding mdsc->mutex and blocked in __mutex_lock() waiting for s->s_mutex; meanwhile kworker/25:1 is in handle_reply(), holding s->s_mutex and blocked waiting for mdsc->mutex. The two workers wait on each other forever. Fix this by dropping mdsc->mutex in check_new_map() before taking s->s_mutex and then re-acquiring it, so the two locks are always nested in the same order as in handle_reply().

The call trace:
<6>[15981.740583] libceph: mon2 (1)10.72.36.245:40083 session lost, hunting for new mon
<3>[16097.960293] INFO: task kworker/18:1:32111 blocked for more than 122 seconds.
<3>[16097.960860] Tainted: G E 5.7.0-rc5+ #80
<3>[16097.961332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16097.961868] kworker/18:1 D 0 32111 2 0x80004080
<6>[16097.962151] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16097.962188] Call Trace:
<4>[16097.962353] ? __schedule+0x276/0x6e0
<4>[16097.962359] ? schedule+0x40/0xb0
<4>[16097.962364] ? schedule_preempt_disabled+0xa/0x10
<4>[16097.962368] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16097.962460] ? kick_requests+0x21/0x100 [ceph]
<4>[16097.962485] ? ceph_mdsc_handle_mdsmap+0x19c/0x5f0 [ceph]
<4>[16097.962503] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16097.962523] ? extra_mon_dispatch+0x34/0x40 [ceph]
<4>[16097.962580] ? dispatch+0x77/0x930 [libceph]
<4>[16097.962602] ? try_read+0x78b/0x11e0 [libceph]
<4>[16097.962619] ? __switch_to_asm+0x40/0x70
<4>[16097.962623] ? __switch_to_asm+0x34/0x70
<4>[16097.962627] ? __switch_to_asm+0x40/0x70
<4>[16097.962631] ? __switch_to_asm+0x34/0x70
<4>[16097.962635] ? __switch_to_asm+0x40/0x70
<4>[16097.962654] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16097.962713] ? process_one_work+0x1ad/0x370
<4>[16097.962717] ? worker_thread+0x30/0x390
<4>[16097.962722] ? create_worker+0x1a0/0x1a0
<4>[16097.962737] ? kthread+0x112/0x130
<4>[16097.962742] ? kthread_park+0x80/0x80
<4>[16097.962747] ? ret_from_fork+0x35/0x40
<3>[16097.962758] INFO: task kworker/25:1:1747 blocked for more than 122 seconds.
<3>[16097.963233] Tainted: G E 5.7.0-rc5+ #80
<3>[16097.963792] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6>[16097.964298] kworker/25:1 D 0 1747 2 0x80004080
<6>[16097.964325] Workqueue: ceph-msgr ceph_con_workfn [libceph]
<4>[16097.964331] Call Trace:
<4>[16097.964340] ? __schedule+0x276/0x6e0
<4>[16097.964344] ? schedule+0x40/0xb0
<4>[16097.964347] ? schedule_preempt_disabled+0xa/0x10
<4>[16097.964351] ? __mutex_lock.isra.8+0x2b5/0x4a0
<4>[16097.964376] ? handle_reply+0x33f/0x6f0 [ceph]
<4>[16097.964407] ? dispatch+0xa6/0xbc0 [ceph]
<4>[16097.964429] ? read_partial_message+0x214/0x770 [libceph]
<4>[16097.964449] ? try_read+0x78b/0x11e0 [libceph]
<4>[16097.964454] ? __switch_to_asm+0x40/0x70
<4>[16097.964458] ? __switch_to_asm+0x34/0x70
<4>[16097.964461] ? __switch_to_asm+0x40/0x70
<4>[16097.964465] ? __switch_to_asm+0x34/0x70
<4>[16097.964470] ? __switch_to_asm+0x40/0x70
<4>[16097.964489] ? ceph_con_workfn+0x130/0x5e0 [libceph]
<4>[16097.964494] ? process_one_work+0x1ad/0x370
<4>[16097.964498] ? worker_thread+0x30/0x390
<4>[16097.964501] ? create_worker+0x1a0/0x1a0
<4>[16097.964506] ? kthread+0x112/0x130
<4>[16097.964511] ? kthread_park+0x80/0x80
<4>[16097.964516] ? ret_from_fork+0x35/0x40
URL: https://tracker.ceph.com/issues/45609
Reported-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
fs/ceph/mds_client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 6c283c5..0e0ab01 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3769,8 +3769,6 @@ static int encode_snap_realms(struct ceph_mds_client *mdsc,
* recovering MDS might have.
*
* This is a relatively heavyweight operation, but it's rare.
- *
- * called with mdsc->mutex held.
*/
static void send_mds_reconnect(struct ceph_mds_client *mdsc,
struct ceph_mds_session *session)
@@ -4024,7 +4022,9 @@ static void check_new_map(struct ceph_mds_client *mdsc,
oldstate != CEPH_MDS_STATE_STARTING)
pr_info("mds%d recovery completed\n", s->s_mds);
kick_requests(mdsc, i);
+ mutex_unlock(&mdsc->mutex);
mutex_lock(&s->s_mutex);
+ mutex_lock(&mdsc->mutex);
ceph_kick_flushing_caps(mdsc, s);
mutex_unlock(&s->s_mutex);
wake_up_session_caps(s, RECONNECT);
--
1.8.3.1
Thread overview: 6+ messages
2020-05-20 7:51 xiubli [this message]
2020-05-20 11:44 ` [PATCH] ceph: make sure the mdsc->mutex is nested in s->s_mutex to fix deadlock Jeff Layton
2020-05-27 12:25 ` Ilya Dryomov
2020-05-27 12:47 ` Xiubo Li
2020-05-27 13:47 ` Jeff Layton
2020-05-27 13:53 ` Xiubo Li