* Circular lock / deadlock in kernel client
@ 2011-11-30 12:21 Amon Ott
2011-11-30 17:20 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-11-30 12:21 UTC (permalink / raw)
To: ceph-devel
[-- Attachment #1: Type: text/plain, Size: 917 bytes --]
Hi!
With some kernel debug options for soft and hard lockup detection, I got some
fine traces. My kernel is a 3.1.4 to which I have ported from ceph-client
for-linus branch what is suitable for 3.1. If needed, I can make my exact
ceph code available.
Traces are attached. It seems that two depending locks can be acquired in
different order at different parts of the code, and thus lead to a deadlock.
Additionally, I am still trying to reproduce a partial lockup of single dirs
with this debugging. Those are likely to be related to mutex locking dirs
without unlocking properly.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
[-- Attachment #2: circ1.log --]
[-- Type: text/x-log, Size: 7178 bytes --]
Nov 30 10:40:21 tgpro1 kernel: =======================================================
Nov 30 10:40:21 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 10:40:21 tgpro1 kernel: 3.1.3-rsbac #1
Nov 30 10:40:21 tgpro1 kernel: -------------------------------------------------------
Nov 30 10:40:21 tgpro1 kernel: kworker/0:0/8787 is trying to acquire lock:
Nov 30 10:40:21 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: but task is already holding lock:
Nov 30 10:40:21 tgpro1 kernel: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006e0de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: -> #1 (&(&s->s_cap_lock)->rlock){+.+...}:
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<006dc0bc>] ceph_add_cap+0x302/0x551 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006cfbfd>] fill_inode+0x5ef/0x72e [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006d0d6b>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e4c6e>] dispatch+0xafe/0x10e7 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 10:40:21 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<006e0e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e3bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e4035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006ce00d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a63dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: other info that might help us debug this:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: CPU0 CPU1
Nov 30 10:40:21 tgpro1 kernel: ---- ----
Nov 30 10:40:21 tgpro1 kernel: lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:40:21 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 10:40:21 tgpro1 kernel: lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:40:21 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: *** DEADLOCK ***
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: 5 locks held by kworker/0:0/8787:
Nov 30 10:40:21 tgpro1 kernel: #0: (ceph-msgr){.+.+.+}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: #2: (&s->s_mutex){+.+.+.}, at: [<006e3954>] send_mds_reconnect+0xa3/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: #3: (&mdsc->snap_rwsem){+++++.}, at: [<006e3a4c>] send_mds_reconnect+0x19b/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: #4: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006e0de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel:
Nov 30 10:40:21 tgpro1 kernel: stack backtrace:
Nov 30 10:40:21 tgpro1 kernel: Pid: 8787, comm: kworker/0:0 Tainted: G W 3.1.3-rsbac #1
Nov 30 10:40:21 tgpro1 kernel: Call Trace:
Nov 30 10:40:21 tgpro1 kernel: [<0008e776>] print_circular_bug+0x21a/0x227
Nov 30 10:40:21 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:40:21 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] ? igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<0052f167>] _raw_spin_lock+0x24/0x33
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] ? igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<000f7176>] igrab+0x11/0x41
Nov 30 10:40:21 tgpro1 kernel: [<006e0e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e5488>] ? ceph_mdsc_submit_request+0x55/0x55 [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a4d03>] ? ceph_pagelist_append+0xbc/0xf9 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<006e3bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<0008daa1>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 10:40:21 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:40:21 tgpro1 kernel: [<006e4035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<000d7cd3>] ? __kmalloc+0x10d/0x137
Nov 30 10:40:21 tgpro1 kernel: [<00c40015>] ? 0xc40014
Nov 30 10:40:21 tgpro1 kernel: [<006ce00d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:40:21 tgpro1 kernel: [<006a63dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:40:21 tgpro1 kernel: [<006a444d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:40:21 tgpro1 kernel: [<0007a1ff>] ? process_one_work+0x1d7/0x391
Nov 30 10:40:21 tgpro1 kernel: [<006a2f56>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 10:40:21 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:40:21 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:40:21 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:40:21 tgpro1 kernel: [<0007a5c5>] ? rescuer_thread+0x20c/0x20c
Nov 30 10:40:21 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:40:21 tgpro1 kernel: [<0007d705>] ? __init_kthread_worker+0x42/0x42
Nov 30 10:40:21 tgpro1 kernel: [<00530b26>] kernel_thread_helper+0x6/0xd
Nov 30 10:40:26 tgpro1 kernel: ceph: mds0 reconnect success
Nov 30 10:40:40 tgpro1 kernel: ceph: mds0 recovery completed
[-- Attachment #3: circ2.log --]
[-- Type: text/x-log, Size: 7686 bytes --]
Nov 30 10:50:53 tgpro1 kernel: libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
Nov 30 10:50:53 tgpro1 kernel: ceph: loaded (mds proto 32)
Nov 30 10:50:53 tgpro1 kernel: libceph: client6597 fsid 520649ba-50cc-2c58-7db3-4b30d3bb97d3
Nov 30 10:50:53 tgpro1 kernel: libceph: mon0 192.168.111.1:6789 session established
Nov 30 10:56:43 tgpro1 kernel: ceph: mds0 caps stale
Nov 30 10:57:03 tgpro1 kernel: ceph: mds0 caps stale
Nov 30 10:58:24 tgpro1 kernel: ceph: mds0 reconnect start
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: =======================================================
Nov 30 10:58:24 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 10:58:24 tgpro1 kernel: 3.1.4-rsbac #1
Nov 30 10:58:24 tgpro1 kernel: -------------------------------------------------------
Nov 30 10:58:24 tgpro1 kernel: kworker/0:0/4 is trying to acquire lock:
Nov 30 10:58:24 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: but task is already holding lock:
Nov 30 10:58:24 tgpro1 kernel: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006b2de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: -> #1 (&(&s->s_cap_lock)->rlock){+.+...}:
Nov 30 10:58:24 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:24 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:24 tgpro1 kernel: [<006ae0bc>] ceph_add_cap+0x302/0x551 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a1bfd>] fill_inode+0x5ef/0x72e [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a2d6b>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b6c6e>] dispatch+0xafe/0x10e7 [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:24 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:24 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:24 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:24 tgpro1 kernel:
Nov 30 10:58:24 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 10:58:24 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:58:24 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:24 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:24 tgpro1 kernel: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:24 tgpro1 kernel: [<006b2e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b5bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006b6035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006a000d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:58:24 tgpro1 kernel: [<006783dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:24 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:24 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:25 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:25 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: other info that might help us debug this:
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: CPU0 CPU1
Nov 30 10:58:25 tgpro1 kernel: ---- ----
Nov 30 10:58:25 tgpro1 kernel: lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:58:25 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 10:58:25 tgpro1 kernel: lock(&(&s->s_cap_lock)->rlock);
Nov 30 10:58:25 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: *** DEADLOCK ***
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: 5 locks held by kworker/0:0/4:
Nov 30 10:58:25 tgpro1 kernel: #0: (ceph-msgr){.+.+.+}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007a1ff>] process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: #2: (&s->s_mutex){+.+.+.}, at: [<006b5954>] send_mds_reconnect+0xa3/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: #3: (&mdsc->snap_rwsem){+++++.}, at: [<006b5a4c>] send_mds_reconnect+0x19b/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: #4: (&(&s->s_cap_lock)->rlock){+.+...}, at: [<006b2de4>] iterate_session_caps+0x40/0x17d [ceph]
Nov 30 10:58:25 tgpro1 kernel:
Nov 30 10:58:25 tgpro1 kernel: stack backtrace:
Nov 30 10:58:25 tgpro1 kernel: Pid: 4, comm: kworker/0:0 Tainted: G W 3.1.4-rsbac #1
Nov 30 10:58:25 tgpro1 kernel: Call Trace:
Nov 30 10:58:25 tgpro1 kernel: [<0008e776>] print_circular_bug+0x21a/0x227
Nov 30 10:58:25 tgpro1 kernel: [<0008fa3e>] __lock_acquire+0xe79/0x1425
Nov 30 10:58:25 tgpro1 kernel: [<0009038d>] lock_acquire+0x42/0x59
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] ? igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<0052f147>] _raw_spin_lock+0x24/0x33
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] ? igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<000f7166>] igrab+0x11/0x41
Nov 30 10:58:25 tgpro1 kernel: [<006b2e14>] iterate_session_caps+0x70/0x17d [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<006b7488>] ? ceph_mdsc_submit_request+0x55/0x55 [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<00676d03>] ? ceph_pagelist_append+0xbc/0xf9 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<006b5bca>] send_mds_reconnect+0x319/0x46a [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<0008daa1>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 10:58:25 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:58:25 tgpro1 kernel: [<006b6035>] ceph_mdsc_handle_map+0x2f2/0x42d [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<000d7cd3>] ? __kmalloc+0x10d/0x137
Nov 30 10:58:25 tgpro1 kernel: [<00c40015>] ? 0xc40014
Nov 30 10:58:25 tgpro1 kernel: [<006a000d>] extra_mon_dispatch+0x18/0x1f [ceph]
Nov 30 10:58:25 tgpro1 kernel: [<006783dd>] dispatch+0x4f7/0x52a [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0008dadd>] ? trace_hardirqs_on+0xb/0xd
Nov 30 10:58:25 tgpro1 kernel: [<0067644d>] con_work+0x14f7/0x16c6 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0007a24b>] process_one_work+0x223/0x391
Nov 30 10:58:25 tgpro1 kernel: [<0007a1ff>] ? process_one_work+0x1d7/0x391
Nov 30 10:58:25 tgpro1 kernel: [<00674f56>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 10:58:25 tgpro1 kernel: [<0007a747>] worker_thread+0x182/0x2d7
Nov 30 10:58:25 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:58:25 tgpro1 kernel: [<00002a00>] ? __show_regs+0x6f/0x6f
Nov 30 10:58:25 tgpro1 kernel: [<0007a5c5>] ? rescuer_thread+0x20c/0x20c
Nov 30 10:58:25 tgpro1 kernel: [<0007d767>] kthread+0x62/0x67
Nov 30 10:58:25 tgpro1 kernel: [<0007d705>] ? __init_kthread_worker+0x42/0x42
Nov 30 10:58:25 tgpro1 kernel: [<00530b06>] kernel_thread_helper+0x6/0xd
Nov 30 10:58:26 tgpro1 kernel: ceph: mds0 reconnect success
Nov 30 10:58:30 tgpro1 kernel: ceph: mds0 recovery completed
[-- Attachment #4: circ3.log --]
[-- Type: text/x-log, Size: 6696 bytes --]
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: =======================================================
Nov 30 13:02:05 tgpro1 kernel: [ INFO: possible circular locking dependency detected ]
Nov 30 13:02:05 tgpro1 kernel: 3.1.4-rsbac #1
Nov 30 13:02:05 tgpro1 kernel: -------------------------------------------------------
Nov 30 13:02:05 tgpro1 kernel: kworker/0:1/22099 is trying to acquire lock:
Nov 30 13:02:05 tgpro1 kernel: (&sb->s_type->i_lock_key#16){+.+...}, at: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: but task is already holding lock:
Nov 30 13:02:05 tgpro1 kernel: (&(&realm->inodes_with_caps_lock)->rlock){+.+...}, at: [<006ce9b4>] ceph_update_snap_trace+0x258/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: which lock already depends on the new lock.
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: the existing dependency chain (in reverse order) is:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: -> #1 (&(&realm->inodes_with_caps_lock)->rlock){+.+...}:
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<006cc0a3>] ceph_add_cap+0x37d/0x554 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006bfbf7>] fill_inode+0x5ee/0x72d [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006c0d61>] ceph_fill_trace+0x663/0x6c5 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006d4bd4>] dispatch+0xaff/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: -> #0 (&sb->s_type->i_lock_key#16){+.+...}:
Nov 30 13:02:05 tgpro1 kernel: [<00090b06>] __lock_acquire+0xe7b/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<006ce9d3>] ceph_update_snap_trace+0x277/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006cee4e>] ceph_handle_snap+0x336/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006d4e7d>] dispatch+0xda8/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: other info that might help us debug this:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: Possible unsafe locking scenario:
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: CPU0 CPU1
Nov 30 13:02:05 tgpro1 kernel: ---- ----
Nov 30 13:02:05 tgpro1 kernel: lock(&(&realm->inodes_with_caps_lock)->rlock);
Nov 30 13:02:05 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 13:02:05 tgpro1 kernel: lock(&(&realm->inodes_with_caps_lock)->rlock);
Nov 30 13:02:05 tgpro1 kernel: lock(&sb->s_type->i_lock_key);
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: *** DEADLOCK ***
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: 4 locks held by kworker/0:1/22099:
Nov 30 13:02:05 tgpro1 kernel: #0: (ceph-msgr){++++.+}, at: [<0007b2e5>] process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: #1: ((&(&con->work)->work)){+.+.+.}, at: [<0007b2e5>] process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: #2: (&mdsc->snap_rwsem){+++++.}, at: [<006cebe5>] ceph_handle_snap+0xcd/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: #3: (&(&realm->inodes_with_caps_lock)->rlock){+.+...}, at: [<006ce9b4>] ceph_update_snap_trace+0x258/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel:
Nov 30 13:02:05 tgpro1 kernel: stack backtrace:
Nov 30 13:02:05 tgpro1 kernel: Pid: 22099, comm: kworker/0:1 Tainted: G W 3.1.4-rsbac #1
Nov 30 13:02:05 tgpro1 kernel: Call Trace:
Nov 30 13:02:05 tgpro1 kernel: [<0008f840>] print_circular_bug+0x219/0x226
Nov 30 13:02:05 tgpro1 kernel: [<00090b06>] __lock_acquire+0xe7b/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00090b54>] ? __lock_acquire+0xec9/0x1428
Nov 30 13:02:05 tgpro1 kernel: [<00091456>] lock_acquire+0x42/0x59
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] ? igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<0053f187>] _raw_spin_lock+0x24/0x33
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] ? igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<000ff721>] igrab+0x11/0x41
Nov 30 13:02:05 tgpro1 kernel: [<006ce9d3>] ceph_update_snap_trace+0x277/0x3bc [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<006cee4e>] ceph_handle_snap+0x336/0x458 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<007f0312>] ? 0x7f0311
Nov 30 13:02:05 tgpro1 kernel: [<006d4e7d>] dispatch+0xda8/0x10e9 [ceph]
Nov 30 13:02:05 tgpro1 kernel: [<0048738c>] ? kernel_recvmsg+0x2a/0x34
Nov 30 13:02:05 tgpro1 kernel: [<0053dbb6>] ? __mutex_unlock_slowpath+0xdf/0xf8
Nov 30 13:02:05 tgpro1 kernel: [<0008eb6f>] ? trace_hardirqs_on_caller+0x10b/0x13c
Nov 30 13:02:05 tgpro1 kernel: [<0008ebab>] ? trace_hardirqs_on+0xb/0xd
Nov 30 13:02:05 tgpro1 kernel: [<00694459>] con_work+0x14f4/0x16c3 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0000784e>] ? text_poke+0x79/0x95
Nov 30 13:02:05 tgpro1 kernel: [<0007b331>] process_one_work+0x229/0x397
Nov 30 13:02:05 tgpro1 kernel: [<0007b2e5>] ? process_one_work+0x1dd/0x397
Nov 30 13:02:05 tgpro1 kernel: [<00692f65>] ? ceph_fault+0x262/0x262 [libceph]
Nov 30 13:02:05 tgpro1 kernel: [<0007b80e>] worker_thread+0x182/0x2d8
Nov 30 13:02:05 tgpro1 kernel: [<00002b40>] ? setup_sigcontext+0x24/0x24
Nov 30 13:02:05 tgpro1 kernel: [<00002b40>] ? setup_sigcontext+0x24/0x24
Nov 30 13:02:05 tgpro1 kernel: [<0007b68c>] ? rescuer_thread+0x1ed/0x1ed
Nov 30 13:02:05 tgpro1 kernel: [<0007e843>] kthread+0x62/0x67
Nov 30 13:02:05 tgpro1 kernel: [<0007e7e1>] ? __init_kthread_worker+0x42/0x42
Nov 30 13:02:05 tgpro1 kernel: [<00540c86>] kernel_thread_helper+0x6/0xd
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: Circular lock / deadlock in kernel client
2011-11-30 12:21 Circular lock / deadlock in kernel client Amon Ott
@ 2011-11-30 17:20 ` Sage Weil
2011-11-30 17:55 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2011-11-30 17:20 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
On Wed, 30 Nov 2011, Amon Ott wrote:
> Hi!
>
> With some kernel debug options for soft and hard lockup detection, I got some
> fine traces. My kernel is a 3.1.4 to which I have ported from ceph-client
> for-linus branch what is suitable for 3.1. If needed, I can make my exact
> ceph code available.
>
> Traces are attached. It seems that two depending locks can be acquired in
> different order at different parts of the code, and thus lead to a deadlock.
> Additionally, I am still trying to reproduce a partial lockup of single dirs
> with this debugging. Those are likely to be related to mutex locking dirs
> without unlocking properly.
Thanks, put these in the tracker at
http://tracker.newdream.net/issues/1762.
sage
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-11-30 17:20 ` Sage Weil
@ 2011-11-30 17:55 ` Sage Weil
2011-12-01 8:16 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2011-11-30 17:55 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
Hi Amon,
On Wed, 30 Nov 2011, Sage Weil wrote:
> On Wed, 30 Nov 2011, Amon Ott wrote:
> > Hi!
> >
> > With some kernel debug options for soft and hard lockup detection, I got some
> > fine traces. My kernel is a 3.1.4 to which I have ported from ceph-client
> > for-linus branch what is suitable for 3.1. If needed, I can make my exact
> > ceph code available.
> >
> > Traces are attached. It seems that two depending locks can be acquired in
> > different order at different parts of the code, and thus lead to a deadlock.
> > Additionally, I am still trying to reproduce a partial lockup of single dirs
> > with this debugging. Those are likely to be related to mutex locking dirs
> > without unlocking properly.
>
> Thanks, put these in the tracker at
> http://tracker.newdream.net/issues/1762.
I pushed a wip-i-ceph-lock branch to ceph-client.git that replaces our
(ab?)use of i_lock with a new i_ceph_lock in the ceph inode. This avoids
being bitten by the lock ordering constraint imposed by igrab(), which
requires i_lock to safely take a reference to an inode without racing with
inode destruction. This lets us keep two inode list locks logically
ordered inside i_ceph_lock (with i_lock as an inner lock).
I did some very basic testing and it didn't blow up. If can give it a
try, that would be very helpful.
Also, we need to enable lockdep in our qa environment and make sure
teuthology is erroring out on lockdep warnings. (#1763)
Thanks!
sage
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-11-30 17:55 ` Sage Weil
@ 2011-12-01 8:16 ` Amon Ott
2011-12-01 16:12 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-12-01 8:16 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Wednesday 30 November 2011 wrote Sage Weil:
> I pushed a wip-i-ceph-lock branch to ceph-client.git that replaces our
> (ab?)use of i_lock with a new i_ceph_lock in the ceph inode. This avoids
> being bitten by the lock ordering constraint imposed by igrab(), which
> requires i_lock to safely take a reference to an inode without racing with
> inode destruction. This lets us keep two inode list locks logically
> ordered inside i_ceph_lock (with i_lock as an inner lock).
I see the branch, but there is nothing new in it. Is that the right location?
Maybe forgot to push?
https://github.com/NewDreamNetwork/ceph-client/tree/wip-i-ceph-lock
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-01 8:16 ` Amon Ott
@ 2011-12-01 16:12 ` Sage Weil
2011-12-01 16:24 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2011-12-01 16:12 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1350 bytes --]
On Thu, 1 Dec 2011, Amon Ott wrote:
> On Wednesday 30 November 2011 wrote Sage Weil:
> > I pushed a wip-i-ceph-lock branch to ceph-client.git that replaces our
> > (ab?)use of i_lock with a new i_ceph_lock in the ceph inode. This avoids
> > being bitten by the lock ordering constraint imposed by igrab(), which
> > requires i_lock to safely take a reference to an inode without racing with
> > inode destruction. This lets us keep two inode list locks logically
> > ordered inside i_ceph_lock (with i_lock as an inner lock).
>
> I see the branch, but there is nothing new in it. Is that the right location?
> Maybe forgot to push?
>
> https://github.com/NewDreamNetwork/ceph-client/tree/wip-i-ceph-lock
Oops, I pushed the wrong branch. It's there now.
Sorry!
sage
>
> Amon Ott
> --
> Dr. Amon Ott
> m-privacy GmbH Tel: +49 30 24342334
> Am Köllnischen Park 1 Fax: +49 30 24342336
> 10179 Berlin http://www.m-privacy.de
>
> Amtsgericht Charlottenburg, HRB 84946
>
> Geschäftsführer:
> Dipl.-Kfm. Holger Maczkowsky,
> Roman Maczkowsky
>
> GnuPG-Key-ID: 0x2DD3A649
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-01 16:12 ` Sage Weil
@ 2011-12-01 16:24 ` Amon Ott
2011-12-06 13:21 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-12-01 16:24 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Thursday 01 December 2011 wrote Sage Weil:
> On Thu, 1 Dec 2011, Amon Ott wrote:
> > On Wednesday 30 November 2011 wrote Sage Weil:
> > > I pushed a wip-i-ceph-lock branch to ceph-client.git that replaces our
> > > (ab?)use of i_lock with a new i_ceph_lock in the ceph inode. This
> > > avoids being bitten by the lock ordering constraint imposed by igrab(),
> > > which requires i_lock to safely take a reference to an inode without
> > > racing with inode destruction. This lets us keep two inode list locks
> > > logically ordered inside i_ceph_lock (with i_lock as an inner lock).
> >
> > I see the branch, but there is nothing new in it. Is that the right
> > location? Maybe forgot to push?
> >
> > https://github.com/NewDreamNetwork/ceph-client/tree/wip-i-ceph-lock
>
> Oops, I pushed the wrong branch. It's there now.
Got the new commit, but it does not apply cleanly to my tree. I will try to
get it merged tomorrow and retry. I cannot use kernel 3.2-pre here, the rest
of the system needs to be stable.
Interesting note: Even though I got the deadlock message in syslog, the system
worked fine for another few hours, until all the MDS processes died (see my
other mail).
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-01 16:24 ` Amon Ott
@ 2011-12-06 13:21 ` Amon Ott
2011-12-07 15:20 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-12-06 13:21 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Thursday 01 December 2011 wrote Amon Ott:
> On Thursday 01 December 2011 wrote Sage Weil:
> > On Thu, 1 Dec 2011, Amon Ott wrote:
> > > On Wednesday 30 November 2011 wrote Sage Weil:
> > > > I pushed a wip-i-ceph-lock branch to ceph-client.git that replaces
> > > > our (ab?)use of i_lock with a new i_ceph_lock in the ceph inode.
> > > > This avoids being bitten by the lock ordering constraint imposed by
> > > > igrab(), which requires i_lock to safely take a reference to an inode
> > > > without racing with inode destruction. This lets us keep two inode
> > > > list locks logically ordered inside i_ceph_lock (with i_lock as an
> > > > inner lock).
> > >
> > > I see the branch, but there is nothing new in it. Is that the right
> > > location? Maybe forgot to push?
> > >
> > > https://github.com/NewDreamNetwork/ceph-client/tree/wip-i-ceph-lock
> >
> > Oops, I pushed the wrong branch. It's there now.
>
> Got the new commit, but it does not apply cleanly to my tree. I will try to
> get it merged tomorrow and retry. I cannot use kernel 3.2-pre here, the
> rest of the system needs to be stable.
Merged in and bug seems to be fixed. No more deadlock warnings today.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-06 13:21 ` Amon Ott
@ 2011-12-07 15:20 ` Amon Ott
2011-12-07 16:33 ` Sage Weil
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-12-07 15:20 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
[-- Attachment #1: Type: text/plain, Size: 541 bytes --]
On Tuesday 06 December 2011 wrote Amon Ott:
> Merged in and bug seems to be fixed. No more deadlock warnings today.
Unfortunately, I got another deadlock message in the log today. Full log of
one boot time is attached.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
[-- Attachment #2: circ4.log.bz2 --]
[-- Type: application/x-bzip2, Size: 14561 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-07 15:20 ` Amon Ott
@ 2011-12-07 16:33 ` Sage Weil
2011-12-08 9:58 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Sage Weil @ 2011-12-07 16:33 UTC (permalink / raw)
To: Amon Ott; +Cc: ceph-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1033 bytes --]
I pushed a patch to wip-d-lock that may fix this one, but unfortunately
don't have time to test this very carefully right now. Let us know if
that helps, or you can wait until next week.
The call path that was triggering both of these can be exercised by
restarting the ceph-mds daemon. Try running your client for a bit and
the doing that and see if you get any more splats.
Thanks!
sage
On Wed, 7 Dec 2011, Amon Ott wrote:
> On Tuesday 06 December 2011 wrote Amon Ott:
> > Merged in and bug seems to be fixed. No more deadlock warnings today.
>
> Unfortunately, I got another deadlock message in the log today. Full log of
> one boot time is attached.
>
> Amon Ott
> --
> Dr. Amon Ott
> m-privacy GmbH Tel: +49 30 24342334
> Am Köllnischen Park 1 Fax: +49 30 24342336
> 10179 Berlin http://www.m-privacy.de
>
> Amtsgericht Charlottenburg, HRB 84946
>
> Geschäftsführer:
> Dipl.-Kfm. Holger Maczkowsky,
> Roman Maczkowsky
>
> GnuPG-Key-ID: 0x2DD3A649
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-07 16:33 ` Sage Weil
@ 2011-12-08 9:58 ` Amon Ott
2011-12-08 13:39 ` Amon Ott
0 siblings, 1 reply; 11+ messages in thread
From: Amon Ott @ 2011-12-08 9:58 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Wednesday 07 December 2011 wrote Sage Weil:
> I pushed a patch to wip-d-lock that may fix this one, but unfortunately
> don't have time to test this very carefully right now. Let us know if
> that helps, or you can wait until next week.
Rebuilding kernel with that patch.
> The call path that was triggering both of these can be exercised by
> restarting the ceph-mds daemon. Try running your client for a bit and
> the doing that and see if you get any more splats.
What triggered the kernel problem was bug 1047. ceph-mds crashed on all nodes
with that assert. When the kernel detected that the main mds connection was
missing, it tried to reconnect and hung.
We do appreciate how much work you are doing to make Ceph production ready.
Thank you!
We really want to use Ceph in our product, because the design and experience
so far have shown a lot of potential.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Circular lock / deadlock in kernel client
2011-12-08 9:58 ` Amon Ott
@ 2011-12-08 13:39 ` Amon Ott
0 siblings, 0 replies; 11+ messages in thread
From: Amon Ott @ 2011-12-08 13:39 UTC (permalink / raw)
To: Sage Weil; +Cc: ceph-devel
On Thursday 08 December 2011 wrote Amon Ott:
> On Wednesday 07 December 2011 wrote Sage Weil:
> > I pushed a patch to wip-d-lock that may fix this one, but unfortunately
> > don't have time to test this very carefully right now. Let us know if
> > that helps, or you can wait until next week.
>
> Rebuilding kernel with that patch.
With the patch the deadlock disappears and the kernel does not hang at the
locks afterwards.
> > The call path that was triggering both of these can be exercised by
> > restarting the ceph-mds daemon. Try running your client for a bit and
> > the doing that and see if you get any more splats.
>
> What triggered the kernel problem was bug 1047. ceph-mds crashed on all
> nodes with that assert. When the kernel detected that the main mds
> connection was missing, it tried to reconnect and hung.
This problem remains, and without a working mds the whole ceph mount hangs.
Instead of crashing, mds sometimes goes into a dead loop and uses a cpu core
at 100%. This also makes the mount hang.
Amon Ott
--
Dr. Amon Ott
m-privacy GmbH Tel: +49 30 24342334
Am Köllnischen Park 1 Fax: +49 30 24342336
10179 Berlin http://www.m-privacy.de
Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer:
Dipl.-Kfm. Holger Maczkowsky,
Roman Maczkowsky
GnuPG-Key-ID: 0x2DD3A649
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-12-08 13:39 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-30 12:21 Circular lock / deadlock in kernel client Amon Ott
2011-11-30 17:20 ` Sage Weil
2011-11-30 17:55 ` Sage Weil
2011-12-01 8:16 ` Amon Ott
2011-12-01 16:12 ` Sage Weil
2011-12-01 16:24 ` Amon Ott
2011-12-06 13:21 ` Amon Ott
2011-12-07 15:20 ` Amon Ott
2011-12-07 16:33 ` Sage Weil
2011-12-08 9:58 ` Amon Ott
2011-12-08 13:39 ` Amon Ott
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.