* rbd unmap deadlock
From: Hannes Landeholm @ 2014-05-02 16:04 UTC
To: Ceph Development

Hi, I just had a rbd unmap operation deadlock on my development
machine. The file system was in heavy use before I did it but I have a
sync barrier before the umount and unmap so it shouldn't matter. The
rbd unmap hung in "State: D (disk sleep)". I have so far waited
over 10 minutes; this normally takes < 1 sec.

Here is the /proc/pid/stack output:

[<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
[<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
[<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
[<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
[<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
[<ffffffff8136ea67>] bus_attr_store+0x27/0x30
[<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
[<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
[<ffffffff811a67fa>] vfs_write+0xba/0x1e0
[<ffffffff811a7206>] SyS_write+0x46/0xc0
[<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Unfortunately our rbd.ko does not appear to have any debug symbols.

Other unmaps also hung after this that have the same parent. (We are
using layering.) Linux version: 3.14.1.

Thank you for your time,
--
Hannes Landeholm

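[Editor's note] The report above relies on two procfs files: the "State:" line in /proc/<pid>/status and the kernel stack in /proc/<pid>/stack. A minimal sketch of collecting the same information for every task stuck in uninterruptible sleep might look like the following. The function names parse_state and blocked_tasks are illustrative and not part of any Ceph tooling, and reading the stack file is assumed to require root.

```python
import os


def parse_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, kernel_stack) for each task in state D (disk sleep)."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "status")) as f:
                if parse_state(f.read()) != "D":
                    continue
            # Assumption: the caller has permission to read the stack file.
            with open(os.path.join(base, "stack")) as f:
                yield int(entry), f.read()
        except OSError:
            # The task exited in the meantime, or access was denied.
            continue


if __name__ == "__main__":
    for pid, stack in blocked_tasks():
        print(f"--- pid {pid} ---")
        print(stack, end="")
```

Run as root while an unmap is hung, this would print one stack per blocked task, which makes it easier to compare the traces of several stuck unmaps.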
* Re: rbd unmap deadlock
From: Hannes Landeholm @ 2014-05-02 16:09 UTC
To: Ceph Development

Correction: I just realized that the other hung unmaps do not have
the same parent. They are actually in different pools and unrelated.

Thank you for your time,
--
Hannes Landeholm

* Re: rbd unmap deadlock
From: Alex Elder @ 2014-05-02 16:09 UTC
To: Hannes Landeholm, Ceph Development

On 05/02/2014 11:04 AM, Hannes Landeholm wrote:
> Hi, I just had a rbd unmap operation deadlock on my development
> machine. The file system was in heavy use before I did it but I have a
> sync barrier before the umount and unmap so it shouldn't matter. The
> rbd unmap hung in "State: D (disk sleep)". I have so far waited
> over 10 minutes; this normally takes < 1 sec.
>
> Here is the /proc/pid/stack output:
>
> [<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
> [<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
> [<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
> [<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
> [<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
> [<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
> [<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
> [<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
> [<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
> [<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
> [<ffffffff8136ea67>] bus_attr_store+0x27/0x30
> [<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
> [<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
> [<ffffffff811a67fa>] vfs_write+0xba/0x1e0
> [<ffffffff811a7206>] SyS_write+0x46/0xc0
> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Unfortunately our rbd.ko does not appear to have any debug symbols.
>
> Other unmaps also hung after this that have the same parent. (We are
> using layering.) Linux version: 3.14.1.

Is this "stock" 3.14.1? Can you provide the full output of "uname -a"?
And if possible, either /proc/config.gz or /boot/config-3.13.1 (or
whichever file seems to match the currently-running kernel)?

Thank you.

-Alex

> Thank you for your time,
> --
> Hannes Landeholm

* Re: rbd unmap deadlock
From: Hannes Landeholm @ 2014-05-02 16:23 UTC
To: Alex Elder; +Cc: Ceph Development

On Fri, May 2, 2014 at 6:09 PM, Alex Elder <elder@ieee.org> wrote:
> On 05/02/2014 11:04 AM, Hannes Landeholm wrote:
>>
>> Hi, I just had a rbd unmap operation deadlock on my development
>> machine. The file system was in heavy use before I did it but I have a
>> sync barrier before the umount and unmap so it shouldn't matter. The
>> rbd unmap hung in "State: D (disk sleep)". I have so far waited
>> over 10 minutes; this normally takes < 1 sec.
>>
>> Here is the /proc/pid/stack output:
>>
>> [<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
>> [<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
>> [<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
>> [<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
>> [<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
>> [<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
>> [<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
>> [<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
>> [<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
>> [<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
>> [<ffffffff8136ea67>] bus_attr_store+0x27/0x30
>> [<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
>> [<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
>> [<ffffffff811a67fa>] vfs_write+0xba/0x1e0
>> [<ffffffff811a7206>] SyS_write+0x46/0xc0
>> [<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Unfortunately our rbd.ko does not appear to have any debug symbols.
>>
>> Other unmaps also hung after this that have the same parent. (We are
>> using layering.) Linux version: 3.14.1.
>
> Is this "stock" 3.14.1? Can you provide the full output of "uname -a"?
> And if possible, either /proc/config.gz or /boot/config-3.13.1 (or
> whichever file seems to match the currently-running kernel)?

Yes, this is a "stock" Arch 3.14.1 kernel with no custom patches.

uname: Linux localhost 3.14.1-1-js #1 SMP PREEMPT Tue Apr 15 17:59:05 CEST 2014 x86_64 GNU/Linux
config: http://pastebin.com/unZCzXZZ

Thank you for your time,
--
Hannes Landeholm

* Re: rbd unmap deadlock
From: Ilya Dryomov @ 2014-05-02 16:30 UTC
To: Hannes Landeholm; +Cc: Alex Elder, Ceph Development

Can you successfully map and then unmap a different image? What's the
general state of the cluster, i.e. the output of ceph -s?

Thanks,

                Ilya

* Re: rbd unmap deadlock
From: Hannes Landeholm @ 2014-05-02 16:40 UTC
To: Ilya Dryomov; +Cc: Alex Elder, Ceph Development

On Fri, May 2, 2014 at 6:30 PM, Ilya Dryomov <ilya.dryomov@inktank.com> wrote:
> Can you successfully map and then unmap a different image? What's the
> general state of the cluster, i.e. the output of ceph -s?
>
> Thanks,
>
>                 Ilya

Yes, that was possible; however (as I mentioned), some additional
unmaps also deadlocked (and some succeeded).

Unfortunately I've rebooted the devel machine now (which fixed the
issue). The status of ceph looks pretty much the same as before
though:

    cluster e1206f49-cc79-436e-b69d-375e0374d7a9
     health HEALTH_WARN
     monmap e1: 1 mons at {localhost=192.168.0.215:6789/0}, election epoch 1, quorum 0 localhost
     osdmap e550: 3 osds: 1 up, 1 in
      pgmap v153419: 892 pgs, 10 pools, 37194 MB data, 10703 objects
            49119 MB used, 299 GB / 349 GB avail
                 892 active+clean

2014-05-02 18:31:57.254360 mon.0 [INF] pgmap v153419: 892 pgs: 892 active+clean; 37194 MB data, 49119 MB used, 299 GB / 349 GB avail

FYI: This machine runs both the ceph cluster and the clients.

Thank you for your time,
--
Hannes Landeholm

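[Editor's note] The status output above reports HEALTH_WARN alongside an osdmap line of "3 osds: 1 up, 1 in". As a rough sketch, that line can be parsed to see how many OSDs are not up. The regular expression below assumes the plain-text status layout shown in this message (other Ceph releases may format it differently), and osd_counts is an illustrative helper, not a Ceph API. Nothing in the thread establishes that the down OSDs caused the hang.

```python
import re

# Matches the osdmap summary line as it appears in the status output above.
OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def osd_counts(status_text):
    """Return epoch/total/up/in counts from status text, or None if absent."""
    match = OSDMAP_RE.search(status_text)
    if match is None:
        return None
    epoch, total, up, in_ = (int(group) for group in match.groups())
    return {"epoch": epoch, "total": total, "up": up, "in": in_}


if __name__ == "__main__":
    counts = osd_counts("osdmap e550: 3 osds: 1 up, 1 in")
    print(counts)
    print("OSDs not up:", counts["total"] - counts["up"])
```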
* Re: rbd unmap deadlock
From: Ilya Dryomov @ 2014-05-02 16:52 UTC
To: Hannes Landeholm; +Cc: Alex Elder, Ceph Development

On Fri, May 2, 2014 at 8:40 PM, Hannes Landeholm <hannes@jumpstarter.io> wrote:
> FYI: This machine runs both the ceph cluster and the clients.

I'll file a ticket. I think I saw something similar some time ago (at
least the stack trace looks familiar), and that was also a dev box
running both the servers and the kernel client.

Thanks,

                Ilya
