* Re: Possible deadlock condition
  [not found] <CA+jddaPMp5R0adi2sLVUWeFytDzfOjAeryXL+jPjGAk8kKqafg@mail.gmail.com>
@ 2012-06-18 22:17 ` Mandell Degerness
  2012-06-18 22:57   ` Dan Mick
  0 siblings, 1 reply; 6+ messages in thread
From: Mandell Degerness @ 2012-06-18 22:17 UTC
To: ceph-devel

Here is, perhaps, a more useful traceback from a different run of
tests that we just ran into:

Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task flush-254:0:29582 blocked for more than 120 seconds.
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681458] flush-254:0 D ffff880bd9ca2fc0 0 29582 2 0x00000000
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681740] ffff88006e51d160 0000000000000046 0000000000000002 ffff88061b362040
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682173] ffff88006e51d160 00000000000120c0 00000000000120c0 00000000000120c0
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682659] ffff88006e51dfd8 00000000000120c0 00000000000120c0 ffff88006e51dfd8
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683088] Call Trace:
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302] [<ffffffff81520132>] schedule+0x5a/0x5c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683514] [<ffffffff815203e7>] schedule_timeout+0x36/0xe3
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683784] [<ffffffff8101e0b2>] ? physflat_send_IPI_mask+0xe/0x10
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683999] [<ffffffff8101a237>] ? native_smp_send_reschedule+0x46/0x48
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219] [<ffffffff811e0071>] ? list_move_tail+0x27/0x2c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432] [<ffffffff81520d13>] __down_common+0x90/0xd4
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708] [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684925] [<ffffffff81520dca>] __down+0x1d/0x1f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685139] [<ffffffff8105db4e>] down+0x2d/0x3d
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350] [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685565] [<ffffffff811e1120>] _xfs_buf_find+0x17f/0x210
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685836] [<ffffffff811e13b6>] xfs_buf_get+0x2a/0x177
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686052] [<ffffffff811e19f6>] xfs_buf_read+0x1f/0xca
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686270] [<ffffffff8122a0b7>] xfs_trans_read_buf+0x205/0x308
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686490] [<ffffffff81205e01>] xfs_btree_read_buf_block.clone.22+0x4f/0xa7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687015] [<ffffffff8122a3ee>] ? xfs_trans_log_buf+0xb2/0xc1
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687232] [<ffffffff81205edd>] xfs_btree_lookup_get_block+0x84/0xac
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687449] [<ffffffff81208e83>] xfs_btree_lookup+0x12b/0x3dc
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687721] [<ffffffff811f6bb2>] ? xfs_alloc_vextent+0x447/0x469
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687939] [<ffffffff811fd171>] xfs_bmbt_lookup_eq+0x1f/0x21
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688156] [<ffffffff811ffa88>] xfs_bmap_add_extent_delay_real+0x5b5/0xfec
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688378] [<ffffffff810f155b>] ? kmem_cache_alloc+0x87/0xf3
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688650] [<ffffffff81204c40>] ? xfs_bmbt_init_cursor+0x3f/0x107
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688867] [<ffffffff81201160>] xfs_bmapi_allocate+0x1f6/0x23a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689084] [<ffffffff812185bd>] ? xfs_iext_bno_to_irec+0x95/0xb9
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689301] [<ffffffff81203414>] xfs_bmapi_write+0x32d/0x5a2
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689519] [<ffffffff811e99e4>] xfs_iomap_write_allocate+0x1a5/0x29f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689797] [<ffffffff811df12a>] xfs_map_blocks+0x13e/0x1dd
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690016] [<ffffffff811dfbff>] xfs_vm_writepage+0x24e/0x410
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690233] [<ffffffff810bde1e>] __writepage+0x17/0x30
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690446] [<ffffffff810be6ed>] write_cache_pages+0x276/0x3c8
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690693] [<ffffffff810bde07>] ? set_page_dirty+0x60/0x60
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690908] [<ffffffff810be884>] generic_writepages+0x45/0x5c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691123] [<ffffffff811defcb>] xfs_vm_writepages+0x4d/0x54
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691337] [<ffffffff810bf832>] do_writepages+0x21/0x2a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691552] [<ffffffff811218f5>] writeback_single_inode+0x12a/0x2cc
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691800] [<ffffffff81121d92>] writeback_sb_inodes+0x174/0x215
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692016] [<ffffffff81122185>] __writeback_inodes_wb+0x78/0xb9
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692231] [<ffffffff811224b5>] wb_writeback+0x136/0x22a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692444] [<ffffffff810becd1>] ? determine_dirtyable_memory+0x1d/0x26
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692692] [<ffffffff81122d1e>] wb_do_writeback+0x19c/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692907] [<ffffffff81122dc5>] bdi_writeback_thread+0x8c/0x20f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693122] [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693336] [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693553] [<ffffffff8105911d>] kthread+0x82/0x8a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693803] [<ffffffff81523c34>] kernel_thread_helper+0x4/0x10
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694018] [<ffffffff8105909b>] ? kthread_worker_fn+0x13b/0x13b
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232] [<ffffffff81523c30>] ? gs_change+0xb/0xb

On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness
<mandell@pistoncloud.com> wrote:
> We've been seeing random issues of apparent deadlocks. We are running
> ceph 0.47 on kernel 3.2.18. OSDs are running on XFS file system.
> mysqld (which ran into the particular problems in the attached kernel
> log) is running on an RBD with XFS (mounted on a system which includes
> OSDs). We have sync_fs, and gcc ver 4.5.3-r2. The mysqld process in
> both instances returned an error to the calling process.
>
> Regards,
> Mandell Degerness
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
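
A hung-task report like the one above can be reduced to the blocked task and its call chain with a few lines of Python. This is only a sketch: the name parse_hung_task and both regular expressions are illustrative assumptions derived from the log lines in this message, not part of any kernel or Ceph tooling. Frames that the kernel prints with a "?" are addresses found on the stack that could not be confirmed as part of the call chain, so the sketch flags them instead of dropping them.

```python
import re

# One call-trace frame, for example:
#   [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
#   [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)
# The line that opens a hung-task report.
TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def parse_hung_task(log_text):
    """Return (task, pid, seconds, frames) for the first report, or None.

    frames lists (symbol, reliable) in printed order, innermost call first;
    reliable is False for frames the kernel marked with '?'.
    Assumption: log_text holds a single report, so every frame after the
    INFO line belongs to it.
    """
    task = TASK_RE.search(log_text)
    if task is None:
        return None
    frames = [
        (m.group(2), m.group(1) is None)
        for m in FRAME_RE.finditer(log_text, task.end())
    ]
    return task.group(1), int(task.group(2)), int(task.group(3)), frames


# Shortened excerpt of the report in this message.
SAMPLE = """\
[242522.680815] INFO: task flush-254:0:29582 blocked for more than 120 seconds.
[242522.683088] Call Trace:
[242522.683302] [<ffffffff81520132>] schedule+0x5a/0x5c
[242522.684708] [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
[242522.685350] [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
"""

if __name__ == "__main__":
    name, pid, secs, frames = parse_hung_task(SAMPLE)
    print(name, pid, secs)
    for symbol, reliable in frames:
        print(("  " if reliable else "? ") + symbol)
```

Read innermost first, the unmarked frames in the full trace show the flush thread sleeping in xfs_buf_lock while writing back dirty pages through xfs_vm_writepage.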
* Re: Possible deadlock condition
  2012-06-18 22:17 ` Possible deadlock condition Mandell Degerness
@ 2012-06-18 22:57 ` Dan Mick
  2012-06-18 23:08   ` Mandell Degerness
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Mick @ 2012-06-18 22:57 UTC
To: Mandell Degerness; +Cc: ceph-devel

Does the xfs on the OSD have plenty of free space left, or could this be an
allocation deadlock?
* Re: Possible deadlock condition
  2012-06-18 22:57 ` Dan Mick
@ 2012-06-18 23:08 ` Mandell Degerness
  2012-06-18 23:34   ` Dan Mick
  0 siblings, 1 reply; 6+ messages in thread
From: Mandell Degerness @ 2012-06-18 23:08 UTC
To: Dan Mick; +Cc: ceph-devel

None of the OSDs seem to be more than 82% full. I didn't think we were
running quite that close to the margin, but it is still far from
actually full.

On Mon, Jun 18, 2012 at 3:57 PM, Dan Mick <dan.mick@inktank.com> wrote:
> Does the xfs on the OSD have plenty of free space left, or could this be an
> allocation deadlock?
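
The fullness check discussed above can be scripted per mount point with Python's standard library. This is a minimal sketch under stated assumptions: the helper names percent_used and at_or_above are hypothetical, and the path in the example is a placeholder rather than an OSD data directory taken from this thread. Note that shutil.disk_usage reports used bytes against total bytes, which may differ slightly from the Use% column printed by df.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def at_or_above(paths, threshold=80.0):
    """Map each path whose filesystem is at least `threshold` percent full
    to its usage percentage."""
    report = {}
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            report[path] = pct
    return report


if __name__ == "__main__":
    # Placeholder path; substitute the real OSD data directories.
    for path, pct in at_or_above(["."], threshold=0.0).items():
        print(f"{path}: {pct:.1f}% used")
```

Such a check only reports how much space is in use on each filesystem; it does not by itself confirm or rule out an allocation deadlock.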
* Re: Possible deadlock condition
  2012-06-18 23:08 ` Mandell Degerness
@ 2012-06-18 23:34 ` Dan Mick
  2012-06-20 22:34   ` Mandell Degerness
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Mick @ 2012-06-18 23:34 UTC
To: Mandell Degerness; +Cc: ceph-devel

I don't know enough to know if there's a connection, but I do note this
prior thread that sounds kinda similar:

http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574

On 06/18/2012 04:08 PM, Mandell Degerness wrote:
> None of the OSDs seem to be more than 82% full. I didn't think we were
> running quite that close to the margin, but it is still far from
> actually full.
* Re: Possible deadlock condition
  2012-06-18 23:34 ` Dan Mick
@ 2012-06-20 22:34 ` Mandell Degerness
  2012-06-20 22:42   ` Sage Weil
  0 siblings, 1 reply; 6+ messages in thread
From: Mandell Degerness @ 2012-06-20 22:34 UTC
To: Dan Mick; +Cc: ceph-devel

The prior thread seems to refer to something fixed in 3.0.X; we are
running 3.2.18. Also, in answer to the previous question, we see the
error both on systems whose disks are 82% full and on systems whose
disks are only 5% full.

Does anyone have any ideas about how to resolve the deadlock? Do we
have to configure mysql differently?

-Mandell

On Mon, Jun 18, 2012 at 4:34 PM, Dan Mick <dan.mick@inktank.com> wrote:
> I don't know enough to know if there's a connection, but I do note this
> prior thread that sounds kinda similar:
>
> http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574
>
>
>
> On 06/18/2012 04:08 PM, Mandell Degerness wrote:
>>
>> None of the OSDs seem to be more than 82% full. I didn't think we were
>> running quite that close to the margin, but it is still far from
>> actually full.
>>
>>
>> On Mon, Jun 18, 2012 at 3:57 PM, Dan Mick<dan.mick@inktank.com> wrote:
>>>
>>> Does the xfs on the OSD have plenty of free space left, or could this be
>>> an
>>> allocation deadlock?
>>>
>>>
>>> On 06/18/2012 03:17 PM, Mandell Degerness wrote:
>>>>
>>>>
>>>> Here is, perhaps, a more useful traceback from a different run of
>>>> tests that we just ran into:
>>>>
>>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task
>>>> flush-254:0:29582 blocked for more than 120 seconds.
>>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681040] "echo 0>
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681458] flush-254:0 >>>> D ffff880bd9ca2fc0 0 29582 2 0x00000000 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681740] >>>> ffff88006e51d160 0000000000000046 0000000000000002 ffff88061b362040 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682173] >>>> ffff88006e51d160 00000000000120c0 00000000000120c0 00000000000120c0 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682659] >>>> ffff88006e51dfd8 00000000000120c0 00000000000120c0 ffff88006e51dfd8 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683088] Call Trace: >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302] >>>> [<ffffffff81520132>] schedule+0x5a/0x5c >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683514] >>>> [<ffffffff815203e7>] schedule_timeout+0x36/0xe3 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683784] >>>> [<ffffffff8101e0b2>] ? physflat_send_IPI_mask+0xe/0x10 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683999] >>>> [<ffffffff8101a237>] ? native_smp_send_reschedule+0x46/0x48 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219] >>>> [<ffffffff811e0071>] ? list_move_tail+0x27/0x2c >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432] >>>> [<ffffffff81520d13>] __down_common+0x90/0xd4 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708] >>>> [<ffffffff811e1120>] ? 
_xfs_buf_find+0x17f/0x210 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684925] >>>> [<ffffffff81520dca>] __down+0x1d/0x1f >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685139] >>>> [<ffffffff8105db4e>] down+0x2d/0x3d >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350] >>>> [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685565] >>>> [<ffffffff811e1120>] _xfs_buf_find+0x17f/0x210 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685836] >>>> [<ffffffff811e13b6>] xfs_buf_get+0x2a/0x177 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686052] >>>> [<ffffffff811e19f6>] xfs_buf_read+0x1f/0xca >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686270] >>>> [<ffffffff8122a0b7>] xfs_trans_read_buf+0x205/0x308 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686490] >>>> [<ffffffff81205e01>] xfs_btree_read_buf_block.clone.22+0x4f/0xa7 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687015] >>>> [<ffffffff8122a3ee>] ? xfs_trans_log_buf+0xb2/0xc1 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687232] >>>> [<ffffffff81205edd>] xfs_btree_lookup_get_block+0x84/0xac >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687449] >>>> [<ffffffff81208e83>] xfs_btree_lookup+0x12b/0x3dc >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687721] >>>> [<ffffffff811f6bb2>] ? xfs_alloc_vextent+0x447/0x469 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687939] >>>> [<ffffffff811fd171>] xfs_bmbt_lookup_eq+0x1f/0x21 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688156] >>>> [<ffffffff811ffa88>] xfs_bmap_add_extent_delay_real+0x5b5/0xfec >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688378] >>>> [<ffffffff810f155b>] ? kmem_cache_alloc+0x87/0xf3 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688650] >>>> [<ffffffff81204c40>] ? 
xfs_bmbt_init_cursor+0x3f/0x107 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688867] >>>> [<ffffffff81201160>] xfs_bmapi_allocate+0x1f6/0x23a >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689084] >>>> [<ffffffff812185bd>] ? xfs_iext_bno_to_irec+0x95/0xb9 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689301] >>>> [<ffffffff81203414>] xfs_bmapi_write+0x32d/0x5a2 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689519] >>>> [<ffffffff811e99e4>] xfs_iomap_write_allocate+0x1a5/0x29f >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689797] >>>> [<ffffffff811df12a>] xfs_map_blocks+0x13e/0x1dd >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690016] >>>> [<ffffffff811dfbff>] xfs_vm_writepage+0x24e/0x410 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690233] >>>> [<ffffffff810bde1e>] __writepage+0x17/0x30 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690446] >>>> [<ffffffff810be6ed>] write_cache_pages+0x276/0x3c8 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690693] >>>> [<ffffffff810bde07>] ? set_page_dirty+0x60/0x60 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690908] >>>> [<ffffffff810be884>] generic_writepages+0x45/0x5c >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691123] >>>> [<ffffffff811defcb>] xfs_vm_writepages+0x4d/0x54 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691337] >>>> [<ffffffff810bf832>] do_writepages+0x21/0x2a >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691552] >>>> [<ffffffff811218f5>] writeback_single_inode+0x12a/0x2cc >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691800] >>>> [<ffffffff81121d92>] writeback_sb_inodes+0x174/0x215 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692016] >>>> [<ffffffff81122185>] __writeback_inodes_wb+0x78/0xb9 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692231] >>>> [<ffffffff811224b5>] wb_writeback+0x136/0x22a >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692444] >>>> [<ffffffff810becd1>] ? 
determine_dirtyable_memory+0x1d/0x26 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692692] >>>> [<ffffffff81122d1e>] wb_do_writeback+0x19c/0x1b7 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692907] >>>> [<ffffffff81122dc5>] bdi_writeback_thread+0x8c/0x20f >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693122] >>>> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693336] >>>> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693553] >>>> [<ffffffff8105911d>] kthread+0x82/0x8a >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693803] >>>> [<ffffffff81523c34>] kernel_thread_helper+0x4/0x10 >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694018] >>>> [<ffffffff8105909b>] ? kthread_worker_fn+0x13b/0x13b >>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232] >>>> [<ffffffff81523c30>] ? gs_change+0xb/0xb >>>> >>>> >>>> On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness >>>> <mandell@pistoncloud.com> wrote: >>>>> >>>>> >>>>> We've been seeing random issues of apparent deadlocks. We are running >>>>> ceph 0.47 on kernel 3.2.18. OSDs are running on XFS file system. >>>>> mysqld (which ran into the particular problems in the attached kernel >>>>> log) is running on an RBD with XFS (mounted on a system which includes >>>>> OSDs). We have sync_fs, and gcc ver 4.5.3-r2. The mysqld process in >>>>> both instances returned an error to the calling process. 
>>>>> >>>>> Regards, >>>>> Mandell Degerness >>>> >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Possible deadlock condition
  2012-06-20 22:34 ` Mandell Degerness
@ 2012-06-20 22:42 ` Sage Weil
  0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2012-06-20 22:42 UTC (permalink / raw)
To: Mandell Degerness; +Cc: Dan Mick, ceph-devel

On Wed, 20 Jun 2012, Mandell Degerness wrote:
> The prior thread seems to refer to something fixed in 3.0.X, we are
> running 3.2.18.  Also, in answer to the previous question, we see the
> error on systems running at 82% full and systems running at 5% full
> disks.
>
> Anyone have any ideas about how to resolve the deadlock?  Do we have
> to configure mysql differently?

I'm not sure that this is related to the previous problems.  I would
start from square one to diagnose:

 - When you see the hang, are there blocked osd I/O operations?

       cat /sys/kernel/debug/ceph/*/osdc

 - Maybe this is a memory deadlock?  I'm guessing not (if you can log
   into the system), but it's worth checking.  I think you should see
   another blocked task in that case.

 - This might be an XFS thing unrelated to the fact that we're running
   on RBD.  If it's reproducible, can you try on ext4 instead?

sage

>
> -Mandell
> On Mon, Jun 18, 2012 at 4:34 PM, Dan Mick <dan.mick@inktank.com> wrote:
> > I don't know enough to know if there's a connection, but I do note this
> > prior thread that sounds kinda similar:
> >
> > http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574
> >
> > On 06/18/2012 04:08 PM, Mandell Degerness wrote:
> >>
> >> None of the OSDs seem to be more than 82% full.  I didn't think we were
> >> running quite that close to the margin, but it is still far from
> >> actually full.
> >>
> >> On Mon, Jun 18, 2012 at 3:57 PM, Dan Mick <dan.mick@inktank.com> wrote:
> >>>
> >>> Does the xfs on the OSD have plenty of free space left, or could this
> >>> be an allocation deadlock?
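The first two checks Sage suggests (outstanding OSD requests in debugfs, and whether more than one task is reported as blocked) could be gathered with something like the sketch below. It is only an illustration, not something posted in the thread: the function names `find_hung_tasks` and `read_osdc` are made up here, the regular expression is modelled on the single "blocked for more than 120 seconds" line quoted in this thread, and reading the `osdc` files assumes debugfs is mounted and readable (normally as root).

```python
import glob
import re

# Modelled on the watchdog line quoted in this thread, e.g.
# "INFO: task flush-254:0:29582 blocked for more than 120 seconds."
# The task name may itself contain ':' so the pid is the last ':'-field.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_lines):
    """Return (comm, pid, seconds) for each hung-task report found."""
    hits = []
    for line in log_lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            )
    return hits


def read_osdc(pattern="/sys/kernel/debug/ceph/*/osdc"):
    """Return {path: contents} for every osdc file that can be read.

    Assumption: debugfs is mounted at /sys/kernel/debug and the caller
    has permission to read it. Unreadable files are skipped.
    """
    contents = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                contents[path] = f.read()
        except OSError:
            continue
    return contents


if __name__ == "__main__":
    # Sample input: the watchdog line from the trace in this thread.
    sample = [
        "Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] "
        "INFO: task flush-254:0:29582 blocked for more than 120 seconds.",
    ]
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"hung task {comm} (pid {pid}), blocked > {secs}s")

    # Dump whatever the kernel client currently lists as outstanding.
    for path, text in read_osdc().items():
        print(f"--- {path}")
        print(text, end="")
```

Feeding it the full kernel log from the time of the hang rather than the one sample line would show whether a second task is blocked alongside the flush thread, which is the sign of a memory deadlock Sage mentions.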
Thread overview: 6+ messages
[not found] <CA+jddaPMp5R0adi2sLVUWeFytDzfOjAeryXL+jPjGAk8kKqafg@mail.gmail.com>
2012-06-18 22:17 ` Possible deadlock condition Mandell Degerness
2012-06-18 22:57 ` Dan Mick
2012-06-18 23:08 ` Mandell Degerness
2012-06-18 23:34 ` Dan Mick
2012-06-20 22:34 ` Mandell Degerness
2012-06-20 22:42 ` Sage Weil