* Re: Possible deadlock condition
From: Mandell Degerness @ 2012-06-18 22:17 UTC (permalink / raw)
To: ceph-devel
Here is a perhaps more useful traceback, from a different test run
that we just hit:
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task flush-254:0:29582 blocked for more than 120 seconds.
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681040] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681458] flush-254:0 D ffff880bd9ca2fc0 0 29582 2 0x00000000
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681740] ffff88006e51d160 0000000000000046 0000000000000002 ffff88061b362040
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682173] ffff88006e51d160 00000000000120c0 00000000000120c0 00000000000120c0
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682659] ffff88006e51dfd8 00000000000120c0 00000000000120c0 ffff88006e51dfd8
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683088] Call Trace:
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302] [<ffffffff81520132>] schedule+0x5a/0x5c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683514] [<ffffffff815203e7>] schedule_timeout+0x36/0xe3
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683784] [<ffffffff8101e0b2>] ? physflat_send_IPI_mask+0xe/0x10
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683999] [<ffffffff8101a237>] ? native_smp_send_reschedule+0x46/0x48
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219] [<ffffffff811e0071>] ? list_move_tail+0x27/0x2c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432] [<ffffffff81520d13>] __down_common+0x90/0xd4
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708] [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684925] [<ffffffff81520dca>] __down+0x1d/0x1f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685139] [<ffffffff8105db4e>] down+0x2d/0x3d
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350] [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685565] [<ffffffff811e1120>] _xfs_buf_find+0x17f/0x210
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685836] [<ffffffff811e13b6>] xfs_buf_get+0x2a/0x177
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686052] [<ffffffff811e19f6>] xfs_buf_read+0x1f/0xca
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686270] [<ffffffff8122a0b7>] xfs_trans_read_buf+0x205/0x308
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686490] [<ffffffff81205e01>] xfs_btree_read_buf_block.clone.22+0x4f/0xa7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687015] [<ffffffff8122a3ee>] ? xfs_trans_log_buf+0xb2/0xc1
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687232] [<ffffffff81205edd>] xfs_btree_lookup_get_block+0x84/0xac
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687449] [<ffffffff81208e83>] xfs_btree_lookup+0x12b/0x3dc
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687721] [<ffffffff811f6bb2>] ? xfs_alloc_vextent+0x447/0x469
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687939] [<ffffffff811fd171>] xfs_bmbt_lookup_eq+0x1f/0x21
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688156] [<ffffffff811ffa88>] xfs_bmap_add_extent_delay_real+0x5b5/0xfec
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688378] [<ffffffff810f155b>] ? kmem_cache_alloc+0x87/0xf3
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688650] [<ffffffff81204c40>] ? xfs_bmbt_init_cursor+0x3f/0x107
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688867] [<ffffffff81201160>] xfs_bmapi_allocate+0x1f6/0x23a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689084] [<ffffffff812185bd>] ? xfs_iext_bno_to_irec+0x95/0xb9
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689301] [<ffffffff81203414>] xfs_bmapi_write+0x32d/0x5a2
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689519] [<ffffffff811e99e4>] xfs_iomap_write_allocate+0x1a5/0x29f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689797] [<ffffffff811df12a>] xfs_map_blocks+0x13e/0x1dd
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690016] [<ffffffff811dfbff>] xfs_vm_writepage+0x24e/0x410
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690233] [<ffffffff810bde1e>] __writepage+0x17/0x30
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690446] [<ffffffff810be6ed>] write_cache_pages+0x276/0x3c8
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690693] [<ffffffff810bde07>] ? set_page_dirty+0x60/0x60
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690908] [<ffffffff810be884>] generic_writepages+0x45/0x5c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691123] [<ffffffff811defcb>] xfs_vm_writepages+0x4d/0x54
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691337] [<ffffffff810bf832>] do_writepages+0x21/0x2a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691552] [<ffffffff811218f5>] writeback_single_inode+0x12a/0x2cc
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691800] [<ffffffff81121d92>] writeback_sb_inodes+0x174/0x215
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692016] [<ffffffff81122185>] __writeback_inodes_wb+0x78/0xb9
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692231] [<ffffffff811224b5>] wb_writeback+0x136/0x22a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692444] [<ffffffff810becd1>] ? determine_dirtyable_memory+0x1d/0x26
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692692] [<ffffffff81122d1e>] wb_do_writeback+0x19c/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692907] [<ffffffff81122dc5>] bdi_writeback_thread+0x8c/0x20f
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693122] [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693336] [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693553] [<ffffffff8105911d>] kthread+0x82/0x8a
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693803] [<ffffffff81523c34>] kernel_thread_helper+0x4/0x10
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694018] [<ffffffff8105909b>] ? kthread_worker_fn+0x13b/0x13b
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232] [<ffffffff81523c30>] ? gs_change+0xb/0xb
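For anyone who wants to pull the call chain out of a hung-task report like the one above, here is a minimal Python sketch. It assumes the syslog text is available as a string; the names Frame, parse_call_trace and reliable_chain are illustrative only and are not part of any kernel or Ceph tool. Entries the kernel prefixes with "?" are addresses found on the stack that may not belong to the actual call chain, so the sketch flags them separately.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: int    # return address printed inside [<...>]
    symbol: str     # function name, e.g. "xfs_buf_lock"
    offset: int     # offset into the function
    size: int       # total size of the function
    reliable: bool  # False when the entry was prefixed with "?"


# Matches entries such as "[<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf",
# with an optional "? " marker before the symbol. Whitespace (including a
# line break from mail wrapping) is allowed between address and symbol.
_FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(log_text: str) -> List[Frame]:
    """Return every stack entry found in log_text, in the order printed."""
    frames = []
    for m in _FRAME_RE.finditer(log_text):
        frames.append(
            Frame(
                address=int(m.group("addr"), 16),
                symbol=m.group("sym"),
                offset=int(m.group("off"), 16),
                size=int(m.group("size"), 16),
                reliable=m.group("guess") is None,
            )
        )
    return frames


def reliable_chain(frames: List[Frame]) -> List[str]:
    """Symbols of the entries not marked "?", innermost call first."""
    return [f.symbol for f in frames if f.reliable]


sample = """\
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219]
[<ffffffff811e0071>] ? list_move_tail+0x27/0x2c
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432]
[<ffffffff81520d13>] __down_common+0x90/0xd4
Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350]
[<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
"""

frames = parse_call_trace(sample)
print(reliable_chain(frames))
```

Run against the full report, the filtered chain reads from schedule() at the top down to kthread() at the bottom, which makes it easier to see that the flush thread is sleeping in xfs_buf_lock while doing writeback.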
On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness
<mandell@pistoncloud.com> wrote:
> We've been seeing random issues of apparent deadlocks. We are running
> ceph 0.47 on kernel 3.2.18. OSDs are running on XFS file system.
> mysqld (which ran into the particular problems in the attached kernel
> log) is running on an RBD with XFS (mounted on a system which includes
> OSDs). We have sync_fs, and gcc ver 4.5.3-r2. The mysqld process in
> both instances returned an error to the calling process.
>
> Regards,
> Mandell Degerness
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Possible deadlock condition
From: Dan Mick @ 2012-06-18 22:57 UTC (permalink / raw)
To: Mandell Degerness; +Cc: ceph-devel
Does the xfs on the OSD have plenty of free space left, or could this be
an allocation deadlock?
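One quick way to answer the free-space question is to ask the filesystem directly. The following is a minimal sketch using Python's os.statvfs; percent_used is an illustrative helper rather than a Ceph API, and the OSD mount point in the usage comment is a hypothetical path that will differ per deployment. It reports overall usage only and says nothing about how the free space is distributed inside the filesystem.

```python
import os


def percent_used(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize  # total size in bytes
    free = st.f_bfree * st.f_frsize    # free bytes, including reserved blocks
    if total == 0:
        # Some pseudo-filesystems report zero blocks; avoid dividing by zero.
        return 0.0
    return 100.0 * (total - free) / total


# Usage (hypothetical OSD data directory, adjust to the local layout):
#   print("%.1f%% used" % percent_used("/var/lib/ceph/osd/ceph-0"))
```

Running this over each OSD data directory gives the same kind of figure as the "Use%" column of df.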
* Re: Possible deadlock condition
From: Mandell Degerness @ 2012-06-18 23:08 UTC (permalink / raw)
To: Dan Mick; +Cc: ceph-devel
None of the OSDs seem to be more than 82% full. I didn't think we were
running quite that close to the margin, but it is still far from
actually full.
On Mon, Jun 18, 2012 at 3:57 PM, Dan Mick <dan.mick@inktank.com> wrote:
> Does the xfs on the OSD have plenty of free space left, or could this be an
> allocation deadlock?
* Re: Possible deadlock condition
From: Dan Mick @ 2012-06-18 23:34 UTC (permalink / raw)
To: Mandell Degerness; +Cc: ceph-devel
I don't know enough to know if there's a connection, but I do note this
prior thread that sounds kinda similar:
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574
On 06/18/2012 04:08 PM, Mandell Degerness wrote:
> None of the OSDs seem to be more than 82% full. I didn't think we were
> running quite that close to the margin, but it is still far from
> actually full.
* Re: Possible deadlock condition
From: Mandell Degerness @ 2012-06-20 22:34 UTC (permalink / raw)
To: Dan Mick; +Cc: ceph-devel
The prior thread seems to refer to something fixed in 3.0.X, but we are
running 3.2.18. Also, in answer to the previous question, we see the
error both on systems whose disks are 82% full and on systems whose
disks are 5% full.
Anyone have any ideas about how to resolve the deadlock? Do we have
to configure mysql differently?
-Mandell
On Mon, Jun 18, 2012 at 4:34 PM, Dan Mick <dan.mick@inktank.com> wrote:
> I don't know enough to know if there's a connection, but I do note this
> prior thread that sounds kinda similar:
>
> http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574
* Re: Possible deadlock condition
2012-06-20 22:34 ` Mandell Degerness
@ 2012-06-20 22:42 ` Sage Weil
0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2012-06-20 22:42 UTC (permalink / raw)
To: Mandell Degerness; +Cc: Dan Mick, ceph-devel
On Wed, 20 Jun 2012, Mandell Degerness wrote:
> The prior thread seems to refer to something fixed in 3.0.X, we are
> running 3.2.18. Also, in answer to the previous question, we see the
> error on systems running at 82% full and systems running at 5% full
> disks.
>
> Anyone have any ideas about how to resolve the deadlock? Do we have
> to configure mysql differently?
I'm not sure that this is related to the previous problems. I would start
from square one to diagnose:
 - When you see the hang, are there blocked osd I/O operations?
       cat /sys/kernel/debug/ceph/*/osdc
 - Maybe this is a memory deadlock? I'm guessing not (if you can
   log into the system), but it's worth checking. I think you should see
   another blocked task in that case.
 - This might be an XFS thing unrelated to the fact that we're running on
   RBD. If it's reproducible, can you try on ext4 instead?
sage
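The first two checks above could be scripted roughly as follows. This is only a sketch, not something from the thread: the function names are made up for illustration, it assumes debugfs is mounted at /sys/kernel/debug and readable (normally root only), and it assumes each non-empty line of an osdc file corresponds to one in-flight request, which may vary by kernel version. The hung-task pattern is taken from the "blocked for more than 120 seconds" line in the log posted earlier.

```python
import glob
import re

# Hung-task watchdog line as it appears in the log from this thread, e.g.
# "INFO: task flush-254:0:29582 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for each hung-task report.

    Hypothetical helper. The mail client wrapped the log lines, so
    whitespace is collapsed before matching.
    """
    flat = " ".join(log_text.split())
    return [
        (m["name"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK_RE.finditer(flat)
    ]


def read_inflight_osd_requests(pattern="/sys/kernel/debug/ceph/*/osdc"):
    """Map each osdc file to its non-empty lines.

    Hypothetical helper, equivalent to the `cat` above but keeping the
    output grouped per client. Assumes one in-flight request per line.
    """
    pending = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            pending[path] = [line.rstrip("\n") for line in fh if line.strip()]
    return pending
```

If read_inflight_osd_requests() returns non-empty lists while the hang is in progress, that would suggest requests stuck at the RBD layer. More than one entry from find_hung_tasks() on the same log would be the "another blocked task" hint for the memory deadlock case.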
end of thread, other threads:[~2012-06-20 22:42 UTC | newest]
Thread overview: 6+ messages
[not found] <CA+jddaPMp5R0adi2sLVUWeFytDzfOjAeryXL+jPjGAk8kKqafg@mail.gmail.com>
2012-06-18 22:17 ` Possible deadlock condition Mandell Degerness
2012-06-18 22:57 ` Dan Mick
2012-06-18 23:08 ` Mandell Degerness
2012-06-18 23:34 ` Dan Mick
2012-06-20 22:34 ` Mandell Degerness
2012-06-20 22:42 ` Sage Weil