From: Dan Mick <dan.mick@inktank.com>
To: Mandell Degerness <mandell@pistoncloud.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Possible deadlock condition
Date: Mon, 18 Jun 2012 15:57:50 -0700
Message-ID: <4FDFB26E.1060109@inktank.com>
In-Reply-To: <CA+jddaO_BY08H2PPb17EKGLZ3TS1BZ7XpxSUxswPRAZ-QN1Cfg@mail.gmail.com>

Does the xfs on the OSD have plenty of free space left, or could this be
an allocation deadlock?
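
One quick way to answer the free-space question is to query the filesystem
that backs the OSD data directory. The following is only a minimal sketch in
Python, assuming the OSD data path is passed on the command line (the default
of "/" and the 5% threshold are placeholders chosen for illustration, not
values from this thread). It reports block-level free space only and says
nothing about fragmentation or memory pressure.

```python
import os
import sys


def free_space_report(path):
    """Return (free_bytes, total_bytes, free_fraction) for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # f_bavail counts blocks available to unprivileged processes.
    free = st.f_bavail * st.f_frsize
    frac = free / total if total else 0.0
    return free, total, frac


def looks_nearly_full(path, min_free_fraction=0.05):
    """True if the free fraction is below the (arbitrary, illustrative) threshold."""
    _, _, frac = free_space_report(path)
    return frac < min_free_fraction


if __name__ == "__main__":
    # Hypothetical usage: pass the OSD data directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    free, total, frac = free_space_report(target)
    print(f"{target}: {free} of {total} bytes free ({frac:.1%})")
    if looks_nearly_full(target):
        print("warning: filesystem is nearly full")
```

Running it against each OSD's data directory on the affected node would show
whether any of them is close to full.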

On 06/18/2012 03:17 PM, Mandell Degerness wrote:
> Here is, perhaps, a more useful traceback from a different run of
> tests that we just ran into:
>
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task
> flush-254:0:29582 blocked for more than 120 seconds.
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681040] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681458] flush-254:0
> D ffff880bd9ca2fc0 0 29582 2 0x00000000
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681740]
> ffff88006e51d160 0000000000000046 0000000000000002 ffff88061b362040
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682173]
> ffff88006e51d160 00000000000120c0 00000000000120c0 00000000000120c0
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682659]
> ffff88006e51dfd8 00000000000120c0 00000000000120c0 ffff88006e51dfd8
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683088] Call Trace:
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302]
> [<ffffffff81520132>] schedule+0x5a/0x5c
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683514]
> [<ffffffff815203e7>] schedule_timeout+0x36/0xe3
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683784]
> [<ffffffff8101e0b2>] ? physflat_send_IPI_mask+0xe/0x10
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683999]
> [<ffffffff8101a237>] ? native_smp_send_reschedule+0x46/0x48
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219]
> [<ffffffff811e0071>] ? list_move_tail+0x27/0x2c
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432]
> [<ffffffff81520d13>] __down_common+0x90/0xd4
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708]
> [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684925]
> [<ffffffff81520dca>] __down+0x1d/0x1f
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685139]
> [<ffffffff8105db4e>] down+0x2d/0x3d
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350]
> [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685565]
> [<ffffffff811e1120>] _xfs_buf_find+0x17f/0x210
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685836]
> [<ffffffff811e13b6>] xfs_buf_get+0x2a/0x177
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686052]
> [<ffffffff811e19f6>] xfs_buf_read+0x1f/0xca
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686270]
> [<ffffffff8122a0b7>] xfs_trans_read_buf+0x205/0x308
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686490]
> [<ffffffff81205e01>] xfs_btree_read_buf_block.clone.22+0x4f/0xa7
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687015]
> [<ffffffff8122a3ee>] ? xfs_trans_log_buf+0xb2/0xc1
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687232]
> [<ffffffff81205edd>] xfs_btree_lookup_get_block+0x84/0xac
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687449]
> [<ffffffff81208e83>] xfs_btree_lookup+0x12b/0x3dc
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687721]
> [<ffffffff811f6bb2>] ? xfs_alloc_vextent+0x447/0x469
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687939]
> [<ffffffff811fd171>] xfs_bmbt_lookup_eq+0x1f/0x21
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688156]
> [<ffffffff811ffa88>] xfs_bmap_add_extent_delay_real+0x5b5/0xfec
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688378]
> [<ffffffff810f155b>] ? kmem_cache_alloc+0x87/0xf3
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688650]
> [<ffffffff81204c40>] ? xfs_bmbt_init_cursor+0x3f/0x107
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688867]
> [<ffffffff81201160>] xfs_bmapi_allocate+0x1f6/0x23a
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689084]
> [<ffffffff812185bd>] ? xfs_iext_bno_to_irec+0x95/0xb9
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689301]
> [<ffffffff81203414>] xfs_bmapi_write+0x32d/0x5a2
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689519]
> [<ffffffff811e99e4>] xfs_iomap_write_allocate+0x1a5/0x29f
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689797]
> [<ffffffff811df12a>] xfs_map_blocks+0x13e/0x1dd
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690016]
> [<ffffffff811dfbff>] xfs_vm_writepage+0x24e/0x410
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690233]
> [<ffffffff810bde1e>] __writepage+0x17/0x30
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690446]
> [<ffffffff810be6ed>] write_cache_pages+0x276/0x3c8
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690693]
> [<ffffffff810bde07>] ? set_page_dirty+0x60/0x60
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690908]
> [<ffffffff810be884>] generic_writepages+0x45/0x5c
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691123]
> [<ffffffff811defcb>] xfs_vm_writepages+0x4d/0x54
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691337]
> [<ffffffff810bf832>] do_writepages+0x21/0x2a
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691552]
> [<ffffffff811218f5>] writeback_single_inode+0x12a/0x2cc
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691800]
> [<ffffffff81121d92>] writeback_sb_inodes+0x174/0x215
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692016]
> [<ffffffff81122185>] __writeback_inodes_wb+0x78/0xb9
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692231]
> [<ffffffff811224b5>] wb_writeback+0x136/0x22a
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692444]
> [<ffffffff810becd1>] ? determine_dirtyable_memory+0x1d/0x26
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692692]
> [<ffffffff81122d1e>] wb_do_writeback+0x19c/0x1b7
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692907]
> [<ffffffff81122dc5>] bdi_writeback_thread+0x8c/0x20f
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693122]
> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693336]
> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693553]
> [<ffffffff8105911d>] kthread+0x82/0x8a
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693803]
> [<ffffffff81523c34>] kernel_thread_helper+0x4/0x10
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694018]
> [<ffffffff8105909b>] ? kthread_worker_fn+0x13b/0x13b
> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232]
> [<ffffffff81523c30>] ? gs_change+0xb/0xb
>
>
> On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness
> <mandell@pistoncloud.com> wrote:
>> We've been seeing random issues of apparent deadlocks. We are running
>> ceph 0.47 on kernel 3.2.18. OSDs are running on XFS file system.
>> mysqld (which ran into the particular problems in the attached kernel
>> log) is running on an RBD with XFS (mounted on a system which includes
>> OSDs). We have sync_fs, and gcc ver 4.5.3-r2. The mysqld process in
>> both instances returned an error to the calling process.
>>
>> Regards,
>> Mandell Degerness