From: Dan Mick <dan.mick@inktank.com>
To: Mandell Degerness <mandell@pistoncloud.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Possible deadlock condition
Date: Mon, 18 Jun 2012 16:34:09 -0700
Message-ID: <4FDFBAF1.9090109@inktank.com>
In-Reply-To: <CA+jddaM0muy1un2pEyRh6h8wnZTJbqUjPmKa95WjQS69BHQQRA@mail.gmail.com>

I don't know enough to say whether there's a connection, but this
earlier thread sounds similar:

http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/6574


On 06/18/2012 04:08 PM, Mandell Degerness wrote:
> None of the OSDs seem to be more than 82% full. I didn't think we were
> running quite that close to the margin, but it is still far from
> actually full.
>
>
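The fullness figure quoted above can be checked with a few lines of Python. This is a minimal sketch, not a Ceph tool: `percent_full` and `fullness_report` are hypothetical helper names, the 80% warning threshold is an arbitrary assumption, and the path list is a placeholder to be replaced with the actual OSD data mount points. An overall percentage also says nothing about how the remaining free space is distributed within the filesystem.

```python
import shutil


def percent_full(total, free):
    """Return the percentage of a filesystem that is in use."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - free) / total


def fullness_report(paths, warn_at=80.0):
    """Return (path, percent used, over-threshold flag) for each mount point.

    warn_at is an illustrative threshold, not a Ceph or XFS limit.
    """
    report = []
    for path in paths:
        # disk_usage() returns a named tuple with total, used and free bytes.
        usage = shutil.disk_usage(path)
        pct = percent_full(usage.total, usage.free)
        report.append((path, round(pct, 1), pct >= warn_at))
    return report


if __name__ == "__main__":
    # "." is a placeholder; substitute the OSD data directories.
    for path, pct, warn in fullness_report(["."]):
        flag = "  <-- above threshold" if warn else ""
        print(f"{path}: {pct}% used{flag}")
```

The result may differ slightly from what `df` prints, since tools vary in how they count reserved blocks.
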
> On Mon, Jun 18, 2012 at 3:57 PM, Dan Mick <dan.mick@inktank.com> wrote:
>> Does the xfs on the OSD have plenty of free space left, or could this be an
>> allocation deadlock?
>>
>>
>> On 06/18/2012 03:17 PM, Mandell Degerness wrote:
>>>
>>> Here is, perhaps, a more useful traceback from a different run of
>>> tests that we just ran into:
>>>
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task
>>> flush-254:0:29582 blocked for more than 120 seconds.
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681040] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681458] flush-254:0
>>>    D ffff880bd9ca2fc0     0 29582      2 0x00000000
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.681740]
>>> ffff88006e51d160 0000000000000046 0000000000000002 ffff88061b362040
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682173]
>>> ffff88006e51d160 00000000000120c0 00000000000120c0 00000000000120c0
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.682659]
>>> ffff88006e51dfd8 00000000000120c0 00000000000120c0 ffff88006e51dfd8
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683088] Call Trace:
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302]
>>> [<ffffffff81520132>] schedule+0x5a/0x5c
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683514]
>>> [<ffffffff815203e7>] schedule_timeout+0x36/0xe3
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683784]
>>> [<ffffffff8101e0b2>] ? physflat_send_IPI_mask+0xe/0x10
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683999]
>>> [<ffffffff8101a237>] ? native_smp_send_reschedule+0x46/0x48
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684219]
>>> [<ffffffff811e0071>] ? list_move_tail+0x27/0x2c
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684432]
>>> [<ffffffff81520d13>] __down_common+0x90/0xd4
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708]
>>> [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684925]
>>> [<ffffffff81520dca>] __down+0x1d/0x1f
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685139]
>>> [<ffffffff8105db4e>] down+0x2d/0x3d
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350]
>>> [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685565]
>>> [<ffffffff811e1120>] _xfs_buf_find+0x17f/0x210
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685836]
>>> [<ffffffff811e13b6>] xfs_buf_get+0x2a/0x177
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686052]
>>> [<ffffffff811e19f6>] xfs_buf_read+0x1f/0xca
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686270]
>>> [<ffffffff8122a0b7>] xfs_trans_read_buf+0x205/0x308
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.686490]
>>> [<ffffffff81205e01>] xfs_btree_read_buf_block.clone.22+0x4f/0xa7
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687015]
>>> [<ffffffff8122a3ee>] ? xfs_trans_log_buf+0xb2/0xc1
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687232]
>>> [<ffffffff81205edd>] xfs_btree_lookup_get_block+0x84/0xac
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687449]
>>> [<ffffffff81208e83>] xfs_btree_lookup+0x12b/0x3dc
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687721]
>>> [<ffffffff811f6bb2>] ? xfs_alloc_vextent+0x447/0x469
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.687939]
>>> [<ffffffff811fd171>] xfs_bmbt_lookup_eq+0x1f/0x21
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688156]
>>> [<ffffffff811ffa88>] xfs_bmap_add_extent_delay_real+0x5b5/0xfec
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688378]
>>> [<ffffffff810f155b>] ? kmem_cache_alloc+0x87/0xf3
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688650]
>>> [<ffffffff81204c40>] ? xfs_bmbt_init_cursor+0x3f/0x107
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.688867]
>>> [<ffffffff81201160>] xfs_bmapi_allocate+0x1f6/0x23a
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689084]
>>> [<ffffffff812185bd>] ? xfs_iext_bno_to_irec+0x95/0xb9
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689301]
>>> [<ffffffff81203414>] xfs_bmapi_write+0x32d/0x5a2
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689519]
>>> [<ffffffff811e99e4>] xfs_iomap_write_allocate+0x1a5/0x29f
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.689797]
>>> [<ffffffff811df12a>] xfs_map_blocks+0x13e/0x1dd
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690016]
>>> [<ffffffff811dfbff>] xfs_vm_writepage+0x24e/0x410
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690233]
>>> [<ffffffff810bde1e>] __writepage+0x17/0x30
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690446]
>>> [<ffffffff810be6ed>] write_cache_pages+0x276/0x3c8
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690693]
>>> [<ffffffff810bde07>] ? set_page_dirty+0x60/0x60
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.690908]
>>> [<ffffffff810be884>] generic_writepages+0x45/0x5c
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691123]
>>> [<ffffffff811defcb>] xfs_vm_writepages+0x4d/0x54
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691337]
>>> [<ffffffff810bf832>] do_writepages+0x21/0x2a
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691552]
>>> [<ffffffff811218f5>] writeback_single_inode+0x12a/0x2cc
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.691800]
>>> [<ffffffff81121d92>] writeback_sb_inodes+0x174/0x215
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692016]
>>> [<ffffffff81122185>] __writeback_inodes_wb+0x78/0xb9
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692231]
>>> [<ffffffff811224b5>] wb_writeback+0x136/0x22a
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692444]
>>> [<ffffffff810becd1>] ? determine_dirtyable_memory+0x1d/0x26
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692692]
>>> [<ffffffff81122d1e>] wb_do_writeback+0x19c/0x1b7
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.692907]
>>> [<ffffffff81122dc5>] bdi_writeback_thread+0x8c/0x20f
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693122]
>>> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693336]
>>> [<ffffffff81122d39>] ? wb_do_writeback+0x1b7/0x1b7
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693553]
>>> [<ffffffff8105911d>] kthread+0x82/0x8a
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.693803]
>>> [<ffffffff81523c34>] kernel_thread_helper+0x4/0x10
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694018]
>>> [<ffffffff8105909b>] ? kthread_worker_fn+0x13b/0x13b
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.694232]
>>> [<ffffffff81523c30>] ? gs_change+0xb/0xb
>>>
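When reading a report like the one above, it can help to pull out the blocked task and separate the frames the unwinder trusts from those marked with `?`. The `?` entries are addresses found on the stack that may not belong to the actual call chain. A minimal Python sketch follows, assuming the log arrives as mail-quoted, line-wrapped text as it does in this message. `parse_hung_task` and `SAMPLE` are hypothetical names, and the sample is a short excerpt of the trace above.

```python
import re

# "INFO: task <comm>:<pid> blocked for more than <N> seconds."
TASK_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
# "[<address>] [? ]function+0xoffset/0xsize"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_task(report):
    """Extract task name, pid, timeout and frame names from a hung-task report."""
    # Drop mail quote markers, then rejoin the wrapped lines into one string.
    lines = (re.sub(r"^[>\s]+", "", ln).strip() for ln in report.splitlines())
    text = " ".join(ln for ln in lines if ln)

    match = TASK_RE.search(text)
    if match is None:
        return None

    frames = FRAME_RE.findall(text)  # tuples of (address, "?" marker, name)
    return {
        "task": match.group(1),
        "pid": int(match.group(2)),
        "timeout_secs": int(match.group(3)),
        "reliable": [name for _addr, maybe, name in frames if not maybe],
        "unreliable": [name for _addr, maybe, name in frames if maybe],
    }


SAMPLE = """
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.680815] INFO: task
>>> flush-254:0:29582 blocked for more than 120 seconds.
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.683302]
>>> [<ffffffff81520132>] schedule+0x5a/0x5c
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.684708]
>>> [<ffffffff811e1120>] ? _xfs_buf_find+0x17f/0x210
>>> Jun 18 17:58:51 node-172-29-0-15 kernel: [242522.685350]
>>> [<ffffffff811e0f68>] xfs_buf_lock+0x76/0xaf
"""

info = parse_hung_task(SAMPLE)
print(info)
```

Run over the full trace, the reliable frames read from bottom to top as the writeback thread entering XFS block allocation and then sleeping in `xfs_buf_lock`.
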
>>>
>>> On Mon, Jun 18, 2012 at 11:37 AM, Mandell Degerness
>>> <mandell@pistoncloud.com> wrote:
>>>>
>>>> We've been seeing apparent deadlocks that occur at random.  We are
>>>> running ceph 0.47 on kernel 3.2.18.  The OSDs are running on XFS.
>>>> mysqld (which hit the particular problems in the attached kernel log)
>>>> is running on an RBD formatted with XFS (mounted on a system that also
>>>> hosts OSDs).  We have sync_fs, and gcc version 4.5.3-r2.  In both
>>>> instances the mysqld process returned an error to the calling process.
>>>>
>>>> Regards,
>>>> Mandell Degerness
>>>
