From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from userp1040.oracle.com ([156.151.31.81]:21864 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751108AbaFMDhq (ORCPT );
	Thu, 12 Jun 2014 23:37:46 -0400
Date: Fri, 13 Jun 2014 11:37:26 +0800
From: Liu Bo 
To: Alin Dobre 
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Deadlock/high load
Message-ID: <20140613033725.GA31905@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <5399C40F.3070509@elastichosts.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <5399C40F.3070509@elastichosts.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On Thu, Jun 12, 2014 at 04:15:27PM +0100, Alin Dobre wrote:
> Hi all,
>
> I have a problem that triggers quite often on our production machines. I
> don't really know what triggers it or how to reproduce it, but the machine
> enters some sort of deadlock state in which it consumes all the I/O and
> the load average climbs very high within seconds (it even gets to over
> 200); in about a minute or even less the machine is unresponsive and we
> have to reset it. Rarely, the load just stays high (~25) for hours and
> never comes down again, but that is the exception. In general, the machine
> is either already unresponsive or about to become unresponsive.
>
> The last machine that encountered this has 40 cores and the btrfs
> filesystem is running over SSDs. We encountered this on a plain 3.14
> kernel, and also on the latest 3.14.6 kernel plus all the patches whose
> summary is marked "btrfs:" that made it into 3.15, cherry-picked straight
> back to 3.14.
>
> There is also no suspicious (malicious) activity from the running processes.
>
> I noticed there was another report on 3.13 which was solved by a 3.15-rc
> patch, but it doesn't seem to be the same issue.

The output of 'btrfs filesystem df' would be appreciated; it can help
determine whether the FS has entered an 'almost full' situation, which may
trigger a bug where pages are not marked with the writeback tag, leaving
processes waiting endlessly.

-liubo

>
> Since the only chance to obtain something was via a SysRq dump, here's
> what I could get from the last "w" trigger (tasks that are in
> uninterruptible (blocked) state), showing only tasks that are related to
> btrfs:
>
> btrfs-transacti D 000000000000000e 0 2483 2 0x00080008
> ffff881fd05975d8 ffffffff81880a27 ffff881fd05974e8 ffff881fd05974f8
> ffff881fd0596010 ffff881fd05975d8 0000000000011bc0 ffff881fd13398f0
> 0000000000011bc0 0000000000011bc0 ffff881fd28ecad0 ffff881fd13398f0
> Call Trace:
> [] ? __schedule+0x687/0x72c
> [] ? do_release_stripe+0xeb/0x182
> [] ? zone_statistics+0x77/0x7e
> [] ? raid5_unplug+0xaa/0xb3
> [] ? blk_flush_plug_list+0x99/0x1f0
> [] ? get_active_stripe+0x65/0x5ca
> [] ? prepare_to_wait+0x71/0x7c
> [] ? make_request+0x7b0/0x999
> [] ? release_stripe_plug+0x20/0x95
> [] ? bit_waitqueue+0xb0/0xb0
> [] ? bit_waitqueue+0xb0/0xb0
> [] ? md_make_request+0xfa/0x215
> [] ? __btrfs_map_block+0xd6f/0xd89
> [] ? generic_make_request+0x99/0xda
> [] ? submit_bio+0xf3/0xfe
> [] ? submit_stripe_bio+0x77/0x82
> [] ? btrfs_map_bio+0x3cd/0x440
> [] ? csum_tree_block+0x1c1/0x1ec
> [] ? btree_submit_bio_hook+0x97/0xf0
> [] ? __bio_add_page+0x153/0x1de
> [] ? submit_one_bio+0x63/0x90
> [] ? account_page_writeback+0x28/0x2d
> [] ? submit_extent_page+0xe7/0x17e
> [] ? btree_write_cache_pages+0x44c/0x71a
> [] ? extent_range_clear_dirty_for_io+0x5a/0x5a
> [] ? btree_writepages+0x4a/0x53
> [] ? do_writepages+0x1b/0x24
> [] ? __filemap_fdatawrite_range+0x4e/0x50
> [] ? filemap_fdatawrite_range+0xe/0x10
> [] ? btrfs_write_marked_extents+0x83/0xd1
> [] ? btrfs_write_and_wait_transaction+0x5c/0x8a
> [] ? btrfs_commit_transaction+0x68b/0x87c
> [] ? del_timer+0x87/0x87
> [] ? transaction_kthread+0x114/0x1e9
> [] ? close_ctree+0x280/0x280
> [] ? kthread+0xc9/0xd1
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> [] ? ret_from_fork+0x7c/0xb0
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> rs:main Q:Reg D 0000000000000002 0 6857 4976 0x00000000
> ffff883fc9b0bb08 0000000000000002 ffff883fc9b0b9e8 ffff883fc9b0ba28
> ffff883fc9b0a010 ffff883fc9b0bb08 0000000000011bc0 ffff883fc794db70
> 0000000000011bc0 0000000000011bc0 ffff881fd28e8000 ffff883fc794db70
> Call Trace:
> [] ? native_sched_clock+0x17/0xd3
> [] ? sched_clock+0x9/0xd
> [] ? arch_vtime_task_switch+0x81/0x86
> [] ? vtime_common_task_switch+0x29/0x2d
> [] ? read_tsc+0x9/0x1b
> [] schedule+0x6e/0x70
> [] io_schedule+0x93/0xd7
> [] ? __lock_page+0x68/0x68
> [] sleep_on_page+0x9/0xd
> [] __wait_on_bit+0x45/0x7a
> [] wait_on_page_bit+0x71/0x78
> [] ? wake_atomic_t_function+0x28/0x28
> [] prepare_pages+0xd2/0x11b
> [] __btrfs_buffered_write+0x214/0x482
> [] ? futex_wait+0x176/0x239
> [] ? current_fs_time+0x22/0x29
> [] btrfs_file_aio_write+0x417/0x507
> [] ? path_openat+0x593/0x5cc
> [] do_sync_write+0x59/0x79
> [] vfs_write+0xd3/0x172
> [] SyS_write+0x4b/0x8f
> [] tracesys+0xd0/0xd5
> freshclam D 0000000000000002 0 8305 4976 0x00000000
> ffff883fbc1b1b08 0000000000000002 ffff881fdfc72760 ffff883fbc1b1a28
> ffff883fbc1b0010 ffff883fbc1b1b08 0000000000011bc0 ffff883fb5ca31e0
> 0000000000011bc0 0000000000011bc0 ffffffff81b8d440 ffff883fb5ca31e0
> Call Trace:
> [] ? lru_cache_add+0x9/0xb
> [] ? page_add_new_anon_rmap+0x108/0x11a
> [] ? handle_mm_fault+0xbdf/0xc84
> [] ? read_tsc+0x9/0x1b
> [] schedule+0x6e/0x70
> [] io_schedule+0x93/0xd7
> [] ? __lock_page+0x68/0x68
> [] sleep_on_page+0x9/0xd
> [] __wait_on_bit+0x45/0x7a
> [] wait_on_page_bit+0x71/0x78
> [] ? wake_atomic_t_function+0x28/0x28
> [] prepare_pages+0xd2/0x11b
> [] __btrfs_buffered_write+0x214/0x482
> [] ? complete_walk+0x84/0xc9
> [] ? current_fs_time+0x22/0x29
> [] btrfs_file_aio_write+0x417/0x507
> [] ? final_putname+0x33/0x37
> [] do_sync_write+0x59/0x79
> [] vfs_write+0xd3/0x172
> [] SyS_write+0x4b/0x8f
> [] tracesys+0xd0/0xd5
> nginx D 0000000000000002 0 12360 12358 0x00000000
> ffff881f9ef6bb08 0000000000000002 ffff881f9ef6ba08 ffff881f9ef6ba28
> ffff881f9ef6a010 ffff881f9ef6bb08 0000000000011bc0 ffff881fb894ec10
> 0000000000011bc0 0000000000011bc0 ffffffff81b8d440 ffff881fb894ec10
> Call Trace:
> [] ? netif_rx_internal+0xc9/0xda
> [] ? netif_rx+0x9/0xb
> [] ? loopback_xmit+0x9a/0xb4
> [] ? read_tsc+0x9/0x1b
> [] schedule+0x6e/0x70
> [] io_schedule+0x93/0xd7
> [] ? __lock_page+0x68/0x68
> [] sleep_on_page+0x9/0xd
> [] __wait_on_bit+0x45/0x7a
> [] wait_on_page_bit+0x71/0x78
> [] ? wake_atomic_t_function+0x28/0x28
> [] prepare_pages+0xd2/0x11b
> [] __btrfs_buffered_write+0x214/0x482
> [] ? current_fs_time+0x22/0x29
> [] btrfs_file_aio_write+0x417/0x507
> [] ? sock_destroy_inode+0x2e/0x32
> [] do_sync_write+0x59/0x79
> [] vfs_write+0xd3/0x172
> [] SyS_write+0x4b/0x8f
> [] tracesys+0xd0/0xd5
> kworker/u81:1 D ffff883fd150e800 0 35017 2 0x00000000
> Workqueue: writeback bdi_writeback_workfn (flush-9:0)
> ffff881f60ae56d8 0000000000000002 ffff881f60ae55c8 ffff881f60ae55f8
> ffff881f60ae4010 ffff881f60ae56d8 0000000000011bc0 ffff881fd12bf460
> 0000000000011bc0 0000000000011bc0 ffff881fd28e8850 ffff881fd12bf460
> Call Trace:
> [] ? default_wake_function+0xd/0xf
> [] ? autoremove_wake_function+0x11/0x34
> [] ? __wake_up_common+0x49/0x7f
> [] schedule+0x6e/0x70
> [] md_write_start+0x12a/0x142
> [] ? bit_waitqueue+0xb0/0xb0
> [] make_request+0x61/0xc26
> [] ? __bio_add_page+0x153/0x1de
> [] ? unlock_page+0x22/0x26
> [] ? release_pages+0x1f2/0x201
> [] md_make_request+0xfa/0x215
> [] ? __wake_up+0x3f/0x48
> [] generic_make_request+0x99/0xda
> [] submit_bio+0xf3/0xfe
> [] ext4_io_submit+0x24/0x43
> [] ext4_writepages+0x8bc/0xa07
> [] ? blk_finish_plug+0x13/0x34
> [] do_writepages+0x1b/0x24
> [] __writeback_single_inode+0x40/0xf3
> [] writeback_sb_inodes+0x21d/0x391
> [] ? put_super+0x2c/0x31
> [] __writeback_inodes_wb+0x73/0xb4
> [] wb_writeback+0x109/0x19c
> [] ? bdi_dirty_limit+0x2c/0x89
> [] wb_do_writeback+0x149/0x16d
> [] bdi_writeback_workfn+0x6d/0x16f
> [] ? finish_ordered_fn+0x10/0x12
> [] ? normal_work_helper+0xcc/0x18e
> [] ? pwq_dec_nr_in_flight+0xe3/0xec
> [] process_one_work+0x253/0x368
> [] worker_thread+0x1ea/0x343
> [] ? process_one_work+0x368/0x368
> [] kthread+0xc9/0xd1
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> [] ret_from_fork+0x7c/0xb0
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> kworker/u82:2 D 0000000000000000 0 35547 2 0x00080000
> Workqueue: btrfs-submit normal_work_helper
> 0000000000000000 0000000000000000 0000000000000008 ffff881fd0ec0040
> ffff881fd0ec0070 0000000000000000 0000000091827364 ffff883eec757d30
> ffff883eec757d30 ffff883eec757d40 ffff883eec757d40 ffff883fd157a0c0
> Call Trace:
> [] ? pending_bios_fn+0x10/0x12
> [] ? normal_work_helper+0xcc/0x18e
> [] ? process_one_work+0x253/0x368
> [] ? worker_thread+0x1ea/0x343
> [] ? process_one_work+0x368/0x368
> [] ? kthread+0xc9/0xd1
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> [] ? ret_from_fork+0x7c/0xb0
> [] ? kthread_freezable_should_stop+0x5b/0x5b
> php-fpm D ffffffff81134170 0 16086 12386 0x00000000
> ffff881f474c3b28 0000000000000002 0000001400000010 ffff881f474c3a48
> ffff881f474c2010 ffff881f474c3b28 0000000000011bc0 ffff881f7184f460
> 0000000000011bc0 0000000000011bc0 ffff881fd28bec10 ffff881f7184f460
> Call Trace:
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? __sb_end_write+0x2d/0x5c
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? read_tsc+0x9/0x1b
> [] ? __lock_page+0x68/0x68
> [] schedule+0x6e/0x70
> [] io_schedule+0x93/0xd7
> [] sleep_on_page+0x9/0xd
> [] __wait_on_bit_lock+0x43/0x8f
> [] __lock_page+0x61/0x68
> [] ? wake_atomic_t_function+0x28/0x28
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] do_wp_page+0x1e4/0x79a
> [] ? path_lookupat+0x5db/0x64d
> [] handle_mm_fault+0xbbb/0xc84
> [] ? final_putname+0x33/0x37
> [] ? user_path_at_empty+0x5e/0x8f
> [] ? cpuacct_account_field+0x55/0x5e
> [] __do_page_fault+0x3bb/0x3e2
> [] ? rcu_eqs_enter+0x70/0x83
> [] ? rcu_user_enter+0xe/0x10
> [] ? cpuacct_account_field+0x55/0x5e
> [] ? account_user_time+0x6e/0x97
> [] ? vtime_account_user+0x4d/0x52
> [] do_page_fault+0x44/0x61
> [] page_fault+0x28/0x30
>
> And then the php-fpm process is present 14 more times with the same
> backtrace. In total, the btrfs-related calls break down as follows:
>
> $ grep btrfs blocked.txt | sort
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_block_rsv_check+0x55/0x61
> [] ? btrfs_write_marked_extents+0x83/0xd1
> [] ? btrfs_write_and_wait_transaction+0x5c/0x8a
> [] ? btrfs_commit_transaction+0x68b/0x87c
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] ? __btrfs_end_transaction+0x2ad/0x2d1
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xb0/0x2c2
> [] btrfs_page_mkwrite+0xef/0x2c2
> [] __btrfs_buffered_write+0x214/0x482
> [] __btrfs_buffered_write+0x214/0x482
> [] __btrfs_buffered_write+0x214/0x482
> [] btrfs_file_aio_write+0x417/0x507
> [] btrfs_file_aio_write+0x417/0x507
> [] btrfs_file_aio_write+0x417/0x507
> [] ? __btrfs_map_block+0xd6f/0xd89
> [] ? btrfs_map_bio+0x3cd/0x440
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> [] ? __btrfs_release_delayed_node+0x184/0x1e2
> Workqueue: btrfs-submit normal_work_helper
> btrfs-transacti D 000000000000000e 0 2483 2 0x00080008
>
> I can send the whole file if someone is interested.
>
> Appreciate any feedback.
>
> Cheers,
> Alin.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
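
For reference, the diagnostics discussed in the thread are usually gathered
roughly as follows (a minimal sketch; /mnt/data is only a placeholder for the
actual btrfs mount point, and SysRq must be enabled on the running kernel):

  # Per-block-group space usage, to check for the 'almost full'
  # condition Liu Bo asks about above.
  btrfs filesystem df /mnt/data

  # Enable SysRq, then dump blocked (uninterruptible, "D" state) tasks;
  # the backtraces land in the kernel log.
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  dmesg > blocked.txt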