* Re: 3.7.0-rc8 btrfs locking issue
2012-12-05 16:07 3.7.0-rc8 btrfs locking issue Jim Schutt
@ 2012-12-07 19:03 ` Jim Schutt
2012-12-09 14:04 ` Liu Bo
1 sibling, 0 replies; 7+ messages in thread
From: Jim Schutt @ 2012-12-07 19:03 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, linux-kernel, ceph-devel@vger.kernel.org
On 12/05/2012 09:07 AM, Jim Schutt wrote:
> Hi,
>
> I'm hitting a btrfs locking issue with 3.7.0-rc8.
>
> The btrfs filesystem in question is backing a Ceph OSD
> under a heavy write load from many cephfs clients.
>
> I reported this issue a while ago:
> http://www.spinics.net/lists/linux-btrfs/msg19370.html
> when I was testing what I thought might be part of the
> 3.7 btrfs patch queue, using Josef Bacik's btrfs-next tree.
>
> I spent some time attempting to bisect the btrfs patch queue
> just before it was merged for 3.7, but got nowhere due to
> false negatives.
>
> I've just been able to get back to testing 3.7-rc, and found
> that I can still trigger the issue.
>
> First I get this lockdep splat:
>
> [ 1184.201331] =============================================
> [ 1184.206716] [ INFO: possible recursive locking detected ]
> [ 1184.212111] 3.7.0-rc8-00013-gdf2fc24 #438 Not tainted
> [ 1184.217156] ---------------------------------------------
> [ 1184.222544] ceph-osd/42177 is trying to acquire lock:
> [ 1184.227589] (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.237270]
> [ 1184.237270] but task is already holding lock:
> [ 1184.243114] (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.252786]
> [ 1184.252786] other info that might help us debug this:
> [ 1184.259303] Possible unsafe locking scenario:
> [ 1184.259303]
> [ 1184.265220] CPU0
> [ 1184.267680] ----
> [ 1184.270133] lock(&fs_info->chunk_mutex);
> [ 1184.274276] lock(&fs_info->chunk_mutex);
> [ 1184.278417]
> [ 1184.278417] *** DEADLOCK ***
To try to debug this, I applied this patch:
-----cut here -----
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c72ead8..86da319 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1470,6 +1470,10 @@ struct btrfs_fs_info {
int backup_root_index;
int num_tolerated_disk_barrier_failures;
+
+ /* protect against recursive do_chunk_alloc() */
+ const void *chunk_alloc_task;
+ const struct btrfs_space_info *prev_sinfo;
};
/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7cda519..d61e93d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2087,6 +2087,9 @@ int open_ctree(struct super_block *sb,
fs_info->btree_inode->i_ino = BTRFS_BTREE_INODE_OBJECTID;
set_nlink(fs_info->btree_inode, 1);
+
+ fs_info->chunk_alloc_task = NULL;
+ fs_info->prev_sinfo = NULL;
/*
* we set the i_size on the btree inode to the max possible int.
* the real end of the address space is determined by all of
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..1e0e421 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3582,6 +3582,16 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
}
BUG_ON(!space_info); /* Logic error */
+ if (mutex_is_locked(&fs_info->chunk_mutex) &&
+ unlikely(fs_info->chunk_alloc_task == current)) {
+ WARN_ONCE(1, "do_chunk_alloc() about to recursively acquire "
+ "fs_info->chunk_mutex: impending deadlock avoided!\n"
+ "outer call space_info = %p flags %#llx\n"
+ "nested call space_info = %p flags %#llx\n",
+ fs_info->prev_sinfo, fs_info->prev_sinfo->flags,
+ space_info, space_info->flags);
+ return -EDEADLK;
+ }
again:
spin_lock(&space_info->lock);
if (force < space_info->force_alloc)
@@ -3603,6 +3613,8 @@ again:
spin_unlock(&space_info->lock);
mutex_lock(&fs_info->chunk_mutex);
+ fs_info->chunk_alloc_task = current;
+ fs_info->prev_sinfo = space_info;
/*
* The chunk_mutex is held throughout the entirety of a chunk
@@ -3655,6 +3667,8 @@ again:
space_info->chunk_alloc = 0;
spin_unlock(&space_info->lock);
out:
+ fs_info->chunk_alloc_task = NULL;
+ fs_info->prev_sinfo = NULL;
mutex_unlock(&fs_info->chunk_mutex);
return ret;
}
-----cut here -----
After a few trials, I got this during a test:
[ 1614.213516] ------------[ cut here ]------------
[ 1614.218185] WARNING: at fs/btrfs/extent-tree.c:3592 do_chunk_alloc+0xec/0x3f0 [btrfs]()
[ 1614.226228] Hardware name: X8DTH-i/6/iF/6F
[ 1614.230484] do_chunk_alloc() about to recursively acquire fs_info->chunk_mutex: impending deadlock avoided!
[ 1614.230484] outer call space_info = ffff880c214dac00 flags 0x1
[ 1614.230484] nested call space_info = ffff880c214dbc00 flags 0x4
[ 1614.251968] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput joydev sg hid_generic sd_mod iTCO_wdt iTCO_vendor_support ata_piix libata button coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1614.320138] Pid: 84332, comm: ceph-osd Not tainted 3.7.0-rc8-00031-g2ec1a5f #458
[ 1614.327564] Call Trace:
[ 1614.330054] [<ffffffff8103ff04>] warn_slowpath_common+0x94/0xc0
[ 1614.336145] [<ffffffff8103ffe6>] warn_slowpath_fmt+0x46/0x50
[ 1614.342081] [<ffffffffa052053c>] do_chunk_alloc+0xec/0x3f0 [btrfs]
[ 1614.348431] [<ffffffffa052966c>] find_free_extent+0xa3c/0xb70 [btrfs]
[ 1614.355013] [<ffffffffa051c9e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1614.362580] [<ffffffffa0529822>] btrfs_reserve_extent+0x82/0x190 [btrfs]
[ 1614.369404] [<ffffffffa05299b5>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
[ 1614.376417] [<ffffffffa0515b6a>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
[ 1614.383127] [<ffffffffa056b2b3>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
[ 1614.390716] [<ffffffffa051656d>] btrfs_cow_block+0x18d/0x230 [btrfs]
[ 1614.397288] [<ffffffffa0519460>] btrfs_search_slot+0x360/0x730 [btrfs]
[ 1614.403988] [<ffffffffa051a26d>] btrfs_insert_empty_items+0x8d/0x100 [btrfs]
[ 1614.411202] [<ffffffff81157413>] ? kmem_cache_alloc+0xd3/0x170
[ 1614.417248] [<ffffffffa055cfe5>] btrfs_alloc_dev_extent+0xb5/0x190 [btrfs]
[ 1614.424271] [<ffffffffa0561197>] __btrfs_alloc_chunk+0x6a7/0x750 [btrfs]
[ 1614.431142] [<ffffffffa056129e>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 1614.437625] [<ffffffffa0520391>] ? check_system_chunk+0x71/0x130 [btrfs]
[ 1614.444468] [<ffffffffa05207be>] do_chunk_alloc+0x36e/0x3f0 [btrfs]
[ 1614.450853] [<ffffffffa051c9e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1614.458422] [<ffffffffa052440c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
[ 1614.466046] [<ffffffffa0525fb0>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
[ 1614.473573] [<ffffffffa054a39e>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
[ 1614.480691] [<ffffffffa054a8c9>] btrfs_file_aio_write+0x209/0x320 [btrfs]
[ 1614.487642] [<ffffffffa053a550>] ? btrfs_end_transaction+0x10/0x20 [btrfs]
[ 1614.494723] [<ffffffffa054ed73>] ? __btrfs_setxattr+0xf3/0x110 [btrfs]
[ 1614.501398] [<ffffffff814abf0e>] ? mutex_unlock+0xe/0x10
[ 1614.506918] [<ffffffffa054a6c0>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[ 1614.514098] [<ffffffff81162754>] do_sync_readv_writev+0x94/0xe0
[ 1614.520240] [<ffffffff81163813>] do_readv_writev+0xe3/0x1e0
[ 1614.526005] [<ffffffff81180ca2>] ? fget_light+0x122/0x170
[ 1614.531537] [<ffffffff81163956>] vfs_writev+0x46/0x60
[ 1614.536752] [<ffffffff81163a8f>] sys_writev+0x5f/0xc0
[ 1614.542167] [<ffffffff812637de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1614.548678] [<ffffffff814b77c2>] system_call_fastpath+0x16/0x1b
[ 1614.554884] ---[ end trace 5c724b2e8e635fff ]---
So, evidently a do_chunk_alloc() call for BTRFS_BLOCK_GROUP_DATA ended
up needing more space for metadata, resulting in a recursive call to
do_chunk_alloc() for BTRFS_BLOCK_GROUP_METADATA.
Since the metadata allocation fails, I get some aborted transactions:
[ 1614.564332] WARNING: at fs/btrfs/super.c:246 __btrfs_abort_transaction+0x60/0x110 [btrfs]()
[ 1614.572784] Hardware name: X8DTH-i/6/iF/6F
[ 1614.577028] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput joydev sg hid_generic sd_mod iTCO_wdt iTCO_vendor_support ata_piix libata button coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1614.647280] Pid: 84332, comm: ceph-osd Tainted: G W 3.7.0-rc8-00031-g2ec1a5f #458
[ 1614.655759] Call Trace:
[ 1614.658359] [<ffffffff8103ff04>] warn_slowpath_common+0x94/0xc0
[ 1614.664505] [<ffffffff8103ffe6>] warn_slowpath_fmt+0x46/0x50
[ 1614.670357] [<ffffffffa050e5c0>] __btrfs_abort_transaction+0x60/0x110 [btrfs]
[ 1614.677681] [<ffffffffa05296ad>] find_free_extent+0xa7d/0xb70 [btrfs]
[ 1614.684309] [<ffffffffa051c9e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1614.691889] [<ffffffffa0529822>] btrfs_reserve_extent+0x82/0x190 [btrfs]
[ 1614.698803] [<ffffffffa05299b5>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
[ 1614.705871] [<ffffffffa0515b6a>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
[ 1614.712666] [<ffffffffa056b2b3>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
[ 1614.720393] [<ffffffffa051656d>] btrfs_cow_block+0x18d/0x230 [btrfs]
[ 1614.727042] [<ffffffffa0519460>] btrfs_search_slot+0x360/0x730 [btrfs]
[ 1614.733884] [<ffffffffa051a26d>] btrfs_insert_empty_items+0x8d/0x100 [btrfs]
[ 1614.741202] [<ffffffff81157413>] ? kmem_cache_alloc+0xd3/0x170
[ 1614.747361] [<ffffffffa055cfe5>] btrfs_alloc_dev_extent+0xb5/0x190 [btrfs]
[ 1614.754573] [<ffffffffa0561197>] __btrfs_alloc_chunk+0x6a7/0x750 [btrfs]
[ 1614.761638] [<ffffffffa056129e>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
[ 1614.768288] [<ffffffffa0520391>] ? check_system_chunk+0x71/0x130 [btrfs]
[ 1614.775403] [<ffffffffa05207be>] do_chunk_alloc+0x36e/0x3f0 [btrfs]
[ 1614.781998] [<ffffffffa051c9e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
[ 1614.789694] [<ffffffffa052440c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
[ 1614.797419] [<ffffffffa0525fb0>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
[ 1614.805054] [<ffffffffa054a39e>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
[ 1614.812341] [<ffffffffa054a8c9>] btrfs_file_aio_write+0x209/0x320 [btrfs]
[ 1614.819510] [<ffffffffa053a550>] ? btrfs_end_transaction+0x10/0x20 [btrfs]
[ 1614.826800] [<ffffffffa054ed73>] ? __btrfs_setxattr+0xf3/0x110 [btrfs]
[ 1614.833847] [<ffffffff814abf0e>] ? mutex_unlock+0xe/0x10
[ 1614.839543] [<ffffffffa054a6c0>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
[ 1614.847074] [<ffffffff81162754>] do_sync_readv_writev+0x94/0xe0
[ 1614.853294] [<ffffffff81163813>] do_readv_writev+0xe3/0x1e0
[ 1614.859233] [<ffffffff81180ca2>] ? fget_light+0x122/0x170
[ 1614.865064] [<ffffffff81163956>] vfs_writev+0x46/0x60
[ 1614.870338] [<ffffffff81163a8f>] sys_writev+0x5f/0xc0
[ 1614.875708] [<ffffffff812637de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1614.882356] [<ffffffff814b77c2>] system_call_fastpath+0x16/0x1b
[ 1614.888536] ---[ end trace 5c724b2e8e636000 ]---
[ 1614.893309] BTRFS warning (device dm-64): find_free_extent:5878: Aborting unused transaction(error 35).
[ 1614.902879] BTRFS warning (device dm-64): __btrfs_alloc_chunk:3488: Aborting unused transaction(error 35).
[ 1614.912806] BTRFS warning (device dm-64): find_free_extent:5878: Aborting unused transaction(Object already exists).
[ 1614.916914] divide error: 0000 [#1] SMP
[ 1614.916955] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput joydev sg hid_generic sd_mod iTCO_wdt iTCO_vendor_support ata_piix libata button coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1614.916961] CPU 21
[ 1614.916961] Pid: 84341, comm: ceph-osd Tainted: G W 3.7.0-rc8-00031-g2ec1a5f #458 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1614.916985] RIP: 0010:[<ffffffffa056031d>] [<ffffffffa056031d>] __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1614.916986] RSP: 0018:ffff8804dad7bad8 EFLAGS: 00010246
[ 1614.916987] RAX: 0000000002c00000 RBX: 0000000002c00000 RCX: 0000000040000000
[ 1614.916988] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1614.916989] RBP: ffff8804dad7bb68 R08: 0000000000000001 R09: 0000000000000000
[ 1614.916990] R10: ffff880c14f99990 R11: 0000000000000000 R12: ffff8804dad7bb90
[ 1614.916991] R13: ffff880b49e5c080 R14: ffff880c16324298 R15: ffff8804dad7be68
[ 1614.916992] FS: 00007fffe6cba700(0000) GS:ffff880c3fd20000(0000) knlGS:0000000000000000
[ 1614.916993] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1614.916994] CR2: ffffffffff600400 CR3: 00000009cb6ba000 CR4: 00000000000007e0
[ 1614.916995] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1614.916997] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1614.916998] Process ceph-osd (pid: 84341, threadinfo ffff8804dad7a000, task ffff8804dad5de20)
[ 1614.916999] Stack:
[ 1614.917004] ffff8804dad5e4d8 000000ec54c11738 ffff880c14f99990 ffff880754c11cc0
[ 1614.917008] 0000000000000000 ffffffff81110eed 0000000000000000 0000000000000000
[ 1614.917011] 00000000dad7bb98 0000000000000046 0000000000000000 ffffffff81110eed
[ 1614.917011] Call Trace:
[ 1614.917019] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1614.917021] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1614.917032] [<ffffffffa05608ce>] btrfs_map_block+0xe/0x10 [btrfs]
[ 1614.917042] [<ffffffffa053d387>] btrfs_merge_bio_hook+0x57/0x80 [btrfs]
[ 1614.917052] [<ffffffffa0551c23>] submit_extent_page+0xc3/0x1d0 [btrfs]
[ 1614.917054] [<ffffffff81107960>] ? find_get_pages_tag+0x1e0/0x1e0
[ 1614.917068] [<ffffffffa0557aff>] __extent_writepage+0x69f/0x760 [btrfs]
[ 1614.917081] [<ffffffffa0551f50>] ? extent_io_tree_init+0x90/0x90 [btrfs]
[ 1614.917091] [<ffffffffa0557fd2>] extent_write_cache_pages.clone.3+0x242/0x3d0 [btrfs]
[ 1614.917102] [<ffffffffa05581af>] extent_writepages+0x4f/0x70 [btrfs]
[ 1614.917112] [<ffffffffa05436b0>] ? btrfs_lookup+0x70/0x70 [btrfs]
[ 1614.917116] [<ffffffff81180ca2>] ? fget_light+0x122/0x170
[ 1614.917127] [<ffffffffa053de77>] btrfs_writepages+0x27/0x30 [btrfs]
[ 1614.917130] [<ffffffff81112cb3>] do_writepages+0x23/0x40
[ 1614.917134] [<ffffffff8110702e>] __filemap_fdatawrite_range+0x4e/0x50
[ 1614.917137] [<ffffffff811072e3>] filemap_fdatawrite_range+0x13/0x20
[ 1614.917143] [<ffffffff81193399>] sys_sync_file_range+0x109/0x170
[ 1614.917148] [<ffffffff814b77c2>] system_call_fastpath+0x16/0x1b
[ 1614.917167] Code: 66 0f 1f 44 00 00 4d 8b 6a 60 48 29 c3 8b 45 c4 41 39 45 18 b8 00 00 00 00 0f 4d 45 c4 31 d2 89 45 c4 49 63 75 10 48 89 d8 89 f7 <48> f7 f7 49 89 c6 48 89 45 c8 4c 0f af f6 4c 39 f3 73 10 0f 0b
[ 1614.917178] RIP [<ffffffffa056031d>] __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1614.917179] RSP <ffff8804dad7bad8>
[ 1614.917687] ---[ end trace 5c724b2e8e636001 ]---
But from there, things go downhill:
[ 1615.971959] divide error: 0000 [#2] SMP
[ 1615.975920] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput joydev sg hid_generic sd_mod iTCO_wdt iTCO_vendor_support ata_piix libata button coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1616.041983] CPU 10
[ 1616.043917] Pid: 33589, comm: kworker/10:0 Tainted: G D W 3.7.0-rc8-00031-g2ec1a5f #458 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1616.055643] RIP: 0010:[<ffffffffa056031d>] [<ffffffffa056031d>] __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1616.064894] RSP: 0018:ffff880c1704f778 EFLAGS: 00010246
[ 1616.070198] RAX: 0000000003000000 RBX: 0000000003000000 RCX: 0000000040000000
[ 1616.077312] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1616.084434] RBP: ffff880c1704f808 R08: 0000000000000000 R09: 0000000000000000
[ 1616.091556] R10: ffff880c14f99990 R11: 0000000000000000 R12: ffff880c1704f830
[ 1616.098669] R13: ffff880b49e5c080 R14: ffff880c16324298 R15: ffff880c1704fb08
[ 1616.105783] FS: 0000000000000000(0000) GS:ffff880c3fc80000(0000) knlGS:0000000000000000
[ 1616.113855] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1616.119603] CR2: 00007ffff7ddc020 CR3: 00000009cb6ba000 CR4: 00000000000007e0
[ 1616.126719] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1616.133839] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1616.140951] Process kworker/10:0 (pid: 33589, threadinfo ffff880c1704e000, task ffff880c19d48000)
[ 1616.149795] Stack:
[ 1616.151808] ffff8804f8b71000 ffff880754c44c60 ffff880c14f99990 ffff880754c451e8
[ 1616.159280] 0000000000000000 ffffffff81110eed 0000000000000000 0000000000000000
[ 1616.166749] 000000001704f838 0000000000000046 0000000000000000 ffffffff81110eed
[ 1616.174217] Call Trace:
[ 1616.176679] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1616.183199] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1616.189745] [<ffffffffa05608ce>] btrfs_map_block+0xe/0x10 [btrfs]
[ 1616.195935] [<ffffffffa053d387>] btrfs_merge_bio_hook+0x57/0x80 [btrfs]
[ 1616.202651] [<ffffffffa0551c23>] submit_extent_page+0xc3/0x1d0 [btrfs]
[ 1616.209257] [<ffffffff81107960>] ? find_get_pages_tag+0x1e0/0x1e0
[ 1616.215447] [<ffffffffa0557aff>] __extent_writepage+0x69f/0x760 [btrfs]
[ 1616.222158] [<ffffffffa0551f50>] ? extent_io_tree_init+0x90/0x90 [btrfs]
[ 1616.228953] [<ffffffffa0557fd2>] extent_write_cache_pages.clone.3+0x242/0x3d0 [btrfs]
[ 1616.236875] [<ffffffffa05581af>] extent_writepages+0x4f/0x70 [btrfs]
[ 1616.243325] [<ffffffffa05436b0>] ? btrfs_lookup+0x70/0x70 [btrfs]
[ 1616.249499] [<ffffffff8117c660>] ? igrab+0x40/0x70
[ 1616.254393] [<ffffffffa053de77>] btrfs_writepages+0x27/0x30 [btrfs]
[ 1616.260735] [<ffffffff81112cb3>] do_writepages+0x23/0x40
[ 1616.266127] [<ffffffff8110702e>] __filemap_fdatawrite_range+0x4e/0x50
[ 1616.272674] [<ffffffff814aebbb>] ? _raw_spin_unlock+0x2b/0x50
[ 1616.278499] [<ffffffff8110724c>] filemap_flush+0x1c/0x20
[ 1616.283905] [<ffffffffa053be64>] btrfs_start_delalloc_inodes+0xd4/0x240 [btrfs]
[ 1616.291301] [<ffffffffa0539a95>] btrfs_commit_transaction+0x2c5/0xa10 [btrfs]
[ 1616.298521] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.305479] [<ffffffff81067c70>] ? wake_up_bit+0x40/0x40
[ 1616.310888] [<ffffffffa053a5d6>] do_async_commit+0x76/0x90 [btrfs]
[ 1616.317168] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.324117] [<ffffffff8105fd33>] process_one_work+0x2d3/0x4b0
[ 1616.329947] [<ffffffff8105fc98>] ? process_one_work+0x238/0x4b0
[ 1616.335980] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.342937] [<ffffffff8106219a>] worker_thread+0x20a/0x330
[ 1616.348512] [<ffffffff81061f90>] ? manage_workers+0x170/0x170
[ 1616.354340] [<ffffffff81067561>] kthread+0xe1/0xf0
[ 1616.359228] [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[ 1616.365496] [<ffffffff814b771c>] ret_from_fork+0x7c/0xb0
[ 1616.370905] [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[ 1616.377155] Code: 66 0f 1f 44 00 00 4d 8b 6a 60 48 29 c3 8b 45 c4 41 39 45 18 b8 00 00 00 00 0f 4d 45 c4 31 d2 89 45 c4 49 63 75 10 48 89 d8 89 f7 <48> f7 f7 49 89 c6 48 89 45 c8 4c 0f af f6 4c 39 f3 73 10 0f 0b
[ 1616.397533] RIP [<ffffffffa056031d>] __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1616.404461] RSP <ffff880c1704f778>
[ 1616.410313] ---[ end trace 5c724b2e8e636002 ]---
[ 1616.419293] BUG: unable to handle kernel paging request at ffffffffffffffa8
[ 1616.426291] IP: [<ffffffff81066be0>] kthread_data+0x10/0x20
[ 1616.431886] PGD 1a0d067 PUD 1a0e067 PMD 0
[ 1616.436059] Oops: 0000 [#3] SMP
[ 1616.439335] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput joydev sg hid_generic sd_mod iTCO_wdt iTCO_vendor_support ata_piix libata button coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core ehci_hcd uhci_hcd ioatdma i7core_edac dm_mod edac_core nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000
[ 1616.504982] CPU 10
[ 1616.506911] Pid: 33589, comm: kworker/10:0 Tainted: G D W 3.7.0-rc8-00031-g2ec1a5f #458 Supermicro X8DTH-i/6/iF/6F/X8DTH
[ 1616.518630] RIP: 0010:[<ffffffff81066be0>] [<ffffffff81066be0>] kthread_data+0x10/0x20
[ 1616.526632] RSP: 0018:ffff880c1704f448 EFLAGS: 00010082
[ 1616.531930] RAX: 0000000000000000 RBX: ffff880c3fc932c0 RCX: 000000000000000a
[ 1616.539045] RDX: ffffffff81ccaa20 RSI: 000000000000000a RDI: ffff880c19d48000
[ 1616.546161] RBP: ffff880c1704f448 R08: ffff880c19d48070 R09: 0000000000000001
[ 1616.553271] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a
[ 1616.560385] R13: 000000000000000a R14: 0000000000000001 R15: 0000000000000000
[ 1616.567498] FS: 0000000000000000(0000) GS:ffff880c3fc80000(0000) knlGS:0000000000000000
[ 1616.575562] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1616.581284] CR2: ffffffffffffffa8 CR3: 0000000b7429b000 CR4: 00000000000007e0
[ 1616.588390] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1616.595502] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1616.602609] Process kworker/10:0 (pid: 33589, threadinfo ffff880c1704e000, task ffff880c19d48000)
[ 1616.611451] Stack:
[ 1616.613464] ffff880c1704f478 ffffffff8105f861 000000000000000a ffff880c3fc932c0
[ 1616.620896] 000000000000000a ffff880c19d483c0 ffff880c1704f508 ffffffff814acb4e
[ 1616.628329] 0000000000000018 ffff880c1704ffd8 ffff880c1704e010 ffff880c1704e000
[ 1616.635765] Call Trace:
[ 1616.638217] [<ffffffff8105f861>] wq_worker_sleeping+0x21/0xa0
[ 1616.644031] [<ffffffff814acb4e>] __schedule+0x17e/0x690
[ 1616.649327] [<ffffffff814ad3dd>] schedule+0x5d/0x60
[ 1616.654281] [<ffffffff810456c3>] do_exit+0x3e3/0x430
[ 1616.659320] [<ffffffff814afe68>] oops_end+0xd8/0xf0
[ 1616.664273] [<ffffffff8100593a>] die+0x6a/0x80
[ 1616.668792] [<ffffffff814af7be>] do_trap+0x7e/0x170
[ 1616.673747] [<ffffffff81002f45>] do_divide_error+0x95/0xa0
[ 1616.679321] [<ffffffffa056031d>] ? __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1616.686003] [<ffffffff8126381d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1616.692510] [<ffffffff814af08c>] ? restore_args+0x30/0x30
[ 1616.697983] [<ffffffff814b8758>] divide_error+0x18/0x20
[ 1616.703292] [<ffffffffa056031d>] ? __btrfs_map_block+0xcd/0x670 [btrfs]
[ 1616.709981] [<ffffffffa05602a1>] ? __btrfs_map_block+0x51/0x670 [btrfs]
[ 1616.716663] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1616.723169] [<ffffffff81110eed>] ? test_set_page_writeback+0x6d/0x170
[ 1616.729688] [<ffffffffa05608ce>] btrfs_map_block+0xe/0x10 [btrfs]
[ 1616.735859] [<ffffffffa053d387>] btrfs_merge_bio_hook+0x57/0x80 [btrfs]
[ 1616.742551] [<ffffffffa0551c23>] submit_extent_page+0xc3/0x1d0 [btrfs]
[ 1616.749144] [<ffffffff81107960>] ? find_get_pages_tag+0x1e0/0x1e0
[ 1616.755316] [<ffffffffa0557aff>] __extent_writepage+0x69f/0x760 [btrfs]
[ 1616.762008] [<ffffffffa0551f50>] ? extent_io_tree_init+0x90/0x90 [btrfs]
[ 1616.768784] [<ffffffffa0557fd2>] extent_write_cache_pages.clone.3+0x242/0x3d0 [btrfs]
[ 1616.776686] [<ffffffffa05581af>] extent_writepages+0x4f/0x70 [btrfs]
[ 1616.783114] [<ffffffffa05436b0>] ? btrfs_lookup+0x70/0x70 [btrfs]
[ 1616.789276] [<ffffffff8117c660>] ? igrab+0x40/0x70
[ 1616.794151] [<ffffffffa053de77>] btrfs_writepages+0x27/0x30 [btrfs]
[ 1616.800486] [<ffffffff81112cb3>] do_writepages+0x23/0x40
[ 1616.805872] [<ffffffff8110702e>] __filemap_fdatawrite_range+0x4e/0x50
[ 1616.812380] [<ffffffff814aebbb>] ? _raw_spin_unlock+0x2b/0x50
[ 1616.818196] [<ffffffff8110724c>] filemap_flush+0x1c/0x20
[ 1616.823591] [<ffffffffa053be64>] btrfs_start_delalloc_inodes+0xd4/0x240 [btrfs]
[ 1616.830974] [<ffffffffa0539a95>] btrfs_commit_transaction+0x2c5/0xa10 [btrfs]
[ 1616.838182] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.845121] [<ffffffff81067c70>] ? wake_up_bit+0x40/0x40
[ 1616.850517] [<ffffffffa053a5d6>] do_async_commit+0x76/0x90 [btrfs]
[ 1616.856774] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.863712] [<ffffffff8105fd33>] process_one_work+0x2d3/0x4b0
[ 1616.869530] [<ffffffff8105fc98>] ? process_one_work+0x238/0x4b0
[ 1616.875531] [<ffffffffa053a560>] ? btrfs_end_transaction+0x20/0x20 [btrfs]
[ 1616.882469] [<ffffffff8106219a>] worker_thread+0x20a/0x330
[ 1616.888028] [<ffffffff81061f90>] ? manage_workers+0x170/0x170
[ 1616.893844] [<ffffffff81067561>] kthread+0xe1/0xf0
[ 1616.898711] [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[ 1616.904960] [<ffffffff814b771c>] ret_from_fork+0x7c/0xb0
[ 1616.910345] [<ffffffff81067480>] ? __init_kthread_worker+0x70/0x70
[ 1616.916593] Code: 68 03 00 00 48 8b 40 98 c9 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 87 68 03 00 00 <48> 8b 40 a8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
[ 1616.936489] RIP [<ffffffff81066be0>] kthread_data+0x10/0x20
[ 1616.942150] RSP <ffff880c1704f448>
[ 1616.945626] CR2: ffffffffffffffa8
[ 1616.948930] ---[ end trace 5c724b2e8e636003 ]---
[ 1616.953536] Fixing recursive fault but reboot is needed!
Is there some way to avoid this do_chunk_alloc() deadlock by learning
before a call to do_chunk_alloc() for more data space that more metadata
space will be needed, and doing that call first?
Or, is there some other way to fix this problem?
Thanks -- Jim
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: 3.7.0-rc8 btrfs locking issue
2012-12-05 16:07 3.7.0-rc8 btrfs locking issue Jim Schutt
2012-12-07 19:03 ` Jim Schutt
@ 2012-12-09 14:04 ` Liu Bo
2012-12-11 16:33 ` Jim Schutt
1 sibling, 1 reply; 7+ messages in thread
From: Liu Bo @ 2012-12-09 14:04 UTC (permalink / raw)
To: Jim Schutt; +Cc: linux-btrfs, linux-kernel, ceph-devel@vger.kernel.org
On Wed, Dec 05, 2012 at 09:07:05AM -0700, Jim Schutt wrote:
> Hi,
>
> I'm hitting a btrfs locking issue with 3.7.0-rc8.
>
> The btrfs filesystem in question is backing a Ceph OSD
> under a heavy write load from many cephfs clients.
>
> I reported this issue a while ago:
> http://www.spinics.net/lists/linux-btrfs/msg19370.html
> when I was testing what I thought might be part of the
> 3.7 btrfs patch queue, using Josef Bacik's btrfs-next tree.
>
> I spent some time attempting to bisect the btrfs patch queue
> just before it was merged for 3.7, but got nowhere due to
> false negatives.
>
> I've just been able to get back to testing 3.7-rc, and found
> that I can still trigger the issue.
Hi Jim,
Could you please apply the following patch to test if it works?
(It's against 3.7-rc8.)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d3e2c1..100289b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3346,7 +3346,8 @@ u64 btrfs_get_alloc_profile(struct btrfs_root
*root, int data)
if (data)
flags = BTRFS_BLOCK_GROUP_DATA;
- else if (root == root->fs_info->chunk_root)
+ else if (root == root->fs_info->chunk_root ||
+ root == root->fs_info->dev_root)
flags = BTRFS_BLOCK_GROUP_SYSTEM;
else
flags = BTRFS_BLOCK_GROUP_METADATA;
@@ -3535,6 +3536,7 @@ static u64 get_system_chunk_thresh(struct
btrfs_root *root, u64 type)
num_dev = 1; /* DUP or single */
/* metadata for updaing devices and chunk tree */
+ num_dev = num_dev << 1
return btrfs_calc_trans_metadata_size(root, num_dev + 1);
}
@@ -4351,7 +4353,7 @@ static void init_global_block_rsv(struct
btrfs_fs_info *fs_info)
fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
- fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
+ fs_info->dev_root->block_rsv = &fs_info->chunk_block_rsv;
fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
fs_info->chunk_root->block_rsv = &fs_info->chunk_block_rsv;
thanks,
liubo
>
> First I get this lockdep splat:
>
> [ 1184.201331] =============================================
> [ 1184.206716] [ INFO: possible recursive locking detected ]
> [ 1184.212111] 3.7.0-rc8-00013-gdf2fc24 #438 Not tainted
> [ 1184.217156] ---------------------------------------------
> [ 1184.222544] ceph-osd/42177 is trying to acquire lock:
> [ 1184.227589] (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.237270]
> [ 1184.237270] but task is already holding lock:
> [ 1184.243114] (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.252786]
> [ 1184.252786] other info that might help us debug this:
> [ 1184.259303] Possible unsafe locking scenario:
> [ 1184.259303]
> [ 1184.265220] CPU0
> [ 1184.267680] ----
> [ 1184.270133] lock(&fs_info->chunk_mutex);
> [ 1184.274276] lock(&fs_info->chunk_mutex);
> [ 1184.278417]
> [ 1184.278417] *** DEADLOCK ***
> [ 1184.278417]
> [ 1184.284325] May be due to missing lock nesting notation
> [ 1184.284325]
> [ 1184.291099] 4 locks held by ceph-osd/42177:
> [ 1184.295277] #0: (sb_writers#7){.+.+.+}, at: [<ffffffffa0584664>] btrfs_file_aio_write+0x64/0x320 [btrfs]
> [ 1184.305103] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffffa058466e>] btrfs_file_aio_write+0x6e/0x320 [btrfs]
> [ 1184.316108] #2: (sb_internal){.+.+..}, at: [<ffffffffa05746f4>] start_transaction+0x1c4/0x450 [btrfs]
> [ 1184.325632] #3: (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.335761]
> [ 1184.335761] stack backtrace:
> [ 1184.340126] Pid: 42177, comm: ceph-osd Not tainted 3.7.0-rc8-00013-gdf2fc24 #438
> [ 1184.347508] Call Trace:
> [ 1184.349962] [<ffffffff81042eca>] ? vprintk_emit+0x42a/0x4c0
> [ 1184.355619] [<ffffffff8109b579>] print_deadlock_bug+0xe9/0x100
> [ 1184.361556] [<ffffffff8109d246>] validate_chain+0x596/0x750
> [ 1184.367222] [<ffffffff8109d849>] __lock_acquire+0x449/0x510
> [ 1184.372894] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.379417] [<ffffffff8109d9d9>] lock_acquire+0xc9/0x120
> [ 1184.384855] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.391377] [<ffffffff8109d849>] ? __lock_acquire+0x449/0x510
> [ 1184.397204] [<ffffffff814ab17d>] __mutex_lock_common+0x5d/0x3a0
> [ 1184.403221] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.409762] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.416323] [<ffffffffa055a5c9>] ? do_chunk_alloc+0x179/0x340 [btrfs]
> [ 1184.422849] [<ffffffff814ab62a>] mutex_lock_nested+0x4a/0x60
> [ 1184.428640] [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1184.435018] [<ffffffffa05635bc>] find_free_extent+0xa3c/0xb70 [btrfs]
> [ 1184.441555] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1184.449051] [<ffffffffa0563772>] btrfs_reserve_extent+0x82/0x190 [btrfs]
> [ 1184.455843] [<ffffffffa0563905>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
> [ 1184.462828] [<ffffffffa054fb6a>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
> [ 1184.469471] [<ffffffffa05a50d3>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
> [ 1184.476962] [<ffffffffa055056d>] btrfs_cow_block+0x18d/0x230 [btrfs]
> [ 1184.483426] [<ffffffffa0553460>] btrfs_search_slot+0x360/0x730 [btrfs]
> [ 1184.490067] [<ffffffffa055426d>] btrfs_insert_empty_items+0x8d/0x100 [btrfs]
> [ 1184.497199] [<ffffffff81159be3>] ? kmem_cache_alloc+0xd3/0x170
> [ 1184.503151] [<ffffffffa0596f25>] btrfs_alloc_dev_extent+0xb5/0x190 [btrfs]
> [ 1184.510125] [<ffffffffa059b0d7>] __btrfs_alloc_chunk+0x6a7/0x750 [btrfs]
> [ 1184.516928] [<ffffffffa059b1de>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
> [ 1184.523374] [<ffffffffa055a391>] ? check_system_chunk+0x71/0x130 [btrfs]
> [ 1184.530179] [<ffffffffa055a71e>] do_chunk_alloc+0x2ce/0x340 [btrfs]
> [ 1184.536555] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1184.544070] [<ffffffffa055e35c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
> [ 1184.551574] [<ffffffffa055ff00>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
> [ 1184.558987] [<ffffffffa05842de>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
> [ 1184.566042] [<ffffffffa0584809>] btrfs_file_aio_write+0x209/0x320 [btrfs]
> [ 1184.572944] [<ffffffffa0584600>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
> [ 1184.579984] [<ffffffff81165014>] do_sync_readv_writev+0x94/0xe0
> [ 1184.585985] [<ffffffff81166023>] do_readv_writev+0xe3/0x1e0
> [ 1184.591645] [<ffffffff81183112>] ? fget_light+0x122/0x170
> [ 1184.597131] [<ffffffff81166166>] vfs_writev+0x46/0x60
> [ 1184.602266] [<ffffffff8116629f>] sys_writev+0x5f/0xc0
> [ 1184.607398] [<ffffffff81264b6e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1184.613832] [<ffffffff814b7042>] system_call_fastpath+0x16/0x1b
>
> and then a little later:
>
> [ 1319.463719] INFO: task ceph-osd:42177 blocked for more than 120 seconds.
> [ 1319.470409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1319.478227] ceph-osd D 0000000000000246 0 42177 1 0x00000000
> [ 1319.485330] ffff880402c4d358 0000000000000046 00007fffeccc49d0 ffff880402c4dfd8
> [ 1319.492852] ffff880402c4c010 ffff880402c4c000 ffff880402c4c000 ffff880402c4c000
> [ 1319.500449] ffff880402c4dfd8 ffff880402c4c000 ffff880c2151bec0 ffff8804e9403ec0
> [ 1319.508012] Call Trace:
> [ 1319.510495] [<ffffffff814acc5d>] schedule+0x5d/0x60
> [ 1319.515478] [<ffffffff814acf67>] schedule_preempt_disabled+0x27/0x40
> [ 1319.521967] [<ffffffff814ab375>] __mutex_lock_common+0x255/0x3a0
> [ 1319.528090] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1319.534659] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1319.541264] [<ffffffff814ab62a>] mutex_lock_nested+0x4a/0x60
> [ 1319.547118] [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1319.553532] [<ffffffffa05635bc>] find_free_extent+0xa3c/0xb70 [btrfs]
> [ 1319.560122] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1319.567620] [<ffffffffa0563772>] btrfs_reserve_extent+0x82/0x190 [btrfs]
> [ 1319.574460] [<ffffffffa0563905>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
> [ 1319.581464] [<ffffffffa054fb6a>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
> [ 1319.588117] [<ffffffffa05a50d3>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
> [ 1319.595644] [<ffffffffa055056d>] btrfs_cow_block+0x18d/0x230 [btrfs]
> [ 1319.602110] [<ffffffffa0553460>] btrfs_search_slot+0x360/0x730 [btrfs]
> [ 1319.608767] [<ffffffffa055426d>] btrfs_insert_empty_items+0x8d/0x100 [btrfs]
> [ 1319.615897] [<ffffffff81159be3>] ? kmem_cache_alloc+0xd3/0x170
> [ 1319.621910] [<ffffffffa0596f25>] btrfs_alloc_dev_extent+0xb5/0x190 [btrfs]
> [ 1319.628893] [<ffffffffa059b0d7>] __btrfs_alloc_chunk+0x6a7/0x750 [btrfs]
> [ 1319.635736] [<ffffffffa059b1de>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
> [ 1319.642184] [<ffffffffa055a391>] ? check_system_chunk+0x71/0x130 [btrfs]
> [ 1319.649009] [<ffffffffa055a71e>] do_chunk_alloc+0x2ce/0x340 [btrfs]
> [ 1319.655366] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1319.662935] [<ffffffffa055e35c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
> [ 1319.670465] [<ffffffffa055ff00>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
> [ 1319.677891] [<ffffffffa05842de>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
> [ 1319.685000] [<ffffffffa0584809>] btrfs_file_aio_write+0x209/0x320 [btrfs]
> [ 1319.691952] [<ffffffffa0584600>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
> [ 1319.699054] [<ffffffff81165014>] do_sync_readv_writev+0x94/0xe0
> [ 1319.705118] [<ffffffff81166023>] do_readv_writev+0xe3/0x1e0
> [ 1319.710828] [<ffffffff81183112>] ? fget_light+0x122/0x170
> [ 1319.716324] [<ffffffff81166166>] vfs_writev+0x46/0x60
> [ 1319.721466] [<ffffffff8116629f>] sys_writev+0x5f/0xc0
> [ 1319.726674] [<ffffffff81264b6e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1319.733198] [<ffffffff814b7042>] system_call_fastpath+0x16/0x1b
> [ 1319.739274] INFO: lockdep is turned off.
> [ 1319.743261] INFO: task ceph-osd:42236 blocked for more than 120 seconds.
> [ 1319.749953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1319.757812] ceph-osd D 0000000000000246 0 42236 1 0x00000000
> [ 1319.765056] ffff88052e7bdcc8 0000000000000046 ffff880624666120 ffff88052e7bdfd8
> [ 1319.772785] ffff88052e7bc010 ffff88052e7bc000 ffff88052e7bc000 ffff88052e7bc000
> [ 1319.780332] ffff88052e7bdfd8 ffff88052e7bc000 ffff88062453bec0 ffff88052e7b5e20
> [ 1319.787905] Call Trace:
> [ 1319.790368] [<ffffffff814acc5d>] schedule+0x5d/0x60
> [ 1319.795335] [<ffffffff814acf67>] schedule_preempt_disabled+0x27/0x40
> [ 1319.801762] [<ffffffff814ab375>] __mutex_lock_common+0x255/0x3a0
> [ 1319.807861] [<ffffffffa0548b4f>] ? btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1319.814143] [<ffffffff811703cf>] ? final_putname+0x3f/0x50
> [ 1319.819724] [<ffffffffa0548b4f>] ? btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1319.825982] [<ffffffff814ab62a>] mutex_lock_nested+0x4a/0x60
> [ 1319.831731] [<ffffffffa0548b4f>] btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1319.837846] [<ffffffff811968c0>] statfs_by_dentry+0x60/0x90
> [ 1319.843508] [<ffffffff81196a0b>] vfs_statfs+0x1b/0xb0
> [ 1319.848659] [<ffffffff81196c00>] user_statfs+0x40/0x60
> [ 1319.853887] [<ffffffff81196caa>] sys_statfs+0x2a/0x50
> [ 1319.859036] [<ffffffff814b7042>] system_call_fastpath+0x16/0x1b
> [ 1319.865035] INFO: lockdep is turned off.
>
> and then later still:
>
> [ 1439.546768] INFO: task btrfs-transacti:27591 blocked for more than 120 seconds.
> [ 1439.554139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1439.561949] btrfs-transacti D ffff880993231810 0 27591 2 0x00000000
> [ 1439.569205] ffff8808e79d3c58 0000000000000046 0000000000000000 ffff8808e79d3fd8
> [ 1439.576820] ffff8808e79d2010 ffff8808e79d2000 ffff8808e79d2000 ffff8808e79d2000
> [ 1439.584384] ffff8808e79d3fd8 ffff8808e79d2000 ffff880b3f165e20 ffff8808d8cdbec0
> [ 1439.592040] Call Trace:
> [ 1439.594573] [<ffffffff814acc5d>] schedule+0x5d/0x60
> [ 1439.599646] [<ffffffff814aac3d>] schedule_timeout+0x3d/0x260
> [ 1439.605402] [<ffffffff8109c410>] ? trace_hardirqs_on_caller+0x20/0x1d0
> [ 1439.612023] [<ffffffff8109c5cd>] ? trace_hardirqs_on+0xd/0x10
> [ 1439.617939] [<ffffffff8106781a>] ? prepare_to_wait+0x7a/0x90
> [ 1439.623788] [<ffffffffa0573a46>] btrfs_commit_transaction+0x336/0xa10 [btrfs]
> [ 1439.631128] [<ffffffffa05748fb>] ? start_transaction+0x3cb/0x450 [btrfs]
> [ 1439.637943] [<ffffffff810674e0>] ? wake_up_bit+0x40/0x40
> [ 1439.643380] [<ffffffffa056c223>] transaction_kthread+0x123/0x210 [btrfs]
> [ 1439.650259] [<ffffffffa056c100>] ? btrfs_bio_wq_end_io+0x90/0x90 [btrfs]
> [ 1439.657134] [<ffffffff81066dd1>] kthread+0xe1/0xf0
> [ 1439.662109] [<ffffffff81066cf0>] ? __init_kthread_worker+0x70/0x70
> [ 1439.668398] [<ffffffff814b6f9c>] ret_from_fork+0x7c/0xb0
> [ 1439.673858] [<ffffffff81066cf0>] ? __init_kthread_worker+0x70/0x70
> [ 1439.680181] INFO: lockdep is turned off.
> [ 1439.685095] INFO: task ceph-osd:42177 blocked for more than 120 seconds.
> [ 1439.691847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1439.699743] ceph-osd D 0000000000000246 0 42177 1 0x00000000
> [ 1439.706941] ffff880402c4d358 0000000000000046 00007fffeccc49d0 ffff880402c4dfd8
> [ 1439.714494] ffff880402c4c010 ffff880402c4c000 ffff880402c4c000 ffff880402c4c000
> [ 1439.722039] ffff880402c4dfd8 ffff880402c4c000 ffff880c2151bec0 ffff8804e9403ec0
> [ 1439.729688] Call Trace:
> [ 1439.732217] [<ffffffff814acc5d>] schedule+0x5d/0x60
> [ 1439.737179] [<ffffffff814acf67>] schedule_preempt_disabled+0x27/0x40
> [ 1439.743705] [<ffffffff814ab375>] __mutex_lock_common+0x255/0x3a0
> [ 1439.749843] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1439.756481] [<ffffffffa055a5d3>] ? do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1439.763037] [<ffffffff814ab62a>] mutex_lock_nested+0x4a/0x60
> [ 1439.768854] [<ffffffffa055a5d3>] do_chunk_alloc+0x183/0x340 [btrfs]
> [ 1439.775290] [<ffffffffa05635bc>] find_free_extent+0xa3c/0xb70 [btrfs]
> [ 1439.781932] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1439.789534] [<ffffffffa0563772>] btrfs_reserve_extent+0x82/0x190 [btrfs]
> [ 1439.796474] [<ffffffffa0563905>] btrfs_alloc_free_block+0x85/0x230 [btrfs]
> [ 1439.803499] [<ffffffffa054fb6a>] __btrfs_cow_block+0x14a/0x4b0 [btrfs]
> [ 1439.810253] [<ffffffffa05a50d3>] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs]
> [ 1439.817877] [<ffffffffa055056d>] btrfs_cow_block+0x18d/0x230 [btrfs]
> [ 1439.824427] [<ffffffffa0553460>] btrfs_search_slot+0x360/0x730 [btrfs]
> [ 1439.831162] [<ffffffffa055426d>] btrfs_insert_empty_items+0x8d/0x100 [btrfs]
> [ 1439.838423] [<ffffffff81159be3>] ? kmem_cache_alloc+0xd3/0x170
> [ 1439.844514] [<ffffffffa0596f25>] btrfs_alloc_dev_extent+0xb5/0x190 [btrfs]
> [ 1439.851619] [<ffffffffa059b0d7>] __btrfs_alloc_chunk+0x6a7/0x750 [btrfs]
> [ 1439.858535] [<ffffffffa059b1de>] btrfs_alloc_chunk+0x5e/0x90 [btrfs]
> [ 1439.865079] [<ffffffffa055a391>] ? check_system_chunk+0x71/0x130 [btrfs]
> [ 1439.871939] [<ffffffffa055a71e>] do_chunk_alloc+0x2ce/0x340 [btrfs]
> [ 1439.878375] [<ffffffffa05569e9>] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs]
> [ 1439.885933] [<ffffffffa055e35c>] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs]
> [ 1439.893514] [<ffffffffa055ff00>] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs]
> [ 1439.901018] [<ffffffffa05842de>] __btrfs_buffered_write+0x15e/0x350 [btrfs]
> [ 1439.908226] [<ffffffffa0584809>] btrfs_file_aio_write+0x209/0x320 [btrfs]
> [ 1439.915235] [<ffffffffa0584600>] ? __btrfs_direct_write+0x130/0x130 [btrfs]
> [ 1439.922375] [<ffffffff81165014>] do_sync_readv_writev+0x94/0xe0
> [ 1439.928475] [<ffffffff81166023>] do_readv_writev+0xe3/0x1e0
> [ 1439.934197] [<ffffffff81183112>] ? fget_light+0x122/0x170
> [ 1439.939738] [<ffffffff81166166>] vfs_writev+0x46/0x60
> [ 1439.944948] [<ffffffff8116629f>] sys_writev+0x5f/0xc0
> [ 1439.950139] [<ffffffff81264b6e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 1439.956621] [<ffffffff814b7042>] system_call_fastpath+0x16/0x1b
> [ 1439.962742] INFO: lockdep is turned off.
> [ 1439.966769] INFO: task ceph-osd:42236 blocked for more than 120 seconds.
> [ 1439.973558] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1439.981513] ceph-osd D 0000000000000246 0 42236 1 0x00000000
> [ 1439.988788] ffff88052e7bdcc8 0000000000000046 ffff880624666120 ffff88052e7bdfd8
> [ 1439.996611] ffff88052e7bc010 ffff88052e7bc000 ffff88052e7bc000 ffff88052e7bc000
> [ 1440.004237] ffff88052e7bdfd8 ffff88052e7bc000 ffff88062453bec0 ffff88052e7b5e20
> [ 1440.011956] Call Trace:
> [ 1440.014516] [<ffffffff814acc5d>] schedule+0x5d/0x60
> [ 1440.019578] [<ffffffff814acf67>] schedule_preempt_disabled+0x27/0x40
> [ 1440.026054] [<ffffffff814ab375>] __mutex_lock_common+0x255/0x3a0
> [ 1440.032245] [<ffffffffa0548b4f>] ? btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1440.038624] [<ffffffff811703cf>] ? final_putname+0x3f/0x50
> [ 1440.044323] [<ffffffffa0548b4f>] ? btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1440.050952] [<ffffffff814ab62a>] mutex_lock_nested+0x4a/0x60
> [ 1440.056805] [<ffffffffa0548b4f>] btrfs_statfs+0x5f/0x240 [btrfs]
> [ 1440.063029] [<ffffffff811968c0>] statfs_by_dentry+0x60/0x90
> [ 1440.068739] [<ffffffff81196a0b>] vfs_statfs+0x1b/0xb0
> [ 1440.073962] [<ffffffff81196c00>] user_statfs+0x40/0x60
> [ 1440.079203] [<ffffffff81196caa>] sys_statfs+0x2a/0x50
> [ 1440.084401] [<ffffffff814b7042>] system_call_fastpath+0x16/0x1b
> [ 1440.090816] INFO: lockdep is turned off.
>
>
> Please let me know what I can do to help resolve this issue.
>
> Thanks -- Jim
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 7+ messages in thread