* Is this a race bug when releasing eb?
@ 2014-04-07 15:45 ylet ylet
2014-04-07 15:50 ` Josef Bacik
0 siblings, 1 reply; 6+ messages in thread
From: ylet ylet @ 2014-04-07 15:45 UTC (permalink / raw)
To: linux-btrfs; +Cc: clm, jbacik
Assuming thread 1 (may be VFS want to release the page) wants to
release a CLEAN page and thus the eb attaching the page.
and thread 2 wants to access the eb and cow the eb.
Thread 1
Thread 2
btree_releasepage
try_release_extent_buffer
release_extent_buffer
->if (atomic_dec_and_test(&eb->refs)) {
->spin_unlock(&eb->refs_lock);
-------switch to thread 2 here------
read_block_for_search
btrfs_find_tree_block
radix_tree_lookup && atomic_inc_not_zero
................
btrfs_mark_buffer_dirty(eb)
---------switch to thread 1 here ------
- >spin_lock(&tree->buffer_lock);
->radix_tree_delete(&tree->buffer,
eb->start >> PAGE_CACHE_SHIFT);
->spin_unlock(&tree->buffer_lock);
-->btrfs_release_extent_buffer_page
-->trigger BUG_ON(extent_buffer_under_io(eb))!!!!
This is because Thread 2 can still find the eb while thread 1 has dec
the eb->refs to 0, so eventually trigger the BUG_ON.
I have triggered the BUG_ON while testing btrfs, but I am not sure if
the bug-on is rise by my guess above.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Is this a race bug when releasing eb? 2014-04-07 15:45 Is this a race bug when releasing eb? ylet ylet @ 2014-04-07 15:50 ` Josef Bacik [not found] ` <CAAcaxmWEDa+jS8n4LZKdE_ZXFVAoOgACDseQs0_fViYdfRHShg@mail.gmail.com> 0 siblings, 1 reply; 6+ messages in thread From: Josef Bacik @ 2014-04-07 15:50 UTC (permalink / raw) To: ylet ylet, linux-btrfs; +Cc: clm On 04/07/2014 11:45 AM, ylet ylet wrote: > Assuming thread 1 (may be VFS want to release the page) wants to > release a CLEAN page and thus the eb attaching the page. > and thread 2 wants to access the eb and cow the eb. > > Thread 1 > Thread 2 > > btree_releasepage > > try_release_extent_buffer > > release_extent_buffer > ->if (atomic_dec_and_test(&eb->refs)) { > ->spin_unlock(&eb->refs_lock); > -------switch to thread 2 here------ > > read_block_for_search > > btrfs_find_tree_block > > radix_tree_lookup && atomic_inc_not_zero > atomic_inc_not_zero would return 0 here because we've dropped the last buffer for this, that is how we protect against this problem. How reliably are you able to reproduce this bug? Cause I'd love to find it if you can reproduce reliably. Thanks, Josef ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <CAAcaxmWEDa+jS8n4LZKdE_ZXFVAoOgACDseQs0_fViYdfRHShg@mail.gmail.com>]
* Fwd: Is this a race bug when releasing eb? [not found] ` <CAAcaxmWEDa+jS8n4LZKdE_ZXFVAoOgACDseQs0_fViYdfRHShg@mail.gmail.com> @ 2014-04-07 16:01 ` ylet ylet 2014-04-07 16:03 ` Josef Bacik 2014-04-07 16:12 ` ylet ylet 0 siblings, 2 replies; 6+ messages in thread From: ylet ylet @ 2014-04-07 16:01 UTC (permalink / raw) To: jbacik, linux-btrfs hi Josef Thanks, I got it. But i still have no idea to reliably reproduce it. This bug-on is triggered when I ran POSTMARK test lasting two days. BTW, can you reproduce the bug reported by email with title of "btrfs hung with iozone test under linux kernel 3.14." The reported bug can be reliably reproduced in my environment. 2014-04-07 23:50 GMT+08:00 Josef Bacik <jbacik@fb.com>: > On 04/07/2014 11:45 AM, ylet ylet wrote: >> >> Assuming thread 1 (may be VFS want to release the page) wants to >> release a CLEAN page and thus the eb attaching the page. >> and thread 2 wants to access the eb and cow the eb. >> >> Thread 1 >> Thread 2 >> >> btree_releasepage >> >> try_release_extent_buffer >> >> release_extent_buffer >> ->if (atomic_dec_and_test(&eb->refs)) { >> ->spin_unlock(&eb->refs_lock); >> -------switch to thread 2 here------ >> >> read_block_for_search >> >> btrfs_find_tree_block >> >> radix_tree_lookup && atomic_inc_not_zero >> > > atomic_inc_not_zero would return 0 here because we've dropped the last > buffer for this, that is how we protect against this problem. How reliably > are you able to reproduce this bug? Cause I'd love to find it if you can > reproduce reliably. Thanks, > > Josef ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Fwd: Is this a race bug when releasing eb? 2014-04-07 16:01 ` Fwd: " ylet ylet @ 2014-04-07 16:03 ` Josef Bacik 2014-04-07 16:12 ` ylet ylet 1 sibling, 0 replies; 6+ messages in thread From: Josef Bacik @ 2014-04-07 16:03 UTC (permalink / raw) To: ylet ylet, linux-btrfs On 04/07/2014 12:01 PM, ylet ylet wrote: > hi Josef > > Thanks, I got it. But i still have no idea to reliably reproduce it. > This bug-on is triggered when I ran POSTMARK test lasting two days. > > BTW, can you reproduce the bug reported by email with title of "btrfs > hung with iozone test under linux kernel 3.14." The reported bug can > be reliably reproduced in my environment. > Can I have the postmark options you are using and the general setup of your fs? Number of disks, mount options and mkfs options. I've not seen it, I'm flirting with paternity leave atm and was gone all last week, once we track down this memory leak I'll give it a shot. Thanks, Josef ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is this a race bug when releasing eb? 2014-04-07 16:01 ` Fwd: " ylet ylet 2014-04-07 16:03 ` Josef Bacik @ 2014-04-07 16:12 ` ylet ylet 2014-04-07 16:16 ` Josef Bacik 1 sibling, 1 reply; 6+ messages in thread From: ylet ylet @ 2014-04-07 16:12 UTC (permalink / raw) To: jbacik, linux-btrfs ======================This is the IOzone test bug (can be reliably reproduced)================================== Btrfs falls into hang when I ran iozone test with the following configuration: ./iozone -s 8g -i 0 -i 2 -i 1 -t 4 -r 4k -+w 50 -+y 20 -+C 60 -F /mnt/btrfs/test1 /mnt/btrfs/test2 /mnt/btrfs/test3 /mnt/btrfs/test The bug can be very easily triggered when running this test in my PC. My PC equipped with 4-core intel I5 cpu, 8G memory and a SSD. dmesg result is as below, hoping to help you guys: [23379.435980] INFO: task btrfs-flush_del:1971 blocked for more than 120 seconds. [23379.435997] Tainted: G O 3.14.0-rc2 #2 [23379.436006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.436019] btrfs-flush_del D 0000000000000000 0 1971 2 0x00000000 [23379.436024] ffff88002eb9dd28 0000000000000046 ffff88002eb9dcc8 0000000000000296 [23379.436029] ffff8800d9946100 00000000000133c0 ffff88002eb9dfd8 00000000000133c0 [23379.436033] ffff880064cb0000 ffff8800d9946100 ffff88002eb9dd38 ffff880058b7ccc0 [23379.436037] Call Trace: [23379.436047] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.436068] [<ffffffffa033d825>] btrfs_start_ordered_extent+0x75/0x120 [btrfs] [23379.436073] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.436089] [<ffffffffa033d8f9>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs] [23379.436106] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.436120] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.436124] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.436129] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436133] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.436137] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436140] INFO: task btrfs-transacti:1974 blocked for more than 120 seconds. [23379.436151] Tainted: G O 3.14.0-rc2 #2 [23379.436161] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.436174] btrfs-transacti D ffffffff8180fa60 0 1974 2 0x00000000 [23379.436177] ffff8800d3fad9e8 0000000000000046 ffff8800d3fad9d8 0000000000000292 [23379.436181] ffff8802145bc8c0 00000000000133c0 ffff8800d3fadfd8 00000000000133c0 [23379.436185] ffff880215169840 ffff8802145bc8c0 ffff8800d3fad9f8 ffff88008b406b98 [23379.436188] Call Trace: [23379.436193] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.436206] [<ffffffffa035b52d>] btrfs_tree_read_lock+0x8d/0x100 [btrfs] [23379.436210] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.436220] [<ffffffffa02fbbcb>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs] [23379.436230] [<ffffffffa0300ffb>] btrfs_search_slot+0x5bb/0x9d0 [btrfs] [23379.436235] [<ffffffff816b9ace>] ? _raw_spin_unlock+0xe/0x10 [23379.436246] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.436257] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.436269] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.436281] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.436294] [<ffffffffa0323fe9>] btrfs_commit_transaction+0x59/0xa00 [btrfs] [23379.436299] [<ffffffff8105cdf0>] ? call_timer_fn+0x160/0x160 [23379.436311] [<ffffffffa0321f55>] transaction_kthread+0x1d5/0x250 [btrfs] [23379.436323] [<ffffffffa0321d80>] ? open_ctree+0x2220/0x2220 [btrfs] [23379.436327] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.436331] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436335] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.436338] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436341] INFO: task kworker/u8:1:2041 blocked for more than 120 seconds. [23379.436351] Tainted: G O 3.14.0-rc2 #2 [23379.436362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.436374] kworker/u8:1 D ffffffff8180fa60 0 2041 2 0x00000000 [23379.436381] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) [23379.436384] ffff8801c86f18a8 0000000000000046 ffff880202afd468 0000000000000292 [23379.436387] ffff88002eb948c0 00000000000133c0 ffff8801c86f1fd8 00000000000133c0 [23379.436391] ffff880215168000 ffff88002eb948c0 ffff8801c86f18b8 ffff88008b406b98 [23379.436394] Call Trace: [23379.436398] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.436413] [<ffffffffa035b91d>] btrfs_tree_lock+0xad/0x1f0 [btrfs] [23379.436416] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.436431] [<ffffffffa0344fbe>] btree_write_cache_pages+0x25e/0x800 [btrfs] [23379.436444] [<ffffffffa031b671>] btree_writepages+0x71/0x80 [btrfs] [23379.436449] [<ffffffff8113ebb3>] do_writepages+0x23/0x40 [23379.436452] [<ffffffff811bbda5>] __writeback_single_inode+0x45/0x2a0 [23379.436456] [<ffffffff8108bf73>] ? check_preempt_wakeup+0x163/0x260 [23379.436460] [<ffffffff811bd17d>] writeback_sb_inodes+0x23d/0x3e0 [23379.436463] [<ffffffff811bd3be>] __writeback_inodes_wb+0x9e/0xd0 [23379.436467] [<ffffffff811bd67b>] wb_writeback+0x28b/0x330 [23379.436471] [<ffffffff811c0699>] bdi_writeback_workfn+0x209/0x460 [23379.436476] [<ffffffff8106c1bd>] process_one_work+0x17d/0x4a0 [23379.436480] [<ffffffff8106d59b>] worker_thread+0x11b/0x370 [23379.436484] [<ffffffff8106d480>] ? manage_workers.isra.21+0x2e0/0x2e0 [23379.436488] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.436492] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436496] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.436500] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436503] INFO: task btrfs-endio-wri:2177 blocked for more than 120 seconds. [23379.436514] Tainted: G O 3.14.0-rc2 #2 [23379.436524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.436537] btrfs-endio-wri D ffffffff8180fa60 0 2177 2 0x00000000 [23379.436540] ffff880024165928 0000000000000046 ffff8800241658e8 0000000000000296 [23379.436543] ffff8802025d8000 00000000000133c0 ffff880024165fd8 00000000000133c0 [23379.436547] ffff88021516b080 ffff8802025d8000 ffff880024165938 ffff8800c1151090 [23379.436550] Call Trace: [23379.436554] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.436568] [<ffffffffa035b91d>] btrfs_tree_lock+0xad/0x1f0 [btrfs] [23379.436571] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.436582] [<ffffffffa0301261>] btrfs_search_slot+0x821/0x9d0 [btrfs] [23379.436592] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.436603] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.436615] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.436629] [<ffffffffa033f8fa>] ? merge_state+0x6a/0x150 [btrfs] [23379.436641] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.436652] [<ffffffffa030b47c>] ? btrfs_delayed_refs_qgroup_accounting+0xbc/0x100 [btrfs] [23379.436665] [<ffffffffa0324c5f>] __btrfs_end_transaction+0x27f/0x390 [btrfs] [23379.436677] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.436690] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.436703] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.436717] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.436730] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.436733] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.436737] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436741] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.436745] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.436748] INFO: task iozone:2406 blocked for more than 120 seconds. [23379.436758] Tainted: G O 3.14.0-rc2 #2 [23379.436768] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.436811] iozone D 0000000000000000 0 2406 2165 0x00000000 [23379.436814] ffff8800a62778e8 0000000000000082 ffff8800a6277888 ffffffff8113b08e [23379.436818] ffff8800cc909840 00000000000133c0 ffff8800a6277fd8 00000000000133c0 [23379.436821] ffff880047046100 ffff8800cc909840 ffff88012df4a800 ffff880058b7cdd8 [23379.436825] Call Trace: [23379.436829] [<ffffffff8113b08e>] ? __free_memcg_kmem_pages+0xe/0x10 [23379.436833] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.436836] [<ffffffff816b5a2d>] schedule_timeout+0x1bd/0x220 [23379.436851] [<ffffffffa033ccf2>] ? btrfs_put_ordered_extent+0xf2/0x110 [btrfs] [23379.436855] [<ffffffff816b7909>] wait_for_completion+0xa9/0x110 [23379.436859] [<ffffffff81084930>] ? try_to_wake_up+0x2c0/0x2c0 [23379.436872] [<ffffffffa033d2de>] btrfs_wait_ordered_extents+0x1ee/0x240 [btrfs] [23379.436884] [<ffffffffa033d43f>] btrfs_wait_ordered_roots+0x10f/0x1c0 [btrfs] [23379.436895] [<ffffffffa030c585>] reserve_metadata_bytes+0x435/0x970 [btrfs] [23379.436909] [<ffffffffa033f771>] ? free_extent_state+0xb1/0x120 [btrfs] [23379.436921] [<ffffffffa030d9bd>] btrfs_delalloc_reserve_metadata+0x16d/0x4a0 [btrfs] [23379.436935] [<ffffffffa0335d4d>] __btrfs_buffered_write+0x15d/0x5a0 [btrfs] [23379.436939] [<ffffffff81054926>] ? current_fs_time+0x16/0x60 [23379.436952] [<ffffffffa03363a5>] btrfs_file_aio_write+0x215/0x5c0 [btrfs] [23379.436956] [<ffffffff81089e25>] ? set_next_entity+0xa5/0xc0 [23379.436960] [<ffffffff8108af51>] ? update_curr+0x141/0x200 [23379.436964] [<ffffffff81193afa>] do_sync_write+0x5a/0x90 [23379.436968] [<ffffffff811945db>] vfs_write+0xcb/0x1f0 [23379.436972] [<ffffffff81194ad2>] SyS_write+0x52/0xa0 [23379.436977] [<ffffffff816c2412>] system_call_fastpath+0x16/0x1b [23379.436980] INFO: task btrfs-endio-wri:2409 blocked for more than 120 seconds. [23379.437020] Tainted: G O 3.14.0-rc2 #2 [23379.437044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.437086] btrfs-endio-wri D ffffffff8180fa60 0 2409 2 0x00000000 [23379.437089] ffff88004834d2a8 0000000000000046 0000000000000000 0000000000000292 [23379.437093] ffff880064cb1840 00000000000133c0 ffff88004834dfd8 00000000000133c0 [23379.437096] ffff880215169840 ffff880064cb1840 ffff88004834d2b8 ffff8800d77866a0 [23379.437100] Call Trace: [23379.437103] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.437117] [<ffffffffa035b8d5>] btrfs_tree_lock+0x65/0x1f0 [btrfs] [23379.437121] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.437131] [<ffffffffa02fcdea>] push_nodes_for_insert+0xea/0x4d0 [btrfs] [23379.437141] [<ffffffffa02fe198>] ? key_search+0x88/0xb0 [btrfs] [23379.437151] [<ffffffffa02fdf4c>] ? comp_keys+0x2c/0x30 [btrfs] [23379.437161] [<ffffffffa02fd251>] split_node+0x81/0x4b0 [btrfs] [23379.437173] [<ffffffffa031e1d5>] ? btrfs_buffer_uptodate+0x65/0x80 [btrfs] [23379.437183] [<ffffffffa02f95ad>] ? reada_for_balance+0xfd/0x1d0 [btrfs] [23379.437193] [<ffffffffa0300d36>] btrfs_search_slot+0x2f6/0x9d0 [btrfs] [23379.437197] [<ffffffff816b8896>] ? down_write+0x16/0x40 [23379.437208] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.437218] [<ffffffffa0302b15>] btrfs_insert_item+0x65/0xe0 [btrfs] [23379.437230] [<ffffffffa03135f8>] btrfs_create_pending_block_groups+0xf8/0x170 [btrfs] [23379.437242] [<ffffffffa0324aaa>] __btrfs_end_transaction+0xca/0x390 [btrfs] [23379.437254] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.437265] [<ffffffffa030ff83>] find_free_extent+0xa03/0xb60 [btrfs] [23379.437277] [<ffffffffa03101a2>] btrfs_reserve_extent+0xa2/0x130 [btrfs] [23379.437288] [<ffffffffa03116e3>] btrfs_alloc_free_block+0x103/0x4b0 [btrfs] [23379.437298] [<ffffffffa03000ed>] split_leaf+0x12d/0x7b0 [btrfs] [23379.437309] [<ffffffffa03013a4>] btrfs_search_slot+0x964/0x9d0 [btrfs] [23379.437319] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.437330] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.437341] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.437355] [<ffffffffa033f8fa>] ? merge_state+0x6a/0x150 [btrfs] [23379.437367] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.437378] [<ffffffffa030b47c>] ? btrfs_delayed_refs_qgroup_accounting+0xbc/0x100 [btrfs] [23379.437390] [<ffffffffa0324c5f>] __btrfs_end_transaction+0x27f/0x390 [btrfs] [23379.437402] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.437414] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.437426] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.437440] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.437454] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.437457] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.437461] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.437465] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.437469] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.437472] INFO: task btrfs-endio-wri:2410 blocked for more than 120 seconds. [23379.437512] Tainted: G O 3.14.0-rc2 #2 [23379.437536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.437578] btrfs-endio-wri D 0000000000000000 0 2410 2 0x00000000 [23379.437581] ffff8800232f1928 0000000000000046 ffff8800232f18e8 0000000000000296 [23379.437584] ffff880064cb3080 00000000000133c0 ffff8800232f1fd8 00000000000133c0 [23379.437588] ffff8802025d8000 ffff880064cb3080 ffff8800232f1938 ffff8800c1151090 [23379.437591] Call Trace: [23379.437595] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.437608] [<ffffffffa035b91d>] btrfs_tree_lock+0xad/0x1f0 [btrfs] [23379.437612] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.437622] [<ffffffffa0301261>] btrfs_search_slot+0x821/0x9d0 [btrfs] [23379.437632] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.437643] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.437655] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.437669] [<ffffffffa033f8fa>] ? merge_state+0x6a/0x150 [btrfs] [23379.437681] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.437692] [<ffffffffa030b47c>] ? btrfs_delayed_refs_qgroup_accounting+0xbc/0x100 [btrfs] [23379.437704] [<ffffffffa0324c5f>] __btrfs_end_transaction+0x27f/0x390 [btrfs] [23379.437716] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.437729] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.437741] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.437754] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.437767] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.437771] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.437775] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.437779] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.437783] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.437785] INFO: task btrfs-endio-wri:2411 blocked for more than 120 seconds. [23379.437826] Tainted: G O 3.14.0-rc2 #2 [23379.437850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.437892] btrfs-endio-wri D ffffffff8180fa60 0 2411 2 0x00000000 [23379.437895] ffff8800177dfa58 0000000000000046 ffff88020a1a6c08 0000000000000296 [23379.437898] ffff880064cb0000 00000000000133c0 ffff8800177dffd8 00000000000133c0 [23379.437902] ffffffff81c104a0 ffff880064cb0000 ffff8800177dfa68 ffff88008b406b98 [23379.437905] Call Trace: [23379.437909] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.437922] [<ffffffffa035b52d>] btrfs_tree_read_lock+0x8d/0x100 [btrfs] [23379.437925] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.437935] [<ffffffffa02fbbcb>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs] [23379.437945] [<ffffffffa0300ffb>] btrfs_search_slot+0x5bb/0x9d0 [btrfs] [23379.437955] [<ffffffffa02f8ee0>] ? leaf_space_used+0xe0/0x110 [btrfs] [23379.437965] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.437975] [<ffffffffa0302b15>] btrfs_insert_item+0x65/0xe0 [btrfs] [23379.437987] [<ffffffffa03135f8>] btrfs_create_pending_block_groups+0xf8/0x170 [btrfs] [23379.437999] [<ffffffffa0324aaa>] __btrfs_end_transaction+0xca/0x390 [btrfs] [23379.438011] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.438023] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.438035] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.438048] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.438061] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.438065] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.438068] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.438072] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.438076] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.438079] INFO: task btrfs-endio-wri:2412 blocked for more than 120 seconds. [23379.438120] Tainted: G O 3.14.0-rc2 #2 [23379.438144] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.438185] btrfs-endio-wri D ffffffff8180fa60 0 2412 2 0x00000000 [23379.438188] ffff880024791908 0000000000000046 ffff8800247918d8 0000000000000292 [23379.438192] ffff88002e981840 00000000000133c0 ffff880024791fd8 00000000000133c0 [23379.438195] ffff880215168000 ffff88002e981840 ffff880024791918 ffff88008b406b98 [23379.438199] Call Trace: [23379.438203] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.438215] [<ffffffffa035b52d>] btrfs_tree_read_lock+0x8d/0x100 [btrfs] [23379.438218] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.438228] [<ffffffffa02fbbcb>] btrfs_read_lock_root_node+0x3b/0x50 [btrfs] [23379.438238] [<ffffffffa0300ffb>] btrfs_search_slot+0x5bb/0x9d0 [btrfs] [23379.438243] [<ffffffff811845c5>] ? kmem_cache_alloc_trace+0x35/0x160 [23379.438253] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.438264] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.438278] [<ffffffffa033b224>] ? btrfs_get_token_32+0x64/0xf0 [btrfs] [23379.438290] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.438304] [<ffffffffa033f8fa>] ? merge_state+0x6a/0x150 [btrfs] [23379.438316] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.438327] [<ffffffffa030b47c>] ? btrfs_delayed_refs_qgroup_accounting+0xbc/0x100 [btrfs] [23379.438339] [<ffffffffa0324c5f>] __btrfs_end_transaction+0x27f/0x390 [btrfs] [23379.438351] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.438363] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.438375] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.438388] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.438401] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.438404] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.438408] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.438412] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.438416] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.438419] INFO: task btrfs-endio-wri:2413 blocked for more than 120 seconds. [23379.438459] Tainted: G O 3.14.0-rc2 #2 [23379.438483] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [23379.438525] btrfs-endio-wri D ffffffff8180fa60 0 2413 2 0x00000000 [23379.438528] ffff88003b1ad928 0000000000000046 ffff88003b1ad8e8 0000000000000296 [23379.438531] ffff880047046100 00000000000133c0 ffff88003b1adfd8 00000000000133c0 [23379.438535] ffff880215168000 ffff880047046100 ffff88003b1ad938 ffff8800c1151090 [23379.438539] Call Trace: [23379.438542] [<ffffffff816b67f9>] schedule+0x29/0x70 [23379.438555] [<ffffffffa035b985>] btrfs_tree_lock+0x115/0x1f0 [btrfs] [23379.438558] [<ffffffff81095200>] ? __init_waitqueue_head+0x40/0x40 [23379.438569] [<ffffffffa0301261>] btrfs_search_slot+0x821/0x9d0 [btrfs] [23379.438579] [<ffffffffa0302a58>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [23379.438590] [<ffffffffa0309c9a>] alloc_reserved_file_extent+0x9a/0x2e0 [btrfs] [23379.438602] [<ffffffffa0310893>] __btrfs_run_delayed_refs+0x623/0x1210 [btrfs] [23379.438615] [<ffffffffa033f969>] ? merge_state+0xd9/0x150 [btrfs] [23379.438628] [<ffffffffa033f969>] ? merge_state+0xd9/0x150 [btrfs] [23379.438640] [<ffffffffa03136e6>] btrfs_run_delayed_refs+0x76/0x2b0 [btrfs] [23379.438651] [<ffffffffa030b47c>] ? btrfs_delayed_refs_qgroup_accounting+0xbc/0x100 [btrfs] [23379.438663] [<ffffffffa0324c5f>] __btrfs_end_transaction+0x27f/0x390 [btrfs] [23379.438675] [<ffffffffa0324da0>] btrfs_end_transaction+0x10/0x20 [btrfs] [23379.438687] [<ffffffffa032cfbb>] btrfs_finish_ordered_io+0xcb/0x500 [btrfs] [23379.438699] [<ffffffffa032d405>] finish_ordered_fn+0x15/0x20 [btrfs] [23379.438712] [<ffffffffa03531ea>] worker_loop+0x16a/0x570 [btrfs] [23379.438725] [<ffffffffa0353080>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [23379.438728] [<ffffffff81074191>] kthread+0xe1/0x100 [23379.438732] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 [23379.438736] [<ffffffff816c236c>] ret_from_fork+0x7c/0xb0 [23379.438740] [<ffffffff810740b0>] ? flush_kthread_worker+0xe0/0xe0 ==========The following is my postmark config (Iam not sure if it can be reliably reproduced because i only ran once)======= set location /mnt/btrfs set size 100000 5000000 #set size 1000000 #set size 4096 set transactions 5000000 #set bias read 10 set number 1000000 run quit 2014-04-08 0:01 GMT+08:00 ylet ylet <levin.front@gmail.com>: > hi Josef > > Thanks, I got it. But i still have no idea to reliably reproduce it. > This bug-on is triggered when I ran POSTMARK test lasting two days. > > BTW, can you reproduce the bug reported by email with title of "btrfs > hung with iozone test under linux kernel 3.14." The reported bug can > be reliably reproduced in my environment. > > 2014-04-07 23:50 GMT+08:00 Josef Bacik <jbacik@fb.com>: >> On 04/07/2014 11:45 AM, ylet ylet wrote: >>> >>> Assuming thread 1 (may be VFS want to release the page) wants to >>> release a CLEAN page and thus the eb attaching the page. >>> and thread 2 wants to access the eb and cow the eb. >>> >>> Thread 1 >>> Thread 2 >>> >>> btree_releasepage >>> >>> try_release_extent_buffer >>> >>> release_extent_buffer >>> ->if (atomic_dec_and_test(&eb->refs)) { >>> ->spin_unlock(&eb->refs_lock); >>> -------switch to thread 2 here------ >>> >>> read_block_for_search >>> >>> btrfs_find_tree_block >>> >>> radix_tree_lookup && atomic_inc_not_zero >>> >> >> atomic_inc_not_zero would return 0 here because we've dropped the last >> buffer for this, that is how we protect against this problem. How reliably >> are you able to reproduce this bug? Cause I'd love to find it if you can >> reproduce reliably. Thanks, >> >> Josef ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Is this a race bug when releasing eb? 2014-04-07 16:12 ` ylet ylet @ 2014-04-07 16:16 ` Josef Bacik 0 siblings, 0 replies; 6+ messages in thread From: Josef Bacik @ 2014-04-07 16:16 UTC (permalink / raw) To: ylet ylet, linux-btrfs On 04/07/2014 12:12 PM, ylet ylet wrote: > ======================This is the IOzone test bug (can be reliably > reproduced)================================== > > Btrfs falls into hang when I ran iozone test with the following configuration: > > ./iozone -s 8g -i 0 -i 2 -i 1 -t 4 -r 4k -+w 50 -+y 20 -+C 60 -F > /mnt/btrfs/test1 /mnt/btrfs/test2 /mnt/btrfs/test3 /mnt/btrfs/test > > The bug can be very easily triggered when running this test in my PC. > My PC equipped with 4-core intel I5 cpu, 8G memory and a SSD. dmesg > result is as below, hoping to help you guys: Oh goody one I can answer! That is fixed by this Btrfs: fix deadlock with nested trans handles I'll look into the postmark thing. Thanks, Josef ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-04-07 16:16 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-07 15:45 Is this a race bug when releasing eb? ylet ylet
2014-04-07 15:50 ` Josef Bacik
[not found] ` <CAAcaxmWEDa+jS8n4LZKdE_ZXFVAoOgACDseQs0_fViYdfRHShg@mail.gmail.com>
2014-04-07 16:01 ` Fwd: " ylet ylet
2014-04-07 16:03 ` Josef Bacik
2014-04-07 16:12 ` ylet ylet
2014-04-07 16:16 ` Josef Bacik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).