* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
@ 2014-07-22 15:14 ` Torbjørn
2014-07-22 16:46 ` Marc MERLIN
` (6 subsequent siblings)
7 siblings, 0 replies; 47+ messages in thread
From: Torbjørn @ 2014-07-22 15:14 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
On 07/22/2014 04:53 PM, Chris Mason wrote:
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>> - /home/ is btrfs with compress=lzo
>> BTRFS RAID 1 with lzo.
>>
>>> - I have _not_ created any nodatacow files.
>> Me neither.
>>
>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>> Same, except no dmcrypt.
>>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
>
> -chris
Hi,
In case it's interesting:
From an earlier email thread with subject "3.15-rc6 - btrfs-transacti:4157 blocked for more than 120":
TL;DR: yes, btrfs sees multiple devices.
sata <-> dmcrypt <-> btrfs raid10
The btrfs raid10 consists of multiple dmcrypt devices backed by multiple sata drives.
Mount: /dev/mapper/sdu on /mnt/storage type btrfs
(rw,noatime,space_cache,compress=lzo,inode_cache,subvol=storage)
(yes, I know inode_cache is not recommended for general use)
I have a nocow directory in a separate subvolume containing vm-images
used by kvm.
The same kvm-vms are reading/writing data from that array over nfs.
I'm still keeping that system on 3.14; anything newer causes blocked tasks.
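A nodatacow directory like the one described is typically prepared by setting the C attribute while the directory is still empty, since files only inherit the flag at creation time. A minimal sketch, with hypothetical paths and the commands echoed rather than executed:

```shell
# Sketch: set up a NOCOW directory for VM images on btrfs.
# chattr +C must be applied while the directory is empty; files
# created afterwards inherit the flag. Commands are echoed only.
nocow_setup() {
    dir="$1"
    echo "btrfs subvolume create $dir"
    echo "chattr +C $dir"
}
nocow_setup /mnt/storage/vm-images
```

Setting +C on a non-empty file or directory does not reliably convert existing data, which is why the flag goes on before any images are copied in.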
--
Torbjørn
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
2014-07-22 15:14 ` Torbjørn
@ 2014-07-22 16:46 ` Marc MERLIN
2014-07-22 19:42 ` Torbjørn
` (5 subsequent siblings)
7 siblings, 0 replies; 47+ messages in thread
From: Marc MERLIN @ 2014-07-22 16:46 UTC (permalink / raw)
To: Chris Mason; +Cc: Martin Steigerwald, Cody P Schafer, Chris Samuel, linux-btrfs
On Tue, Jul 22, 2014 at 10:53:03AM -0400, Chris Mason wrote:
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
In the bugs I sent you, it was a mix of arrays that were
mdraid / dmcrypt / btrfs
I also have one array with:
disk1 / dmcrypt \
                 - btrfs (2 drives visible to btrfs)
disk2 / dmcrypt /
The multi-drive setup seemed a bit worse, so I destroyed it and went
back to combining all the drives with mdadm and presenting a single
dmcrypted device to btrfs.
But that is still super unstable on my server with 3.15, while being
somewhat usable on my laptop (it still hangs, but more rarely).
The one difference is that my laptop actually does
disk > dmcrypt > btrfs
while my server does
disks > mdadm > dmcrypt > btrfs
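The server-side stack above, which collapses several drives into one device before btrfs sees it, can be sketched as follows. Device names and the RAID level are hypothetical, and the commands are echoed rather than run (they are destructive):

```shell
# Sketch: disks > mdadm > dmcrypt > btrfs, so btrfs sees a single device.
# Device names and RAID level are hypothetical; commands are echoed only.
stack_setup() {
    echo "mdadm --create /dev/md0 --level=5 --raid-devices=$# $*"
    echo "cryptsetup luksFormat /dev/md0"
    echo "cryptsetup luksOpen /dev/md0 md0crypt"
    echo "mkfs.btrfs /dev/mapper/md0crypt"
}
stack_setup /dev/sda /dev/sdb /dev/sdc
```

With this layering btrfs reports exactly one device, in contrast to the two-device dmcrypt layout in the diagram above.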
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
2014-07-22 15:14 ` Torbjørn
2014-07-22 16:46 ` Marc MERLIN
@ 2014-07-22 19:42 ` Torbjørn
2014-07-22 19:50 ` Chris Mason
2014-07-22 21:13 ` Martin Steigerwald
` (4 subsequent siblings)
7 siblings, 1 reply; 47+ messages in thread
From: Torbjørn @ 2014-07-22 19:42 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
On 07/22/2014 04:53 PM, Chris Mason wrote:
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>> - /home/ is btrfs with compress=lzo
>> BTRFS RAID 1 with lzo.
>>
>>> - I have _not_ created any nodatacow files.
>> Me neither.
>>
>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>> Same, except no dmcrypt.
>>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
>
> -chris
3.16-rc6 with your patch on top still causes hangs here.
No traces of "racing" in dmesg.
Hang is on a btrfs raid 0 consisting of 3 drives.
Full stack is: sata <-> dmcrypt <-> btrfs raid0
Hang was caused by
1. Several rsync -av --inplace --delete <source> <backup subvol>
2. btrfs subvolume snapshot -r <backup subvol> <backup snap>
The rsync jobs are run one at a time.
btrfs is stuck while trying to create the read-only snapshot.
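The triggering sequence above can be sketched as one script: serial rsync runs into a backup subvolume, then a single read-only snapshot of it. Paths are hypothetical, and the commands are echoed rather than executed, since actually running them is what reproduces the hang:

```shell
# Sketch of the backup cycle that triggered the hang: serial rsync runs
# into a backup subvolume, then one read-only snapshot of that subvolume.
# Paths are hypothetical; commands are echoed only.
backup_cycle() {
    subvol="$1"; snap="$2"; shift 2
    for src in "$@"; do
        echo "rsync -av --inplace --delete $src/ $subvol/"
    done
    echo "btrfs subvolume snapshot -r $subvol $snap"
}
backup_cycle /mnt/backup/current "/mnt/backup/snap-$(date +%F)" /home /srv
```

The hang reported here occurs at the final step, inside the snapshot ioctl, which matches the btrfs_mksubvol frames in the trace below.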
--
Torbjørn
All output via netconsole.
sysrq-w:
https://gist.github.com/anonymous/d1837187e261f9a4cbd2#file-gistfile1-txt
sysrq-t:
https://gist.github.com/anonymous/2bdb73f035ab9918c63d#file-gistfile1-txt
dmesg:
[ 9352.784136] INFO: task btrfs-transacti:3874 blocked for more than 120 seconds.
[ 9352.784222] Tainted: G E 3.16.0-rc6+ #64
[ 9352.784270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.784354] btrfs-transacti D ffff88042fc943c0 0 3874 2 0x00000000
[ 9352.784413] ffff8803fb9dfca0 0000000000000002 ffff8800c4214b90 ffff8803fb9dffd8
[ 9352.784502] 00000000000143c0 00000000000143c0 ffff88041977b260 ffff8803d29f23a0
[ 9352.784592] ffff8803d29f23a8 7fffffffffffffff ffff8800c4214b90 ffff880232e2c0a8
[ 9352.784682] Call Trace:
[ 9352.784726] [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9352.784774] [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9352.784827] [<ffffffff8170874b>] ? __slab_free+0xfe/0x2c3
[ 9352.784879] [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9352.784929] [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9352.784981] [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9352.785049] [<ffffffffc045b936>] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9352.785141] [<ffffffffc04658be>] btrfs_run_ordered_operations+0x1ee/0x2c0 [btrfs]
[ 9352.785260] [<ffffffffc044bbb7>] btrfs_commit_transaction+0x27/0xa40 [btrfs]
[ 9352.785324] [<ffffffffc0447d65>] transaction_kthread+0x1b5/0x240 [btrfs]
[ 9352.785385] [<ffffffffc0447bb0>] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 9352.785469] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.785517] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.785571] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.785620] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.785678] INFO: task kworker/u16:3:6932 blocked for more than 120 seconds.
[ 9352.785732] Tainted: G E 3.16.0-rc6+ #64
[ 9352.785780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.785863] kworker/u16:3 D ffff88042fd943c0 0 6932 2 0x00000000
[ 9352.785930] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9352.785983] ffff88035f1bbb58 0000000000000002 ffff880417e564c0 ffff88035f1bbfd8
[ 9352.786072] 00000000000143c0 00000000000143c0 ffff8800c1a03260 ffff88042fd94cd8
[ 9352.786160] ffff88042ffb4be8 ffff88035f1bbbe0 0000000000000002 ffffffff81159930
[ 9352.786250] Call Trace:
[ 9352.786292] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.786343] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.786393] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.786443] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.786495] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.786544] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.786607] [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.786669] [<ffffffffc046b0c0>] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.786766] [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.786828] [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9352.786883] [<ffffffff810b0d78>] ? __wake_up_common+0x58/0x90
[ 9352.786943] [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.786997] [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.787045] [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9352.787097] [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9352.787151] [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
[ 9352.787211] [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9352.787266] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.787317] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.787464] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.787515] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.787563] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.787616] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.787666] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.787744] INFO: task kworker/u16:9:8360 blocked for more than 120 seconds.
[ 9352.787799] Tainted: G E 3.16.0-rc6+ #64
[ 9352.787847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.787949] kworker/u16:9 D ffff88042fd543c0 0 8360 2 0x00000000
[ 9352.788015] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[ 9352.788139] ffff88035f20f838 0000000000000002 ffff8803216c1930 ffff88035f20ffd8
[ 9352.788229] 00000000000143c0 00000000000143c0 ffff8804157e0000 ffff88042fd54cd8
[ 9352.788317] ffff88042ffad6e8 ffff88035f20f8c0 0000000000000002 ffffffff81159930
[ 9352.788406] Call Trace:
[ 9352.788445] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.788495] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.788543] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.788591] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.788642] [<ffffffff8133c8a4>] ? blk_finish_plug+0x14/0x40
[ 9352.788692] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.788740] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.788792] [<ffffffff8115ac34>] pagecache_get_page+0x164/0x1f0
[ 9352.788849] [<ffffffffc048e447>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[ 9352.788933] [<ffffffffc048fd0f>] __load_free_space_cache+0x1ff/0x700 [btrfs]
[ 9352.788998] [<ffffffffc049030c>] load_free_space_cache+0xfc/0x1c0 [btrfs]
[ 9352.789059] [<ffffffffc0430292>] cache_block_group+0x192/0x390 [btrfs]
[ 9352.789113] [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9352.789173] [<ffffffffc04390a2>] find_free_extent+0xc72/0xcb0 [btrfs]
[ 9352.789233] [<ffffffffc04392ef>] btrfs_reserve_extent+0xaf/0x1b0 [btrfs]
[ 9352.789295] [<ffffffffc04526cc>] cow_file_range+0x13c/0x430 [btrfs]
[ 9352.789355] [<ffffffffc0453c80>] ? submit_compressed_extents+0x480/0x480 [btrfs]
[ 9352.789444] [<ffffffffc04539d2>] submit_compressed_extents+0x1d2/0x480 [btrfs]
[ 9352.789533] [<ffffffffc0451524>] ? async_cow_free+0x24/0x30 [btrfs]
[ 9352.789592] [<ffffffffc0453c80>] ? submit_compressed_extents+0x480/0x480 [btrfs]
[ 9352.789680] [<ffffffffc0453d06>] async_cow_submit+0x86/0x90 [btrfs]
[ 9352.789737] [<ffffffffc047ad43>] normal_work_helper+0x193/0x2b0 [btrfs]
[ 9352.789790] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.789840] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.789889] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.789938] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.789984] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.790038] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.790088] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.790144] INFO: task kworker/u16:0:11682 blocked for more than 120 seconds.
[ 9352.790199] Tainted: G E 3.16.0-rc6+ #64
[ 9352.795261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.795347] kworker/u16:0 D ffff88042fd543c0 0 11682 2 0x00000000
[ 9352.795409] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[ 9352.795465] ffff8802161cf9d8 0000000000000002 ffff8803fc3c0000 ffff8802161cffd8
[ 9352.795554] 00000000000143c0 00000000000143c0 ffff8804197e0000 ffff88042fd54cd8
[ 9352.795644] ffff88042ffaa9e8 ffff8802161cfa60 0000000000000002 ffffffff81159930
[ 9352.795734] Call Trace:
[ 9352.795776] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.795829] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.795879] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.795928] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.795979] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.796029] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.796150] [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.796212] [<ffffffffc046b0c0>] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.796309] [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.796369] [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9352.796428] [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.796481] [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.796530] [<ffffffff811f17d0>] __writeback_single_inode+0x40/0x210
[ 9352.796581] [<ffffffff811f22b7>] writeback_sb_inodes+0x247/0x3e0
[ 9352.796632] [<ffffffff811f24ef>] __writeback_inodes_wb+0x9f/0xd0
[ 9352.796682] [<ffffffff811f2763>] wb_writeback+0x243/0x2c0
[ 9352.796731] [<ffffffff811f4e09>] bdi_writeback_workfn+0x1b9/0x430
[ 9352.796784] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.796833] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.796882] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.796933] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.796979] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.797031] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.797079] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.797134] INFO: task kworker/u16:4:13743 blocked for more than 120 seconds.
[ 9352.797189] Tainted: G E 3.16.0-rc6+ #64
[ 9352.797299] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.797383] kworker/u16:4 D ffff88042fc943c0 0 13743 2 0x00000000
[ 9352.797451] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9352.797505] ffff880139843b58 0000000000000002 ffff880191e38000 ffff880139843fd8
[ 9352.797594] 00000000000143c0 00000000000143c0 ffff88041977b260 ffff88042fc94cd8
[ 9352.797684] ffff88042ffb4be8 ffff880139843be0 0000000000000002 ffffffff81159930
[ 9352.797773] Call Trace:
[ 9352.797813] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.797865] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.797914] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.797963] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.798013] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.798060] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.798119] [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.798176] [<ffffffffc046b0c0>] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.798269] [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.798326] [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9352.798385] [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.798437] [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.798485] [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9352.798537] [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9352.798590] [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
[ 9352.798650] [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9352.798703] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.798752] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.798801] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.798851] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.798897] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.798949] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.798997] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.799049] INFO: task btrfs:13823 blocked for more than 120 seconds.
[ 9352.799101] Tainted: G E 3.16.0-rc6+ #64
[ 9352.799148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9352.799229] btrfs D ffff88042fd943c0 0 13823 13461 0x00000000
[ 9352.799286] ffff880177203a80 0000000000000002 ffff8800c1a01930 ffff880177203fd8
[ 9352.799375] 00000000000143c0 00000000000143c0 ffff8800c2f664c0 ffff8803d29f2438
[ 9352.799463] ffff8803d29f2440 7fffffffffffffff ffff8800c1a01930 ffff880177203bc0
[ 9352.799552] Call Trace:
[ 9352.799590] [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9352.799637] [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9352.799687] [<ffffffff8109d4a3>] ? wake_up_process+0x23/0x40
[ 9352.799736] [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9352.799786] [<ffffffff810836ab>] ? insert_work+0x6b/0xb0
[ 9352.799877] [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9352.799928] [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9352.799989] [<ffffffffc045b936>] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9352.800080] [<ffffffffc045bafe>] __start_delalloc_inodes+0x1ae/0x2a0 [btrfs]
[ 9352.800150] [<ffffffffc045bc2e>] btrfs_start_delalloc_inodes+0x3e/0x120 [btrfs]
[ 9352.800235] [<ffffffff810b1228>] ? finish_wait+0x58/0x70
[ 9352.800293] [<ffffffffc047fa24>] btrfs_mksubvol.isra.29+0x1d4/0x570 [btrfs]
[ 9352.800348] [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9352.800409] [<ffffffffc047ff46>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
[ 9352.800501] [<ffffffffc04800ab>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
[ 9352.800592] [<ffffffffc048218f>] btrfs_ioctl+0xcaf/0x2ae0 [btrfs]
[ 9352.800645] [<ffffffff8118d933>] ? mmap_region+0x163/0x5d0
[ 9352.800696] [<ffffffff81057c8c>] ? __do_page_fault+0x20c/0x550
[ 9352.800748] [<ffffffff811dc840>] do_vfs_ioctl+0x2e0/0x4c0
[ 9352.800798] [<ffffffff810a0e64>] ? vtime_account_user+0x54/0x60
[ 9352.800850] [<ffffffff811dcaa1>] SyS_ioctl+0x81/0xa0
[ 9352.800899] [<ffffffff817130bf>] tracesys+0xe1/0xe6
[ 9354.863323] kvm [4191]: vcpu0 unhandled rdmsr: 0x345
[ 9354.869074] kvm_set_msr_common: 118 callbacks suppressed
[ 9354.869130] kvm [4191]: vcpu0 unhandled wrmsr: 0x680 data 0
[ 9354.869185] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c0 data 0
[ 9354.869240] kvm [4191]: vcpu0 unhandled wrmsr: 0x681 data 0
[ 9354.869294] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c1 data 0
[ 9354.869346] kvm [4191]: vcpu0 unhandled wrmsr: 0x682 data 0
[ 9354.869397] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c2 data 0
[ 9354.869449] kvm [4191]: vcpu0 unhandled wrmsr: 0x683 data 0
[ 9354.869502] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c3 data 0
[ 9354.869556] kvm [4191]: vcpu0 unhandled wrmsr: 0x684 data 0
[ 9354.869610] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c4 data 0
[ 9376.448998] kvm [4191]: vcpu0 unhandled rdmsr: 0x611
[ 9376.449052] kvm [4191]: vcpu0 unhandled rdmsr: 0x639
[ 9376.449100] kvm [4191]: vcpu0 unhandled rdmsr: 0x641
[ 9376.449148] kvm [4191]: vcpu0 unhandled rdmsr: 0x619
[ 9472.705412] INFO: task btrfs-transacti:3874 blocked for more than 120 seconds.
[ 9472.705499] Tainted: G E 3.16.0-rc6+ #64
[ 9472.705547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9472.705629] btrfs-transacti D ffff88042fc943c0 0 3874 2 0x00000000
[ 9472.705688] ffff8803fb9dfca0 0000000000000002 ffff8800c4214b90 ffff8803fb9dffd8
[ 9472.705778] 00000000000143c0 00000000000143c0 ffff88041977b260 ffff8803d29f23a0
[ 9472.705868] ffff8803d29f23a8 7fffffffffffffff ffff8800c4214b90 ffff880232e2c0a8
[ 9472.705976] Call Trace:
[ 9472.706019] [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9472.706067] [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9472.706118] [<ffffffff8170874b>] ? __slab_free+0xfe/0x2c3
[ 9472.706169] [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9472.706219] [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9472.706270] [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9472.706338] [<ffffffffc045b936>] btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9472.706429] [<ffffffffc04658be>] btrfs_run_ordered_operations+0x1ee/0x2c0 [btrfs]
[ 9472.706518] [<ffffffffc044bbb7>] btrfs_commit_transaction+0x27/0xa40 [btrfs]
[ 9472.706579] [<ffffffffc0447d65>] transaction_kthread+0x1b5/0x240 [btrfs]
[ 9472.706637] [<ffffffffc0447bb0>] ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 9472.706720] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.706767] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.706820] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.706869] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.706926] INFO: task kworker/u16:3:6932 blocked for more than 120 seconds.
[ 9472.706980] Tainted: G E 3.16.0-rc6+ #64
[ 9472.707029] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9472.707112] kworker/u16:3 D ffff88042fd943c0 0 6932 2 0x00000000
[ 9472.707179] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9472.707233] ffff88035f1bbb58 0000000000000002 ffff880417e564c0 ffff88035f1bbfd8
[ 9472.707322] 00000000000143c0 00000000000143c0 ffff8800c1a03260 ffff88042fd94cd8
[ 9472.707413] ffff88042ffb4be8 ffff88035f1bbbe0 0000000000000002 ffffffff81159930
[ 9472.707502] Call Trace:
[ 9472.707543] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.707593] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.707643] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.707697] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.707770] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.707820] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.707884] [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9472.707947] [<ffffffffc046b0c0>] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9472.713000] [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9472.713092] [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9472.713148] [<ffffffff810b0d78>] ? __wake_up_common+0x58/0x90
[ 9472.713203] [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9472.713256] [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9472.713305] [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9472.713359] [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9472.713425] [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 [btrfs]
[ 9472.713491] [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9472.713547] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.713598] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.713648] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.713723] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.713794] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.713849] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.713898] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.713953] INFO: task kworker/u16:9:8360 blocked for more than 120 seconds.
[ 9472.714028] Tainted: G E 3.16.0-rc6+ #64
[ 9472.714076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9472.714159] kworker/u16:9 D ffff88042fd543c0 0 8360 2 0x00000000
[ 9472.714225] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[ 9472.714277] ffff88035f20f838 0000000000000002 ffff8803216c1930 ffff88035f20ffd8
[ 9472.714366] 00000000000143c0 00000000000143c0 ffff8804157e0000 ffff88042fd54cd8
[ 9472.714455] ffff88042ffad6e8 ffff88035f20f8c0 0000000000000002 ffffffff81159930
[ 9472.714544] Call Trace:
[ 9472.714584] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.714636] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.714684] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.714734] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.714785] [<ffffffff8133c8a4>] ? blk_finish_plug+0x14/0x40
[ 9472.714835] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.714884] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.714938] [<ffffffff8115ac34>] pagecache_get_page+0x164/0x1f0
[ 9472.714999] [<ffffffffc048e447>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[ 9472.715088] [<ffffffffc048fd0f>] __load_free_space_cache+0x1ff/0x700 [btrfs]
[ 9472.715152] [<ffffffffc049030c>] load_free_space_cache+0xfc/0x1c0 [btrfs]
[ 9472.715213] [<ffffffffc0430292>] cache_block_group+0x192/0x390 [btrfs]
[ 9472.715267] [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9472.715328] [<ffffffffc04390a2>] find_free_extent+0xc72/0xcb0 [btrfs]
[ 9472.715388] [<ffffffffc04392ef>] btrfs_reserve_extent+0xaf/0x1b0 [btrfs]
[ 9472.715451] [<ffffffffc04526cc>] cow_file_range+0x13c/0x430 [btrfs]
[ 9472.715511] [<ffffffffc0453c80>] ? submit_compressed_extents+0x480/0x480 [btrfs]
[ 9472.715602] [<ffffffffc04539d2>] submit_compressed_extents+0x1d2/0x480 [btrfs]
[ 9472.715691] [<ffffffffc0451524>] ? async_cow_free+0x24/0x30 [btrfs]
[ 9472.715748] [<ffffffffc0453c80>] ? submit_compressed_extents+0x480/0x480 [btrfs]
[ 9472.715835] [<ffffffffc0453d06>] async_cow_submit+0x86/0x90 [btrfs]
[ 9472.715894] [<ffffffffc047ad43>] normal_work_helper+0x193/0x2b0 [btrfs]
[ 9472.715948] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.715998] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.716047] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.716098] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.716145] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.716197] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.716246] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.716303] INFO: task kworker/u16:0:11682 blocked for more than 120 seconds.
[ 9472.716357] Tainted: G E 3.16.0-rc6+ #64
[ 9472.716405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9472.716488] kworker/u16:0 D ffff88042fd543c0 0 11682 2 0x00000000
[ 9472.716548] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[ 9472.716610] ffff8802161cf9d8 0000000000000002 ffff8803fc3c0000 ffff8802161cffd8
[ 9472.716699] 00000000000143c0 00000000000143c0 ffff8804197e0000 ffff88042fd54cd8
[ 9472.716788] ffff88042ffaa9e8 ffff8802161cfa60 0000000000000002 ffffffff81159930
[ 9472.716878] Call Trace:
[ 9472.716917] [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.716968] [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.717017] [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.717067] [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.717118] [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.717167] [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.717231] [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9472.717293] [<ffffffffc046b0c0>] extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9472.717402] [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9472.717465] [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 [btrfs]
[ 9472.717553] [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9472.717607] [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9472.717657] [<ffffffff811f17d0>] __writeback_single_inode+0x40/0x210
[ 9472.717710] [<ffffffff811f22b7>] writeback_sb_inodes+0x247/0x3e0
[ 9472.717784] [<ffffffff811f24ef>] __writeback_inodes_wb+0x9f/0xd0
[ 9472.717836] [<ffffffff811f2763>] wb_writeback+0x243/0x2c0
[ 9472.717885] [<ffffffff811f4e09>] bdi_writeback_workfn+0x1b9/0x430
[ 9472.717938] [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.717990] [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.718039] [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.718089] [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.718136] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.718188] [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.718237] [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Blocked tasks on 3.15.1
2014-07-22 19:42 ` Torbjørn
@ 2014-07-22 19:50 ` Chris Mason
2014-07-22 20:10 ` Torbjørn
0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2014-07-22 19:50 UTC (permalink / raw)
To: Torbjørn; +Cc: linux-btrfs
On 07/22/2014 03:42 PM, Torbjørn wrote:
> On 07/22/2014 04:53 PM, Chris Mason wrote:
>>
>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>
>>>> Running 3.15.6 with this patch applied on top:
>>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>> /home/nyx/`
>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>> without the patch
>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
>>> with
>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>
>>>> To recap some details (so I can have it all in one place):
>>>> - /home/ is btrfs with compress=lzo
>>> BTRFS RAID 1 with lzo.
>>>
>>>> - I have _not_ created any nodatacow files.
>>> Me neither.
>>>
>>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>> mentioning the use of dmcrypt)
>>> Same, except no dmcrypt.
>>>
>> Thanks for the help in tracking this down everyone. We'll get there!
>> Are you all running multi-disk systems (from a btrfs POV, more than one
>> device?) I don't care how many physical drives this maps to, just does
>> btrfs think there's more than one drive.
>>
>> -chris
> 3.16-rc6 with your patch on top still causes hangs here.
> No traces of "racing" in dmesg
> Hang is on a btrfs raid 0 consisting of 3 drives.
> Full stack is: sata <-> dmcrypt <-> btrfs raid0
>
> Hang was caused by
> 1. Several rsync -av --inplace --delete <source> <backup subvol>
> 2. btrfs subvolume snapshot -r <backup subvol> <backup snap>
>
> The rsync jobs are done one at a time
> btrfs is stuck when trying to create the read only snapshot
The trace is similar, but you're stuck trying to read the free space
cache. This one I saw earlier this morning, but I haven't seen these
parts from the 3.15 bug reports.
Maybe they are related, though; I'll dig into the 3.15 bug reports again.
-chris
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Blocked tasks on 3.15.1
2014-07-22 19:50 ` Chris Mason
@ 2014-07-22 20:10 ` Torbjørn
0 siblings, 0 replies; 47+ messages in thread
From: Torbjørn @ 2014-07-22 20:10 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs
On 07/22/2014 09:50 PM, Chris Mason wrote:
> On 07/22/2014 03:42 PM, Torbjørn wrote:
>> On 07/22/2014 04:53 PM, Chris Mason wrote:
>>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>>
>>>>> Running 3.15.6 with this patch applied on top:
>>>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>>> /home/nyx/`
>>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>>> without the patch
>>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
>>>> with
>>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>>
>>>>> To recap some details (so I can have it all in one place):
>>>>> - /home/ is btrfs with compress=lzo
>>>> BTRFS RAID 1 with lzo.
>>>>
>>>>> - I have _not_ created any nodatacow files.
>>>> Me neither.
>>>>
>>>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>>> mentioning the use of dmcrypt)
>>>> Same, except no dmcrypt.
>>>>
>>> Thanks for the help in tracking this down everyone. We'll get there!
>>> Are you all running multi-disk systems (from a btrfs POV, more than one
>>> device?) I don't care how many physical drives this maps to, just does
>>> btrfs think there's more than one drive.
>>>
>>> -chris
>> 3.16-rc6 with your patch on top still causes hangs here.
>> No traces of "racing" in dmesg
>> Hang is on a btrfs raid 0 consisting of 3 drives.
>> Full stack is: sata <-> dmcrypt <-> btrfs raid0
>>
>> Hang was caused by
>> 1. Several rsync -av --inplace --delete <source> <backup subvol>
>> 2. btrfs subvolume snapshot -r <backup subvol> <backup snap>
>>
>> The rsync jobs are done one at a time
>> btrfs is stuck when trying to create the read-only snapshot
> The trace is similar, but you're stuck trying to read the free space
> cache. This one I saw earlier this morning, but I haven't seen these
> parts from the 3.15 bug reports.
>
> Maybe they are related though, I'll dig into the 3.15 bug reports again.
>
> -chris
In case it was not clear, this hang was on a different btrfs volume than
the 3.15 hang (but the same server).
Earlier, the affected volume was still readable during the hang. This
time the volume is not even readable.
I'll keep the patched 3.16 running and see if I can trigger something
similar to the 3.15 hang.
Thanks
--
Torbjørn
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
` (2 preceding siblings ...)
2014-07-22 19:42 ` Torbjørn
@ 2014-07-22 21:13 ` Martin Steigerwald
2014-07-22 21:15 ` Chris Mason
2014-07-23 1:06 ` Rich Freeman
` (3 subsequent siblings)
7 siblings, 1 reply; 47+ messages in thread
From: Martin Steigerwald @ 2014-07-22 21:13 UTC (permalink / raw)
To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs
Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >> Running 3.15.6 with this patch applied on top:
> >> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
> >> /home/nyx/`
> >>
> >> - no extra error messages printed (`dmesg | grep racing`) compared to
> >> without the patch
> >
> > I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
> > 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> >
> >> To recap some details (so I can have it all in one place):
> >> - /home/ is btrfs with compress=lzo
> >
> > BTRFS RAID 1 with lzo.
> >
> >> - I have _not_ created any nodatacow files.
> >
> > Me neither.
> >
> >> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >>
> >> mentioning the use of dmcrypt)
> >
> > Same, except no dmcrypt.
>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
As I said before, I am using BTRFS RAID 1: two logical volumes on two distinct
SSDs. The RAID is directly in BTRFS, no SoftRAID here (which I wouldn't want to
use with SSDs anyway).
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Blocked tasks on 3.15.1
2014-07-22 21:13 ` Martin Steigerwald
@ 2014-07-22 21:15 ` Chris Mason
2014-07-23 11:13 ` Martin Steigerwald
0 siblings, 1 reply; 47+ messages in thread
From: Chris Mason @ 2014-07-22 21:15 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs
On 07/22/2014 05:13 PM, Martin Steigerwald wrote:
> Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>>> Running 3.15.6 with this patch applied on top:
>>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>> /home/nyx/`
>>>>
>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>> without the patch
>>>
>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>
>>>> To recap some details (so I can have it all in one place):
>>>> - /home/ is btrfs with compress=lzo
>>>
>>> BTRFS RAID 1 with lzo.
>>>
>>>> - I have _not_ created any nodatacow files.
>>>
>>> Me neither.
>>>
>>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>>
>>>> mentioning the use of dmcrypt)
>>>
>>> Same, except no dmcrypt.
>>
>> Thanks for the help in tracking this down everyone. We'll get there!
>> Are you all running multi-disk systems (from a btrfs POV, more than one
>> device?) I don't care how many physical drives this maps to, just does
>> btrfs think there's more than one drive.
>
> > As I told before I am using BTRFS RAID 1. Two logical volumes on two distinct
> > SSDs. RAID is directly in BTRFS, no SoftRAID here (which I wouldn't want to
> > use with SSDs anyway).
>
When you say logical volumes, you mean LVM right? Just making sure I
know all the pieces involved.
-chris
* Re: Blocked tasks on 3.15.1
2014-07-22 21:15 ` Chris Mason
@ 2014-07-23 11:13 ` Martin Steigerwald
0 siblings, 0 replies; 47+ messages in thread
From: Martin Steigerwald @ 2014-07-23 11:13 UTC (permalink / raw)
To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs
Am Dienstag, 22. Juli 2014, 17:15:21 schrieb Chris Mason:
> On 07/22/2014 05:13 PM, Martin Steigerwald wrote:
> > Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
> >> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >>>> Running 3.15.6 with this patch applied on top:
> >>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
> >>>> /home/nyx/`
> >>>>
> >>>> - no extra error messages printed (`dmesg | grep racing`) compared to
> >>>> without the patch
> >>>
> >>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
> >>> with
> >>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> >>>
> >>>> To recap some details (so I can have it all in one place):
> >>>> - /home/ is btrfs with compress=lzo
> >>>
> >>> BTRFS RAID 1 with lzo.
> >>>
> >>>> - I have _not_ created any nodatacow files.
> >>>
> >>> Me neither.
> >>>
> >>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >>>>
> >>>> mentioning the use of dmcrypt)
> >>>
> >>> Same, except no dmcrypt.
> >>
> >> Thanks for the help in tracking this down everyone. We'll get there!
> >> Are you all running multi-disk systems (from a btrfs POV, more than one
> >> device?) I don't care how many physical drives this maps to, just does
> >> btrfs think there's more than one drive.
> >
> > As I told before I am using BTRFS RAID 1. Two logical volumes on two
> > distinct SSDs. RAID is directly in BTRFS, no SoftRAID here (which I
> > wouldn't want to use with SSDs anyway).
>
> When you say logical volumes, you mean LVM right? Just making sure I
> know all the pieces involved.
Exactly.
As a recap from the other thread:
merkaba:~> btrfs fi sh /home
Label: 'home' uuid: […]
Total devices 2 FS bytes used 123.20GiB
devid 1 size 160.00GiB used 159.98GiB path /dev/mapper/msata-home
devid 2 size 160.00GiB used 159.98GiB path /dev/dm-3
Btrfs v3.14.1
merkaba:~#1> btrfs fi df /home
Data, RAID1: total=154.95GiB, used=120.61GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=5.00GiB, used=2.59GiB
unknown, single: total=512.00MiB, used=0.00
merkaba:~> df -hT /home
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/msata-home btrfs 320G 247G 69G 79% /home
merkaba:~> file -sk /dev/sata/home
/dev/sata/home: symbolic link to `../dm-3'
merkaba:~> file -sk /dev/dm-3
/dev/dm-3: BTRFS Filesystem label "home", sectorsize 4096, nodesize 16384,
leafsize 16384, UUID=[…],
132303151104/343597383680 bytes used, 2 devices
And LVM layout:
merkaba:~> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 279,5G 0 disk
├─sda1 8:1 0 4M 0 part
├─sda2 8:2 0 191M 0 part
├─sda3 8:3 0 286M 0 part
└─sda4 8:4 0 279G 0 part
├─sata-home (dm-3) 254:3 0 160G 0 lvm
├─sata-swap (dm-4) 254:4 0 12G 0 lvm [SWAP]
└─sata-debian (dm-5) 254:5 0 30G 0 lvm
sdb 8:16 0 447,1G 0 disk
├─sdb1 8:17 0 200M 0 part
├─sdb2 8:18 0 300M 0 part /boot
└─sdb3 8:19 0 446,7G 0 part
├─msata-home (dm-0) 254:0 0 160G 0 lvm
├─msata-daten (dm-1) 254:1 0 200G 0 lvm
└─msata-debian (dm-2) 254:2 0 30G 0 lvm
sr0 11:0 1 1024M 0 rom
sda is Intel SSD 320 SATA
sdb is Crucial m500 mSATA
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
` (3 preceding siblings ...)
2014-07-22 21:13 ` Martin Steigerwald
@ 2014-07-23 1:06 ` Rich Freeman
2014-07-23 6:38 ` Felix Seidel
2014-07-23 13:20 ` Charles Cazabon
` (2 subsequent siblings)
7 siblings, 1 reply; 47+ messages in thread
From: Rich Freeman @ 2014-07-23 1:06 UTC (permalink / raw)
To: Chris Mason; +Cc: Martin Steigerwald, Cody P Schafer, Chris Samuel, Btrfs BTRFS
On Tue, Jul 22, 2014 at 10:53 AM, Chris Mason <clm@fb.com> wrote:
>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
I've been away on vacation so I haven't been able to try your latest
patch, but I can try whatever is out there starting this weekend.
I was getting fairly consistent hangs during heavy IO (especially
rsync) on 3.15 with lzo enabled. This is on raid1 across 5 drives,
directly against the partitions themselves (no dmcrypt, mdadm, lvm,
etc). I disabled lzo and haven't had problems since. I'm now running
on mainline without issue, but I think I did see the hang on mainline
when I tried enabling lzo again briefly.
Rich
* Re: Blocked tasks on 3.15.1
2014-07-23 1:06 ` Rich Freeman
@ 2014-07-23 6:38 ` Felix Seidel
0 siblings, 0 replies; 47+ messages in thread
From: Felix Seidel @ 2014-07-23 6:38 UTC (permalink / raw)
To: Rich Freeman; +Cc: Chris Mason, Btrfs BTRFS
[-- Attachment #1: Type: text/plain, Size: 571 bytes --]
On 23 Jul 2014, at 03:06, Rich Freeman <r-btrfs@thefreemanclan.net> wrote:
> I disabled lzo and haven't had problems since. I'm now running
> on mainline without issue, but I think I did see the hang on mainline
> when I tried enabling lzo again briefly.
Can confirm. I’m running mainline 3.16rc5 and was experiencing deadlocks
with LZO enabled.
Disabled it, now all seems ok.
Using btrfs RAID1 -> dm-crypt -> SATA.
I’ve attached some more dmesg “blocked” messages using kernel versions
3.15.5, 3.14.6 and 3.16rc5 just in case it helps anyone.
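For anyone wanting to try the same workaround, disabling lzo just means removing the option from the mount options. A hypothetical fstab entry is sketched below (the device and mount point are placeholders, not Felix's actual setup; note that extents already written stay LZO-compressed until rewritten):

```
# /etc/fstab -- hypothetical entry with compress=lzo removed as a workaround
# (previously: noatime,space_cache,compress=lzo)
/dev/mapper/nas-crypt  /mnt/nas  btrfs  noatime,space_cache  0  0
```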
[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 14924 bytes --]
Jul 18 23:36:58 nas kernel: INFO: task sudo:1214 blocked for more than 120 seconds.
Jul 18 23:36:58 nas kernel: Tainted: G O 3.15.5-2-ARCH #1
Jul 18 23:36:58 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 18 23:36:58 nas kernel: sudo D 0000000000000000 0 1214 1 0x00000004
Jul 18 23:36:58 nas kernel: ffff88001d0ebc20 0000000000000086 ffff88002cca5bb0 0000000000014700
Jul 18 23:36:58 nas kernel: ffff88001d0ebfd8 0000000000014700 ffff88002cca5bb0 0000000000000000
Jul 18 23:36:58 nas kernel: ffff880028ee4000 0000000000000003 00000000284e0d53 0000000000000002
Jul 18 23:36:58 nas kernel: Call Trace:
Jul 18 23:36:58 nas kernel: [<ffffffff815110dc>] ? __do_page_fault+0x2ec/0x600
Jul 18 23:36:58 nas kernel: [<ffffffff81509fa9>] schedule+0x29/0x70
Jul 18 23:36:58 nas kernel: [<ffffffff8150a426>] schedule_preempt_disabled+0x16/0x20
Jul 18 23:36:58 nas kernel: [<ffffffff8150bda5>] __mutex_lock_slowpath+0xe5/0x230
Jul 18 23:36:58 nas kernel: [<ffffffff8150bf07>] mutex_lock+0x17/0x30
Jul 18 23:36:58 nas kernel: [<ffffffff811bfa24>] lookup_slow+0x34/0xc0
Jul 18 23:36:58 nas kernel: [<ffffffff811c1b73>] path_lookupat+0x723/0x880
Jul 18 23:36:58 nas kernel: [<ffffffff8114f111>] ? release_pages+0xc1/0x280
Jul 18 23:36:58 nas kernel: [<ffffffff811bfd97>] ? getname_flags+0x37/0x130
Jul 18 23:36:58 nas kernel: [<ffffffff811c1cf6>] filename_lookup.isra.30+0x26/0x80
Jul 18 23:36:58 nas kernel: [<ffffffff811c4fd7>] user_path_at_empty+0x67/0xd0
Jul 18 23:36:58 nas kernel: [<ffffffff81172b52>] ? unmap_region+0xe2/0x130
Jul 18 23:36:58 nas kernel: [<ffffffff811c5051>] user_path_at+0x11/0x20
Jul 18 23:36:58 nas kernel: [<ffffffff811b979a>] vfs_fstatat+0x6a/0xd0
Jul 18 23:36:58 nas kernel: [<ffffffff811b981b>] vfs_stat+0x1b/0x20
Jul 18 23:36:58 nas kernel: [<ffffffff811b9df9>] SyS_newstat+0x29/0x60
Jul 18 23:36:58 nas kernel: [<ffffffff8117501c>] ? vm_munmap+0x4c/0x60
Jul 18 23:36:58 nas kernel: [<ffffffff81175f92>] ? SyS_munmap+0x22/0x30
Jul 18 23:36:58 nas kernel: [<ffffffff81515fa9>] system_call_fastpath+0x16/0x1b
---
Jul 19 18:34:17 nas kernel: INFO: task rsync:4900 blocked for more than 120 seconds.
Jul 19 18:34:17 nas kernel: Tainted: G O 3.15.5-2-ARCH #1
Jul 19 18:34:17 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 18:34:17 nas kernel: rsync D 0000000000000000 0 4900 4899 0x00000000
Jul 19 18:34:17 nas kernel: ffff880005947c20 0000000000000082 ffff880034aa4750 0000000000014700
Jul 19 18:34:17 nas kernel: ffff880005947fd8 0000000000014700 ffff880034aa4750 ffffffff810a5995
Jul 19 18:34:17 nas kernel: ffff88011fc14700 ffff8800dd828a30 ffff8800cece6a00 ffff880005947bd8
Jul 19 18:34:17 nas kernel: Call Trace:
Jul 19 18:34:17 nas kernel: [<ffffffff810a5995>] ? set_next_entity+0x95/0xb0
Jul 19 18:34:17 nas kernel: [<ffffffff810ac0be>] ? pick_next_task_fair+0x46e/0x550
Jul 19 18:34:17 nas kernel: [<ffffffff810136c1>] ? __switch_to+0x1f1/0x540
Jul 19 18:34:17 nas kernel: [<ffffffff81509fa9>] schedule+0x29/0x70
Jul 19 18:34:17 nas kernel: [<ffffffff8150a426>] schedule_preempt_disabled+0x16/0x20
Jul 19 18:34:17 nas kernel: [<ffffffff8150bda5>] __mutex_lock_slowpath+0xe5/0x230
Jul 19 18:34:17 nas kernel: [<ffffffff8150bf07>] mutex_lock+0x17/0x30
Jul 19 18:34:17 nas kernel: [<ffffffff811bfa24>] lookup_slow+0x34/0xc0
Jul 19 18:34:17 nas kernel: [<ffffffff811c1b73>] path_lookupat+0x723/0x880
Jul 19 18:34:17 nas kernel: [<ffffffff8150a2bf>] ? io_schedule+0xbf/0xf0
Jul 19 18:34:17 nas kernel: [<ffffffff8150a7d1>] ? __wait_on_bit_lock+0x91/0xb0
Jul 19 18:34:17 nas kernel: [<ffffffff811bfd97>] ? getname_flags+0x37/0x130
Jul 19 18:34:17 nas kernel: [<ffffffff811c1cf6>] filename_lookup.isra.30+0x26/0x80
Jul 19 18:34:17 nas kernel: [<ffffffff811c4fd7>] user_path_at_empty+0x67/0xd0
Jul 19 18:34:17 nas kernel: [<ffffffff811c5051>] user_path_at+0x11/0x20
Jul 19 18:34:17 nas kernel: [<ffffffff811b979a>] vfs_fstatat+0x6a/0xd0
Jul 19 18:34:17 nas kernel: [<ffffffff811d4414>] ? mntput+0x24/0x40
Jul 19 18:34:17 nas kernel: [<ffffffff811b983e>] vfs_lstat+0x1e/0x20
Jul 19 18:34:17 nas kernel: [<ffffffff811b9e59>] SyS_newlstat+0x29/0x60
Jul 19 18:34:17 nas kernel: [<ffffffff8108a3c4>] ? task_work_run+0xa4/0xe0
Jul 19 18:34:17 nas kernel: [<ffffffff8150e939>] ? do_device_not_available+0x19/0x20
Jul 19 18:34:17 nas kernel: [<ffffffff8151760e>] ? device_not_available+0x1e/0x30
Jul 19 18:34:17 nas kernel: [<ffffffff81515fa9>] system_call_fastpath+0x16/0x1b
---
Jul 19 23:58:39 nas kernel: INFO: task rsync:3407 blocked for more than 120 seconds.
Jul 19 23:58:39 nas kernel: Not tainted 3.14.6-1-ARCH #1
Jul 19 23:58:39 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 23:58:39 nas kernel: rsync D ffff8800cd9e3110 0 3407 3405 0x00000000
Jul 19 23:58:39 nas kernel: ffff8800a2f21bc0 0000000000000086 ffff8800a2f21b10 ffff8800cd9e3110
Jul 19 23:58:39 nas kernel: 00000000000146c0 ffff8800a2f21fd8 00000000000146c0 ffff8800cd9e3110
Jul 19 23:58:39 nas kernel: 00000000ffffffef ffff8800dd51bf60 ffffffffa02f4c5c ffff8800a2f21b28
Jul 19 23:58:39 nas kernel: Call Trace:
Jul 19 23:58:39 nas kernel: [<ffffffffa02f4c5c>] ? __set_extent_bit+0x45c/0x550 [btrfs]
Jul 19 23:58:39 nas kernel: [<ffffffffa02f4263>] ? free_extent_state+0x43/0xc0 [btrfs]
Jul 19 23:58:39 nas kernel: [<ffffffffa02f4c5c>] ? __set_extent_bit+0x45c/0x550 [btrfs]
Jul 19 23:58:39 nas kernel: [<ffffffff8150bb09>] schedule+0x29/0x70
Jul 19 23:58:39 nas kernel: [<ffffffffa02f5b72>] lock_extent_bits+0x152/0x1e0 [btrfs]
Jul 19 23:58:39 nas kernel: [<ffffffff810b3ec0>] ? __wake_up_sync+0x20/0x20
Jul 19 23:58:39 nas kernel: [<ffffffffa02e39b9>] btrfs_evict_inode+0x139/0x520 [btrfs]
Jul 19 23:58:39 nas kernel: [<ffffffff811d5ac0>] evict+0xb0/0x1c0
Jul 19 23:58:39 nas kernel: [<ffffffff811d6375>] iput+0xf5/0x1a0
Jul 19 23:58:39 nas kernel: [<ffffffff811d1bc0>] dentry_kill+0x240/0x290
Jul 19 23:58:39 nas kernel: [<ffffffff811d1c8b>] dput+0x7b/0x120
Jul 19 23:58:39 nas kernel: [<ffffffff811cc0da>] SyS_renameat+0x2ca/0x400
Jul 19 23:58:39 nas kernel: [<ffffffff811bd001>] ? __sb_end_write+0x31/0x60
Jul 19 23:58:39 nas kernel: [<ffffffff811d9111>] ? mnt_drop_write+0x31/0x50
Jul 19 23:58:39 nas kernel: [<ffffffff811b81c3>] ? chmod_common+0xe3/0x160
Jul 19 23:58:39 nas kernel: [<ffffffff811da334>] ? mntput+0x24/0x40
Jul 19 23:58:39 nas kernel: [<ffffffff811c478e>] ? path_put+0x1e/0x30
Jul 19 23:58:39 nas kernel: [<ffffffff811b96e2>] ? SyS_fchmodat+0x62/0xc0
Jul 19 23:58:39 nas kernel: [<ffffffff811cc22b>] SyS_rename+0x1b/0x20
Jul 19 23:58:39 nas kernel: [<ffffffff81517b29>] system_call_fastpath+0x16/0x1b
---
Jul 21 22:31:02 nas kernel: INFO: task kworker/u8:1:12866 blocked for more than 120 seconds.
Jul 21 22:31:02 nas kernel: Not tainted 3.16.0-1-mainline #1
Jul 21 22:31:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:31:02 nas kernel: kworker/u8:1 D 0000000000000000 0 12866 2 0x00000000
Jul 21 22:31:02 nas kernel: Workqueue: btrfs-delalloc normal_work_helper [btrfs]
Jul 21 22:31:02 nas kernel: ffff88010760f5d8 0000000000000046 ffff8800d293bd20 00000000000145c0
Jul 21 22:31:02 nas kernel: ffff88010760ffd8 00000000000145c0 ffff8800d293bd20 0000000000000001
Jul 21 22:31:02 nas kernel: ffffffff810b6c30 ffff88010760f5c8 ffff88010d4bc2e8 ffff8800a45b4000
Jul 21 22:31:02 nas kernel: Call Trace:
Jul 21 22:31:02 nas kernel: [<ffffffff810b6c30>] ? __wake_up_sync+0x20/0x20
Jul 21 22:31:02 nas kernel: [<ffffffffa0357406>] ? btrfs_bio_counter_dec+0x26/0x60 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa0312665>] ? btrfs_map_bio+0x4b5/0x520 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffff8114b750>] ? filemap_fdatawait+0x30/0x30
Jul 21 22:31:02 nas kernel: [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:31:02 nas kernel: [<ffffffff8151ebf4>] io_schedule+0x94/0xf0
Jul 21 22:31:02 nas kernel: [<ffffffff8114b75e>] sleep_on_page+0xe/0x20
Jul 21 22:31:02 nas kernel: [<ffffffff8151f014>] __wait_on_bit+0x64/0x90
Jul 21 22:31:02 nas kernel: [<ffffffff8114b509>] wait_on_page_bit+0x89/0xa0
Jul 21 22:31:02 nas kernel: [<ffffffff810b6c70>] ? autoremove_wake_function+0x40/0x40
Jul 21 22:31:02 nas kernel: [<ffffffffa030943a>] read_extent_buffer_pages+0x30a/0x350 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02dc1e0>] ? free_root_pointers+0x60/0x60 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02ddd01>] btree_read_extent_buffer_pages.constprop.45+0xb1/0x110 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02de378>] read_tree_block+0x38/0x60 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02be770>] read_block_for_search.isra.29+0x160/0x3b0 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02c429a>] btrfs_next_old_leaf+0x26a/0x460 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02c44a0>] btrfs_next_leaf+0x10/0x20 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa030e924>] find_free_dev_extent+0xc4/0x360 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa030ed8a>] __btrfs_alloc_chunk+0x1ca/0x810 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa03119b4>] btrfs_alloc_chunk+0x34/0x40 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02cc82e>] do_chunk_alloc+0x1de/0x3d0 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02d1e65>] find_free_extent+0xbd5/0xe00 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02d2268>] btrfs_reserve_extent+0x68/0x160 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02ec36b>] cow_file_range+0x13b/0x430 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02ed715>] submit_compressed_extents+0x215/0x4a0 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02ed9a0>] ? submit_compressed_extents+0x4a0/0x4a0 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02eda26>] async_cow_submit+0x86/0x90 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa0316735>] normal_work_helper+0x205/0x350 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffff8108afd8>] process_one_work+0x168/0x450
Jul 21 22:31:02 nas kernel: [<ffffffff8108b79c>] worker_thread+0x1fc/0x550
Jul 21 22:31:02 nas kernel: [<ffffffff8108b5a0>] ? init_pwq.part.22+0x10/0x10
Jul 21 22:31:02 nas kernel: [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 21 22:31:02 nas kernel: [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 21 22:31:02 nas kernel: [<ffffffff8152277c>] ret_from_fork+0x7c/0xb0
Jul 21 22:31:02 nas kernel: [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
---
Jul 21 22:31:02 nas kernel: INFO: task afpd:15988 blocked for more than 120 seconds.
Jul 21 22:31:02 nas kernel: Not tainted 3.16.0-1-mainline #1
Jul 21 22:31:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:31:02 nas kernel: afpd D 0000000000000001 0 15988 11686 0x00000004
Jul 21 22:31:02 nas kernel: ffff88010d083c48 0000000000000086 ffff8800d8d88000 00000000000145c0
Jul 21 22:31:02 nas kernel: ffff88010d083fd8 00000000000145c0 ffff8800d8d88000 ffff8800a4505000
Jul 21 22:31:02 nas kernel: 0000000000000002 00000000000a6fff ffff88010d083c28 ffffffffa02ed342
Jul 21 22:31:02 nas kernel: Call Trace:
Jul 21 22:31:02 nas kernel: [<ffffffffa02ed342>] ? run_delalloc_range+0x192/0x350 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffff8114b750>] ? filemap_fdatawait+0x30/0x30
Jul 21 22:31:02 nas kernel: [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:31:02 nas kernel: [<ffffffff8151ebf4>] io_schedule+0x94/0xf0
Jul 21 22:31:02 nas kernel: [<ffffffff8114b75e>] sleep_on_page+0xe/0x20
Jul 21 22:31:02 nas kernel: [<ffffffff8151f138>] __wait_on_bit_lock+0x48/0xb0
Jul 21 22:31:02 nas kernel: [<ffffffff8114b8b8>] __lock_page+0x78/0x90
Jul 21 22:31:02 nas kernel: [<ffffffff810b6c70>] ? autoremove_wake_function+0x40/0x40
Jul 21 22:31:02 nas kernel: [<ffffffffa0305a18>] extent_write_cache_pages.isra.28.constprop.43+0x2d8/0x410 [btr
Jul 21 22:31:02 nas kernel: [<ffffffffa0307cec>] extent_writepages+0x5c/0x90 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02e9780>] ? btrfs_direct_IO+0x350/0x350 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffffa02e8518>] btrfs_writepages+0x28/0x30 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffff8115913e>] do_writepages+0x1e/0x30
Jul 21 22:31:02 nas kernel: [<ffffffff8114d22d>] __filemap_fdatawrite_range+0x5d/0x80
Jul 21 22:31:02 nas kernel: [<ffffffff8114d313>] filemap_fdatawrite_range+0x13/0x20
Jul 21 22:31:02 nas kernel: [<ffffffffa02f68ad>] btrfs_sync_file+0x5d/0x320 [btrfs]
Jul 21 22:31:02 nas kernel: [<ffffffff811f1e11>] do_fsync+0x51/0x80
Jul 21 22:31:02 nas kernel: [<ffffffff811f20c0>] SyS_fsync+0x10/0x20
Jul 21 22:31:02 nas kernel: [<ffffffff81522829>] system_call_fastpath+0x16/0x1b
---
Jul 21 22:33:02 nas kernel: INFO: task btrfs-transacti:921 blocked for more than 120 seconds.
Jul 21 22:33:02 nas kernel: Not tainted 3.16.0-1-mainline #1
Jul 21 22:33:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:33:02 nas kernel: btrfs-transacti D 0000000000000000 0 921 2 0x00000000
Jul 21 22:33:02 nas kernel: ffff8800d27dbe00 0000000000000046 ffff8800d28d5bb0 00000000000145c0
Jul 21 22:33:02 nas kernel: ffff8800d27dbfd8 00000000000145c0 ffff8800d28d5bb0 ffff880101f72480
Jul 21 22:33:02 nas kernel: ffffffffa033b5a6 ffff8800d27dbda8 ffffffffa033b864 ffff8800d18f95a0
Jul 21 22:33:02 nas kernel: Call Trace:
Jul 21 22:33:02 nas kernel: [<ffffffffa033b5a6>] ? btrfs_release_delayed_inode+0x46/0x60 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffffa033b864>] ? __btrfs_update_delayed_inode+0x1a4/0x210 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffff811a570a>] ? kmem_cache_alloc+0x16a/0x170
Jul 21 22:33:02 nas kernel: [<ffffffffa02bb89a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffffa034344a>] ? btrfs_scrub_pause+0xea/0x110 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:33:02 nas kernel: [<ffffffffa02e5745>] btrfs_commit_transaction+0x405/0xa50 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffff810b6c30>] ? __wake_up_sync+0x20/0x20
Jul 21 22:33:02 nas kernel: [<ffffffffa02e1275>] transaction_kthread+0x1e5/0x250 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffffa02e1090>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
Jul 21 22:33:02 nas kernel: [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 21 22:33:02 nas kernel: [<ffffffff81090000>] ? parse_args+0x4b0/0x4e0
Jul 21 22:33:02 nas kernel: [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 21 22:33:02 nas kernel: [<ffffffff8152277c>] ret_from_fork+0x7c/0xb0
Jul 21 22:33:02 nas kernel: [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
` (4 preceding siblings ...)
2014-07-23 1:06 ` Rich Freeman
@ 2014-07-23 13:20 ` Charles Cazabon
2014-07-25 2:27 ` Cody P Schafer
2014-08-12 2:55 ` Charles Cazabon
7 siblings, 0 replies; 47+ messages in thread
From: Charles Cazabon @ 2014-07-23 13:20 UTC (permalink / raw)
To: linux-btrfs
Chris Mason <clm@fb.com> wrote:
>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
Not me, at least - I'm doing the device aggregation down at the LVM level
(sata-dmcrypt-lvm-btrfs stack), so it's presented to btrfs as a single logical
device.
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
` (5 preceding siblings ...)
2014-07-23 13:20 ` Charles Cazabon
@ 2014-07-25 2:27 ` Cody P Schafer
2014-08-07 15:12 ` Tobias Holst
2014-08-12 2:55 ` Charles Cazabon
7 siblings, 1 reply; 47+ messages in thread
From: Cody P Schafer @ 2014-07-25 2:27 UTC (permalink / raw)
To: Chris Mason; +Cc: Martin Steigerwald, Chris Samuel, linux-btrfs
On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason <clm@fb.com> wrote:
>
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>>
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>> - /home/ is btrfs with compress=lzo
>>
>> BTRFS RAID 1 with lzo.
>>
>>> - I have _not_ created any nodatacow files.
>>
>> Me neither.
>>
>>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>>
>> Same, except no dmcrypt.
>>
>
> Thanks for the help in tracking this down everyone. We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?) I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
No, both of my btrfs filesystems are single disk.
* Re: Blocked tasks on 3.15.1
2014-07-25 2:27 ` Cody P Schafer
@ 2014-08-07 15:12 ` Tobias Holst
2014-08-07 16:05 ` Duncan
0 siblings, 1 reply; 47+ messages in thread
From: Tobias Holst @ 2014-08-07 15:12 UTC (permalink / raw)
To: linux-btrfs
Hi
Is there anything new on this topic? I am using Ubuntu 14.04.1 and
experiencing the same problem.
- 6 HDDs
- LUKS on every HDD
- btrfs RAID6 over these 6 crypt-devices
No LVM, no nodatacow files.
Mount-options: defaults,compress-force=lzo,space_cache
With the original 3.13-kernel (3.13.0-32-generic) it is working fine.
Then I tried the following kernels from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/
linux-image-3.14.15-031415-generic_3.14.15-031415.201407311853_amd64.deb
- not even booting, kernel panic at boot.
linux-image-3.15.6-031506-generic_3.15.6-031506.201407172034_amd64.deb,
linux-image-3.15.7-031507-generic_3.15.7-031507.201407281235_amd64.deb,
and linux-image-3.16.0-031600-generic_3.16.0-031600.201408031935_amd64.deb
cause the hangs described in this thread. When doing big IO
(unpacking a multi-GB .rar archive) the filesystem stops working.
Load stays very high, but according to dstat nothing actually happens
on the drives. htop shows many kworker threads in D state
(uninterruptible sleep, usually IO).
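As an aside for anyone reproducing this: the D-state tasks that htop flags can also be listed directly from /proc. A minimal sketch follows (plain Python, nothing btrfs-specific; the function name is invented here, and the list is simply empty when no task is currently stuck):

```python
import os

def tasks_in_d_state():
    """List (pid, comm) for tasks in uninterruptible sleep ("D"),
    the state the hung kworker threads show during these hangs."""
    hung = []
    if not os.path.isdir("/proc"):       # non-Linux: nothing to scan
        return hung
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % pid) as f:
                stat = f.read()
        except OSError:
            continue                     # task exited mid-scan
        # /proc/<pid>/stat is "<pid> (<comm>) <state> ..."; comm may
        # contain spaces, so find the state char after the last ')'
        close = stat.rfind(")")
        if close == -1 or close + 2 >= len(stat):
            continue
        if stat[close + 2] == "D":
            hung.append((int(pid), stat[stat.find("(") + 1:close]))
    return hung

for pid, comm in tasks_in_d_state():
    print(pid, comm)
```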
Unmounting the btrfs filesystem only works with the -l (lazy) option.
Reboot or shutdown doesn't complete because of the blocked threads, so
only a power cut works. After the reboot, the last data written before
the hang is lost.
I am now back on 3.13.
Regards
2014-07-25 4:27 GMT+02:00 Cody P Schafer <dev@codyps.com>:
>
> On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason <clm@fb.com> wrote:
> >
> >
> > On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >
> >>> Running 3.15.6 with this patch applied on top:
> >>> - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
> >>> - no extra error messages printed (`dmesg | grep racing`) compared to
> >>> without the patch
> >>
> >> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
> >> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> >>
> >>> To recap some details (so I can have it all in one place):
> >>> - /home/ is btrfs with compress=lzo
> >>
> >> BTRFS RAID 1 with lzo.
> >>
> >>> - I have _not_ created any nodatacow files.
> >>
> >> Me neither.
> >>
> >>> - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >>> mentioning the use of dmcrypt)
> >>
> >> Same, except no dmcrypt.
> >>
> >
> > Thanks for the help in tracking this down everyone. We'll get there!
> > Are you all running multi-disk systems (from a btrfs POV, more than one
> > device?) I don't care how many physical drives this maps to, just does
> > btrfs think there's more than one drive.
>
> No, both of my btrfs filesystems are single disk.
* Re: Blocked tasks on 3.15.1
2014-08-07 15:12 ` Tobias Holst
@ 2014-08-07 16:05 ` Duncan
0 siblings, 0 replies; 47+ messages in thread
From: Duncan @ 2014-08-07 16:05 UTC (permalink / raw)
To: linux-btrfs
Tobias Holst posted on Thu, 07 Aug 2014 17:12:17 +0200 as excerpted:
> Is there anything new on this topic? I am using Ubuntu 14.04.1 and
> experiencing the same problem.
> - 6 HDDs
> - LUKS on every HDD
> - btrfs RAID6 over these 6 crypt-devices
> No LVM, no nodatacow files.
> Mount-options: defaults,compress-force=lzo,space_cache
> With the original 3.13-kernel (3.13.0-32-generic) it is working fine.
I see you're using compress-force. See the recent replies to the
"Btrfs: fix compressed write corruption on enospc" thread.
I'm not /sure/ your case is directly related (tho the kworker code is
pretty new, and 3.13 may work for you simply because it predates the
migration to kworkers, which would support it being either the same
problem or one related to it), but that's certainly one problem
they've recently traced down... to a bug in the kworker threads code
that starts a new worker which can race with the first, instead of
obeying a flag that says to keep the work on the first worker.
Looks like they're doing a patch that takes a slower but safer path to
work around the kworker bug for now, as that bug was only just traced
(there was an earlier bug, with a patch available, that originally hid
the ultimate problem, but obviously that was only half the fix, as it
simply revealed another bug underneath), and fixing it properly is
likely to take some time. Now that it's basically traced, the
workaround patch should be published on-list shortly and should make it
into 3.17 and back into the stables, altho I'm not sure it'll make it
into 3.16.1, etc.
But there's certainly progress. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Blocked tasks on 3.15.1
2014-07-22 14:53 ` Chris Mason
` (6 preceding siblings ...)
2014-07-25 2:27 ` Cody P Schafer
@ 2014-08-12 2:55 ` Charles Cazabon
2014-08-12 2:56 ` Liu Bo
2014-08-12 4:49 ` Marc MERLIN
7 siblings, 2 replies; 47+ messages in thread
From: Charles Cazabon @ 2014-08-12 2:55 UTC (permalink / raw)
To: linux-btrfs
The blocked tasks issue that got significantly worse in 3.15 -- did anything
go into 3.16 related to this? I didn't see a single "btrfs" in Linus' 3.16
announcement, so I don't know whether it should be better, the same, or worse
in this respect...
I haven't seen a definite statement about this on this list, either.
Can someone more familiar with the state of development comment on this?
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at: http://pyropus.ca/software/
-----------------------------------------------------------------------
* Re: Blocked tasks on 3.15.1
2014-08-12 2:55 ` Charles Cazabon
@ 2014-08-12 2:56 ` Liu Bo
2014-08-12 4:18 ` Duncan
2014-08-12 4:49 ` Marc MERLIN
1 sibling, 1 reply; 47+ messages in thread
From: Liu Bo @ 2014-08-12 2:56 UTC (permalink / raw)
To: linux-btrfs
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything
> go into 3.16 related to this? I didn't see a single "btrfs" in Linus' 3.16
> announcement, so I don't know whether it should be better, the same, or worse
> in this respect...
>
> I haven't seen a definite statement about this on this list, either.
>
> Can someone more familiar with the state of development comment on this?
The good news is that we've figured out the bug, and the patch is already
under testing :-)
thanks,
-liubo
* Re: Blocked tasks on 3.15.1
2014-08-12 2:56 ` Liu Bo
@ 2014-08-12 4:18 ` Duncan
0 siblings, 0 replies; 47+ messages in thread
From: Duncan @ 2014-08-12 4:18 UTC (permalink / raw)
To: linux-btrfs
Liu Bo posted on Tue, 12 Aug 2014 10:56:42 +0800 as excerpted:
> On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
>> The blocked tasks issue that got significantly worse in 3.15 -- did
>> anything go into 3.16 related to this? I didn't see a single "btrfs"
>> in Linus' 3.16 announcement, so I don't know whether it should be
>> better, the same, or worse in this respect...
>>
>> I haven't seen a definite statement about this on this list, either.
>>
>> Can someone more familiar with the state of development comment on
>> this?
>
> Good news is that we've figured out the bug and the patch is already
> under testing :-)
IOW, it's not in 3.16.0, but it will hopefully make it into 3.16.2 (it'll
likely be too late for 3.16.1).
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Blocked tasks on 3.15.1
2014-08-12 2:55 ` Charles Cazabon
2014-08-12 2:56 ` Liu Bo
@ 2014-08-12 4:49 ` Marc MERLIN
2014-08-18 20:34 ` James Cloos
1 sibling, 1 reply; 47+ messages in thread
From: Marc MERLIN @ 2014-08-12 4:49 UTC (permalink / raw)
To: linux-btrfs
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything
> go into 3.16 related to this? I didn't see a single "btrfs" in Linus' 3.16
> announcement, so I don't know whether it should be better, the same, or worse
> in this respect...
>
> I haven't seen a definite statement about this on this list, either.
Yes, 3.15 is unusable for some workloads, mine included.
Go back to 3.14 until there is a patch in 3.16; there isn't quite one as
of right now, but hopefully there will be very soon.
Note that 3.16.0 is actually worse than 3.15 for me.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Blocked tasks on 3.15.1
2014-08-12 4:49 ` Marc MERLIN
@ 2014-08-18 20:34 ` James Cloos
0 siblings, 0 replies; 47+ messages in thread
From: James Cloos @ 2014-08-18 20:34 UTC (permalink / raw)
To: linux-btrfs
>>>>> "MM" == Marc MERLIN <marc@merlins.org> writes:
MM> Note 3.16.0 is actually worse than 3.15 for me.
Here (a single partition btrfs), 3.16.0 works fine, but 3.17-rc1 fails again.
My /var/log is also a compressed, single-partition btrfs; that one doesn't
show the problem with any version. Just the partition with the git, svn
and rsync trees does.
Last night's test of 3.17-rc1 showed the problem with the first git
pull, getting stuck reading FETCH_HEAD. All repos on that fs failed
the same way.
But rebooting back to 3.16.0 let everything work perfectly.
-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6