* Qgroup accounting issue on kdave/for-next branch
@ 2016-11-29 6:36 Chandan Rajendra
2016-11-29 7:10 ` Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Chandan Rajendra @ 2016-11-29 6:36 UTC (permalink / raw)
To: linux-btrfs; +Cc: quwenruo
When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I
noticed the following call trace.
[ 77.335887] ------------[ cut here ]------------
[ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220
[ 77.336303] Modules linked in:
[ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22
[ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000
[ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0
[ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43)
[ 77.336908] MSR: 800000000282b032
[ 77.336960] <
[ 77.337027] SF
[ 77.337053] ,VEC
[ 77.337087] ,VSX
[ 77.337114] ,EE
[ 77.337146] ,FP
[ 77.337173] ,ME
[ 77.337207] ,IR
[ 77.337233] ,DR
[ 77.337267] ,RI
[ 77.337294] >
[ 77.337330] CR: 88000842 XER: 00000000
[ 77.337392] CFAR: c0000000005c9b5c
[ 77.337443] SOFTE: 1
[ 77.337477]
GPR00:
[ 77.337517] c0000000005cefcc
[ 77.337575] c00000062b9a76d0
[ 77.337626] c000000001103a00
[ 77.337714] c00000063e058d40
[ 77.337765]
GPR04:
[ 77.337817] c00000062b9a7740
[ 77.337868] 0000000000000008
[ 77.337927] 000000000005f000
[ 77.337978] 000000063eed0000
[ 77.338037]
GPR08:
[ 77.338080] c00000062f9629c8
[ 77.338138] 0000000000000001
[ 77.338191] fffffffffffff000
[ 77.338248] c00000063e058e80
[ 77.338300]
GPR12:
[ 77.338384] 0000000000000000
[ 77.338435] c00000000fe00000
[ 77.338498] 0000000020000000
[ 77.338548] 0000000000000000
[ 77.338605]
GPR16:
[ 77.338645] 0000000000000008
[ 77.338703] 000000004d5906fc
[ 77.338754] 000000004d5a6c08
[ 77.338811] 000000004d54b3d0
[ 77.338866]
GPR20:
[ 77.338921] 000001000bcf8440
[ 77.338972] 0000000000000000
[ 77.339030] 0000000000000000
[ 77.339080] c00000062523b078
[ 77.339138]
GPR24:
[ 77.339178] c00000062523b080
[ 77.339240] c000000000da2b58
[ 77.339290] 0000000000000000
[ 77.339347] c00000062e539600
[ 77.339398]
GPR28:
[ 77.339485] 0000000000060000
[ 77.339536] c00000062523b000
[ 77.339594] c00000062e539600
[ 77.339644] c00000062e539688
[ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220
[ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220
[ 77.339959] Call Trace:
[ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220
[ 77.340123] (unreliable)
[ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240
[ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0
[ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0
[ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100
[ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0
[ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30
[ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0
[ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0
[ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0
[ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180
[ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0
[ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74
[ 77.343187] Instruction dump:
[ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8
[ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8
[ 77.343771] ---[ end trace 8130fb71377d4ff8 ]---
[ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216
[ 77.735193] BTRFS info (device loop1): disabling disk space caching
[ 77.736629] BTRFS info (device loop1): has skinny extents
The above call trace happens because of the following:
|--------------------------------+-----------------------------------|
| Task A | Task B |
|--------------------------------+-----------------------------------|
| Write 4k bytes to a file | |
| at offset range [0, 4095] | |
| - __btrfs_buffered_write | |
| -- btrfs_check_data_free_space | |
| --- Set EXTENT_QGROUP_RESERVED | |
| bit for range [0, 64k-1] | |
| -- Copy 4k data from userspace | |
| to page cache | |
| | Write 4k bytes to the file |
| | at offset range [4096, 8191] |
| | - __btrfs_buffered_write |
| | -- btrfs_check_data_free_space |
| | --- Since EXTENT_QGROUP_RESERVED |
| | is already set we don't |
| | reserve qgroup space |
| | -- Assume the call to |
| | btrfs_delalloc_reserve_metadata() |
| | fails |
| | -- btrfs_free_reserved_data_space |
| | --- Clear EXTENT_QGROUP_RESERVED |
| | file range [4096, 8191] |
|--------------------------------+-----------------------------------|
On x86_64, It has been almost impossible to get the call to
btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I
have the following change in __btrfs_buffered_write() ...
if (pos != 2048)
ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes);
else
ret = 1;
if (ret) {
if (!only_release_metadata)
btrfs_free_reserved_data_space(inode, pos,
write_bytes);
else
btrfs_end_write_no_snapshoting(root);
break;
}
With the above change and by using max_inline=0 mount option, the following
command line will reproduce the call trace on an x86_64 machine,
$ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin
--
chandan
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 6:36 Qgroup accounting issue on kdave/for-next branch Chandan Rajendra @ 2016-11-29 7:10 ` Qu Wenruo 2016-11-29 7:55 ` Qu Wenruo 2016-11-29 15:56 ` Jeff Mahoney 2 siblings, 0 replies; 8+ messages in thread From: Qu Wenruo @ 2016-11-29 7:10 UTC (permalink / raw) To: Chandan Rajendra, linux-btrfs Thanks for the detailed reports and reproducer. I'll investigate it soon. Thanks, Qu At 11/29/2016 02:36 PM, Chandan Rajendra wrote: > When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I > noticed the following call trace. > > [ 77.335887] ------------[ cut here ]------------ > [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.336303] Modules linked in: > [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 > [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 > [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 > [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) > [ 77.336908] MSR: 800000000282b032 > [ 77.336960] < > [ 77.337027] SF > [ 77.337053] ,VEC > [ 77.337087] ,VSX > [ 77.337114] ,EE > [ 77.337146] ,FP > [ 77.337173] ,ME > [ 77.337207] ,IR > [ 77.337233] ,DR > [ 77.337267] ,RI > [ 77.337294] > > [ 77.337330] CR: 88000842 XER: 00000000 > [ 77.337392] CFAR: c0000000005c9b5c > [ 77.337443] SOFTE: 1 > [ 77.337477] > GPR00: > [ 77.337517] c0000000005cefcc > [ 77.337575] c00000062b9a76d0 > [ 77.337626] c000000001103a00 > [ 77.337714] c00000063e058d40 > [ 77.337765] > GPR04: > [ 77.337817] c00000062b9a7740 > [ 77.337868] 0000000000000008 > [ 77.337927] 000000000005f000 > [ 77.337978] 000000063eed0000 > [ 77.338037] > GPR08: > [ 77.338080] c00000062f9629c8 > [ 77.338138] 0000000000000001 > [ 77.338191] fffffffffffff000 > [ 77.338248] c00000063e058e80 > [ 77.338300] > GPR12: > [ 77.338384] 0000000000000000 > [ 77.338435] c00000000fe00000 > [ 77.338498] 0000000020000000 > [ 77.338548] 0000000000000000 > [ 77.338605] > GPR16: > [ 77.338645] 0000000000000008 > [ 77.338703] 000000004d5906fc > [ 77.338754] 000000004d5a6c08 > [ 77.338811] 000000004d54b3d0 > [ 77.338866] > GPR20: > [ 77.338921] 000001000bcf8440 > [ 77.338972] 0000000000000000 > [ 77.339030] 0000000000000000 > [ 77.339080] c00000062523b078 > [ 77.339138] > GPR24: > [ 77.339178] c00000062523b080 > [ 77.339240] c000000000da2b58 > [ 77.339290] 0000000000000000 > [ 77.339347] c00000062e539600 > [ 77.339398] > GPR28: > [ 77.339485] 0000000000060000 > [ 77.339536] c00000062523b000 > [ 77.339594] c00000062e539600 > [ 77.339644] c00000062e539688 > > [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 > [ 77.339959] Call Trace: > [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 > [ 77.340123] (unreliable) > [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 > [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 > [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 > [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 > [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 > [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 > [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 > [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 > [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 > [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 > [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 > [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 > [ 77.343187] Instruction dump: > [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 > [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 > [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- > [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 > [ 77.735193] BTRFS info (device loop1): disabling disk space caching > [ 77.736629] BTRFS info (device loop1): has skinny extents > > > The above call trace happens because of the following: > > |--------------------------------+-----------------------------------| > | Task A | Task B | > |--------------------------------+-----------------------------------| > | Write 4k bytes to a file | | > | at offset range [0, 4095] | | > | - __btrfs_buffered_write | | > | -- btrfs_check_data_free_space | | > | --- Set EXTENT_QGROUP_RESERVED | | > | bit for range [0, 64k-1] | | > | -- Copy 4k data from userspace | | > | to page cache | | > | | Write 4k bytes to the file | > | | at offset range [4096, 8191] | > | | - __btrfs_buffered_write | > | | -- btrfs_check_data_free_space | > | | --- Since EXTENT_QGROUP_RESERVED | > | | is already set we don't | > | | reserve qgroup space | > | | -- Assume the call to | > | | btrfs_delalloc_reserve_metadata() | > | | fails | > | | -- btrfs_free_reserved_data_space | > | | --- Clear EXTENT_QGROUP_RESERVED | > | | file range [4096, 8191] | > |--------------------------------+-----------------------------------| > > On x86_64, It has been almost impossible to get the call to > btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I > have the following change in __btrfs_buffered_write() ... > > if (pos != 2048) > ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes); > else > ret = 1; > if (ret) { > if (!only_release_metadata) > btrfs_free_reserved_data_space(inode, pos, > write_bytes); > else > btrfs_end_write_no_snapshoting(root); > break; > } > > > With the above change and by using max_inline=0 mount option, the following > command line will reproduce the call trace on an x86_64 machine, > $ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 6:36 Qgroup accounting issue on kdave/for-next branch Chandan Rajendra 2016-11-29 7:10 ` Qu Wenruo @ 2016-11-29 7:55 ` Qu Wenruo 2016-11-29 8:21 ` Chandan Rajendra 2016-11-29 15:56 ` Jeff Mahoney 2 siblings, 1 reply; 8+ messages in thread From: Qu Wenruo @ 2016-11-29 7:55 UTC (permalink / raw) To: Chandan Rajendra, linux-btrfs At 11/29/2016 02:36 PM, Chandan Rajendra wrote: > When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I > noticed the following call trace. > > [ 77.335887] ------------[ cut here ]------------ > [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.336303] Modules linked in: > [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 > [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 > [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 > [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) > [ 77.336908] MSR: 800000000282b032 > [ 77.336960] < > [ 77.337027] SF > [ 77.337053] ,VEC > [ 77.337087] ,VSX > [ 77.337114] ,EE > [ 77.337146] ,FP > [ 77.337173] ,ME > [ 77.337207] ,IR > [ 77.337233] ,DR > [ 77.337267] ,RI > [ 77.337294] > > [ 77.337330] CR: 88000842 XER: 00000000 > [ 77.337392] CFAR: c0000000005c9b5c > [ 77.337443] SOFTE: 1 > [ 77.337477] > GPR00: > [ 77.337517] c0000000005cefcc > [ 77.337575] c00000062b9a76d0 > [ 77.337626] c000000001103a00 > [ 77.337714] c00000063e058d40 > [ 77.337765] > GPR04: > [ 77.337817] c00000062b9a7740 > [ 77.337868] 0000000000000008 > [ 77.337927] 000000000005f000 > [ 77.337978] 000000063eed0000 > [ 77.338037] > GPR08: > [ 77.338080] c00000062f9629c8 > [ 77.338138] 0000000000000001 > [ 77.338191] fffffffffffff000 > [ 77.338248] c00000063e058e80 > [ 77.338300] > GPR12: > [ 77.338384] 0000000000000000 > [ 77.338435] c00000000fe00000 > [ 77.338498] 0000000020000000 > [ 77.338548] 0000000000000000 > [ 77.338605] > GPR16: > [ 77.338645] 0000000000000008 > [ 77.338703] 000000004d5906fc > [ 77.338754] 000000004d5a6c08 > [ 77.338811] 000000004d54b3d0 > [ 77.338866] > GPR20: > [ 77.338921] 000001000bcf8440 > [ 77.338972] 0000000000000000 > [ 77.339030] 0000000000000000 > [ 77.339080] c00000062523b078 > [ 77.339138] > GPR24: > [ 77.339178] c00000062523b080 > [ 77.339240] c000000000da2b58 > [ 77.339290] 0000000000000000 > [ 77.339347] c00000062e539600 > [ 77.339398] > GPR28: > [ 77.339485] 0000000000060000 > [ 77.339536] c00000062523b000 > [ 77.339594] c00000062e539600 > [ 77.339644] c00000062e539688 > > [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 > [ 77.339959] Call Trace: > [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 > [ 77.340123] (unreliable) > [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 > [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 > [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 > [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 > [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 > [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 > [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 > [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 > [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 > [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 > [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 > [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 > [ 77.343187] Instruction dump: > [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 > [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 > [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- > [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 > [ 77.735193] BTRFS info (device loop1): disabling disk space caching > [ 77.736629] BTRFS info (device loop1): has skinny extents > > > The above call trace happens because of the following: > > |--------------------------------+-----------------------------------| > | Task A | Task B | > |--------------------------------+-----------------------------------| > | Write 4k bytes to a file | | > | at offset range [0, 4095] | | > | - __btrfs_buffered_write | | > | -- btrfs_check_data_free_space | | > | --- Set EXTENT_QGROUP_RESERVED | | > | bit for range [0, 64k-1] | | > | -- Copy 4k data from userspace | | > | to page cache | | > | | Write 4k bytes to the file | > | | at offset range [4096, 8191] | > | | - __btrfs_buffered_write | > | | -- btrfs_check_data_free_space | > | | --- Since EXTENT_QGROUP_RESERVED | > | | is already set we don't | > | | reserve qgroup space | > | | -- Assume the call to | > | | btrfs_delalloc_reserve_metadata() | > | | fails | > | | -- btrfs_free_reserved_data_space | > | | --- Clear EXTENT_QGROUP_RESERVED | > | | file range [4096, 8191] | > |--------------------------------+-----------------------------------| Task B part seems a little strange. For the lastest for-next branch, for-next-20161125, in btrfs_delalloc_reserve_metadata(), it will roundup @start and @len. So for btrfs_qgroup_free_data(), we will free the range of [0, 64k-1], just like what we reserved in Task A. Are we using the same branch? Or I missed something? Thanks, Qu > > On x86_64, It has been almost impossible to get the call to > btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I > have the following change in __btrfs_buffered_write() ... > > if (pos != 2048) > ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes); > else > ret = 1; > if (ret) { > if (!only_release_metadata) > btrfs_free_reserved_data_space(inode, pos, > write_bytes); > else > btrfs_end_write_no_snapshoting(root); > break; > } > > > With the above change and by using max_inline=0 mount option, the following > command line will reproduce the call trace on an x86_64 machine, > $ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 7:55 ` Qu Wenruo @ 2016-11-29 8:21 ` Chandan Rajendra 2016-11-29 8:41 ` Qu Wenruo 0 siblings, 1 reply; 8+ messages in thread From: Chandan Rajendra @ 2016-11-29 8:21 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote: > At 11/29/2016 02:36 PM, Chandan Rajendra wrote: > > When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I > > noticed the following call trace. > > > > [ 77.335887] ------------[ cut here ]------------ > > [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 > > [ 77.336303] Modules linked in: > > [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 > > [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 > > [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 > > [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) > > [ 77.336908] MSR: 800000000282b032 > > [ 77.336960] < > > [ 77.337027] SF > > [ 77.337053] ,VEC > > [ 77.337087] ,VSX > > [ 77.337114] ,EE > > [ 77.337146] ,FP > > [ 77.337173] ,ME > > [ 77.337207] ,IR > > [ 77.337233] ,DR > > [ 77.337267] ,RI > > [ 77.337294] > > > [ 77.337330] CR: 88000842 XER: 00000000 > > [ 77.337392] CFAR: c0000000005c9b5c > > [ 77.337443] SOFTE: 1 > > [ 77.337477] > > GPR00: > > [ 77.337517] c0000000005cefcc > > [ 77.337575] c00000062b9a76d0 > > [ 77.337626] c000000001103a00 > > [ 77.337714] c00000063e058d40 > > [ 77.337765] > > GPR04: > > [ 77.337817] c00000062b9a7740 > > [ 77.337868] 0000000000000008 > > [ 77.337927] 000000000005f000 > > [ 77.337978] 000000063eed0000 > > [ 77.338037] > > GPR08: > > [ 77.338080] c00000062f9629c8 > > [ 77.338138] 0000000000000001 > > [ 77.338191] fffffffffffff000 > > [ 77.338248] c00000063e058e80 > > [ 77.338300] > > GPR12: > > [ 77.338384] 0000000000000000 > > [ 77.338435] c00000000fe00000 > > [ 77.338498] 0000000020000000 > > [ 77.338548] 0000000000000000 > > [ 77.338605] > > GPR16: > > [ 77.338645] 0000000000000008 > > [ 77.338703] 000000004d5906fc > > [ 77.338754] 000000004d5a6c08 > > [ 77.338811] 000000004d54b3d0 > > [ 77.338866] > > GPR20: > > [ 77.338921] 000001000bcf8440 > > [ 77.338972] 0000000000000000 > > [ 77.339030] 0000000000000000 > > [ 77.339080] c00000062523b078 > > [ 77.339138] > > GPR24: > > [ 77.339178] c00000062523b080 > > [ 77.339240] c000000000da2b58 > > [ 77.339290] 0000000000000000 > > [ 77.339347] c00000062e539600 > > [ 77.339398] > > GPR28: > > [ 77.339485] 0000000000060000 > > [ 77.339536] c00000062523b000 > > [ 77.339594] c00000062e539600 > > [ 77.339644] c00000062e539688 > > > > [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 > > [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 > > [ 77.339959] Call Trace: > > [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 > > [ 77.340123] (unreliable) > > [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 > > [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 > > [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 > > [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 > > [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 > > [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 > > [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 > > [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 > > [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 > > [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 > > [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 > > [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 > > [ 77.343187] Instruction dump: > > [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 > > [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 > > [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- > > [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 > > [ 77.735193] BTRFS info (device loop1): disabling disk space caching > > [ 77.736629] BTRFS info (device loop1): has skinny extents > > > > > > The above call trace happens because of the following: > > > > |--------------------------------+-----------------------------------| > > | Task A | Task B | > > |--------------------------------+-----------------------------------| > > | Write 4k bytes to a file | | > > | at offset range [0, 4095] | | > > | - __btrfs_buffered_write | | > > | -- btrfs_check_data_free_space | | > > | --- Set EXTENT_QGROUP_RESERVED | | > > | bit for range [0, 64k-1] | | > > | -- Copy 4k data from userspace | | > > | to page cache | | > > | | Write 4k bytes to the file | > > | | at offset range [4096, 8191] | > > | | - __btrfs_buffered_write | > > | | -- btrfs_check_data_free_space | > > | | --- Since EXTENT_QGROUP_RESERVED | > > | | is already set we don't | > > | | reserve qgroup space | > > | | -- Assume the call to | > > | | btrfs_delalloc_reserve_metadata() | > > | | fails | > > | | -- btrfs_free_reserved_data_space | > > | | --- Clear EXTENT_QGROUP_RESERVED | > > | | file range [4096, 8191] | > > |--------------------------------+-----------------------------------| > > Task B part seems a little strange. > > For the lastest for-next branch, for-next-20161125, in > btrfs_delalloc_reserve_metadata(), it will roundup @start and @len. I was using for-next-20161118. But the problem is still reproducible on for-next-20161125. I guess, you meant to mention btrfs_free_reserved_data_space() instead of btrfs_delalloc_reserve_metadata(). > > So for btrfs_qgroup_free_data(), we will free the range of [0, 64k-1], > just like what we reserved in Task A. So with the new code, Instead of 4k, Task B frees 64k worth of qgroup space that it had *not* reserved. The reservation was originally done by Task A. So when Task A's file extent item is being inserted, it will end up releasing 64k additional qgroup reservation. Executing "perf record -e btrfs:btrfs_qgroup_release_data -e btrfs:btrfs_qgroup_reserve_data -a -g ./check btrfs/126" gives the following, xfs_io 9536 [002] 454.601476: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=reserve xfs_io 9536 [002] 454.601522: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=reserve xfs_io 9536 [002] 454.601530: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=free kthreadd 9171 [003] 454.704145: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=release Please note that in the above perf ouput, Instead of two tasks, the same task writes two 4k blocks of data. -- chandan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 8:21 ` Chandan Rajendra @ 2016-11-29 8:41 ` Qu Wenruo 2016-11-29 10:19 ` Chandan Rajendra 0 siblings, 1 reply; 8+ messages in thread From: Qu Wenruo @ 2016-11-29 8:41 UTC (permalink / raw) To: Chandan Rajendra; +Cc: linux-btrfs At 11/29/2016 04:21 PM, Chandan Rajendra wrote: > On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote: >> At 11/29/2016 02:36 PM, Chandan Rajendra wrote: >>> When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I >>> noticed the following call trace. >>> >>> [ 77.335887] ------------[ cut here ]------------ >>> [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 >>> [ 77.336303] Modules linked in: >>> [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 >>> [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 >>> [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 >>> [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) >>> [ 77.336908] MSR: 800000000282b032 >>> [ 77.336960] < >>> [ 77.337027] SF >>> [ 77.337053] ,VEC >>> [ 77.337087] ,VSX >>> [ 77.337114] ,EE >>> [ 77.337146] ,FP >>> [ 77.337173] ,ME >>> [ 77.337207] ,IR >>> [ 77.337233] ,DR >>> [ 77.337267] ,RI >>> [ 77.337294] > >>> [ 77.337330] CR: 88000842 XER: 00000000 >>> [ 77.337392] CFAR: c0000000005c9b5c >>> [ 77.337443] SOFTE: 1 >>> [ 77.337477] >>> GPR00: >>> [ 77.337517] c0000000005cefcc >>> [ 77.337575] c00000062b9a76d0 >>> [ 77.337626] c000000001103a00 >>> [ 77.337714] c00000063e058d40 >>> [ 77.337765] >>> GPR04: >>> [ 77.337817] c00000062b9a7740 >>> [ 77.337868] 0000000000000008 >>> [ 77.337927] 000000000005f000 >>> [ 77.337978] 000000063eed0000 >>> [ 77.338037] >>> GPR08: >>> [ 77.338080] c00000062f9629c8 >>> [ 77.338138] 0000000000000001 >>> [ 77.338191] fffffffffffff000 >>> [ 77.338248] c00000063e058e80 >>> [ 77.338300] >>> GPR12: >>> [ 77.338384] 0000000000000000 >>> [ 77.338435] c00000000fe00000 >>> [ 77.338498] 0000000020000000 >>> [ 77.338548] 0000000000000000 >>> [ 77.338605] >>> GPR16: >>> [ 77.338645] 0000000000000008 >>> [ 77.338703] 000000004d5906fc >>> [ 77.338754] 000000004d5a6c08 >>> [ 77.338811] 000000004d54b3d0 >>> [ 77.338866] >>> GPR20: >>> [ 77.338921] 000001000bcf8440 >>> [ 77.338972] 0000000000000000 >>> [ 77.339030] 0000000000000000 >>> [ 77.339080] c00000062523b078 >>> [ 77.339138] >>> GPR24: >>> [ 77.339178] c00000062523b080 >>> [ 77.339240] c000000000da2b58 >>> [ 77.339290] 0000000000000000 >>> [ 77.339347] c00000062e539600 >>> [ 77.339398] >>> GPR28: >>> [ 77.339485] 0000000000060000 >>> [ 77.339536] c00000062523b000 >>> [ 77.339594] c00000062e539600 >>> [ 77.339644] c00000062e539688 >>> >>> [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 >>> [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 >>> [ 77.339959] Call Trace: >>> [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 >>> [ 77.340123] (unreliable) >>> [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 >>> [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 >>> [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 >>> [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 >>> [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 >>> [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 >>> [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 >>> [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 >>> [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 >>> [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 >>> [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 >>> [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 >>> [ 77.343187] Instruction dump: >>> [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 >>> [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 >>> [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- >>> [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 >>> [ 77.735193] BTRFS info (device loop1): disabling disk space caching >>> [ 77.736629] BTRFS info (device loop1): has skinny extents >>> >>> >>> The above call trace happens because of the following: >>> >>> |--------------------------------+-----------------------------------| >>> | Task A | Task B | >>> |--------------------------------+-----------------------------------| >>> | Write 4k bytes to a file | | >>> | at offset range [0, 4095] | | >>> | - __btrfs_buffered_write | | >>> | -- btrfs_check_data_free_space | | >>> | --- Set EXTENT_QGROUP_RESERVED | | >>> | bit for range [0, 64k-1] | | >>> | -- Copy 4k data from userspace | | >>> | to page cache | | >>> | | Write 4k bytes to the file | >>> | | at offset range [4096, 8191] | >>> | | - __btrfs_buffered_write | >>> | | -- btrfs_check_data_free_space | >>> | | --- Since EXTENT_QGROUP_RESERVED | >>> | | is already set we don't | >>> | | reserve qgroup space | >>> | | -- Assume the call to | >>> | | btrfs_delalloc_reserve_metadata() | >>> | | fails | >>> | | -- btrfs_free_reserved_data_space | >>> | | --- Clear EXTENT_QGROUP_RESERVED | >>> | | file range [4096, 8191] | >>> |--------------------------------+-----------------------------------| >> >> Task B part seems a little strange. >> >> For the lastest for-next branch, for-next-20161125, in >> btrfs_delalloc_reserve_metadata(), it will roundup @start and @len. > > I was using for-next-20161118. But the problem is still reproducible on > for-next-20161125. I guess, you meant to mention > btrfs_free_reserved_data_space() instead of > btrfs_delalloc_reserve_metadata(). My fault. And yes, I was meant to mention btrfs_free_reserved_data_space(). >> >> So for btrfs_qgroup_free_data(), we will free the range of [0, 64k-1], >> just like what we reserved in Task A. > > So with the new code, Instead of 4k, Task B frees 64k worth of qgroup space > that it had *not* reserved. For 64K page size, we only support 64K sectorsize. (Without your subpage size support patches) And in that case, file extent will be 64K size, so we reserve 64K and free 64K, which is correct. > The reservation was originally done by Task A. So > when Task A's file extent item is being inserted, it will end up releasing 64k > additional qgroup reservation. You are right about the case. If Tasl A's data isn't written into disk(buffered write), then Task B doesn't need to reserve space, just reuse the space Task A reserved. And in that case, you're completely right, in error path for Task B, the free will be deadly. Two fix seems to be available: 1) Stop buffered write for the reserved space of Task A But since Task A succeeded in buffered write, it should assume the data will reach disk, and we will break the assumption. 2) Use accurate free_reserved range in error path I think it's possible to export the extent_changeset structure and use them to record what range we reserved, so we can free accurate amount of space. But the change may be a little large. Several interface needs to be changed. If I misunderstood the problem again, please point it out in case I missed something again. BTW, does the 64K sectorsize of PPC64 makes metadata much easier to hit ENOSPC? Thanks again for your detailed report, Qu > > Executing "perf record -e btrfs:btrfs_qgroup_release_data -e > btrfs:btrfs_qgroup_reserve_data -a -g ./check btrfs/126" gives the following, > > xfs_io 9536 [002] 454.601476: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=reserve > xfs_io 9536 [002] 454.601522: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=reserve > xfs_io 9536 [002] 454.601530: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=free > kthreadd 9171 [003] 454.704145: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=release > > Please note that in the above perf ouput, Instead of two tasks, the same task > writes two 4k blocks of data. > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 8:41 ` Qu Wenruo @ 2016-11-29 10:19 ` Chandan Rajendra 0 siblings, 0 replies; 8+ messages in thread From: Chandan Rajendra @ 2016-11-29 10:19 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Tuesday, November 29, 2016 04:41:41 PM Qu Wenruo wrote: > > At 11/29/2016 04:21 PM, Chandan Rajendra wrote: > > On Tuesday, November 29, 2016 03:55:53 PM Qu Wenruo wrote: > >> At 11/29/2016 02:36 PM, Chandan Rajendra wrote: > >>> When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I > >>> noticed the following call trace. > >>> > >>> [ 77.335887] ------------[ cut here ]------------ > >>> [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 > >>> [ 77.336303] Modules linked in: > >>> [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 > >>> [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 > >>> [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 > >>> [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) > >>> [ 77.336908] MSR: 800000000282b032 > >>> [ 77.336960] < > >>> [ 77.337027] SF > >>> [ 77.337053] ,VEC > >>> [ 77.337087] ,VSX > >>> [ 77.337114] ,EE > >>> [ 77.337146] ,FP > >>> [ 77.337173] ,ME > >>> [ 77.337207] ,IR > >>> [ 77.337233] ,DR > >>> [ 77.337267] ,RI > >>> [ 77.337294] > > >>> [ 77.337330] CR: 88000842 XER: 00000000 > >>> [ 77.337392] CFAR: c0000000005c9b5c > >>> [ 77.337443] SOFTE: 1 > >>> [ 77.337477] > >>> GPR00: > >>> [ 77.337517] c0000000005cefcc > >>> [ 77.337575] c00000062b9a76d0 > >>> [ 77.337626] c000000001103a00 > >>> [ 77.337714] c00000063e058d40 > >>> [ 77.337765] > >>> GPR04: > >>> [ 77.337817] c00000062b9a7740 > >>> [ 77.337868] 0000000000000008 > >>> [ 77.337927] 000000000005f000 > >>> [ 77.337978] 000000063eed0000 > >>> [ 77.338037] > >>> GPR08: > >>> [ 77.338080] c00000062f9629c8 > >>> [ 77.338138] 0000000000000001 > >>> [ 77.338191] fffffffffffff000 > >>> [ 77.338248] c00000063e058e80 > >>> [ 77.338300] > >>> GPR12: > >>> [ 77.338384] 0000000000000000 > >>> [ 77.338435] c00000000fe00000 > >>> [ 77.338498] 0000000020000000 > >>> [ 77.338548] 0000000000000000 > >>> [ 77.338605] > >>> GPR16: > >>> [ 77.338645] 0000000000000008 > >>> [ 77.338703] 000000004d5906fc > >>> [ 77.338754] 000000004d5a6c08 > >>> [ 77.338811] 000000004d54b3d0 > >>> [ 77.338866] > >>> GPR20: > >>> [ 77.338921] 000001000bcf8440 > >>> [ 77.338972] 0000000000000000 > >>> [ 77.339030] 0000000000000000 > >>> [ 77.339080] c00000062523b078 > >>> [ 77.339138] > >>> GPR24: > >>> [ 77.339178] c00000062523b080 > >>> [ 77.339240] c000000000da2b58 > >>> [ 77.339290] 0000000000000000 > >>> [ 77.339347] c00000062e539600 > >>> [ 77.339398] > >>> GPR28: > >>> [ 77.339485] 0000000000060000 > >>> [ 77.339536] c00000062523b000 > >>> [ 77.339594] c00000062e539600 > >>> [ 77.339644] c00000062e539688 > >>> > >>> [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 > >>> [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 > >>> [ 77.339959] Call Trace: > >>> [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 > >>> [ 77.340123] (unreliable) > >>> [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 > >>> [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 > >>> [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 > >>> [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 > >>> [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 > >>> [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 > >>> [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 > >>> [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 > >>> [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 > >>> [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 > >>> [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 > >>> [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 > >>> [ 77.343187] Instruction dump: > >>> [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 > >>> [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 > >>> [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- > >>> [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 > >>> [ 77.735193] BTRFS info (device loop1): disabling disk space caching > >>> [ 77.736629] BTRFS info (device loop1): has skinny extents > >>> > >>> > >>> The above call trace happens because of the following: > >>> > >>> |--------------------------------+-----------------------------------| > >>> | Task A | Task B | > >>> |--------------------------------+-----------------------------------| > >>> | Write 4k bytes to a file | | > >>> | at offset range [0, 4095] | | > >>> | - __btrfs_buffered_write | | > >>> | -- btrfs_check_data_free_space | | > >>> | --- Set EXTENT_QGROUP_RESERVED | | > >>> | bit for range [0, 64k-1] | | > >>> | -- Copy 4k data from userspace | | > >>> | to page cache | | > >>> | | Write 4k bytes to the file | > >>> | | at offset range [4096, 8191] | > >>> | | - __btrfs_buffered_write | > >>> | | -- btrfs_check_data_free_space | > >>> | | --- Since EXTENT_QGROUP_RESERVED | > >>> | | is already set we don't | > >>> | | reserve qgroup space | > >>> | | -- Assume the call to | > >>> | | btrfs_delalloc_reserve_metadata() | > >>> | | fails | > >>> | | -- btrfs_free_reserved_data_space | > >>> | | --- Clear EXTENT_QGROUP_RESERVED | > >>> | | file range [4096, 8191] | > >>> |--------------------------------+-----------------------------------| > >> > >> Task B part seems a little strange. > >> > >> For the lastest for-next branch, for-next-20161125, in > >> btrfs_delalloc_reserve_metadata(), it will roundup @start and @len. > > > > I was using for-next-20161118. But the problem is still reproducible on > > for-next-20161125. I guess, you meant to mention > > btrfs_free_reserved_data_space() instead of > > btrfs_delalloc_reserve_metadata(). > > My fault. And yes, I was meant to mention btrfs_free_reserved_data_space(). > >> > >> So for btrfs_qgroup_free_data(), we will free the range of [0, 64k-1], > >> just like what we reserved in Task A. > > > > So with the new code, Instead of 4k, Task B frees 64k worth of qgroup space > > that it had *not* reserved. > > For 64K page size, we only support 64K sectorsize. (Without your subpage > size support patches) > And in that case, file extent will be 64K size, so we reserve 64K and > free 64K, which is correct. > > > The reservation was originally done by Task A. So > > when Task A's file extent item is being inserted, it will end up releasing 64k > > additional qgroup reservation. > > You are right about the case. > > If Tasl A's data isn't written into disk(buffered write), then Task B > doesn't need to reserve space, just reuse the space Task A reserved. > > And in that case, you're completely right, in error path for Task B, the > free will be deadly. > > Two fix seems to be available: > 1) Stop buffered write for the reserved space of Task A > But since Task A succeeded in buffered write, it should assume the > data will reach disk, and we will break the assumption. > 2) Use accurate free_reserved range in error path > I think it's possible to export the extent_changeset structure and > use them to record what range we reserved, so we can free accurate > amount of space. > But the change may be a little large. Several interface needs to be > changed. > > If I misunderstood the problem again, please point it out in case I > missed something again. > > > BTW, does the 64K sectorsize of PPC64 makes metadata much easier to hit > ENOSPC? On ppc64, I am actually hitting EDQUOT easily since nodesize is 64k and also the 0/5 qgroup is limited to 512k. > > Thanks again for your detailed report, > Qu > > > > > Executing "perf record -e btrfs:btrfs_qgroup_release_data -e > > btrfs:btrfs_qgroup_reserve_data -a -g ./check btrfs/126" gives the following, > > > > xfs_io 9536 [002] 454.601476: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=reserve > > xfs_io 9536 [002] 454.601522: btrfs:btrfs_qgroup_reserve_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=reserve > > xfs_io 9536 [002] 454.601530: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=65536, op=free > > kthreadd 9171 [003] 454.704145: btrfs:btrfs_qgroup_release_data: (nil)U: root=5, ino=257, start=0, len=65536, reserved=0, op=release > > > > Please note that in the above perf ouput, Instead of two tasks, the same task > > writes two 4k blocks of data. > > > > -- chandan ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 6:36 Qgroup accounting issue on kdave/for-next branch Chandan Rajendra 2016-11-29 7:10 ` Qu Wenruo 2016-11-29 7:55 ` Qu Wenruo @ 2016-11-29 15:56 ` Jeff Mahoney 2016-11-29 16:01 ` Jeff Mahoney 2 siblings, 1 reply; 8+ messages in thread From: Jeff Mahoney @ 2016-11-29 15:56 UTC (permalink / raw) To: Chandan Rajendra, linux-btrfs; +Cc: quwenruo [-- Attachment #1.1: Type: text/plain, Size: 8751 bytes --] On 11/29/16 1:36 AM, Chandan Rajendra wrote: > When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I > noticed the following call trace. > > [ 77.335887] ------------[ cut here ]------------ > [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.336303] Modules linked in: > [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 > [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 > [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 > [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) > [ 77.336908] MSR: 800000000282b032 > [ 77.336960] < > [ 77.337027] SF > [ 77.337053] ,VEC > [ 77.337087] ,VSX > [ 77.337114] ,EE > [ 77.337146] ,FP > [ 77.337173] ,ME > [ 77.337207] ,IR > [ 77.337233] ,DR > [ 77.337267] ,RI > [ 77.337294] > > [ 77.337330] CR: 88000842 XER: 00000000 > [ 77.337392] CFAR: c0000000005c9b5c > [ 77.337443] SOFTE: 1 > [ 77.337477] > GPR00: > [ 77.337517] c0000000005cefcc > [ 77.337575] c00000062b9a76d0 > [ 77.337626] c000000001103a00 > [ 77.337714] c00000063e058d40 > [ 77.337765] > GPR04: > [ 77.337817] c00000062b9a7740 > [ 77.337868] 0000000000000008 > [ 77.337927] 000000000005f000 > [ 77.337978] 000000063eed0000 > [ 77.338037] > GPR08: > [ 77.338080] c00000062f9629c8 > [ 77.338138] 0000000000000001 > [ 77.338191] fffffffffffff000 > [ 77.338248] c00000063e058e80 > [ 77.338300] > GPR12: > [ 77.338384] 0000000000000000 > [ 77.338435] c00000000fe00000 > [ 77.338498] 0000000020000000 > [ 77.338548] 0000000000000000 > [ 77.338605] > GPR16: > [ 77.338645] 0000000000000008 > [ 77.338703] 000000004d5906fc > [ 77.338754] 000000004d5a6c08 > [ 77.338811] 000000004d54b3d0 > [ 77.338866] > GPR20: > [ 77.338921] 000001000bcf8440 > [ 77.338972] 0000000000000000 > [ 77.339030] 0000000000000000 > [ 77.339080] c00000062523b078 > [ 77.339138] > GPR24: > [ 77.339178] c00000062523b080 > [ 77.339240] c000000000da2b58 > [ 77.339290] 0000000000000000 > [ 77.339347] c00000062e539600 > [ 77.339398] > GPR28: > [ 77.339485] 0000000000060000 > [ 77.339536] c00000062523b000 > [ 77.339594] c00000062e539600 > [ 77.339644] c00000062e539688 > > [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 > [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 > [ 77.339959] Call Trace: > [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 > [ 77.340123] (unreliable) > [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 > [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 > [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 > [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 > [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 > [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 > [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 > [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 > [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 > [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 > [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 > [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 > [ 77.343187] Instruction dump: > [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 > [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 > [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- > [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 > [ 77.735193] BTRFS info (device loop1): disabling disk space caching > [ 77.736629] BTRFS info (device loop1): has skinny extents > > > The above call trace happens because of the following: > > |--------------------------------+-----------------------------------| > | Task A | Task B | > |--------------------------------+-----------------------------------| > | Write 4k bytes to a file | | > | at offset range [0, 4095] | | > | - __btrfs_buffered_write | | > | -- btrfs_check_data_free_space | | > | --- Set EXTENT_QGROUP_RESERVED | | > | bit for range [0, 64k-1] | | > | -- Copy 4k data from userspace | | > | to page cache | | > | | Write 4k bytes to the file | > | | at offset range [4096, 8191] | > | | - __btrfs_buffered_write | > | | -- btrfs_check_data_free_space | > | | --- Since EXTENT_QGROUP_RESERVED | > | | is already set we don't | > | | reserve qgroup space | > | | -- Assume the call to | > | | btrfs_delalloc_reserve_metadata() | > | | fails | > | | -- btrfs_free_reserved_data_space | > | | --- Clear EXTENT_QGROUP_RESERVED | > | | file range [4096, 8191] | > |--------------------------------+-----------------------------------| > > On x86_64, It has been almost impossible to get the call to > btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I > have the following change in __btrfs_buffered_write() ... Setting a qgroup limit makes it fail pretty reliably. That's exactly the test case I used when tracking down problem described in "[PATCH] btrfs: Ensure proper sector alignment for btrfs_free_reserved_data_space" I think this is also the same issue described there -- write_bytes can be anything, but reserve_bytes is sector-aligned. -Jeff > if (pos != 2048) > ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes); > else > ret = 1; > if (ret) { > if (!only_release_metadata) > btrfs_free_reserved_data_space(inode, pos, > write_bytes); > else > btrfs_end_write_no_snapshoting(root); > break; > } > > > With the above change and by using max_inline=0 mount option, the following > command line will reproduce the call trace on an x86_64 machine, > $ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin > -- Jeff Mahoney SUSE Labs [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 841 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Qgroup accounting issue on kdave/for-next branch 2016-11-29 15:56 ` Jeff Mahoney @ 2016-11-29 16:01 ` Jeff Mahoney 0 siblings, 0 replies; 8+ messages in thread From: Jeff Mahoney @ 2016-11-29 16:01 UTC (permalink / raw) To: Chandan Rajendra, linux-btrfs; +Cc: quwenruo [-- Attachment #1.1: Type: text/plain, Size: 9316 bytes --] On 11/29/16 10:56 AM, Jeff Mahoney wrote: > On 11/29/16 1:36 AM, Chandan Rajendra wrote: >> When executing btrfs/126 test on kdave/for-next branch on a ppc64 guest, I >> noticed the following call trace. >> >> [ 77.335887] ------------[ cut here ]------------ >> [ 77.336115] WARNING: CPU: 0 PID: 8325 at /root/repos/linux/fs/btrfs/qgroup.c:2443 .btrfs_qgroup_free_refroot+0x188/0x220 >> [ 77.336303] Modules linked in: >> [ 77.336393] CPU: 0 PID: 8325 Comm: umount Not tainted 4.9.0-rc5-00062-g6b74e43 #22 >> [ 77.336526] task: c00000062b8d4880 task.stack: c00000062b9a4000 >> [ 77.336638] NIP: c0000000005cf018 LR: c0000000005ceff8 CTR: c0000000005390c0 >> [ 77.336771] REGS: c00000062b9a7450 TRAP: 0700 Not tainted (4.9.0-rc5-00062-g6b74e43) >> [ 77.336908] MSR: 800000000282b032 >> [ 77.336960] < >> [ 77.337027] SF >> [ 77.337053] ,VEC >> [ 77.337087] ,VSX >> [ 77.337114] ,EE >> [ 77.337146] ,FP >> [ 77.337173] ,ME >> [ 77.337207] ,IR >> [ 77.337233] ,DR >> [ 77.337267] ,RI >> [ 77.337294] > >> [ 77.337330] CR: 88000842 XER: 00000000 >> [ 77.337392] CFAR: c0000000005c9b5c >> [ 77.337443] SOFTE: 1 >> [ 77.337477] >> GPR00: >> [ 77.337517] c0000000005cefcc >> [ 77.337575] c00000062b9a76d0 >> [ 77.337626] c000000001103a00 >> [ 77.337714] c00000063e058d40 >> [ 77.337765] >> GPR04: >> [ 77.337817] c00000062b9a7740 >> [ 77.337868] 0000000000000008 >> [ 77.337927] 000000000005f000 >> [ 77.337978] 000000063eed0000 >> [ 77.338037] >> GPR08: >> [ 77.338080] c00000062f9629c8 >> [ 77.338138] 0000000000000001 >> [ 77.338191] fffffffffffff000 >> [ 77.338248] c00000063e058e80 >> [ 77.338300] >> GPR12: >> [ 77.338384] 0000000000000000 >> [ 77.338435] c00000000fe00000 >> [ 77.338498] 0000000020000000 >> [ 77.338548] 0000000000000000 >> [ 77.338605] >> GPR16: >> [ 77.338645] 0000000000000008 >> [ 77.338703] 000000004d5906fc >> [ 77.338754] 000000004d5a6c08 >> [ 77.338811] 000000004d54b3d0 >> [ 77.338866] >> GPR20: >> [ 77.338921] 000001000bcf8440 >> [ 77.338972] 0000000000000000 >> [ 77.339030] 0000000000000000 >> [ 77.339080] c00000062523b078 >> [ 77.339138] >> GPR24: >> [ 77.339178] c00000062523b080 >> [ 77.339240] c000000000da2b58 >> [ 77.339290] 0000000000000000 >> [ 77.339347] c00000062e539600 >> [ 77.339398] >> GPR28: >> [ 77.339485] 0000000000060000 >> [ 77.339536] c00000062523b000 >> [ 77.339594] c00000062e539600 >> [ 77.339644] c00000062e539688 >> >> [ 77.339740] NIP [c0000000005cf018] .btrfs_qgroup_free_refroot+0x188/0x220 >> [ 77.339852] LR [c0000000005ceff8] .btrfs_qgroup_free_refroot+0x168/0x220 >> [ 77.339959] Call Trace: >> [ 77.339998] [c00000062b9a76d0] [c0000000005cefcc] .btrfs_qgroup_free_refroot+0x13c/0x220 >> [ 77.340123] (unreliable) >> [ 77.340193] [c00000062b9a7790] [c00000000054210c] .commit_fs_roots+0x19c/0x240 >> [ 77.340355] [c00000062b9a78a0] [c0000000005451a0] .btrfs_commit_transaction.part.5+0x480/0xbe0 >> [ 77.340554] [c00000062b9a7970] [c000000000503bd4] .btrfs_sync_fs+0x74/0x1c0 >> [ 77.340725] [c00000062b9a7a10] [c0000000002d6ce0] .sync_filesystem+0xd0/0x100 >> [ 77.340891] [c00000062b9a7a90] [c000000000294f88] .generic_shutdown_super+0x38/0x1a0 >> [ 77.341052] [c00000062b9a7b10] [c000000000295508] .kill_anon_super+0x18/0x30 >> [ 77.342243] [c00000062b9a7b90] [c000000000503db8] .btrfs_kill_super+0x18/0xd0 >> [ 77.342411] [c00000062b9a7c10] [c0000000002958c8] .deactivate_locked_super+0x98/0xe0 >> [ 77.342573] [c00000062b9a7c90] [c0000000002be114] .cleanup_mnt+0x54/0xa0 >> [ 77.342723] [c00000062b9a7d10] [c0000000000d7e74] .task_work_run+0x124/0x180 >> [ 77.342862] [c00000062b9a7db0] [c00000000001c354] .do_notify_resume+0xa4/0xb0 >> [ 77.343030] [c00000062b9a7e30] [c00000000000c344] .ret_from_except_lite+0x70/0x74 >> [ 77.343187] Instruction dump: >> [ 77.343276] 3b400000 e87d0c78 38810070 4bffab1d 60000000 2c230000 4182ff40 e8dfffc8 >> [ 77.343522] eb630008 7d5c3010 7d294910 7d2900d0 <0b090000> 7fbc3040 41dd0070 f95fffc8 >> [ 77.343771] ---[ end trace 8130fb71377d4ff8 ]--- >> [ 77.343857] BTRFS warning (device loop1): qgroup 5 reserved space underflow, have: 389120, to free: 393216 >> [ 77.735193] BTRFS info (device loop1): disabling disk space caching >> [ 77.736629] BTRFS info (device loop1): has skinny extents >> >> >> The above call trace happens because of the following: >> >> |--------------------------------+-----------------------------------| >> | Task A | Task B | >> |--------------------------------+-----------------------------------| >> | Write 4k bytes to a file | | >> | at offset range [0, 4095] | | >> | - __btrfs_buffered_write | | >> | -- btrfs_check_data_free_space | | >> | --- Set EXTENT_QGROUP_RESERVED | | >> | bit for range [0, 64k-1] | | >> | -- Copy 4k data from userspace | | >> | to page cache | | >> | | Write 4k bytes to the file | >> | | at offset range [4096, 8191] | >> | | - __btrfs_buffered_write | >> | | -- btrfs_check_data_free_space | >> | | --- Since EXTENT_QGROUP_RESERVED | >> | | is already set we don't | >> | | reserve qgroup space | >> | | -- Assume the call to | >> | | btrfs_delalloc_reserve_metadata() | >> | | fails | >> | | -- btrfs_free_reserved_data_space | >> | | --- Clear EXTENT_QGROUP_RESERVED | >> | | file range [4096, 8191] | >> |--------------------------------+-----------------------------------| >> >> On x86_64, It has been almost impossible to get the call to >> btrfs_delalloc_reserve_metadata() in __btrfs_buffered_write() to fail. Hence I >> have the following change in __btrfs_buffered_write() ... > > Setting a qgroup limit makes it fail pretty reliably. That's exactly > the test case I used when tracking down problem described in "[PATCH] > btrfs: Ensure proper sector alignment for btrfs_free_reserved_data_space" > > I think this is also the same issue described there -- write_bytes can > be anything, but reserve_bytes is sector-aligned. Actually, nevermind. This is related, but not the same thing. As you mention later, having overlapping writes where the qgroup bits are already set in one will cause problems. The patch in the above post ensures that we're releasing the correct number of bytes reserved in the same call, but doesn't handle the case that the range was already set. -Jeff >> if (pos != 2048) >> ret = btrfs_delalloc_reserve_metadata(inode, reserve_bytes); >> else >> ret = 1; >> if (ret) { >> if (!only_release_metadata) >> btrfs_free_reserved_data_space(inode, pos, >> write_bytes); >> else >> btrfs_end_write_no_snapshoting(root); >> break; >> } >> >> >> With the above change and by using max_inline=0 mount option, the following >> command line will reproduce the call trace on an x86_64 machine, >> $ xfs_io -f -c 'pwrite -b 2048 0 8192' -c sync /mnt/file-0.bin >> > > -- Jeff Mahoney SUSE Labs [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 841 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-11-29 16:04 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-11-29 6:36 Qgroup accounting issue on kdave/for-next branch Chandan Rajendra 2016-11-29 7:10 ` Qu Wenruo 2016-11-29 7:55 ` Qu Wenruo 2016-11-29 8:21 ` Chandan Rajendra 2016-11-29 8:41 ` Qu Wenruo 2016-11-29 10:19 ` Chandan Rajendra 2016-11-29 15:56 ` Jeff Mahoney 2016-11-29 16:01 ` Jeff Mahoney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).