* btrfs freezing on writes
@ 2020-04-07 19:46 kjansen387
2020-04-07 20:11 ` Chris Murphy
2020-04-09 4:32 ` Zygo Blaxell
0 siblings, 2 replies; 13+ messages in thread
From: kjansen387 @ 2020-04-07 19:46 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]
Hello,
I'm using btrfs on Fedora 31 running kernel 5.5.10-200.fc31.x86_64.
I've moved my workload from md raid5 to btrfs raid1.
# btrfs filesystem show
Label: none uuid: 8ce9e167-57ea-4cf8-8678-3049ba028c12
Total devices 5 FS bytes used 3.73TiB
devid 1 size 3.64TiB used 2.53TiB path /dev/sde
devid 2 size 3.64TiB used 2.53TiB path /dev/sdf
devid 3 size 1.82TiB used 902.00GiB path /dev/sdb
devid 4 size 1.82TiB used 902.03GiB path /dev/sdc
devid 5 size 1.82TiB used 904.03GiB path /dev/sdd
# btrfs fi df /export
Data, RAID1: total=3.85TiB, used=3.72TiB
System, RAID1: total=32.00MiB, used=608.00KiB
Metadata, RAID1: total=6.00GiB, used=5.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
After moving to btrfs, I'm seeing freezes of ~10 seconds very, very
often (multiple times per minute). MariaDB, vim, InfluxDB, etc.
See the attachment for a stack trace of vim, and the dmesg output of 'echo w
> /proc/sysrq-trigger', which also includes other hanging processes.
What's going on? I hope someone can help.
Thanks,
Klaas
[-- Attachment #2: debug.txt --]
[-- Type: text/plain, Size: 15988 bytes --]
[937013.793854] sysrq: Show Blocked State
[937013.793988] task PC stack pid father
[937013.794093] mysqld D 0 10400 1 0x00004000
[937013.794095] Call Trace:
[937013.794100] ? __schedule+0x2c7/0x740
[937013.794101] schedule+0x4a/0xb0
[937013.794102] schedule_preempt_disabled+0xa/0x10
[937013.794103] __mutex_lock.isra.0+0x16b/0x4d0
[937013.794105] ? __switch_to_asm+0x40/0x70
[937013.794144] ? can_overcommit.part.0+0x50/0xa0 [btrfs]
[937013.794156] ? btrfs_reserve_metadata_bytes+0x6dc/0x940 [btrfs]
[937013.794168] btrfs_log_inode_parent+0x149/0xdf0 [btrfs]
[937013.794171] ? prepare_to_wait_event+0x7e/0x150
[937013.794173] ? finish_wait+0x3f/0x80
[937013.794184] ? btrfs_block_rsv_add_bytes+0x20/0x60 [btrfs]
[937013.794192] ? wait_current_trans+0x2f/0x130 [btrfs]
[937013.794193] ? _cond_resched+0x15/0x30
[937013.794202] ? join_transaction+0x24/0x410 [btrfs]
[937013.794213] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[937013.794237] btrfs_sync_file+0x25f/0x400 [btrfs]
[937013.794240] do_fsync+0x38/0x70
[937013.794240] __x64_sys_fsync+0x10/0x20
[937013.794242] do_syscall_64+0x5b/0x1c0
[937013.794244] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794245] RIP: 0033:0x7fd0d48e1deb
[937013.794247] Code: Bad RIP value.
[937013.794247] RSP: 002b:00007fd0a4bcc200 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[937013.794248] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fd0d48e1deb
[937013.794248] RDX: 0000000000000010 RSI: 0000000000000010 RDI: 0000000000000010
[937013.794249] RBP: 00007fd0a4bcc2d0 R08: 0000000000000000 R09: 00000000000002f3
[937013.794249] R10: 0000000000000010 R11: 0000000000000293 R12: 00007fd0a4bcc240
[937013.794250] R13: 0000000000000000 R14: 0005a2b86cf17940 R15: 0000000000056925
[937013.794253] mysqld D 0 10412 1 0x00004000
[937013.794254] Call Trace:
[937013.794255] ? __schedule+0x2c7/0x740
[937013.794256] schedule+0x4a/0xb0
[937013.794264] wait_for_commit+0x58/0x80 [btrfs]
[937013.794266] ? finish_wait+0x80/0x80
[937013.794274] btrfs_commit_transaction+0x87b/0xa20 [btrfs]
[937013.794276] ? dput+0xb6/0x2d0
[937013.794286] ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
[937013.794295] btrfs_sync_file+0x3b2/0x400 [btrfs]
[937013.794297] do_fsync+0x38/0x70
[937013.794298] __x64_sys_fsync+0x10/0x20
[937013.794299] do_syscall_64+0x5b/0x1c0
[937013.794300] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794300] RIP: 0033:0x7fd0d48e1deb
[937013.794301] Code: Bad RIP value.
[937013.794302] RSP: 002b:00007fd0934dfcd0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[937013.794302] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd0d48e1deb
[937013.794303] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000054
[937013.794303] RBP: 00007fd0934dfed0 R08: 0000000000000000 R09: 0000000000000000
[937013.794303] R10: 0000000000000009 R11: 0000000000000293 R12: 00007fd0934dff20
[937013.794304] R13: 0000000000000054 R14: 8f5c28f5c28f5c29 R15: 028f5c28f5c28f5c
[937013.794306] mysqld D 0 10421 1 0x00004000
[937013.794307] Call Trace:
[937013.794308] ? __schedule+0x2c7/0x740
[937013.794309] ? __switch_to_asm+0x34/0x70
[937013.794310] ? __switch_to_asm+0x34/0x70
[937013.794311] schedule+0x4a/0xb0
[937013.794312] schedule_timeout+0x20f/0x300
[937013.794313] ? __schedule+0x2cf/0x740
[937013.794314] io_schedule_timeout+0x19/0x40
[937013.794314] wait_for_completion_io+0x119/0x160
[937013.794316] ? wake_up_q+0xa0/0xa0
[937013.794323] write_all_supers+0x7d2/0x890 [btrfs]
[937013.794332] btrfs_commit_transaction+0x753/0xa20 [btrfs]
[937013.794342] ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
[937013.794351] btrfs_sync_file+0x3b2/0x400 [btrfs]
[937013.794353] do_fsync+0x38/0x70
[937013.794353] __x64_sys_fsync+0x10/0x20
[937013.794355] do_syscall_64+0x5b/0x1c0
[937013.794356] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794356] RIP: 0033:0x7fd0d48e1deb
[937013.794357] Code: Bad RIP value.
[937013.794358] RSP: 002b:00007fd0917fcfd0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[937013.794358] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd0d48e1deb
[937013.794358] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000019
[937013.794359] RBP: 00007fd0917fd1d0 R08: 0000000000000000 R09: 0000000000000000
[937013.794359] R10: 00005601eb73eee0 R11: 0000000000000293 R12: 00007fd0917fd220
[937013.794360] R13: 0000000000000019 R14: 8f5c28f5c28f5c29 R15: 028f5c28f5c28f5c
[937013.794385] postmaster D 0 3216 2409 0x00004000
[937013.794385] Call Trace:
[937013.794387] ? __schedule+0x2c7/0x740
[937013.794388] schedule+0x4a/0xb0
[937013.794388] schedule_preempt_disabled+0xa/0x10
[937013.794390] __mutex_lock.isra.0+0x16b/0x4d0
[937013.794398] ? start_transaction+0xbb/0x4c0 [btrfs]
[937013.794408] btrfs_pin_log_trans+0x19/0x30 [btrfs]
[937013.794417] btrfs_rename2+0x25b/0x1f60 [btrfs]
[937013.794419] ? link_path_walk.part.0+0x74/0x540
[937013.794420] ? path_parentat.isra.0+0x3f/0x80
[937013.794421] ? inode_permission+0xad/0x140
[937013.794422] ? vfs_rename+0x3e1/0x9e0
[937013.794430] ? btrfs_create+0x200/0x200 [btrfs]
[937013.794431] vfs_rename+0x3e1/0x9e0
[937013.794433] ? __d_lookup+0x5e/0x140
[937013.794434] ? lookup_dcache+0x18/0x60
[937013.794435] do_renameat2+0x381/0x530
[937013.794437] __x64_sys_rename+0x1f/0x30
[937013.794438] do_syscall_64+0x5b/0x1c0
[937013.794439] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794439] RIP: 0033:0x7fb3ce3bae8b
[937013.794440] Code: Bad RIP value.
[937013.794441] RSP: 002b:00007ffc432333c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[937013.794441] RAX: ffffffffffffffda RBX: 00007ffc43233410 RCX: 00007fb3ce3bae8b
[937013.794442] RDX: 0000557d55c1bd50 RSI: 00007ffc43233830 RDI: 00007ffc43233430
[937013.794442] RBP: 0000557d55c4b8e0 R08: 0000000000000000 R09: 00000000000001ff
[937013.794442] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000000
[937013.794443] R13: 00007ffc43233430 R14: 00007ffc43233830 R15: 0000000000000001
[937013.794750] mongod D 0 1186970 1186954 0x00000320
[937013.794750] Call Trace:
[937013.794752] ? __schedule+0x2c7/0x740
[937013.794753] schedule+0x4a/0xb0
[937013.794755] rwsem_down_write_slowpath+0x24c/0x510
[937013.794765] btrfs_file_write_iter+0x83/0x570 [btrfs]
[937013.794773] ? btrfs_real_readdir+0x2b0/0x4c0 [btrfs]
[937013.794775] ? __seccomp_filter+0x7b/0x670
[937013.794777] new_sync_write+0x12d/0x1d0
[937013.794778] vfs_write+0xb6/0x1a0
[937013.794779] ksys_pwrite64+0x65/0xa0
[937013.794780] do_syscall_64+0x5b/0x1c0
[937013.794782] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794782] RIP: 0033:0x7f9b6f1c504f
[937013.794783] Code: Bad RIP value.
[937013.794784] RSP: 002b:00007f9b6cdbc560 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[937013.794784] RAX: ffffffffffffffda RBX: 00000000000000b3 RCX: 00007f9b6f1c504f
[937013.794785] RDX: 0000000000004300 RSI: 000055b4f307b000 RDI: 00000000000000b3
[937013.794785] RBP: 000055b4f307b000 R08: 0000000000000000 R09: 0000000000237f80
[937013.794786] R10: 0000000000237f80 R11: 0000000000000293 R12: 0000000000004300
[937013.794786] R13: 0000000000237f80 R14: 000055b4f509f640 R15: 0000000000237f80
[937013.794788] WTJourn.Flusher D 0 1186978 1186954 0x00004320
[937013.794788] Call Trace:
[937013.794789] ? __schedule+0x2c7/0x740
[937013.794790] schedule+0x4a/0xb0
[937013.794791] schedule_preempt_disabled+0xa/0x10
[937013.794792] __mutex_lock.isra.0+0x16b/0x4d0
[937013.794793] ? __switch_to_asm+0x40/0x70
[937013.794804] ? can_overcommit.part.0+0x50/0xa0 [btrfs]
[937013.794830] ? btrfs_reserve_metadata_bytes+0x6dc/0x940 [btrfs]
[937013.794841] btrfs_log_inode_parent+0x149/0xdf0 [btrfs]
[937013.794843] ? prepare_to_wait_event+0x7e/0x150
[937013.794844] ? finish_wait+0x3f/0x80
[937013.794855] ? btrfs_block_rsv_add_bytes+0x20/0x60 [btrfs]
[937013.794863] ? wait_current_trans+0x2f/0x130 [btrfs]
[937013.794864] ? _cond_resched+0x15/0x30
[937013.794872] ? join_transaction+0x24/0x410 [btrfs]
[937013.794883] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[937013.794893] btrfs_sync_file+0x25f/0x400 [btrfs]
[937013.794894] do_fsync+0x38/0x70
[937013.794895] __x64_sys_fdatasync+0x13/0x20
[937013.794896] do_syscall_64+0x5b/0x1c0
[937013.794897] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794898] RIP: 0033:0x7f9b6eed92e7
[937013.794899] Code: Bad RIP value.
[937013.794899] RSP: 002b:00007f9b695b52d0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[937013.794900] RAX: ffffffffffffffda RBX: 00000000000000b3 RCX: 00007f9b6eed92e7
[937013.794900] RDX: 0000000000000000 RSI: 00000000000000b3 RDI: 00000000000000b3
[937013.794901] RBP: 00007f9b695b5310 R08: 000055b504668080 R09: 00007f9b695b52e8
[937013.794901] R10: 00007f9b695b52e0 R11: 0000000000000293 R12: 000055b5046680c0
[937013.794902] R13: 000055b4ef32c559 R14: 0000000000000000 R15: 000055b4f0aef0f8
[937013.794903] ftdc D 0 1186980 1186954 0x00000320
[937013.794904] Call Trace:
[937013.794905] ? __schedule+0x2c7/0x740
[937013.794906] schedule+0x4a/0xb0
[937013.794907] schedule_preempt_disabled+0xa/0x10
[937013.794907] __mutex_lock.isra.0+0x16b/0x4d0
[937013.794916] ? start_transaction+0xbb/0x4c0 [btrfs]
[937013.794927] btrfs_pin_log_trans+0x19/0x30 [btrfs]
[937013.794936] btrfs_rename2+0x25b/0x1f60 [btrfs]
[937013.794938] ? link_path_walk.part.0+0x74/0x540
[937013.794939] ? path_parentat.isra.0+0x3f/0x80
[937013.794940] ? inode_permission+0xad/0x140
[937013.794942] ? vfs_rename+0x3e1/0x9e0
[937013.794951] ? btrfs_create+0x200/0x200 [btrfs]
[937013.794952] vfs_rename+0x3e1/0x9e0
[937013.794953] ? __d_lookup+0x5e/0x140
[937013.794955] do_renameat2+0x381/0x530
[937013.794956] __x64_sys_rename+0x1f/0x30
[937013.794957] do_syscall_64+0x5b/0x1c0
[937013.794959] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.794959] RIP: 0033:0x7f9b6ee3dd37
[937013.794960] Code: Bad RIP value.
[937013.794960] RSP: 002b:00007f9b685b3318 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[937013.794961] RAX: ffffffffffffffda RBX: 000055b4f4762c50 RCX: 00007f9b6ee3dd37
[937013.794961] RDX: 00007f9b685b3360 RSI: 000055b4f43d5ff0 RDI: 000055b4f43d5fa0
[937013.794962] RBP: 00007f9b685b3340 R08: 00007f9b685b4700 R09: 000055b4ef32e2e9
[937013.794962] R10: 00000000000001b6 R11: 0000000000000202 R12: 000055b4f4762c30
[937013.794962] R13: 00007f9b685b3360 R14: 00007f9b685b3720 R15: 000055b4ff007004

(gdb) #0 0x00007fb16c2badc7 in fsync () from /lib64/libpthread.so.0
#1 0x000055d92401053f in mf_sync ()
#2 0x000055d923ec772e in ml_sync_all ()
#3 0x000055d923e96527 in before_blocking ()
#4 0x000055d923fa0fe1 in inchar_loop ()
#5 0x000055d923fa04d4 in ui_inchar ()
#6 0x000055d923e9697f in inchar ()
#7 0x000055d923e9743e in vgetorpeek ()
#8 0x000055d923e98ba7 in vgetc ()
#9 0x000055d923e98d5d in safe_vgetc ()
#10 0x000055d923e3aabd in edit ()
#11 0x000055d923edfa16 in invoke_edit.isra ()
#12 0x000055d923ee7866 in normal_cmd ()
#13 0x000055d92400d7aa in main_loop ()
#14 0x000055d92400e75b in vim_main2 ()
#15 0x000055d923dfd23d in main ()
[937013.795022] influxd D 0 2082782 1 0x00004000
[937013.795023] Call Trace:
[937013.795024] ? __schedule+0x2c7/0x740
[937013.795025] schedule+0x4a/0xb0
[937013.795026] schedule_preempt_disabled+0xa/0x10
[937013.795027] __mutex_lock.isra.0+0x16b/0x4d0
[937013.795028] ? __switch_to_asm+0x40/0x70
[937013.795029] ? __switch_to_asm+0x34/0x70
[937013.795030] ? __switch_to_asm+0x34/0x70
[937013.795031] ? __switch_to_asm+0x40/0x70
[937013.795032] ? __switch_to+0x10d/0x420
[937013.795044] btrfs_log_inode_parent+0x407/0xdf0 [btrfs]
[937013.795046] ? prepare_to_wait_event+0x7e/0x150
[937013.795047] ? finish_wait+0x3f/0x80
[937013.795056] ? wait_current_trans+0xc0/0x130 [btrfs]
[937013.795064] ? join_transaction+0x24/0x410 [btrfs]
[937013.795074] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[937013.795084] btrfs_sync_file+0x25f/0x400 [btrfs]
[937013.795086] do_fsync+0x38/0x70
[937013.795087] __x64_sys_fsync+0x10/0x20
[937013.795088] do_syscall_64+0x5b/0x1c0
[937013.795090] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.795090] RIP: 0033:0x4b1420
[937013.795091] Code: Bad RIP value.
[937013.795092] RSP: 002b:000000c00246e5d8 EFLAGS: 00000212 ORIG_RAX: 000000000000004a
[937013.795092] RAX: ffffffffffffffda RBX: 000000c000058000 RCX: 00000000004b1420
[937013.795093] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000007f4
[937013.795093] RBP: 000000c00246e618 R08: 0000000000000000 R09: 0000000000000000
[937013.795094] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c0490ae120
[937013.795094] R13: 000000c0490ae120 R14: 000000c0436fd020 R15: 0000000000000055
[937013.795098] influxd D 0 2082804 1 0x00004000
[937013.795099] Call Trace:
[937013.795100] ? __schedule+0x2c7/0x740
[937013.795102] schedule+0x4a/0xb0
[937013.795110] wait_for_commit+0x58/0x80 [btrfs]
[937013.795112] ? finish_wait+0x80/0x80
[937013.795120] btrfs_commit_transaction+0x87b/0xa20 [btrfs]
[937013.795121] ? dput+0xb6/0x2d0
[937013.795132] ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
[937013.795141] btrfs_sync_file+0x3b2/0x400 [btrfs]
[937013.795143] do_fsync+0x38/0x70
[937013.795144] __x64_sys_fsync+0x10/0x20
[937013.795145] do_syscall_64+0x5b/0x1c0
[937013.795146] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.795147] RIP: 0033:0x4b1420
[937013.795148] Code: Bad RIP value.
[937013.795148] RSP: 002b:000000c000efa5d8 EFLAGS: 00000212 ORIG_RAX: 000000000000004a
[937013.795149] RAX: ffffffffffffffda RBX: 000000c000053400 RCX: 00000000004b1420
[937013.795149] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000051
[937013.795150] RBP: 000000c000efa618 R08: 0000000000000000 R09: 0000000000000000
[937013.795150] R10: 0000000000000000 R11: 0000000000000212 R12: 000000c04c5ae360
[937013.795151] R13: 000000c04c5ae360 R14: 000000c0008b6240 R15: 0000000000000055
[937013.795191] vim D 0 2228648 2223845 0x00004000
[937013.795191] Call Trace:
[937013.795193] ? __schedule+0x2c7/0x740
[937013.795194] ? __switch_to_asm+0x40/0x70
[937013.795195] ? __switch_to_asm+0x40/0x70
[937013.795196] schedule+0x4a/0xb0
[937013.795197] schedule_preempt_disabled+0xa/0x10
[937013.795198] __mutex_lock.isra.0+0x16b/0x4d0
[937013.795199] ? __switch_to_asm+0x40/0x70
[937013.795210] ? can_overcommit.part.0+0x50/0xa0 [btrfs]
[937013.795221] ? btrfs_reserve_metadata_bytes+0x6dc/0x940 [btrfs]
[937013.795232] btrfs_log_inode_parent+0x149/0xdf0 [btrfs]
[937013.795234] ? prepare_to_wait_event+0x7e/0x150
[937013.795235] ? finish_wait+0x3f/0x80
[937013.795245] ? btrfs_block_rsv_add_bytes+0x20/0x60 [btrfs]
[937013.795254] ? wait_current_trans+0x2f/0x130 [btrfs]
[937013.795256] ? _cond_resched+0x15/0x30
[937013.795264] ? join_transaction+0x24/0x410 [btrfs]
[937013.795274] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[937013.795284] btrfs_sync_file+0x25f/0x400 [btrfs]
[937013.795286] do_fsync+0x38/0x70
[937013.795286] __x64_sys_fsync+0x10/0x20
[937013.795287] do_syscall_64+0x5b/0x1c0
[937013.795289] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[937013.795289] RIP: 0033:0x7fb16c2badc7
[937013.795290] Code: Bad RIP value.
[937013.795291] RSP: 002b:00007ffc7dc31d78 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[937013.795291] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb16c2badc7
[937013.795292] RDX: 0000000000000000 RSI: 000055d92402ad3b RDI: 0000000000000003
[937013.795292] RBP: 000055d9251a9380 R08: 0000000000000001 R09: 0000000000000000
[937013.795292] R10: 000055d9240c0e00 R11: 0000000000000246 R12: 0000000000000001
[937013.795293] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
^ permalink raw reply [flat|nested] 13+ messages in thread

* Re: btrfs freezing on writes
2020-04-07 19:46 btrfs freezing on writes kjansen387
@ 2020-04-07 20:11 ` Chris Murphy
2020-04-07 20:22 ` Chris Murphy
2020-04-09 4:32 ` Zygo Blaxell
1 sibling, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2020-04-07 20:11 UTC (permalink / raw)
To: kjansen387; +Cc: Btrfs BTRFS

On Tue, Apr 7, 2020 at 1:46 PM kjansen387 <kjansen387@gmail.com> wrote:
>
> Hello,
>
> I'm using btrfs on Fedora 31 running kernel 5.5.10-200.fc31.x86_64.
>
> I've moved my workload from md raid5 to btrfs raid1.
> # btrfs filesystem show
> Label: none uuid: 8ce9e167-57ea-4cf8-8678-3049ba028c12
> Total devices 5 FS bytes used 3.73TiB
> devid 1 size 3.64TiB used 2.53TiB path /dev/sde
> devid 2 size 3.64TiB used 2.53TiB path /dev/sdf
> devid 3 size 1.82TiB used 902.00GiB path /dev/sdb
> devid 4 size 1.82TiB used 902.03GiB path /dev/sdc
> devid 5 size 1.82TiB used 904.03GiB path /dev/sdd
>
> # btrfs fi df /export
> Data, RAID1: total=3.85TiB, used=3.72TiB
> System, RAID1: total=32.00MiB, used=608.00KiB
> Metadata, RAID1: total=6.00GiB, used=5.16GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> After moving to btrfs, I'm seeing freezes of ~10 seconds very, very
> often (multiple times per minute). MariaDB, vim, InfluxDB, etc.
>
> See the attachment for a stack trace of vim, and the dmesg output of 'echo w
> > /proc/sysrq-trigger', which also includes other hanging processes.
>
> What's going on? I hope someone can help.

How busy are the databases? What are the mount options for this volume?

I think some kind of write contention is possible if there are heavy
fsync writes, since the tree log is per subvolume? (Maybe a developer
can describe this correctly if I haven't.) A possible workaround is
putting each database in its own subvolume.

-- 
Chris Murphy

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: btrfs freezing on writes
2020-04-07 20:11 ` Chris Murphy
@ 2020-04-07 20:22 ` Chris Murphy
2020-04-07 20:39 ` kjansen387
0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2020-04-07 20:22 UTC (permalink / raw)
To: Chris Murphy; +Cc: kjansen387, Btrfs BTRFS

On Tue, Apr 7, 2020 at 2:11 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Tue, Apr 7, 2020 at 1:46 PM kjansen387 <kjansen387@gmail.com> wrote:
> >
> > Hello,
> >
> > I'm using btrfs on Fedora 31 running kernel 5.5.10-200.fc31.x86_64.
> >
> > I've moved my workload from md raid5 to btrfs raid1.
> > # btrfs filesystem show
> > Label: none uuid: 8ce9e167-57ea-4cf8-8678-3049ba028c12
> > Total devices 5 FS bytes used 3.73TiB
> > devid 1 size 3.64TiB used 2.53TiB path /dev/sde
> > devid 2 size 3.64TiB used 2.53TiB path /dev/sdf
> > devid 3 size 1.82TiB used 902.00GiB path /dev/sdb
> > devid 4 size 1.82TiB used 902.03GiB path /dev/sdc
> > devid 5 size 1.82TiB used 904.03GiB path /dev/sdd
> >
> > # btrfs fi df /export
> > Data, RAID1: total=3.85TiB, used=3.72TiB
> > System, RAID1: total=32.00MiB, used=608.00KiB
> > Metadata, RAID1: total=6.00GiB, used=5.16GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > After moving to btrfs, I'm seeing freezes of ~10 seconds very, very
> > often (multiple times per minute). MariaDB, vim, InfluxDB, etc.
> >
> > See the attachment for a stack trace of vim, and the dmesg output of 'echo w
> > > /proc/sysrq-trigger', which also includes other hanging processes.
> >
> > What's going on? I hope someone can help.
>
> How busy are the databases? What are the mount options for this volume?
>
> I think some kind of write contention is possible if there are heavy
> fsync writes, since the tree log is per subvolume? (Maybe a developer
> can describe this correctly if I haven't.) A possible workaround is
> putting each database in its own subvolume.

You could, of course, start with just the busiest database and see if
it makes any difference. By default on Fedora, only /home and / are on
separate subvolumes. The general idea would be to use a subvolume
instead of a directory, e.g.

##shutdown the database
mv /var/lib/mysql /var/lib/mysqlold
btrfs sub create /var/lib/mysql
restorecon -Rv /var/lib/mysql
cp -r --reflink /var/lib/mysqlold/* /var/lib/mysql/
##resume database operation, and clean up mysqlold whenever

Another thing that might help, depending on the workload, is
space_cache=v2. You can safely switch to v2 via:

mount -o remount,clear_cache,space_cache=v2 /

This is a one-time command. It will set a feature flag for space cache
v2, and it'll be used automatically on subsequent mounts. It may be
busy for a minute or so while the cache is rebuilt.

You might test these separately so you have an idea of their relative
effects on the workload.

-- 
Chris Murphy

^ permalink raw reply [flat|nested] 13+ messages in thread
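Whether the v2 space cache took effect can be checked from the active
mount options and the superblock feature flags. A minimal sketch,
assuming the filesystem from this thread mounted at /export with
/dev/sde as one member (recent btrfs-progs decode the flag names;
older versions print only the raw hex value):

$ findmnt -no OPTIONS /export | tr ',' '\n' | grep space_cache
# btrfs inspect-internal dump-super /dev/sde | grep compat_ro_flags

After the remount above, the first command should report
space_cache=v2, and the second should show the FREE_SPACE_TREE bits
once the v2 cache has been built.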
* Re: btrfs freezing on writes
2020-04-07 20:22 ` Chris Murphy
@ 2020-04-07 20:39 ` kjansen387
2020-04-07 22:11 ` Chris Murphy
0 siblings, 1 reply; 13+ messages in thread
From: kjansen387 @ 2020-04-07 20:39 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 3658 bytes --]

Hi Chris,

$ grep /export /etc/fstab
UUID=8ce9e167-57ea-4cf8-8678-3049ba028c12 /export btrfs device=/dev/sde,device=/dev/sdf 0 2

The device section is not complete, but I don't think that matters
because btrfs handles this automatically, right?

$ mount | grep /export
/dev/sde on /export type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)

I've attached sysstat info of my disks. What's obvious is that 2 disks
have the load (one is written to, the other one is the mirror), and 3
are pretty idle. But it's 2.4MB per second - that's not much!

I've just changed the space_cache to v2, but it doesn't seem to help
much. 'sync -f /export/tmp' still takes very long at times (it just
took 22 seconds!)

Is there any way we can find the cause before I move everything into
subvolumes? I'd like to avoid that if possible. It sounds a bit
overkill for 2.4MB/s of writes, and I think most of it is going to one
InfluxDB database anyway.

Thanks!

On 07-Apr-20 22:22, Chris Murphy wrote:
> On Tue, Apr 7, 2020 at 2:11 PM Chris Murphy <lists@colorremedies.com> wrote:
>>
>> On Tue, Apr 7, 2020 at 1:46 PM kjansen387 <kjansen387@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm using btrfs on Fedora 31 running kernel 5.5.10-200.fc31.x86_64.
>>>
>>> I've moved my workload from md raid5 to btrfs raid1.
>>> # btrfs filesystem show
>>> Label: none uuid: 8ce9e167-57ea-4cf8-8678-3049ba028c12
>>> Total devices 5 FS bytes used 3.73TiB
>>> devid 1 size 3.64TiB used 2.53TiB path /dev/sde
>>> devid 2 size 3.64TiB used 2.53TiB path /dev/sdf
>>> devid 3 size 1.82TiB used 902.00GiB path /dev/sdb
>>> devid 4 size 1.82TiB used 902.03GiB path /dev/sdc
>>> devid 5 size 1.82TiB used 904.03GiB path /dev/sdd
>>>
>>> # btrfs fi df /export
>>> Data, RAID1: total=3.85TiB, used=3.72TiB
>>> System, RAID1: total=32.00MiB, used=608.00KiB
>>> Metadata, RAID1: total=6.00GiB, used=5.16GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> After moving to btrfs, I'm seeing freezes of ~10 seconds very, very
>>> often (multiple times per minute). MariaDB, vim, InfluxDB, etc.
>>>
>>> See the attachment for a stack trace of vim, and the dmesg output of 'echo w
>>> > /proc/sysrq-trigger', which also includes other hanging processes.
>>>
>>> What's going on? I hope someone can help.
>>
>> How busy are the databases? What are the mount options for this volume?
>>
>> I think some kind of write contention is possible if there are heavy
>> fsync writes, since the tree log is per subvolume? (Maybe a developer
>> can describe this correctly if I haven't.) A possible workaround is
>> putting each database in its own subvolume.
>
> You could, of course, start with just the busiest database and see if
> it makes any difference. By default on Fedora, only /home and / are on
> separate subvolumes. The general idea would be to use a subvolume
> instead of a directory, e.g.
>
> ##shutdown the database
> mv /var/lib/mysql /var/lib/mysqlold
> btrfs sub create /var/lib/mysql
> restorecon -Rv /var/lib/mysql
> cp -r --reflink /var/lib/mysqlold/* /var/lib/mysql/
> ##resume database operation, and clean up mysqlold whenever
>
> Another thing that might help, depending on the workload, is
> space_cache=v2. You can safely switch to v2 via:
>
> mount -o remount,clear_cache,space_cache=v2 /
>
> This is a one-time command. It will set a feature flag for space cache
> v2, and it'll be used automatically on subsequent mounts. It may be
> busy for a minute or so while the cache is rebuilt.
>
> You might test these separately so you have an idea of their relative
> effects on the workload.

[-- Attachment #2: debug_sar.txt --]
[-- Type: text/plain, Size: 31067 bytes --]

$ sar -dp -s 21:00:00 -e 22:00:00 | egrep -e '(tps|sd[bcdef])'
21:00:01 DEV tps rkB/s wkB/s dkB/s areq-sz aqu-sz await %util
21:01:00 sdb 6.21 0.09 101.08 0.00 16.30 0.03 5.52 0.75
21:01:00 sdc 4.15 0.03 10.81 0.00 2.61 0.03 7.05 0.56
21:01:00 sdd 3.95 0.50 9.30 0.00 2.48 0.03 8.18 0.52
21:01:00 sde 44.66 2.08 2178.70 0.00 48.83 8.17 183.46 2.49
21:01:00 sdf 47.07 0.44 2146.21 0.00 45.61 6.53 139.14 2.50
21:02:00 sdb 7.58 0.09 156.24 0.00 20.64 0.05 6.72 0.81
21:02:00 sdc 5.78 0.02 16.38 0.00 2.84 0.05 9.00 0.67
21:02:00 sdd 4.43 0.02 11.72 0.00 2.65 0.04 8.80 0.59
21:02:00 sde 67.12 0.02 2549.52 0.00 37.99 2.39 36.12 3.09
21:02:00 sdf 71.21 0.29 2753.11 0.00 38.66 7.49 105.73 3.19
21:03:00 sdb 6.78 0.02 207.26 0.00 30.59 0.03 5.40 0.58
21:03:00 sdc 2.41 0.02 5.79 0.00 2.41 0.02 7.97 0.33
21:03:00 sdd 2.45 0.09 5.93 0.00 2.46 0.03 11.99 0.32
21:03:00 sde 34.33 0.02 1554.25 0.00 45.27 7.60 221.78 2.11
21:03:00 sdf 36.41 0.02 1740.13 0.00 47.79 8.06 221.85 2.22
21:04:00 sdb 6.73 0.02 265.20 0.00 39.43 0.02 3.77 0.45
21:04:00 sdc 1.65 0.02 4.06 0.00 2.48 0.02 11.16 0.20
21:04:00 sdd 1.71 0.02 4.60 0.00 2.69 0.03 15.21 0.19
21:04:00 sde 39.88 1.16 1927.74 0.00 48.37 10.57 265.53 2.04
21:04:00 sdf 44.11 0.16 2056.74 0.00 46.64 2.24 51.22 2.15
21:05:00 sdb 2.58 0.05 49.74 0.00 19.30 0.02 8.17 0.24
21:05:00 sdc 1.10 0.05 2.20 0.00 2.05 0.01 11.00 0.15
21:05:00 sdd 1.23 0.05 3.46 0.00 2.85 0.04 31.08 0.15
21:05:00 sde 21.69 0.05 1086.47 0.00 50.09 4.37 202.00 1.23
21:05:00 sdf 18.03 0.05 1106.38 0.00 61.37 9.84 546.46 1.22
21:06:00 sdb 1.77 0.02 35.17 0.00 19.94 0.01 7.28 0.16
21:06:00 sdc 0.67 0.02 1.53 0.00 2.34 0.01 10.25 0.08
21:06:00 sdd 0.72 0.02 1.73 0.00 2.45 0.01 20.72 0.08
21:06:00 sde 23.98 1.42 1013.76 0.00 42.33 0.42 17.94 1.25
21:06:00 sdf 20.50 0.29 1055.99 0.00 51.53 15.91 776.58 1.24
21:07:00 sdb 3.66 0.02 91.71 0.00 25.04 0.02 6.30 0.32
21:07:00 sdc 1.35 0.29 3.33 0.00 2.69 0.01 8.98 0.18
21:07:00 sdd 1.43 0.09 4.13 0.00 2.95 0.01 10.57 0.17
21:07:00 sde 29.74 0.49 1274.66 0.00 42.88 3.63 122.68 1.49
21:07:00 sdf 26.39 0.29 1351.98 0.00 51.24 11.08 420.47 1.49
21:08:00 sdb 2.36 4.55 44.61 0.00 20.80 0.02 6.92 0.22
21:08:00 sdc 0.80 0.02 1.86 0.00 2.36 0.01 12.58 0.10
21:08:00 sdd 0.87 0.02 2.33 0.00 2.72 0.01 16.15 0.10
21:08:00 sde 23.97 1.16 1128.15 0.00 47.11 12.84 536.13 1.44
21:08:00 sdf 27.02 0.02 1164.18 0.00 43.09 9.12 338.15 1.52
21:09:00 sdb 6.16 0.02 178.38 0.00 28.96 0.03 5.61 0.50
21:09:00 sdc 2.16 0.02 5.06 0.00 2.35 0.03 13.26 0.28
21:09:00 sdd 2.33 0.09 6.66 0.00 2.90 0.03 11.49 0.28
21:09:00 sde 47.24 0.02 2285.96 0.00 48.39 5.08 108.10 2.19
21:09:00 sdf 46.24 0.29 2438.23 0.00 52.73 9.52 206.36 2.41
21:10:00 sdb 3.70 0.22 113.22 0.00 30.69 0.01 4.37 0.29
21:10:00 sdc 1.12 0.02 2.60 0.00 2.35 0.01 9.06 0.14
21:10:00 sdd 1.42 1.22 3.00 0.00 2.98 0.02 14.45 0.18
21:10:00 sde 29.44 31.86 1348.12 0.00 46.88 0.48 16.75 1.54
21:10:00 sdf 24.08 0.82 1416.18 0.00 58.86 13.47 559.88 1.41
21:11:00 sdb 2.83 0.02 38.36 0.00 13.56 0.02 6.88 0.30
21:11:00 sdc 2.65 0.02 8.92 0.00 3.38 0.02 8.69 0.20
21:11:00 sdd 1.50 0.09 3.66 0.00 2.51 0.02 11.19 0.18
21:11:00 sde 28.97 0.02 1456.10 0.00 50.27 8.24 284.85 1.63
21:11:00 sdf 33.59 0.02 1582.09 0.00 47.09 4.01 119.77 1.71
21:12:00 sdb 5.63 0.02 149.48 0.00 26.56 0.03 5.40 0.52
21:12:00 sdc 2.15 0.02 5.46 0.00 2.55 0.02 7.76 0.30
21:12:00 sdd 2.23 0.02 6.86 0.00 3.09 0.02 11.04 0.28
21:12:00 sde 45.41 0.02 2012.12 0.00 44.31 2.08 46.19 2.18
21:12:00 sdf 43.61 0.02 2108.91 0.00 48.35 12.94 297.30 2.23
21:13:00 sdb 5.71 0.02 82.92 0.00 14.52 0.03 5.87 0.65
21:13:00 sdc 3.51 0.02 7.93 0.00 2.26 0.03 7.89 0.48
21:13:00 sdd 3.60 0.02 8.66 0.00 2.41 0.03 9.21 0.47
21:13:00 sde 47.12 0.02 1945.85 0.00 41.30 1.17 25.40 2.20
21:13:00 sdf 46.29 0.02 2005.59 0.00 43.33 4.54 98.58 2.22
21:14:00 sdb 3.40 5.15 34.37 0.00 11.63 0.02 7.26 0.37
21:14:00 sdc 1.78 0.69 3.86 0.00 2.56 0.01 8.64 0.24
21:14:00 sdd 1.75 0.02 4.20 0.00 2.41 0.02 13.26 0.22
21:14:00 sde 31.34 12.35 1519.93 0.00 48.89 8.77 280.39 1.76
21:14:00 sdf 35.69 6.69 1559.10 0.00 43.88 3.03 85.51 1.71
21:15:00 sdb 6.26 0.06 129.45 0.00 20.69 0.03 5.82 0.58
21:15:00 sdc 3.16 0.06 7.19 0.00 2.29 0.03 9.32 0.42
21:15:00 sdd 3.18 0.06 7.46 0.00 2.36 0.06 19.19 0.40
21:15:00 sde 49.08 0.32 2347.86 0.00 47.85 6.09 124.60 2.56
21:15:00 sdf 51.61 0.46 2328.95 0.00 45.14 4.57 89.12 2.63
21:16:00 sdb 1.90 0.02 45.42 0.00 23.94 0.01 7.19 0.18
21:16:00 sdc 0.98 0.02 2.80 0.00 2.87 0.01 10.24 0.12
21:16:00 sdd 1.07 0.29 2.93 0.00 3.02 0.01 12.55 0.12
21:16:00 sde 16.55 11.88 1008.52 0.00 61.66 15.59 942.31 1.14
21:16:00 sdf 21.99 0.69 1031.37 0.00 46.92 0.27 12.55 1.12
21:17:00 sdb 2.06 0.02 37.44 0.00 18.14 0.01 6.96 0.21
21:17:00 sdc 1.08 0.02 2.73 0.00 2.55 0.01 9.58 0.14
21:17:00 sdd 1.27 0.22 4.00 0.00 3.34 0.02 12.63 0.14
21:17:00 sde 19.22 8.42 1329.36 0.00 69.61 13.69 712.74 1.37
21:17:00 sdf 26.73 2.49 1306.71 0.00 48.98 0.61 23.24 1.34
21:18:01 sdb 6.48 0.02 104.81 0.00 16.19 0.04 7.32 0.75
21:18:01 sdc 4.01 0.02 9.32 0.00 2.33 0.03 7.99 0.56
21:18:01 sdd 4.08 0.02 9.99 0.00 2.46 0.03 7.30 0.53
21:18:01 sde 58.90 1.09 2357.38 0.00 40.04 3.71 63.55 2.75
21:18:01 sdf 59.96 0.16 2372.57 0.00 39.57 1.34 22.83 2.78
21:19:01 sdb 2.30 0.02 20.92 0.00 9.11 0.02 7.54 0.24
21:19:01 sdc 1.37 0.02 3.33 0.00 2.46 0.01 8.41 0.17
21:19:01 sdd 1.42 0.02 3.93 0.00 2.79 0.02 11.29 0.16
21:19:01 sde 31.59 0.02 1686.53 0.00 53.39 7.34 232.92 1.78
21:19:01 sdf 33.12 0.02 1610.12 0.00 48.61 9.48 286.80 1.86
21:20:01 sdb 2.61 0.29 36.16 0.00 13.95 0.01 5.72 0.27
21:20:01 sdc 1.51 0.02 3.66 0.00 2.43 0.02 12.71 0.20
21:20:01 sdd 1.55 0.02 4.20 0.00 2.73 0.02 10.18 0.19
21:20:01 sde 34.09 0.02 1615.58 0.00 47.39 8.43 247.65 1.90
21:20:01 sdf 33.79 0.02 1670.18 0.00 49.42 10.99 325.83 1.88
21:21:01 sdb 2.88 1.29 17.98 0.00 6.69 0.02 6.71 0.35
21:21:01 sdc 2.95 0.02 9.59 0.00 3.26 0.01 5.07 0.26
21:21:01 sdd 1.76 0.02 4.73 0.00 2.69 0.02 9.55 0.22
21:21:01 sde 35.70 8.75 1959.24 0.00 55.13 11.88 333.36 2.13
21:21:01 sdf 43.74 48.91 1973.09 0.00 46.23 3.60 82.75 2.47
21:22:01 sdb 3.10 0.02 16.92 0.00 5.47 0.01 5.12 0.39
21:22:01 sdc 2.18 0.02 5.53 0.00 2.55 0.02 8.05 0.30
21:22:01 sdd 2.23 0.02 5.93 0.00 2.67 0.02 8.52 0.28
21:22:01 sde 33.57 0.02 1472.57 0.00 43.86 3.44 102.90 2.03
21:22:01 sdf 32.07 0.02 1476.04 0.00 46.02 9.75 304.46 1.90
21:23:01 sdb 2.93 0.02 18.58 0.00 6.35 0.02 6.43 0.33
21:23:01 sdc 1.98 0.02 4.73 0.00 2.40 0.02 12.38 0.27
21:23:01 sdd 2.08 0.09 5.46 0.00 2.67 0.02 10.12 0.25
21:23:01 sde 42.64 0.02 1941.59 0.00 45.53 5.33 125.49 2.17
21:23:01 sdf 41.56 0.02 1942.99 0.00 46.75 10.29 248.03 2.09
21:24:01 sdb 3.95 0.09 19.21 0.00 4.89 0.03 6.79 0.50
21:24:01 sdc 3.15 0.03 7.23 0.00 2.31 0.03 9.10 0.45
21:24:01 sdd 3.21 0.03 7.97 0.00 2.49 0.03 9.65 0.43
21:24:01 sde 43.18 0.03 1820.85 0.00 42.17 5.30 123.18 2.40
21:24:01 sdf 43.68 0.03 1826.74 0.00 41.82 6.16 141.60 2.26
21:25:01 sdb 3.28 0.98 15.64 0.00 5.07 0.02 7.61 0.41
21:25:01 sdc 2.48 0.05 5.57 0.00 2.26 0.02 8.09 0.35
21:25:01 sdd 2.67 0.05 7.29 0.00 2.75 0.05 17.55 0.34
21:25:01 sde 36.86 7.07 1739.47 0.00 47.39 5.13 139.82 1.94
21:25:01 sdf 38.50 3.69 1741.46 0.00 45.33 6.78 176.60 1.92
21:26:01 sdb 0.92 0.09 3.00 0.00 3.37 0.01 9.91 0.11
21:26:01 sdc 0.75 0.02 1.80 0.00 2.43 0.01 11.84 0.10
21:26:01 sdd 0.78 0.02 2.00 0.00 2.59 0.01 14.91 0.09
21:26:01 sde 12.74 0.02 876.46 0.00 68.81 11.13 874.11 0.95
21:26:01 sdf 13.54 0.02 939.79 0.00 69.43 12.19 901.14 0.95
21:27:01 sdb 1.57 0.02 9.99 0.00 6.40 0.01 7.76 0.18
21:27:01 sdc 1.10 0.02 2.86 0.00 2.63 0.01 9.20 0.15
21:27:01 sdd 1.17 0.02 3.33 0.00 2.88 0.01 11.43 0.14
21:27:01 sde 23.73 0.02 1364.46 0.00 57.50 3.53 149.12 1.39
21:27:01 sdf 21.90 0.02 1317.10 0.00 60.15 11.87 542.60 1.31
21:28:01 sdb 1.72 0.03 8.41 0.00 4.91 0.01 7.81 0.23
21:28:01 sdc 1.40 0.03 3.27 0.00 2.35 0.01 9.50 0.20
21:28:01 sdd 1.47 0.03 3.74 0.00 2.56 0.02 13.36 0.19
21:28:01 sde 26.21 0.03 1266.63 0.00 48.33 3.38 129.37 1.39
21:28:01 sdf 24.12 0.03 1244.93 0.00 51.62 5.84 242.51 1.39
21:29:01 sdb 5.42 0.03 26.34 0.00 4.87 0.03 6.85 0.68
21:29:01 sdc 4.36 0.03 10.03 0.00 2.30 0.04 8.70 0.62
21:29:01 sdd 4.56 0.09 12.17 0.00 2.69 0.04 8.22 0.59
21:29:01 sde 60.35 0.09 2264.75 0.00 37.53 2.76 46.25 2.90
21:29:01 sdf 61.66 0.03 2203.58 0.00 35.74 3.33 54.54 2.81
21:30:01 sdb 4.32 0.02 25.15 0.00 5.83 0.03 7.85 0.55
21:30:01 sdc 3.31 0.02 7.54 0.00 2.29 0.03 10.39 0.48
21:30:01 sdd 3.36 0.02 7.94 0.00 2.37 0.02 7.23 0.44
21:30:01 sde 51.75 0.02 2065.78 0.00 39.92 0.96 19.05 2.53
21:30:01 sdf 50.07 0.55 2089.21 0.00 41.74 5.24 105.19 2.51
21:31:01 sdb 4.20 0.02 20.85 0.00 4.97 0.03 6.72 0.54
21:31:01 sdc 4.43 0.02 13.19 0.00 2.98 0.02 5.36 0.47
21:31:01 sdd 3.55 1.09 8.66 0.00 2.75 0.04 10.48 0.45
21:31:01 sde 44.56 1.89 2270.60 0.00 51.00 7.86 176.80 2.61
21:31:01 sdf 46.72 0.22 2325.54 0.00 49.78 11.52 247.11 2.62
21:32:01 sdb 5.24 0.03 24.13 0.00 4.61 0.05 9.07 0.66
21:32:01 sdc 4.41 0.03 10.72 0.00 2.44 0.04 9.17 0.62
21:32:01 sdd 4.51 0.03 12.06 0.00 2.68 0.03 7.95 0.57
21:32:01 sde 58.13 0.03 2335.79 0.00 40.18 2.63 45.75 2.89
21:32:01 sdf 58.45 0.03 2340.62 0.00 40.05 4.05 69.71 2.76
21:33:01 sdb 2.36 0.02 11.41 0.00 4.86 0.02 8.26 0.31
21:33:01 sdc 1.92 0.02 4.45 0.00 2.32 0.02 9.39 0.26
21:33:01 sdd 1.99 0.02 4.91 0.00 2.48 0.02 10.93 0.25
21:33:01 sde 28.98 0.02 1349.54 0.00 46.57 5.67 196.25 1.61
21:33:01 sdf 28.58 0.02 1316.44 0.00 46.06 2.29 80.75 1.53
21:34:01 sdb 2.50 0.03 12.89 0.00 5.17 0.02 7.48 0.29
21:34:01 sdc 1.97 0.03 4.66 0.00 2.37 0.02 10.15 0.27
21:34:01 sdd 2.01 0.03 5.19 0.00 2.60 0.02 11.58 0.25
21:34:01 sde 29.60 0.03 1298.84 0.00 43.88 1.46 49.83 1.65
21:34:01 sdf 28.88 0.03 1338.51 0.00 46.36 8.35 289.83 1.64
21:35:01 sdb 1.56 0.05 6.89 0.00 4.46 0.01 8.19 0.18
21:35:01 sdc 1.27 0.05 2.85 0.00 2.27 0.01 10.86 0.16
21:35:01 sdd 1.29 0.05 3.11 0.00 2.45 0.03 21.23 0.15
21:35:01 sde 23.72 0.05 1340.62 0.00 56.53 10.50 443.37 1.43
21:35:01 sdf 28.02 0.05 1343.13 0.00 47.94 4.09 146.62 1.40
21:36:01 sdb 1.55 17.62 6.53 0.00 15.59 0.01 6.68 0.15
21:36:01 sdc 1.40 17.88 1.53 0.00 13.88 0.01 7.00 0.12
21:36:01 sdd 1.38 17.62 1.53 0.00 13.85 0.02 11.42 0.11
21:36:01 sde 18.27 17.62 902.88 0.00 50.37 12.86 704.01 1.31
21:36:01 sdf 17.76 17.62 940.53 0.00 53.96 13.46 758.66 1.30
21:37:01 sdb 8.10 0.09 144.64 0.00 17.88 0.04 5.59 0.86
21:37:01 sdc 31.19 0.03 517.64 0.00 16.60 0.75 24.56 2.07
21:37:01 sdd 31.07 0.03 518.31 0.00 16.68 0.10 3.74 2.13
21:37:01 sde 102.70 0.29 3837.35 0.00 37.37 2.17 21.66 4.14
21:37:01 sdf 100.55 0.29 3683.64 0.00 36.64 1.11 11.53 4.08
21:38:00 sdb 5.93 0.03 26.00 0.00 4.39 0.04 6.97 0.77
21:38:00 sdc 11.69 0.03 139.47 0.00 11.93 0.12 10.68 0.93
21:38:00 sdd 11.71 0.03 140.14 0.00 11.97 0.06 5.82 0.89
21:38:00 sde 95.35 0.23 3413.91 0.00 35.81 0.93 10.23 3.89
21:38:00 sdf 96.39 0.03 3417.87 0.00 35.46 1.06 11.51 3.85
21:39:00 sdb 51.18 0.02 847.68 0.00 16.56 0.59 11.94 2.62
21:39:00 sdc 5.56 0.02 12.78 0.00 2.30 0.04 7.99 0.78
21:39:00 sdd 5.75 0.09 14.64 0.00 2.56 0.04 6.42 0.75
21:39:00 sde 68.13 0.02 3002.38 0.00 44.07 0.50 7.86 3.25
21:39:00 sdf 111.67 0.02 3827.48 0.00 34.27 2.57 23.51 4.65
21:40:01 sdb 74.28 0.09 1347.79 0.00 18.15 1.08 14.98 3.73
21:40:01 sdc 5.86 0.02 13.57 0.00 2.32 0.04 7.91 0.81
21:40:01 sdd 5.96 0.02 14.43 0.00 2.43 0.04 7.85 0.79
21:40:01 sde 63.27 0.02 3366.73 0.00 53.21 0.70 11.50 3.31
21:40:01 sdf 131.17 0.29 4649.23 0.00 35.45 1.36 10.83 5.69
21:41:01 sdb 41.49 0.09 754.77 0.00 18.19 0.69 17.12 2.51
21:41:01 sdc 8.13 0.03 21.25 0.00 2.61 0.05 6.34 1.00
21:41:01 sdd 7.00 0.03 16.70 0.00 2.39 0.04 6.68 0.92
21:41:01 sde 69.88 0.29 3200.33 0.00 45.80 1.30 19.15 3.80
21:41:01 sdf 99.06 0.56 3681.11 0.00 37.16 1.31 13.73 4.74
21:42:00 sdb 6.65 0.03 31.28 0.00 4.71 0.04 5.82 0.86
21:42:00 sdc 5.41 0.03 12.86 0.00 2.38 0.04 7.27 0.77
21:42:00 sdd 5.58 0.03 14.54 0.00 2.61 0.04 6.61 0.73
21:42:00 sde 74.96 0.03 2923.73 0.00 39.00 1.30 17.85 4.10
21:42:00 sdf 76.60 0.03 2924.33 0.00 38.18 1.47 19.71 4.01
21:43:01 sdb 6.14 0.02 25.17 0.00 4.10 0.04 6.78 0.84
21:43:01 sdc 5.23 0.02 12.02 0.00 2.30 0.04 7.64 0.74
21:43:01 sdd 5.32 0.02 12.75 0.00 2.40 0.03 6.62 0.71
21:43:01 sde 68.64 0.02 2657.53 0.00 38.72 1.27 19.07 3.61
21:43:01 sdf 68.11 0.02 2662.81 0.00 39.09 1.12 16.90 3.60
21:44:00 sdb 6.30 0.03 33.27 0.00 5.29 0.04 7.21 0.82
21:44:00 sdc 5.08 0.03 11.52 0.00 2.27 0.04 9.01 0.71
21:44:00 sdd 5.12 0.03 12.05 0.00 2.36 0.03 6.54 0.69
21:44:00 sde 68.80 0.03 2527.21 0.00 36.73 0.89 13.45 3.71
21:44:00 sdf 69.18 0.03 2542.76 0.00 36.76 1.96 28.81 3.66
21:45:01 sdb 6.40 0.06 25.49 0.00 3.99 0.04 7.03 0.85
21:45:01 sdc 5.55 0.06 12.41 0.00 2.25 0.04 8.07 0.77
21:45:01 sdd 5.60 0.32 12.94 0.00 2.37 0.07 12.99 0.72
21:45:01 sde 72.05 0.06 2806.21 0.00 38.95 0.87 12.63 3.51
21:45:01 sdf 72.91 0.32 2812.28 0.00 38.58 0.91 12.96 3.55
21:46:01 sdb 8.73 0.03 117.82 0.00 13.50 0.05 6.13 1.07
21:46:01 sdc 6.55 0.03 15.85 0.00 2.42 0.05 7.44 0.89
21:46:01 sdd 6.97 1.57 16.19 0.00 2.55 0.05 7.69 0.89
21:46:01 sde 78.81 2.71 3419.65 0.00 43.43 1.97 25.52 4.95
21:46:01 sdf 78.27 1.10 3444.50 0.00 44.02 2.33 30.33 4.72
21:47:00 sdb 5.92 0.03 31.38 0.00 5.31 0.03 6.25 0.76
21:47:00 sdc 4.58 0.03 10.46 0.00 2.29 0.03 7.63 0.65
21:47:00 sdd 4.68 0.09 11.26 0.00 2.43 0.03 6.77 0.62
21:47:00 sde 69.63 0.03 2466.38 0.00 35.42 1.98 28.94 4.25
21:47:00 sdf 69.26 0.03 2468.45 0.00 35.64 1.90 27.87 4.15
21:48:01 sdb 7.07 0.02 29.71 0.00 4.21 0.05 6.80 0.95
21:48:01 sdc 5.79 0.02 13.09 0.00 2.26 0.04 7.82 0.81
21:48:01 sdd 5.82 0.02 13.61 0.00 2.34 0.04 6.46 0.78
21:48:01 sde 76.31 0.02 2922.56 0.00 38.30 2.03 27.05 4.66
21:48:01 sdf 78.32 0.02 2932.11 0.00 37.44 2.39 31.06 4.81
21:49:01 sdb 6.45 0.02 26.88 0.00 4.17 0.04 6.20 0.86
21:49:01 sdc 5.29 0.02 11.84 0.00 2.24 0.04 7.35 0.75
21:49:01 sdd 5.32 0.02 12.38 0.00 2.33 0.03 6.19 0.72
21:49:01 sde 68.41 0.02 2402.66 0.00 35.12 2.41 35.66 4.01
21:49:01 sdf 70.06 0.02 2408.98 0.00 34.38 1.49 21.75 4.44
21:50:01 sdb 4.68 0.03 18.72 0.00 4.01 0.03 6.66 0.62
21:50:01 sdc 4.05 0.03 9.19 0.00 2.27 0.03 8.19 0.55
21:50:01 sdd 4.09 0.03 9.73 0.00 2.39 0.03 7.16 0.54
21:50:01 sde 62.16 0.03 2202.13 0.00 35.43 1.04 17.15 3.20
21:50:01 sdf 60.03 0.03 2205.58 0.00 36.74 3.44 57.78 3.12
21:51:01 sdb 7.19 0.03 29.03 0.00 4.04 0.04 6.20 0.97
21:51:01 sdc 6.96 0.03 18.55 0.00 2.67 0.04 5.75 0.82
21:51:01 sdd 5.81 0.03 13.95 0.00 2.41 0.04 6.64 0.76
21:51:01 sde 79.19 0.03 2975.14 0.00 37.57 2.01 25.83 4.79
21:51:01 sdf 80.73 0.03 2988.55 0.00 37.02 3.07 38.59 5.02
21:52:01 sdb 6.89 0.03 65.27 0.00 9.48 0.04 5.58 0.83
21:52:01 sdc 4.72 0.03 11.14 0.00 2.37 0.04 7.92 0.67
21:52:01 sdd 5.06 0.09 21.29 0.00 4.23 0.04 7.57 0.63
21:52:01 sde 76.41 0.03 2972.24 0.00 38.90 3.84 50.79 5.18
21:52:01 sdf 77.69 0.03 3009.41 0.00 38.73 3.57 46.45 5.20
21:53:01 sdb 6.07 0.02 31.33 0.00 5.17 0.04 6.21 0.77
21:53:01 sdc 4.64 0.02 10.58 0.00 2.28 0.03 7.08 0.65
21:53:01 sdd 5.02 0.02 16.36 0.00 3.26 0.04 8.33 0.65
21:53:01 sde 68.11 0.02 2519.26 0.00 36.99 1.33 20.07 3.90
21:53:01 sdf 68.61 0.02 2523.71 0.00 36.79 2.41 35.69 3.84
21:54:01 sdb 6.29 0.02 29.63 0.00 4.71 0.04 6.08 0.80
21:54:01 sdc 5.08 0.02 11.72 0.00 2.31 0.04 8.11 0.70
21:54:01 sdd 5.26 0.02 14.18 0.00 2.70 0.04 7.26 0.68
21:54:01 sde 69.39 0.02 2613.95 0.00 37.67 1.29 19.08 3.70
21:54:01 sdf 67.85 0.02 2618.41 0.00 38.59 1.56 23.50 3.67
21:55:00 sdb 7.03 0.05 30.49 0.00 4.34 0.06 9.19 0.91
21:55:00 sdc 5.81 0.05 13.36 0.00 2.31 0.05 9.83 0.82
21:55:00 sdd 6.09 0.05 16.59 0.00 2.73 0.07 11.96 0.78
21:55:00 sde 66.03 0.12 2727.33 0.00 41.30 0.99 15.43 3.53
21:55:00 sdf 66.49 0.05 2731.70 0.00 41.09 1.24 19.20 3.52
21:56:01 sdb 5.13 0.02 21.36 0.00 4.17 0.03 7.05 0.66
21:56:01 sdc 4.65 0.02 12.07 0.00 2.60 0.03 7.43 0.58
21:56:01 sdd 4.19 0.02 10.09 0.00 2.42 0.04 8.84 0.56
21:56:01 sde 48.02 0.02 2200.92 0.00 45.83 4.83 101.11 3.05
21:56:01 sdf 50.43 0.02 2208.57 0.00 43.80 5.34 106.47 3.13
21:57:01 sdb 5.87 0.02 24.63 0.00 4.20 0.04 7.44 0.76
21:57:01 sdc 4.98 0.02 11.65 0.00 2.35 0.04 8.58 0.70
21:57:01 sdd 5.09 0.02 12.45 0.00 2.45 0.03 7.14 0.67
21:57:01 sde 59.42 0.56 2439.73 0.00 41.07 2.49 42.36 3.10
21:57:01 sdf 61.40 0.02 2446.59 0.00 39.85 0.79 13.36 3.18
21:58:01 sdb 6.36 0.03 30.14 0.00 4.74 0.04 6.25 0.85
21:58:01 sdc 5.30 0.03 12.04 0.00 2.28 0.04 7.80 0.74
21:58:01 sdd 5.40 0.03 12.85 0.00 2.38 0.04 6.80 0.72
21:58:01 sde 58.39 0.29 2092.45 0.00 35.84 0.56 10.04 2.83
21:58:01 sdf 58.92 0.03 2102.81 0.00 35.69 0.60 10.75 2.87
21:59:00 sdb 7.47 0.03 61.00 0.00 8.16 0.04 6.25 0.96
21:59:00 sdc 5.68 0.03 12.87 0.00 2.27 0.04 7.30 0.81
21:59:00 sdd 5.70 0.03 13.41 0.00 2.36 0.04 7.63 0.76
21:59:00 sde 63.15 0.29 2417.43 0.00 38.29 0.68 11.27 3.29
21:59:00 sdf 64.77 0.03 2459.33 0.00 37.97 1.16 18.47 3.38
Average: sdb 7.33 0.54 101.03 0.00 13.86 0.07 9.65 0.68
Average: sdc 4.07 0.35 19.11 0.00 4.78 0.04 10.41 0.50
Average: sdd 4.02 0.42 19.50 0.00 4.95 0.03 8.29 0.48
Average: sde 49.50 1.92 2101.35 0.00 42.49 4.44 90.20 2.66
Average: sdf 52.62 1.49 2167.92 0.00 41.22 5.22 99.74 2.76

^ permalink raw reply [flat|nested] 13+ messages in thread
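For pinning down the stalls more directly than sar can, one option is
to time the sync-family syscalls of an affected process. A minimal
sketch, assuming strace is installed and <pid> is a placeholder for
the pid of e.g. influxd or mysqld:

$ strace -f -T -e trace=fsync,fdatasync -p <pid>

The -T flag makes strace print the time spent in each syscall, so a
~10 second freeze shows up directly as an fsync() or fdatasync() call
with a ~10s duration attached.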
* Re: btrfs freezing on writes
2020-04-07 20:39 ` kjansen387
@ 2020-04-07 22:11 ` Chris Murphy
2020-04-07 22:30 ` kjansen387
0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2020-04-07 22:11 UTC (permalink / raw)
To: kjansen387; +Cc: Btrfs BTRFS, Qu Wenruo

On Tue, Apr 7, 2020 at 2:39 PM kjansen387 <kjansen387@gmail.com> wrote:

> $ grep /export /etc/fstab
> UUID=8ce9e167-57ea-4cf8-8678-3049ba028c12 /export btrfs
> device=/dev/sde,device=/dev/sdf 0 2

I'd use only noatime for options. There's no advantage to specifying
devices. And also fs_passno can be 0. fsck.btrfs is a no-op anyway, so
it doesn't hurt anything to leave it at 2.

> I've attached sysstat info of my disks. What's obvious is that 2 disks
> have the load (one is written to, the other one is the mirror), and 3
> are pretty idle. But it's 2.4MB per second - that's not much!

Lots of small file writes maybe? What does iostat show for utilization?
Or vmstat for I/O? Hard drives of course have limited IO per second.

> I've just changed the space_cache to v2, but it doesn't seem to help
> much. 'sync -f /export/tmp' still takes very long at times (it just
> took 22 seconds!)

I just did a strace on this command and it uses syncfs, not fsync. I'm
pretty sure on Btrfs this is a full filesystem sync, which is
expensive: all data and metadata. So if it's very dirty, yeah, it could
take some time to flush everything to disk.

Try it without -f and you'll get fsync.

> Is there any way we can find the cause before I move everything into
> subvolumes? I'd like to avoid that if possible. It sounds a bit
> overkill for 2.4MB/s of writes, and I think most of it is going to one
> InfluxDB database anyway.

From the sysrq...

[937013.794093] mysqld D 0 10400 1 0x00004000
[937013.794240] do_fsync+0x38/0x70

[937013.794253] mysqld D 0 10412 1 0x00004000
[937013.794297] do_fsync+0x38/0x70

[937013.794306] mysqld D 0 10421 1 0x00004000
[937013.794353] do_fsync+0x38/0x70

[937013.794788] WTJourn.Flusher D 0 1186978 1186954 0x00004320
[937013.794894] do_fsync+0x38/0x70

[937013.794903] ftdc D 0 1186980 1186954 0x00000320
[937013.794951] ? btrfs_create+0x200/0x200 [btrfs]

[937013.795022] influxd D 0 2082782 1 0x00004000
[937013.795086] do_fsync+0x38/0x70

[937013.795098] influxd D 0 2082804 1 0x00004000
[937013.795143] do_fsync+0x38/0x70

[937013.795191] vim D 0 2228648 2223845 0x00004000
[937013.795286] do_fsync+0x38/0x70

Quite a lot of fsync all at once.

But I'm not very good at parsing kernel output. Maybe a developer will
have input.

-- 
Chris Murphy

^ permalink raw reply [flat|nested] 13+ messages in thread
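The syncfs-versus-fsync difference described above is easy to see
under strace. A small sketch, assuming coreutils sync (which calls
syncfs(2) when given -f and fsync(2) otherwise) and the /export/tmp
path from above:

$ strace -e trace=fsync,syncfs sync -f /export/tmp
$ strace -e trace=fsync,syncfs sync /export/tmp

The first invocation should show a single syncfs() call (a full
filesystem flush), the second an fsync() of just that path.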
* Re: btrfs freezing on writes
2020-04-07 22:11 ` Chris Murphy
@ 2020-04-07 22:30 ` kjansen387
0 siblings, 0 replies; 13+ messages in thread
From: kjansen387 @ 2020-04-07 22:30 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS, Qu Wenruo

Ah, true, sync -f does a syncfs. sync without -f on an existing file is
always fast. But when I strace vim, fsync is slow (sometimes >10
seconds) because it needs to write something.

Hopefully a developer can reply - thanks so far!

On 08-Apr-20 00:11, Chris Murphy wrote:
> On Tue, Apr 7, 2020 at 2:39 PM kjansen387 <kjansen387@gmail.com> wrote:
>
>> $ grep /export /etc/fstab
>> UUID=8ce9e167-57ea-4cf8-8678-3049ba028c12 /export btrfs
>> device=/dev/sde,device=/dev/sdf 0 2
>
> I'd use only noatime for options. There's no advantage to specifying
> devices. And also fs_passno can be 0. fsck.btrfs is a no-op anyway, so
> it doesn't hurt anything to leave it at 2.
>
>> I've attached sysstat info of my disks. What's obvious is that 2 disks
>> have the load (one is written to, the other one is the mirror), and 3
>> are pretty idle. But it's 2.4MB per second - that's not much!
>
> Lots of small file writes maybe? What does iostat show for utilization?
> Or vmstat for I/O? Hard drives of course have limited IO per second.
>
>> I've just changed the space_cache to v2, but it doesn't seem to help
>> much. 'sync -f /export/tmp' still takes very long at times (it just
>> took 22 seconds!)
>
> I just did a strace on this command and it uses syncfs, not fsync. I'm
> pretty sure on Btrfs this is a full filesystem sync, which is
> expensive: all data and metadata. So if it's very dirty, yeah, it could
> take some time to flush everything to disk.
>
> Try it without -f and you'll get fsync.
>
>> Is there any way we can find the cause before I move everything into
>> subvolumes? I'd like to avoid that if possible. It sounds a bit
>> overkill for 2.4MB/s of writes, and I think most of it is going to one
>> InfluxDB database anyway.
>
> From the sysrq...
>
> [937013.794093] mysqld D 0 10400 1 0x00004000
> [937013.794240] do_fsync+0x38/0x70
>
> [937013.794253] mysqld D 0 10412 1 0x00004000
> [937013.794297] do_fsync+0x38/0x70
>
> [937013.794306] mysqld D 0 10421 1 0x00004000
> [937013.794353] do_fsync+0x38/0x70
>
> [937013.794788] WTJourn.Flusher D 0 1186978 1186954 0x00004320
> [937013.794894] do_fsync+0x38/0x70
>
> [937013.794903] ftdc D 0 1186980 1186954 0x00000320
> [937013.794951] ? btrfs_create+0x200/0x200 [btrfs]
>
> [937013.795022] influxd D 0 2082782 1 0x00004000
> [937013.795086] do_fsync+0x38/0x70
>
> [937013.795098] influxd D 0 2082804 1 0x00004000
> [937013.795143] do_fsync+0x38/0x70
>
> [937013.795191] vim D 0 2228648 2223845 0x00004000
> [937013.795286] do_fsync+0x38/0x70
>
> Quite a lot of fsync all at once.
>
> But I'm not very good at parsing kernel output. Maybe a developer will
> have input.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: btrfs freezing on writes
2020-04-07 19:46 btrfs freezing on writes kjansen387
2020-04-07 20:11 ` Chris Murphy
@ 2020-04-09 4:32 ` Zygo Blaxell
2020-04-09 21:53 ` kjansen387
1 sibling, 1 reply; 13+ messages in thread
From: Zygo Blaxell @ 2020-04-09 4:32 UTC (permalink / raw)
To: kjansen387; +Cc: linux-btrfs

On Tue, Apr 07, 2020 at 09:46:29PM +0200, kjansen387 wrote:
> Hello,
>
> I'm using btrfs on Fedora 31 running kernel 5.5.10-200.fc31.x86_64.
>
> I've moved my workload from md raid5 to btrfs raid1.
> # btrfs filesystem show
> Label: none uuid: 8ce9e167-57ea-4cf8-8678-3049ba028c12
> Total devices 5 FS bytes used 3.73TiB
> devid 1 size 3.64TiB used 2.53TiB path /dev/sde
> devid 2 size 3.64TiB used 2.53TiB path /dev/sdf
> devid 3 size 1.82TiB used 902.00GiB path /dev/sdb
> devid 4 size 1.82TiB used 902.03GiB path /dev/sdc
> devid 5 size 1.82TiB used 904.03GiB path /dev/sdd
>
> # btrfs fi df /export
> Data, RAID1: total=3.85TiB, used=3.72TiB
> System, RAID1: total=32.00MiB, used=608.00KiB
> Metadata, RAID1: total=6.00GiB, used=5.16GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B

Try 'btrfs fi usage /export'. I'm guessing most or all of your metadata
is on sde and sdf, because that's where the raid1 allocator would
initially put it if you put those drives together in 'mkfs.btrfs'.
'fi usage' will confirm or refute that guess.

If this is true, then the following set of commands may help:

btrfs fi resize 1:-4g /export;
btrfs fi resize 2:-4g /export;
btrfs balance start -mdevid=1 /export;
btrfs fi resize 1:max /export;
btrfs fi resize 2:max /export;

Right now your filesystem has equal space unallocated on each drive.
The first resize commands create a temporary shortage of space on the
2 largest drives, which will cause the allocator to place new metadata
on the 3 smaller ones to equalize free space during the balance. The
last 2 resize commands let you use all the space again--they will be
filled in with later data allocations.

A 4G size reduction is required because you have 5 GB of metadata and
5 disks. 5 GB / 5 disks = 1 GB per disk, and 5 GB (total metadata on
disks 1,2) - 1 GB (desired metadata on each disk) = 4 GB (how much
metadata we want to displace from the bigger disks).

This will probably take 2-3 hours, and the system will be very laggy
while it runs. Best to leave it overnight.

I see that 'space_cache=v2' has already been suggested. That usually
helps a lot, especially on big filesystems. space_cache=v2 does an
amount of work proportional to committed data size during a commit,
while space_cache=v1 does an amount of work proportional to
*filesystem* size (and committed data size too, but that is usually
negligible if your workload is nothing but write and fsync).

> After moving to btrfs, I'm seeing freezes of ~10 seconds very, very often
> (multiple times per minute). MariaDB, vim, InfluxDB, etc.
>
> See the attachment for a stack trace of vim, and the dmesg output of 'echo w
> > /proc/sysrq-trigger', which also includes other hanging processes.
>
> What's going on? I hope someone can help.
>
> Thanks,
> Klaas

> [937013.794253] mysqld D 0 10412 1 0x00004000
> [937013.794254] Call Trace:
> [937013.794255] ? __schedule+0x2c7/0x740
> [937013.794256] schedule+0x4a/0xb0
> [937013.794264] wait_for_commit+0x58/0x80 [btrfs]
> [937013.794266] ? finish_wait+0x80/0x80
> [937013.794274] btrfs_commit_transaction+0x87b/0xa20 [btrfs]
> [937013.794276] ? dput+0xb6/0x2d0
> [937013.794286] ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
> [937013.794295] btrfs_sync_file+0x3b2/0x400 [btrfs]
> [937013.794297] do_fsync+0x38/0x70
> [937013.794298] __x64_sys_fsync+0x10/0x20
> [937013.794299] do_syscall_64+0x5b/0x1c0
> [937013.794300] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If fsync() can't do everything it wants to do with the (very fast) log
tree, it switches to a (not fast at all) full filesystem commit. When
that happens, the process calling fsync is stuck waiting for another
process to complete a transaction so it can start its own.

> [937013.795098] influxd D 0 2082804 1 0x00004000
> [937013.795099] Call Trace:
> [937013.795100] ? __schedule+0x2c7/0x740
> [937013.795102] schedule+0x4a/0xb0
> [937013.795110] wait_for_commit+0x58/0x80 [btrfs]
> [937013.795112] ? finish_wait+0x80/0x80
> [937013.795120] btrfs_commit_transaction+0x87b/0xa20 [btrfs]
> [937013.795121] ? dput+0xb6/0x2d0
> [937013.795132] ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
> [937013.795141] btrfs_sync_file+0x3b2/0x400 [btrfs]
> [937013.795143] do_fsync+0x38/0x70
> [937013.795144] __x64_sys_fsync+0x10/0x20
> [937013.795145] do_syscall_64+0x5b/0x1c0
> [937013.795146] entry_SYSCALL_64_after_hwframe+0x44/0xa9

This one started the transaction that mysqld's fsync is waiting for.

You also have this:

> [937013.794385] postmaster D 0 3216 2409 0x00004000
> [937013.794385] Call Trace:
> [937013.794387] ? __schedule+0x2c7/0x740
> [937013.794388] schedule+0x4a/0xb0
> [937013.794388] schedule_preempt_disabled+0xa/0x10
> [937013.794390] __mutex_lock.isra.0+0x16b/0x4d0
> [937013.794398] ? start_transaction+0xbb/0x4c0 [btrfs]
> [937013.794408] btrfs_pin_log_trans+0x19/0x30 [btrfs]
> [937013.794417] btrfs_rename2+0x25b/0x1f60 [btrfs]
> [937013.794419] ? link_path_walk.part.0+0x74/0x540
> [937013.794420] ? path_parentat.isra.0+0x3f/0x80
> [937013.794421] ? inode_permission+0xad/0x140
> [937013.794422] ? vfs_rename+0x3e1/0x9e0
> [937013.794430] ? btrfs_create+0x200/0x200 [btrfs]
> [937013.794431] vfs_rename+0x3e1/0x9e0
> [937013.794433] ? __d_lookup+0x5e/0x140
> [937013.794434] ? lookup_dcache+0x18/0x60
> [937013.794435] do_renameat2+0x381/0x530
> [937013.794437] __x64_sys_rename+0x1f/0x30
> [937013.794438] do_syscall_64+0x5b/0x1c0
> [937013.794439] entry_SYSCALL_64_after_hwframe+0x44/0xa9

I don't know if this is renaming over an existing file, but if you do
delete or truncate files (including renames over existing files) a lot,
you might be accumulating delayed refs (increases or decreases in
reference counts are queued up and performed in batches on btrfs
because there are so many). Try 'slabtop' and see if there are a lot
of 'btrfs_delayed_ref_head' slabs ("a lot" is more than 10,000).

It's not likely you will hit this problem unless you are deleting a lot
of files, deleting a snapshot, or running a balance; however, if you
are doing those things, they certainly won't help with commit latency.
If you're hitting that bug, there are fixes in development that may
help (probably one or two kernel releases out though).

^ permalink raw reply [flat|nested] 13+ messages in thread
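The delayed-ref count that slabtop reports can also be read straight
from /proc/slabinfo. A sketch, assuming root and that the kernel has
not merged the btrfs_delayed_ref_head slab cache into another one (in
which case the name won't appear):

# grep btrfs_delayed_ref_head /proc/slabinfo | awk '{print $1, $2, $3}'

The second and third fields are active and total object counts;
sustained values above the ~10,000 mentioned above would point at the
delayed-ref backlog described here.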
* Re: btrfs freezing on writes
2020-04-09 4:32 ` Zygo Blaxell
@ 2020-04-09 21:53 ` kjansen387
2020-04-09 23:07 ` Zygo Blaxell
0 siblings, 1 reply; 13+ messages in thread
From: kjansen387 @ 2020-04-09 21:53 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

Hi Zygo,

Thanks for your reply!

> Try 'btrfs fi usage /export'.

I'm removing /dev/sdd at the moment; my usage now (metadata usage was
the same a few weeks back):

# btrfs fi usage /export
Overall:
    Device size:                  10.92TiB
    Device allocated:              7.58TiB
    Device unallocated:            3.34TiB
    Device missing:                  0.00B
    Used:                          7.45TiB
    Free (estimated):              1.73TiB      (min: 1.73TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:3.78TiB, Used:3.72TiB (98.35%)
    /dev/sde        2.53TiB
    /dev/sdf        2.52TiB
    /dev/sdb      869.00GiB
    /dev/sdc      868.00GiB
    /dev/sdd      838.00GiB

Metadata,RAID1: Size:6.00GiB, Used:5.18GiB (86.40%)
    /dev/sde        4.00GiB
    /dev/sdf        5.00GiB
    /dev/sdb        1.00GiB
    /dev/sdc        1.00GiB
    /dev/sdd        1.00GiB

System,RAID1: Size:32.00MiB, Used:592.00KiB (1.81%)
    /dev/sdc       32.00MiB
    /dev/sdd       32.00MiB

Unallocated:
    /dev/sde        1.11TiB
    /dev/sdf        1.11TiB
    /dev/sdb      993.02GiB
    /dev/sdc      993.99GiB
    /dev/sdd     -839.03GiB

What this suggests to me is that I don't have 5GB of metadata - I have
12GB?

> btrfs fi resize 1:-4g /export;
> btrfs fi resize 2:-4g /export;
> btrfs balance start -mdevid=1 /export;
> btrfs fi resize 1:max /export;
> btrfs fi resize 2:max /export;

I'm moving from 5 to 4 disks for the time being; assuming metadata
stays the same, I guess I'd have to aim for 3GB of metadata per disk.
Do I have to change the commands like this?

btrfs fi resize 1:-1g /export; # Assuming 4GB metadata
btrfs fi resize 2:-2g /export; # Assuming 5GB metadata
btrfs balance start -mdevid=1 /export; # Why only devid 1, and not 2?
btrfs fi resize 1:max /export;
btrfs fi resize 2:max /export;

Thanks!

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: btrfs freezing on writes 2020-04-09 21:53 ` kjansen387 @ 2020-04-09 23:07 ` Zygo Blaxell 2020-04-11 19:46 ` kjansen387 0 siblings, 1 reply; 13+ messages in thread From: Zygo Blaxell @ 2020-04-09 23:07 UTC (permalink / raw) To: kjansen387; +Cc: linux-btrfs On Thu, Apr 09, 2020 at 11:53:00PM +0200, kjansen387 wrote: > Hi Zygo, > > Thanks for your reply! > > > Try 'btrfs fi usage /export'. > I'm removing /dev/sdd at the moment, my usage now (metadata usage was the > same a few weeks back): > > # btrfs fi usage /export > Overall: > Device size: 10.92TiB > Device allocated: 7.58TiB > Device unallocated: 3.34TiB > Device missing: 0.00B > Used: 7.45TiB > Free (estimated): 1.73TiB (min: 1.73TiB) > Data ratio: 2.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB (used: 0.00B) > > Data,RAID1: Size:3.78TiB, Used:3.72TiB (98.35%) > /dev/sde 2.53TiB > /dev/sdf 2.52TiB > /dev/sdb 869.00GiB > /dev/sdc 868.00GiB > /dev/sdd 838.00GiB > > Metadata,RAID1: Size:6.00GiB, Used:5.18GiB (86.40%) > /dev/sde 4.00GiB > /dev/sdf 5.00GiB > /dev/sdb 1.00GiB > /dev/sdc 1.00GiB > /dev/sdd 1.00GiB > > System,RAID1: Size:32.00MiB, Used:592.00KiB (1.81%) > /dev/sdc 32.00MiB > /dev/sdd 32.00MiB > > Unallocated: > /dev/sde 1.11TiB > /dev/sdf 1.11TiB > /dev/sdb 993.02GiB > /dev/sdc 993.99GiB > /dev/sdd -839.03GiB > > > What this suggests to me is that I don't have 5GB metadata - I have 12GB ? It's RAID1, so there are 2 sets of 6x 1 GB block groups. Each block group consists of a mirrored pair of 1GB chunks on 2 different disks. They are not all on sde and sdf as I had guessed before. The worst case would be that 4GB are mirrored between sde and sdf, then 1GB between sdc and sdd, and 1GB between sdf and sdb (i.e. sdf has 4GB mirrored with sde and 1GB mirrored with sdb for a total of 5GB on sdf), but certainly other combinations are possible to arrive at those numbers. "Metadata,RAID1: Size:6.00GiB, Used:5.18GiB" is referring to logical sizes--you have 6 logical GB allocted for 5.18 logical GB of metadata--while the lines that follow are referring to physical sizes on each disk (a total of 12 GB allocated on all disks). You'll notice system and data follow the same pattern. > > btrfs fi resize 1:-4g /export; > > btrfs fi resize 2:-4g /export; > > btrfs balance start -mdevid=1 /export; > > btrfs fi resize 1:max /export; > > btrfs fi resize 2:max /export; > > I'm moving from 5 to 4 disks for the time being, assuming metadata stays the > same I guess I'd have to aim for 3GB metadata per disk. Do I have to change > the commands like this ? > > btrfs fi resize 1:-1g /export; # Assuming 4GB metadata > btrfs fi resize 2:-2g /export; # Assuming 5GB metadata Based on current data, yes; however, it's possible that the device remove you are already running might balance the metadata as a side-effect. Redo the math with the values you get after the device remove is done. You may not need to balance anything. > btrfs balance start -mdevid=1 /export; # Why only devid 1, and not 2 ? We want balance to relocate metadata block groups that are on both devids 1 and 2, i.e. the BG has a chunk on both drives at the same time. Balance filters only allow one devid to be specified, but in this case 'devid=1' or 'devid=2' is close enough. All we want to do here is filter out block groups where one mirror chunk is already on devid 3, 4, or 5, since that would just place the metadata somewhere else on the same disks. > btrfs fi resize 1:max /export; > btrfs fi resize 2:max /export; > > Thanks! 
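To make the arithmetic explicit, using only the numbers from the usage
output above: the profile header reports logical sizes, and multiplying
by the ratio gives the physical total spread across the devices.

    logical allocated:   6.00GiB   (the Metadata,RAID1 "Size" value)
    metadata ratio:      2.00
    physical allocated:  6.00GiB x 2.00 = 12.00GiB

which matches the per-device sum: 4 + 5 + 1 + 1 + 1 = 12GiB.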
* Re: btrfs freezing on writes
  2020-04-09 23:07 ` Zygo Blaxell
@ 2020-04-11 19:46 ` kjansen387
  2020-04-11 19:59 ` Chris Murphy
  2020-04-11 20:21 ` Zygo Blaxell
  0 siblings, 2 replies; 13+ messages in thread
From: kjansen387 @ 2020-04-11 19:46 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

I have tried to rebalance the metadata.

Starting point:

# btrfs fi usage /storage
Overall:
    Device size:                  10.92TiB
    Device allocated:              7.45TiB
    Device unallocated:            3.47TiB
    Device missing:                  0.00B
    Used:                          7.35TiB
    Free (estimated):              1.78TiB      (min: 1.78TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:3.72TiB, Used:3.67TiB (98.74%)
   /dev/sdc        2.81TiB
   /dev/sdb        2.81TiB
   /dev/sda     1017.00GiB
   /dev/sdd      840.00GiB

Metadata,RAID1: Size:6.00GiB, Used:5.09GiB (84.86%)
   /dev/sdc        3.00GiB
   /dev/sdb        3.00GiB
   /dev/sda        1.00GiB
   /dev/sdd        5.00GiB

System,RAID1: Size:32.00MiB, Used:608.00KiB (1.86%)
   /dev/sdb       32.00MiB
   /dev/sdd       32.00MiB

Unallocated:
   /dev/sdc      845.02GiB
   /dev/sdb      845.99GiB
   /dev/sda      845.02GiB
   /dev/sdd     1017.99GiB

I did:

# btrfs fi resize 4:-2g /storage/
# btrfs balance start -mdevid=4 /storage
# btrfs fi resize 4:max /storage/

but the distribution of metadata ended up the same as before.

I also tried (to match the free space of the other disks):

# btrfs fi resize 4:-172g /storage/
# btrfs balance start -mdevid=4 /storage
# btrfs fi resize 4:max /storage/

Again, the distribution of metadata ended up the same as before.

Any other tips to rebalance metadata?

On 10-Apr-20 01:07, Zygo Blaxell wrote:
> On Thu, Apr 09, 2020 at 11:53:00PM +0200, kjansen387 wrote:
>> btrfs fi resize 1:-1g /export;  # Assuming 4GB metadata
>> btrfs fi resize 2:-2g /export;  # Assuming 5GB metadata
>
> Based on current data, yes; however, it's possible that the device
> remove you are already running will balance the metadata as a side
> effect. Redo the math with the values you get after the device remove
> is done. You may not need to balance anything.
>
>> btrfs balance start -mdevid=1 /export;  # Why only devid 1, and not 2?
>
> We want balance to relocate metadata block groups that are on both
> devids 1 and 2, i.e. where the block group has a chunk on both drives
> at the same time. Balance filters only allow one devid to be specified,
> but in this case 'devid=1' or 'devid=2' is close enough. All we want to
> do here is filter out block groups where one mirror chunk is already on
> devid 3, 4, or 5, since balancing those would just place the metadata
> somewhere else on the same disks.
>
>> btrfs fi resize 1:max /export;
>> btrfs fi resize 2:max /export;
>>
>> Thanks!
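One way to verify where the metadata chunks actually landed after each
attempt is the per-device report - a sketch using standard btrfs-progs
subcommands, which break allocation down by device and by profile:

# Per-device breakdown (Data/Metadata/System allocation on each disk)
btrfs device usage /storage

# Filesystem-wide view, as used above
btrfs fi usage /storage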
* Re: btrfs freezing on writes
  2020-04-11 19:46 ` kjansen387
@ 2020-04-11 19:59 ` Chris Murphy
  2020-04-11 20:21 ` Zygo Blaxell
  1 sibling, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2020-04-11 19:59 UTC (permalink / raw)
To: kjansen387; +Cc: Zygo Blaxell, Btrfs BTRFS

I don't know how much the size of the metadata chunks matters with
this, but I see 256M, 512M, and 1G metadata chunks on my file systems.

Try this for the length only, which is in bytes:

$ sudo btrfs insp dump-t -t chunk /dev/nvme0n1p7 | grep METADATA

This will show the entire item; -A5 might be needed to show both
dev_uuids, since you're using raid1 and might want to know on which
device each chunk is located (the devuuid is found in the superblock):

$ sudo btrfs insp dump-t -t chunk /dev/nvme0n1p7 | grep -A4 -B1 METADATA

Chris Murphy
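Building on Chris's command, the chunk lengths can also be totalled per
profile with a short pipeline - a sketch, assuming the dump-tree output
format of recent btrfs-progs, where each chunk's 'length' and 'type'
fields appear on the same line:

# Total logical size of all metadata chunks, in GiB
$ sudo btrfs insp dump-t -t chunk /dev/nvme0n1p7 | \
      awk '/type METADATA/ { sum += $2 } END { print sum / 2^30, "GiB" }'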
* Re: btrfs freezing on writes
  2020-04-11 19:46 ` kjansen387
  2020-04-11 19:59 ` Chris Murphy
@ 2020-04-11 20:21 ` Zygo Blaxell
  2020-04-15 11:14 ` kjansen387
  1 sibling, 1 reply; 13+ messages in thread
From: Zygo Blaxell @ 2020-04-11 20:21 UTC (permalink / raw)
To: kjansen387; +Cc: linux-btrfs

On Sat, Apr 11, 2020 at 09:46:43PM +0200, kjansen387 wrote:
> I have tried to rebalance the metadata.
>
> Starting point:
>
> # btrfs fi usage /storage
> Overall:
>     Device size:                  10.92TiB
>     Device allocated:              7.45TiB
>     Device unallocated:            3.47TiB
>     Device missing:                  0.00B
>     Used:                          7.35TiB
>     Free (estimated):              1.78TiB      (min: 1.78TiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,RAID1: Size:3.72TiB, Used:3.67TiB (98.74%)
>    /dev/sdc        2.81TiB
>    /dev/sdb        2.81TiB
>    /dev/sda     1017.00GiB
>    /dev/sdd      840.00GiB
>
> Metadata,RAID1: Size:6.00GiB, Used:5.09GiB (84.86%)
>    /dev/sdc        3.00GiB
>    /dev/sdb        3.00GiB
>    /dev/sda        1.00GiB
>    /dev/sdd        5.00GiB
>
> System,RAID1: Size:32.00MiB, Used:608.00KiB (1.86%)
>    /dev/sdb       32.00MiB
>    /dev/sdd       32.00MiB
>
> Unallocated:
>    /dev/sdc      845.02GiB
>    /dev/sdb      845.99GiB
>    /dev/sda      845.02GiB
>    /dev/sdd     1017.99GiB
>
> I did:
>
> # btrfs fi resize 4:-2g /storage/
> # btrfs balance start -mdevid=4 /storage
> # btrfs fi resize 4:max /storage/
>
> but the distribution of metadata ended up the same as before.
>
> I also tried (to match the free space of the other disks):
>
> # btrfs fi resize 4:-172g /storage/
> # btrfs balance start -mdevid=4 /storage
> # btrfs fi resize 4:max /storage/
>
> Again, the distribution of metadata ended up the same as before.
>
> Any other tips to rebalance metadata?

The purpose of resize -2g was to make a little less unallocated space
on one drive compared to all the others, starting with all the drives
having equal unallocated space. The purpose of resize -172g was to
make the extra unallocated space on sdd go away, so it would be equal
to the other 3 drives. You have to do _both_ of those before the
balance. Or just add the two numbers, i.e. resize -174g.

If you really want to be sure, resize by -200g (far more than
necessary), then balance start -mlimit=4,devid=4. The balance says
"I know there are exactly 5 block groups now, and I want to leave
exactly one behind," and the resize says "I want no possibility of new
block groups on sdd for some time."

> On 10-Apr-20 01:07, Zygo Blaxell wrote:
> > On Thu, Apr 09, 2020 at 11:53:00PM +0200, kjansen387 wrote:
> > > btrfs fi resize 1:-1g /export;  # Assuming 4GB metadata
> > > btrfs fi resize 2:-2g /export;  # Assuming 5GB metadata
> >
> > Based on current data, yes; however, it's possible that the device
> > remove you are already running will balance the metadata as a side
> > effect. Redo the math with the values you get after the device remove
> > is done. You may not need to balance anything.
> >
> > > btrfs balance start -mdevid=1 /export;  # Why only devid 1, and not 2?
> >
> > We want balance to relocate metadata block groups that are on both
> > devids 1 and 2, i.e. where the block group has a chunk on both drives
> > at the same time. Balance filters only allow one devid to be specified,
> > but in this case 'devid=1' or 'devid=2' is close enough. All we want to
> > do here is filter out block groups where one mirror chunk is already on
> > devid 3, 4, or 5, since balancing those would just place the metadata
> > somewhere else on the same disks.
> >
> > > btrfs fi resize 1:max /export;
> > > btrfs fi resize 2:max /export;
> > >
> > > Thanks!
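Spelled out as commands, the cautious variant Zygo describes would look
like this for devid 4 (sdd) on this filesystem - a sketch assembled
from his description, with the deliberately generous -200g shrink:

# 1. Shrink sdd far below the other drives' unallocated space, so the
#    allocator cannot place any new chunk on it during the balance
btrfs fi resize 4:-200g /storage

# 2. Relocate at most 4 of the 5 metadata block groups touching devid 4;
#    their replacement chunks must land on the other three drives
btrfs balance start -mlimit=4,devid=4 /storage

# 3. Grow sdd back to its full size
btrfs fi resize 4:max /storage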
* Re: btrfs freezing on writes
  2020-04-11 20:21 ` Zygo Blaxell
@ 2020-04-15 11:14 ` kjansen387
  0 siblings, 0 replies; 13+ messages in thread
From: kjansen387 @ 2020-04-15 11:14 UTC (permalink / raw)
To: Zygo Blaxell; +Cc: linux-btrfs

OK, thanks. I did try 2g on top of 172g and that didn't work, but after
playing with 'limit' it did; all disks now have 3GB of metadata.

On 11-Apr-20 22:21, Zygo Blaxell wrote:
> The purpose of resize -2g was to make a little less unallocated space
> on one drive compared to all the others, starting with all the drives
> having equal unallocated space. The purpose of resize -172g was to
> make the extra unallocated space on sdd go away, so it would be equal
> to the other 3 drives. You have to do _both_ of those before the
> balance. Or just add the two numbers, i.e. resize -174g.
>
> If you really want to be sure, resize by -200g (far more than
> necessary), then balance start -mlimit=4,devid=4. The balance says
> "I know there are exactly 5 block groups now, and I want to leave
> exactly one behind," and the resize says "I want no possibility of new
> block groups on sdd for some time."
end of thread, other threads:[~2020-04-15 11:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-07 19:46 btrfs freezing on writes kjansen387
2020-04-07 20:11 ` Chris Murphy
2020-04-07 20:22   ` Chris Murphy
2020-04-07 20:39     ` kjansen387
2020-04-07 22:11       ` Chris Murphy
2020-04-07 22:30         ` kjansen387
2020-04-09  4:32 ` Zygo Blaxell
2020-04-09 21:53   ` kjansen387
2020-04-09 23:07     ` Zygo Blaxell
2020-04-11 19:46       ` kjansen387
2020-04-11 19:59         ` Chris Murphy
2020-04-11 20:21         ` Zygo Blaxell
2020-04-15 11:14           ` kjansen387