linux-fsdevel.vger.kernel.org archive mirror
* btrfs induced data loss (on xfs) - 5.19.0-38-generic
@ 2023-04-17  5:20 Kyle Sanderson
  2023-04-17  6:01 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Kyle Sanderson @ 2023-04-17  5:20 UTC (permalink / raw)
  To: linux-btrfs
  Cc: linux-kernel, linux-block, linux-fsdevel, Linus Torvalds, Greg KH

The single btrfs disk was at 100% utilization with an iowait ("wa") of
~50, reading back at around 2MB/s. df and similar commands would simply
freeze. Leading up to this I had removed around 2T of data from a single
btrfs disk. I managed to get most of the services shut down and the
disks unmounted, but when the system came back up I had to use
xfs_repair (for the first time in a very long time) to boot into my
system. I likely should have just pulled the power...
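
For context, these figures are the sort reported by the standard tools
(the exact invocations here are illustrative, not necessarily what I ran):

  # iostat -x 1    # %util column pegged at 100, reads around 2MB/s
  # top            # "wa" in the %Cpu(s) line hovering around 50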

[1147997.255020] INFO: task happywriter:3425205 blocked for more than
120 seconds.
[1147997.255088]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1147997.255114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1147997.255144] task:happywriter state:D stack:    0 pid:3425205
ppid:557021 flags:0x00004000
[1147997.255151] Call Trace:
[1147997.255155]  <TASK>
[1147997.255160]  __schedule+0x257/0x5d0
[1147997.255169]  ? __schedule+0x25f/0x5d0
[1147997.255173]  schedule+0x68/0x110
[1147997.255176]  rwsem_down_write_slowpath+0x2ee/0x5a0
[1147997.255180]  ? check_heap_object+0x100/0x1e0
[1147997.255185]  down_write+0x4f/0x70
[1147997.255189]  do_unlinkat+0x12b/0x2d0
[1147997.255194]  __x64_sys_unlink+0x42/0x70
[1147997.255197]  ? syscall_exit_to_user_mode+0x2a/0x50
[1147997.255201]  do_syscall_64+0x59/0x90
[1147997.255204]  ? do_syscall_64+0x69/0x90
[1147997.255207]  ? sysvec_call_function_single+0x4e/0xb0
[1147997.255211]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[1147997.255216] RIP: 0033:0x1202a57
[1147997.255220] RSP: 002b:00007fe467ffd4c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000057
[1147997.255224] RAX: ffffffffffffffda RBX: 00007fe3a4e94d28 RCX:
0000000001202a57
[1147997.255226] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
00007fe450054b60
[1147997.255228] RBP: 00007fe450054b60 R08: 0000000000000000 R09:
00000000000000e8
[1147997.255231] R10: 0000000000000001 R11: 0000000000000246 R12:
00007fe467ffd5b0
[1147997.255233] R13: 00007fe467ffd5f0 R14: 00007fe467ffd5e0 R15:
00007fe3a4e94d28
[1147997.255239]  </TASK>
[1148118.087966] INFO: task happywriter:3425205 blocked for more than
241 seconds.
[1148118.088022]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1148118.088048] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1148118.088077] task:happywriter state:D stack:    0 pid:3425205
ppid:557021 flags:0x00004000
[1148118.088083] Call Trace:
[1148118.088087]  <TASK>
[1148118.088093]  __schedule+0x257/0x5d0
[1148118.088101]  ? __schedule+0x25f/0x5d0
[1148118.088105]  schedule+0x68/0x110
[1148118.088108]  rwsem_down_write_slowpath+0x2ee/0x5a0
[1148118.088113]  ? check_heap_object+0x100/0x1e0
[1148118.088118]  down_write+0x4f/0x70
[1148118.088121]  do_unlinkat+0x12b/0x2d0
[1148118.088126]  __x64_sys_unlink+0x42/0x70
[1148118.088129]  ? syscall_exit_to_user_mode+0x2a/0x50
[1148118.088133]  do_syscall_64+0x59/0x90
[1148118.088136]  ? do_syscall_64+0x69/0x90
[1148118.088139]  ? sysvec_call_function_single+0x4e/0xb0
[1148118.088142]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[1148118.088148] RIP: 0033:0x1202a57
[1148118.088151] RSP: 002b:00007fe467ffd4c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000057
[1148118.088155] RAX: ffffffffffffffda RBX: 00007fe3a4e94d28 RCX:
0000000001202a57
[1148118.088158] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
00007fe450054b60
[1148118.088160] RBP: 00007fe450054b60 R08: 0000000000000000 R09:
00000000000000e8
[1148118.088162] R10: 0000000000000001 R11: 0000000000000246 R12:
00007fe467ffd5b0
[1148118.088164] R13: 00007fe467ffd5f0 R14: 00007fe467ffd5e0 R15:
00007fe3a4e94d28
[1148118.088170]  </TASK>
[1148238.912688] INFO: task kcompactd0:70 blocked for more than 120 seconds.
[1148238.912741]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1148238.912767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1148238.912796] task:kcompactd0      state:D stack:    0 pid:   70
ppid:     2 flags:0x00004000
[1148238.912803] Call Trace:
[1148238.912806]  <TASK>
[1148238.912812]  __schedule+0x257/0x5d0
[1148238.912821]  schedule+0x68/0x110
[1148238.912824]  io_schedule+0x46/0x80
[1148238.912827]  folio_wait_bit_common+0x14c/0x3a0
[1148238.912833]  ? filemap_invalidate_unlock_two+0x50/0x50
[1148238.912838]  __folio_lock+0x17/0x30
[1148238.912842]  __unmap_and_move.constprop.0+0x39c/0x640
[1148238.912847]  ? free_unref_page+0xe3/0x190
[1148238.912852]  ? move_freelist_tail+0xe0/0xe0
[1148238.912855]  ? move_freelist_tail+0xe0/0xe0
[1148238.912858]  unmap_and_move+0x7d/0x4e0
[1148238.912862]  migrate_pages+0x3b8/0x770
[1148238.912867]  ? move_freelist_tail+0xe0/0xe0
[1148238.912870]  ? isolate_freepages+0x2f0/0x2f0
[1148238.912874]  compact_zone+0x2ca/0x620
[1148238.912878]  proactive_compact_node+0x8a/0xe0
[1148238.912883]  kcompactd+0x21c/0x4e0
[1148238.912886]  ? destroy_sched_domains_rcu+0x40/0x40
[1148238.912892]  ? kcompactd_do_work+0x240/0x240
[1148238.912896]  kthread+0xeb/0x120
[1148238.912900]  ? kthread_complete_and_exit+0x20/0x20
[1148238.912904]  ret_from_fork+0x1f/0x30
[1148238.912911]  </TASK>
[1148238.913163] INFO: task happywriter:3425205 blocked for more than
362 seconds.
[1148238.913195]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1148238.913220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1148238.913250] task:happywriter state:D stack:    0 pid:3425205
ppid:557021 flags:0x00004000
[1148238.913254] Call Trace:
[1148238.913256]  <TASK>
[1148238.913259]  __schedule+0x257/0x5d0
[1148238.913264]  ? __schedule+0x25f/0x5d0
[1148238.913267]  schedule+0x68/0x110
[1148238.913270]  rwsem_down_write_slowpath+0x2ee/0x5a0
[1148238.913276]  ? check_heap_object+0x100/0x1e0
[1148238.913280]  down_write+0x4f/0x70
[1148238.913284]  do_unlinkat+0x12b/0x2d0
[1148238.913288]  __x64_sys_unlink+0x42/0x70
[1148238.913292]  ? syscall_exit_to_user_mode+0x2a/0x50
[1148238.913296]  do_syscall_64+0x59/0x90
[1148238.913299]  ? do_syscall_64+0x69/0x90
[1148238.913301]  ? sysvec_call_function_single+0x4e/0xb0
[1148238.913305]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[1148238.913310] RIP: 0033:0x1202a57
[1148238.913315] RSP: 002b:00007fe467ffd4c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000057
[1148238.913319] RAX: ffffffffffffffda RBX: 00007fe3a4e94d28 RCX:
0000000001202a57
[1148238.913321] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
00007fe450054b60
[1148238.913323] RBP: 00007fe450054b60 R08: 0000000000000000 R09:
00000000000000e8
[1148238.913325] R10: 0000000000000001 R11: 0000000000000246 R12:
00007fe467ffd5b0
[1148238.913328] R13: 00007fe467ffd5f0 R14: 00007fe467ffd5e0 R15:
00007fe3a4e94d28
[1148238.913333]  </TASK>
[1148238.913429] INFO: task kworker/u16:20:3402199 blocked for more
than 120 seconds.
[1148238.913459]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1148238.913496] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1148238.913527] task:kworker/u16:20  state:D stack:    0 pid:3402199
ppid:     2 flags:0x00004000
[1148238.913533] Workqueue: events_unbound io_ring_exit_work
[1148238.913539] Call Trace:
[1148238.913541]  <TASK>
[1148238.913544]  __schedule+0x257/0x5d0
[1148238.913548]  schedule+0x68/0x110
[1148238.913551]  schedule_timeout+0x122/0x160
[1148238.913556]  __wait_for_common+0x8f/0x190
[1148238.913559]  ? usleep_range_state+0xa0/0xa0
[1148238.913564]  wait_for_completion+0x24/0x40
[1148238.913567]  io_ring_exit_work+0x186/0x1e9
[1148238.913571]  ? io_uring_del_tctx_node+0xbf/0xbf
[1148238.913576]  process_one_work+0x21c/0x400
[1148238.913579]  worker_thread+0x50/0x3f0
[1148238.913583]  ? rescuer_thread+0x3a0/0x3a0
[1148238.913586]  kthread+0xeb/0x120
[1148238.913590]  ? kthread_complete_and_exit+0x20/0x20
[1148238.913594]  ret_from_fork+0x1f/0x30
[1148238.913600]  </TASK>
[1148238.913607] INFO: task stat:3434604 blocked for more than 120 seconds.
[1148238.913633]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[1148238.913658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1148238.913687] task:stat            state:D stack:    0 pid:3434604
ppid:3396625 flags:0x00004004
[1148238.913691] Call Trace:
[1148238.913693]  <TASK>
[1148238.913695]  __schedule+0x257/0x5d0
[1148238.913699]  schedule+0x68/0x110
[1148238.913703]  request_wait_answer+0x13f/0x220
[1148238.913708]  ? destroy_sched_domains_rcu+0x40/0x40
[1148238.913713]  fuse_simple_request+0x1bb/0x370
[1148238.913717]  fuse_do_getattr+0xda/0x320
[1148238.913720]  ? try_to_unlazy+0x5b/0xd0
[1148238.913726]  fuse_update_get_attr+0xb3/0xf0
[1148238.913730]  fuse_getattr+0x87/0xd0
[1148238.913733]  vfs_getattr_nosec+0xba/0x100
[1148238.913737]  vfs_statx+0xa9/0x140
[1148238.913741]  vfs_fstatat+0x59/0x80
[1148238.913744]  __do_sys_newlstat+0x38/0x80
[1148238.913750]  __x64_sys_newlstat+0x16/0x20
[1148238.913753]  do_syscall_64+0x59/0x90
[1148238.913757]  ? handle_mm_fault+0xba/0x2a0
[1148238.913761]  ? exit_to_user_mode_prepare+0xaf/0xd0
[1148238.913765]  ? irqentry_exit_to_user_mode+0x9/0x20
[1148238.913769]  ? irqentry_exit+0x43/0x50
[1148238.913772]  ? exc_page_fault+0x92/0x1b0
[1148238.913776]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[1148238.913780] RIP: 0033:0x7fa76960d6ea
[1148238.913783] RSP: 002b:00007ffe6d011878 EFLAGS: 00000246 ORIG_RAX:
0000000000000006
[1148238.913787] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007fa76960d6ea
[1148238.913789] RDX: 00007ffe6d011890 RSI: 00007ffe6d011890 RDI:
00007ffe6d01290b
[1148238.913791] RBP: 00007ffe6d011a70 R08: 0000000000000001 R09:
0000000000000000
[1148238.913793] R10: fffffffffffff3ce R11: 0000000000000246 R12:
0000000000000000
[1148238.913795] R13: 000055feff0d2960 R14: 00007ffe6d011a68 R15:
00007ffe6d01290b
[1148238.913800]  </TASK>

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-17  5:20 btrfs induced data loss (on xfs) - 5.19.0-38-generic Kyle Sanderson
@ 2023-04-17  6:01 ` Greg KH
  2023-04-17  6:07   ` Kyle Sanderson
  2023-04-17  8:42 ` Qu Wenruo
  2023-04-19  1:55 ` Dave Chinner
  2 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2023-04-17  6:01 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: linux-btrfs, linux-kernel, linux-block, linux-fsdevel,
	Linus Torvalds

On Sun, Apr 16, 2023 at 10:20:45PM -0700, Kyle Sanderson wrote:
> The single btrfs disk was at 100% utilization with an iowait ("wa") of
> ~50, reading back at around 2MB/s. df and similar commands would simply
> freeze. Leading up to this I had removed around 2T of data from a single
> btrfs disk. I managed to get most of the services shut down and the
> disks unmounted, but when the system came back up I had to use
> xfs_repair (for the first time in a very long time) to boot into my
> system. I likely should have just pulled the power...
> 
> [1147997.255020] INFO: task happywriter:3425205 blocked for more than
> 120 seconds.
> [1147997.255088]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu

This is a distro-specific kernel, sorry, nothing to do with our releases,
as the 5.19 kernel branch has long been end-of-life.  Please work with
your distro on this issue if you wish to stick to this kernel version.

good luck!

greg k-h

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-17  6:01 ` Greg KH
@ 2023-04-17  6:07   ` Kyle Sanderson
  0 siblings, 0 replies; 9+ messages in thread
From: Kyle Sanderson @ 2023-04-17  6:07 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-btrfs, linux-kernel, linux-block, linux-fsdevel,
	Linus Torvalds

On Sun, Apr 16, 2023 at 11:01 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sun, Apr 16, 2023 at 10:20:45PM -0700, Kyle Sanderson wrote:
> > The single btrfs disk was at 100% utilization with an iowait ("wa") of
> > ~50, reading back at around 2MB/s. df and similar commands would simply
> > freeze. Leading up to this I had removed around 2T of data from a single
> > btrfs disk. I managed to get most of the services shut down and the
> > disks unmounted, but when the system came back up I had to use
> > xfs_repair (for the first time in a very long time) to boot into my
> > system. I likely should have just pulled the power...
> >
> > [1147997.255020] INFO: task happywriter:3425205 blocked for more than
> > 120 seconds.
> > [1147997.255088]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
>
> This is a distro-specific kernel, sorry, nothing to do with our releases,
> as the 5.19 kernel branch has long been end-of-life.  Please work with
> your distro on this issue if you wish to stick to this kernel version.
>
> good luck!
>
> greg k-h

Disappointing but fair (this is the default kernel for Ubuntu Jammy;
unfortunately they offer no LTS option) - thanks for taking a look anyway.

K.

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-17  5:20 btrfs induced data loss (on xfs) - 5.19.0-38-generic Kyle Sanderson
  2023-04-17  6:01 ` Greg KH
@ 2023-04-17  8:42 ` Qu Wenruo
  2023-04-18  7:07   ` Kyle Sanderson
  2023-04-19  1:55 ` Dave Chinner
  2 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2023-04-17  8:42 UTC (permalink / raw)
  To: Kyle Sanderson, linux-btrfs
  Cc: linux-kernel, linux-block, linux-fsdevel, Linus Torvalds, Greg KH



On 2023/4/17 13:20, Kyle Sanderson wrote:
> The single btrfs disk was at 100% utilization with an iowait ("wa") of
> ~50, reading back at around 2MB/s. df and similar commands would simply
> freeze. Leading up to this I had removed around 2T of data from a single
> btrfs disk. I managed to get most of the services shut down and the
> disks unmounted, but when the system came back up I had to use
> xfs_repair (for the first time in a very long time) to boot into my
> system. I likely should have just pulled the power...

I didn't see any obvious btrfs-related work involved in the call traces,
so I don't believe btrfs is really what these tasks are waiting on.

As for the "100% utilization", did you mean that a single CPU core is
fully utilized by some btrfs thread?

If so, it looks like qgroup is involved. Did you recently delete one or
more snapshots? If so, it's almost certainly qgroup causing the 100% CPU
utilization.

In that case, you can disable quota for that btrfs filesystem
(permanently), or use a newer kernel, which has a sysfs interface:

   /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold

If you write a small value like 3 into that file, btrfs will
automatically avoid this kind of long CPU usage caused by dropping
large snapshots.
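
Roughly, either option looks like this (a sketch only; <mnt> is a
placeholder for your mount point, and double-check the UUID):

   # btrfs quota disable <mnt>
or, on a newer kernel:
   # UUID=$(findmnt -no UUID <mnt>)
   # echo 3 > /sys/fs/btrfs/$UUID/qgroups/drop_subtree_threshold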

As for the XFS corruption, though, I'm not sure how btrfs could lead to
problems in XFS...

Thanks,
Qu

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-17  8:42 ` Qu Wenruo
@ 2023-04-18  7:07   ` Kyle Sanderson
  2023-04-18  7:18     ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Kyle Sanderson @ 2023-04-18  7:07 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: linux-btrfs, linux-kernel, linux-block, linux-fsdevel,
	Linus Torvalds, Greg KH

On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> As for the "100% utilization", did you mean that a single CPU core is
> fully utilized by some btrfs thread?

No, iostat is reporting the disk at "100%" utilization.

> If so, it looks like qgroup is involved. Did you recently delete one or
> more snapshots? If so, it's almost certainly qgroup causing the 100% CPU
> utilization.

No, I used to run snapraid-btrfs (which uses snapper), but both projects
are effectively orphaned. There are no snapshots left. I also used to
run zygo/bees on these disks, but that has basic problems too
(specifically mlock'ing GBs of data while sitting completely idle for
hours at a time). bees was not running when this happened (or at any
point during the machine's uptime).

> In that case, you can disable quota for that btrfs filesystem
> (permanently), or use a newer kernel, which has a sysfs interface:
>
>    /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold

Thank you, I'll give it a go after the reboot here. I don't think I
have quotas enabled, but I did look at them at one point (I remember
toggling them on and then off).

> If you write a small value like 3 into that file, btrfs will
> automatically avoid this kind of long CPU usage caused by dropping
> large snapshots.
>
> As for the XFS corruption, though, I'm not sure how btrfs could lead to
> problems in XFS...

All I/O seemed to be dead on the box.

I tried to remove the same data again and the task hung... this time I
was able to interrupt the removal and the box recovered.

[92798.210656] INFO: task sync:1282043 blocked for more than 120 seconds.
[92798.210683]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
[92798.210707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[92798.210735] task:sync            state:D stack:    0 pid:1282043
ppid:1281934 flags:0x00004002
[92798.210739] Call Trace:
[92798.210741]  <TASK>
[92798.210743]  __schedule+0x257/0x5d0
[92798.210747]  schedule+0x68/0x110
[92798.210750]  wait_current_trans+0xde/0x140 [btrfs]
[92798.210820]  ? destroy_sched_domains_rcu+0x40/0x40
[92798.210824]  start_transaction+0x34d/0x5f0 [btrfs]
[92798.210895]  btrfs_attach_transaction_barrier+0x23/0x70 [btrfs]
[92798.210966]  btrfs_sync_fs+0x4a/0x1c0 [btrfs]
[92798.211029]  ? vfs_fsync_range+0xa0/0xa0
[92798.211033]  sync_fs_one_sb+0x26/0x40
[92798.211037]  iterate_supers+0x9e/0x110
[92798.211041]  ksys_sync+0x62/0xb0
[92798.211045]  __do_sys_sync+0xe/0x20
[92798.211049]  do_syscall_64+0x59/0x90
[92798.211052]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[92798.211057] RIP: 0033:0x7fd08471babb
[92798.211059] RSP: 002b:00007ffd344545f8 EFLAGS: 00000246 ORIG_RAX:
00000000000000a2
[92798.211063] RAX: ffffffffffffffda RBX: 00007ffd344547d8 RCX: 00007fd08471babb
[92798.211065] RDX: 00007fd084821101 RSI: 00007ffd344547d8 RDI: 00007fd0847d9e29
[92798.211067] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[92798.211069] R10: 0000556ba8b21e40 R11: 0000000000000246 R12: 0000556ba8acebc0
[92798.211071] R13: 0000556ba8acc19f R14: 00007fd08481942c R15: 0000556ba8acc034
[92798.211075]  </TASK>

Kyle.

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-18  7:07   ` Kyle Sanderson
@ 2023-04-18  7:18     ` Qu Wenruo
  2023-04-18  7:31       ` Kyle Sanderson
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2023-04-18  7:18 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: linux-btrfs, linux-kernel, linux-block, linux-fsdevel,
	Linus Torvalds, Greg KH



On 2023/4/18 15:07, Kyle Sanderson wrote:
> On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> As for the "100% utilization", did you mean that a single CPU core is
>> fully utilized by some btrfs thread?
> 
> No, iostat is reporting the disk at "100%" utilization.
> 
>> If so, it looks like qgroup is involved. Did you recently delete one or
>> more snapshots? If so, it's almost certainly qgroup causing the 100% CPU
>> utilization.
> 
> No, I used to run snapraid-btrfs (which uses snapper), but both projects
> are effectively orphaned. There are no snapshots left. I also used to
> run zygo/bees on these disks, but that has basic problems too
> (specifically mlock'ing GBs of data while sitting completely idle for
> hours at a time). bees was not running when this happened (or at any
> point during the machine's uptime).

Those don't affect qgroups directly.

> 
>> In that case, you can disable quota for that btrfs filesystem
>> (permanently), or use a newer kernel, which has a sysfs interface:
>>
>>     /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
> 
> Thank you, I'll give it a go after the reboot here. I don't think I
> have quotas enabled, but I did look at them at one point (I remember
> toggling them on and then off).

Just in case: if that sysfs path exists, it means qgroup is enabled, and
then qgroup would be the most likely cause.

IIRC a lot of tools like snapper enable qgroup automatically.

So it's better to make sure qgroup is not enabled, with the following
command:

  # btrfs qgroup show -pcre <mnt>

If it reports an error, then qgroup is really disabled and we can look
somewhere else.
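
Equivalently, since the qgroups directory only exists while quota is
enabled, a quick sysfs check (a sketch; substitute the filesystem UUID):

  # test -d /sys/fs/btrfs/<UUID>/qgroups && echo enabled || echo disabled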

Thanks,
Qu

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-18  7:18     ` Qu Wenruo
@ 2023-04-18  7:31       ` Kyle Sanderson
  2023-04-18  8:00         ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Kyle Sanderson @ 2023-04-18  7:31 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: linux-btrfs, Linux-Kernal, linux-block, linux-fsdevel,
	Linus Torvalds, Greg KH

On Tue, Apr 18, 2023 at 12:19 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2023/4/18 15:07, Kyle Sanderson wrote:
> > On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> Thus better to make sure qgroup is not enabled, with the following command:
> If it shows error, then qgroup is really disabled and we can look
> somewhere else.

All my disks:
ERROR: can't list qgroups: quotas not enabled
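
For reference, the check was a one-liner along these lines (the mount
points are illustrative, not my exact layout):

  $ for m in /mnt/disk1 /mnt/disk2; do sudo btrfs qgroup show -pcre "$m"; done
  ERROR: can't list qgroups: quotas not enabled
  ERROR: can't list qgroups: quotas not enabled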

As long as there's nothing else running on the disk, I seem to be able
to remove the files now. If I have a small workload (~5k open
descriptors across multiple pids) running with the files open, it
seems to fully lock up the entire box (100% utilization on the single
disk, with the same iowait symptom and system load increasing). I
killed everything and was able to clear around 4T. The background
tasks are running now to actually free the space, but the "frontend"
unlink action is done. Running multiple unlinks in parallel seems to
trigger this - does it take a full lock per pid or something?
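
The removals themselves were plain parallel unlinks, roughly like this
sketch (the path and degree of parallelism are illustrative):

  $ find /mnt/disk1/old-data -type f -print0 | xargs -0 -P 8 rm -f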

Kyle.

On Tue, Apr 18, 2023 at 12:19 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/4/18 15:07, Kyle Sanderson wrote:
> > On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >> And talking about the "100% utilization", did you mean a single CPU core
> >> is fully utilized by some btrfs thread?
> >
> > No, iostat reporting the disk is at "100%" utilization.
> >
> >> If so, it looks like qgroup is involved.
> >> Did you just recently deleted one or more snapshots? If so, it's almost
> >> certain qgroup is the cause of the 100% CPU utilization.
> >
> > No, I used to run snapraid-btrfs (uses snapper) but both projects are
> > effectively orphaned. There's no snapshots left, but I used to run
> > zygo/bees also on these disks but that also has basic problems
> > (specifically mlock' on GBs of data when it's sitting completely idle
> > for hours at a time). bees was not running when this happened (or for
> > the duration of the uptime of the machine).
>
> Those are not affecting qgroups directly.
>
> >
> >> In that case, you can disable quota for that btrfs (permanently), or use
> >> newer kernel which has a sysfs interface:
> >>
> >>     /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
> >
> > Thank you, I'll give it a go after the reboot here. I don't think I
> > have quotas enabled but I did look at it at one point (I remember
> > toggling on then off).
>
> Just in case, if that path exists, it means qgroup is enabled, and then
> qgroup would be the most possible cause.
>
> IIRC a lot of tools like snapper would enable qgroup automatically.
>
> Thus better to make sure qgroup is not enabled, with the following command:
>
>   # btrfs qgroup show -pcre <mnt>
>
> If it shows error, then qgroup is really disabled and we can look
> somewhere else.
>
> Thanks,
> Qu
> >
> >> If you write some value like 3 into that file, btrfs would automatically
> >> avoid such long CPU usage caused by large snapshot dropping.
> >>
> >> But talking about the XFS corruption, I'm not sure how btrfs can lead to
> >> problems in XFS...
> >
> > All I/O seemed to be dead on the box.
> >
> > I tried to remove the same data again and the task hung... this time I
> > was able to interrupt the removal and the box recovered.
> >
> > [92798.210656] INFO: task sync:1282043 blocked for more than 120 seconds.
> > [92798.210683]       Not tainted 5.19.0-38-generic #39~22.04.1-Ubuntu
> > [92798.210707] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [92798.210735] task:sync            state:D stack:    0 pid:1282043
> > ppid:1281934 flags:0x00004002
> > [92798.210739] Call Trace:
> > [92798.210741]  <TASK>
> > [92798.210743]  __schedule+0x257/0x5d0
> > [92798.210747]  schedule+0x68/0x110
> > [92798.210750]  wait_current_trans+0xde/0x140 [btrfs]
> > [92798.210820]  ? destroy_sched_domains_rcu+0x40/0x40
> > [92798.210824]  start_transaction+0x34d/0x5f0 [btrfs]
> > [92798.210895]  btrfs_attach_transaction_barrier+0x23/0x70 [btrfs]
> > [92798.210966]  btrfs_sync_fs+0x4a/0x1c0 [btrfs]
> > [92798.211029]  ? vfs_fsync_range+0xa0/0xa0
> > [92798.211033]  sync_fs_one_sb+0x26/0x40
> > [92798.211037]  iterate_supers+0x9e/0x110
> > [92798.211041]  ksys_sync+0x62/0xb0
> > [92798.211045]  __do_sys_sync+0xe/0x20
> > [92798.211049]  do_syscall_64+0x59/0x90
> > [92798.211052]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [92798.211057] RIP: 0033:0x7fd08471babb
> > [92798.211059] RSP: 002b:00007ffd344545f8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000a2
> > [92798.211063] RAX: ffffffffffffffda RBX: 00007ffd344547d8 RCX: 00007fd08471babb
> > [92798.211065] RDX: 00007fd084821101 RSI: 00007ffd344547d8 RDI: 00007fd0847d9e29
> > [92798.211067] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
> > [92798.211069] R10: 0000556ba8b21e40 R11: 0000000000000246 R12: 0000556ba8acebc0
> > [92798.211071] R13: 0000556ba8acc19f R14: 00007fd08481942c R15: 0000556ba8acc034
> > [92798.211075]  </TASK>
> >
> > Kyle.
> >
> > On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2023/4/17 13:20, Kyle Sanderson wrote:
> >>> The single btrfs disk was at 100% utilization and a wa of 50~, reading
> >>> back at around 2MB/s. df and similar would simply freeze. Leading up
> >>> to this I removed around 2T of data from a single btrfs disk. I
> >>> managed to get most of the services shutdown and disks unmounted, but
> >>> when the system came back up I had to use xfs_repair (for the first
> >>> time in a very long time) to boot into my system. I likely should have
> >>> just pulled the power...
> >>
> >> I didn't see any obvious btrfs related work involved in the call trace.
> >> Thus I believe it's not really hanging in a waiting status.
> >>
> >> And talking about the "100% utilization", did you mean a single CPU core
> >> is fully utilized by some btrfs thread?
> >>
> >> If so, it looks like qgroup is involved.
> >> Did you just recently deleted one or more snapshots? If so, it's almost
> >> certain qgroup is the cause of the 100% CPU utilization.
> >>
> >> In that case, you can disable quota for that btrfs (permanently), or use
> >> newer kernel which has a sysfs interface:
> >>
> >>     /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
> >>
> >> If you write some value like 3 into that file, btrfs would automatically
> >> avoid such long CPU usage caused by large snapshot dropping.
> >>
> >> But talking about the XFS corruption, I'm not sure how btrfs can lead to
> >> problems in XFS...
> >>
> >> Thanks,
> >> Qu
> >>>
> >>> [hung-task traces snipped; identical to the ones quoted in full
> >>> earlier in the thread]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-18  7:31       ` Kyle Sanderson
@ 2023-04-18  8:00         ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2023-04-18  8:00 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: linux-btrfs, Linux-Kernal, linux-block, linux-fsdevel,
	Linus Torvalds, Greg KH



On 2023/4/18 15:31, Kyle Sanderson wrote:
> On Tue, Apr 18, 2023 at 12:19 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2023/4/18 15:07, Kyle Sanderson wrote:
>>> On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> Thus better to make sure qgroup is not enabled, with the following command:
>> If it shows error, then qgroup is really disabled and we can look
>> somewhere else.
> 
> All my disks:
> ERROR: can't list qgroups: quotas not enabled
> 
> As long as there's nothing else running on the disk, I seem to be able
> to remove the files now. If I have a small workload (~5k open
> descriptors across multiple pids) running with the files open, it
> seems to fully lock up the entire box (100% utilization on the single
> disk, with the same iowait symptom and system load increasing). I
> killed everything and was able to clear around 4T. The background
> tasks are running now to actually free the space, but the "frontend"
> unlink action is done. Running multiple unlinks in parallel seems to
> trigger this - does it take a full lock per pid or something?

Unlink in btrfs is expensive, unlike in most other filesystems.

Because btrfs puts every inode into the same btree (the subvolume
tree), and an operation like unlink may involve tons of file extents,
all of the parallel unlink operations race to hold locks on the same
btree nodes/leaves.

If you run operations like this in parallel, it's recommended to put
the files into different subvolumes, which can greatly improve
performance on btrfs.

Unlike other filesystems, which mostly keep per-inode trees, btrfs is
not that good at handling high load inside one subvolume.
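
As a rough sketch -- subvolume names and paths here are only examples,
not a tested recipe -- splitting the load would look like:

  # one subvolume per parallel workload, each with its own btree
  $ sudo btrfs subvolume create /mnt/data/worker1
  $ sudo btrfs subvolume create /mnt/data/worker2
  $ sudo btrfs subvolume list /mnt/data

Unlinks inside worker1 and worker2 then update separate subvolume
trees, so they no longer contend for the same nodes/leaves.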

Thanks,
Qu
> 
> Kyle.
> 
> On Tue, Apr 18, 2023 at 12:19 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/4/18 15:07, Kyle Sanderson wrote:
>>> On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>> And talking about the "100% utilization", did you mean a single CPU core
>>>> is fully utilized by some btrfs thread?
>>>
>>> No, iostat reporting the disk is at "100%" utilization.
>>>
>>>> If so, it looks like qgroup is involved.
>>>> Did you just recently deleted one or more snapshots? If so, it's almost
>>>> certain qgroup is the cause of the 100% CPU utilization.
>>>
>>> No, I used to run snapraid-btrfs (uses snapper) but both projects are
>>> effectively orphaned. There's no snapshots left, but I used to run
>>> zygo/bees also on these disks but that also has basic problems
>>> (specifically mlock' on GBs of data when it's sitting completely idle
>>> for hours at a time). bees was not running when this happened (or for
>>> the duration of the uptime of the machine).
>>
>> Those are not affecting qgroups directly.
>>
>>>
>>>> In that case, you can disable quota for that btrfs (permanently), or use
>>>> newer kernel which has a sysfs interface:
>>>>
>>>>      /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
>>>
>>> Thank you, I'll give it a go after the reboot here. I don't think I
>>> have quotas enabled but I did look at it at one point (I remember
>>> toggling on then off).
>>
>> Just in case, if that path exists, it means qgroup is enabled, and then
>> qgroup would be the most possible cause.
>>
>> IIRC a lot of tools like snapper would enable qgroup automatically.
>>
>> Thus better to make sure qgroup is not enabled, with the following command:
>>
>>    # btrfs qgroup show -pcre <mnt>
>>
>> If it shows error, then qgroup is really disabled and we can look
>> somewhere else.
>>
>> Thanks,
>> Qu
>>>
>>>> If you write some value like 3 into that file, btrfs would automatically
>>>> avoid such long CPU usage caused by large snapshot dropping.
>>>>
>>>> But talking about the XFS corruption, I'm not sure how btrfs can lead to
>>>> problems in XFS...
>>>
>>> All I/O seemed to be dead on the box.
>>>
>>> I tried to remove the same data again and the task hung... this time I
>>> was able to interrupt the removal and the box recovered.
>>>
>>> [hung-task trace for sync snipped; identical to the one quoted
>>> earlier in the thread]
>>>
>>> Kyle.
>>>
>>> On Mon, Apr 17, 2023 at 1:42 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023/4/17 13:20, Kyle Sanderson wrote:
>>>>> The single btrfs disk was at 100% utilization and a wa of 50~, reading
>>>>> back at around 2MB/s. df and similar would simply freeze. Leading up
>>>>> to this I removed around 2T of data from a single btrfs disk. I
>>>>> managed to get most of the services shutdown and disks unmounted, but
>>>>> when the system came back up I had to use xfs_repair (for the first
>>>>> time in a very long time) to boot into my system. I likely should have
>>>>> just pulled the power...
>>>>
>>>> I didn't see any obvious btrfs related work involved in the call trace.
>>>> Thus I believe it's not really hanging in a waiting status.
>>>>
>>>> And talking about the "100% utilization", did you mean a single CPU core
>>>> is fully utilized by some btrfs thread?
>>>>
>>>> If so, it looks like qgroup is involved.
>>>> Did you just recently deleted one or more snapshots? If so, it's almost
>>>> certain qgroup is the cause of the 100% CPU utilization.
>>>>
>>>> In that case, you can disable quota for that btrfs (permanently), or use
>>>> newer kernel which has a sysfs interface:
>>>>
>>>>      /sys/fs/btrfs/<UUID>/qgroups/drop_subtree_threshold
>>>>
>>>> If you write some value like 3 into that file, btrfs would automatically
>>>> avoid such long CPU usage caused by large snapshot dropping.
>>>>
>>>> But talking about the XFS corruption, I'm not sure how btrfs can lead to
>>>> problems in XFS...
>>>>
>>>> Thanks,
>>>> Qu
>>>>>
>>>>> [hung-task traces snipped; identical to the ones quoted in full
>>>>> earlier in the thread]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs induced data loss (on xfs) - 5.19.0-38-generic
  2023-04-17  5:20 btrfs induced data loss (on xfs) - 5.19.0-38-generic Kyle Sanderson
  2023-04-17  6:01 ` Greg KH
  2023-04-17  8:42 ` Qu Wenruo
@ 2023-04-19  1:55 ` Dave Chinner
  2 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2023-04-19  1:55 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: linux-btrfs, Linux-Kernal, linux-block, linux-fsdevel,
	Linus Torvalds, Greg KH

On Sun, Apr 16, 2023 at 10:20:45PM -0700, Kyle Sanderson wrote:
> The single btrfs disk was at 100% utilization and a wa of 50~, reading
> back at around 2MB/s. df and similar would simply freeze. Leading up
> to this I removed around 2T of data from a single btrfs disk. I
> managed to get most of the services shutdown and disks unmounted, but
> when the system came back up I had to use xfs_repair (for the first

What exactly were the error messages XFS emitted when it failed to
mount, and what did xfs_repair fix to let the system boot? Unless you
have a broken disk (i.e. the firmware either lies about cache flush
completion or its implementation is broken), there is no reason for
any amount of load on the disk to cause filesystem corruption....
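
If those messages weren't kept, something like the following would
recapture them (the device name is a placeholder):

  # report problems without modifying the filesystem
  $ sudo xfs_repair -n /dev/sdX1

  # kernel messages, including XFS errors, from the previous boot
  $ journalctl -k -b -1 | grep -i xfs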

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-04-19  1:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-17  5:20 btrfs induced data loss (on xfs) - 5.19.0-38-generic Kyle Sanderson
2023-04-17  6:01 ` Greg KH
2023-04-17  6:07   ` Kyle Sanderson
2023-04-17  8:42 ` Qu Wenruo
2023-04-18  7:07   ` Kyle Sanderson
2023-04-18  7:18     ` Qu Wenruo
2023-04-18  7:31       ` Kyle Sanderson
2023-04-18  8:00         ` Qu Wenruo
2023-04-19  1:55 ` Dave Chinner

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).