linux-btrfs.vger.kernel.org archive mirror
* commands like "du", "df", and "btrfs fs sync" hang
@ 2016-05-01  7:00 Kai Krakow
  2016-05-01 12:47 ` Duncan
  2016-05-02  0:39 ` Nicholas D Steeves
  0 siblings, 2 replies; 19+ messages in thread
From: Kai Krakow @ 2016-05-01  7:00 UTC (permalink / raw)
  To: linux-btrfs

Hello!

I'm not sure what triggers this, nor whether it is btrfs-specific. The
filesystems have been recreated from scratch. The freezes happen mainly
during my rsync backup (from btrfs to btrfs), but are not necessarily
limited to rsync usage: my system experiences uninterruptible freezes of
commands like "df", "du", "btrfs fs sync", and probably more.

I captured backtraces of the blocked tasks in dmesg using SysRq+W. Here's the log:
(https://gist.github.com/kakra/8e6a2b7ac4047fa9e345648664f5667b)
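For reference, this kind of dump can be produced from a shell. A minimal
sketch (needs root; the sysctl may already be enabled on your distro, and
the bitmask value is one option rather than the only one):

```shell
# Enable SysRq functions (1 = all; a bitmask including 0x10 also suffices
# for the task-dump functions).
echo 1 > /proc/sys/kernel/sysrq

# Fire the "show blocked state" trigger: dumps every task currently in
# uninterruptible (D) sleep, with backtraces, into the kernel log.
echo w > /proc/sysrq-trigger

# The "Show Blocked State" output lands in dmesg.
dmesg | tail -n 200
```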

[92481.503505] sysrq: SysRq : Show Blocked State
[92481.503511]   task                        PC stack   pid father
[92481.503603] ksysguardd      D 00000000000141c0     0 32516      1 0x00000004
[92481.503609]  ffff880231748c00 ffff88032a468000 ffff88032a467bc0 ffff880231748c00
[92481.503612]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503615]  7fffffffffffffff ffffffff817190df ffff88032a467bc0 ffffffff810b4d99
[92481.503617] Call Trace:
[92481.503625]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503630]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503635]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
[92481.503639]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503642]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.503646]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.503649]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.503652]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.503654]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.503657]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.503660]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.503662]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.503665]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
[92481.503667]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.503671]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.503674]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.503677]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
[92481.503679]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
[92481.503683]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
[92481.503686]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[92481.503688] trash.so        D 00000000000141c0     0   449   1478 0x00000004
[92481.503691]  ffff8802ee5b1800 ffff88022c720000 ffff88022c71fb70 ffff8802ee5b1800
[92481.503694]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503697]  7fffffffffffffff ffffffff817190df ffff88022c71fb70 ffffffff810b4d99
[92481.503699] Call Trace:
[92481.503702]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503706]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503709]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
[92481.503712]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503714]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.503717]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.503720]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.503722]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.503724]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.503727]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.503729]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.503731]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.503734]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
[92481.503736]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
[92481.503738]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
[92481.503741]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.503743]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.503746]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.503749]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
[92481.503752]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
[92481.503754]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
[92481.503757]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
[92481.503760]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
[92481.503763]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[92481.503766] ksysguardd      D 00000000000141c0     0  9869      1 0x00000004
[92481.503769]  ffff880092012400 ffff880203868000 ffff880203867bc0 ffff880092012400
[92481.503771]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503774]  7fffffffffffffff ffffffff817190df ffff880203867bc0 ffffffff810b4d99
[92481.503776] Call Trace:
[92481.503779]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503782]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503786]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
[92481.503789]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503791]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.503794]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.503797]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.503799]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.503801]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.503803]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.503806]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.503808]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.503810]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
[92481.503813]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.503815]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.503818]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.503820]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
[92481.503822]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
[92481.503825]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
[92481.503827]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[92481.503830] trash.so        D 0000000000000000     0 14351   1478 0x00000004
[92481.503833]  ffff8803072ac800 ffff8800438fc000 ffff8800438fbb70 ffff8803072ac800
[92481.503835]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503838]  7fffffffffffffff ffffffff817190df 0000000000000010 0000000000000287
[92481.503841] Call Trace:
[92481.503844]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503847]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503850]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503852]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.503855]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.503858]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.503860]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.503862]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.503864]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.503866]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.503869]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.503871]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
[92481.503873]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
[92481.503876]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
[92481.503878]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.503880]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.503883]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.503886]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
[92481.503889]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
[92481.503891]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
[92481.503893]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
[92481.503895]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
[92481.503898]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[92481.503901] trash.so        D 0000000000000000     0 17062   1478 0x00000004
[92481.503903]  ffff880350ee4800 ffff8801e4d5c000 ffff8801e4d5bb70 ffff880350ee4800
[92481.503906]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503908]  7fffffffffffffff ffffffff817190df ffff8801e4d5bb70 ffffffff810b4d99
[92481.503911] Call Trace:
[92481.503914]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503917]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503920]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
[92481.503923]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503925]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.503928]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.503931]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.503934]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.503936]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.503938]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.503940]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.503942]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.503945]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
[92481.503947]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
[92481.503949]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
[92481.503951]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.503954]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.503956]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.503959]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
[92481.503962]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
[92481.503964]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
[92481.503966]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
[92481.503969]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
[92481.503971]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
[92481.503973] df              D 00000000000141c0     0 20885  20797 0x00000004
[92481.503976]  ffff8803c42e6000 ffff8801001b4000 ffff8801001b3bd0 ffff8803c42e6000
[92481.503979]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
[92481.503981]  7fffffffffffffff ffffffff817190df ffff8801001b3bd0 ffffffff810b4d99
[92481.503984] Call Trace:
[92481.503987]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
[92481.503990]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
[92481.503993]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
[92481.503996]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
[92481.503999]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
[92481.504002]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
[92481.504005]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
[92481.504007]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
[92481.504009]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
[92481.504011]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
[92481.504013]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
[92481.504016]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
[92481.504018]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
[92481.504020]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
[92481.504023]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
[92481.504026]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
[92481.504028]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
[92481.504031]  [<ffffffff81176d9a>] ? SYSC_newstat+0x1a/0x40
[92481.504034]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
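Since the dump above lists several tasks, a quick way to pull out just the
blocked task names and PIDs is a grep/awk one-liner. A sketch, assuming the
dmesg excerpt was saved to a file named blocked.txt (hypothetical name):

```shell
# Task header lines look like:
#   [92481.503603] ksysguardd      D 00000000000141c0     0 32516      1 0x00000004
# Field 2 is the task name, field 3 the state (D), field 6 the PID.
grep -E '^\[[0-9.]+\] +\S+ +D ' blocked.txt | awk '{print $2, $6}'
```

On the dump above this would list ksysguardd, trash.so, and df with their
PIDs, while skipping the "Call Trace:" and raw-stack lines.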


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01  7:00 commands like "du", "df", and "btrfs fs sync" hang Kai Krakow
@ 2016-05-01 12:47 ` Duncan
  2016-05-01 13:54   ` Mike Fleetwood
                     ` (2 more replies)
  2016-05-02  0:39 ` Nicholas D Steeves
  1 sibling, 3 replies; 19+ messages in thread
From: Duncan @ 2016-05-01 12:47 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 01 May 2016 09:00:46 +0200 as excerpted:

> I'm not sure what triggers this, nor whether it is btrfs-specific. The
> filesystems have been recreated from scratch. The freezes happen mainly
> during my rsync backup (from btrfs to btrfs), but are not necessarily
> limited to rsync usage: my system experiences uninterruptible freezes of
> commands like "df", "du", "btrfs fs sync", and probably more.
> 
> I created a backtrace in dmesg using sysrq+w. Here's the log:
> (https://gist.github.com/kakra/8e6a2b7ac4047fa9e345648664f5667b)

FWIW, that link gives me GitHub's "this is not the web page you are
looking for" 404 error...  Do you have it set to private or something?

Anyway...

> [92481.503505] sysrq: SysRq : Show Blocked State
> [92481.503511]   task                        PC stack   pid father
> [92481.503603] ksysguardd      D 00000000000141c0     0 32516      1 0x00000004
> [92481.503609]  ffff880231748c00 ffff88032a468000 ffff88032a467bc0 ffff880231748c00
> [92481.503612]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503615]  7fffffffffffffff ffffffff817190df ffff88032a467bc0 ffffffff810b4d99
> [92481.503617] Call Trace:
> [92481.503625]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503630]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503635]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> [92481.503639]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503642]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.503646]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.503649]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.503652]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.503654]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.503657]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.503660]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.503662]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.503665]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> [92481.503667]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.503671]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.503674]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.503677]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
> [92481.503679]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
> [92481.503683]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
> [92481.503686]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
> [92481.503688] trash.so        D 00000000000141c0     0   449   1478 0x00000004
> [92481.503691]  ffff8802ee5b1800 ffff88022c720000 ffff88022c71fb70 ffff8802ee5b1800
> [92481.503694]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503697]  7fffffffffffffff ffffffff817190df ffff88022c71fb70 ffffffff810b4d99
> [92481.503699] Call Trace:
> [92481.503702]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503706]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503709]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> [92481.503712]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503714]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.503717]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.503720]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.503722]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.503724]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.503727]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.503729]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.503731]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.503734]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> [92481.503736]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> [92481.503738]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> [92481.503741]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.503743]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.503746]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.503749]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> [92481.503752]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> [92481.503754]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> [92481.503757]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> [92481.503760]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> [92481.503763]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
> [92481.503766] ksysguardd      D 00000000000141c0     0  9869      1 0x00000004
> [92481.503769]  ffff880092012400 ffff880203868000 ffff880203867bc0 ffff880092012400
> [92481.503771]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503774]  7fffffffffffffff ffffffff817190df ffff880203867bc0 ffffffff810b4d99
> [92481.503776] Call Trace:
> [92481.503779]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503782]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503786]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> [92481.503789]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503791]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.503794]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.503797]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.503799]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.503801]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.503803]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.503806]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.503808]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.503810]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> [92481.503813]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.503815]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.503818]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.503820]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
> [92481.503822]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
> [92481.503825]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
> [92481.503827]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
> [92481.503830] trash.so        D 0000000000000000     0 14351   1478 0x00000004
> [92481.503833]  ffff8803072ac800 ffff8800438fc000 ffff8800438fbb70 ffff8803072ac800
> [92481.503835]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503838]  7fffffffffffffff ffffffff817190df 0000000000000010 0000000000000287
> [92481.503841] Call Trace:
> [92481.503844]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503847]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503850]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503852]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.503855]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.503858]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.503860]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.503862]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.503864]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.503866]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.503869]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.503871]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> [92481.503873]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> [92481.503876]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> [92481.503878]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.503880]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.503883]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.503886]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> [92481.503889]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> [92481.503891]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> [92481.503893]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> [92481.503895]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> [92481.503898]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
> [92481.503901] trash.so        D 0000000000000000     0 17062   1478 0x00000004
> [92481.503903]  ffff880350ee4800 ffff8801e4d5c000 ffff8801e4d5bb70 ffff880350ee4800
> [92481.503906]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503908]  7fffffffffffffff ffffffff817190df ffff8801e4d5bb70 ffffffff810b4d99
> [92481.503911] Call Trace:
> [92481.503914]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503917]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503920]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> [92481.503923]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503925]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.503928]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.503931]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.503934]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.503936]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.503938]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.503940]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.503942]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.503945]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> [92481.503947]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> [92481.503949]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> [92481.503951]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.503954]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.503956]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.503959]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> [92481.503962]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> [92481.503964]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> [92481.503966]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> [92481.503969]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> [92481.503971]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
> [92481.503973] df              D 00000000000141c0     0 20885  20797 0x00000004
> [92481.503976]  ffff8803c42e6000 ffff8801001b4000 ffff8801001b3bd0 ffff8803c42e6000
> [92481.503979]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000 ffffffff817167cc
> [92481.503981]  7fffffffffffffff ffffffff817190df ffff8801001b3bd0 ffffffff810b4d99
> [92481.503984] Call Trace:
> [92481.503987]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> [92481.503990]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> [92481.503993]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> [92481.503996]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> [92481.503999]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> [92481.504002]  [<ffffffff81717012>] ? wait_for_completion+0x92/0xf0
> [92481.504005]  [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60
> [92481.504007]  [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> [92481.504009]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> [92481.504011]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> [92481.504013]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> [92481.504016]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> [92481.504018]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> [92481.504020]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> [92481.504023]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> [92481.504026]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> [92481.504028]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> [92481.504031]  [<ffffffff81176d9a>] ? SYSC_newstat+0x1a/0x40
> [92481.504034]  [<ffffffff81719cd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a

IIRC you're on gentoo as I am.  It's interesting how close our systems are
to each other, and yet how far away.  Based on the ksysguardd above, I'm
guessing you run kde, as I do, and we both use claws for mail (tho messages
here will say pan, as I use it for news and do my mailing lists as news via
gmane.org's list2news service).

But I run a pretty stripped down kde; nothing like automount (the autofs4
in your traces) at all.  The baloo parts of plasma-workspace and
plasma-desktop are patched out at build time so I can use a virtual
"null-package" baloo from my overlay that installs no files and has no
dependencies to fill the dep (easier than patching it out of the ebuild;
unfortunately those are the hoops I gotta go thru now that they dumped
the semantic-desktop USE flag for kde/frameworks/plasma5).  Similarly the
runtime-only udisks dep is filled by a null-package, as I don't need or
want those features and definitely don't want to pull in all its deps, etc.

But with kde so heavily stripped down, mostly only the basic desktop and
many of the games, it's reasonably easy to run the live-9999 ebuilds from
the gentoo/kde overlay and use smart-live-rebuild to only rebuild packages
with new commits, so I'm actually running live-git kde, for the parts of
kde I have installed at all, anyway. =:^)


To the issue at hand...

As you know I'm a btrfs user, not a dev.  However, if I'm not mistaken,
the D after each of the apps indicates the same "disk sleep" status
you'd see in top and the like, for apps waiting on IO.

Further, each of the listed apps appears to be waiting at the same
schedule function, basically yielding the cpu to other apps until the
filesystem returns what they're looking for.  Following the call stack
up, each of the apps is obviously doing a file operation: apparently a
write in some cases, a stat on some not-yet-cached file in others.

Looks to me pretty normal, just waiting for access to the disk when
something else is hogging all the (limited) disk bandwidth.  Nothing
like btrfs worker threads hung up, etc, that would make it
btrfs-specific.


Meanwhile, du isn't in the traces above (tho df is), but you mention
it in the title and first paragraph, so it's worth noting that du,
unlike df, generally crawls the directory trees you point it at to
get the space used by all the files in all the dirs, recursively.
It'll thus do much more IO than df, which is in general a far faster
call to the filesystem, asking for a single set of stats that the
filesystem can normally return without crawling a big directory tree.

But obviously the df call here was waiting for /something/ from disk.
Maybe it was actually still trying to load the df binary itself?  (df
loads only three libs: the linux-vdso kernel-virtual one, the ld
loader itself, and the libc.  Pretty basic, and the two actual
file-based libs, the loader and libc, will obviously be cached, so it
shouldn't block on loading them.)


As for btrfs fi sync, that's going to force-write-out all dirty
write-cache for that filesystem, so if it's called on the btrfs
the rsync is writing to, that's likely to block for quite some time
waiting for all those writes to flush, as well.

If btrfs filesystem sync is called on a _different_ btrfs, AND that
independent btrfs is on an entirely separate physical device (AND
that separate physical device isn't on the same legacy IDE cable or
SATA port-multiplier), THEN btrfs filesystem sync should return
relatively fast, compared to a general sync at least, since it
will only flush that separate filesystem, not all filesystems as
would a general sync.
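To make that distinction concrete, here's a minimal sketch (Python for
illustration; the mountpoint and the flush() helper are hypothetical):
os.sync() behaves like a general sync(1) and flushes every filesystem, while
shelling out to btrfs filesystem sync only flushes the one btrfs.

```python
import os
import shutil
import subprocess

def flush(mountpoint=None):
    """Flush dirty write cache: everything, or just one btrfs."""
    if mountpoint is None:
        os.sync()        # like sync(1): write out dirty data for ALL filesystems
        return 0
    if shutil.which("btrfs") is None:
        return -1        # btrfs-progs not installed
    # only flushes the filesystem mounted here, not the others
    return subprocess.run(["btrfs", "filesystem", "sync", mountpoint]).returncode

flush()                  # global sync; may block while dirty data drains
```

Either call can block for a long time if a big rsync has just filled the
write cache, which is presumably what you were seeing.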


Meanwhile, general IO scheduling is a somewhat different subject.
I believe the kernel does try to prioritize reads over writes,
pushing them to the front of the queue, which should get them
addressed and any apps only waiting on reads unstuck faster,
but on spinning rust, it can still take some time.

But there are at least two possible large problems.  One is
BIOS related and on many systems, not configurable at all, tho
on higher end server or gaming systems it can be, the other
is kernel IO tuning.

The BIOS-related one has to do with IO thruput vs. responsiveness,
tho I'm not sure it even applies to PCIE, as the old system I had
with the option was a pre-PCIE server board with high-speed PCI-X
(and AGP still, remember that? =:^) but not PCIE.  Unfortunately IDR
what it was called exactly, but the settings were IIRC powers of two
from something like 16 MB up thru 128 or 256 MB.  Lower settings were
more responsive, but would slow total thruput in certain use-cases as
there was more command and bus overhead.

The reason I still mention this here at all is that with the
higher settings, I definitely had IO-related lagginess in a lot
of things, including mouse movement (freeze-and-jump syndrome)
and even audio playback (dropouts).  I tried a bunch of kernel
adjustments, etc, but the problem in this case was below the
kernel in the BIOS settings, and wasn't fixed until I decided
to try lowering that setting to IIRC 32 MB (or whatever), after
which the problem simply disappeared!

The point being, sometimes it's nothing the kernel can really
control!  That general point applies regardless of whether anything
like that specific setting exists in more modern PCIE-style hardware,
which is why I still mentioned it.


Meanwhile, what kernel IO scheduler do you use (deadline, noop,
cfq,... cfq is the normal default)?  Do you use either normal
process nice/priority or ionice to control the rsync?  What
about cgroups?

And finally, what are your sysctl aka /proc/sys/vm settings for
dirty_* and vfs_cache_pressure?  Have you modified these from
defaults at all, either by changing your /etc/sysctl.(d|conf)
vm.* settings, or by writing directly to the files in
/proc/sys/vm, say at bootup (which is what changing the sysctl
config actually does)?
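The current values are easy to check without root; a small sketch (Python
for illustration, assuming a Linux /proc; the vm_settings() helper is
hypothetical, knob names as in Documentation/sysctl/vm.txt):

```python
from pathlib import Path

def vm_settings(base="/proc/sys/vm"):
    """Read the current vm.* writeback knobs straight from procfs."""
    names = ["dirty_ratio", "dirty_bytes",
             "dirty_background_ratio", "dirty_background_bytes",
             "dirty_expire_centisecs", "dirty_writeback_centisecs",
             "vfs_cache_pressure"]
    out = {}
    for name in names:
        p = Path(base) / name
        if p.exists():                  # /proc/sys/vm only exists on Linux
            out["vm." + name] = int(p.read_text())
    return out

for key, val in sorted(vm_settings().items()):
    print(key, "=", val)
```

That's equivalent to running sysctl on each name, just without depending on
the sysctl binary.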

Because the default cache settings as a percent of memory were set up
back when memory was much smaller, and many people argue that on
today's multi-GiB memory machines, they allow far too much dirty
content to accumulate in cache before triggering writeback,
particularly for slow spinning rust.  When the time expiry is set to
30 seconds but the amount of dirty data allowed to accumulate before
triggering high priority writeback is near a minute's worth of
writeback activity, something's wrong.  That's very often a large
part of the reason people see IO related freezes in other processes
trying to read stuff off disk, as well as other lagginess on occasion
(tho as touched on above, that's rarer these days, due to point-to-
point buses such as PCIE and SATA, as opposed to the old shared buses
of PCI and EIDE).


FWIW, while I have ssds for the main system now, I already
had my system tuned in this regard:

1) I set things like emerge, kernel building, rsync, etc,
to idle/batch priority (19 niceness), which is more
efficient for cpu scheduling batch processes as they get
longer time slices but at idle/lowest priority so they
don't disturb other tasks much.

Additionally, for the cfq IO scheduler, it sees the idle
cpu priority and automatically lowers IO priority as well,
so manual use of ionice isn't necessary.  (I don't believe
the other schedulers account for this, however, which is
one reason I asked about them, above.)

For something like rsync that's normally IO-bound anyway,
the primary effect would be the automatic IO-nicing due
to the process nicing.
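A sketch of that setup (a Python wrapper purely for illustration; the
run_idle() helper and the rsync paths are hypothetical): launch the batch
job at niceness 19 and, under cfq, a matching low IO priority follows
automatically.

```python
import subprocess

def run_idle(cmd):
    """Run cmd at the lowest CPU priority (niceness 19).

    Under the cfq IO scheduler the kernel derives a correspondingly low
    IO priority from the low CPU priority, so an explicit ionice is
    usually unnecessary there (other IO schedulers may not do this).
    """
    return subprocess.run(["nice", "-n", "19", *cmd]).returncode

# e.g. run_idle(["rsync", "-a", "/home/", "/mnt/backup/home/"])
```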

That was pretty effective at that level.  But combined
with #2, it's even more so.

2) I tuned my vm.dirty settings to trigger at much
lower sizes for both low priority background writeback
and higher priority foreground writeback.  You can
check the kernel's procfs documentation for the
specific settings if you like, but here's what I have
for my 16 GiB system (ssd now as I said, but I saw
no reason to change it from the spinning rust settings
I had, particularly since I still use spinning rust
for my media partition).

Direct from that section of my /etc/sysctl.conf:

################################################################################
# Virtual-machine: swap, write-cache

# vm.vfs_cache_pressure = 100
# vm.laptop_mode = 0
# vm.swappiness = 60
vm.swappiness = 100

# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0

# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0

# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000
################################################################################

The commented values are the normal defaults.  Either the ratio (a
percentage of RAM) or direct bytes can be set; setting one clears
(zeros) the other.  And while the ratio isn't of total RAM but of
available RAM, total RAM's a reasonable approximation on most modern
systems.

Taking the foreground, vm.dirty_ratio setting first:

Spinning rust may be as low as 30 MiB/sec thruput, and 10% of 16 gig of RAM
is 1.6 gig, ~1600+ meg.  Doing the math that's ~53 seconds worth of writeback
accumulated by default, before it kicks into high priority writeback mode.
With a 30 second default timeout, that makes no sense at all as it's almost
double the timeout!

Besides, who wants to wait nearly a minute for it to dump all that?

So I set that to 3%, which with 16 gigs of RAM is ~half a gig, or about 16
seconds worth of writeback at 30 MB/sec.  That's only about half the 30
second time expiry and isn't /too/ bad to wait, tho you'll probably notice
if it takes that full 16 seconds.  But it's reasonable, and given
the integer setting and that we want background set lower, 3% is getting
about as low as practical.  (Obviously if I upped to 32 GiB RAM, I'd
want to switch to the bytes setting for this very reason.)

The background vm.dirty_background_ratio setting is where the lower
priority background writeback kicks in, so with luck the higher priority
foreground limit is never reached, tho it obviously will be for something
doing a lot of writing, like rsync often will.  So this should be lower
than foreground.

With foreground set to 3%, that doesn't leave much room for background,
but 1% still works.  That's about 160 MB, or 5 seconds worth of
writeback at 30 MB/sec, which is reasonable.
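The arithmetic behind those numbers, as a quick sketch (assuming 16 GiB of
RAM and ~30 MB/sec spinning-rust writeback as above; exact rounding differs
slightly from the figures quoted in the text):

```python
GIB = 1024 ** 3
MB = 10 ** 6

RAM = 16 * GIB           # total RAM, used as an approximation of "available"
THROUGHPUT = 30 * MB     # slow spinning-rust writeback, bytes/sec

def writeback_seconds(ratio_percent):
    """Seconds needed to flush a full dirty cache at the given vm.dirty ratio."""
    return RAM * ratio_percent / 100 / THROUGHPUT

print(writeback_seconds(10))  # default dirty_ratio: ~57 s, nearly 2x the 30 s expiry
print(writeback_seconds(3))   # tuned dirty_ratio: ~17 s
print(writeback_seconds(1))   # tuned dirty_background_ratio: ~6 s
```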

As you can see, with the size triggers lowered to something reasonable, I
decided I could actually up the background time expiry from the default 5
seconds to 10.  That's still well and safely under the 30 second foreground
time expiry, and with the stronger size triggers I thought it was reasonable.

I haven't actually touched vfs_cache_pressure, but that's another related
knob you can twist.

I don't actually have any swap configured on my current machine (and
indeed, have it disabled in the kernel config) so the swappiness setting
doesn't matter, but I did have swap setup on my old 8 GiB machine.  I was
actually using the priority= option to set four swap partitions, one
each on four different devices, to the same priority, effectively
raid0-striping them.  My system was mdraid1 on the same four devices.
So I upped swappiness to 100 (it's more or less percentage, 0-100 range)
from its default 60, to force swapping out over dumping cache.  Even still,
I seldom went more than a few hundred MiB into swap, so the 8 GiB was
just about perfect for that 4-core system.  (My new system has 6
cores and I figured the same 2 GiB per core.  I think now a more realistic
figure for my usage at least is say 4 GiB, plus a GiB per core, which 
seems to work out for both my old 4-core system with 8 gig RAM, and
my newer 6-core system where 12 gig of RAM would hit about the same
usage.  I have 16 gig, but seldom actually use the last 4 gig at all,
so 12 gig would be perfect, and then I might configure swap, just in
case.  But swap would be a waste with 16 gig since I literally seldom
use that last 4 gig even for cache anyway, so it's effectively my
overflow area.)
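For reference, equal-priority swap striping like that looks roughly like
this in /etc/fstab (device names hypothetical; one mkswap'd partition per
physical device):

```
/dev/sda2  none  swap  sw,pri=10  0 0
/dev/sdb2  none  swap  sw,pri=10  0 0
/dev/sdc2  none  swap  sw,pri=10  0 0
/dev/sdd2  none  swap  sw,pri=10  0 0
```

With equal pri= values the kernel allocates swap pages round-robin across
all four areas, effectively raid0-striping them.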

I used laptop mode, with laptop-mode-tools, on my netbook, the reason
it's comment-listed there as of-interest, but I've no reason to use
it on my main machine, which is wall-plug powered.  IIRC when active
it would try to flush everything possible once the disk was already
active, so it could spin down for longer between activations.
Laptop-mode-tools allows configuring a bunch of other stuff to toggle
between wall power and battery power as well, and of course one of
the other settings it changed was the above vm.dirty_* settings, to
much larger triggers and longer timeouts (up to 5 minutes or so,
tho I've read of people going to extremes and setting it to 15
minutes or even longer, which of course risks losing all that work
in a crash (!!)), again, to let the drive stay spun down for longer.


Between the two, setting much lower writeback cache size triggers,
and using a nice of 19 to set idle/batch mode for stuff like emerge
and rsync, after adjusting that BIOS setting on the old system as well,
responsiveness under heavy load, either IO or CPU, was MUCH better.

Hopefully you'll find it's similarly effective for you, assuming
you don't already have similar settings and are STILL seeing
the problem.

Of course for things like rsync, you could probably use ionice (with
the cfq io scheduler) instead and not bother doing normal nice on 
rsync, but because the cfq scheduler already effectively does ionice
on strongly niced processes, I didn't need to worry about it here.

As I said, with ssds I don't really need it that strict any more,
but I saw no need to change it, and indeed, with spinning rust still
for my media drive, it's still useful there.

So there it is.  Hope it's useful. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 12:47 ` Duncan
@ 2016-05-01 13:54   ` Mike Fleetwood
  2016-05-01 23:16     ` Duncan
  2016-05-01 16:54   ` Kai Krakow
  2016-05-02 13:13   ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 19+ messages in thread
From: Mike Fleetwood @ 2016-05-01 13:54 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On 1 May 2016 at 13:47, Duncan <1i5t5.duncan@cox.net> wrote:
> Direct from that section of my /etc/sysctl.conf:
>
> ################################################################################
> # Virtual-machine: swap, write-cache

Hi Duncan,

You mean virtual memory.

Quoting from the kernel documentation
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/sysctl/vm.txt
"
This file contains the documentation for the sysctl files in
/proc/sys/vm and is valid for Linux kernel version 2.6.29.

The files in this directory can be used to tune the operation
of the virtual memory (VM) subsystem of the Linux kernel and
the writeout of dirty data to disk.
"

> # vm.vfs_cache_pressure = 100
> # vm.laptop_mode = 0
> # vm.swappiness = 60
> vm.swappiness = 100
>
> # write-cache, foreground/background flushing
> # vm.dirty_ratio = 10 (% of RAM)
> # make it 3% of 16G ~ half a gig
> vm.dirty_ratio = 3
> # vm.dirty_bytes = 0
>
> # vm.dirty_background_ratio = 5 (% of RAM)
> # make it 1% of 16G ~ 160 M
> vm.dirty_background_ratio = 1
> # vm.dirty_background_bytes = 0
>
> # vm.dirty_expire_centisecs = 2999 (30 sec)
> # vm.dirty_writeback_centisecs = 499 (5 sec)
> # make it 10 sec
> vm.dirty_writeback_centisecs = 1000
> ################################################################################

Thanks,
Mike

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 12:47 ` Duncan
  2016-05-01 13:54   ` Mike Fleetwood
@ 2016-05-01 16:54   ` Kai Krakow
  2016-05-01 23:22     ` Duncan
                       ` (2 more replies)
  2016-05-02 13:13   ` Austin S. Hemmelgarn
  2 siblings, 3 replies; 19+ messages in thread
From: Kai Krakow @ 2016-05-01 16:54 UTC (permalink / raw)
  To: linux-btrfs

Am Sun, 1 May 2016 12:47:00 +0000 (UTC)
schrieb Duncan <1i5t5.duncan@cox.net>:

> Kai Krakow posted on Sun, 01 May 2016 09:00:46 +0200 as excerpted:
> 
> > I'm not sure what triggeres this, neither if it is btrfs specific.
> > The filesystems have been recreated from scratch. Mainly during my
> > rsync backup (from btrfs to btrfs), but not necessarily limited to
> > rsync usage, my system experiences uninterruptable freezes of
> > commands like "df", "du", "btrfs fs sync", and probably more.
> > 
> > I created a backtrace in dmesg using sysrq+w. Here's the log:
> > (https://gist.github.com/kakra/8e6a2b7ac4047fa9e345648664f5667b)  
> 
> FWIW, that link give me the github "this is not the web page you are
> looking for" 404 error...  Do you have it set private or something?

Yes it's private but it's working for me in an incognito window of
Chromium. So maybe you copied the closing bracket or something into the
URL? Or some temporary error?

> Anyway...

It's actually just what follows...

> > [92481.503505] sysrq: SysRq : Show Blocked State
> > [92481.503511]   task                        PC stack   pid father
> > [92481.503603] ksysguardd      D 00000000000141c0     0 32516
> > 1 0x00000004 [92481.503609]  ffff880231748c00 ffff88032a468000
> > ffff88032a467bc0 ffff880231748c00 [92481.503612]  ffff880403cd8f20
> > ffff880403cd8f00 0000000000000000 ffffffff817167cc [92481.503615]
> > 7fffffffffffffff ffffffff817190df ffff88032a467bc0 ffffffff810b4d99
> > [92481.503617] Call Trace: [92481.503625]  [<ffffffff817167cc>] ?
> > schedule+0x2c/0x80 [92481.503630]  [<ffffffff817190df>] ?
> > schedule_timeout+0x13f/0x1a0 [92481.503635]  [<ffffffff810b4d99>] ?
> > finish_wait+0x29/0x60 [92481.503639]  [<ffffffff81718229>] ?
> > mutex_lock+0x9/0x30 [92481.503642]  [<ffffffff811e0e23>] ?
> > autofs4_wait+0x433/0x710 [92481.503646]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.503649]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.503652]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.503654]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.503657]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.503660]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.503662]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.503665]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> > [92481.503667]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.503671]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.503674]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.503677]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
> > [92481.503679]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
> > [92481.503683]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
> > [92481.503686]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a [92481.503688] trash.so
> > D 00000000000141c0     0   449   1478 0x00000004 [92481.503691]
> > ffff8802ee5b1800 ffff88022c720000 ffff88022c71fb70 ffff8802ee5b1800
> > [92481.503694]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000
> > ffffffff817167cc [92481.503697]  7fffffffffffffff ffffffff817190df
> > ffff88022c71fb70 ffffffff810b4d99 [92481.503699] Call Trace:
> > [92481.503702]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> > [92481.503706]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> > [92481.503709]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> > [92481.503712]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> > [92481.503714]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> > [92481.503717]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.503720]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.503722]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.503724]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.503727]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.503729]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.503731]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.503734]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> > [92481.503736]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> > [92481.503738]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> > [92481.503741]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.503743]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.503746]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.503749]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> > [92481.503752]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> > [92481.503754]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> > [92481.503757]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> > [92481.503760]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> > [92481.503763]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a [92481.503766] ksysguardd
> > D 00000000000141c0     0  9869      1 0x00000004 [92481.503769]
> > ffff880092012400 ffff880203868000 ffff880203867bc0 ffff880092012400
> > [92481.503771]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000
> > ffffffff817167cc [92481.503774]  7fffffffffffffff ffffffff817190df
> > ffff880203867bc0 ffffffff810b4d99 [92481.503776] Call Trace:
> > [92481.503779]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> > [92481.503782]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> > [92481.503786]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> > [92481.503789]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> > [92481.503791]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> > [92481.503794]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.503797]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.503799]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.503801]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.503803]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.503806]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.503808]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.503810]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> > [92481.503813]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.503815]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.503818]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.503820]  [<ffffffff8119cc8a>] ? user_statfs+0x2a/0x80
> > [92481.503822]  [<ffffffff8119ccf0>] ? SYSC_statfs+0x10/0x30
> > [92481.503825]  [<ffffffff8103ea53>] ? __do_page_fault+0x1a3/0x400
> > [92481.503827]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a [92481.503830] trash.so
> > D 0000000000000000     0 14351   1478 0x00000004 [92481.503833]
> > ffff8803072ac800 ffff8800438fc000 ffff8800438fbb70 ffff8803072ac800
> > [92481.503835]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000
> > ffffffff817167cc [92481.503838]  7fffffffffffffff ffffffff817190df
> > 0000000000000010 0000000000000287 [92481.503841] Call Trace:
> > [92481.503844]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> > [92481.503847]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> > [92481.503850]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> > [92481.503852]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> > [92481.503855]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.503858]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.503860]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.503862]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.503864]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.503866]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.503869]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.503871]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> > [92481.503873]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> > [92481.503876]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> > [92481.503878]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.503880]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.503883]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.503886]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> > [92481.503889]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> > [92481.503891]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> > [92481.503893]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> > [92481.503895]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> > [92481.503898]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a [92481.503901] trash.so
> > D 0000000000000000     0 17062   1478 0x00000004 [92481.503903]
> > ffff880350ee4800 ffff8801e4d5c000 ffff8801e4d5bb70 ffff880350ee4800
> > [92481.503906]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000
> > ffffffff817167cc [92481.503908]  7fffffffffffffff ffffffff817190df
> > ffff8801e4d5bb70 ffffffff810b4d99 [92481.503911] Call Trace:
> > [92481.503914]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> > [92481.503917]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> > [92481.503920]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> > [92481.503923]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> > [92481.503925]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> > [92481.503928]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.503931]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.503934]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.503936]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.503938]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.503940]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.503942]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.503945]  [<ffffffff8117d973>] ? link_path_walk+0x163/0x4f0
> > [92481.503947]  [<ffffffff8117c4d1>] ? path_init+0x1f1/0x330
> > [92481.503949]  [<ffffffff8117dde7>] ? path_lookupat+0x77/0x100
> > [92481.503951]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.503954]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.503956]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.503959]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> > [92481.503962]  [<ffffffff81176ddd>] ? SYSC_newlstat+0x1d/0x40
> > [92481.503964]  [<ffffffff8117271d>] ? vfs_write+0x13d/0x180
> > [92481.503966]  [<ffffffff811734a1>] ? SyS_write+0x61/0x90
> > [92481.503969]  [<ffffffff810f5175>] ? from_kuid_munged+0x5/0x10
> > [92481.503971]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a [92481.503973] df
> > D 00000000000141c0     0 20885  20797 0x00000004 [92481.503976]
> > ffff8803c42e6000 ffff8801001b4000 ffff8801001b3bd0 ffff8803c42e6000
> > [92481.503979]  ffff880403cd8f20 ffff880403cd8f00 0000000000000000
> > ffffffff817167cc [92481.503981]  7fffffffffffffff ffffffff817190df
> > ffff8801001b3bd0 ffffffff810b4d99 [92481.503984] Call Trace:
> > [92481.503987]  [<ffffffff817167cc>] ? schedule+0x2c/0x80
> > [92481.503990]  [<ffffffff817190df>] ? schedule_timeout+0x13f/0x1a0
> > [92481.503993]  [<ffffffff810b4d99>] ? finish_wait+0x29/0x60
> > [92481.503996]  [<ffffffff81718229>] ? mutex_lock+0x9/0x30
> > [92481.503999]  [<ffffffff811e0e23>] ? autofs4_wait+0x433/0x710
> > [92481.504002]  [<ffffffff81717012>] ?
> > wait_for_completion+0x92/0xf0 [92481.504005]
> > [<ffffffff810a2b80>] ? wake_up_q+0x60/0x60 [92481.504007]
> > [<ffffffff811e1a4f>] ? autofs4_expire_wait+0x5f/0x80
> > [92481.504009]  [<ffffffff811dfa36>] ? autofs4_d_manage+0x56/0x150
> > [92481.504011]  [<ffffffff8117b811>] ? follow_managed+0x91/0x2c0
> > [92481.504013]  [<ffffffff8117bcd2>] ? lookup_fast+0x102/0x300
> > [92481.504016]  [<ffffffff8117d3f1>] ? walk_component+0x31/0x450
> > [92481.504018]  [<ffffffff8117ddc8>] ? path_lookupat+0x58/0x100
> > [92481.504020]  [<ffffffff811800d5>] ? filename_lookup+0x95/0x110
> > [92481.504023]  [<ffffffff8115b55e>] ? kmem_cache_alloc+0x10e/0x160
> > [92481.504026]  [<ffffffff8117fdeb>] ? getname_flags+0x4b/0x170
> > [92481.504028]  [<ffffffff81176934>] ? vfs_fstatat+0x44/0x90
> > [92481.504031]  [<ffffffff81176d9a>] ? SYSC_newstat+0x1a/0x40
> > [92481.504034]  [<ffffffff81719cd7>] ?
> > entry_SYSCALL_64_fastpath+0x12/0x6a  
> 
> IIRC you're on gentoo as I am.  It's interesting how close our
> systems are to each other, and yet how far away.  Based on the
> ksysguardd above, I'm guessing you run kde, as I do, and we both use
> claws for mail (tho messages here will say pan, as I use it for news
> and do my mailing lists as news via gmane.org's list2news service).

I use Claws only for gmane.org's list2news service, running NNTP thru
leafnode.  Apparently, knode no longer works with plasma unless I pull
in a lot of legacy KDE stuff which conflicts with what I want to have
installed.

> But I run a pretty stripped down kde; nothing like automount (the
> autofs4 in your traces) at all, the baloo parts of plasma-workspace
> and plasma-desktop patched out at build time so I can use a virtual
> "null-package" baloo from my overlay that installs no files and has no
> dependencies to fill the dep (easier than patching it out of the
> ebuild, unfortunately the hoops I gotta go thru now that they dumped
> the semantic-desktop USE flag for kde/frameworks/plasma5); similarly
> the runtime-only udisks dep is filled by a null-package as I don't
> need or want those features and definitely don't want to pull in all
> its deps, etc.

I'm using full-blown KDE Plasma here, nothing stripped down. I love
Baloo - it's fast, the system stays responsive, I never notice indexing,
and the search features are really great for starting programs. The
system boots up in under 15s, the desktop is loaded in maybe 5 seconds
with a lot of autostart stuff like Steam. This also includes three
container VMs I'm using for development stuff (systemd-nspawn) and a
local MariaDB server.

> But with kde so heavily stripped down, mostly only the basic desktop
> and many of the games, it's reasonably easy to run the live-9999
> ebuilds from the gentoo/kde overlay and use smart-live-rebuild to
> only rebuild packages with new commits, so I'm actually running
> live-git kde, for the parts of kde I have installed at all, anyway.
> =:^)

I didn't know about smart-live-rebuild. Interesting. ;-)

> To the issue at hand...

Yep.

> As you know I'm a btrfs user, not a dev.  However, if I'm not
> mistaken, the D after each of the apps indicates the same "disk
> sleep" status you'd see in top and the like, for apps waiting on IO.

Yes, D indicates disk sleep - but I don't have any outstanding IO that
I'd know of, and I see neither HDD LED activity nor excessive reading
that could explain it.

I'm managing a lot of Gentoo-based servers at my company, and a lot of
tuning has gone into optimizing IO - so I know the knobs and where to
look for problems.

This one however leaves me a little clueless.

I can see "du", "df", etc. in "ps axuw" in disk sleep state - not sure
why they don't show up in that generated trace. Ctrl+C'ing those
commands does nothing except display "^C" on the shell. Neither does
"kill -9" work on those hanging commands.
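For hangs like this, listing the kernel wait channel of every D-state process often narrows down the responsible subsystem without needing sysrq. A minimal sketch (the exact wchan names depend on the kernel build, so the autofs/btrfs examples in the comment are just illustrations):

```shell
# Show uninterruptible (D-state) tasks plus the kernel function they
# sleep in; the wchan column usually names the guilty subsystem
# (e.g. autofs4_wait, btrfs_..., bcache_...).
ps axo pid,stat,wchan:30,comm | awk 'NR == 1 || $2 ~ /^D/'
```

Unlike sysrq+w, this also catches tasks that only intermittently enter D state, if you run it in a loop.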

> Further, each of the listed apps appears to be waiting at the same
> schedule function, basically yielding the cpu to other apps until the
> filesystem returns what they're looking for, up the call stack to
> obviously each of the apps doing a file operation, apparently write in
> some cases, stat on some not-yet-cached file in other cases.

When the system comes to the point where those commands hang, everything
else is still working: No excessive IO, no unexpected IO, performance
is still very good. But processes looking for free disk space on _any_
of my filesystems then hang - even Wine, PlayOnLinux, ksysguardd, the
KDE trash handling routines... Shutting the system down in this state
takes at least 2 minutes vs. usually 10-20s. I expect systemd is just
trying to hard-kill processes and then eventually gives up on them. In
turn, probably the filesystems won't be unmounted properly which may
explain the corruption I had previously. But now, I returned all
filesystems to sane state by recreating them from scratch, using a
brand-new SSD for bcache, etc. I also error checked components like
HDD, RAM, etc. All is well.

> Looks to me pretty normal, just waiting for access to the disk when
> something else is hogging all the (limited) disk bandwidth.  Nothing
> like btrfs worker threads hung up, etc, that would make it
> btrfs-specific.

Not so normal, however, is that this state holds forever. At some
point those processes should eventually finish their jobs.

> Meanwhile, du isn't in the traces above (tho df is), but you mention
> it in the title and first paragraph, so it's worth noting that du,
> unlike df, generally crawls the directory trees you point it at to
> get the space used by all the files in all the dirs, recursively,
> so it'll do much more IO than something like df, which in general is
> a far faster call to the filesystem, asking for a single set of stats
> that the filesystem can normally return without crawling a big
> directory tree.

Running "du" just hangs - there's no IO going on, which I would expect
when running "du". Further, df hangs, sync hangs, and ksysguard only
partially works (it still shows ongoing IO, but some other graphs no
longer show values) and eventually stops working completely.

Until a reboot fixes all of that.

> But obviously the df call here was waiting for /something/ from disk.
> Maybe it was actually still trying to load the df binary itself?  (df
> doesn't load but three libs, the linux-vdso kernel-virtual being one
> of them, the ld loader itself, and the libc, pretty basic and the two
> actual file-based libs, the loader itself and libc, will obviously be
> cached, so it shouldn't block on loading them.)

I'm pretty sure that all those libs and binaries are even in cache when
I run them: of my 16 GB RAM there's still plenty free, with a lot of it
being cache.

I've seen Linux servers in the past that kept running fine from cache
with a frozen RAID controller. Writing froze, but everything else still
worked - until the cache filled up, and then the system broke down
completely.

> As for btrfs fi sync, that's going to force-write-out all dirty
> write-cache for that filesystem, so if it's called on the btrfs
> the rsync is writing to, that's likely to block for quite some time
> waiting for all those writes to flush, as well.

It affects all file systems. The "btrfs fi sync" is used to finish my
rsync backup and ensure everything is written before I'm trying to
unmount it or the system goes back to sleep.

"df" and friends also freeze on tmp (ramdisk) fs and vfat fs (my EFI
boot partition).
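Since even tmpfs and vfat are affected, one way to map out which mounts stall is to probe every mount point with a bounded statfs call; anything that survives the timeout is healthy. A rough sketch (the 5-second timeout is an arbitrary choice):

```shell
# Walk /proc/mounts and issue a statfs (the same syscall df uses) with
# a timeout, printing only the mount points where the call got stuck.
awk '{ print $2 }' /proc/mounts | sort -u | while read -r mnt; do
    timeout 5 stat -f -- "$mnt" >/dev/null 2>&1 || echo "stuck: $mnt"
done
```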

> If btrfs filesystem sync is called on a _different_ btrfs, AND that
> independent btrfs is on an entirely separate physical device (AND
> that separate physical device isn't on the same legacy IDE cable or
> SATA port-multiplier), THEN btrfs filesystem sync should return
> relatively fast, compared to a general sync at least, since it
> will only flush that separate filesystem, not all filesystems as
> would a general sync.

Yes, I expect that to be the difference (full flush vs. fs-specific
flush).

> Meanwhile, general IO scheduling is a somewhat different subject.
> I believe the kernel does try to prioritize reads over writes,
> pushing them to the front of the queue, which should get them
> addressed and any apps only waiting on reads unstuck faster,
> but on spinning rust, it can still take some time.

The kernel, with the cfq and deadline schedulers, prioritizes reads
over writes but ensures there's no write starvation. The rationale is
that a process is expected to block on writes more often than on reads
(e.g. if you save a file in your editor, you are comfortable waiting a
moment, but if the same happened when you click a menu or open a
dialog, that would be a much less pleasant experience).

Bcache uses a similar approach, calculating IO congestion on the SSD
to decide when to fall back to passing IO directly to the backing
devices. Tho, the congestion counter in bcache has never counted
anything for me - which tells me excessive IO is not my problem.
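The thresholds that counter compares against live in sysfs and can be inspected (or zeroed, which disables the congestion bypass) per cache set. A sketch, with the sysfs root parameterized since the cache-set UUID differs on every system:

```shell
# Inspect bcache's SSD congestion thresholds; echoing 0 into these
# files would disable the congestion-based fallback to the backing
# device entirely.
BCACHE_SYSFS=${BCACHE_SYSFS:-/sys/fs/bcache}
for cset in "$BCACHE_SYSFS"/*/; do
    [ -f "$cset/congested_read_threshold_us" ] || continue
    echo "$cset read:  $(cat "$cset/congested_read_threshold_us") us"
    echo "$cset write: $(cat "$cset/congested_write_threshold_us") us"
done
```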

> But there are at least two possible large problems.  One is
> BIOS related and on many systems, not configurable at all, tho
> on higher end server or gaming systems it can be, the other
> is kernel IO tuning.

No traditional BIOS here, I use EFI ultra-fast boot (aka Windows 8 fast
boot) with NVIDIA GOP to get an EFI framebuffer. According to
systemd-analyze, my system only uses 1s in EFI firmware during boot.
The EFI setup itself doesn't expose much regarding PCI-E latency
settings - and even if it did, I think it defaults to 32 ticks, which I
never had problems with. I remember the days of buggy VIA boards where
you had to tune that setting so that Windows 95 wouldn't freeze so
often during audio output, or even to fix audio stuttering. But that's
long been history.

> The BIOS related one, and I'm not sure it even applies to PCIE,
> as the old system I had with the option was a pre-PCIE server
> board with high-speed PCIX (and AGP still, remember that? =:^)
> but not PCIE, has to do with IO thruput vs. responsiveness.
> Unfortunately IDR what it was called exactly, but the settings
> were IIRC powers of two from something like 16 MB (IIRC) up thru
> 128 or 256 MB.  Lower settings were more responsive, but 
> would slow total thruput in certain use-cases as there was
> more command and bus overhead.

I think this is something you find with server boards only, because
those are optimized for throughput instead of interactivity. A classic
desktop board shouldn't have those knobs, tho I'd expect some
manufacturers to tune them to show better benchmark results. Still,
those are usually not exposed in the BIOS UI, or they are hidden behind
obscure settings.

However, my settings are pretty standard - I just lowered the CPU
voltage a little for a cooler CPU and ran some burn-in benchmarks to
verify it still runs stable. That gained me up to 10°C lower CPU
temperature when idle. The system stays cooler and quieter. And it
draws less power.
 
> The reason I still mention this here at all is that with the
> higher settings, I definitely had IO-related lagginess in a lot
> of things, including mouse movement (freeze-and-jump syndrome)
> and even audio playback (dropouts).  I tried a bunch of kernel
> adjustments, etc, but the problem in this case was below the
> kernel in the BIOS settings, and wasn't fixed until I decided
> to try lowering that setting to IIRC 32 MB (or whatever), after
> which the problem simply disappeared!

No mouse lag here - in fact, no lag at all: after boot, all programs
(including Chromium) start almost instantly when clicked. The same
holds after the system has been up for a long time: everything is
almost instant, as I'd expect after migrating to bcache.

> The point being, sometimes it's nothing the kernel can really
> control!  Of course that general primary point applies regardless
> of whether there's anything like that specific setting in more
> modern PCIE style hardware, which is why I still mentioned it.

Your observation about responsiveness on server hardware is at least
interesting. I will keep that in mind and look for this setting when I
see such problems with a server board.

> Meanwhile, what kernel IO scheduler do you use (deadline, noop,
> cfq,... cfq is the normal default)?  Do you use either normal
> process nice/priority or ionice to control the rsync?  What
> about cgroups?

Due to systemd booting my system, everything is in a cgroup. The backup
job itself is not tied to its own cgroup. But it actually uses ionicing
and nicing:

$ systemctl cat usb-backup.service
# /etc/systemd/system/usb-backup.service
[Unit]
Description=Daily USB Backup Service

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/usb-backup.sh
IOSchedulingClass=idle
IOSchedulingPriority=7
CPUSchedulingPolicy=batch
Nice=3
PrivateTmp=true
ProtectHome=read-only
ProtectSystem=true

But whatever those settings are, they should never provoke the
behavior I'm seeing. I don't think this is related; I only know the
rsync job is the most likely to trigger the problem.

> And finally, what are your sysctl aka /proc/sys/vm settings for
> dirty_* and vfs_cache_pressure?  Have you modified these from
> defaults at all, either by changing your /etc/sysctl.(d|conf)
> vm.* settings, or by writing directly to the files in
> /proc/sys/vm, say at bootup (which is what changing the sysctl
> config actually does)?

This is standard here. I experimented with it in the past, but it only
produced clearly worse performance in edge cases while showing almost
non-existent improvement in normal operation.

$ sudo sysctl -a 2>&1 | egrep 'vm.dirty|vfs'
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
vm.vfs_cache_pressure = 100
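For reference, the ratio settings above translate into absolute sizes as a fraction of memory; a quick sketch shows how many seconds of writeback they represent (the 100 MB/s throughput figure is an assumption, adjust for your storage, and total RAM is used as an approximation of "available" memory):

```shell
# Convert vm.dirty_background_ratio=10 and vm.dirty_ratio=20 into MB
# and into seconds of flushing at an assumed 100 MB/s.
awk '/^MemTotal:/ {
    mem_mb = $2 / 1024
    printf "background: %.0f MB (~%.1f s at 100 MB/s)\n", mem_mb * 0.10, mem_mb * 0.10 / 100
    printf "foreground: %.0f MB (~%.1f s at 100 MB/s)\n", mem_mb * 0.20, mem_mb * 0.20 / 100
}' /proc/meminfo
```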

> Because the default cache settings as a percent of memory were
> setup back when memory was much smaller, and many people argue
> that on today's multi-GiB memory machines, they allow far too
> much dirty content to accumulate in cache before triggering
> writeback, particularly for slow spinning rust.  When the time
> expiry is set to 30 seconds but the amount of dirty data allowed
> to accumulate before triggering high priority writeback is near
> a minute's worth of writeback activity, something's wrong,
> and that's very often a large part of the reason people see
> IO related freezes in other processes trying to read stuff
> off disk, as well as other lagginess on occasion (tho as
> touched on above, that's rarer these days, due to point-to-
> point buses such as PCIE and SATA, as opposed to the old
> shared buses of PCI and EIDE).

I experimented with that and found it's not worth doing on my system.
Writeback rate is very high due to bcache, 500 MB/s, and if SSD is
congested, I could get some extra MB/s from the backing devices. This
is similar behavior to the servers I configure with BBU and writeback,
where I can see up to 2 GB/s writes for a few minutes before the
writeback cache becomes congested.

Even if I copy a lot of data around, or a lot of data from USB3 to my
FS, write rates are not an issue. The data is written a lot faster than
it comes in - or completely fits into bcache. I rarely experience IO
stalls - usually only when a multi-GB process segfaults and dumps its
core.

> FWIW, while I have ssds for the main system now, I already
> had my system tuned in this regard:
> 
> 1) I set things like emerge, kernel building, rsync, etc,
> to idle/batch priority (19 niceness), which is more
> efficient for cpu scheduling batch processes as they get
> longer time slices but at idle/lowest priority so they
> don't disturb other tasks much.

The gain is maybe a few seconds on a multi-minute build, but
interactivity is a lot better. Tho, on my system I don't see any
difference now whether I build the kernel with "make -j9" or
"nice make -j9". With bcache, even "ionice rsync" no longer has such a
high impact as it did before.

> Additionally, for the cfq IO scheduler, it sees the idle
> cpu priority and automatically lowers IO priority as well,
> so manual use of ionice isn't necessary.  (I don't believe
> the other schedulers account for this, however, which is
> one reason I asked about them, above.)

I used BFQ for a while and was very happy with it. But it's currently
not in the gentoo-patches for the kernel.

> For something like rsync that's normally IO-bound anyway,
> the primary effect would be the automatic IO-nicing due
> to the process nicing.

Almost no effect visible here due to bcache.

> That was pretty effective at that level.  But combined
> with #2, it's even more so.
> 
> 2) I tuned my vm.dirty settings to trigger at much
> lower sizes for both low priority background writeback
> and higher priority foreground writeback.  You can
> check the kernel's procfs documentation for the
> specific settings if you like, but here's what I have
> for my 16 GiB system (ssd now as I said, but I saw
> no reason to change it from the spinning rust settings
> I had, particularly since I still use spinning rust
> for my media partition).

That only led to bad behavior in edge cases for me. I prefer having a
percent-sized writeback cache now; otherwise the system would stutter
too often. It's not worth trying to micro-optimize that if you use
bcache.

> 
> Direct from that section of my /etc/sysctl.conf:
> 
> ################################################################################
> # Virtual-machine: swap, write-cache

Err, it's VM = virtual memory. ;-)

[...]

> The commented values are normal defaults.  Either ratio, percentage
> of RAM, or direct bytes can be set.  Setting one clears (zeros) the
> other.  While the ratio isn't of total RAM but of available RAM,
> total RAM's a reasonable approximation on most modern systems.
> 
> Taking the foreground, vm.dirty_ratio setting first:
> 
> Spinning rust may be as low as 30 MiB/sec thruput, and 10% of 16 gig
> of RAM is 1.6 gig, ~1600+ meg.  Doing the math that's ~53 seconds
> worth of writeback accumulated by default, before it kicks into high
> priority writeback mode. With a 30 second default timeout, that makes
> no sense at all as it's almost double the timeout!

According to bootchart, I get IO rates up to 800MB/s during boot-up.
Of course, that is reading. I trimmed down writing a lot during the
last months (e.g. by switching to noatime), after I saw how fast bcache
wears the SSD. Now I've switched to a 4x bigger SSD. Bcache doesn't
fill the cache currently, so it's not rewriting caches over and over
again. That should take a lot of writes out of the equation.

> Besides, who wants to wait nearly a minute for it to dump all that?

Dumping 16GB to my FS takes just 10-20 seconds for me. Dumping 10% of
that (worth of writeback) is thus usually done in 1-2 seconds. Even if
it is slower due to seeking, it should still be below 10s. I have no
problems with that; I only very rarely experience stalls due to
writeback. And even then, applications block for maybe a second.

> So I set that to 3%, which with 16 gigs of RAM is ~half a gig, or
> about 16 seconds worth of writeback at 30 MB/sec.  That's only about
> half the 30 second time expiry and isn't /too/ bad to wait, tho
> you'll probably notice if it takes that full 16 seconds.  But it's
> reasonable, and given the integer setting and that we want background
> set lower, 3% is getting about as low as practical.  (Obviously if I
> upped to 32 GiB RAM, I'd want to switch to the bytes setting for this
> very reason.)
> 
> The background vm.dirty_background_ratio setting is where the lower
> priority background writeback kicks in, so with luck the higher
> priority foreground limit is never reached, tho it obviously will be
> for something doing a lot of writing, like rsync often will.  So this
> should be lower than foreground.

rsync writes to a device that is write-only. No desktop process ever
reads from that device. And I also don't see this invalidating my
cache. The system feels normal during backup (except games may stutter,
as I already experienced).

I only increased nr_requests, which improves IO performance far more
than fiddling around with vm dirty ratios.

#!/bin/sh
# Raise the block-layer queue depth from the default 128 to 512 for
# each SCSI/SATA disk, allowing more requests to be queued and merged.
for nr_requests in /sys/bus/scsi/devices/[012345]:0:0:0/block/*/queue/nr_requests; do
        echo 512 >$nr_requests
#       echo 128 >$nr_requests  # kernel default, kept for reverting
done

With that setting, I see write rates up to 120MB/s to my USB3 HDD,
mostly churning around 50MB/s.
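That boot script can also be expressed as a udev rule so the value survives reboots and device re-enumeration. A sketch (the file name is hypothetical, and the device match is deliberately broad; narrow it to your disks):

```
# /etc/udev/rules.d/60-io-queue.rules (hypothetical path/name)
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
    ATTR{queue/nr_requests}="512"
```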

> With foreground set to 3%, that doesn't leave much room for
> background, but 1% is still reasonable.  That's about 160 MB or 5
> seconds worth of writeback at 30 MB/sec, so it's reasonable.
> 
> As you can see, with the size triggers lowered to something
> reasonable, I decided I could actually up the background time expiry
> from the default 5 seconds to 10.  That's still well and safely under
> the 30 second foreground time expiry, and with the stronger size
> triggers I thought it was reasonable.
> 
> I haven't actually touched vfs_cache_pressure, but that's another
> related knob you can twist.

As there's no IO going on, I doubt it'd make a difference for my
problem. I experimented with that in the past, too, and it was just not
worth tuning. I've put in bcache and tuned nr_requests; that's it.
Tuning these other knobs just doesn't work well - edge-case behavior
suffers a lot from it.

> I don't actually have any swap configured on my current machine (and
> indeed, have it disabled in the kernel config) so the swappiness
> setting doesn't matter, but I did have swap setup on my old 8 GiB
> machine.

I'm using 60GB of swap, as I use a lot of tmpfs filesystems (30GB
for /var/tmp/portage). The swap is 3x 20GB, one on each HDD, at the
same priority so it is evenly striped by the kernel. Bcache caches the
normal filesystems (not the swaps!), so I suffer almost no performance
loss for normal FS operation when the kernel starts using the swaps.
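Equal-priority striping across swap partitions is just the pri= mount option; a sketch of the corresponding /etc/fstab entries (device names are placeholders, use your own partitions):

```
# three equal-priority swap partitions, striped by the kernel
/dev/sda2  none  swap  sw,pri=10  0 0
/dev/sdb2  none  swap  sw,pri=10  0 0
/dev/sdc2  none  swap  sw,pri=10  0 0
```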

>  I was actually using the priority= option to set four swap
> partitions, one each on four different devices, to the same priority,
> effectively raid0-striping them.  My system was mdraid1 on the same
> four devices. So I upped swappiness to 100 (it's more or less
> percentage, 0-100 range) from its default 60, to force swapping out
> over dumping cache.  Even still, I seldom went more than a few
> hundred MiB into swap, so the 8 GiB was just about perfect for that
> 4-cores system.  (My new system has 6 cores and I figured the same 2
> GiB per core.  I think now a more realistic figure for my usage at
> least is say 4 GiB, plus a GiB per core, which seems to work out for
> both my old 4-core system with 8 gig RAM, and my newer 6-core system
> where 12 gig of RAM would hit about the same usage.  I have 16 gig,
> but seldom actually use the last 4 gig at all, so 12 gig would be
> perfect, and then I might configure swap, just in case.  But swap
> would be a waste with 16 gig since I literally seldom use that last 4
> gig even for cache anyway, so it's effectively my overflow area.)

Here it's almost impossible to use more than 9 GB of RAM during normal
operation. After boot (full-blown KDE), ksysguard shows 2-3 GB of RAM
usage (this excludes caches).

So I concluded: desktop systems with 12 GB of RAM are well equipped
(and that is what I ordered at work for my workplace).

> I used laptop mode, with laptop-mode-tools, on my netbook, the reason
> it's comment-listed there as of-interest, but I've no reason to use
> it on my main machine, which is wall-plug powered.  IIRC when active
> it would try to flush everything possible once the disk was already
> active, so it could spin down for longer between activations.
> Laptop-mode-tools allows configuring a bunch of other stuff to toggle
> between wall power and battery power as well, and of course one of
> the other settings it changed was the above vm.dirty_* settings, to
> much larger triggers and longer timeouts (upto 5 minutes or so,
> tho I've read of people going to extremes and setting it to 15
> minutes or even longer, tho that of course risks losing all that work
> in a crash (!!)), again, to let the drive stay spun down for longer.

Never worked for me. Something always triggered even tiny writes once
in a while and woke up the HDDs. So I stopped experimenting with that.
For a laptop, you gain much more battery life by simply swapping the
HDD for an SSD - and that's it. Simple, easy. SSDs have become cheap
enough that it's no longer worth the hassle (standard laptops come with
a 512GB HDD; pay €120 extra and you have a 512GB SSD plus an external
HDD made from what used to be the internal one).

There's just one caveat (seen in tests, and seen in the wild during my
job): Samsung SSDs die a sudden death. They go from working to
inaccessible without any prior notice when they wear out - which is
after about 3-5 years of somewhat more than normal desktop usage.

As it happens, I'm also using a 512GB Samsung SSD now, and it seems it
has no SMART indicator of write usage like my Crucial SSD had. So I'm
in for a surprise, I think. I suppose I have to figure out how to
convert "241 Total_LBAs_Written" to a lifetime estimate against the
85TB write endurance of those SSDs.
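Assuming the common convention that attribute 241 counts 512-byte units (some vendors count differently, so treat the result as a rough estimate; /dev/sda is a placeholder), the conversion is a one-liner:

```shell
# Convert SMART attribute 241 (Total_LBAs_Written) to TB written,
# assuming 512-byte LBA units, and compare against the rated endurance.
smartctl -A /dev/sda | awk '$2 == "Total_LBAs_Written" {
    tb = $NF * 512 / 1024 ^ 4
    printf "%.2f TB written (rated endurance ~85 TB)\n", tb
}'
```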

> Between the two, setting much lower writeback cache size triggers,
> and using a nice of 19 to set idle/batch mode for stuff like emerge
> and rsync, after setting that bios setting on the old system as well,
> responsiveness under heavy load, either IO or CPU, was MUCH better.

I'm using these vm dirty knobs only for servers to even-out web stalls.

> Hopefully you'll find it's similarly effective for you, assuming
> you don't already have similar settings and are STILL seeing
> the problem.

Nope. Tried. Didn't make anything better, just worse.

> Of course for things like rsync, you could probably use ionice (with
> the cfq io scheduler) instead and not bother doing normal nice on 
> rsync, but because the cfq scheduler already effectively does ionice
> on strongly niced processes, I didn't need to worry about it here.
> 
> As I said, with ssds I don't really need it that strict any more,
> but I saw no need to change it, and indeed, with spinning rust still
> for my media drive, it's still useful there.

A dirty cache is not the problem here as far as I can see. It simply
deadlocks for some unknown reason (probably).

> So there it is.  Hope it's useful. =:^)

At least it was long... ;-)


-- 
Regards,
Kai

Replies to list-only preferred.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 13:54   ` Mike Fleetwood
@ 2016-05-01 23:16     ` Duncan
  0 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2016-05-01 23:16 UTC (permalink / raw)
  To: linux-btrfs

Mike Fleetwood posted on Sun, 01 May 2016 14:54:44 +0100 as excerpted:

> On 1 May 2016 at 13:47, Duncan <1i5t5.duncan@cox.net> wrote:
>> Direct from that section of my /etc/sysctl.conf:
>>
>> ################################################################################
>> # Virtual-machine: swap, write-cache
> 
> Hi Duncan,
> 
> You mean virtual memory.

Yes, thanks for catching that.  I had figured that out at some point, but the
virtual-machine comment was from before that and I guess I never changed it.
Additionally, I'm used to VM in the machine, not memory, context, so I still
slip up and write machine even when I'm talking about memory from time to
time.  =:^(

I'll change that comment as soon as I send this, and hopefully the specific
discussion will tune my sensitivity to it enough that I quit making that
mistake in discussion from now on, as well.  So thanks again. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 16:54   ` Kai Krakow
@ 2016-05-01 23:22     ` Duncan
  2016-05-01 23:49     ` Duncan
  2016-05-02  0:23     ` Duncan
  2 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2016-05-01 23:22 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 01 May 2016 18:54:18 +0200 as excerpted:

>> > I created a backtrace in dmesg using sysrq+w. Here's the log:
>> > (https://gist.github.com/kakra/8e6a2b7ac4047fa9e345648664f5667b)  
>> 
>> FWIW, that link give me the github "this is not the web page you are
>> looking for" 404 error...  Do you have it set private or something?
> 
> Yes it's private but it's working for me in an incognito window of
> Chromium. So maybe you copied the closing bracket or something into the
> URL? Or some temporary error?

FWIW I didn't actually copy it, I just clicked the link, but my client is 
dumb enough to include closing brackets/parenthesis in links, so...

And I'm dumb enough to have forgotten that irritating bug in my client,
tho I know about it and have caught and corrected it before, so...

Thanks for pointing that out, it was indeed that closing parenthesis.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 16:54   ` Kai Krakow
  2016-05-01 23:22     ` Duncan
@ 2016-05-01 23:49     ` Duncan
  2016-05-02 13:19       ` Austin S. Hemmelgarn
  2016-05-02  0:23     ` Duncan
  2 siblings, 1 reply; 19+ messages in thread
From: Duncan @ 2016-05-01 23:49 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 01 May 2016 18:54:18 +0200 as excerpted:

> It affects all file systems. The "btrfs fi sync" is used to finish my
> rsync backup and ensure everything is written before I'm trying to
> unmount it or the system goes back to sleep.
> 
> "df" and friends also freeze on tmp (ramdisk) fs and vfat fs (my EFI
> boot partition).

That's just weird, there.  df on tmpfs triggers a stall as well?  Weird 
to the point I wonder if you're seeing a general block layer bug... only 
involving btrfs if btrfs is somehow fouling up the block layer for 
everyone else as well.  Whatever it is, it's obviously far different than 
my first guess, which now looks ridiculously wrong.  Oh, well...

But given that I've seen no other reports of it, I'd consider the things 
that are unique to your kernel config compared to others running btrfs.  
The first thing off the top of my head, bcache.  Do you happen to follow 
the bcache list (presuming it has one) as well?  Given this new 
information and the contrast with others running btrfs, I'd wonder if 
they've had similar reported.

If I understand correctly, with bcache, you can't entirely turn it off or 
bypass it as the filesystems are installed on top of it.  Have you tried 
tinkering with its writeback settings?  Alternatively, if you have the 
space/time to test, try creating a new btrfs with the same options, etc, 
but without the bcache layer, and see if it gives you similar behavior in 
terms of even df of a tmpfs hanging like that, of course without the 
positive effects of bcache as well, tho.
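Short of rebuilding without bcache, the cache can also be taken out of the write path at runtime by switching the backing device's cache mode via sysfs; a sketch (the bcache0 path is an assumption for the first bcache device, and the valid modes are writethrough, writeback, writearound, and none):

```shell
# Switch a bcache backing device to writethrough so writes go straight
# to the backing disk; "none" would bypass the cache entirely.
BCACHE_DEV=${BCACHE_DEV:-/sys/block/bcache0/bcache}
if [ -w "$BCACHE_DEV/cache_mode" ]; then
    echo writethrough > "$BCACHE_DEV/cache_mode"
    cat "$BCACHE_DEV/cache_mode"    # active mode is shown in [brackets]
fi
```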

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 16:54   ` Kai Krakow
  2016-05-01 23:22     ` Duncan
  2016-05-01 23:49     ` Duncan
@ 2016-05-02  0:23     ` Duncan
  2 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2016-05-02  0:23 UTC (permalink / raw)
  To: linux-btrfs

Kai Krakow posted on Sun, 01 May 2016 18:54:18 +0200 as excerpted:

> I didn't now about smart-live-rebuild. Interesting. ;-)

Yeah.  I didn't know about it until a few years ago, either, including 
when I first setup kde4-live, tho at that point I was doing the stable-
branch live, not master live, so there were fewer changes.  But I was 
relying only on ccache to speed things up and rebuilding all my live 
packages at every update, at the time.

Once I saw mention of smart-live-rebuild, however, and tried it, I soon 
switched to live-master, and ran it in the kde4 era for a couple years 
until development on kde4 nearly stopped as they focused on the 
frameworks5-based versions.  Then I waited until an actual frameworks5-
based plasma in /release/ versions would install and work for me, before 
working on eliminating the deps for features I wasn't going to use 
anyway, and then finally switching again to now frameworks5-based live-
git versions, only recently.  Only this time I skipped the beta release 
step and the stable-branch step and went directly from working release to 
live-git-master.

But smart-live-rebuild *definitely* makes it easier, as do my helper 
scripts that let me easily follow git logs for every package, and even do 
git bisects when necessary, without having to manually switch into the 
package's git dir first. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01  7:00 commands like "du", "df", and "btrfs fs sync" hang Kai Krakow
  2016-05-01 12:47 ` Duncan
@ 2016-05-02  0:39 ` Nicholas D Steeves
  2016-05-03  6:48   ` Kai Krakow
  1 sibling, 1 reply; 19+ messages in thread
From: Nicholas D Steeves @ 2016-05-02  0:39 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On 1 May 2016 at 03:00, Kai Krakow <hurikhan77@gmail.com> wrote:
> Hello!
>
> I'm not sure what triggers this, nor whether it is btrfs-specific. The
> filesystems have been recreated from scratch. Mainly during my rsync
> backup (from btrfs to btrfs), but not necessarily limited to rsync
> usage, my system experiences uninterruptible freezes of commands like
> "df", "du", "btrfs fs sync", and probably more.

Out of curiosity, does this occur if you don't create or delete
snapshots, or if your backup script doesn't create or delete
snapshots?  And when it occurs, are you able to go to another terminal
and run a command that you don't use often, that definitely had to be
read from disk, but that doesn't query any filesystem properties (eg:
whois)?

Thanks,
Nicholas

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 12:47 ` Duncan
  2016-05-01 13:54   ` Mike Fleetwood
  2016-05-01 16:54   ` Kai Krakow
@ 2016-05-02 13:13   ` Austin S. Hemmelgarn
  2016-05-03 15:09     ` Duncan
  2 siblings, 1 reply; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-02 13:13 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 2016-05-01 08:47, Duncan wrote:
> Meanwhile, what kernel IO scheduler do you use (deadline, noop,
> cfq,... cfq is the normal default)?  Do you use either normal
> process nice/priority or ionice to control the rsync?  What
> about cgroups?
CFQ is the default on many systems, unless you are using a new enough 
kernel, in which case the blk-mq code is used instead, which has no real 
I/O scheduler yet (although in my experience, it really doesn't need 
it).  As of right now, the SCSI layer (which in turn means USB and SATA 
as well, as they are routed through the SCSI layer), and the 
device-mapper code have the option of using this code (some distros may 
be using it already, there's a kconfig option to enable it by default 
for each of them), and the NVMe driver is unconditionally using it. 
There's also talk of the MMC driver adding support, although I don't 
know how soon.
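For reference, which scheduler a given disk is using can be read from sysfs; the bracketed entry is the active one, and blk-mq devices report "none". A small sketch (the sample line in the comment is illustrative, not from any particular machine):

```shell
# Print the active I/O scheduler per block device. The kernel brackets
# the selected scheduler, e.g. "noop deadline [cfq]"; blk-mq shows "none".
for q in /sys/block/*/queue/scheduler; do
  [ -e "$q" ] || continue
  line=$(cat "$q")
  active=$(printf '%s\n' "$line" | sed -n 's/.*\[\(.*\)\].*/\1/p')
  printf '%s: %s\n' "${q%/queue/scheduler}" "${active:-$line}"
done
```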
>
> And finally, what are your sysctl aka /proc/sys/vm settings for
> dirty_* and vfs_cache_pressure?  Have you modified these from
> defaults at all, either by changing your /etc/sysctl.(d|conf)
> vm.* settings, or by writing directly to the files in
> /proc/sys/vm, say at bootup (which is what changing the sysctl
> config actually does)?
>
> Because the default cache settings as a percent of memory were
> setup back when memory was much smaller, and many people argue
> that on today's multi-GiB memory machines, they allow far too
> much dirty content to accumulate in cache before triggering
> writeback, particularly for slow spinning rust.  When the time
> expiry is set to 30 seconds but the amount of dirty data allowed
> to accumulate before triggering high priority writeback is near
> a minute's worth of writeback activity, something's wrong,
> and that's very often a large part of the reason people see
> IO related freezes in other processes trying to read stuff
> off disk, as well as other lagginess on occasion (tho as
> touched on above, that's rarer these days, due to point-to-
> point buses such as PCIE and SATA, as opposed to the old
> shared buses of PCI and EIDE).
>
>
> FWIW, while I have ssds for the main system now, I already
> had my system tuned in this regard:
>
> 1) I set things like emerge, kernel building, rsync, etc,
> to idle/batch priority (19 niceness), which is more
> efficient for cpu scheduling batch processes as they get
> longer time slices but at idle/lowest priority so they
> don't disturb other tasks much.
>
> Additionally, for the cfq IO scheduler, it sees the idle
> cpu priority and automatically lowers IO priority as well,
> so manual use of ionice isn't necessary.  (I don't believe
> the other schedulers account for this, however, which is
> one reason I asked about them, above.)
The deadline scheduler doesn't have a concept of I/O priority, noop by 
definition can't, and the blk-mq code seems to honor it, though I'm not 
sure in what way.
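The combination Duncan describes can be sketched as follows (the rsync source and destination paths are hypothetical). With cfq the ionice is redundant, since it infers idle I/O priority from the niceness, but stating it explicitly covers schedulers that honor I/O priority without that inference:

```shell
# Run the backup at idle CPU priority (nice 19) and idle I/O class
# (ionice -c 3); paths are placeholders for a real rsync invocation.
nice -n 19 ionice -c 3 rsync -aHAX /home/ /mnt/backup/home/
```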
>
> For something like rsync that's normally IO-bound anyway,
> the primary effect would be the automatic IO-nicing due
> to the process nicing.
>
> That was pretty effective at that level.  But combined
> with #2, it's even more so.
>
> 2) I tuned my vm.dirty settings to trigger at much
> lower sizes for both low priority background writeback
> and higher priority foreground writeback.  You can
> check the kernel's procfs documentation for the
> specific settings if you like, but here's what I have
> for my 16 GiB system (ssd now as I said, but I saw
> no reason to change it from the spinning rust settings
> I had, particularly since I still use spinning rust
> for my media partition).
>
> Direct from that section of my /etc/sysctl.conf:
>
> ################################################################################
> # Virtual-machine: swap, write-cache
>
> # vm.vfs_cache_pressure = 100
> # vm.laptop_mode = 0
> # vm.swappiness = 60
> vm.swappiness = 100
>
> # write-cache, foreground/background flushing
> # vm.dirty_ratio = 10 (% of RAM)
> # make it 3% of 16G ~ half a gig
> vm.dirty_ratio = 3
> # vm.dirty_bytes = 0
>
> # vm.dirty_background_ratio = 5 (% of RAM)
> # make it 1% of 16G ~ 160 M
> vm.dirty_background_ratio = 1
> # vm.dirty_background_bytes = 0
>
> # vm.dirty_expire_centisecs = 2999 (30 sec)
> # vm.dirty_writeback_centisecs = 499 (5 sec)
> # make it 10 sec
> vm.dirty_writeback_centisecs = 1000
> ################################################################################
>
> The commented values are normal defaults.  Either ratio, percentage of RAM, or
> direct bytes can be set.  Setting one clears (zeros) the other.  While the
> ratio isn't of total RAM but of available RAM, total RAM's a reasonable
> approximation on most modern systems.
>
> Taking the foreground, vm.dirty_ratio setting first:
>
> Spinning rust may be as low as 30 MiB/sec thruput, and 10% of 16 gig of RAM
> is 1.6 gig, ~1600+ meg.  Doing the math that's ~53 seconds worth of writeback
> accumulated by default, before it kicks into high priority writeback mode.
> With a 30 second default timeout, that makes no sense at all as it's almost
> double the timeout!
>
> Besides, who wants to wait nearly a minute for it to dump all that?
>
> So I set that to 3%, which with 16 gigs of RAM is ~half a gig, or about 16
> seconds worth of writeback at 30 MB/sec.  That's only about half the 30
> second time expiry and isn't /too/ bad to wait, tho you'll probably notice
> if it takes that full 16 seconds.  But it's reasonable, and given
> the integer setting and that we want background set lower, 3% is getting
> about as low as practical.  (Obviously if I upped to 32 GiB RAM, I'd
> want to switch to the bytes setting for this very reason.)
>
> The background vm.dirty_background_ratio setting is where the lower
> priority background writeback kicks in, so with luck the higher priority
> foreground limit is never reached, tho it obviously will be for something
> doing a lot of writing, like rsync often will.  So this should be lower
> than foreground.
>
> With foreground set to 3%, that doesn't leave much room for background, but 1%
> is still reasonable.  That's about 160 MB or 5 seconds worth of writeback at
> 30 MB/sec, so it's reasonable.
I think you've got things a bit confused here (at least, what you are 
saying doesn't match the documentation, and also doesn't match my 
(limited) understanding of the code).  The background values are 
system-wide values, while the unlabeled ones (what you call foreground) 
are per-process.  IOW, the background value provides an upper bound for 
the whole system, while the foreground ones provide an upper bound for 
each individual process.  As a result, it is actually somewhat 
nonsensical to use a background value that is lower than the foreground 
value, unless you want only the system-wide limit and not a per-process 
one as well.

I find on my systems that setting a 64M limit per-process and 256M 
system wide gives a good balance between throughput and latency (I see a 
slight slowdown on my SSD's, but see reduced write latency on flash 
drives and traditional HDD's).

As for the timeouts, dirty_expire_centisecs is the longest time that a 
writeback buffer can sit around before being flushed (note that this is 
an upper bound, not something telling the kernel to wait this long 
before flushing it), and dirty_writeback_centisecs is how often the 
kernel will flush a chunk of the buffer once it starts flushing 
writeback buffers.

On my systems, I actually have those set way below the default values, 
at roughly 1 second for expiration, and 0.1 seconds for periodic 
writeback.  Based on testing, I've found that this is roughly the amount 
of time it takes for most desktop apps to fill a 64M buffer and how long 
on average it takes for my systems to flush it under light load 
respectively.
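In the same sysctl.conf style as the snippet quoted earlier in the thread, those two timeouts would look roughly like this (a sketch; the conversion of "1 second" and "0.1 seconds" into centisecond units is mine):

```
# ~1 s before a dirty buffer must be written out
vm.dirty_expire_centisecs = 100
# wake the flusher threads every ~0.1 s
vm.dirty_writeback_centisecs = 10
```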

All of this will of course change if Linux ever gets the ability to set 
per-device write-back settings (OS X and newer versions of Windows for 
example automatically disable write-back buffering on flash drives).
> As you can see, with the size triggers lowered to something reasonable, I
> decided I could actually up the background time expiry from the default 5
> seconds to 10.  That's still well and safely under the 30 second foreground
> time expiry, and with the stronger size triggers I thought it was reasonable.
>
> I haven't actually touched vfs_cache_pressure, but that's another related
> knob you can twist.
I wouldn't touch this much; it's mostly for fine-tuning specific 
applications (for example, on a system with very little RAM and 
unpredictable file access patterns, increasing this might improve 
performance).
>
> I don't actually have any swap configured on my current machine (and
> indeed, have it disabled in the kernel config) so the swappiness setting
> doesn't matter, but I did have swap setup on my old 8 GiB machine.  I was
> actually using the priority= option to set four swap partitions, one
> each on four different devices, to the same priority, effectively
> raid0-striping them.  My system was mdraid1 on the same four devices.
> So I upped swappiness to 100 (it's more or less percentage, 0-100 range)
> from its default 60, to force swapping out over dumping cache.  Even still,
> I seldom went more than a few hundred MiB into swap, so the 8 GiB was
> just about perfect for that 4-cores system.  (My new system has 6
> cores and I figured the same 2 GiB per core.  I think now a more realistic
> figure for my usage at least is say 4 GiB, plus a GiB per core, which
> seems to work out for both my old 4-core system with 8 gig RAM, and
> my newer 6-core system where 12 gig of RAM would hit about the same
> usage.  I have 16 gig, but seldom actually use the last 4 gig at all,
> so 12 gig would be perfect, and then I might configure swap, just in
> case.  But swap would be a waste with 16 gig since I literally seldom
> use that last 4 gig even for cache anyway, so it's effectively my
> overflow area.)
>
> I used laptop mode, with laptop-mode-tools, on my netbook, the reason
> it's comment-listed there as of-interest, but I've no reason to use
> it on my main machine, which is wall-plug powered.  IIRC when active
> it would try to flush everything possible once the disk was already
> active, so it could spin down for longer between activations.
> Laptop-mode-tools allows configuring a bunch of other stuff to toggle
> between wall power and battery power as well, and of course one of
> the other settings it changed was the above vm.dirty_* settings, to
> much larger triggers and longer timeouts (upto 5 minutes or so,
> tho I've read of people going to extremes and setting it to 15
> minutes or even longer, tho that of course risks losing all that work
> in a crash (!!)), again, to let the drive stay spun down for longer.
If you're using an SSD, then deferring write-back has limited value. 
Good SSD's usually use no more than 2W when writing and less than 1W 
when reading (and milliwatts when idle even when not in standby or 
powered down), as compared to usually greater than 5W power spikes to 
spin up traditional disks and roughly 2W to keep them spinning.

As far as the other things covered by 'laptop-mode', the standard 
scripts in pm-utils cover pretty much everything there.
>
>
> Between the two, setting much lower writeback cache size triggers,
> and using a nice of 19 to set idle/batch mode for stuff like emerge
> and rsync, after setting that bios setting on the old system as well,
> responsiveness under heavy load, either IO or CPU, was MUCH better.
>
> Hopefully you'll find it's similarly effective for you, assuming
> you don't already have similar settings and are STILL seeing
> the problem.
>
> Of course for things like rsync, you could probably use ionice (with
> the cfq io scheduler) instead and not bother doing normal nice on
> rsync, but because the cfq scheduler already effectively does ionice
> on strongly niced processes, I didn't need to worry about it here.
>
> As I said, with ssds I don't really need it that strict any more,
> but I saw no need to change it, and indeed, with spinning rust still
> for my media drive, it's still useful there.
>
> So there it is.  Hope it's useful. =:^)
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-01 23:49     ` Duncan
@ 2016-05-02 13:19       ` Austin S. Hemmelgarn
  2016-05-03  6:44         ` Kai Krakow
  0 siblings, 1 reply; 19+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-02 13:19 UTC (permalink / raw)
  To: linux-btrfs

On 2016-05-01 19:49, Duncan wrote:
> Kai Krakow posted on Sun, 01 May 2016 18:54:18 +0200 as excerpted:
>
>> It affects all file systems. The "btrfs fi sync" is used to finish my
>> rsync backup and ensure everything is written before I'm trying to
>> unmount it or the system goes back to sleep.
>>
>> "df" and friends also freeze on tmp (ramdisk) fs and vfat fs (my EFI
>> boot partition).
>
> That's just weird, there.  df on tmpfs triggers a stall as well?  Weird
> to the point I wonder if you're seeing a general block layer bug... only
involving btrfs if btrfs is somehow fouling up the block layer for
> everyone else as well.  Whatever it is, it's obviously far different than
> my first guess, which now looks ridiculously wrong.  Oh, well...
The question is: is it actually on a ramdisk (with a separate filesystem 
on top of that), or is it on tmpfs?  The ramdisk method is used by some 
people because it lets you do some more interesting things, but tmpfs is 
much more efficient (in terms of both processing time, and RAM usage). 
If it's on a ramdisk, then it may be a block layer bug, but if it's on 
tmpfs, then it's a VFS layer bug instead.

That said, if something in BTRFS is causing stalls in the block layer, 
that would definitely explain what is going on, as that can cause all 
I/O to stall across the whole system.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-02 13:19       ` Austin S. Hemmelgarn
@ 2016-05-03  6:44         ` Kai Krakow
  0 siblings, 0 replies; 19+ messages in thread
From: Kai Krakow @ 2016-05-03  6:44 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 2 May 2016 09:19:13 -0400
schrieb "Austin S. Hemmelgarn" <ahferroin7@gmail.com>:

> On 2016-05-01 19:49, Duncan wrote:
> > Kai Krakow posted on Sun, 01 May 2016 18:54:18 +0200 as excerpted:
> >  
> >> It affects all file systems. The "btrfs fi sync" is used to finish
> >> my rsync backup and ensure everything is written before I'm trying
> >> to unmount it or the system goes back to sleep.
> >>
> >> "df" and friends also freeze on tmp (ramdisk) fs and vfat fs (my
> >> EFI boot partition).  
> >
> > That's just weird, there.  df on tmpfs triggers a stall as well?
> > Weird to the point I wonder if you're seeing a general block layer
> > bug... only involving btrfs if btrfs is somehow fouling up the
> > block layer for everyone else as well.  Whatever it is, it's
> > obviously far different than my first guess, which now looks
> > ridiculously wrong.  Oh, well...  
> The question is: is it actually on a ramdisk (with a separate
> filesystem on top of that), or is it on tmpfs?  The ramdisk method is
> used by some people because it lets you do some more interesting
> things, but tmpfs is much more efficient (in terms of both processing
> time, and RAM usage). If it's on a ramdisk, then it may be a block
> layer bug, but if it's on tmpfs, then it's a VFS layer bug instead.

Sorry for the confusion: It's actually tmpfs - NOT ramdisk.

> That said, if something i8n BTRFS is causing stalls in the block
> layer, that would definately explain what is going on, as that can
> cause all I/O to stall across the whole system.

It seems to be triggered by access to btrfs for me, tho I don't use
tmpfs that much, so this assumption is more or less bogus.


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-02  0:39 ` Nicholas D Steeves
@ 2016-05-03  6:48   ` Kai Krakow
  2016-05-05  6:35     ` Kai Krakow
  0 siblings, 1 reply; 19+ messages in thread
From: Kai Krakow @ 2016-05-03  6:48 UTC (permalink / raw)
  To: linux-btrfs

Am Sun, 1 May 2016 20:39:31 -0400
schrieb Nicholas D Steeves <nsteeves@gmail.com>:

> On 1 May 2016 at 03:00, Kai Krakow <hurikhan77@gmail.com> wrote:
> > Hello!
> >
> > I'm not sure what triggers this, nor whether it is btrfs-specific.
> > The filesystems have been recreated from scratch. Mainly during my
> > rsync backup (from btrfs to btrfs), but not necessarily limited to
> > rsync usage, my system experiences uninterruptible freezes of
> > commands like "df", "du", "btrfs fs sync", and probably more.  
> 
> Out of curiosity, does this occur if you don't create or delete
> snapshots, or if your backup script doesn't create or delete
> snapshots?  And when it occurs, are you able to go to another terminal
> and run a command that you don't use often, that definitely had to be
> read from disk, but that doesn't query any filesystem properties (eg:
> whois)?

I only create snapshots in the destination device, not on the source. I
could try disabling the snapshotting and see if it changes things.

From my observation, it seems that only programs querying disk free or
disk usage status hang, especially all of the btrfs subcommands doing
that, in addition to traditional programs like du and df. I think
"btrfs sub delete" and friends hang as well. So your guess may very well
point in the right direction. Let me try. Coming back later...

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-02 13:13   ` Austin S. Hemmelgarn
@ 2016-05-03 15:09     ` Duncan
  0 siblings, 0 replies; 19+ messages in thread
From: Duncan @ 2016-05-03 15:09 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Mon, 02 May 2016 09:13:41 -0400 as
excerpted:

>> Direct from that section of my /etc/sysctl.conf:
>>
>>
>> # write-cache, foreground/background flushing 
>> # vm.dirty_ratio = 10 (% of RAM)
>> # make it 3% of 16G ~ half a gig 
>> vm.dirty_ratio = 3 
>> # vm.dirty_bytes = 0
>>
>> # vm.dirty_background_ratio = 5 (% of RAM)
>> # make it 1% of 16G ~ 160 M
>> vm.dirty_background_ratio = 1
>> # vm.dirty_background_bytes = 0
>>
>> The commented values are normal defaults.  Either ratio, percentage of
>> RAM, or direct bytes can be set.  Setting one clears (zeros) the other.
>>  While the ratio isn't of total RAM but of available RAM, total RAM's a
>> reasonable approximation on most modern systems.
>>
>> Taking the foreground, vm.dirty_ratio setting first:
>> [I] set that to 3%, which with 16 gigs of RAM is ~half a gig, or about
>> 16 seconds worth of writeback at 30 MB/sec.  That's only about half the
>> 30 second time expiry and isn't /too/ bad to wait, tho you'll probably
>> notice if it takes that full 16 seconds.  But it's reasonable, and
>> given the integer setting and that we want background set lower, 3% is
>> getting about as low as practical.  (Obviously if I upped to 32 GiB
>> RAM, I'd want to switch to the bytes setting for this very reason.)
>>
>> The background vm.dirty_background_ratio setting is where the lower
>> priority background writeback kicks in, so with luck the higher
>> priority foreground limit is never reached, tho it obviously will be
>> for something doing a lot of writing, like rsync often will.  So this
>> should be lower than foreground.
>>
>> With foreground set to 3%, that doesn't leave much room for background,
>> but 1% is still reasonable.  That's about 160 MB or 5 seconds worth of
>> writeback at 30 MB/sec, so it's reasonable.

> I think you've got things a bit confused here (at least, what you are
> saying doesn't match the documentation, and also doesn't match my
> (limited) understanding of the code).  The background values are
> system-wide values, while the unlabeled ones (what you call foreground)
> are per-process.  IOW, the background value provides an upper bound for
> the whole system, while the foreground ones provide an upper bound for
> each individual process.  As a result, it is actually somewhat
> nonsensical to use a background value that is lower than the foreground
> value unless you don't want to limit things per-process as well as
> system wide.

Why is the background value lower (5% background compared to 10% 
unlabeled/foreground) by default, then?

AFAIK, you're partially but not entirely correct.  The values are both 
global, and it's the kernel via the various block and filesystem drivers 
(well, except for fuse, etc, in userspace) that's actually doing the 
writing.  Where processes come into the picture with the unlabeled/
foreground value is that above that value, the kernel charges writes to 
the processes dirtying even more memory, effectively throttling them as 
more of their timeslice is spent by the kernel writing stuff out, leaving 
less of it to actually dirty even more cache pages, until a process can't 
dirty memory any faster than that memory can be written to backing store.

So while writeback with dirty memory above the unlabeled/foreground value 
is indeed happening in process context, because the kernel is charging 
write time to individual processes at that point (the part you got 
right), the value is still global and it's still actually the kernel 
doing the writing -- it's just charging the process for it at that point, 
whereas when the dirty memory drops below the unlabeled/foreground value, 
processes don't get penalized for dirtying memory -- handling the dirty 
memory is then simply a service provided by the kernel with no penalty in 
terms of process timeslice.
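The size arithmetic quoted above (10% vs. 3% of 16 GiB at ~30 MB/s) is easy to check; a quick sketch using those assumed figures:

```shell
# Seconds of queued writeback implied by a dirty_ratio, at a given
# disk throughput (integer MiB arithmetic, close enough for the argument).
ram_mib=$((16 * 1024))      # 16 GiB machine
rate=30                     # MiB/s, slow spinning rust
default_dirty=$((ram_mib * 10 / 100))   # dirty_ratio=10 -> 1638 MiB
tuned_dirty=$((ram_mib * 3 / 100))      # dirty_ratio=3  ->  491 MiB
echo "default: ~$((default_dirty / rate)) s"   # ~54 s (the thread says ~53)
echo "tuned:   ~$((tuned_dirty / rate)) s"     # ~16 s
```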

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-03  6:48   ` Kai Krakow
@ 2016-05-05  6:35     ` Kai Krakow
  2016-05-07 11:40       ` Kai Krakow
  0 siblings, 1 reply; 19+ messages in thread
From: Kai Krakow @ 2016-05-05  6:35 UTC (permalink / raw)
  To: linux-btrfs

Am Tue, 3 May 2016 08:48:14 +0200
schrieb Kai Krakow <hurikhan77@gmail.com>:

> Am Sun, 1 May 2016 20:39:31 -0400
> schrieb Nicholas D Steeves <nsteeves@gmail.com>:
> 
> > On 1 May 2016 at 03:00, Kai Krakow <hurikhan77@gmail.com> wrote:  
> > > Hello!
> > >
> > > I'm not sure what triggers this, nor whether it is btrfs-specific.
> > > The filesystems have been recreated from scratch. Mainly during my
> > > rsync backup (from btrfs to btrfs), but not necessarily limited to
> > > rsync usage, my system experiences uninterruptible freezes of
> > > commands like "df", "du", "btrfs fs sync", and probably more.    
> > 
> > Out of curiosity, does this occur if you don't create or delete
> > snapshots, or if your backup script doesn't create or delete
> > snapshots?  And when it occurs, are you able to go to another
> > terminal and run a command that you don't use often, that
> > definitely had to be read from disk, but that doesn't query any
> > filesystem properties (eg: whois)?  
> 
> I only create snapshots in the destination device, not on the source.
> I could try disabling the snapshotting and see if it changes things.
> 
> From my observation, it seems that only programs querying disk free or
> disk usage status hang, especially all of the btrfs subcommands doing
> that, in addition to traditional programs like du and df. I think
> "btrfs sub delete" and friends hang as well. So your guess may very
> well point in the right direction. Let me try. Coming back later...

With the snapshot and sync related bits disabled in my script, I no
longer experience freezing df/du/... commands.


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-05  6:35     ` Kai Krakow
@ 2016-05-07 11:40       ` Kai Krakow
  2016-05-09 17:13         ` Nicholas D Steeves
  0 siblings, 1 reply; 19+ messages in thread
From: Kai Krakow @ 2016-05-07 11:40 UTC (permalink / raw)
  To: linux-btrfs

Am Thu, 5 May 2016 08:35:37 +0200
schrieb Kai Krakow <hurikhan77@gmail.com>:

> Am Tue, 3 May 2016 08:48:14 +0200
> schrieb Kai Krakow <hurikhan77@gmail.com>:
> 
> > Am Sun, 1 May 2016 20:39:31 -0400
> > schrieb Nicholas D Steeves <nsteeves@gmail.com>:
> >   
> > > On 1 May 2016 at 03:00, Kai Krakow <hurikhan77@gmail.com>
> > > wrote:    
>  [...]  
> > > 
> > > Out of curiosity, does this occur if you don't create or delete
> > > snapshots, or if your backup script doesn't create or delete
> > > snapshots?  And when it occurs, are you able to go to another
> > > terminal and run a command that you don't use often, that
> > > definitely had to be read from disk, but that doesn't query any
> > > filesystem properties (eg: whois)?    
> > 
> > I only create snapshots in the destination device, not on the
> > source. I could try disabling the snapshotting and see if it
> > changes things.
> > 
> > From my observation, it seems that only programs querying disk free
> > or disk usage status hang, especially all of the btrfs subcommands
> > doing that, in addition to traditional programs like du and df.
> > I think "btrfs sub delete" and friends hang as well. So your guess
> > may very well point in the right direction. Let me try. Coming back
> > later...  
> 
> With the snapshot and sync related bits disabled in my script, I no
> longer experience freezing df/du/... commands.

Ah well, I still do - it just triggers much less often.

But I have now tracked it down to the automounter: as soon as I stop the
automounter for my backup device, du/df/etc. no longer hang. Still, I'm
not sure whether the problem originates from btrfs or from autofs itself.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-07 11:40       ` Kai Krakow
@ 2016-05-09 17:13         ` Nicholas D Steeves
  2016-05-09 18:20           ` Kai Krakow
  0 siblings, 1 reply; 19+ messages in thread
From: Nicholas D Steeves @ 2016-05-09 17:13 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On May 7, 2016 7:43 AM, "Kai Krakow" <hurikhan77@gmail.com> wrote:
>
> Am Thu, 5 May 2016 08:35:37 +0200
> schrieb Kai Krakow <hurikhan77@gmail.com>:
>
> > Am Tue, 3 May 2016 08:48:14 +0200
> > schrieb Kai Krakow <hurikhan77@gmail.com>:
> >
> > > Am Sun, 1 May 2016 20:39:31 -0400
> > > schrieb Nicholas D Steeves <nsteeves@gmail.com>:
> > >
> > > > On 1 May 2016 at 03:00, Kai Krakow <hurikhan77@gmail.com>
> > > > wrote:
> >  [...]
> > > >
> > > > Out of curiosity, does this occur if you don't create or delete
> > > > snapshots, or if your backup script doesn't create or delete
> > > > snapshots?  And when it occurs, are you able to go to another
> > > > terminal and run a command that you don't use often, that
> > > > definitely had to be read from disk, but that doesn't query any
> > > > filesystem properties (eg: whois)?
> > >
> > > I only create snapshots in the destination device, not on the
> > > source. I could try disabling the snapshotting and see if it
> > > changes things.
> > >
> > > From my observation, it seems that only programs querying disk free
> > > or disk usage status hang, especially all of the btrfs subcommands
> > > doing that, in addition to traditional programs like du and df.
> > > I think "btrfs sub delete" and friends hang as well. So your guess
> > > may very well point in the right direction. Let me try. Coming back
> > > later...
> >
> > With the snapshot and sync related bits disabled in my script, I no
> > longer experience freezing df/du/... commands.
>
> Ah well, I still do - it just triggers much less often.
>
> But I have now tracked it down to the automounter: as soon as I stop
> the automounter for my backup device, du/df/etc. no longer hang. Still
> I'm not sure whether the problem originates from btrfs or autofs itself.
>

Hi Kai,

If it's anything like what I've encountered it can be worked around
with careful use of sleep, sync, btrfs sub sync and/or btrfs fi sync.
Also, if it's a similar issue to my own then from what I've read on
this list it's a timing or race issue (or maybe a locking issue).
There's a Debian bug where someone is testing out different
values/strategies.  The values/strategy I use can be found in
btrfs-borg on github.  If I'm correct, you'll probably also need to
script in delays to give things time to settle with your automounter.

Of course, ideally someone will one day spend the time debugging
what's actually going on.
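The settle-before-unmount order Nick describes might look something like this (the mount point and the delay are placeholders, not the tested values from the Debian bug):

```shell
# Settle a btrfs backup volume before the automounter may unmount it.
MNT=/mnt/backup                 # hypothetical automounted backup path
sync                            # flush dirty pages system-wide
btrfs subvolume sync "$MNT"     # wait for deleted subvolumes to be cleaned
btrfs filesystem sync "$MNT"    # force a transaction commit on this fs
sleep 5                         # grace period before the automounter acts
```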

Cheers,
Nick

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: commands like "du", "df", and "btrfs fs sync" hang
  2016-05-09 17:13         ` Nicholas D Steeves
@ 2016-05-09 18:20           ` Kai Krakow
  0 siblings, 0 replies; 19+ messages in thread
From: Kai Krakow @ 2016-05-09 18:20 UTC (permalink / raw)
  To: linux-btrfs

Am Mon, 9 May 2016 13:13:53 -0400
schrieb Nicholas D Steeves <nsteeves@gmail.com>:

> On May 7, 2016 7:43 AM, "Kai Krakow" <hurikhan77@gmail.com> wrote:
> >
> > Am Thu, 5 May 2016 08:35:37 +0200
> > schrieb Kai Krakow <hurikhan77@gmail.com>:
> >  
> > > Am Tue, 3 May 2016 08:48:14 +0200
> > > schrieb Kai Krakow <hurikhan77@gmail.com>:
> > >  
>  [...]  
>  [...]  
> > >  [...]  
>  [...]  
>  [...]  
> > >
> > > With the snapshot and sync related bits disabled in my script, I
> > > no longer experience freezing df/du/... commands.  
> >
> > Ah well, I still do - it just triggers much less often.
> >
> > But I have now tracked it down to the automounter: as soon as I stop
> > the automounter for my backup device, du/df/etc. no longer hang.
> > Still I'm not sure whether the problem originates from btrfs or
> > from autofs itself. 
> 
> Hi Kai,
> 
> If it's anything like what I've encountered it can be worked around
> with careful use of sleep, sync, btrfs sub sync and/or btrfs fi sync.
> Also, if it's a similar issue to my own then from what I've read on
> this list it's a timing or race issue (or maybe a locking issue).
> There's a Debian bug where someone is testing out different
> values/strategies.  The values/strategy I use can be found in
> btrfs-borg on github.  If I'm correct, you'll probably also need to
> script in delays to give things time to settle with your automounter.
> 
> Of course, ideally someone will one day spend the time debugging
> what's actually going on.

Thanks for the pointer... Maybe I could just remove the idle timeout of
the automount unit, so it never tries to automatically unmount once
mounted. Though I'd prefer it if it kept unmounting: I don't want the
backup FS to stay online all the time.
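Removing the idle timeout could be done with a drop-in for the automount
unit.  This is only a sketch; the unit name is taken from later in this
message, and disabling the timeout entirely (rather than raising it) is
an assumption:

```ini
# /etc/systemd/system/mnt-usbdevice.automount.d/idle.conf
[Automount]
# 0 disables the idle timeout, so systemd never tries to unmount
# while btrfs may still be doing background work:
TimeoutIdleSec=0
```

After `systemctl daemon-reload`, the automount would stay mounted
indefinitely once triggered.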

Maybe such a race results from btrfs still doing background work (like
removing snapshots) after the last user of the mount point has
terminated - and then, when systemd's automounter tries to unmount, the
unmount doesn't complete and autofs hangs.

If I now experience hanging processes, I can simply "systemctl restart
mnt-usbdevice.automount" and the hanging processes start to work again.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-05-09 18:20 UTC | newest]

Thread overview: 19+ messages
-- links below jump to the message on this page --
2016-05-01  7:00 commands like "du", "df", and "btrfs fs sync" hang Kai Krakow
2016-05-01 12:47 ` Duncan
2016-05-01 13:54   ` Mike Fleetwood
2016-05-01 23:16     ` Duncan
2016-05-01 16:54   ` Kai Krakow
2016-05-01 23:22     ` Duncan
2016-05-01 23:49     ` Duncan
2016-05-02 13:19       ` Austin S. Hemmelgarn
2016-05-03  6:44         ` Kai Krakow
2016-05-02  0:23     ` Duncan
2016-05-02 13:13   ` Austin S. Hemmelgarn
2016-05-03 15:09     ` Duncan
2016-05-02  0:39 ` Nicholas D Steeves
2016-05-03  6:48   ` Kai Krakow
2016-05-05  6:35     ` Kai Krakow
2016-05-07 11:40       ` Kai Krakow
2016-05-09 17:13         ` Nicholas D Steeves
2016-05-09 18:20           ` Kai Krakow
  -- strict thread matches above, loose matches on Subject: below --
2016-05-01 10:22 Kai Krakow
