* lots of snapshots, forcing defragmentation, and figuring out which files to defrag or nodatacow
@ 2015-09-29 5:51 James Cook
2015-09-30 6:33 ` Duncan
0 siblings, 1 reply; 2+ messages in thread
From: James Cook @ 2015-09-29 5:51 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 2183 bytes --]
(Trying again in plain text and without the 100KiB log file.)
Hi linux-btrfs,
The context of these three questions is that I'm experiencing
occasional hangs for several seconds while btrfs-transacti works, and
very long boot times. General comments welcome. System info at bottom,
end part of dmesg.log attached.
Q1:
I keep a lot of read-only snapshots (> 300 total) of all of my
subvolumes and haven't deleted any so far. Is this in itself a problem
or unanticipated use of btrfs?
Q2:
I have some files that remain heavily fragmented (according to
filefrag) even after I run btrfs fi defragment. I think this happens
because btrfs doesn't want to unlink parts of the files from their
snapshotted copies. Can I tell btrfs to defragment them anyway, and
not worry about wasting space? And can I make the autodefrag mount
option do this?
For example (not all output shown):
# filefrag *
...
system@1973a03e3af1449ba5dd93362953fd5f-0000000000000001-00051f9377f11af6.journal:
553 extents found
...
# btrfs fi defragment -rf .
# filefrag *
...
system@1973a03e3af1449ba5dd93362953fd5f-0000000000000001-00051f9377f11af6.journal:
331 extents found
...
Q3:
What's the best way to tell which files are causing the hangs? My
current method is to make an educated guess (e.g. think of programs
that store large database files) then use filefrag to see if there's
fragmentation. I'm not confident I've found all the sources of bad
performance.
Thanks,
James
System info:
[root@angel-nixos:~]# uname -a
Linux angel-nixos 3.18.20 #1-NixOS SMP Thu Jan 1 00:00:01 UTC 1970
x86_64 GNU/Linux
[root@angel-nixos:~]# btrfs --version
btrfs-progs v4.2
[root@angel-nixos:~]# btrfs fi show
Label: 'AngelBtrfs' uuid: 7f4b4b5d-1ba5-46cc-b782-938e3600a427
Total devices 2 FS bytes used 1.06TiB
devid 5 size 2.00TiB used 1.07TiB path /dev/mapper/KfricBtrSec
devid 6 size 2.00TiB used 1.07TiB path /dev/mapper/SfricBtrSec
btrfs-progs v4.2
[root@angel-nixos:~]# btrfs fi df /
Data, RAID1: total=1.01TiB, used=1.01TiB
System, RAID1: total=32.00MiB, used=192.00KiB
Metadata, RAID1: total=60.00GiB, used=58.65GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
[-- Attachment #2: end_of_dmesg.log --]
[-- Type: application/octet-stream, Size: 37047 bytes --]
[ 284.033786] systemd[1]: Stopping Journal Service...
[ 284.038198] systemd[1]: Starting Journal Service...
[ 286.215587] systemd-journald[564]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 344.151729] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
[ 434.389389] systemd[1]: systemd-journald.service stop-sigabrt timed out. Terminating.
[ 446.750751] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[ 446.755534] systemd[1]: Failed to start Journal Service.
[ 446.775552] systemd[1]: Dependency failed for Trigger Flushing of Journal to Persistent Storage.
[ 446.786049] systemd[1]: Unit systemd-journald.service entered failed state.
[ 446.786108] systemd[1]: systemd-journald.service failed.
[ 446.789210] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[ 446.809674] systemd[1]: Stopping Journal Service...
[ 446.960987] systemd[1]: Starting Journal Service...
[ 446.970223] systemd-journald[571]: File /run/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 480.134384] INFO: task btrfs-transacti:223 blocked for more than 120 seconds.
[ 480.136736] Not tainted 3.18.20 #1-NixOS
[ 480.139076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 480.141437] btrfs-transacti D ffff88042fc92c40 0 223 2 0x00000000
[ 480.141441] ffff8800cbb73d78 0000000000000046 ffff880418a48880 0000000000012c40
[ 480.141444] ffff8800cbb73fd8 0000000000012c40 ffff88041953cc80 ffff880418a48880
[ 480.141446] ffff8800cbb73d88 ffff880418cf4000 ffff880418cf4070 ffff8800cb4181a0
[ 480.141448] Call Trace:
[ 480.141455] [<ffffffff814b2459>] schedule+0x29/0x70
[ 480.141482] [<ffffffffa005c6d5>] wait_for_commit.isra.14+0x45/0x80 [btrfs]
[ 480.141486] [<ffffffff8109f990>] ? wait_woken+0x60/0x60
[ 480.141497] [<ffffffffa005d4d2>] btrfs_commit_transaction+0x132/0xa70 [btrfs]
[ 480.141507] [<ffffffffa005dea3>] ? start_transaction+0x93/0x5a0 [btrfs]
[ 480.141517] [<ffffffffa0059345>] transaction_kthread+0x235/0x250 [btrfs]
[ 480.141527] [<ffffffffa0059110>] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs]
[ 480.141530] [<ffffffff81083bd9>] kthread+0xc9/0xe0
[ 480.141534] [<ffffffff81010000>] ? xen_smp_intr_init+0x10/0x220
[ 480.141536] [<ffffffff81083b10>] ? kthread_create_on_node+0x180/0x180
[ 480.141538] [<ffffffff814b65d8>] ret_from_fork+0x58/0x90
[ 480.141541] [<ffffffff81083b10>] ? kthread_create_on_node+0x180/0x180
[ 505.704286] BTRFS info (device dm-0): disk space caching is enabled
[ 505.716195] systemd[1]: Started Remount Root and Kernel File Systems.
[ 505.716910] systemd[1]: Starting Load/Save RF Kill Switch Status of rfkill0...
[ 505.719830] systemd[1]: Starting Local File Systems (Pre).
[ 505.722322] systemd[1]: Reached target Local File Systems (Pre).
[ 505.722360] systemd[1]: Mounting /tmp...
[ 505.725196] systemd[1]: Mounting /home...
[ 505.727987] systemd[1]: Mounting /btr...
[ 505.730711] systemd[1]: Mounting /boot...
[ 505.733387] systemd[1]: Starting Load/Save Random Seed...
[ 505.736061] systemd[1]: Starting Update UTMP about System Boot/Shutdown...
[ 505.741789] systemd[1]: Mounted /tmp.
[ 505.749331] systemd[1]: Mounted /btr.
[ 505.764618] systemd[1]: Started Load/Save RF Kill Switch Status of rfkill0.
[ 505.816026] EXT2-fs (sdb3): warning: mounting unchecked fs, running e2fsck is recommended
[ 505.818618] systemd[1]: Mounted /boot.
[ 505.845271] systemd[1]: Mounted /home.
[ 505.846029] systemd[1]: Starting Local File Systems.
[ 505.848239] systemd[1]: Reached target Local File Systems.
[ 505.897926] systemd[1]: Started Update UTMP about System Boot/Shutdown.
[ 505.899460] systemd-journald[571]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 505.902188] systemd[1]: Started Load/Save Random Seed.
[ 507.129606] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
[ 510.964005] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[ 510.968902] systemd[1]: Failed to start Journal Service.
[ 510.977288] systemd[1]: Dependency failed for Trigger Flushing of Journal to Persistent Storage.
[ 510.979311] systemd[1]: Unit systemd-journald.service entered failed state.
[ 510.979315] systemd[1]: systemd-journald.service failed.
[ 510.979395] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[ 510.981588] systemd[1]: Stopping Journal Service...
[ 510.985870] systemd[1]: Starting Journal Service...
[ 510.991713] systemd-journald[596]: File /run/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 512.212260] systemd-journald[596]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 512.376395] systemd[1]: Started Journal Service.
[ 512.531016] systemd-journald[596]: Received request to flush runtime journal from PID 1
[ 528.320098] systemd-journald[596]: Deleted empty journal /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system@00051e779bc895e3-df837743793ef8e2.journal~ (8912896 bytes).
[ 528.320204] systemd-journald[596]: Deleted empty journal /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system@00051e77ace49f89-328cc5f91fee7d07.journal~ (8912896 bytes).
[ 531.116932] r8169 0000:03:00.0 enp3s0: link down
[ 531.116976] r8169 0000:03:00.0 enp3s0: link down
[ 531.117053] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[ 531.272952] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2561s.bin'
[ 531.482224] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.8
[ 531.500620] IPv6: ADDRCONF(NETDEV_UP): wlp4s6: link is not ready
[ 533.895600] r8169 0000:03:00.0 enp3s0: link up
[ 533.895613] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[ 534.560379] NET: Registered protocol family 17
[ 1136.224439] fuse init (API version 7.23)
[112274.182588] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[112274.182918] systemd[1]: Unit systemd-journald.service entered failed state.
[112274.182947] systemd[1]: systemd-journald.service failed.
[112274.183641] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[112274.183823] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[112274.184460] systemd[1]: Stopping Journal Service...
[112274.184491] systemd[1]: Starting Journal Service...
[112274.658856] systemd-journald[17402]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[112274.742158] systemd[1]: Started Journal Service.
[158451.247261] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[158451.247452] systemd[1]: Unit systemd-journald.service entered failed state.
[158451.247471] systemd[1]: systemd-journald.service failed.
[158451.247816] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[158451.247869] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[158451.248096] systemd[1]: Stopping Journal Service...
[158451.248111] systemd[1]: Starting Journal Service...
[158451.690339] systemd-journald[2422]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[158451.822651] systemd[1]: Started Journal Service.
[233033.975956] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[233033.976268] systemd[1]: Unit systemd-journald.service entered failed state.
[233033.976297] systemd[1]: systemd-journald.service failed.
[233033.976863] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[233033.976968] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[233033.977382] systemd[1]: Stopping Journal Service...
[233033.977411] systemd[1]: Starting Journal Service...
[233038.512986] systemd-journald[1625]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[233038.774155] systemd[1]: Started Journal Service.
[292247.500385] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[292247.500600] systemd[1]: Unit systemd-journald.service entered failed state.
[292247.500619] systemd[1]: systemd-journald.service failed.
[292247.500954] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[292247.501008] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[292247.501235] systemd[1]: Stopping Journal Service...
[292247.501251] systemd[1]: Starting Journal Service...
[292248.076928] systemd-journald[16434]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[292248.253405] systemd[1]: Started Journal Service.
[349176.790408] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[349176.790605] systemd[1]: Unit systemd-journald.service entered failed state.
[349176.790624] systemd[1]: systemd-journald.service failed.
[349176.790960] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[349176.791017] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[349176.791250] systemd[1]: Stopping Journal Service...
[349176.791266] systemd[1]: Starting Journal Service...
[349177.409266] systemd-journald[14306]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[349177.515251] systemd[1]: Started Journal Service.
[548527.442235] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[548527.442485] systemd[1]: Unit systemd-journald.service entered failed state.
[548527.442507] systemd[1]: systemd-journald.service failed.
[548527.442828] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[548527.443021] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[548527.443336] systemd[1]: Stopping Journal Service...
[548527.443358] systemd[1]: Starting Journal Service...
[548527.559358] systemd-journald[27165]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[548528.256488] systemd[1]: Started Journal Service.
[602054.223847] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[602054.224360] systemd[1]: Unit systemd-journald.service entered failed state.
[602054.224404] systemd[1]: systemd-journald.service failed.
[602054.225092] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[602054.225522] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[602054.226274] systemd[1]: Stopping Journal Service...
[602054.226324] systemd[1]: Starting Journal Service...
[602059.969637] systemd-journald[32584]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[602060.026528] systemd[1]: Started Journal Service.
[638529.495516] ata1: exception Emask 0x40 SAct 0x0 SErr 0xc0802 action 0x7
[638529.495523] ata1: SError: { RecovComm HostInt CommWake 10B8B }
[638529.495528] ata1: hard resetting link
[638529.954956] ata1: softreset failed (device not ready)
[638529.954963] ata1: applying PMP SRST workaround and retrying
[638530.109953] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[638530.113215] ata1.00: configured for UDMA/133
[638530.113320] ata1: EH complete
[638530.154548] IPv6: ADDRCONF(NETDEV_UP): tinc.falsifian: link is not ready
[638530.184193] AMD64 EDAC driver v3.4.0
[638530.184218] EDAC amd64: DRAM ECC disabled.
[638530.184225] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[638530.365021] IPv6: ADDRCONF(NETDEV_CHANGE): tinc.falsifian: link becomes ready
[638532.708691] floppy0: no floppy controllers found
[640237.872611] systemd-journald[20204]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[665755.887777] systemd-journald[22499]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[705440.408077] systemd-journald[22798]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[766454.346446] systemd-journald[20042]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[798900.686555] systemd-journald[568]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[837105.392613] systemd-journald[23946]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[935846.508524] systemd-journald[7925]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[974374.679578] systemd-journald[32370]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1011772.002160] systemd-journald[16655]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1047931.829130] systemd-journald[23534]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1101720.396241] systemd-journald[31569]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1129694.621786] systemd-journald[10564]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1162516.109559] systemd-journald[25415]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1219271.147663] systemd-journald[22864]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1302532.395219] ata1: exception Emask 0x40 SAct 0x0 SErr 0xc0802 action 0x7
[1302532.395226] ata1: SError: { RecovComm HostInt CommWake 10B8B }
[1302532.395230] ata1: hard resetting link
[1302532.855322] ata1: softreset failed (device not ready)
[1302532.855328] ata1: applying PMP SRST workaround and retrying
[1302532.861059] AMD64 EDAC driver v3.4.0
[1302532.861088] EDAC amd64: DRAM ECC disabled.
[1302532.861095] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[1302533.010346] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[1302533.013614] ata1.00: configured for UDMA/133
[1302533.013727] ata1: EH complete
[1302533.138738] IPv6: ADDRCONF(NETDEV_UP): tinc.falsifian: link is not ready
[1302533.503254] IPv6: ADDRCONF(NETDEV_CHANGE): tinc.falsifian: link becomes ready
[1302535.722052] floppy0: no floppy controllers found
[1302538.217344] dhcpcd[8585]: segfault at ec ip 00000000004168e7 sp 00007ffc88782610 error 4 in dhcpcd[400000+3f000]
[1339562.535228] systemd-journald[10896]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1388548.974100] systemd-journald[15322]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1435252.783519] systemd-journald[3341]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1523136.004440] r8169 0000:03:00.0 enp3s0: link down
[1523138.823880] r8169 0000:03:00.0 enp3s0: link up
[1523152.137298] r8169 0000:03:00.0 enp3s0: link down
[1523155.069617] r8169 0000:03:00.0 enp3s0: link up
[1523189.283573] r8169 0000:03:00.0 enp3s0: link down
[1523192.017447] r8169 0000:03:00.0 enp3s0: link up
[1523193.301725] r8169 0000:03:00.0 enp3s0: link down
[1523196.006643] r8169 0000:03:00.0 enp3s0: link up
[1523788.871112] r8169 0000:03:00.0 enp3s0: link down
[1523791.692605] r8169 0000:03:00.0 enp3s0: link up
[1523805.452208] r8169 0000:03:00.0 enp3s0: link down
[1523808.364887] r8169 0000:03:00.0 enp3s0: link up
[1523843.940463] r8169 0000:03:00.0 enp3s0: link down
[1523846.656012] r8169 0000:03:00.0 enp3s0: link up
[1523847.064816] r8169 0000:03:00.0 enp3s0: link down
[1523849.877385] r8169 0000:03:00.0 enp3s0: link up
[1534050.092030] systemd-journald[12778]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1709686.159418] systemd-journald[10125]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1805130.007296] systemd-journald[1813]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1852161.961582] QXcbEventReader[26261]: segfault at 7f6ec509b219 ip 00007f6ec509b219 sp 00007f6ec2e2de60 error 14 in locale-archive[7f6ec5123000+6530000]
[1853062.618241] usb 1-6: new high-speed USB device number 5 using ehci-pci
[1853062.734877] usb 1-6: New USB device found, idVendor=067b, idProduct=2507
[1853062.734889] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[1853062.734896] usb 1-6: Product: Mass Storage Device
[1853062.734902] usb 1-6: Manufacturer: Prolific Technology Inc.
[1853062.734907] usb 1-6: SerialNumber: 0
[1853062.819732] usb-storage 1-6:1.0: USB Mass Storage device detected
[1853062.819803] usb-storage 1-6:1.0: Quirks match for vid 067b pid 2507: 110
[1853062.819824] scsi host10: usb-storage 1-6:1.0
[1853062.819922] usbcore: registered new interface driver usb-storage
[1853062.828404] usbcore: registered new interface driver uas
[1853063.856662] scsi 10:0:0:0: Direct-Access WDC WD25 00BB-22GUA0 08.0 PQ: 0 ANSI: 0
[1853063.858643] sd 10:0:0:0: [sdc] Adjusting the sector count from its reported value: 488395055
[1853063.858660] sd 10:0:0:0: [sdc] 488395054 512-byte logical blocks: (250 GB/232 GiB)
[1853063.859768] sd 10:0:0:0: [sdc] Write Protect is off
[1853063.859773] sd 10:0:0:0: [sdc] Mode Sense: 03 00 00 00
[1853063.860885] sd 10:0:0:0: [sdc] No Caching mode page found
[1853063.860898] sd 10:0:0:0: [sdc] Assuming drive cache: write through
[1853063.861986] sd 10:0:0:0: [sdc] Adjusting the sector count from its reported value: 488395055
[1853063.885296] sdc: sdc2 sdc3
[1853063.887642] sd 10:0:0:0: [sdc] Adjusting the sector count from its reported value: 488395055
[1853063.889865] sd 10:0:0:0: [sdc] Attached SCSI disk
[1853100.418357] NET: Registered protocol family 38
[1853111.467163] kjournald starting. Commit interval 5 seconds
[1853111.475557] EXT3-fs (dm-5): using internal journal
[1853111.475568] EXT3-fs (dm-5): mounted filesystem with ordered data mode
[1853113.205623] kjournald starting. Commit interval 5 seconds
[1853113.206536] EXT3-fs (dm-3): using internal journal
[1853113.206548] EXT3-fs (dm-3): mounted filesystem with ordered data mode
[1854004.796093] usb 1-6: USB disconnect, device number 5
[1893777.482161] systemd-journald[9755]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[1954173.452791] systemd-journald[1168]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2159674.482332] systemd-journald[26046]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2413243.809186] systemd-journald[16685]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2413244.182847] systemd-journald[16685]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[2455070.702439] IPv6: ADDRCONF(NETDEV_UP): tinc.falsifian: link is not ready
[2455070.732450] IPv6: ADDRCONF(NETDEV_CHANGE): tinc.falsifian: link becomes ready
[2455070.748594] AMD64 EDAC driver v3.4.0
[2455070.748620] EDAC amd64: DRAM ECC disabled.
[2455070.748627] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[2455074.546520] floppy0: no floppy controllers found
[2455074.587054] ata1: exception Emask 0x40 SAct 0x0 SErr 0xc0802 action 0x7
[2455074.587060] ata1: SError: { RecovComm HostInt CommWake 10B8B }
[2455074.587065] ata1: hard resetting link
[2455075.046461] ata1: softreset failed (device not ready)
[2455075.046468] ata1: applying PMP SRST workaround and retrying
[2455075.201461] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[2455075.204699] ata1.00: configured for UDMA/133
[2455075.204805] ata1: EH complete
[2455200.218465] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
[2455223.431564] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[2455223.432205] systemd[1]: Unit systemd-journald.service entered failed state.
[2455223.432275] systemd[1]: systemd-journald.service failed.
[2455223.433561] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[2455223.433719] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[2455223.434506] systemd[1]: Stopping Journal Service...
[2455223.434559] systemd[1]: Starting Journal Service...
[2455245.372012] systemd-journald[28692]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2455247.911994] systemd-journald[28692]: Deleted empty journal /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/user-1000@000520a97d68ba65-78b8fafa5aa71dcb.journal~ (8388608 bytes).
[2455248.909262] systemd[1]: Started Journal Service.
[2455249.017045] systemd[1]: Unit nix-daemon.service entered failed state.
[2455249.027029] systemd[1]: nix-daemon.service failed.
[2455281.209611] systemd[1]: user@104.service start operation timed out. Terminating.
[2455349.753201] systemd[1]: Unit user@104.service entered failed state.
[2455349.763189] systemd[1]: user@104.service failed.
[2457003.334505] usb 1-2: new high-speed USB device number 6 using ehci-pci
[2457003.477908] usb 1-2: New USB device found, idVendor=059f, idProduct=1053
[2457003.477914] usb 1-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[2457003.477917] usb 1-2: Product: P'9230 Desktop Drive
[2457003.477920] usb 1-2: Manufacturer: LaCie
[2457003.477922] usb 1-2: SerialNumber: 0000000025bc285c00e2
[2457003.478201] usb-storage 1-2:1.0: USB Mass Storage device detected
[2457003.478279] scsi host11: usb-storage 1-2:1.0
[2457014.033362] scsi 11:0:0:0: Direct-Access ST4000DM 000-1F2168 CC54 PQ: 0 ANSI: 5
[2457014.035875] sd 11:0:0:0: [sdc] 976754646 4096-byte logical blocks: (4.00 TB/3.63 TiB)
[2457014.036859] sd 11:0:0:0: [sdc] Write Protect is off
[2457014.036870] sd 11:0:0:0: [sdc] Mode Sense: 23 00 00 00
[2457014.037847] sd 11:0:0:0: [sdc] No Caching mode page found
[2457014.037856] sd 11:0:0:0: [sdc] Assuming drive cache: write through
[2457014.039594] sd 11:0:0:0: [sdc] 976754646 4096-byte logical blocks: (4.00 TB/3.63 TiB)
[2457014.067371] sdc: sdc1 sdc2
[2457014.069335] sd 11:0:0:0: [sdc] 976754646 4096-byte logical blocks: (4.00 TB/3.63 TiB)
[2457014.071469] sd 11:0:0:0: [sdc] Attached SCSI disk
[2457043.636970] BTRFS: device label pers-btr-sec devid 1 transid 7552 /dev/dm-3
[2457049.675503] BTRFS info (device dm-3): disk space caching is enabled
[2457419.484255] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[2457419.484767] systemd[1]: Unit systemd-journald.service entered failed state.
[2457419.484809] systemd[1]: systemd-journald.service failed.
[2457419.485731] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[2457419.485876] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[2457419.486624] systemd[1]: Stopping Journal Service...
[2457419.486685] systemd[1]: Starting Journal Service...
[2457421.313955] systemd-journald[27942]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2457479.726539] systemd[1]: systemd-journald.service watchdog timeout (limit 1min)!
[2457488.030875] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[2457488.031433] systemd[1]: Failed to start Journal Service.
[2457488.031527] systemd[1]: Dependency failed for Trigger Flushing of Journal to Persistent Storage.
[2457488.031643] systemd[1]: Unit systemd-journald.service entered failed state.
[2457488.031655] systemd[1]: systemd-journald.service failed.
[2457488.032579] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[2457488.032719] systemd[1]: Stopping Journal Service...
[2457488.032788] systemd[1]: Starting Journal Service...
[2457508.195526] systemd-journald[27960]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2457512.342138] systemd-journald[27960]: Deleted empty journal /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system@000520b3cc03b5ba-7946652d0a72418a.journal~ (8912896 bytes).
[2457512.865204] systemd[1]: Started Journal Service.
[2488063.167954] systemd[1]: systemd-journald.service stop-sigabrt timed out. Terminating.
[2488063.169272] systemd[1]: Starting Journal Service...
[2488070.799324] INFO: task bitcoind:18765 blocked for more than 120 seconds.
[2488070.799332] Not tainted 3.18.20 #1-NixOS
[2488070.799335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2488070.799338] bitcoind D ffff88042fd12c40 0 18765 18674 0x00000000
[2488070.799344] ffff88035b217b48 0000000000000086 ffff88041953f700 0000000000012c40
[2488070.799349] ffff88035b217fd8 0000000000012c40 ffff88041953d500 ffff88041953f700
[2488070.799353] ffff88035b217b58 ffff880418cf4900 ffff8800b31531f0 ffff8800b31531f0
[2488070.799357] Call Trace:
[2488070.799367] [<ffffffff814b2459>] schedule+0x29/0x70
[2488070.799412] [<ffffffffa005c9df>] wait_current_trans.isra.19+0x9f/0x100 [btrfs]
[2488070.799418] [<ffffffff8109f990>] ? wait_woken+0x60/0x60
[2488070.799439] [<ffffffffa005e171>] start_transaction+0x361/0x5a0 [btrfs]
[2488070.799459] [<ffffffffa005e3cb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[2488070.799479] [<ffffffffa006c32b>] btrfs_create+0x3b/0x220 [btrfs]
[2488070.799485] [<ffffffff811b9fd5>] vfs_create+0xd5/0x140
[2488070.799489] [<ffffffff811be95f>] do_last+0x114f/0x1290
[2488070.799494] [<ffffffff811bb00e>] ? link_path_walk+0x23e/0xe90
[2488070.799498] [<ffffffff811beb58>] path_openat+0xb8/0x660
[2488070.799503] [<ffffffff812b7194>] ? call_rwsem_down_read_failed+0x14/0x30
[2488070.799508] [<ffffffff811c031a>] do_filp_open+0x3a/0xb0
[2488070.799512] [<ffffffff811ccd2e>] ? __alloc_fd+0x7e/0x120
[2488070.799517] [<ffffffff811ad9ec>] do_sys_open+0x12c/0x220
[2488070.799521] [<ffffffff811adafe>] SyS_open+0x1e/0x20
[2488070.799525] [<ffffffff814b6689>] system_call_fastpath+0x12/0x17
[2488070.799536] INFO: task dhcpcd:20319 blocked for more than 120 seconds.
[2488070.799539] Not tainted 3.18.20 #1-NixOS
[2488070.799541] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2488070.799543] dhcpcd D ffff88042fc12c40 0 20319 1 0x00000000
[2488070.799547] ffff880004823af8 0000000000000082 ffff88007466e600 0000000000012c40
[2488070.799551] ffff880004823fd8 0000000000012c40 ffffffff8181b500 ffff88007466e600
[2488070.799555] ffff880004823b08 ffff880418cf4900 ffff880417f859f0 ffff880417f859f0
[2488070.799558] Call Trace:
[2488070.799563] [<ffffffff814b2459>] schedule+0x29/0x70
[2488070.799583] [<ffffffffa005c9df>] wait_current_trans.isra.19+0x9f/0x100 [btrfs]
[2488070.799587] [<ffffffff8109f990>] ? wait_woken+0x60/0x60
[2488070.799606] [<ffffffffa005e171>] start_transaction+0x361/0x5a0 [btrfs]
[2488070.799625] [<ffffffffa005e3cb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[2488070.799645] [<ffffffffa0068faa>] btrfs_setattr+0x1ba/0x2d0 [btrfs]
[2488070.799650] [<ffffffff811cbb61>] notify_change+0x261/0x370
[2488070.799654] [<ffffffff811ac476>] do_truncate+0x66/0x90
[2488070.799658] [<ffffffff811abde0>] ? do_dentry_open+0x220/0x320
[2488070.799663] [<ffffffff811be5eb>] do_last+0xddb/0x1290
[2488070.799667] [<ffffffff811bae3d>] ? link_path_walk+0x6d/0xe90
[2488070.799672] [<ffffffff811beb58>] path_openat+0xb8/0x660
[2488070.799676] [<ffffffff811ca52c>] ? destroy_inode+0x3c/0x70
[2488070.799680] [<ffffffff811c031a>] do_filp_open+0x3a/0xb0
[2488070.799684] [<ffffffff811ccd2e>] ? __alloc_fd+0x7e/0x120
[2488070.799688] [<ffffffff811ad9ec>] do_sys_open+0x12c/0x220
[2488070.799692] [<ffffffff811adafe>] SyS_open+0x1e/0x20
[2488070.799696] [<ffffffff814b6689>] system_call_fastpath+0x12/0x17
[2488070.799706] INFO: task systemd-journal:27960 blocked for more than 120 seconds.
[2488070.799709] Not tainted 3.18.20 #1-NixOS
[2488070.799710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2488070.799712] systemd-journal D ffff88042fc92c40 0 27960 1 0x00000004
[2488070.799716] ffff88000b90fa78 0000000000000082 ffff8803d6225d80 0000000000012c40
[2488070.799719] ffff88000b90ffd8 0000000000012c40 ffff88041953cc80 ffff8803d6225d80
[2488070.799723] ffff88000b90fa88 ffff880418cf4900 ffff880417f859f0 ffff880417f859f0
[2488070.799726] Call Trace:
[2488070.799730] [<ffffffff814b2459>] schedule+0x29/0x70
[2488070.799750] [<ffffffffa005c9df>] wait_current_trans.isra.19+0x9f/0x100 [btrfs]
[2488070.799754] [<ffffffff8109f990>] ? wait_woken+0x60/0x60
[2488070.799773] [<ffffffffa005e0f8>] start_transaction+0x2e8/0x5a0 [btrfs]
[2488070.799793] [<ffffffffa005e407>] btrfs_join_transaction+0x17/0x20 [btrfs]
[2488070.799812] [<ffffffffa0064063>] btrfs_dirty_inode+0x43/0xf0 [btrfs]
[2488070.799828] [<ffffffffa003d314>] ? block_rsv_add_bytes+0x24/0x60 [btrfs]
[2488070.799847] [<ffffffffa0064191>] btrfs_update_time+0x81/0xc0 [btrfs]
[2488070.799851] [<ffffffff811ca053>] update_time+0x23/0xc0
[2488070.799856] [<ffffffff810bc796>] ? current_fs_time+0x16/0x60
[2488070.799860] [<ffffffff811ca2b3>] file_update_time+0x83/0xd0
[2488070.799881] [<ffffffffa006d37c>] btrfs_page_mkwrite+0x9c/0x370 [btrfs]
[2488070.799886] [<ffffffff8116c6bf>] do_page_mkwrite+0x3f/0x90
[2488070.799891] [<ffffffff8116e725>] do_wp_page+0x675/0x8d0
[2488070.799896] [<ffffffff81170afd>] handle_mm_fault+0x70d/0xec0
[2488070.799900] [<ffffffff8116cafc>] ? tlb_finish_mmu+0x1c/0x50
[2488070.799905] [<ffffffff81058a08>] __do_page_fault+0x1c8/0x570
[2488070.799909] [<ffffffff811c63ea>] ? dput+0x7a/0x1b0
[2488070.799913] [<ffffffff811746c9>] ? vma_rb_erase+0x129/0x250
[2488070.799917] [<ffffffff81176627>] ? do_munmap+0x267/0x440
[2488070.799921] [<ffffffff81058dd2>] do_page_fault+0x22/0x30
[2488070.799925] [<ffffffff814b8538>] page_fault+0x28/0x30
[2488070.799932] INFO: task kworker/u8:7:2294 blocked for more than 120 seconds.
[2488070.799934] Not tainted 3.18.20 #1-NixOS
[2488070.799936] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2488070.799938] kworker/u8:7 D ffff88042fc92c40 0 2294 2 0x00000000
[2488070.799945] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[2488070.799948] ffff880106053678 0000000000000046 ffff8803b5712200 0000000000012c40
[2488070.799951] ffff880106053fd8 0000000000012c40 ffff88041953cc80 ffff8803b5712200
[2488070.799954] ffff880106053688 ffff880418cf4900 ffff880417f859f0 ffff880417f859f0
[2488070.799958] Call Trace:
[2488070.799963] [<ffffffff814b2459>] schedule+0x29/0x70
[2488070.799982] [<ffffffffa005c9df>] wait_current_trans.isra.19+0x9f/0x100 [btrfs]
[2488070.799986] [<ffffffff8109f990>] ? wait_woken+0x60/0x60
[2488070.800005] [<ffffffffa005e0f8>] start_transaction+0x2e8/0x5a0 [btrfs]
[2488070.800025] [<ffffffffa005e407>] btrfs_join_transaction+0x17/0x20 [btrfs]
[2488070.800044] [<ffffffffa0064c5d>] run_delalloc_nocow+0xbd/0xb10 [btrfs]
[2488070.800050] [<ffffffff8114d700>] ? account_page_dirtied+0xb0/0x110
[2488070.800071] [<ffffffffa007b6ac>] ? lock_extent_bits+0x7c/0x240 [btrfs]
[2488070.800093] [<ffffffffa007c063>] ? test_range_bit+0x33/0x170 [btrfs]
[2488070.800113] [<ffffffffa006571b>] run_delalloc_range+0x6b/0x3c0 [btrfs]
[2488070.800134] [<ffffffffa007c371>] ? find_lock_delalloc_range.constprop.46+0x1d1/0x200 [btrfs]
[2488070.800155] [<ffffffffa007c48c>] writepage_delalloc.isra.31+0xec/0x160 [btrfs]
[2488070.800176] [<ffffffffa007d4ff>] __extent_writepage+0xbf/0x2f0 [btrfs]
[2488070.800197] [<ffffffffa007da5a>] extent_write_cache_pages.isra.27.constprop.45+0x32a/0x3e0 [btrfs]
[2488070.800219] [<ffffffffa007f45d>] extent_writepages+0x4d/0x70 [btrfs]
[2488070.800239] [<ffffffffa00626a0>] ? btrfs_submit_direct+0x7b0/0x7b0 [btrfs]
[2488070.800258] [<ffffffffa005f748>] btrfs_writepages+0x28/0x30 [btrfs]
[2488070.800263] [<ffffffff8114ef2e>] do_writepages+0x1e/0x30
[2488070.800277] [<ffffffff811d8859>] __writeback_single_inode+0x49/0x240
[2488070.800281] [<ffffffff811d8f27>] writeback_sb_inodes+0x1a7/0x3e0
[2488070.800285] [<ffffffff811d91ff>] __writeback_inodes_wb+0x9f/0xd0
[2488070.800289] [<ffffffff811d9463>] wb_writeback+0x233/0x2f0
[2488070.800293] [<ffffffff811c8fc1>] ? get_nr_inodes+0x51/0x80
[2488070.800297] [<ffffffff811db78a>] bdi_writeback_workfn+0x2ba/0x450
[2488070.800302] [<ffffffff8107e6fc>] process_one_work+0x15c/0x410
[2488070.800306] [<ffffffff8107f093>] worker_thread+0x53/0x470
[2488070.800310] [<ffffffff8107f040>] ? rescuer_thread+0x270/0x270
[2488070.800315] [<ffffffff81083bd9>] kthread+0xc9/0xe0
[2488070.800320] [<ffffffff81010000>] ? xen_smp_intr_init+0x10/0x220
[2488070.800325] [<ffffffff81083b10>] ? kthread_create_on_node+0x180/0x180
[2488070.800328] [<ffffffff814b65d8>] ret_from_fork+0x58/0x90
[2488070.800333] [<ffffffff81083b10>] ? kthread_create_on_node+0x180/0x180
[2488147.297463] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[2488147.298048] systemd[1]: Unit systemd-journald.service entered failed state.
[2488147.298090] systemd[1]: systemd-journald.service failed.
[2488147.299078] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[2488147.299239] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[2488147.301087] systemd[1]: Stopping Journal Service...
[2488147.301148] systemd[1]: Starting Journal Service...
[2488151.783902] systemd-journald[2687]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2488157.710035] systemd[1]: Started Journal Service.
[2488238.953352] systemd[1]: systemd-journald.service: main process exited, code=killed, status=6/ABRT
[2488238.953883] systemd[1]: Unit systemd-journald.service entered failed state.
[2488238.953932] systemd[1]: systemd-journald.service failed.
[2488238.954879] systemd[1]: systemd-journald.service has no holdoff time, scheduling restart.
[2488238.955024] systemd[1]: Stopping Trigger Flushing of Journal to Persistent Storage...
[2488238.955063] systemd[1]: Stopping Journal Service...
[2488238.955109] systemd[1]: Starting Journal Service...
[2488241.180586] systemd-journald[3792]: File /var/log/journal/bbfa3d10a4b7f3931b8b3e3300000008/system.journal corrupted or uncleanly shut down, renaming and replacing.
[2488241.767889] systemd[1]: Started Journal Service.
[2503825.588195] usb 1-2: USB disconnect, device number 6
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: lots of snapshots, forcing defragmentation, and figuring out which files to defrag or nodatacow
2015-09-29 5:51 lots of snapshots, forcing defragmentation, and figuring out which files to defrag or nodatacow James Cook
@ 2015-09-30 6:33 ` Duncan
0 siblings, 0 replies; 2+ messages in thread
From: Duncan @ 2015-09-30 6:33 UTC (permalink / raw)
To: linux-btrfs
James Cook posted on Mon, 28 Sep 2015 22:51:05 -0700 as excerpted:
> The context of these three questions is that I'm experiencing occasional
> hangs for several seconds while btrfs-transacti works, and very long
> boot times. General comments welcome. System info at bottom,
> end part of dmesg.log attached.
>
> Q1:
>
> I keep a lot of read-only snapshots (> 300 total) of all of my
> subvolumes and haven't deleted any so far. Is this in itself a problem
> or unanticipated use of btrfs?
Very large numbers of snapshots do slow things down, but ~300 isn't what
I'd call "very large" -- we're talking thousands to tens of thousands.
My general recommendation is to keep it to ~250ish (under 300) per
snapshotted subvolume, preferably under 2000 (and if possible 1000)
total, easy enough to do even with automated frequent snapshotting (on
the order of an hour apart, initially), as long as an equally automated
snapshot thinning program is also established. At ~250 per subvolume,
1000 is 4 subvolumes worth, 2000 8 subvolumes worth.
A bit over 300, assuming they're all of the same subvolume, is getting a
bit high, but it shouldn't be causing a lot of trouble yet. It's just
time to start thinking about a thinning program.
There's one exception, quotas. Quotas continue to be an issue on btrfs;
they're on their third rewrite now and while they believe it will work
this time, there's still some serious bugs that will take a couple more
kernels to work out. And besides not working right, they dramatically
increase scalability issues. So my recommendation, unless you're
directly working with the devs to test, report problems with, and bug-
trace various quota issues, just don't run them on btrfs at this time.
If you need quotas, use a filesystem where they're mature and work. If
you don't, use btrfs without them.
Really. I've seen at least two confirmed cases posted where people
running quotas turned them off and their scaling issues disappeared.
So if you have them on, that could well be your problem, right there.
> Q2:
>
> I have some files that remain heavily fragmented (according to filefrag)
> even after I run btrfs fi defragment. I think this happens because btrfs
> doesn't want to unlink parts of the files from their snapshotted copies.
> Can I tell btrfs to defragment them anyway, and not worry about wasting
> space? And can I make the autodefrag mount option do this?
>
> For example (not all output shown):
>
> # filefrag *
> ...
>
system@1973a03e3af1449ba5dd93362953fd5f-0000000000000001-00051f9377f11af6.journal:
> 553 extents found ...
>
> # btrfs fi defragment -rf .
>
> # filefrag *
> ...
>
system@1973a03e3af1449ba5dd93362953fd5f-0000000000000001-00051f9377f11af6.journal:
> 331 extents found ...
Several points to note, here:
1) Filefrag doesn't understand btrfs compression.
If you don't use btrfs compression, this doesn't apply, but for btrfs
compressed files, filefrag reports huge numbers of extents -- generally
one per btrfs compression block (128 KiB), so 8 per MiB, 8192 per GiB of
(before compression, not like btrfs give you a way to see post-
compression file size anyway) file size.
But unless you run compress-force you won't see it everywhere, because
btrfs only compresses some files.
2) Btrfs defrag isn't snapshot aware, and will only defrag the files it's
directly pointed at, using more space as it breaks the reflinks to the
snapshotted copy. Around 3.9 snapshot-aware defrag was introduced, but
it turned out to have *severe* scalability issues, so that was rolled
back and snapshot-aware defrag was turned off again in, IIRC, 3.12 (thus
well before what you're running).
So worrying about breaking snapshot reflinks while defragging isn't going
to be your problem, that, per se, is simply not an issue.
3) What /can/ be an issue is dealt with using defrag's -t parameter. I
don't remember what the default target extent size is, but it's somewhat
smaller than you might expect, well under a gig. Extent sizes larger
than this are considered to be already defragged and aren't touched.
(While this does touch on #2 above as well, not unnecessarily breaking
reflinks to extents shared with other snapshots, the mechanism is one of
extent size, not whether the extent is shared with another snapshot. So
even if it's a new file not yet snapshotted, extents over this size won't
be touched.)
It's worth keeping in mind that btrfs' nominal data chunk size is 1 GiB.
As such, that's the nominal largest extent size as well, altho in some
cases (data chunks created on nearly empty TiB-scale filesystems) data
chunk size can be larger, multiple GiB, in which case extent size can be
larger as well.
Regardless, extent sizes > 1 GiB really aren't going to be a performance
issue anyway, so while using the -t 1G or -t 2G option is a good idea and
should reduce fragmentation further for extents between the default size
and your -t size, going above that isn't likely to do any good anyway.
Which means for multi-gig files ideal minimum fragmentation will be a
number of extents equal to their size in GiB, perhaps plus 2, assuming
the first extent is in a partially used data chunk and thus under a gig,
with the last being under a gig as well. You are extremely unlikely to
get better than that, so for file sizes over a GiB, you /will/ see
multiple extents. (Again, keeping in mind the exception where a data
chunk is multi-GiB sized itself.)
So using defrag's -t 1G is likely to get you somewhat better results, but
don't expect multi-gig files to ever be a single extent.
4) Recently (3.17?), btrfs got a new feature that allowed it to
automatically delete chunk-allocations with zero usage. This solved a
serious problem, as before this, btrfs could allocate new chunks but
couldn't deallocate them, and over time and many normal file-creation/
deletion cycles, most people ended up with a huge number of empty data
chunks, often filling their filesystem until no new metadata chunks could
be allocated, resulting in ENOSPC errors despite df reporting tens or
occasionally even hundreds of GiB free. (Also possible, but less common,
was the reverse case, all otherwise free space allocated to empty
metadata chunks, with no space left to allocate new data chunks.) A btrfs
balance start -dusage=0 -musage=0 could quickly get rid of them, but
before this feature, it had to be done manually.
Unfortunately, it appears defrag's previously great strategy of trying to
use the space in existing data chunks before allocating new ones hasn't
yet been adjusted for this, and it will first try to fill up holes in
existing data chunks before allocating new ones. With btrfs
automatically deleting empty chunks now, that means defrag will be
working with partially full chunks with free space, but where
unfortunately that free space is itself fragmented. So for a large multi-
gig file, instead of allocating new GiB sized chunks and defragging to
whole-chunk GiB sized extents, it's likely to be trying to work with much
smaller free-space extents in existing chunks.
Obviously this is going to rather negatively affect the results of a
defrag and post-defrag filefrag results will still be higher numbers of
extent counts than desired. =:^(
This is a known bug, a result of the fairly recent automated deleting
empty data chunks and defrag's reluctance to allocate new ones, that will
no doubt be fixed in time, but meanwhile, we deal with the code we have.
Here's an *entirely* *untested* idea for a workaround that I just came up
with as I was typing the explanation above. If you'd try it and report
back whether it works, we might well have at least some way to work
around the issue.
Before your defrag, do a btrfs fi df (or usage), and see what the numbers
for data are. (If you haven't run a balance recently and you see a big
spread tens of GiB, between data size and data used, try something like
btrfs balance start -dusage=20, which will only try to balance data
chunks under 20% usage. This will go much faster than balancing the
whole filesystem, and often gets you quite a bit of space returned to
unallocated from partially empty data chunks. If it doesn't have the
desired effect, try -dusage=50, or even higher, but be aware that above
50, you're dealing with mostly full chunks already, which will take far
longer to balance, with much less return for the time taken. If you do
such a balance, check again the btrfs fi df results and see how you did.)
Then do a truncate -s 100G some-not-yet-existing-file (adjusting the 100G
as appropriate for your level of unallocated space, see either btrfs fi
show or btrfs fi usage), and a sync, to ensure it's allocated in the
filesystem. Do another fi df and you should see the data size has
increased accordingly.
Now delete that file, sync again, and do a fi df to double-check that you
now have a bunch of empty data space (a spread between data size and used
of near the file's size.
Since I think btrfs doesn't delete those empty data chunks immediately,
this /should/ give defrag a bunch of empty data chunks to use, and post-
defrag filefrag results should end up much better (accounting for the
first three points above, of course)! =:^)
You might have to play with the idea a bit, perhaps using fallocate
instead of truncate, etc, but as long as you get something to allocate a
bunch of otherwise empty data chunks, delete it to empty them, and do the
defrag before btrfs has deleted those empty chunks, I think it should
work. =:^)
5) Judging by the filename of your example file, you run systemd, and
it's a systemd journal file. These files (along with database and VM
image files) have a non-append-only rewrite pattern that is known to be
problematic on COW-based filesystems such as btrfs, triggering very high
levels of fragmentation.
What version of systemd are you running? 221 and later (219 first tried
but there were bugs in the initial 219 implementation that remained in
220, that weren't worked out until 221) are more btrfs-friendly than
previous versions.
For reasons I'll explain in a moment, while I run btrfs and systemd (226
ATM), I don't know how well systemd's efforts at btrfs-friendliness
turned out, at least not as pertains to the journal, but AFAIK, they
consisted of two parts, which together should definitely help with the
problem. However, the one part won't trigger on older systems updated to
the newer btrfs-friendly systemd, without some manual intervention. I'll
explain...
First of all, it can be noted that if you're running an older systemd, or
if you're on a newer systemd now but it was updated from an older one,
you can take these steps manually, as well. I'll outline them that way,
tho newer systemd would take these steps automatically if installed
fresh, not updated from an older installation.
a) With systemd running in emergency/rescue mode or at least with
journaling shut down so it won't interfere...
b) Move/rename the existing journal directories (see the notes under (c),
you might wish to do this with all of /var, or only with /var/log or
/var/log/journal, the latter being what I'd do) to something else.
c) Create a new btrfs /subvolume/ in place of the old /directory/ that
you just moved, so it can be named the same.
The reason for this is that btrf snapshots stop at subvolume boundaries.
What we're doing here is taking the journal dir out of future / (or /var)
snapshots, by making it a subvolume of its own.
It should be noted that systemd's auto-create mechanism is the systemd-
tmpfiles-setup service, as configured in the various tmpfiles.d
locations, namely /usr/lib/tmpfiles.d/*, specifically the var.conf file,
here. If you look at that file, you'll see that systemd (226 at least)
actually creates a directory (d) for /var/log, with the subvolume
creation (v) as /var itself. Of course, only if they don't already exist
(and v/subvolume-creation gracefully degrades to the d/dir-creation
behavior it was before 219, on non-btrfs).
If I were doing it manually, however, I'd create the subvolume as
/var/log/journal, keeping /var either as its own subvolume (as systemd
now creates it) or as a directory on /, depending on whether I wanted the
rest of /var to be snapshotted separately from / or not.
d) Set /var/log/journal (creating it if you just created /var as its own
subvolume) nocow.
Systemd/tmpfiles.d has this (h line, setting +C) in /usr/lib/tmpfile.d/
journal-nocow.conf.
To do it manually, use chattr +C. Do the same for subdirs (the remote
subdir and individual machines subdirs), if applicable.
What nocow does is turn off copy-on-write for the file(s) in question,
making it/them rewrite-in-place instead of cow. As a (btrfs) side
effect, it turns off compression (if otherwise on) and checksumming for
the file as well. On a directory, nocow does nothing, /except/ that
newly created files (and subdirs) in that directory now inherit the nocow
attribute.
The idea is that our journal files should be nocow, and with the +C set
on the containing subdir, /newly/ /created/ journals should now be just
that.
e) Now copy your old journals back in from the renamed backup, but
there's a caveat...
On btrfs, for existing files that have nocow set after they already have
content (are not zero size), when they actually start behaving as nocow
isn't defined. Thus, the files have to be created new in the target
directory, in ordered to ensure that the (inherited) nocow takes effect
immediately.
The easiest way to ensure this is to copy them in from a different
filesystem (not subvolume). Assuming your memory is sufficiently large,
the easiest way to do /that/ is to copy the files to tmpfs, then copy/
move them to their new location under the nocow dir. Because it's a
cross-filesystem copy/move, that will ensure that they are actually
created in the new filesystem, not just reflinked or some such.
(AFAIK, current cp without the reflink option does the right thing
anyway, but there has been quite some discussion about making reflink
copies the default, in which case you'd have to set --reflink=no to turn
it off. Mv will always take a shortcut and simply create a reflink at
the new location, deleting the old one, if it can, which it normally can
if both the source and destination are on the same filesystem, so it'd
definitely be a problem from the same filesystem. So there's ways to do
it without using a separate filesystem as an intermediate, but the
easiest way to be sure it's right is to just use the separate filesystem
as an intermediate, and not have to worry about whether it's actually a
newly created file at the destination, because you /know/ it is due to
the cross-filesystem copy/move.)
OK, so why go to all this trouble?
First of all, nocow means the file is (normally) updated in-place, so
fragmentation isn't an issue -- as long as that remains true.
The problem is that snapshots depend on cow because they lock in place
the existing version. With the existing version locked in place,
obviously new writes must be cowed elsewhere, killing the intent of nocow.
What btrfs actually does with nocow files when a write comes in after a
snapshot, is use what some call cow1. The first write to a (4k) block
cows it elsewhere, as it normally would, but the file's nocow attribute
remains, and further writes to the now new block write in-place to it...
until the next snapshot locks it /too/ in place.
So frequent snapshots pretty well disable nocow and trigger fragmentation
just as if the file wasn't nocow... depending of course on how relatively
frequent and widely spread out those writes are into a file, vs the
frequency of snapshots locking the existing version in place.
So, what systemd did is ensure that newly created journal files are nocow,
by setting the attribute on the existing dirs containing them so they'll
inherit the attribute at creation. That's half the solution.
Unfortunately, systemd didn't get the other half of the solution, the
subvolume side, quite right. They created a subvolume at the /var
level. Which means /var will not be snapshotted with /, OK, but what if
you want to snapshot other files in /var? Now you'll setup a
snapshotting schedule for the /var subvol, but because /var/log/journal
is a directory not a subvolume of its own, it'll be included in the /var
snapshot, triggering the very same cow1 problems on the nocow journals
(tho at a different frequency if /var is snapshotted at a different
frequency) as they'd have if they were still part of the main / subvolume!
Which is why I recommend setting up /var/log/journal as its own subvolume,
to exclude it from the snapshots above (either /var if it's its own subvol
or / if not) and not snapshotting the journal subvolume at all, thus
avoiding snapshot-triggered cow1 fragmentation with the journal files.
Of course not snapshotting the journals is its own tradeoff. If you want
to snapshot them, you can do so, but try to do it at as low a frequency
as possible, and consider running defrag on the files. Depending on your
frequency of snapshotting and how frequently your journal files are
cycled based on your journal configuration, if they're actually rotated
out within a week or so and you're snapshotting also either every week or
perhaps every couple days, then fragmentation shouldn't get bad enough
due to the cow1 to be worth worrying about defrag.
6) Meanwhile, one final point to note, as hinted above, my own journal
configuration, which bypasses all this.
Basically, I did this:
a) When I converted to systemd, I kept my old syslog (syslog-ng), only
now built with the build-time-optional systemd/journal integration.
(FWIW I'm on gentoo, so toggling the systemd integration was as simple as
setting the systemd USE flag and rebuilding syslog-ng. However, binary-
based distros based on systemd will probably already have such
integration enabled in sysloggers they still ship.)
b) Because systemd's journal has some very nice current-session features
(and because it's difficult/impossible to entirely disable in any case,
but the features meant I didn't want to), I configured it to handle
current session stuff only, basically by setting things up in
journald.conf so it only used the /run/log/journal location, which being
on tmpfs by default, is memory-only and thus blown away on reboot or
when /run is unmounted.
This way I was able to keep the systemctl status last-ten-messages from
that service feature, since the journal was still operational, but only
to tmpfs. =:^)
c) I then configured my normal syslogger (syslog-ng in my case) to pull
from systemd/journald, and to continue to log to the normally configured
files in /var/log, as it normally did. And logrotate and various scripts
continue to rotate those logs as they normally would.
** Critically, because the normal syslogger writes in append-only mode,
its files aren't subject to the fragmentation that rewrite-all-over-the-
file files get. Thus, journals are subject to the problem, but the only
journals I have are in tmpfs which is memory-only so access is fast no
matter what it does, while the normal logs in (the btrfs) /var/log are
append-only and thus not subject to the problem.
So for current-session investigations, I can use the journal with its
best features, or the normal syslog files. If I'm investigating
something that happened in an earlier session/boot, I don't have the
journal to use (but also don't have the problems that writing journals to
btrfs brings), but the normal syslog files are still there and usable,
just as they always were.
IOW, I've lost none of the old log style features, while still being able
to use the new journal features in the current session, where they're of
most use anyway.
(As an additional benefit, despite scouring the journald documentation, I
found absolutely no way to filter out "noise" entries so they didn't hit
the journal at all -- it's possible to filter them on the journal-read
side, but not before they're written to journal. With syslog-ng and
presumably with other syslogs as well, filtering out such "noise" entries
before they're written to the logs at all, is simple, once the config
file format is understood, at least. For noise that happens several
times a minute, that's a lot of permanent filesystem writes that I'm
avoiding! Of course they're still stored in the tmpfs journals, using
memory, but luckily the journal's binary format does compression, so 10K
identical noise entries don't actually take up that much more room than
just one does.)
d) I did find that with journald journaling a whole session to the /run
tmpfs, instead of switching to /var/log/journal once it was available, I
did have to increase the size of the /run tmpfs some. However, before
systemd, my old init system (the sysvinit-based openrc) only used /run
for a few PID files, etc, so I had it clamped down rather tight, to
something like 20 MiB max. Now it's 512 MiB, half a GiB, reasonable on
my 16 GiB memory system.
I then had to tweak the journald.conf settings a bit, so it'd use most of
that half-GiB for itself (IIRC it uses only 50% by default, for safety,
then stops journaling), while still leaving a bit of room for the
standard PID files and the like.
FWIW, these are my non-default journald.conf settings; others remain at
default:
ForwardToSyslog=yes
RuntimeKeepFree=48M
RuntimeMaxFileSize=64M
RuntimeMaxUse=448M
Storage=volatile
TTYPath=/dev/tty12
Storage=volatile is the critical one for confining journals to tmpfs.
ForwardToSyslog=yes simply tells journald that I have a syslogger too,
and I want it to get the logs. The Runtime* settings determine systemd's
usage on that tmpfs, and TTYPath is more or less unrelated. (Gentoo
defaults to printing the messages log to tty12 when it's syslog, and here
I just decided to let journald do that, giving me an easy way to check on
the "noise" that syslog-ng is filtering if I want to, instead of having
syslog-ng print to that tty. Obviously there's a corresponding non-
default setting in syslog-ng's config turning off printing to the tty,
there.)
So there you have it, why I don't have to worry about journal behavior on
btrfs at all, should you want to do similar. Otherwise, simply follow
the recommendations in point #5 (as well as the others), and your
problems should go down dramatically, tho obviously I prefer the method I
described here in point #6, thus not having to deal with the problem, for
journal files at least, at all.
> Q3:
>
> What's the best way to tell which files are causing the hangs? My
> current method is to make an educated guess (e.g. think of programs that
> store large database files) then use filefrag to see if there's
> fragmentation. I'm not confident I've found all the sources of bad
> performance.
That one I don't really have a good answer for... *except* to remind you
that systemd, which you do appear to be using, has reasonably good
boottime taken reports available. If the problem's primarily at boot
time or it at least exists there as well, as seems to be the case, you
should be able to use those to at least get some idea of what services
are taking the time, so you can focus on them.
One more thing for completeness, tho based on the below it's unlikely to
apply to you. People report that when their filesystem has many devices
(more than four, so not applying to your two), the mount will sometimes
take so long it times out, tho they can mount the filesystem fine at the
command prompt. One workaround here is to simply lengthen the timeout
for that mount service. IIRC some people had ideas for shortening that
mount time, but I don't remember what they were.
... And few more (short) comments below the system info...
> System info:
>
> [root@angel-nixos:~]# uname -a Linux angel-nixos 3.18.20 #1-NixOS SMP
> Thu Jan 1 00:00:01 UTC 1970 x86_64 GNU/Linux
>
> [root@angel-nixos:~]# btrfs --version btrfs-progs v4.2
>
> [root@angel-nixos:~]# btrfs fi show Label: 'AngelBtrfs' uuid:
> 7f4b4b5d-1ba5-46cc-b782-938e3600a427
> Total devices 2 FS bytes used 1.06TiB
> devid 5 size 2.00TiB used 1.07TiB path /dev/mapper/[snip]
> devid 6 size 2.00TiB used 1.07TiB path /dev/mapper/[snip]
>
> btrfs-progs v4.2
>
> [root@angel-nixos:~]# btrfs fi df /
> Data, RAID1: total=1.01TiB, used=1.01TiB
> System, RAID1: total=32.00MiB, used=192.00KiB
> Metadata, RAID1: total=60.00GiB, used=58.65GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
3.18 series kernel, second to last LTS series, so you're good there.
4.2 userspace, current AFAIK.
Btrfs is raid1 with two devices, and seems a bit more than half full (2
TiB devices, just over 1 TiB data plus 60 GiB metadata on each). The fi
df says data is basically full, with metadata close enough as well, but
as I implied, nearly a full TiB unallocated, so it appears the empty-
chunk removal is functioning fine, and the filesystem is healthy in terms
of space available.
All around, pretty healthy! =:^) The only thing I'd suggest in general
is to setup a schedule of snapshot thinning, before it becomes a
problem. But at ~300 snapshots, it shouldn't be anything like a problem
yet.
And as I said above, disable btrfs quotas if they're turned on (unless
you're specifically helping to fix them), as right now they're simply
more trouble than they're worth, and we have posts demonstrating that
very point.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2015-09-30 6:33 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-29 5:51 lots of snapshots, forcing defragmentation, and figuring out which files to defrag or nodatacow James Cook
2015-09-30 6:33 ` Duncan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).