From: Martin Steigerwald <Martin@lichtvoll.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: How to refresh degraded BTRFS? free space fragmentation, file fragmentation...
Date: Wed, 16 Jan 2013 21:39:10 +0100
Message-ID: <201301162139.11140.Martin@lichtvoll.de>
In-Reply-To: <201212091212.26248.Martin@lichtvoll.de>
On Sunday, 9 December 2012, Martin Steigerwald wrote:
> Hi!
>
> I have been running BTRFS on some systems for more than two years. My
> experience so far is: Performance at the beginning is pretty good, but some
> of my more frequently used BTRFS filesystems degrade badly in different
> areas. On some workloads pretty quickly.
>
> There are also some filesystems, however, that did not degrade that badly.
> These have way more free space left than the ones that degraded badly:
> about 900 GB free on my eSATA backup disk with BTRFS, which is also quite
> new, and about 80 GB free on my BTRFS RAID 1 local home disk, where I can
> build Debian packages or kernels and such without the restrictions NFS
> brings (root squash). These still appear to be fine, but I redid the local
> home one with mkfs.btrfs -n 32768 and -l 32768 not too long ago. I think
> it was quite fine before anyway, so I might have overdone it here.
> This already points at a way to prevent some degradation of BTRFS
> filesystems: Leave more free space.
>
>
> 1) fsync speed on my ThinkPad T23 has gone down that much that I use
[…]
It would be interesting to retry this after the latest fsync improvements.
> 2) File fragmentation: Example with a SUSE Manager VirtualBox on an
[…]
> 3) Freespace fragmentation on the / filesystem on this ThinkPad T520 with
> Intel SSD 320:
>
> === fstrim ===
>
> merkaba:~> /usr/bin/time fstrim -v /
> /: 6849871872 bytes were trimmed
> 0.00user 5.99system 0:44.69elapsed 13%CPU (0avgtext+0avgdata 752maxresident)k
> 0inputs+0outputs (0major+237minor)pagefaults 0swaps
>
> It took a second or two in the beginning.
>
>
> atop:
>
> LVM | rkaba-debian | busy 91% | read 0 | write 10313 | MBw/s 67.48 | avio 0.20 ms |
> […]
> DSK | sda | busy 90% | read 0 | write 10319 | MBw/s 67.54 | avio 0.19 ms |
> […]
>
> PID TID RUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/2
> 6085 - root 1 0.29s 0.00s 0K 0K 0K 0K -- - D 0 13% fstrim
>
>
> 10000 write requests in 10 seconds.
I was able to refresh my BTRFS with regard to this issue on the 11th of January:
merkaba:~> btrfs filesystem df /
Data: total=15.10GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.12MB
Metadata: total=8.00MB, used=0.00
merkaba:~> btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 25 chunks
merkaba:~> btrfs filesystem df /
Data: total=15.01GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.05MB
Metadata: total=8.00MB, used=0.00
merkaba:~> btrfs balance start -d /
Done, had to relocate 16 out of 25 chunks
merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=647.72MB
Metadata: total=8.00MB, used=0.00
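The effect of the data balance shows up as the gap between total= (allocated chunks) and used= in the listings above. A minimal Python sketch of that arithmetic, assuming the GB/MB suffixes printed by this btrfs-progs version are powers of 1024; the helper names to_bytes, parse_df and slack_gb are made up for illustration and are not part of btrfs-progs:

```python
import re

# Assumption: this btrfs-progs version prints binary units (1GB = 1024**3 bytes).
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(text):
    """Convert a size such as '15.10GB' or '0.00' to a byte count."""
    match = re.fullmatch(r"([0-9.]+)(KB|MB|GB|TB)?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * UNITS.get(unit, 1)


def parse_df(output):
    """Map each 'Type[, Profile]' label to a (total, used) tuple in bytes."""
    result = {}
    for line in output.splitlines():
        match = re.match(r"\s*(.+?): total=([^,\s]+), used=(\S+)", line)
        if match:
            label, total, used = match.groups()
            result[label] = (to_bytes(total), to_bytes(used))
    return result


def slack_gb(output, label="Data"):
    """Allocated-but-unused space for one label, in GB (1024**3 bytes)."""
    total, used = parse_df(output)[label]
    return (total - used) / 1024**3


# Lines copied from the first and the third listing above.
BEFORE = """\
Data: total=15.10GB, used=11.06GB
Metadata, DUP: total=1.75GB, used=654.12MB
"""
AFTER = """\
Data: total=11.09GB, used=11.06GB
Metadata, DUP: total=1.75GB, used=647.72MB
"""

print(f"Data slack before balance: {slack_gb(BEFORE):.2f}GB")
print(f"Data slack after balance:  {slack_gb(AFTER):.2f}GB")
```

With the numbers from this thread, the data chunks go from roughly 4 GB of allocated-but-unused space to almost none, while used= stays at 11.06GB.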
merkaba:~> /usr/bin/time -v fstrim -v /
/: 2246623232 bytes were trimmed
Command being timed: "fstrim -v /"
User time (seconds): 0.00
System time (seconds): 2.34
Percent of CPU this job got: 10%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.84
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 748
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 239
Voluntary context switches: 110690
Involuntary context switches: 1426
Swaps: 0
File system inputs: 16
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
merkaba:~> btrfs balance start -fmconvert=single /
Done, had to relocate 8 out of 20 chunks
merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=1.75GB, used=642.92MB
[406005.831307] btrfs: balance will reduce metadata integrity, use force if you want this
[406129.187057] btrfs: force reducing metadata integrity
[406129.199133] btrfs: relocating block group 9290383360 flags 36
[406132.645299] btrfs: found 6989 extents
[406132.673390] btrfs: relocating block group 8082423808 flags 36
[406135.807065] btrfs: found 6906 extents
[406135.841572] btrfs: relocating block group 7948206080 flags 36
[406138.413270] btrfs: found 4514 extents
[406138.435382] btrfs: relocating block group 6740246528 flags 36
[406142.572004] btrfs: found 10667 extents
[406142.638079] btrfs: relocating block group 6606028800 flags 36
[406146.272095] btrfs: found 19844 extents
[406146.289729] btrfs: relocating block group 6471811072 flags 36
[406149.136422] btrfs: found 14850 extents
[406149.159510] btrfs: relocating block group 29360128 flags 36
[406183.637010] btrfs: found 116645 extents
[406183.653225] btrfs: relocating block group 20971520 flags 34
[406183.671958] btrfs: found 1 extents
The metadata allocation was still at its old size, hence a regular metadata balance:
merkaba:~> btrfs balance start -m /
Done, had to relocate 8 out of 20 chunks
merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=768.00MB, used=643.38MB
[406270.880962] btrfs: relocating block group 31801212928 flags 2
[406270.961955] btrfs: found 1 extents
[406270.976857] btrfs: relocating block group 31532777472 flags 4
[406270.990729] btrfs: relocating block group 31264342016 flags 4
[406271.006172] btrfs: relocating block group 30995906560 flags 4
[406271.020158] btrfs: relocating block group 30727471104 flags 4
[406271.480442] btrfs: found 5187 extents
[406271.515768] btrfs: relocating block group 30459035648 flags 4
[406277.158280] btrfs: found 54593 extents
[406277.173024] btrfs: relocating block group 30190600192 flags 4
[406284.680294] btrfs: found 63749 extents
[406284.756582] btrfs: relocating block group 29922164736 flags 4
[406290.907101] btrfs: found 59530 extents
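The "flags" values in the kernel messages above are bit masks that combine the chunk type with its profile. A small Python sketch for decoding them, assuming the bit positions of the kernel's BTRFS_BLOCK_GROUP_* constants; the function name decode_flags is hypothetical:

```python
# Bit values as defined for BTRFS_BLOCK_GROUP_* in the kernel's btrfs headers.
# A value with no profile bit set means the "single" profile.
BLOCK_GROUP_BITS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_flags(flags):
    """Return the names of all known bits set in a block group flags value."""
    return [name for bit, name in BLOCK_GROUP_BITS.items() if flags & bit]


# The four distinct values that appear in the dmesg excerpts above.
for value in (36, 34, 4, 2):
    print(value, "|".join(decode_flags(value)))
```

Read this way, the convert run relocated METADATA|DUP (36) and SYSTEM|DUP (34) block groups, and the later plain metadata balance touched block groups without a profile bit (4 and 2), which fits the DUP to single conversion.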
merkaba:~> df -hT /
Filesystem Type Size Used Avail Use% Mounted on
/dev/dm-0 btrfs 19G 12G 6,8G 64% /
merkaba:~> /usr/bin/time -v fstrim -v /
/: 5472256 bytes were trimmed
Command being timed: "fstrim -v /"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 50%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 748
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 238
Voluntary context switches: 12
Involuntary context switches: 3
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
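To put the three fstrim runs side by side, the elapsed field of /usr/bin/time has to be converted to seconds first. A minimal Python sketch; elapsed_seconds is a made-up helper and the RUNS table only restates numbers already shown in this thread:

```python
def elapsed_seconds(text):
    """Convert GNU time's 'h:mm:ss' or 'm:ss.cc' elapsed field to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# (label, bytes trimmed, elapsed) copied from the runs in this thread.
RUNS = [
    ("before balance", 6849871872, "0:44.69"),
    ("after data balance", 2246623232, "0:21.84"),
    ("after metadata balance", 5472256, "0:00.00"),
]

for label, trimmed, elapsed in RUNS:
    mib = trimmed / 1024**2
    secs = elapsed_seconds(elapsed)
    print(f"{label:24s} {mib:10.2f} MiB trimmed in {secs:6.2f} s")
```

The amount trimmed drops along with the wall clock time, so the timings are not a like-for-like comparison of the same amount of work.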
Today it is still fast:
merkaba:~#1> /usr/bin/time -v fstrim /
Command being timed: "fstrim /"
User time (seconds): 0.00
System time (seconds): 0.03
Percent of CPU this job got: 17%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 708
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 227
Voluntary context switches: 736
Involuntary context switches: 35
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Boot time seems a tad bit slower, though:
merkaba:~> systemd-analyze
Startup finished in 5495ms (kernel) + 6331ms (userspace) = 11827ms
merkaba:~> systemd-analyze blame
3051ms cups.service
2330ms dirmngr.service
2267ms postfix.service
1411ms schroot.service
1385ms lvm2.service
1230ms network-manager.service
1128ms ssh.service
1117ms acpi-fakekey.service
1112ms avahi-daemon.service
1061ms privoxy.service
1010ms systemd-logind.service
721ms loadcpufreq.service
646ms colord.service
552ms kdm.service
533ms networking.service
532ms keyboard-setup.service
463ms remount-rootfs.service
368ms bootlogs.service
349ms udev.service
327ms console-kit-log-system-start.service
326ms postgresql.service
322ms binfmt-support.service
316ms acpi-support.service
315ms qemu-kvm.service
310ms sys-kernel-debug.mount
309ms dev-mqueue.mount
309ms anacron.service
303ms atd.service
297ms sys-kernel-security.mount
282ms cron.service
282ms dev-hugepages.mount
272ms lightdm.service
271ms console-kit-daemon.service
271ms lirc.service
268ms lxc.service
259ms cpufrequtils.service
259ms mdadm.service
252ms openntpd.service
240ms smartmontools.service
240ms alsa-utils.service
237ms run-user.mount
237ms speech-dispatcher.service
230ms udftools.service
229ms run-lock.mount
229ms systemd-remount-api-vfs.service
224ms ebtables.service
214ms openbsd-inetd.service
208ms motd.service
199ms hdparm.service
198ms irqbalance.service
190ms mountdebugfs.service
181ms saned.service
160ms systemd-user-sessions.service
157ms polkitd.service
147ms screen-cleanup.service
146ms console-setup.service
141ms networking-routes.service
140ms pppd-dns.service
130ms rc.local.service
130ms jove.service
128ms sysstat.service
112ms rsyslog.service
111ms udev-trigger.service
103ms home.mount
93ms systemd-sysctl.service
89ms boot.mount
85ms dns-clean.service
84ms kbd.service
66ms upower.service
60ms systemd-tmpfiles-setup.service
53ms openvpn.service
37ms boot-efi.mount
27ms udisks.service
22ms sysfsutils.service
22ms mdadm-raid.service
20ms proc-sys-fs-binfmt_misc.mount
18ms tmp.mount
2ms sys-fs-fuse-connections.mount
> vmstat 1:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 3 0 1963688 3943380 156972 1827836 0 0 0 0 5421 15781 6 6 88 0
> 0 0 1963688 3943132 156972 1827852 0 0 0 0 5733 16478 9 7 83 0
> 1 0 1963688 3943008 156972 1827992 0 0 0 0 5050 14434 0 4 96 0
> 1 0 1963688 3949768 156972 1826708 0 0 0 0 5246 14960 2 5 93 0
> 0 0 1963688 3949644 156980 1826712 0 0 0 36 5104 14996 1 4 94 0
> 0 0 1963688 3949768 156980 1826720 0 0 0 0 5102 15210 2 4 94 0
> 3 0 1963688 3949644 156980 1826720 0 0 0 0 5321 15995 4 7 89 0
> 0 0 1963688 3949396 156980 1827188 0 0 0 0 5316 15616 6 5 88 0
> 1 0 1963688 3949148 156980 1827188 0 0 0 0 5102 14944 1 4 95 0
> 1 0 1963688 3949272 156980 1827188 0 0 0 0 5510 15928 5 6 89 0
> 1 0 1963688 3949272 156980 1827188 0 0 0 52 5107 15054 2 4 94 0
> 0 0 1963688 3949396 156980 1826868 0 0 0 4 4930 14567 1 4 95 0
> 1 0 1963688 3949396 156988 1826828 0 0 0 52 5132 15014 2 5 93 0
> 3 0 1963688 3949396 156988 1826836 0 0 0 0 5015 14447 1 4 95 0
> 0 0 1963688 3949520 156988 1826836 0 0 0 0 5233 15652 3 6 91 0
> 1 0 1963684 3949612 156988 1827172 0 0 0 3032 2546 7555 6 4 84 6
>
> After fstrim:
>
> 0 0 1963684 3944244 157016 1827752 0 0 0 0 357 1018 2 1 97 0
> 1 0 1963684 3943776 157024 1827776 0 0 0 64 634 1660 4 2 93 0
> 0 0 1963684 3943872 157024 1827784 0 0 0 0 180 473 0 0 99 0
>
>
> The I/O activity does not seem to be reflected in vmstat, I bet because the
> page cache is not involved.
> === fallocate ===
>
> merkaba:/var/tmp> /usr/bin/time fallocate -l 2G fallocate-test
> 0.00user 118.85system 2:00.50elapsed 98%CPU (0avgtext+0avgdata 720maxresident)k
> 14912inputs+49112outputs (0major+227minor)pagefaults 0swaps
Now, let's try this:
merkaba:/var/tmp> /usr/bin/time -v fallocate -l 2G fallocate-test
Command being timed: "fallocate -l 2G fallocate-test"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 80%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 724
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 231
Voluntary context switches: 5
Involuntary context switches: 6
Swaps: 0
File system inputs: 80
File system outputs: 72
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
There we go :)
> Filesystem type is: 9123683e
> File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
> ext logical physical expected length flags
> 0 0 2626450 2048
> 1 2048 3215128 2628498 2040
> 2 4088 3408631 3217168 2032
> 3 6120 3430045 3410663 2024
> 4 8144 3439999 3432069 2016
> 5 10160 3474610 3442015 1004
> 6 11164 3743715 3475614 1002
[…]
> fallocate-test: 4556 extents found
merkaba:/var/tmp> filefrag -v fallocate-test
Filesystem type is: 9123683e
File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 8501248 524288 eof
fallocate-test: 1 extent found
Yes, that's the same filesystem :)
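For a feel of what the two filefrag results mean, the average extent size follows directly from the block count and the extent count. A short Python sketch using only the figures printed by filefrag above; the function name average_extent_kib is made up:

```python
BLOCK_SIZE = 4096     # bytes, as reported by filefrag
FILE_BLOCKS = 524288  # blocks in the 2G test file, as reported by filefrag


def average_extent_kib(extents, blocks=FILE_BLOCKS, block_size=BLOCK_SIZE):
    """Average extent size in KiB for a file split into `extents` extents."""
    return blocks * block_size / extents / 1024


print(f"4556 extents: {average_extent_kib(4556):.0f} KiB per extent on average")
print(f"   1 extent:  {average_extent_kib(1):.0f} KiB per extent")
```

In the fragmented case the average comes out at well under half a MiB per extent, while after the balance the whole 2G file sits in a single extent.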
> But:
>
> merkaba:/var/tmp> /usr/bin/time rm fallocate-test
> 0.00user 0.24system 0:00.38elapsed 63%CPU (0avgtext+0avgdata 784maxresident)k
> 4464inputs+36184outputs (0major+243minor)pagefaults 0swaps
merkaba:/var/tmp> /usr/bin/time rm fallocate-test
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 784maxresident)k
0inputs+24outputs (0major+243minor)pagefaults 0swaps
> Some more information on the filesystem in question:
>
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi sh
> failed to read /dev/sr0
> Label: 'debian' uuid: […]
> Total devices 1 FS bytes used 13.56GB
> devid 1 size 18.62GB used 18.62GB path /dev/dm-0
>
> Btrfs v0.19-239-g0155e84
>
>
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi df /
> Disk size: 18.62GB
> Disk allocated: 18.62GB
> Disk unallocated: 0.00
> Used: 13.56GB
> Free (Estimated): 3.31GB (Max: 3.31GB, min: 3.31GB)
> Data to disk ratio: 91 %
>
>
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi disk-usage /
> Data,Single: Size:15.10GB, Used:12.94GB
> /dev/dm-0 15.10GB
>
> Metadata,Single: Size:8.00MB, Used:0.00
> /dev/dm-0 8.00MB
>
> Metadata,DUP: Size:1.75GB, Used:630.11MB
> /dev/dm-0 3.50GB
>
> System,Single: Size:4.00MB, Used:0.00
> /dev/dm-0 4.00MB
>
> System,DUP: Size:8.00MB, Used:4.00KB
> /dev/dm-0 16.00MB
>
> Unallocated:
> /dev/dm-0 0.00
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs dev disk-usage /
> /dev/dm-0 18.62GB
> Data,Single: 15.10GB
> Metadata,Single: 8.00MB
> Metadata,DUP: 3.50GB
> System,Single: 4.00MB
> System,DUP: 16.00MB
> Unallocated: 0.00
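The listing above shows why the device was fully allocated: a DUP chunk takes twice its logical size on disk. A rough Python sketch of that bookkeeping, comparing the quoted state with the "btrfs filesystem df" output after the balance runs earlier in this mail; the names COPIES and raw_gb are made up, and the sizes are the rounded values as printed, so small rounding differences are expected:

```python
# Raw copies written per logical byte for the two profiles seen in this thread.
COPIES = {"single": 1, "DUP": 2}

DEVICE_GB = 18.62  # device size as printed by "btrfs fi sh"


def raw_gb(chunks):
    """Sum raw device space for (logical size in GB, profile) pairs."""
    return sum(size * COPIES[profile] for size, profile in chunks)


# Quoted state from December: data, metadata and system chunks.
BEFORE = [
    (15.10, "single"),     # Data
    (8 / 1024, "single"),  # Metadata, single
    (1.75, "DUP"),         # Metadata, DUP
    (4 / 1024, "single"),  # System, single
    (8 / 1024, "DUP"),     # System, DUP
]
# State after the data balance, the convert to single and the metadata balance.
AFTER = [
    (11.09, "single"),       # Data
    (768 / 1024, "single"),  # Metadata
    (36 / 1024, "single"),   # System
]

for label, chunks in (("before", BEFORE), ("after", AFTER)):
    allocated = raw_gb(chunks)
    unallocated = max(0.0, DEVICE_GB - allocated)
    print(f"{label}: allocated {allocated:.2f}GB, unallocated {unallocated:.2f}GB")
```

Under these assumptions the unallocated space goes from essentially nothing to somewhat under 7GB, which is in the same ballpark as the free space df reports after the balance.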
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7