Re: How to refresh degraded BTRFS? free space fragmentation, file fragmentation...

From: Martin Steigerwald <Martin@lichtvoll.de>
To: linux-btrfs@vger.kernel.org
Subject: Re: How to refresh degraded BTRFS? free space fragmentation, file fragmentation...
Date: Wed, 16 Jan 2013 21:39:10 +0100
Message-ID: <201301162139.11140.Martin@lichtvoll.de>
In-Reply-To: <201212091212.26248.Martin@lichtvoll.de>

On Sunday, 9 December 2012, Martin Steigerwald wrote:
> Hi!
> 
> I have been running BTRFS on some systems for more than two years. My
> experience so far: performance is pretty good at the beginning, but some of
> my more heavily used BTRFS filesystems degrade badly in different areas, on
> some workloads pretty quickly.
> 
> Some filesystems, however, did not degrade that badly. These are the ones
> with far more free space left than those that degraded badly: about
> 900 GB free on my eSATA backup disk with BTRFS, which is also quite new,
> and about 80 GB free on my BTRFS RAID 1 local home disk, where I can build
> Debian packages or kernels and such without the restrictions NFS brings
> (root squash). These still appear to be fine. I redid the local home one
> with mkfs.btrfs -n 32768 and -l 32768 not too long ago, but I think it was
> quite fine before anyway, so I might have overdone it here.
> This already points at a way to prevent some degradation of BTRFS
> filesystems: leave more free space.
> 
> 
> 1) fsync speed on my ThinkPad T23 has gone down that much that I use
[…]

It would be interesting to retry this after the latest fsync improvements.

> 2) File fragmentation: Example with a SUSE Manager VirtualBox on an
[…]

> 3) Freespace fragmentation on the / filesystem on this ThinkPad T520 with
> Intel SSD 320:
> 
> === fstrim ===
> 
> merkaba:~> /usr/bin/time fstrim -v /
> /: 6849871872 bytes were trimmed
> 0.00user 5.99system 0:44.69elapsed 13%CPU (0avgtext+0avgdata 752maxresident)k
> 0inputs+0outputs (0major+237minor)pagefaults 0swaps
> 
> In the beginning, fstrim took only a second or two.
> 
> 
> atop:
> 
> LVM |  rkaba-debian  |  busy     91%  |  read       0  |   write  10313  |  MBw/s  67.48  |  avio 0.20 ms  |
> […]
> DSK |           sda  |  busy     90%  |  read       0  |   write  10319  |  MBw/s  67.54  |  avio 0.19 ms  |
> […]
> 
>   PID   TID RUID      THR   SYSCPU  USRCPU  VGROW  RGROW   RDDSK  WRDSK ST EXC  S CPUNR  CPU CMD         1/2
>  6085     - root        1    0.29s   0.00s     0K     0K      0K     0K --   -  D     0  13% fstrim
> 
> 
> 10000 write requests in 10 seconds.

I was able to refresh my BTRFS with regard to this issue on 11 January:

merkaba:~> btrfs filesystem df /
Data: total=15.10GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.12MB
Metadata: total=8.00MB, used=0.00


merkaba:~> btrfs balance start -dusage=5 /
Done, had to relocate 0 out of 25 chunks
merkaba:~> btrfs filesystem df /          
Data: total=15.01GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=654.05MB
Metadata: total=8.00MB, used=0.00
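
The -dusage=5 pass relocated nothing here, and it took the full -d pass to
compact the data chunks. One could also raise the usage threshold in steps
instead of jumping straight to a full balance. A minimal sketch, assuming a
hypothetical helper name (stepped_balance) and thresholds of 5, 25 and 50
that are not taken from this thread; by default it only prints the commands:

```shell
#!/bin/sh
# stepped_balance: hypothetical helper, not part of btrfs-progs.
# Prints (dry run) or runs data balances with rising usage thresholds,
# so that nearly empty data chunks are compacted first.
# The thresholds 5/25/50 are an assumption; adjust them to taste.
# Set RUN=1 to really execute the commands (needs root).
stepped_balance() {
    mnt=$1
    for pct in 5 25 50; do
        if [ "${RUN:-0}" = 1 ]; then
            btrfs balance start -dusage="$pct" "$mnt" || return 1
        else
            echo "btrfs balance start -dusage=$pct $mnt"
        fi
    done
}
```

Calling "stepped_balance /" prints the three commands; with RUN=1 set it
executes them and stops at the first failure.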


merkaba:~> btrfs balance start -d /       
Done, had to relocate 16 out of 25 chunks
merkaba:~> btrfs filesystem df /   
Data: total=11.09GB, used=11.06GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=647.72MB
Metadata: total=8.00MB, used=0.00
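
The effect of the data balance is easiest to see as the gap between total
(allocated) and used per chunk type. A rough sketch that computes this gap
from the "btrfs filesystem df" output; it assumes the old output format
shown in this mail, and btrfs_slack is a hypothetical helper name:

```shell
#!/bin/sh
# btrfs_slack: hypothetical helper. Reads "btrfs filesystem df" output
# on stdin and prints "<type>: <allocated minus used> MB" per line.
# Assumption: the old format shown in this mail, for example
#   Data: total=15.10GB, used=11.06GB
# Newer btrfs-progs print other labels and units.
btrfs_slack() {
    awk '
    # Convert "15.10GB", "654.12MB", "4.00KB" or a bare "0.00" to MB.
    function to_mb(s,    n) {
        n = s + 0                      # numeric prefix of the string
        if (s ~ /TB$/) return n * 1024 * 1024
        if (s ~ /GB$/) return n * 1024
        if (s ~ /MB$/) return n
        if (s ~ /KB$/) return n / 1024
        return n                       # bare number (only 0.00 occurs here)
    }
    /total=/ {
        label = $0; sub(/: total=.*/, "", label)
        total = $0; sub(/.*total=/, "", total); sub(/,.*/, "", total)
        used  = $0; sub(/.*used=/, "", used); gsub(/[ \t\r]+$/, "", used)
        printf "%s: %.2f MB\n", label, to_mb(total) - to_mb(used)
    }'
}
```

Usage: "btrfs filesystem df / | btrfs_slack". Fed with the Data lines from
before and after the full data balance, it shows how much allocated but
unused data chunk space was given back.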


merkaba:~> /usr/bin/time -v fstrim -v /
/: 2246623232 bytes were trimmed
        Command being timed: "fstrim -v /"
        User time (seconds): 0.00
        System time (seconds): 2.34
        Percent of CPU this job got: 10%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.84
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 239
        Voluntary context switches: 110690
        Involuntary context switches: 1426
        Swaps: 0
        File system inputs: 16
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0



merkaba:~> btrfs balance start -fmconvert=single /   
Done, had to relocate 8 out of 20 chunks
merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=1.75GB, used=642.92MB



[406005.831307] btrfs: balance will reduce metadata integrity, use force if you want this
[406129.187057] btrfs: force reducing metadata integrity
[406129.199133] btrfs: relocating block group 9290383360 flags 36
[406132.645299] btrfs: found 6989 extents
[406132.673390] btrfs: relocating block group 8082423808 flags 36
[406135.807065] btrfs: found 6906 extents
[406135.841572] btrfs: relocating block group 7948206080 flags 36
[406138.413270] btrfs: found 4514 extents
[406138.435382] btrfs: relocating block group 6740246528 flags 36
[406142.572004] btrfs: found 10667 extents
[406142.638079] btrfs: relocating block group 6606028800 flags 36
[406146.272095] btrfs: found 19844 extents
[406146.289729] btrfs: relocating block group 6471811072 flags 36
[406149.136422] btrfs: found 14850 extents
[406149.159510] btrfs: relocating block group 29360128 flags 36
[406183.637010] btrfs: found 116645 extents
[406183.653225] btrfs: relocating block group 20971520 flags 34
[406183.671958] btrfs: found 1 extents
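
The "flags" value in these log lines is a bit mask of block group type and
profile. A small sketch to decode it; the bit values follow the
BTRFS_BLOCK_GROUP_* constants of the on-disk format, while the helper name
decode_bg_flags is made up for this example:

```shell
#!/bin/sh
# decode_bg_flags: hypothetical helper that names the bits set in a
# btrfs block group "flags" value as printed in the kernel log.
# Only the type bits and the profile bits up to RAID10 are covered.
decode_bg_flags() {
    flags=$1
    out=""
    for pair in 1:DATA 2:SYSTEM 4:METADATA 8:RAID0 16:RAID1 32:DUP 64:RAID10; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # symbolic name after the colon
        if [ $(( flags & bit )) -ne 0 ]; then
            out="${out:+$out|}$name"
        fi
    done
    echo "${out:-none}"
}
```

With this, "decode_bg_flags 36" gives METADATA|DUP and "decode_bg_flags 34"
gives SYSTEM|DUP. A value without any RAID or DUP bit, such as 4 or 2, is
the single profile.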



The metadata allocation was still at its old size (1.75GB), so I ran a regular metadata rebalance:

merkaba:~> btrfs balance start -m /               
Done, had to relocate 8 out of 20 chunks
merkaba:~> btrfs filesystem df /
Data: total=11.09GB, used=11.06GB
System: total=36.00MB, used=4.00KB
Metadata: total=768.00MB, used=643.38MB


[406270.880962] btrfs: relocating block group 31801212928 flags 2
[406270.961955] btrfs: found 1 extents
[406270.976857] btrfs: relocating block group 31532777472 flags 4
[406270.990729] btrfs: relocating block group 31264342016 flags 4
[406271.006172] btrfs: relocating block group 30995906560 flags 4
[406271.020158] btrfs: relocating block group 30727471104 flags 4
[406271.480442] btrfs: found 5187 extents
[406271.515768] btrfs: relocating block group 30459035648 flags 4
[406277.158280] btrfs: found 54593 extents
[406277.173024] btrfs: relocating block group 30190600192 flags 4
[406284.680294] btrfs: found 63749 extents
[406284.756582] btrfs: relocating block group 29922164736 flags 4
[406290.907101] btrfs: found 59530 extents


merkaba:~> df -hT /
Filesystem     Type   Size    Used Avail  Use% Mounted on
/dev/dm-0      btrfs   19G     12G  6,8G   64% /

merkaba:~> /usr/bin/time -v fstrim -v /            
/: 5472256 bytes were trimmed
        Command being timed: "fstrim -v /"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 50%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 748
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 238
        Voluntary context switches: 12
        Involuntary context switches: 3
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0



Today it is still fast:

merkaba:~#1> /usr/bin/time -v fstrim /
        Command being timed: "fstrim /"
        User time (seconds): 0.00
        System time (seconds): 0.03
        Percent of CPU this job got: 17%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 708
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 227
        Voluntary context switches: 736
        Involuntary context switches: 35
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0




Boot time seems a tad bit slower, though:

merkaba:~> systemd-analyze
Startup finished in 5495ms (kernel) + 6331ms (userspace) = 11827ms
merkaba:~> systemd-analyze blame  
  3051ms cups.service
  2330ms dirmngr.service
  2267ms postfix.service
  1411ms schroot.service
  1385ms lvm2.service
  1230ms network-manager.service
  1128ms ssh.service
  1117ms acpi-fakekey.service
  1112ms avahi-daemon.service
  1061ms privoxy.service
  1010ms systemd-logind.service
   721ms loadcpufreq.service
   646ms colord.service
   552ms kdm.service
   533ms networking.service
   532ms keyboard-setup.service
   463ms remount-rootfs.service
   368ms bootlogs.service
   349ms udev.service
   327ms console-kit-log-system-start.service
   326ms postgresql.service
   322ms binfmt-support.service
   316ms acpi-support.service
   315ms qemu-kvm.service
   310ms sys-kernel-debug.mount
   309ms dev-mqueue.mount
   309ms anacron.service
   303ms atd.service
   297ms sys-kernel-security.mount
   282ms cron.service
   282ms dev-hugepages.mount
   272ms lightdm.service
   271ms console-kit-daemon.service
   271ms lirc.service
   268ms lxc.service
   259ms cpufrequtils.service
   259ms mdadm.service
   252ms openntpd.service
   240ms smartmontools.service
   240ms alsa-utils.service
   237ms run-user.mount
   237ms speech-dispatcher.service
   230ms udftools.service
   229ms run-lock.mount
   229ms systemd-remount-api-vfs.service
   224ms ebtables.service
   214ms openbsd-inetd.service
   208ms motd.service
   199ms hdparm.service
   198ms irqbalance.service
   190ms mountdebugfs.service
   181ms saned.service
   160ms systemd-user-sessions.service
   157ms polkitd.service
   147ms screen-cleanup.service
   146ms console-setup.service
   141ms networking-routes.service
   140ms pppd-dns.service
   130ms rc.local.service
   130ms jove.service
   128ms sysstat.service
   112ms rsyslog.service
   111ms udev-trigger.service
   103ms home.mount
    93ms systemd-sysctl.service
    89ms boot.mount
    85ms dns-clean.service
    84ms kbd.service
    66ms upower.service
    60ms systemd-tmpfiles-setup.service
    53ms openvpn.service
    37ms boot-efi.mount
    27ms udisks.service
    22ms sysfsutils.service
    22ms mdadm-raid.service
    20ms proc-sys-fs-binfmt_misc.mount
    18ms tmp.mount
     2ms sys-fs-fuse-connections.mount



> vmstat 1:
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  3  0 1963688 3943380 156972 1827836    0    0     0     0 5421 15781  6  6 88  0
>  0  0 1963688 3943132 156972 1827852    0    0     0     0 5733 16478  9  7 83  0
>  1  0 1963688 3943008 156972 1827992    0    0     0     0 5050 14434  0  4 96  0
>  1  0 1963688 3949768 156972 1826708    0    0     0     0 5246 14960  2  5 93  0
>  0  0 1963688 3949644 156980 1826712    0    0     0    36 5104 14996  1  4 94  0
>  0  0 1963688 3949768 156980 1826720    0    0     0     0 5102 15210  2  4 94  0
>  3  0 1963688 3949644 156980 1826720    0    0     0     0 5321 15995  4  7 89  0
>  0  0 1963688 3949396 156980 1827188    0    0     0     0 5316 15616  6  5 88  0
>  1  0 1963688 3949148 156980 1827188    0    0     0     0 5102 14944  1  4 95  0
>  1  0 1963688 3949272 156980 1827188    0    0     0     0 5510 15928  5  6 89  0
>  1  0 1963688 3949272 156980 1827188    0    0     0    52 5107 15054  2  4 94  0
>  0  0 1963688 3949396 156980 1826868    0    0     0     4 4930 14567  1  4 95  0
>  1  0 1963688 3949396 156988 1826828    0    0     0    52 5132 15014  2  5 93  0
>  3  0 1963688 3949396 156988 1826836    0    0     0     0 5015 14447  1  4 95  0
>  0  0 1963688 3949520 156988 1826836    0    0     0     0 5233 15652  3  6 91  0
>  1  0 1963684 3949612 156988 1827172    0    0     0  3032 2546 7555  6  4 84  6
> 
> After fstrim:
> 
>  0  0 1963684 3944244 157016 1827752    0    0     0     0  357 1018  2  1 97  0
>  1  0 1963684 3943776 157024 1827776    0    0     0    64  634 1660  4  2 93  0
>  0  0 1963684 3943872 157024 1827784    0    0     0     0  180  473  0  0 99  0
> 
> 
> The I/O activity does not seem to be reflected in vmstat, I bet because the
> page cache is not involved.



> === fallocate ===
> 
> merkaba:/var/tmp> /usr/bin/time fallocate -l 2G fallocate-test
> 0.00user 118.85system 2:00.50elapsed 98%CPU (0avgtext+0avgdata 720maxresident)k
> 14912inputs+49112outputs (0major+227minor)pagefaults 0swaps

Now, let's try this:

merkaba:/var/tmp> /usr/bin/time -v fallocate -l 2G fallocate-test
        Command being timed: "fallocate -l 2G fallocate-test"
        User time (seconds): 0.00
        System time (seconds): 0.00
        Percent of CPU this job got: 80%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 724
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 231
        Voluntary context switches: 5
        Involuntary context switches: 6
        Swaps: 0
        File system inputs: 80
        File system outputs: 72
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


There we go :)

> Filesystem type is: 9123683e
> File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0       0  2626450            2048 
>    1    2048  3215128  2628498   2040 
>    2    4088  3408631  3217168   2032 
>    3    6120  3430045  3410663   2024 
>    4    8144  3439999  3432069   2016 
>    5   10160  3474610  3442015   1004 
>    6   11164  3743715  3475614   1002 
[…]
> fallocate-test: 4556 extents found

merkaba:/var/tmp> filefrag -v fallocate-test                     
Filesystem type is: 9123683e
File size of fallocate-test is 2147483648 (524288 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0  8501248          524288 eof
fallocate-test: 1 extent found
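
To compare such runs without reading the whole extent table, the count can
be pulled from the summary line of filefrag. A minimal sketch, assuming the
summary format shown here ("<file>: N extent(s) found"); extent_count is a
hypothetical helper name:

```shell
#!/bin/sh
# extent_count: hypothetical helper. Reads filefrag output on stdin and
# prints only the number from the "<file>: N extent(s) found" line.
# Assumption: the summary line looks like the ones quoted in this mail.
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found$/\1/p'
}
```

Usage: "filefrag fallocate-test | extent_count". It handles both the
singular form ("1 extent found") and the plural form ("4556 extents found").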


Yes, that's the same filesystem :)

> But:
> 
> merkaba:/var/tmp> /usr/bin/time rm fallocate-test
> 0.00user 0.24system 0:00.38elapsed 63%CPU (0avgtext+0avgdata 784maxresident)k
> 4464inputs+36184outputs (0major+243minor)pagefaults 0swaps

merkaba:/var/tmp> /usr/bin/time rm fallocate-test 
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 784maxresident)k
0inputs+24outputs (0major+243minor)pagefaults 0swaps

> Some more information on the filesystem in question:
> 
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi sh
> failed to read /dev/sr0
> Label: 'debian'  uuid: […]
>         Total devices 1 FS bytes used 13.56GB
>         devid    1 size 18.62GB used 18.62GB path /dev/dm-0
> 
> Btrfs v0.19-239-g0155e84
> 
> 
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi df /
> Disk size:                18.62GB
> Disk allocated:           18.62GB
> Disk unallocated:            0.00
> Used:                     13.56GB
> Free (Estimated):          3.31GB       (Max: 3.31GB, min: 3.31GB)
> Data to disk ratio:          91 %
> 
> 
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs fi disk-usage /
> Data,Single: Size:15.10GB, Used:12.94GB
>    /dev/dm-0       15.10GB
> 
> Metadata,Single: Size:8.00MB, Used:0.00
>    /dev/dm-0        8.00MB
> 
> Metadata,DUP: Size:1.75GB, Used:630.11MB
>    /dev/dm-0        3.50GB
> 
> System,Single: Size:4.00MB, Used:0.00
>    /dev/dm-0        4.00MB
> 
> System,DUP: Size:8.00MB, Used:4.00KB
>    /dev/dm-0       16.00MB
> 
> Unallocated:
>    /dev/dm-0          0.00
> merkaba:/home/martin/Linux/Dateisysteme/BTRFS/btrfs-progs-unstable> ./btrfs dev disk-usage /
> /dev/dm-0          18.62GB
>    Data,Single:             15.10GB
>    Metadata,Single:          8.00MB
>    Metadata,DUP:             3.50GB
>    System,Single:            4.00MB
>    System,DUP:              16.00MB
>    Unallocated:                0.00

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
