On 2015-12-02 08:53, Tomasz Chmielewski wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>> From these numbers (124 GB used where data size is 153 GB), it appears
>>> that we save around 20% with zlib compression enabled.
>>> Is 20% reasonable saving for zlib? Typically text compresses much better
>>> with that algorithm, although I understand that we have several
>>> limitations when applying that on a filesystem level.
>>
>> This is actually an excellent question. A couple of things to note
>> before I share what I've seen:
>> 1. Text compresses better with any compression algorithm. It is by
>> nature highly patterned and moderately redundant data, which is what
>> benefits the most from compression.
>
> It looks that compress=zlib does not compress very well. Following
> Duncan's suggestion, I've changed it to compress-force=zlib, and
> re-copied the data to make sure the file are compressed.

For future reference, if you run 'btrfs filesystem defrag -r -czlib' on
the top level directory, you can achieve the same effect without having
to deal with the copy overhead. This has the side effect of breaking
reflinks, but copying the files off and back onto the filesystem does
that too, and I doubt that you're using reflinks anyway. There probably
wouldn't be much difference in the time it takes, but at least you
wouldn't be hitting another disk in the process.

>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb 200G 24G 176G 12% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 138G /var/log/remote/
>
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same
> effect as compress-force=zlib, for 100% text files/logs.
>
That's better than 80% space savings (it works out to about 82.6%), so I
doubt that you'd manage to get anything better than that even with only
plain text files.
It's interesting that there's such a big discrepancy though; it suggests
that BTRFS really needs some work WRT deciding what to compress.
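
For anyone who wants to redo the arithmetic on their own numbers, here is a minimal sketch in Python. The function name space_savings is purely illustrative (it is not part of any btrfs tooling), and the inputs are the rounded figures that df and du printed in this thread, so the results are only approximate.

```python
def space_savings(on_disk_gb: float, apparent_gb: float) -> float:
    """Return the fraction of space saved by compression (0.0 to 1.0).

    on_disk_gb:  space actually used on the filesystem (the 'Used' column of df)
    apparent_gb: total size of the files as reported by du
    """
    if apparent_gb <= 0:
        raise ValueError("apparent size must be positive")
    return 1.0 - on_disk_gb / apparent_gb


# compress=zlib run from earlier in the thread: 124 GB used for 153 GB of data
print(f"compress=zlib:       {space_savings(124, 153):.1%}")  # prints 19.0%

# compress-force=zlib run: 24 GB used for 138 GB of data
print(f"compress-force=zlib: {space_savings(24, 138):.1%}")   # prints 82.6%
```

Keep in mind that the two runs were on slightly different data sets, so this is a rough comparison rather than a controlled measurement.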