From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-it0-f44.google.com ([209.85.214.44]:46574 "EHLO mail-it0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758899AbdKPUhN (ORCPT ); Thu, 16 Nov 2017 15:37:13 -0500
Received: by mail-it0-f44.google.com with SMTP id r127so1536098itb.5 for ; Thu, 16 Nov 2017 12:37:13 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <6d10a13a-f4b4-3688-4445-8dd2f645c222@gmail.com>
References: <43412efa-ff56-9682-c8f7-a5966b87b10e@lukas-pirl.de>
 <361d92ee-9aee-35e1-024d-45ec5b79902b@gmail.com>
 <37eb6ee9-2f7e-de42-3f7c-32db11d7648a@gmail.com>
 <6d10a13a-f4b4-3688-4445-8dd2f645c222@gmail.com>
From: Timofey Titovets 
Date: Thu, 16 Nov 2017 23:36:32 +0300
Message-ID: 
Subject: Re: zstd compression
To: "Austin S. Hemmelgarn" 
Cc: Duncan <1i5t5.duncan@cox.net>, linux-btrfs 
Content-Type: text/plain; charset="UTF-8"
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

2017-11-16 19:32 GMT+03:00 Austin S. Hemmelgarn :
> On 2017-11-16 08:43, Duncan wrote:
>>
>> Austin S. Hemmelgarn posted on Thu, 16 Nov 2017 07:30:47 -0500 as
>> excerpted:
>>
>>> On 2017-11-15 16:31, Duncan wrote:
>>>>
>>>> Austin S. Hemmelgarn posted on Wed, 15 Nov 2017 07:57:06 -0500 as
>>>> excerpted:
>>>>
>>>>> The 'compress' and 'compress-force' mount options only impact newly
>>>>> written data. The compression used is stored with the metadata for
>>>>> the extents themselves, so any existing data on the volume will be
>>>>> read just fine with whatever compression method it was written with,
>>>>> while new data will be written with the specified compression method.
>>>>>
>>>>> If you want to convert existing files, you can use the '-c' option to
>>>>> the defrag command to do so.
>>>>
>>>>
>>>> ... Being aware of course that using defrag to recompress files like
>>>> that will break 100% of the existing reflinks, effectively (near)
>>>> doubling data usage if the files are snapshotted, since the snapshot
>>>> will now share 0% of its extents with the newly compressed files.
>>>
>>> Good point, I forgot to mention that.
>>>>
>>>>
>>>> (The actual effect shouldn't be quite that bad, as some files are
>>>> likely to be uncompressed due to not compressing well, and I'm not sure
>>>> if defrag -c rewrites them or not. Further, if there's multiple
>>>> snapshots data usage should only double with respect to the latest one,
>>>> the data delta between it and previous snapshots won't be doubled as
>>>> well.)
>>>
>>> I'm pretty sure defrag is equivalent to 'compress-force', not
>>> 'compress', but I may be wrong.
>>
>>
>> But... compress-force doesn't actually force compression _all_ the time.
>> Rather, it forces btrfs to continue checking whether compression is worth
>> it for each "block"[1] of the file, instead of giving up if the first
>> quick try at the beginning says that block won't compress.
>>
>> So what I'm saying is that if the snapshotted data is already compressed,
>> think (pre-)compressed tarballs or image files such as jpeg that are
>> unlikely to /easily/ compress further and might well actually be _bigger_
>> once the compression algorithm is run over them, defrag -c will likely
>> fail to compress them further even if it's the equivalent of compress-
>> force, and thus /should/ leave them as-is, not breaking the reflinks of
>> the snapshots and thus not doubling the data usage for that file, or more
>> exactly, that extent of that file.
>>
>> Tho come to think of it, is defrag -c that smart, to actually leave the
>> data as-is if it doesn't compress further, or does it still rewrite it
>> even if it doesn't compress, thus breaking the reflink and doubling the
>> usage regardless?
> I'm not certain how compression factors in, but if you aren't compressing
> the file, it will only get rewritten if it's fragmented (which is why
> defragmenting the system root directory is usually insanely fast on most
> systems, stuff there is almost never fragmented).
>>
>>
>> ---
>> [1] Block: I'm not positive it's the usual 4K block in this case. I
>> think I read that it's 16K, but I might be confused on that. But
>> regardless the size, the point is, with compress-force btrfs won't give
>> up like simple compress will if the first "block" doesn't compress, it'll
>> keep trying.
>>
>> Of course the new compression heuristic changes this a bit too, but the
>> same general idea holds, compress-force continues to try for the entire
>> file, compress will give up much faster.
> I'm not actually sure, I would think it checks 128k blocks of data (the
> effective block size for compression), but if it doesn't it should be
> checking at the filesystem block size (which means 16k on most recently
> created filesystems).

Defragmentation on btrfs simply rewrites data that does not meet some
criteria, and that is all it does. The -c option only selects which
compression method is applied to the newly written data, no more, no less.

On the write side, the filesystem sees the data ranges being written
(see compress_file_range()); if compression is wanted, it splits the
range into 128 KiB chunks and passes them to the compression logic.

The compression logic gives up on its own in two cases:
1. Compressing the first 2 (or 3?) page-sized blocks of a 128 KiB chunk
   makes the data bigger -> give up -> write the data as-is.
2. After compression is done, if it did not free up at least one sector
   -> write the data as-is.

I.e. if you write 16 KiB at a time, btrfs will compress each separate
16 KiB write on its own. If you write 1 MiB at a time, btrfs will split
it into 128 KiB chunks. If you write 1025 KiB, btrfs will split it into
128 KiB chunks and the last 1 KiB will be written as-is.
(There is a toy sketch of this write-side flow at the end of this mail.)

JFYI: all the heuristic logic does (i.e. with compress, not
compress-force) is this: on every write, the kernel checks whether
compression is needed via inode_need_compress(), i.e. it checks flags
like compress, nocompress, compress-force and defrag-compress (which
works like compress-force, AFAIK).

Internal logic:
- Up to 4.14 kernels: if compressing the first 128 KiB of a file fails
  by any of the criteria above -> the file is marked as incompressible
  -> compression is skipped for its new data.
- On 4.15+, if the heuristic works as expected (by its logic it should):
  while checking the file (see inode_need_compress()), if it is marked
  for compression and it is not compress-force, the heuristic examines
  the written data range for patterns and anti-patterns of compressible
  data and can decide for every write whether it is worth compressing,
  instead of a blind decision based on estimating a prefix of the file.

Thanks

--
Have a nice day,
Timofey.
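P.S. To make the write-side flow above a bit more concrete, here is a
small, self-contained C sketch of the decision flow. This is *not* the
kernel code: the real logic lives in compress_file_range() and
inode_need_compress() in fs/btrfs/inode.c, and everything below
(looks_compressible(), fake_compress(), the 4 KiB sector size, the
byte-counting "heuristic") is a simplified stand-in, only meant to show
how the 128 KiB splitting, the heuristic and the "must free at least
one sector" rule fit together.

/*
 * Toy sketch of the btrfs write-side compression decision.
 * NOT the real kernel code; helpers and constants are illustrative.
 */
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE   (128 * 1024)   /* compression works on 128 KiB chunks */
#define SECTOR_SIZE  4096           /* example sector size */

/* Crude stand-in for the 4.15+ heuristic: sample the chunk and count
 * distinct byte values; data with many distinct values is declared
 * incompressible.  (The real heuristic is more elaborate.) */
static int looks_compressible(const unsigned char *buf, size_t len)
{
        unsigned char seen[256] = { 0 };
        size_t i, distinct = 0;

        for (i = 0; i < len; i += 256) {
                if (!seen[buf[i]]) {
                        seen[buf[i]] = 1;
                        distinct++;
                }
        }
        return distinct < 200;
}

/* Stand-in for the real compressor (zlib/lzo/zstd in the kernel).
 * Pretends run-heavy data shrinks by half and everything else does
 * not shrink at all; returns the "compressed" size. */
static size_t fake_compress(const unsigned char *buf, size_t len)
{
        size_t i, runs = 0;

        for (i = 1; i < len; i++)
                runs += (buf[i] == buf[i - 1]);
        return runs > len / 2 ? len / 2 : len;
}

/* Sketch of the flow for one buffered write / delalloc range. */
static void sketch_compress_range(const unsigned char *data, size_t len,
                                  int force)
{
        size_t off = 0;

        while (off < len) {
                size_t chunk = len - off > CHUNK_SIZE ? CHUNK_SIZE : len - off;
                size_t clen;

                /* 'compress' consults the heuristic first;
                 * 'compress-force' (and defrag -c) does not. */
                if (!force && !looks_compressible(data + off, chunk)) {
                        printf("%zu..%zu: stored as-is (heuristic)\n",
                               off, off + chunk);
                        off += chunk;
                        continue;
                }

                clen = fake_compress(data + off, chunk);

                /* Must free at least one sector, otherwise store as-is. */
                if (clen + SECTOR_SIZE > chunk)
                        printf("%zu..%zu: stored as-is (no gain)\n",
                               off, off + chunk);
                else
                        printf("%zu..%zu: stored compressed (%zu bytes)\n",
                               off, off + chunk, clen);
                off += chunk;
        }
}

int main(void)
{
        /* The 1025 KiB example from above, filled with repetitive data. */
        static unsigned char buf[1025 * 1024];

        memset(buf, 'A', sizeof(buf));
        sketch_compress_range(buf, sizeof(buf), 0);
        return 0;
}

With this toy model the 1025 KiB buffer comes out as eight compressed
128 KiB chunks plus a final 1 KiB chunk stored as-is, which matches the
splitting described above (here the tail falls out of the "free at
least one sector" rule; the real kernel's handling of the tail may
differ in detail).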