* Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
  @ 2016-05-11 19:50 Niccolò Belli
  2016-05-11 20:07 ` Christoph Anton Mitterer
  2016-05-12 10:18 ` Duncan
  0 siblings, 2 replies; 6+ messages in thread

From: Niccolò Belli @ 2016-05-11 19:50 UTC (permalink / raw)
To: linux-btrfs

Hi,
Before doing the daily backup I ran btrfs check and btrfs scrub as usual. This time I also decided to run btrfs filesystem defragment -r -v -clzo on all subvolumes (from a live distro), and just to be sure I ran check and scrub once again.

Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
After defragment:  total bytes scrubbed: 26.66GiB with 0 errors

What happened? This is a night and day difference: almost double the data! As stated in the subject, all the subvolumes have always been mounted with compress=lzo in /etc/fstab; even when I installed the distro a couple of days ago I manually mounted the subvolumes with -o compress=lzo. I have never used autodefrag.

Niccolò

^ permalink raw reply [flat|nested] 6+ messages in thread
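For reference, the two scrub totals above work out to roughly a 68% increase in stored data. A quick sketch that parses scrub summary lines of this form and computes the growth (the helper is purely illustrative, not part of the btrfs tooling):

```python
import re

def scrubbed_gib(line):
    """Extract the GiB figure from a 'total bytes scrubbed' summary line."""
    match = re.search(r"total bytes scrubbed:\s*([\d.]+)GiB", line)
    if match is None:
        raise ValueError(f"no scrub total found in: {line!r}")
    return float(match.group(1))

before = scrubbed_gib("total bytes scrubbed: 15.90GiB with 0 errors")
after = scrubbed_gib("total bytes scrubbed: 26.66GiB with 0 errors")
growth = (after - before) / before
print(f"data grew by {growth:.0%}")  # -> data grew by 68%
```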
* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

From: Christoph Anton Mitterer @ 2016-05-11 20:07 UTC (permalink / raw)
To: Niccolò Belli, linux-btrfs

On Wed, 2016-05-11 at 21:50 +0200, Niccolò Belli wrote:
> What did happen?

Perhaps because defrag unfortunately breaks up any reflinks?

Cheers,
Chris.
* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

From: Duncan @ 2016-05-12 10:18 UTC (permalink / raw)
To: linux-btrfs

Niccolò Belli posted on Wed, 11 May 2016 21:50:43 +0200 as excerpted:

> Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
> After defragment: total bytes scrubbed: 26.66GiB with 0 errors
>
> What happened? This is a night and day difference: almost double the
> data! As stated in the subject, all the subvolumes have always been
> mounted with compress=lzo in /etc/fstab. I have never used autodefrag.

I'd place money on your use of either snapshots or dedup. As CAM says (perhaps too briefly), defrag isn't snapshot (technically, reflink) aware, and will break reflinks from other snapshots/dedups as it defrags whatever file it's currently working on.
If there are few to no reflinks, as there won't be if you're not using snapshots, btrfs dedup, etc., there's no problem. But where reflinks exist (the mechanism both snapshots and the various btrfs dedup tools use), defrag rewrites only the copy of the data it's working on, leaving the others as they are. That effectively doubles data usage for the snapshots-plus-first-defrag case: you keep the old, possibly multiply snapshot-reflinked copy, plus the new defragged copy that no longer shares extents with the snapshots and other previously reflinked copies.

And unlike a normal defrag, when you use the compress option it forces a rewrite of every file in order to (possibly re)compress it. A normal defrag would have rewritten only some files, expanding data usage only to the extent it actually did rewrites; the compress option forced it to recompress every file it came across, breaking all those reflinks and, wherever existing snapshots still referenced the old copies, duplicating the data, thereby effectively doubling your data usage.

The fact that it didn't /quite/ double usage may be down to the normal compress mount option only doing a quick compression test, and not compressing a file that doesn't seem particularly compressible based on that quick test, while defrag with compress likely checks every (128 KiB compression) block, getting somewhat better compression in the process. So the defrag/compress run didn't quite double usage, because it compressed some stuff that the runtime compression didn't. (FWIW, you can get the more thorough behavior at runtime with the compress-force mount option, which always tries compression rather than doing a quick test and skipping compression for the entire file if the bit it tried didn't compress well.)
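The compress vs. compress-force distinction Duncan describes can be sketched as a toy model. This is an illustration of the heuristic only, not btrfs's actual code: real btrfs compresses in 128 KiB chunks and uses its own incompressibility test, both scaled down and simplified here with zlib.

```python
import os
import zlib

CHUNK = 4096  # stand-in for btrfs's 128 KiB compression block

def compressible(chunk):
    """Heuristic stand-in: did this chunk shrink under compression?"""
    return len(zlib.compress(chunk)) < len(chunk)

def compressed_size(data, force=False):
    """Model of the compress vs compress-force mount behavior.

    Plain compress: quick-test the first chunk; if it doesn't shrink,
    store the whole file uncompressed.
    compress-force: try every chunk independently.
    """
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    if not force and not compressible(chunks[0]):
        return len(data)  # quick test failed: whole file stored as-is
    return sum(min(len(zlib.compress(c)), len(c)) for c in chunks)

# A file whose first chunk is incompressible (random bytes) but whose
# tail is all zeros: plain compress gives up, compress-force does not.
data = os.urandom(CHUNK) + bytes(CHUNK * 7)
print(compressed_size(data))              # plain compress: full size
print(compressed_size(data, force=True))  # smaller: the zero chunks compress
```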
FWIW, around 3.9, btrfs defrag actually was snapshot/reflink aware for a few releases, but it turned out that dealing with all those reflinks simply didn't scale well with the then-existing code: people were reporting defrag runs taking days or weeks, projected to months, with enough snapshots and with quotas (which didn't scale well either) turned on. That was obviously unworkable, so defrag's snapshot awareness was reverted until it could be made to scale better; a working but snapshot-unaware defrag was clearly more practical than one that couldn't be run because it would take months. That snapshot awareness has yet to be reactivated. So the bottom line now is: don't defrag what you don't want un-reflinked.

FWIW, autodefrag has the same problem in theory, but the effect in practice is far more limited, in part because it only does its defrag thing when some part of the file is being rewritten (and thus COWed elsewhere, which already breaks the reflink for the actually written blocks). Autodefrag magnifies that a bit by COWing somewhat larger extents, but for files of any size (MiB scale and larger) it's not going to rewrite, and thus duplicate, the entire file, as defrag can do. And it's definitely not going to rewrite all files in large sections of the filesystem, as a recursive defrag with the compression option will.
Additionally, autodefrag will tend to defrag a file shortly after it has been changed, likely before any snapshots have been taken if they're only taken daily or so. You'll then have effectively only two copies of the changed portion of the file: the old version, still locked in place by previous snapshots, and the new version. Compare that to the three copies you're likely to have if you wait until snapshots have been done before defragging: the old version in previous snapshots, the new version as initially written and locked in place by post-change, pre-defrag snapshots, and the new version as defragged.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
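The copy-counting argument above can be made concrete with a toy extent model (a pure illustration, nothing btrfs-specific: files and snapshots are sets of extent ids, and shared ids cost nothing extra):

```python
def usage(*references):
    """Total unique extents pinned by all live references."""
    return len(set().union(*references))

original = {"a", "b", "c", "d"}

# Case 1: autodefrag-style. The file is modified and defragged *before*
# the next snapshot, so only the changed extent is duplicated.
snap_old = original
live = (original - {"b"}) | {"b2"}  # extent 'b' rewritten as 'b2'
assert usage(snap_old, live) == 5   # 4 old extents + 1 new one

# Case 2: a snapshot is taken between the change and the defrag, then a
# full defrag rewrites every extent, breaking all remaining sharing.
snap_mid = live
defragged = {e + "'" for e in live}
assert usage(snap_old, snap_mid, defragged) == 9  # 5 shared + 4 fresh

# The changed block now exists as three versions: 'b', 'b2', "b2'".
print("case 1:", usage(snap_old, live), "extents;",
      "case 2:", usage(snap_old, snap_mid, defragged), "extents")
```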
* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

From: Niccolò Belli @ 2016-05-12 13:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: Duncan

Thanks for the detailed explanation; hopefully in the future someone will be able to make defrag snapshot/reflink aware in a scalable manner. I will not use defrag anymore, but what do you suggest I do to reclaim the lost space? Get rid of my current snapshots, or maybe simply run bedup?

Niccolò
* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

From: Duncan @ 2016-05-13 6:11 UTC (permalink / raw)
To: linux-btrfs

Niccolò Belli posted on Thu, 12 May 2016 15:56:20 +0200 as excerpted:

> Thanks for the detailed explanation, hopefully in the future someone
> will be able to make defrag snapshot/reflink aware in a scalable manner.

It's still planned, AFAIK, but one of the scaling issues in particular, quotas, has turned out to be particularly challenging to even get working correctly. They've rewritten the quota code twice (so they're on their third attempted solution), and it's still broken in certain corner cases ATM. While they're still trying to get the existing third try to handle the tough corner cases, they're already talking about an eventual third rewrite (a fourth attempt, having scrapped three) once the corner cases actually work, to bring better performance, since by then they'll know the tough corner cases and can design a solution with both them and performance in mind from the beginning.

So in practice an actually scalable snapshot-aware defrag is likely years out, as it's going to need actually working and scalable quota code, and even then that's only part of the full scalable snapshot/reflink-aware defrag solution. The good news is that while there's still work to be done, progress has been healthy in other areas, so once the quota code both works and scales, the other aspects should fall into place relatively fast, as they've already been maturing on their own, separately.

> I will not use defrag anymore, but what do you suggest I do to
> reclaim the lost space? Get rid of my current snapshots, or maybe
> simply run bedup?
Neither snapshots nor dedup are among my direct use-cases, so my practical knowledge there is limited, but removing the snapshots should indeed clear the space, as in doing so you'll be removing all references locking the old extents in place (though you'll likely have to remove all snapshots covering a given subvolume in order to actually free the space). If you already have them backed up elsewhere (using send/receive, for instance), or don't actually need them, it's a viable alternative.

In theory the various btrfs dedup solutions out there should work as well, while letting you keep the snapshots (at least to the extent they're either writable snapshots that can be reflink-modified, or a single read-only snapshot that the others, including the freshly defragged working copy, can be reflinked to). That is their mechanism of operation: finding identical block sequences and reflinking them so there's only one actual copy on the filesystem, with the rest being reflinks to it. So in effect dedup should undo the reflink breaking you did with the defrag.

*But*, without any personal experience with them, I have no idea how effective they are in practice in a situation like this, or how practical vs. convoluted the command lines will be to actually accomplish your goal. Best case, it's a simple and fast command that not only undoes the defrag's reflink breakage but finds enough duplication in the dataset to reduce usage even further than before; worst case, it's multiple complex commands that take a week or longer to run and don't actually help much.
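In outline, the block-level dedup these tools perform looks like the sketch below: hash fixed-size blocks, collect the offsets holding identical content, and hand each group of matches to the kernel for reflinking. This is a simplified illustration only; a real tool would issue the kernel's extent-same/dedup ioctl on the candidates, which is omitted here.

```python
import hashlib
from collections import defaultdict

BLOCK = 4096  # fixed dedup granularity for this sketch

def find_duplicate_blocks(files):
    """Map block-content hashes to every (filename, offset) holding them.

    A real dedup tool would pass each group of matches to the kernel so
    the duplicates become reflinks to one physical copy; here we only
    report the candidate groups.
    """
    seen = defaultdict(list)
    for name, data in files.items():
        for off in range(0, len(data), BLOCK):
            digest = hashlib.sha256(data[off:off + BLOCK]).hexdigest()
            seen[digest].append((name, off))
    return {d: locs for d, locs in seen.items() if len(locs) > 1}

# Two files sharing their first block, as a freshly defragged copy and a
# snapshot's copy of the same data would.
shared = b"x" * BLOCK
files = {"working": shared + b"a" * BLOCK,
         "snapshot": shared + b"b" * BLOCK}
dups = find_duplicate_blocks(files)
print(len(dups), "duplicated block group(s) found")  # -> 1
```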
So in practice, you have a choice between EITHER deleting all the snapshots, and along with them everything locking down the old extents, leaving you with only the new, fresh copy (which by itself should be smaller than before) at the expense of losing your snapshots, OR the various btrfs dedup solutions, which are, at least to my knowledge, a relative unknown: in theory they should work well, but in practice... I simply don't know.

AND of course you have the option of doing nothing and leaving things as they are. Given the context of this thread, however, it seems you don't consider that a viable longer-term option: apparently you were trying to clear space, not use MORE of it, and presumably you actually need that space for something else. That precludes just letting things be, unless of course you can afford to simply buy your way out of the problem with more storage devices.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo

From: Niccolò Belli @ 2016-05-20 15:51 UTC (permalink / raw)
To: linux-btrfs; +Cc: Duncan

On Friday 13 May 2016 08:11:27 CEST, Duncan wrote:
> In theory the various btrfs dedup solutions out there should work as
> well, while letting you keep the snapshots (at least to the extent
> they're either writable snapshots so can be reflink modified

Unfortunately, as you said, dedup doesn't work with read-only snapshots (and I only use read-only snapshots, with snapper) :( Does bedup's dedup-syscall branch (https://github.com/g2p/bedup/tree/wip/dedup-syscall), which uses the new batch deduplication ioctl merged in Linux 3.12, fix this? Unfortunately the latest commit is from September :(
end of thread, other threads: [~2016-05-20 15:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed; links below jump to the message on this page):
2016-05-11 19:50 Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo Niccolò Belli
2016-05-11 20:07 ` Christoph Anton Mitterer
2016-05-12 10:18 ` Duncan
2016-05-12 13:56 ` Niccolò Belli
2016-05-13  6:11 ` Duncan
2016-05-20 15:51 ` Niccolò Belli