linux-btrfs.vger.kernel.org archive mirror
* Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Niccolò Belli @ 2016-05-11 19:50 UTC (permalink / raw)
  To: linux-btrfs

Hi,
Before doing the daily backup I ran a btrfs check and btrfs scrub as
usual. This time I also decided to run btrfs filesystem defragment -r
-v -clzo on all subvolumes (from a live distro), and just to be sure I
ran check and scrub once again.

Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
After defragment: total bytes scrubbed: 26.66GiB with 0 errors

What happened? This is a night and day difference: almost double the
data! As stated in the subject, all the subvolumes have always been
mounted with compress=lzo in /etc/fstab; even when I installed the
distro a couple of days ago I manually mounted the subvolumes with -o
compress=lzo. I have never used autodefrag.
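
For reference, the sequence was roughly the following (/dev/sdX and /mnt
are placeholders, not my actual device and mountpoint):

    btrfs check /dev/sdX                  # from the live distro, unmounted
    mount -o compress=lzo /dev/sdX /mnt
    btrfs scrub start -B /mnt             # prints "total bytes scrubbed"
    btrfs filesystem defragment -r -v -clzo /mnt
    btrfs scrub start -B /mnt             # second scrub, for comparison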

Niccolò


* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Christoph Anton Mitterer @ 2016-05-11 20:07 UTC (permalink / raw)
  To: Niccolò Belli, linux-btrfs

On Wed, 2016-05-11 at 21:50 +0200, Niccolò Belli wrote:
> What happened?

Perhaps because defrag unfortunately breaks up any reflinks?

Cheers,
Chris.


* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Duncan @ 2016-05-12 10:18 UTC (permalink / raw)
  To: linux-btrfs

Niccolò Belli posted on Wed, 11 May 2016 21:50:43 +0200 as excerpted:

> Hi,
> Before doing the daily backup I ran a btrfs check and btrfs scrub as
> usual.

> This time I also decided to run btrfs filesystem defragment -r -v
> -clzo on all subvolumes (from a live distro), and just to be sure I
> ran check and scrub once again.
> 
> Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
> After  defragment: total bytes scrubbed: 26.66GiB with 0 errors
> 
> What happened? This is a night and day difference: almost double the
> data! As stated in the subject, all the subvolumes have always been
> mounted with compress=lzo in /etc/fstab; even when I installed the
> distro a couple of days ago I manually mounted the subvolumes with -o
> compress=lzo. I have never used autodefrag.

I'd place money on your use of either snapshots or dedup.  As CAM says 
(perhaps too) briefly, defrag isn't snapshot (technically, reflink) 
aware, and will break reflinks from other snapshots/dedups as it defrags 
whatever file it's currently working on.

If there are few or no reflinks, as there won't be if you're not using
snapshots, btrfs dedup, etc., there's no problem. But where reflinks do
exist (reflinks being the mechanism both snapshots and the various
btrfs dedup tools use), defrag rewrites only the copy of the data it's
currently working on, leaving the others as they are. For the snapshots
plus first-defrag case that effectively doubles data usage: you keep
the old, possibly multiply snapshot-reflinked copy, plus the new
defragged copy that no longer shares extents with the snapshots and
other previously reflinked copies.
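
You can watch this happen with something like the following (file names
made up, and btrfs filesystem du needs a reasonably recent btrfs-progs):

    cp --reflink=always big.file copy.file       # copy shares all extents
    btrfs filesystem du big.file copy.file       # shared column shows overlap
    btrfs filesystem defragment -clzo copy.file  # rewrites copy.file's data
    btrfs filesystem du big.file copy.file       # shared shrinks, exclusive grows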


And unlike a normal defrag, when you use the compress option it forces
a rewrite of every file in order to (possibly re)compress it. So while
a normal defrag would have rewritten only some files, and would have
expanded data usage only to the extent it actually did rewrites, the
compress option forced it to recompress every file it came across,
breaking all those reflinks and, where existing snapshots etc. still
referenced the old copies, duplicating the data in the process, thereby
effectively doubling your data usage.

The fact that it didn't /quite/ double usage is likely down to the
normal compress mount option only doing a quick compressibility test
and skipping the file if it doesn't seem particularly compressible
based on that quick test, while defrag with compress likely actually
tries every (128 KiB compression) block, getting somewhat better
compression in the process. So the defrag/compress run compressed some
data that the runtime compression didn't, and thus didn't quite double
usage. (FWIW, you can get the more thorough behavior at runtime with
the compress-force mount option, which always attempts compression
instead of skipping the entire file when the bit the quick test tried
didn't compress so well.)
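
In fstab terms the difference is just the option name (device and
subvolume here are hypothetical):

    # quick test, skip files that don't seem compressible:
    /dev/sdX  /mnt  btrfs  subvol=@,compress=lzo        0 0
    # always attempt compression, deciding block by block:
    /dev/sdX  /mnt  btrfs  subvol=@,compress-force=lzo  0 0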


FWIW, around kernel 3.9, btrfs defrag actually was snapshot/reflink
aware for a few releases, but it turned out that dealing with all those
reflinks simply didn't scale with the then-existing code, and people
were reporting defrag runs taking days or weeks, projected to months,
with enough snapshots and with quotas (which didn't scale well either)
turned on.

Obviously that was simply unworkable, so defrag's snapshot awareness
was reverted until it could be made to scale better, as a working but
snapshot-unaware defrag was clearly more practical than one that
couldn't be run because it'd take months, and that snapshot awareness
has yet to be reactivated.


So now the bottom line is don't defrag what you don't want un-reflinked.


FWIW, autodefrag has the same problem in theory, but the effect in
practice is far more limited, in part because it only does its defrag
thing when some part of the file is being rewritten (and thus COWed
elsewhere, already un-reflinking the actually written blocks). While
autodefrag will magnify that a bit by COWing somewhat larger extents,
for files of any size (MiB scale and larger) it's not going to rewrite
and thus duplicate the entire file, as defrag can do. And it's
definitely not going to rewrite all files in large sections of the
filesystem, as a recursive defrag with the compression option will.

Additionally, autodefrag tends to defrag a file shortly after it has
changed, likely before any snapshots have been taken if they're only
taken daily or so, so you'll have effectively two copies of the changed
portion of the file: the old version, still locked in place by previous
snapshots, and the new version. That's instead of the three copies
you're likely to have if you wait until snapshots have been taken
before defragging: the old version as in previous snapshots, the new
version as initially written and locked in place by post-change
pre-defrag snapshots, and the new version as defragged.
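
If you want to try it, autodefrag is just a mount option (device and
mountpoint hypothetical):

    mount -o remount,autodefrag /mnt
    # or persistently, in /etc/fstab:
    /dev/sdX  /mnt  btrfs  subvol=@,compress=lzo,autodefrag  0 0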

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Niccolò Belli @ 2016-05-12 13:56 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Duncan

Thanks for the detailed explanation; hopefully in the future someone
will be able to make defrag snapshot/reflink aware in a scalable
manner. I will not use defrag anymore, but what do you suggest I do to
reclaim the lost space? Get rid of my current snapshots, or maybe
simply run bedup?

Niccolò


* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Duncan @ 2016-05-13  6:11 UTC (permalink / raw)
  To: linux-btrfs

Niccolò Belli posted on Thu, 12 May 2016 15:56:20 +0200 as excerpted:

> Thanks for the detailed explanation, hopefully in the future someone
> will be able to make defrag snapshot/reflink aware in a scalable manner.

It's still planned, AFAIK, but one of the scaling issues in particular,
quotas, has turned out to be particularly challenging even to get
working correctly. The quota code has been rewritten twice (so they're
on their third attempted solution), and it's still broken in certain
corner-cases ATM. In fact, while they're still trying to get the
existing third try to work in the tough corner-cases, they're already
talking about an eventual third rewrite (a fourth attempt, with three
scrapped) once the corner-cases do work, to bring better performance:
by then they'll know the tough corner-cases and can design a solution
with both correctness and performance in mind from the beginning.

So in practice an actually scalable snapshot-aware defrag is likely to be 
years out, as it's going to need actually working and scalable quota 
code, and even then, that's only part of the full scalable snapshot/
reflink-aware defrag solution.

The good news is that while there's still work to be done, progress has 
been healthy in other areas, so once the quota code both actually works 
and is scalable, the other aspects should hopefully fall into place 
relatively fast, as they've already been maturing on their own, 
separately.

> I will not use defrag anymore, but what do you suggest I do to
> reclaim the lost space? Get rid of my current snapshots, or maybe
> simply run bedup?

Neither snapshots nor dedup are among my direct use-cases, so my
practical knowledge there is limited, but removing the snapshots should
indeed clear the space (though you'll likely have to remove all the
snapshots covering a given subvolume in order to free it), since in
doing so you remove all the references locking the old extents in
place. If you already have them backed up elsewhere (using
send/receive, for instance) or don't actually need them, it's a viable
alternative.
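
Something along these lines should do it (the snapshot path is
hypothetical; adjust for however your snapshot tool lays them out):

    btrfs subvolume list -s /mnt                # list snapshot subvolumes
    btrfs subvolume delete /mnt/snapshots/root.2016-05-10
    # freed space appears asynchronously, after the cleaner runs:
    btrfs filesystem df /mnt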

In theory the various btrfs dedup solutions out there should work as
well, while letting you keep the snapshots (at least to the extent
they're writable snapshots that can be reflink-modified, or a single
read-only snapshot that the others, including the freshly defragged
working copy, can be reflinked to), since that is their mechanism of
operation: finding identical block sequences and reflinking them so
there's only one actual copy on the filesystem, with the rest being
reflinks to it. In effect that should undo the reflink-breaking you did
with the defrag. *But*, without any personal experience with them, I
have no idea either how effective they are in practice in a situation
like this, or how practical vs. convoluted the command lines are going
to be. Best case, it's a single fast command that not only undoes the
defrag reflink breakage but actually finds enough duplication in the
dataset to reduce usage even further than before; worst case, it's
multiple complex commands that take a week or longer to run and don't
actually help much.
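
As one concrete possibility (untested by me, and the path is a
placeholder), duperemove can scan a tree and submit matching extents to
the kernel for deduplication:

    duperemove -dr --hashfile=/var/tmp/dupehash.db /mnt
    # -d actually submits dedup requests (without it, only a report)
    # -r recurses; --hashfile keeps hashes on disk so reruns are incremental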

So in practice, you have a choice between EITHER deleting all the
snapshots, and along with them everything locking down the old extents,
leaving you with only the new, fresh copy (which by itself should be
smaller than before) but at the expense of losing your snapshots, OR
the, at least to my knowledge, relative unknown of the various btrfs
dedup solutions, which in theory should work well, but in practice... I
simply don't know.

AND of course you have the option of basically doing nothing, leaving
things as they are. However, given the context of this thread, it seems
you don't consider that a viable longer-term option: apparently you
were trying to clear space, not use MORE of it, and presumably you
actually need that space for something else, which precludes just
letting things be, unless of course you can afford to simply buy your
way out of the problem with more storage devices.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
From: Niccolò Belli @ 2016-05-20 15:51 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Duncan

On Friday, 13 May 2016 at 08:11:27 CEST, Duncan wrote:
> In theory the various btrfs dedup solutions out there should work as 
> well, while letting you keep the snapshots (at least to the extent 
> they're either writable snapshots so can be reflink modified

Unfortunately, as you said, dedup doesn't work with read-only
snapshots (I only use read-only snapshots, via snapper) :(

Does bedup's dedup-syscall branch
(https://github.com/g2p/bedup/tree/wip/dedup-syscall), which uses the
new batch deduplication ioctl merged in Linux 3.12, fix this?
Unfortunately the latest commit is from September :(
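
For reference, what I would run is something along these lines (going
by bedup's README as I remember it, so treat the exact form as an
assumption; the volume path is a placeholder):

    bedup dedup /mnt        # scan the volume and dedupe what it finds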
