From: Mark Fasheh <mfasheh@suse.de>
To: Mordechay Kaganer <mkaganer@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: running duperemove but no free space gain
Date: Mon, 6 Jul 2015 15:34:04 -0700
Message-ID: <20150706223404.GF7507@wotan.suse.de>
In-Reply-To: <CA+xOVSO2Cd1aYxAnA_Aq32oBTiMTbuo1YQo2cqEY_5WONQV=qA@mail.gmail.com>
On Tue, Jul 07, 2015 at 12:54:01AM +0300, Mordechay Kaganer wrote:
> I have a btrfs volume which is used as a backup using rsync from the
> main servers. It contains many duplicate files across different
> subvolumes and i have some read only snapshots of each subvolume,
> which are created every time after the backup completes.
>
> I was trying to gain some free space using duperemove (compiled from
> git master of this repo: https://github.com/markfasheh/duperemove).
>
> Executed like this:
>
> duperemove -rdAh <first_dir> <second_dir>
>
> Both directories point to the most recent read only snapshots of the
> corresponding subvolumes, but not to the subvolumes themselves, so i
> had to add -r option. AFAIK, they should point to exactly the same
> data because nothing was changed since the snapshots were taken.
>
> It runs successfully for several hours and prints out many files which
> are indeed duplicate like this:
>
> Showing 4 identical extents with id 5164bb47
> Start Length Filename
> 0.0 4.8M "...."
> 0.0 4.8M "...."
> 0.0 4.8M "...."
> 0.0 4.8M "...."
> ....skip...
> [0x78dee80] Try to dedupe extents with id 5164bb47
> [0x78dee80] Dedupe 3 extents (id: 5164bb47) with target: (0.0, 4.8M), "...."
>
> But the actual free space reported by "df" or by "btrfs fi df" doesn't
> seem to change. Used space and metadata space even increases slightly.

Some patches for 4.2, which are both on the list and upstream, fix an
issue where the unaligned tail of an extent wasn't being deduplicated.
It sounds like you may have hit this. So that we can tell, can you run the
'show-shared-extents' program that comes with duperemove (or 'filefrag -e')
against two of the files that should have been deduped together and post
the output here? If most of the extent shows as deduped but there's a
not-deduped tail extent, then that's most likely what you're seeing.
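
As an illustration of what to look for in that output, here is a minimal
Python sketch. It does not parse 'filefrag -e' output; it assumes each
file's extents have already been copied by hand into (logical, physical,
length) tuples, counted in filesystem blocks. The Extent type, the function
names and all block numbers are hypothetical and only serve the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Extent(NamedTuple):
    logical: int   # first block of the extent within the file
    physical: int  # first block of the extent on disk
    length: int    # number of blocks in the extent


def physical_blocks(extents: List[Extent]) -> Dict[int, int]:
    """Map every logical block of a file to the disk block backing it."""
    mapping: Dict[int, int] = {}
    for ext in extents:
        for i in range(ext.length):
            mapping[ext.logical + i] = ext.physical + i
    return mapping


def unshared_ranges(a: List[Extent], b: List[Extent]) -> List[Tuple[int, int]]:
    """Return (first, last) logical block ranges backed by different disk blocks."""
    map_a, map_b = physical_blocks(a), physical_blocks(b)
    ranges: List[Tuple[int, int]] = []
    for block in sorted(set(map_a) | set(map_b)):
        if map_a.get(block) == map_b.get(block):
            continue  # both files point at the same disk block here
        if ranges and ranges[-1][1] == block - 1:
            ranges[-1] = (ranges[-1][0], block)  # extend the current range
        else:
            ranges.append((block, block))
    return ranges


# Made-up numbers: two files whose first 1228 blocks share disk space,
# while each keeps its own copy of the final block.
file_a = [Extent(0, 1000, 1228), Extent(1228, 5000, 1)]
file_b = [Extent(0, 1000, 1228), Extent(1228, 9000, 1)]

print(unshared_ranges(file_a, file_b))  # [(1228, 1228)]
```

A result like this, where only the last block range is reported, is the
"deduped body, not-deduped tail" pattern described above. An empty list
would mean the two files are fully shared.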
> I thought that doing deduplication on a file in one snapshot would
> affect all snapshots/subvolumes that contain this (exact version of
> the) file because they all actually should point to the same data
> extents, am i wrong?
Well the case you're describing is one where dedupe wouldn't work - the
extent would already be considered deduplicated since there is only one of
them.
If the data has changed from one snapshot to another, we've created new
extents (for the new data) and it can be deduped against any other extent.
For duperemove to discover it though you have to provide it a path which
will eventually resolve to those extents (that is, duperemove has to find it
in the file scan stage).
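
The two points above can be pictured with a toy reference model in Python.
This is only a sketch and not how btrfs is implemented: it assumes that
space is counted once per distinct extent and that an extent stays
allocated while any scanned or unscanned file still refers to it. The
paths, the extent ids "A1" and "B1", and the 5 MiB size are invented.

```python
from typing import Dict, List, Tuple

Extent = Tuple[str, int]  # (hypothetical extent id, size in MiB)


def mib_used(files: Dict[str, List[Extent]]) -> int:
    """Count each distinct extent once, however many files refer to it."""
    unique: Dict[str, int] = {}
    for extents in files.values():
        for extent_id, size in extents:
            unique[extent_id] = size
    return sum(unique.values())


def dedupe(files: Dict[str, List[Extent]], src: str, dst: str) -> None:
    """Point dst at src's extents; assumes both files hold identical data."""
    files[dst] = list(files[src])


# Two subvolumes holding the same data in separate extents, plus one
# snapshot of each. A snapshot starts out sharing its subvolume's extent.
files = {
    "subvol_a/file": [("A1", 5)],
    "snap_a/file": [("A1", 5)],
    "subvol_b/file": [("B1", 5)],
    "snap_b/file": [("B1", 5)],
}

start = mib_used(files)

# Subvolume vs. its own snapshot: already one extent, nothing to gain.
dedupe(files, "subvol_a/file", "snap_a/file")
same_extent = mib_used(files)

# Snapshot vs. snapshot: B1 is still referenced by subvol_b/file.
dedupe(files, "snap_a/file", "snap_b/file")
partial = mib_used(files)

# Only once every path that resolves to B1 has been handled is it freed.
dedupe(files, "snap_a/file", "subvol_b/file")
complete = mib_used(files)

print(start, same_extent, partial, complete)  # 10 10 10 5
```

In this model the used space only drops after the last reference to the
duplicate extent has been redirected, which is why the paths handed to
duperemove matter.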
--Mark
--
Mark Fasheh