linux-btrfs.vger.kernel.org archive mirror
* running duperemove but no free space gain
@ 2015-07-06 21:54 Mordechay Kaganer
  2015-07-06 22:34 ` Mark Fasheh
  0 siblings, 1 reply; 7+ messages in thread
From: Mordechay Kaganer @ 2015-07-06 21:54 UTC (permalink / raw)
  To: Btrfs BTRFS

B.H.

Hello.

I have a btrfs volume that serves as a backup target, populated by
rsync from the main servers. It contains many duplicate files across
different subvolumes, and I have some read-only snapshots of each
subvolume, which are created every time a backup completes.

I was trying to gain some free space using duperemove (compiled from
git master of this repo: https://github.com/markfasheh/duperemove).

Executed like this:

duperemove -rdAh <first_dir> <second_dir>

Both directories point to the most recent read-only snapshots of the
corresponding subvolumes, not to the subvolumes themselves, so I
had to add the -r option. AFAIK, they should point to exactly the same
data because nothing has changed since the snapshots were taken.

It runs successfully for several hours and prints many files that
are indeed duplicates, like this:

Showing 4 identical extents with id 5164bb47
Start           Length          Filename
0.0     4.8M    "...."
0.0     4.8M    "...."
0.0     4.8M    "...."
0.0     4.8M    "...."
....skip...
[0x78dee80] Try to dedupe extents with id 5164bb47
[0x78dee80] Dedupe 3 extents (id: 5164bb47) with target: (0.0, 4.8M), "...."

But the actual free space reported by "df" or by "btrfs fi df" doesn't
seem to change. Used space and metadata space even increase slightly.

I thought that deduplicating a file in one snapshot would affect all
snapshots/subvolumes that contain this (exact version of the) file,
because they should all point to the same data extents. Am I wrong?

Versions:

duperemove v0.11-dev

# uname -a
Linux yemot-bu 4.1.0-040100-generic #201507030940 SMP Fri Jul 3
09:41:47 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

# btrfs version
btrfs-progs v4.1

Thanks!

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our master, our teacher, and our rabbi, King Moshiach, forever and ever!


* Re: running duperemove but no free space gain
  2015-07-06 21:54 running duperemove but no free space gain Mordechay Kaganer
@ 2015-07-06 22:34 ` Mark Fasheh
  2015-07-06 23:03   ` Mordechay Kaganer
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Fasheh @ 2015-07-06 22:34 UTC (permalink / raw)
  To: Mordechay Kaganer; +Cc: Btrfs BTRFS

On Tue, Jul 07, 2015 at 12:54:01AM +0300, Mordechay Kaganer wrote:
> I have a btrfs volume which is used as a backup using rsync from the
> main servers. It contains many duplicate files across different
> subvolumes and i have some read only snapshots of each subvolume,
> which are created every time after the backup completes.
> 
> I was trying to gain some free space using duperemove (compiled from
> git master of this repo: https://github.com/markfasheh/duperemove).
> 
> Executed like this:
> 
> duperemove -rdAh <first_dir> <second_dir>
> 
> Both directories point to the most recent read only snapshots of the
> corresponding subvolumes, but not to the subvolumes themselves, so i
> had to add -r option. AFAIK, they should point to exactly the same
> data because nothing was changed since the snapshots were taken.
> 
> It runs successfully for several hours and prints out many files which
> are indeed duplicate like this:
> 
> Showing 4 identical extents with id 5164bb47
> Start           Length          Filename
> 0.0     4.8M    "...."
> 0.0     4.8M    "...."
> 0.0     4.8M    "...."
> 0.0     4.8M    "...."
> ....skip...
> [0x78dee80] Try to dedupe extents with id 5164bb47
> [0x78dee80] Dedupe 3 extents (id: 5164bb47) with target: (0.0, 4.8M), "...."
> 
> But the actual free space reported by "df" or by "btrfs fi df" doesn't
> seem to change. Used space and metadata space even increases slightly.

There were some patches for 4.2, both on the list and upstream, that
fix an issue where the unaligned tail of extents wasn't being deduplicated.
It sounds like you may have hit this. So that we can tell, can you run the
'show-shared-extents' program that comes with duperemove (or 'filefrag -e')
against two of the files that should have been deduped together and provide
the output here? If most of the extent shows as deduped but there is a
not-deduped tail extent, then that's most likely what you're seeing.
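
The check described above (all extents shared except the last one) can be sketched in a few lines of Python. This is only an illustration, not part of duperemove: the flag values are the ones defined in the kernel header linux/fiemap.h, while the function names and the sample tuples are made up for the example.

```python
# Flag values as defined in the Linux kernel header linux/fiemap.h.
FIEMAP_EXTENT_LAST = 0x0001     # last extent in the file
FIEMAP_EXTENT_ENCODED = 0x0008  # data is encoded on disk (e.g. compressed)
FIEMAP_EXTENT_SHARED = 0x2000   # space shared with other files

FLAG_NAMES = {
    FIEMAP_EXTENT_LAST: "last",
    FIEMAP_EXTENT_ENCODED: "encoded",
    FIEMAP_EXTENT_SHARED: "shared",
}


def decode_flags(fe_flags):
    """Return the names of the known flag bits set in fe_flags."""
    return [name for bit, name in FLAG_NAMES.items() if fe_flags & bit]


def unshared_tail(extents):
    """Return the last extent if it is not shared while all earlier ones are.

    `extents` is a list of (fe_logical, fe_length, fe_flags) tuples in file
    order. Returns None when the pattern does not match.
    """
    if len(extents) < 2:
        return None
    *head, tail = extents
    head_shared = all(flags & FIEMAP_EXTENT_SHARED for _, _, flags in head)
    tail_shared = bool(tail[2] & FIEMAP_EXTENT_SHARED)
    return tail if head_shared and not tail_shared else None


# Illustrative values only (hypothetical, not taken from a real file).
example = [
    (0, 131072, 0x2008),
    (131072, 131072, 0x2008),
    (262144, 4096, 0x0009),
]
print(decode_flags(0x2008))
print(unshared_tail(example))
```

A non-None result from unshared_tail() would correspond to the "not-deduped tail extent" case; it says nothing about why the tail was left out.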


> I thought that doing deduplication on a file in one snapshot would
> affect all snapshots/subvolumes that contain this (exact version of
> the) file because they all actually should point to the same data
> extents, am i wrong?

Well the case you're describing is one where dedupe wouldn't work - the
extent would already be considered deduplicated since there is only one of
them.

If the data has changed from one snapshot to another, we've created new
extents (for the new data) and those can be deduped against any other extent.
For duperemove to discover them, though, you have to provide it a path which
will eventually resolve to those extents (that is, duperemove has to find them
in the file scan stage).
	--Mark

--
Mark Fasheh


* Re: running duperemove but no free space gain
  2015-07-06 22:34 ` Mark Fasheh
@ 2015-07-06 23:03   ` Mordechay Kaganer
  2015-07-06 23:07     ` Mark Fasheh
  0 siblings, 1 reply; 7+ messages in thread
From: Mordechay Kaganer @ 2015-07-06 23:03 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: Btrfs BTRFS

B.H.

On Tue, Jul 7, 2015 at 1:34 AM, Mark Fasheh <mfasheh@suse.de> wrote:
>>
>> It runs successfully for several hours and prints out many files which
>> are indeed duplicate like this:
>>
>> Showing 4 identical extents with id 5164bb47
>> Start           Length          Filename
>> 0.0     4.8M    "...."
>> 0.0     4.8M    "...."
>> 0.0     4.8M    "...."
>> 0.0     4.8M    "...."
>> ....skip...
>> [0x78dee80] Try to dedupe extents with id 5164bb47
>> [0x78dee80] Dedupe 3 extents (id: 5164bb47) with target: (0.0, 4.8M), "...."
>>
>> But the actual free space reported by "df" or by "btrfs fi df" doesn't
>> seem to change. Used space and metadata space even increases slightly.
>
> There were some patches for 4.2 which are both on the list and upstream that
> fix an issue where the unaligned tail of extents wasn't being deduplicated.
> It sounds like you may have hit this. So we can tell, can you run the
> 'show-shared-extents' program that comes with duperemove (or 'filefrag -e')
> against two of the files that should have been deduped together and provide
> the output here. If most of the extent is showing deduped but there's a
> not-deduped tail extent then that's most likely what you're seeing.
>

# show-shared-extents <first_file> <second_file>
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
350771204096, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
350771318784, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
350771425280, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
350771548160, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
350771666944, fe_flags: 0x2008 (encoded shared )
(fiemap) [5] fe_logical: 655360, fe_length: 131072, fe_physical:
350771781632, fe_flags: 0x2008 (encoded shared )
(fiemap) [6] fe_logical: 786432, fe_length: 131072, fe_physical:
350771900416, fe_flags: 0x2008 (encoded shared )
(fiemap) [7] fe_logical: 917504, fe_length: 131072, fe_physical:
350772019200, fe_flags: 0x2008 (encoded shared )
(fiemap) [8] fe_logical: 1048576, fe_length: 131072, fe_physical:
350772137984, fe_flags: 0x2008 (encoded shared )
(fiemap) [9] fe_logical: 1179648, fe_length: 131072, fe_physical:
350772256768, fe_flags: 0x2008 (encoded shared )
(fiemap) [10] fe_logical: 1310720, fe_length: 131072, fe_physical:
350772375552, fe_flags: 0x2008 (encoded shared )
(fiemap) [11] fe_logical: 1441792, fe_length: 131072, fe_physical:
350772494336, fe_flags: 0x2008 (encoded shared )
(fiemap) [12] fe_logical: 1572864, fe_length: 131072, fe_physical:
350772617216, fe_flags: 0x2008 (encoded shared )
(fiemap) [13] fe_logical: 1703936, fe_length: 131072, fe_physical:
350772740096, fe_flags: 0x2008 (encoded shared )
(fiemap) [14] fe_logical: 1835008, fe_length: 131072, fe_physical:
350772854784, fe_flags: 0x2008 (encoded shared )
(fiemap) [15] fe_logical: 1966080, fe_length: 131072, fe_physical:
350772977664, fe_flags: 0x2008 (encoded shared )
(fiemap) [16] fe_logical: 2097152, fe_length: 131072, fe_physical:
350773100544, fe_flags: 0x2008 (encoded shared )
(fiemap) [17] fe_logical: 2228224, fe_length: 131072, fe_physical:
350773223424, fe_flags: 0x2008 (encoded shared )
(fiemap) [18] fe_logical: 2359296, fe_length: 131072, fe_physical:
350773342208, fe_flags: 0x2008 (encoded shared )
(fiemap) [19] fe_logical: 2490368, fe_length: 131072, fe_physical:
350773460992, fe_flags: 0x2008 (encoded shared )
(fiemap) [20] fe_logical: 2621440, fe_length: 131072, fe_physical:
350773579776, fe_flags: 0x2008 (encoded shared )
(fiemap) [21] fe_logical: 2752512, fe_length: 131072, fe_physical:
350773698560, fe_flags: 0x2008 (encoded shared )
(fiemap) [22] fe_logical: 2883584, fe_length: 131072, fe_physical:
350773821440, fe_flags: 0x2008 (encoded shared )
(fiemap) [23] fe_logical: 3014656, fe_length: 131072, fe_physical:
350773944320, fe_flags: 0x2008 (encoded shared )
(fiemap) [24] fe_logical: 3145728, fe_length: 131072, fe_physical:
350774067200, fe_flags: 0x2008 (encoded shared )
(fiemap) [25] fe_logical: 3276800, fe_length: 131072, fe_physical:
350774181888, fe_flags: 0x2008 (encoded shared )
(fiemap) [26] fe_logical: 3407872, fe_length: 131072, fe_physical:
350774300672, fe_flags: 0x2008 (encoded shared )
(fiemap) [27] fe_logical: 3538944, fe_length: 131072, fe_physical:
350774423552, fe_flags: 0x2008 (encoded shared )
(fiemap) [28] fe_logical: 3670016, fe_length: 131072, fe_physical:
350774546432, fe_flags: 0x2008 (encoded shared )
(fiemap) [29] fe_logical: 3801088, fe_length: 131072, fe_physical:
350774669312, fe_flags: 0x2008 (encoded shared )
(fiemap) [30] fe_logical: 3932160, fe_length: 131072, fe_physical:
350774792192, fe_flags: 0x2008 (encoded shared )
(fiemap) [31] fe_logical: 4063232, fe_length: 131072, fe_physical:
350774915072, fe_flags: 0x2008 (encoded shared )
(fiemap) [32] fe_logical: 4194304, fe_length: 131072, fe_physical:
350775037952, fe_flags: 0x2008 (encoded shared )
(fiemap) [33] fe_logical: 4325376, fe_length: 131072, fe_physical:
350775160832, fe_flags: 0x2008 (encoded shared )
(fiemap) [34] fe_logical: 4456448, fe_length: 131072, fe_physical:
350775279616, fe_flags: 0x2008 (encoded shared )
(fiemap) [35] fe_logical: 4587520, fe_length: 131072, fe_physical:
350775394304, fe_flags: 0x2008 (encoded shared )
(fiemap) [36] fe_logical: 4718592, fe_length: 131072, fe_physical:
350775517184, fe_flags: 0x2008 (encoded shared )
(fiemap) [37] fe_logical: 4849664, fe_length: 131072, fe_physical:
350775640064, fe_flags: 0x2008 (encoded shared )
(fiemap) [38] fe_logical: 4980736, fe_length: 61440, fe_physical:
350775758848, fe_flags: 0x2009 (last encoded shared )
<first_file>: 5042176 shared bytes
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
350771204096, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
350771318784, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
350771425280, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
350771548160, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
350771666944, fe_flags: 0x2008 (encoded shared )
(fiemap) [5] fe_logical: 655360, fe_length: 131072, fe_physical:
350771781632, fe_flags: 0x2008 (encoded shared )
(fiemap) [6] fe_logical: 786432, fe_length: 131072, fe_physical:
350771900416, fe_flags: 0x2008 (encoded shared )
(fiemap) [7] fe_logical: 917504, fe_length: 131072, fe_physical:
350772019200, fe_flags: 0x2008 (encoded shared )
(fiemap) [8] fe_logical: 1048576, fe_length: 131072, fe_physical:
350772137984, fe_flags: 0x2008 (encoded shared )
(fiemap) [9] fe_logical: 1179648, fe_length: 131072, fe_physical:
350772256768, fe_flags: 0x2008 (encoded shared )
(fiemap) [10] fe_logical: 1310720, fe_length: 131072, fe_physical:
350772375552, fe_flags: 0x2008 (encoded shared )
(fiemap) [11] fe_logical: 1441792, fe_length: 131072, fe_physical:
350772494336, fe_flags: 0x2008 (encoded shared )
(fiemap) [12] fe_logical: 1572864, fe_length: 131072, fe_physical:
350772617216, fe_flags: 0x2008 (encoded shared )
(fiemap) [13] fe_logical: 1703936, fe_length: 131072, fe_physical:
350772740096, fe_flags: 0x2008 (encoded shared )
(fiemap) [14] fe_logical: 1835008, fe_length: 131072, fe_physical:
350772854784, fe_flags: 0x2008 (encoded shared )
(fiemap) [15] fe_logical: 1966080, fe_length: 131072, fe_physical:
350772977664, fe_flags: 0x2008 (encoded shared )
(fiemap) [16] fe_logical: 2097152, fe_length: 131072, fe_physical:
350773100544, fe_flags: 0x2008 (encoded shared )
(fiemap) [17] fe_logical: 2228224, fe_length: 131072, fe_physical:
350773223424, fe_flags: 0x2008 (encoded shared )
(fiemap) [18] fe_logical: 2359296, fe_length: 131072, fe_physical:
350773342208, fe_flags: 0x2008 (encoded shared )
(fiemap) [19] fe_logical: 2490368, fe_length: 131072, fe_physical:
350773460992, fe_flags: 0x2008 (encoded shared )
(fiemap) [20] fe_logical: 2621440, fe_length: 131072, fe_physical:
350773579776, fe_flags: 0x2008 (encoded shared )
(fiemap) [21] fe_logical: 2752512, fe_length: 131072, fe_physical:
350773698560, fe_flags: 0x2008 (encoded shared )
(fiemap) [22] fe_logical: 2883584, fe_length: 131072, fe_physical:
350773821440, fe_flags: 0x2008 (encoded shared )
(fiemap) [23] fe_logical: 3014656, fe_length: 131072, fe_physical:
350773944320, fe_flags: 0x2008 (encoded shared )
(fiemap) [24] fe_logical: 3145728, fe_length: 131072, fe_physical:
350774067200, fe_flags: 0x2008 (encoded shared )
(fiemap) [25] fe_logical: 3276800, fe_length: 131072, fe_physical:
350774181888, fe_flags: 0x2008 (encoded shared )
(fiemap) [26] fe_logical: 3407872, fe_length: 131072, fe_physical:
350774300672, fe_flags: 0x2008 (encoded shared )
(fiemap) [27] fe_logical: 3538944, fe_length: 131072, fe_physical:
350774423552, fe_flags: 0x2008 (encoded shared )
(fiemap) [28] fe_logical: 3670016, fe_length: 131072, fe_physical:
350774546432, fe_flags: 0x2008 (encoded shared )
(fiemap) [29] fe_logical: 3801088, fe_length: 131072, fe_physical:
350774669312, fe_flags: 0x2008 (encoded shared )
(fiemap) [30] fe_logical: 3932160, fe_length: 131072, fe_physical:
350774792192, fe_flags: 0x2008 (encoded shared )
(fiemap) [31] fe_logical: 4063232, fe_length: 131072, fe_physical:
350774915072, fe_flags: 0x2008 (encoded shared )
(fiemap) [32] fe_logical: 4194304, fe_length: 131072, fe_physical:
350775037952, fe_flags: 0x2008 (encoded shared )
(fiemap) [33] fe_logical: 4325376, fe_length: 131072, fe_physical:
350775160832, fe_flags: 0x2008 (encoded shared )
(fiemap) [34] fe_logical: 4456448, fe_length: 131072, fe_physical:
350775279616, fe_flags: 0x2008 (encoded shared )
(fiemap) [35] fe_logical: 4587520, fe_length: 131072, fe_physical:
350775394304, fe_flags: 0x2008 (encoded shared )
(fiemap) [36] fe_logical: 4718592, fe_length: 131072, fe_physical:
350775517184, fe_flags: 0x2008 (encoded shared )
(fiemap) [37] fe_logical: 4849664, fe_length: 131072, fe_physical:
350775640064, fe_flags: 0x2008 (encoded shared )
(fiemap) [38] fe_logical: 4980736, fe_length: 57344, fe_physical:
350775758848, fe_flags: 0x2008 (encoded shared )
(fiemap) [39] fe_logical: 5038080, fe_length: 4096, fe_physical:
351184961536, fe_flags: 0x9 (last encoded )
<second_file>: 5038080 shared bytes

And another pair (all extents reported shared):

# show-shared-extents <first_file> <second_file>
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
3576952483840, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
3576952606720, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 262144, fe_physical:
3576952733696, fe_flags: 0x2000 (shared )
(fiemap) [3] fe_logical: 524288, fe_length: 131072, fe_physical:
3576952995840, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 655360, fe_length: 393216, fe_physical:
3576953122816, fe_flags: 0x2000 (shared )
(fiemap) [5] fe_logical: 1048576, fe_length: 131072, fe_physical:
3576953516032, fe_flags: 0x2008 (encoded shared )
(fiemap) [6] fe_logical: 1179648, fe_length: 131072, fe_physical:
3576953643008, fe_flags: 0x2008 (encoded shared )
(fiemap) [7] fe_logical: 1310720, fe_length: 262144, fe_physical:
3576953769984, fe_flags: 0x2000 (shared )
(fiemap) [8] fe_logical: 1572864, fe_length: 131072, fe_physical:
3576954032128, fe_flags: 0x2008 (encoded shared )
(fiemap) [9] fe_logical: 1703936, fe_length: 393216, fe_physical:
3576954159104, fe_flags: 0x2000 (shared )
(fiemap) [10] fe_logical: 2097152, fe_length: 131072, fe_physical:
3576954552320, fe_flags: 0x2008 (encoded shared )
(fiemap) [11] fe_logical: 2228224, fe_length: 393216, fe_physical:
3576954679296, fe_flags: 0x2000 (shared )
(fiemap) [12] fe_logical: 2621440, fe_length: 131072, fe_physical:
3576955072512, fe_flags: 0x2008 (encoded shared )
(fiemap) [13] fe_logical: 2752512, fe_length: 131072, fe_physical:
3576955199488, fe_flags: 0x2008 (encoded shared )
(fiemap) [14] fe_logical: 2883584, fe_length: 262144, fe_physical:
3576955326464, fe_flags: 0x2000 (shared )
(fiemap) [15] fe_logical: 3145728, fe_length: 131072, fe_physical:
3576955588608, fe_flags: 0x2008 (encoded shared )
(fiemap) [16] fe_logical: 3276800, fe_length: 1748992, fe_physical:
3576955715584, fe_flags: 0x2001 (last shared )
<first_file>: 5025792 shared bytes
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
3576952483840, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
3576952606720, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 262144, fe_physical:
3576952733696, fe_flags: 0x2000 (shared )
(fiemap) [3] fe_logical: 524288, fe_length: 131072, fe_physical:
3576952995840, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 655360, fe_length: 393216, fe_physical:
3576953122816, fe_flags: 0x2000 (shared )
(fiemap) [5] fe_logical: 1048576, fe_length: 131072, fe_physical:
3576953516032, fe_flags: 0x2008 (encoded shared )
(fiemap) [6] fe_logical: 1179648, fe_length: 131072, fe_physical:
3576953643008, fe_flags: 0x2008 (encoded shared )
(fiemap) [7] fe_logical: 1310720, fe_length: 262144, fe_physical:
3576953769984, fe_flags: 0x2000 (shared )
(fiemap) [8] fe_logical: 1572864, fe_length: 131072, fe_physical:
3576954032128, fe_flags: 0x2008 (encoded shared )
(fiemap) [9] fe_logical: 1703936, fe_length: 393216, fe_physical:
3576954159104, fe_flags: 0x2000 (shared )
(fiemap) [10] fe_logical: 2097152, fe_length: 131072, fe_physical:
3576954552320, fe_flags: 0x2008 (encoded shared )
(fiemap) [11] fe_logical: 2228224, fe_length: 393216, fe_physical:
3576954679296, fe_flags: 0x2000 (shared )
(fiemap) [12] fe_logical: 2621440, fe_length: 131072, fe_physical:
3576955072512, fe_flags: 0x2008 (encoded shared )
(fiemap) [13] fe_logical: 2752512, fe_length: 131072, fe_physical:
3576955199488, fe_flags: 0x2008 (encoded shared )
(fiemap) [14] fe_logical: 2883584, fe_length: 262144, fe_physical:
3576955326464, fe_flags: 0x2000 (shared )
(fiemap) [15] fe_logical: 3145728, fe_length: 131072, fe_physical:
3576955588608, fe_flags: 0x2008 (encoded shared )
(fiemap) [16] fe_logical: 3276800, fe_length: 393216, fe_physical:
3576955715584, fe_flags: 0x2000 (shared )
(fiemap) [17] fe_logical: 3670016, fe_length: 524288, fe_physical:
3576956108800, fe_flags: 0x2000 (shared )
(fiemap) [18] fe_logical: 4194304, fe_length: 524288, fe_physical:
3576956633088, fe_flags: 0x2000 (shared )
(fiemap) [19] fe_logical: 4718592, fe_length: 303104, fe_physical:
3576957157376, fe_flags: 0x2000 (shared )
(fiemap) [20] fe_logical: 5021696, fe_length: 4096, fe_physical:
742996164608, fe_flags: 0x2001 (last shared )
<second_file>: 5025792 shared bytes

I checked some more pairs; most extents appear as "shared". In some
cases there is a final "last encoded" extent of length 4096 that is
not shared.
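
To check many pairs without reading the dumps by eye, the output could be parsed and summed. The following Python sketch assumes the text layout shown in this thread (including the line breaks added by mail wrapping); the function names are made up, and the sample is the last two extents of <second_file> from the first pair above.

```python
import re

FIEMAP_EXTENT_SHARED = 0x2000  # value from linux/fiemap.h

# One match per extent; \s* also spans the line breaks added by mail wrapping.
EXTENT_RE = re.compile(
    r"fe_logical:\s*(\d+),\s*fe_length:\s*(\d+),\s*"
    r"fe_physical:\s*(\d+),\s*fe_flags:\s*(0x[0-9a-fA-F]+)"
)


def parse_extents(text):
    """Return a list of (logical, length, physical, flags) integer tuples."""
    return [
        (int(logical), int(length), int(physical), int(flags, 16))
        for logical, length, physical, flags in EXTENT_RE.findall(text)
    ]


def shared_bytes(extents):
    """Sum fe_length over the extents that carry the shared flag."""
    return sum(
        length for _, length, _, flags in extents
        if flags & FIEMAP_EXTENT_SHARED
    )


sample = """\
(fiemap) [38] fe_logical: 4980736, fe_length: 57344, fe_physical:
350775758848, fe_flags: 0x2008 (encoded shared )
(fiemap) [39] fe_logical: 5038080, fe_length: 4096, fe_physical:
351184961536, fe_flags: 0x9 (last encoded )
"""
extents = parse_extents(sample)
print(shared_bytes(extents))
# Extents without the shared flag, e.g. a 4096-byte tail:
print([e for e in extents if not e[3] & FIEMAP_EXTENT_SHARED])
```

Summing fe_length this way reproduces the "shared bytes" totals printed above (for example 38 * 131072 + 57344 = 5038080 for <second_file>), which suggests, but does not prove, that the tool computes its total the same way.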

Since I use snapshots, could "shared" also mean "shared between snapshots"?

>
>> I thought that doing deduplication on a file in one snapshot would
>> affect all snapshots/subvolumes that contain this (exact version of
>> the) file because they all actually should point to the same data
>> extents, am i wrong?
>
> Well the case you're describing is one where dedupe wouldn't work - the
> extent would already be considered deduplicated since there is only one of
> them.
>
> If the data has changed from one snapshot to another, we've created new
> extents (for the new data) and it can be deduped against any other extent.
> For duperemove to discover it though you have to provide it a path which
> will eventually resolve to those extents (that is, duperemove has to find it
> in the file scan stage).

I didn't explain this properly: the snapshots are from different
subvolumes, not from the same subvolume. They contain data which I
know is actually duplicated on the main server (for instance, there
are copies of a huge Dropbox folder on each subvolume). And duperemove
does find those duplicate files properly in the first phase.

My question is: is it sufficient to point duperemove at the read-only
snapshots instead of the main subvolumes?

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our master, our teacher, and our rabbi, King Moshiach, forever and ever!


* Re: running duperemove but no free space gain
  2015-07-06 23:03   ` Mordechay Kaganer
@ 2015-07-06 23:07     ` Mark Fasheh
  2015-07-07  6:27       ` Ryan Bourne
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Fasheh @ 2015-07-06 23:07 UTC (permalink / raw)
  To: Mordechay Kaganer; +Cc: Btrfs BTRFS

On Tue, Jul 07, 2015 at 02:03:06AM +0300, Mordechay Kaganer wrote:
> 
> Checked some more pairs, most extents appear as "shared". In some
> cases there is "last encoded" not shared extent with length 4096.
> 
> Since i use snapshots, may shared also mean "shared between snapshots"?

Yes, I forgot about that, but in your case almost everything will be
reported as shared. Btw, I have to leave my office now but will get to the
rest of your e-mail later.

--
Mark Fasheh


* Re: running duperemove but no free space gain
  2015-07-06 23:07     ` Mark Fasheh
@ 2015-07-07  6:27       ` Ryan Bourne
  2015-07-07 13:14         ` Mordechay Kaganer
  0 siblings, 1 reply; 7+ messages in thread
From: Ryan Bourne @ 2015-07-07  6:27 UTC (permalink / raw)
  To: linux-btrfs

On 7/07/15 9:07 AM, Mark Fasheh wrote:
>
> Yes I forgot about that but in your case almost everything will be reported
> shared. Btw, I have to leave my office now but will get to the rest of your e-mail
> later.
>
> --
> Mark Fasheh
> --

To clarify, if I did the following:

# btrfs subvolume create a
# dd bs=1M count=10 if=/dev/urandom of=a/1
# dd if=a/1 of=a/2
# btrfs subvolume snapshot a b

then I have four files containing the same data. a/1, b/1 share extents 
and a/2, b/2 share extents.

If I then deduplicate a/1 and a/2 will all four files be sharing 
extents, or only three? (Assuming I have the patches for 4.2)



* Re: running duperemove but no free space gain
  2015-07-07  6:27       ` Ryan Bourne
@ 2015-07-07 13:14         ` Mordechay Kaganer
  2015-07-08  5:57           ` Mordechay Kaganer
  0 siblings, 1 reply; 7+ messages in thread
From: Mordechay Kaganer @ 2015-07-07 13:14 UTC (permalink / raw)
  To: Btrfs BTRFS

B.H.

On Tue, Jul 7, 2015 at 9:27 AM, Ryan Bourne <hub@spotprint.com.au> wrote:
> To clarify, if I did the following:
>
> # btrfs subvolume create a
> # dd bs=1M count=10 if=/dev/urandom of=a/1
> # dd if=a/1 of=a/2
> # btrfs subvolume snapshot a b
>
> then I have four files containing the same data. a/1, b/1 share extents and a/2, b/2 share extents.
>
> If I then deduplicate a/1 and a/2 will all four files be sharing extents, or only three? (Assuming I have the patches for 4.2)
>

OK, I ran a test almost exactly as you suggested. It appears that
dedupe does not affect the "b" snapshot, so only 3 of the 4 files end
up sharing extents. That explains the lack of free space gain: the
duplicate data is still referenced.
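
The effect can be illustrated with a toy model in Python. This is not how btrfs is implemented; it only models "file path -> extent" references, and the labels E1/E2 and all function names are invented for the sketch.

```python
def snapshot(files, src, dst):
    """Copy every 'src/...' entry to 'dst/...', pointing at the same extent."""
    for path, extent in list(files.items()):
        if path.startswith(src + "/"):
            files[dst + path[len(src):]] = extent


def dedupe(files, target, *others):
    """Repoint only the explicitly named files at the target's extent."""
    for path in others:
        files[path] = files[target]


def referenced_extents(files):
    """Extents that still have at least one reference and cannot be freed."""
    return set(files.values())


# Toy model: file path -> extent label (E1 and E2 are made-up labels).
files = {"a/1": "E1", "a/2": "E2"}
snapshot(files, "a", "b")      # b/1 -> E1, b/2 -> E2
dedupe(files, "a/2", "a/1")    # a/1 now -> E2; b/1 was not named
print(files)
print(sorted(referenced_extents(files)))
```

In this model E1 stays referenced by b/1, so nothing can be freed; only a call such as dedupe(files, "a/2", "b/1") would drop the last reference to E1.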

Here's the log - fe_physical/fe_length can be used to figure out what
is actually deduped:

; Setup:

# btrfs sub create a
# dd bs=128K count=8 if=/dev/urandom of=a/1
# dd if=a/1 of=a/2
# btrfs subvolume snapshot a b

; Before dedupe:

# show-shared-extents a/1 a/2 b/1 b/2

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical:
3632062464, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical:
3632586752, fe_flags: 0x2001 (last shared )
a/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical:
3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical:
3633635328, fe_flags: 0x2001 (last shared )
a/2: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical:
3632062464, fe_flags: 0x2001 (last shared )
b/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical:
3633111040, fe_flags: 0x2001 (last shared )
b/2: 1048576 shared bytes

; Dedupe:
# duperemove -d a/1 a/2
Using 128K blocks
Using hash: murmur3
Using 4 threads for file hashing phase
csum: a/1       [1/2] (50.00%)
csum: a/2       [2/2] (100.00%)
Hashing completed. Calculating duplicate extents - this may take some time.
[########################################]
Search completed with no errors.
Simple read and compare of file data found 1 instances of extents that
might benefit from deduplication.
Showing 2 identical extents with id 7ec588f6
Start           Length          Filename
0       1048576 "a/2"
0       1048576 "a/1"
Using 4 threads for dedupe phase
[0x1e42540] Try to dedupe extents with id 7ec588f6
[0x1e42540] Dedupe 1 extents (id: 7ec588f6) with target: (0, 1048576), "a/2"
Kernel processed data (excludes target files): 1048576
Comparison of extent info shows a net change in shared extents of: 0

; After dedupe:
# show-shared-extents a/1 a/2 b/1 b/2

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical:
3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical:
3633635328, fe_flags: 0x2001 (last shared )
a/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical:
3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical:
3633635328, fe_flags: 0x2001 (last shared )
a/2: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical:
3632062464, fe_flags: 0x1 (last )
b/1: 0 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical:
3633111040, fe_flags: 0x2001 (last shared )
b/2: 1048576 shared bytes


b/1 was not affected by duperemove. As far as I understand, once the
snapshot exists, the dedupe operation modifies the metadata of a/1
and/or a/2, which causes that metadata to be COWed, so b's view of the
data is not affected.

The conclusion is: to actually reclaim the duplicated space, you have
to include all snapshots that may point to the file.
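
The same conclusion can be checked numerically from the log above by taking the union of the physical ranges that the four files reference. This Python sketch assumes uncompressed extents, so that fe_physical plus fe_length describes the on-disk range; the function names are made up, and the "all_deduped" case is hypothetical, not something that was run.

```python
def distinct_physical_bytes(extents):
    """Size of the union of [physical, physical + length) byte ranges."""
    total = 0
    covered_end = None
    for start, length in sorted(extents):
        end = start + length
        if covered_end is None or start > covered_end:
            total += length               # disjoint range: count it fully
            covered_end = end
        elif end > covered_end:
            total += end - covered_end    # count only the part not yet covered
            covered_end = end
    return total


def usage(mapping):
    """Distinct physical bytes referenced by all files in the mapping."""
    return distinct_physical_bytes(
        [extent for extents in mapping.values() for extent in extents]
    )


# (fe_physical, fe_length) pairs copied from the show-shared-extents output.
before = {
    "a/1": [(3632062464, 524288), (3632586752, 524288)],
    "a/2": [(3633111040, 524288), (3633635328, 524288)],
    "b/1": [(3632062464, 1048576)],
    "b/2": [(3633111040, 1048576)],
}
after = {**before, "a/1": before["a/2"]}            # only a/1 was repointed
all_deduped = {**after, "b/1": after["b/2"]}        # hypothetical: b/1 too

print(usage(before), usage(after), usage(all_deduped))
```

Before and after the dedupe the four files reference the same 2 MiB of physical space; only in the hypothetical case where b/1 is repointed as well does the total drop to 1 MiB.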

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our master, our teacher, and our rabbi, King Moshiach, forever and ever!


* Re: running duperemove but no free space gain
  2015-07-07 13:14         ` Mordechay Kaganer
@ 2015-07-08  5:57           ` Mordechay Kaganer
  0 siblings, 0 replies; 7+ messages in thread
From: Mordechay Kaganer @ 2015-07-08  5:57 UTC (permalink / raw)
  To: Btrfs BTRFS

B.H.


On Tue, Jul 7, 2015 at 4:14 PM, Mordechay Kaganer <mkaganer@gmail.com> wrote:
>
>
> The conclusion is: to actually reclaim the duplicated space you have
> to include all snapshots that may point to the file.
>

I tried to dedupe the real data, including all snapshots. Still no
free space gain. This time it looks like dedupe didn't actually work.
The log:

# duperemove:

Showing 12 identical extents with id 572cbc6e
Start           Length          Filename
0.0     54.9M   "/test_btrfs/snapshots/2015-06-13/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-13/subvol1/dir770.sav/21/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-16/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-16/subvol1/dir770.sav/21/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-20/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-20/subvol1/dir770.sav/21/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-23/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-06-23/subvol1/dir770.sav/21/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-07-05/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/snapshots/2015-07-05/subvol1/dir770.sav/21/3065.wav"
0.0     54.9M   "/test_btrfs/subvol1/dir770/21+/3065.wav"
0.0     54.9M   "/test_btrfs/subvol1/dir770.sav/21/3065.wav"
....
[0x9b7800] Try to dedupe extents with id 572cbc6e
[0x9b7800] Dedupe 1 extents (id: 572cbc6e) with target: (0.0, 54.9M),
"/test_btrfs/snapshots/2015-06-13/subvol1/dir770/21+/3065.wav"
....
Kernel processed data (excludes target files): 49.8G
Comparison of extent info shows a net change in shared extents of: 0.0

# show-shared-extents $file | head -5:

show-shared-extents
/test_btrfs/snapshots/2015-06-13/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-13/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-16/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-16/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
7960106393600, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
7960106463232, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
7960106545152, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
7960106627072, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-20/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-20/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
7960106393600, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
7960106463232, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
7960106545152, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
7960106627072, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-23/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-23/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
7960106393600, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
7960106463232, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
7960106545152, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
7960106627072, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-07-05/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-07-05/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
7960106393600, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
7960106463232, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
7960106545152, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
7960106627072, fe_flags: 0x2008 (encoded shared )
show-shared-extents /test_btrfs/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
8178629300224, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
8178629369856, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
8178629451776, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
8178629533696, fe_flags: 0x2008 (encoded shared )
show-shared-extents /test_btrfs/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
(fiemap) [1] fe_logical: 131072, fe_length: 131072, fe_physical:
7960106393600, fe_flags: 0x2008 (encoded shared )
(fiemap) [2] fe_logical: 262144, fe_length: 131072, fe_physical:
7960106463232, fe_flags: 0x2008 (encoded shared )
(fiemap) [3] fe_logical: 393216, fe_length: 131072, fe_physical:
7960106545152, fe_flags: 0x2008 (encoded shared )
(fiemap) [4] fe_logical: 524288, fe_length: 131072, fe_physical:
7960106627072, fe_flags: 0x2008 (encoded shared )

Note that here we actually have 2 duplicate files
(subvol1/dir770/21+/3065.wav, subvol1/dir770.sav/21/3065.wav) and 5
snapshots of each file, i.e. 12 instances including the live copies.
We attempted to dedupe all 12 instances at once, but according to
fe_physical we still have 2 copies of the data. Why is this happening?
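
For anyone who wants to count the remaining copies without reading the
dump by eye, here is a minimal sketch that groups files by the
fe_physical values they report. It assumes the text format shown above
(including the mail line wrapping); the function name
group_by_physical and the shortened SAMPLE are only illustrative and
are not part of duperemove or show-shared-extents.

```python
import re
from collections import defaultdict

# Shortened excerpt of the output above: first extent only, three files.
SAMPLE = """\
show-shared-extents
/test_btrfs/snapshots/2015-06-13/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
show-shared-extents
/test_btrfs/snapshots/2015-06-16/subvol1/dir770.sav/21/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
7960106323968, fe_flags: 0x2008 (encoded shared )
show-shared-extents /test_btrfs/subvol1/dir770/21+/3065.wav:
(fiemap) [0] fe_logical: 0, fe_length: 131072, fe_physical:
8178629230592, fe_flags: 0x2008 (encoded shared )
"""


def group_by_physical(dump):
    """Map each tuple of fe_physical values to the files reporting it."""
    # Collapse all whitespace so wrapped lines become one long line.
    text = " ".join(dump.split())
    groups = defaultdict(list)
    # Every record starts with the tool name followed by "<path>:".
    for record in text.split("show-shared-extents ")[1:]:
        path, _, rest = record.partition(": ")
        physical = tuple(int(p) for p in re.findall(r"fe_physical: (\d+)", rest))
        if physical:  # skip anything that is not a real record
            groups[physical].append(path)
    return dict(groups)


groups = group_by_physical(SAMPLE)
for physical, paths in groups.items():
    print(physical[0], len(paths), "file(s)")
```

Each key in the result is one distinct set of physical extents, so the
number of keys is the number of copies still stored on disk.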

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our Master, our Teacher, and our Rabbi, King Moshiach, forever and ever!


end of thread, other threads:[~2015-07-08  5:57 UTC | newest]

Thread overview: 7+ messages
2015-07-06 21:54 running duperemove but no free space gain Mordechay Kaganer
2015-07-06 22:34 ` Mark Fasheh
2015-07-06 23:03   ` Mordechay Kaganer
2015-07-06 23:07     ` Mark Fasheh
2015-07-07  6:27       ` Ryan Bourne
2015-07-07 13:14         ` Mordechay Kaganer
2015-07-08  5:57           ` Mordechay Kaganer
