linux-btrfs.vger.kernel.org archive mirror
From: Mordechay Kaganer <mkaganer@gmail.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: running duperemove but no free space gain
Date: Tue, 7 Jul 2015 16:14:24 +0300
Message-ID: <CA+xOVSN1v5v9z3=zP8E6phTV0c54pOCf7Rh10EBR4zkkFNVo4A@mail.gmail.com>
In-Reply-To: <559B7158.6050404@spotprint.com.au>

B.H.

On Tue, Jul 7, 2015 at 9:27 AM, Ryan Bourne <hub@spotprint.com.au> wrote:
> To clarify, if I did the following:
>
> # btrfs subvolume create a
> # dd bs=1M count=10 if=/dev/urandom of=a/1
> # dd if=a/1 of=a/2
> # btrfs subvolume snapshot a b
>
> then I have four files containing the same data. a/1, b/1 share extents and a/2, b/2 share extents.
>
> If I then deduplicate a/1 and a/2 will all four files be sharing extents, or only three? (Assuming I have the patches for 4.2)
>

OK, I did a test almost exactly as you suggested. It appears that
dedupe does not affect the "b" snapshot, so only 3 of the 4 files are
deduped. That explains the lack of free space gain: the duplicate data
is still in use.

Here's the log. The fe_physical and fe_length values show which
physical extents each file references, so they can be used to figure
out what was actually deduped:

; Setup:

# btrfs sub create a
# dd bs=128K count=8 if=/dev/urandom of=a/1
# dd if=a/1 of=a/2
# btrfs subvolume snapshot a b

; Before dedupe:

# show-shared-extents a/1 a/2 b/1 b/2

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical: 3632062464, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical: 3632586752, fe_flags: 0x2001 (last shared )
a/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical: 3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical: 3633635328, fe_flags: 0x2001 (last shared )
a/2: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical: 3632062464, fe_flags: 0x2001 (last shared )
b/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical: 3633111040, fe_flags: 0x2001 (last shared )
b/2: 1048576 shared bytes
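
The fe_flags values in this listing are FIEMAP extent flags. The following is a minimal Python sketch that decodes the two flags appearing here. It assumes the constant values from the kernel header <linux/fiemap.h>: FIEMAP_EXTENT_LAST = 0x1 and FIEMAP_EXTENT_SHARED = 0x2000. Any other flag bits are ignored.

```python
# Flag values as defined in the Linux UAPI header <linux/fiemap.h>.
FIEMAP_EXTENT_LAST = 0x00000001    # last extent of the file
FIEMAP_EXTENT_SHARED = 0x00002000  # extent is also referenced elsewhere


def decode_fe_flags(flags: int) -> list[str]:
    """Return names of the two FIEMAP flags above that are set in `flags`."""
    names = []
    if flags & FIEMAP_EXTENT_LAST:
        names.append("last")
    if flags & FIEMAP_EXTENT_SHARED:
        names.append("shared")
    return names


if __name__ == "__main__":
    for value in (0x2000, 0x2001, 0x1):
        print(hex(value), decode_fe_flags(value))
```

Here, 0x2001 decodes to "last shared". The 0x1 seen on b/1 after the dedupe decodes to "last" only, which means that extent is no longer shared.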

; Dedupe:
# duperemove -d a/1 a/2
Using 128K blocks
Using hash: murmur3
Using 4 threads for file hashing phase
csum: a/1       [1/2] (50.00%)
csum: a/2       [2/2] (100.00%)
Hashing completed. Calculating duplicate extents - this may take some time.
[########################################]
Search completed with no errors.
Simple read and compare of file data found 1 instances of extents that
might benefit from deduplication.
Showing 2 identical extents with id 7ec588f6
Start           Length          Filename
0       1048576 "a/2"
0       1048576 "a/1"
Using 4 threads for dedupe phase
[0x1e42540] Try to dedupe extents with id 7ec588f6
[0x1e42540] Dedupe 1 extents (id: 7ec588f6) with target: (0, 1048576), "a/2"
Kernel processed data (excludes target files): 1048576
Comparison of extent info shows a net change in shared extents of: 0

; After dedupe:
# show-shared-extents a/1 a/2 b/1 b/2

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical: 3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical: 3633635328, fe_flags: 0x2001 (last shared )
a/1: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 524288, fe_physical: 3633111040, fe_flags: 0x2000 (shared )
(fiemap) [1] fe_logical: 524288, fe_length: 524288, fe_physical: 3633635328, fe_flags: 0x2001 (last shared )
a/2: 1048576 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical: 3632062464, fe_flags: 0x1 (last )
b/1: 0 shared bytes

(fiemap) [0] fe_logical: 0, fe_length: 1048576, fe_physical: 3633111040, fe_flags: 0x2001 (last shared )
b/2: 1048576 shared bytes
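
You can check the free-space effect from this log by taking the union of the (fe_physical, fe_length) ranges that the four files reference, before and after the dedupe. The following is a rough Python sketch using the numbers above. It is a simplified estimate: it ignores metadata and any other files that might reference these extents.

```python
# (fe_physical, fe_length) pairs copied from the fiemap output above.
BEFORE = {
    "a/1": [(3632062464, 524288), (3632586752, 524288)],
    "a/2": [(3633111040, 524288), (3633635328, 524288)],
    "b/1": [(3632062464, 1048576)],
    "b/2": [(3633111040, 1048576)],
}
AFTER = {
    "a/1": [(3633111040, 524288), (3633635328, 524288)],
    "a/2": [(3633111040, 524288), (3633635328, 524288)],
    "b/1": [(3632062464, 1048576)],
    "b/2": [(3633111040, 1048576)],
}


def referenced_bytes(layout: dict[str, list[tuple[int, int]]]) -> int:
    """Total length of the union of all physical ranges (assumes at least one)."""
    ranges = sorted((s, s + n) for extents in layout.values() for s, n in extents)
    total = 0
    cur_start, cur_end = ranges[0]
    for start, end in ranges[1:]:
        if start > cur_end:  # gap: close the current merged run
            total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlapping or touching: extend the run
            cur_end = max(cur_end, end)
    return total + (cur_end - cur_start)


if __name__ == "__main__":
    print(referenced_bytes(BEFORE), referenced_bytes(AFTER))
```

Both layouts come to 2097152 bytes (2 MiB). a/1 was repointed at a/2's extents, but b/1 still holds the original 1 MiB range, so nothing is freed.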


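b/1 was not affected by duperemove. After the dedupe, a/1 references
a/2's extents (fe_physical 3633111040). b/1 still references the
original extent at 3632062464, which is now no longer shared. As far as
I understand, after the snapshot is created, the dedupe operation
modifies the metadata of a/1 and/or a/2. That metadata is then COWed,
so b's data is not affected.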

The conclusion: to actually reclaim the duplicated space, you have to
include in the dedupe run every snapshot that may reference the file.
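
The following is a toy reference-count model of that conclusion, in Python. It is a simplification, not how btrfs actually tracks back-references. It assumes each file references exactly one whole 1 MiB extent. The extent IDs "X" and "Y" are hypothetical labels for the original data and the dd copy. An extent stays allocated as long as any file still references it.

```python
EXTENT_SIZE = 1048576  # each test file holds 1 MiB of data

# Hypothetical extent IDs: "X" = data written to a/1, "Y" = the dd copy in a/2.
# The snapshot b shares both extents with a.
AFTER_SNAPSHOT = {"a/1": "X", "a/2": "Y", "b/1": "X", "b/2": "Y"}


def dedupe(refs: dict[str, str], sources: list[str], target: str) -> dict[str, str]:
    """Return a copy of `refs` where each file in `sources` points at `target`'s extent."""
    new_refs = dict(refs)
    for name in sources:
        new_refs[name] = refs[target]
    return new_refs


def allocated_bytes(refs: dict[str, str]) -> int:
    """An extent is freed only when no file references it any more."""
    return len(set(refs.values())) * EXTENT_SIZE


if __name__ == "__main__":
    only_a = dedupe(AFTER_SNAPSHOT, ["a/1"], "a/2")
    with_b = dedupe(AFTER_SNAPSHOT, ["a/1", "b/1"], "a/2")
    print(allocated_bytes(AFTER_SNAPSHOT))  # 2097152
    print(allocated_bytes(only_a))          # 2097152: b/1 still pins extent X
    print(allocated_bytes(with_b))          # 1048576: extent X is unreferenced
```

The model ignores metadata, and it ignores partially referenced extents, which btrfs keeps fully allocated. It only illustrates why leaving b/1 out of the run frees nothing.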

-- 
Moshiach NOW!
Moshiach is coming very soon, prepare yourself!
Long live our Master, our Teacher and our Rebbe, King Moshiach, forever and ever!


Thread overview: 7+ messages
2015-07-06 21:54 running duperemove but no free space gain Mordechay Kaganer
2015-07-06 22:34 ` Mark Fasheh
2015-07-06 23:03   ` Mordechay Kaganer
2015-07-06 23:07     ` Mark Fasheh
2015-07-07  6:27       ` Ryan Bourne
2015-07-07 13:14         ` Mordechay Kaganer [this message]
2015-07-08  5:57           ` Mordechay Kaganer
