From: Marc MERLIN <marc@merlins.org>
To: Brendan Hide <brendan@swiftspirit.co.za>
Cc: Scott Middleton <scott@assuretek.com.au>,
linux-btrfs@vger.kernel.org, Mark Fasheh <mfasheh@suse.de>
Subject: Re: historical backups with hardlinks vs cp --reflink vs snapshots
Date: Tue, 20 May 2014 20:59:28 -0700
Message-ID: <20140521035928.GW10656@merlins.org>
In-Reply-To: <537A2AD5.9050507@swiftspirit.co.za>
On Mon, May 19, 2014 at 06:01:25PM +0200, Brendan Hide wrote:
> On 19/05/14 15:00, Scott Middleton wrote:
> >On 19 May 2014 09:07, Marc MERLIN <marc@merlins.org> wrote:
> >>On Wed, May 14, 2014 at 11:36:03PM +0800, Scott Middleton wrote:
> >>>I read so much about BtrFS that I mistook Bedup for Duperemove.
> >>>Duperemove is actually what I am testing.
> >>I'm currently using programs that find files that are the same, and
> >>hardlink them together:
> >>http://marc.merlins.org/perso/linux/post_2012-05-01_Handy-tip-to-save-on-inodes-and-disk-space_-finddupes_-fdupes_-and-hardlink_py.html
> >>
> >>hardlink.py actually seems to be the faster (memory and CPU) one even
> >>though it's in python.
> >>I can get others to run out of RAM on my 8GB server easily :(
>
> Interesting app.
>
> An issue with hardlinking (though with the backups use-case this problem isn't likely to happen) is that if you modify a file, all the hardlinks get changed along with it, including the ones that you don't want changed.
>
> @Marc: Since you've been using btrfs for a while now I'm sure you've already considered whether or not a reflink copy is the better/worse option.
Yes, I have indeed considered it :)
I just wrote a blog post about the 3 ways of doing historical snapshots:
http://marc.merlins.org/perso/btrfs/post_2014-05-20_Historical-Snapshots-With-Btrfs.html
I love reflink, but that forces me to use btrfs send as the only way to
copy a filesystem without losing the reflink relationship, and I have no
good way from user space to see how many blocks are shared, or whether
some just got duped in a copy.
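For anyone who wants to try the reflink side of this from a script, here
is a minimal sketch in Python of what cp --reflink asks the kernel to do.
It is only an illustration under assumptions: the function name
reflink_or_copy is made up, the ioctl number is FICLONE from
<linux/fs.h> (the same value as BTRFS_IOC_CLONE), and it only tells you
whether the clone call was accepted, not how many blocks end up shared
afterwards.

```python
import fcntl
import shutil

# FICLONE ioctl number from <linux/fs.h>; same value as BTRFS_IOC_CLONE.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src to dst sharing extents if possible, else do a plain copy.

    Hypothetical helper: returns "reflink" or "copy" so the caller knows
    whether the data was shared or duplicated.
    """
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            # Ask the filesystem to make dst share src's extents.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        return "reflink"
    except OSError:
        # Clone not supported here (e.g. not btrfs, or src and dst are on
        # different filesystems): fall back to copying the bytes.
        shutil.copyfile(src, dst)
        return "copy"
```

The return value at least lets a backup script log when a file got
duplicated instead of shared at copy time.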
As a result, for now I still use hardlinks.
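To make the hardlink caveat Brendan raises above concrete, here is a
small sketch in Python (the file names and the hardlink_demo function
are made up for illustration). It shows that an in-place write through
one name is visible through the other, while writing a new file and
renaming it over one name breaks the link, which is how rsync updates
files unless --inplace is given.

```python
import os
import tempfile


def hardlink_demo(workdir):
    """Show how in-place writes and rename-over behave with hardlinks."""
    original = os.path.join(workdir, "backup.day1")
    linked = os.path.join(workdir, "backup.day2")

    with open(original, "w") as f:
        f.write("version 1\n")

    # Both names now point at the same inode; no data is copied.
    os.link(original, linked)
    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino

    # Appending in place modifies the shared inode, so "day1" changes too.
    with open(linked, "a") as f:
        f.write("edited via day2\n")
    with open(original) as f:
        after_in_place = f.read()

    # Writing a new file and renaming it over "day2" breaks the link
    # instead, leaving "day1" untouched.
    tmp = linked + ".tmp"
    with open(tmp, "w") as f:
        f.write("version 2\n")
    os.replace(tmp, linked)
    with open(original) as f:
        after_rename = f.read()

    return same_inode, after_in_place, after_rename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hardlink_demo(d))
```

This is why hardlinked backups are fine as long as nothing edits the
backup copies in place.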
Once bedup is a bit more ready, I may switch.
That said, duperemove is another dedup tool I wasn't aware of, and I should
indeed look at it:
https://github.com/markfasheh/duperemove/blob/master/README
Does it basically do the same work as bedup and tell btrfs to
consolidate blocks it identified as dupes?
Does it work across subvolumes?
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901