From: Hubert Kario <hka@qbs.com.pl>
To: jim owens <owens6336@gmail.com>
Cc: linux-btrfs@vger.kernel.org, Robert Collins <robertc@robertcollins.net>
Subject: Re: BackupPC, per-dir hard link limit, Debian packaging
Date: Wed, 3 Mar 2010 01:05:48 +0100
Message-ID: <201003030105.49442.hka@qbs.com.pl>
In-Reply-To: <4B8D9DB7.7090207@gmail.com>
On Wednesday 03 March 2010 00:22:31 jim owens wrote:
> Hubert Kario wrote:
> > On Tuesday 02 March 2010 03:29:05 Robert Collins wrote:
> >> As I say, I realise this is queued to get addressed anyway, but it seems
> >> like a realistic thing for people to do (use BackupPC on btrfs) - even
> >> if something better still can be written to replace the BackupPC store
> >> in the future. I will note though, that simple snapshots won't achieve
> >> the deduplication level that BackupPC does, because the files don't start
> >> out as the same: they are identified as being identical post-backup.
> >
> > Isn't the main idea behind deduplication to merge identical parts of
> > files together using cow? This way you could have many very similar
> > images of virtual machines, run the deduplication process and massively
> > reduce the space used while maintaining the differences between
> > images.
> >
> > If memory serves me right, the plan is to do it in userland on a
> > post-fact filesystem, not when the data is being saved. If such a daemon
> > or program was available you would run it on the system after rsyncing
> > the workstations.
> >
> > Though the question remains which system would reduce space usage more in
> > your use case. From my experience, hardlinks take less space on disk; I
> > don't know whether it would be possible to optimise the btrfs cow system
> > for files that are exactly the same.
>
> Space use is not the key difference between these methods.
> The btrfs COW makes data sharing safe. The hard link method
> means changing a file invalidates the content of all linked files.
>
> So a BackupPC output should be read-only.
I know that, but if you're using "dumb" tools to replicate systems (say,
rsync), you don't want them to overwrite different versions of files, and you
still want to reclaim the disk space used by essentially the same data.

My idea behind btrfs as backup storage, using cow rather than hardlinks for
duplicated files, comes from the need to keep archival copies (something not
really possible with hardlinks) in a way similar to rdiff-backup.

For the first backup I just rsync to the backup server from all workstations.
On subsequent backups I copy the last version to a .snapshot/todays-date
directory using cow, rsync from the workstations and then run the
deduplication daemon.
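
A rough sketch of that cycle, as shell functions. The directory layout
(current/HOST under the backup root), the host argument and the /home/ source
path are assumptions made up for illustration, and the deduplication step is
only a comment because no such tool is available yet:

```shell
# Sketch only: layout, host names and source path are placeholders.

# snapshot_path ROOT [DATE]: directory that holds the archival copy.
# DATE defaults to today's date in YYYY-MM-DD form.
snapshot_path() {
    printf '%s/.snapshot/%s\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

# backup_cycle ROOT HOST: archive the last version with cow, then refresh it.
backup_cycle() {
    root=$1
    host=$2
    snap=$(snapshot_path "$root")
    mkdir -p "$snap"
    # Keep the last version as a reflinked (cow) copy.
    cp -a --reflink=always "$root/current/$host" "$snap/"
    # Pull the new state from the workstation over the live copy.
    rsync -a "$host:/home/" "$root/current/$host/"
    # A deduplication daemon or program would be run here.
}
```

It would be called once per workstation, for example
`backup_cycle /srv/backup workstation1`, after the initial full rsync has
populated current/.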
This way I get both reduced storage and old copies (handy for user home
directories...).

With such a use case, the ability to use cow while needing a similar amount of
space as hardlinks would be at least useful, if not highly desirable.
That's why I asked whether it's possible to optimise the btrfs cow mechanism
for identical files.
From my testing (directory 584MiB in size, 17395 files, Arch kernel 2.6.32.9,
coreutils 8.4, btrfs-progs 0.19, 10GiB partition, default mkfs and mount
options):

cp -al
free space decrease: 6176KiB
cp -a --reflink=always
free space decrease: 23296KiB

and in the second run:

cp -al
free space decrease: 6064KiB
cp -a --reflink=always
free space decrease: 23324KiB

that's nearly 4 times more!
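
The ratios can be recomputed from the figures above with a little awk. This is
only arithmetic on the reported numbers; the function names are made up for
illustration, and the per-file value is simply the free space decrease divided
by the 17395 files:

```shell
# ratio A B: print A/B rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# per_file KIB FILES: average bytes of free space consumed per file.
per_file() {
    awk -v kib="$1" -v n="$2" 'BEGIN { printf "%.0f\n", kib * 1024 / n }'
}

ratio 23296 6176       # first run, reflink vs. hardlink: 3.77
ratio 23324 6064       # second run, reflink vs. hardlink: 3.85
per_file 6176 17395    # cp -al, first run: 364
per_file 23296 17395   # cp -a --reflink=always, first run: 1371
```

So the reflink copy costs roughly 1.3KiB per file against roughly 0.36KiB per
file for the hardlink copy.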
-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50