public inbox for linux-btrfs@vger.kernel.org
From: Hubert Kario <hka@qbs.com.pl>
To: jim owens <owens6336@gmail.com>
Cc: linux-btrfs@vger.kernel.org, Robert Collins <robertc@robertcollins.net>
Subject: Re: BackupPC, per-dir hard link limit, Debian packaging
Date: Wed, 3 Mar 2010 01:05:48 +0100
Message-ID: <201003030105.49442.hka@qbs.com.pl>
In-Reply-To: <4B8D9DB7.7090207@gmail.com>

On Wednesday 03 March 2010 00:22:31 jim owens wrote:
> Hubert Kario wrote:
> > On Tuesday 02 March 2010 03:29:05 Robert Collins wrote:
> >> As I say, I realise this is queued to get addressed anyway, but it seems
> >> like a realistic thing for people to do (use BackupPC on btrfs) - even
> >> if something better still can be written to replace the BackupPC store
> >> in the future. I will note though, that simple snapshots won't achieve
> >> the deduplication level that BackupPC does, because the files don't start
> >> out as the same: they are identified as being identical post-backup.
> >
> > Isn't the main idea behind deduplication to merge identical parts of
> > files together using cow? This way you could have many very similar
> > images of virtual machines, run the deduplication process and reduce
> > massively the space used while maintaining the differences between
> > images.
> >
> > If memory serves me right, the plan is to do it in userland on a
> > post-fact filesystem, not when the data is being saved. If such a daemon
> > or program was available you would run it on the system after rsyncing
> > the workstations.
> >
> > Though the question remains which system would reduce space usage more in
> > your use case. From my experience, hardlinks take less space on disk; I
> > don't know whether it would be possible to optimise the btrfs cow system
> > for files that are exactly the same.
>
> Space use is not the key difference between these methods.
> The btrfs COW makes data sharing safe.  The hard link method
> means changing a file invalidates the content of all linked files.
>=20
> So a BackupPC output should be read-only.

I know that, but if you're using "dumb" tools to replicate systems (say
rsync), you don't want them to overwrite different versions of files and
you still want to reclaim disk space used by essentially the same data.
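
To make the distinction concrete, here is a minimal sketch of how an
in-place write behaves with a hard link versus a cow copy. It assumes
GNU cp with --reflink support; the file names and the temporary
directory are made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo: all file names here are illustrative only.
demo_dir=$(mktemp -d)
printf 'v1\n' > "$demo_dir/orig"

# Hard link: a second name for the same inode.
ln "$demo_dir/orig" "$demo_dir/hardlink"

# COW copy: a separate inode. Where reflinks are supported the data is
# shared until one side is written; --reflink=auto falls back to an
# ordinary copy on other filesystems.
cp --reflink=auto "$demo_dir/orig" "$demo_dir/cowcopy"

# Modify the original in place (append to the existing inode).
printf 'v2\n' >> "$demo_dir/orig"

# The hard link sees the new line, the cow copy keeps the old content.
hardlink_lines=$(wc -l < "$demo_dir/hardlink" | tr -d ' ')
cowcopy_lines=$(wc -l < "$demo_dir/cowcopy" | tr -d ' ')
echo "hardlink: $hardlink_lines lines, cowcopy: $cowcopy_lines lines"
```

Whether a given tool writes in place or replaces the file depends on
the tool and its options (rsync, for one, has an --inplace switch).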

My idea of using btrfs as backup storage, with cow rather than hardlinks
for duplicated files, comes from the need to keep archival copies
(something not really possible with hardlinks) in a way similar to
rdiff-backup.

For the first backup I just rsync to the backup server from all
workstations. On subsequent backups I copy the last version to a
.snapshot/todays-date directory using cow, rsync from the workstations
and then run the deduplication daemon.
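
The rotation step could be sketched roughly as follows. This is only an
illustration under assumptions: snapshot_current is a hypothetical
helper, the current/ directory name and the paths in the usage comment
are placeholders, GNU cp is assumed, and the deduplication pass is left
as a comment.

```shell
#!/bin/sh
# Hypothetical helper, not an existing tool. Assumed layout:
#   $root/current/            latest rsync'ed state
#   $root/.snapshot/<date>/   cow copies of earlier states
snapshot_current() {
    root=$1
    stamp=${2:-$(date +%Y-%m-%d)}   # default to today's date
    dest="$root/.snapshot/$stamp"
    # Refuse to reuse a name: cp would nest into an existing directory.
    if [ -e "$dest" ]; then
        echo "snapshot $dest already exists" >&2
        return 1
    fi
    mkdir -p "$root/.snapshot" || return 1
    # --reflink=auto shares data where the filesystem supports it and
    # falls back to an ordinary copy otherwise.
    cp -a --reflink=auto "$root/current" "$dest"
}

# Typical use (host and paths are placeholders):
#   snapshot_current /srv/backup &&
#   rsync -a --delete workstation:/home/ /srv/backup/current/
# A deduplication pass over /srv/backup would follow here.
```

Taking the snapshot before the rsync means the dated directory holds the
previous state, while current/ always holds the newest one.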

This way I get both reduced storage and old copies (handy for user home
directories...).

With such a use case, the ability to use cow while needing a similar
amount of space as hardlinks would be at least useful, if not highly
desirable.

That's why I asked whether it's possible to optimise the btrfs cow
mechanism for identical files.

From my testing (directory 584MiB in size, 17395 files, Arch kernel
2.6.32.9, coreutils 8.4, btrfs-progs 0.19, 10GiB partition, default mkfs
and mount options):
cp -al
free space decrease: 6176KiB

cp -a --reflink=always
free space decrease: 23296KiB

and in the second run:
cp -al
free space decrease: 6064KiB

cp -a --reflink=always
free space decrease: 23324KiB

that's nearly 4 times more!
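
For a rough feel of what this means per file, here is a small
calculation on the first-run figures. It is plain arithmetic on the
numbers above; averaging over the 17395 files and ignoring directories
is a simplifying assumption.

```shell
#!/bin/sh
# Figures copied from the first run above.
files=17395
hardlink_kib=6176
reflink_kib=23296

# Ratio of space consumed: reflink copy versus hard-link copy.
ratio=$(awk -v a="$reflink_kib" -v b="$hardlink_kib" \
    'BEGIN { printf "%.2f", a / b }')

# Average free-space decrease per file, in bytes.
per_file_hard=$(awk -v k="$hardlink_kib" -v n="$files" \
    'BEGIN { printf "%.0f", k * 1024 / n }')
per_file_cow=$(awk -v k="$reflink_kib" -v n="$files" \
    'BEGIN { printf "%.0f", k * 1024 / n }')

echo "reflink/hardlink ratio: $ratio"
echo "bytes per file: hardlink $per_file_hard, reflink $per_file_cow"
```

That works out to a ratio of about 3.77, or roughly 364 bytes per file
for hardlinks against roughly 1371 bytes per file for reflinks.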
-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50

Thread overview: 4+ messages
2010-03-02  2:29 BackupPC, per-dir hard link limit, Debian packaging Robert Collins
2010-03-02 13:09 ` Hubert Kario
2010-03-02 23:22   ` jim owens
2010-03-03  0:05     ` Hubert Kario [this message]
