public inbox for linux-btrfs@vger.kernel.org
From: Robert Collins <robertc@robertcollins.net>
To: linux-btrfs@vger.kernel.org
Subject: BackupPC, per-dir hard link limit, Debian packaging
Date: Tue, 02 Mar 2010 13:29:05 +1100
Message-ID: <1267496945.9222.155.camel@lifeless-64>

I realise that the hard link limit is queued to be fixed, and I read
the recent thread as well as the older (October, I think) thread.

I just wanted to note that BackupPC *does* in fact run into the hard
link limit, and it's due to the dpkg configuration scripts.

BackupPC hard links files with the same content together by scanning new
files and linking them together, whether or not they started as a hard
link in the backed up source PCs.

It also builds a directory structure precisely matching the source
machine (basically it rsyncs across, then hardlinks aggressively).
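As a rough illustration of that "hardlink aggressively" step, here is a
minimal sketch of content-based pooling. It is not BackupPC's code: the
function name pool_dedup, the flat pool layout and the use of sha1sum
are all assumptions made for the example, and it expects the pool to
live outside the tree on the same filesystem.

```shell
# Hypothetical helper, not BackupPC's implementation: pool_dedup TREE POOL
# Every regular file under TREE whose content is already in POOL is
# replaced by a hard link to the pooled copy; otherwise the file is
# added to POOL under its SHA-1. Assumes POOL is outside TREE, both are
# on one filesystem, and no file name contains a newline.
pool_dedup() {
    tree=$1
    pool=$2
    mkdir -p "$pool" || return 1
    find "$tree" -type f | while IFS= read -r f; do
        sum=$(sha1sum < "$f" | awk '{ print $1 }')
        if [ -e "$pool/$sum" ]; then
            # Content seen before: point this name at the pooled inode.
            ln -f "$pool/$sum" "$f"
        else
            # First sighting: the pool entry becomes a second link.
            ln "$f" "$pool/$sum"
        fi
    done
}
```

Every identical file in one backed-up directory ends up as another link
to the same inode from that directory, so the ln call in the first
branch is where a "too many links" error would surface.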

If you back up a Debian host, /var/lib/dpkg/info contains many identical
files because debhelper generates the same script in the common case:
ls /var/lib/dpkg/info/*.postinst | xargs -n1 sha1sum |
    awk '{ print $1 }' | sort -u | wc -l
862
ls /var/lib/dpkg/info/*.postinst | wc -l
1533
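
The two counts show how much duplication there is, but what matters for
a per-directory limit is the size of the largest group of identical
files, since after pooling they all become links to one inode from the
same directory. A small sketch for measuring that; the function name
largest_dup_group is made up for this example.

```shell
# Hypothetical helper: print the size of the largest set of
# byte-identical files among the arguments (1 means no duplicates).
largest_dup_group() {
    sha1sum "$@" | awk '{ print $1 }' | sort | uniq -c |
        sort -rn | awk 'NR == 1 { print $1 }'
}
```

It would be called on the same set of files as above:

    largest_dup_group /var/lib/dpkg/info/*.postinst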

As I say, I realise this is queued to get addressed anyway, but it seems
like a realistic thing for people to do (use BackupPC on btrfs), even
if something better can still be written to replace the BackupPC store
in the future. I will note, though, that simple snapshots won't achieve
the deduplication level that BackupPC does, because the files don't start
out as the same: they are only identified as identical after the backup.

Cheers,
Rob
