From: Gabriel de Perthuis <g2p.code@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Possible to deduplicate read-only snapshots for space-efficient backups
Date: Tue, 7 May 2013 22:07:39 +0000 (UTC)
Message-ID: <kmbtvb$all$1@ger.gmane.org>
In-Reply-To: 64hi5a-9rq.ln1@hurikhan.ath.cx
> Do you plan to support deduplication on a finer grained basis than file
> level? As an example, in the end it could be interesting to deduplicate 1M
> blocks of huge files. Backups of VM images come to my mind as a good
> candidate. While my current backup script[1] takes care of this by using
> "rsync --inplace" it won't consider files moved between two backup cycles.
> This is the main purpose I'm using bedup for on my backup drive.
>
> Maybe you could define another cutoff value to consider huge files for
> block-level deduplication?
I'm considering deduplicating aligned blocks of large files that share
the same size (VMs with the same baseline; those would ideally come
pre-cowed, but rsync or scp could have broken that).
It sounds simple, and was sort of prompted by the new syscall taking
short ranges, but it is tricky to figure out a sane heuristic (when to
hash, when to bail, when to submit without comparing, and what the
source should be in the last case), and it's not something I have an
immediate need for. It is also possible to use 9p (with standard cow
and/or small-file dedup) and trade a bit of configuration for much more
space-efficient VMs.
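To make the aligned-block idea concrete, here is a minimal sketch, not
bedup code. It assumes the simplest possible heuristic: bail if the two
files differ in size, otherwise compare every aligned block directly
(no hashing) and report the ranges that match. The function name, the
1 MiB default, and the return format are illustrative assumptions;
submitting the matching ranges to the kernel is left out.

```python
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB, the block size suggested in the quoted message


def matching_aligned_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for aligned blocks identical in both files.

    Hypothetical helper: returns an empty list when the sizes differ,
    since the idea only covers large files sharing the same size.
    """
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return []
    matches = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if not block_a:
                break  # end of file
            # Direct byte comparison; a real tool would have to decide
            # when hashing is cheaper than reading both sides.
            if block_a == block_b:
                matches.append((offset, len(block_a)))
            offset += len(block_a)
    return matches
```

Each returned range is a candidate for a dedup request; which file
should serve as the source is exactly the open heuristic question.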
Finer-grained tracking of which ranges have changed, and maybe some
caching of range hashes, would be a good first step before doing any
crazy large-file heuristics. The hash caching would actually benefit
all use cases.
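A rough sketch of what caching range hashes might look like, under
loud assumptions: the class name, the in-memory dictionaries, SHA-256,
and invalidation by file size and mtime are all illustrative choices,
not how bedup works. It also drops every cached hash for a file as soon
as the file changes; the finer-grained change tracking mentioned above
would need a better signal than mtime.

```python
import hashlib
import os


class RangeHashCache:
    """Hypothetical in-memory cache of per-block hashes.

    A file's cached hashes are discarded whenever its size or mtime
    changes, so this is whole-file invalidation, not range tracking.
    """

    def __init__(self, block_size=1024 * 1024):
        self.block_size = block_size
        self._stamps = {}  # (st_dev, st_ino) -> (st_size, st_mtime_ns)
        self._hashes = {}  # (st_dev, st_ino) -> {offset: hex digest}
        self.computed = 0  # number of blocks actually read and hashed

    def block_hash(self, path, offset):
        """Return the hex digest of the block starting at offset."""
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        stamp = (st.st_size, st.st_mtime_ns)
        if self._stamps.get(key) != stamp:
            # File is new to the cache or has changed: start over for it.
            self._stamps[key] = stamp
            self._hashes[key] = {}
        per_file = self._hashes[key]
        if offset not in per_file:
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read(self.block_size)
            per_file[offset] = hashlib.sha256(data).hexdigest()
            self.computed += 1
        return per_file[offset]
```

Repeated scans of an unchanged file then cost one stat call per lookup
instead of a read and a hash, which is why such a cache would help
whole-file dedup as much as block-level dedup.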
> Regards,
> Kai
>
> [1]: https://gist.github.com/kakra/5520370