From: Kai Krakow <hurikhan77+btrfs@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Possible to dedpulicate read-only snapshots for space-efficient backups
Date: Wed, 08 May 2013 01:22:05 +0200
Message-ID: <steo5a-lpe.ln1@hurikhan.ath.cx>
In-Reply-To: 9tdo5a-hde.ln1@hurikhan.ath.cx

Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:

> Gabriel de Perthuis <g2p.code@gmail.com> wrote:
> 
>> It sounds simple, and was sort-of prompted by the new syscall taking
>> short ranges, but it is tricky figuring out a sane heuristic (when to
>> hash, when to bail, when to submit without comparing, what should be the
>> source in the last case), and it's not something I have an immediate
>> need for.  It is also possible to use 9p (with standard cow and/or
>> small-file dedup) and trade a bit of configuration for much more
>> space-efficient VMs.
>> 
>> Finer-grained tracking of which ranges have changed, and maybe some
>> caching of range hashes, would be a good first step before doing any
>> crazy large-file heuristics.  The hash caching would actually benefit
>> all use cases.
> 
> Looking back to the good old peer-to-peer days (I think we all came into
> contact with them one way or another), one term comes back to mind:
> tiger tree hash...
> 
> I'm not really familiar with it, but would it be possible to use tiger
> tree hashes to find identical blocks? Even across differently sized files...

Thinking about it some more: that hash was probably designed for 
distributing the same content to multiple peers in deltas as small as 
possible. Deduplication is more or less the other way around: coalescing 
all those scattered copies back into a single source of content. So some 
"inverse" of the tiger tree would probably work better and more efficiently.
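
To make the idea above a bit more concrete, here is a rough sketch of 
what I have in mind. It is only an illustration under assumptions: Python's 
hashlib has no Tiger, so SHA-256 stands in for it, the 4096-byte block size 
is arbitrary, and the function names (leaf_hashes, tree_root, 
duplicate_blocks) are made up for this mail. The tree part follows the 
usual tree-hash layout (leaves prefixed with 0x00, inner nodes with 0x01, 
an odd node promoted unchanged); the "inverse" part is simply an index 
from leaf hash back to every place that block occurs.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, not taken from this thread


def leaf_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block; these are the leaves of the hash tree."""
    # SHA-256 is a stand-in for Tiger, which hashlib does not provide.
    return [
        hashlib.sha256(b"\x00" + data[off:off + block_size]).digest()
        for off in range(0, len(data), block_size)
    ]


def tree_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    if not leaves:
        return hashlib.sha256(b"\x00").digest()
    level = leaves
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                pair = b"\x01" + level[i] + level[i + 1]
                nxt.append(hashlib.sha256(pair).digest())
            else:
                nxt.append(level[i])  # odd node is promoted unchanged
        level = nxt
    return level[0]


def duplicate_blocks(files, block_size=BLOCK_SIZE):
    """Inverse index: leaf hash -> [(name, offset), ...] for repeated blocks."""
    index = defaultdict(list)
    for name, data in files.items():
        for i, digest in enumerate(leaf_hashes(data, block_size)):
            index[digest].append((name, i * block_size))
    # Only hashes seen more than once are dedup candidates.
    return {h: locs for h, locs in index.items() if len(locs) > 1}


if __name__ == "__main__":
    a = b"A" * 4096 + b"B" * 4096 + b"C" * 100
    b = b"X" * 4096 + b"B" * 4096
    for digest, locs in duplicate_blocks({"a": a, "b": b}).items():
        print(digest.hex()[:16], locs)
```

Note that this only finds blocks aligned to the same block grid, and a 
matching hash is only a candidate: a real tool would still compare the 
bytes before asking the kernel to share the extents.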

Regards,
Kai
