From: Gabriel <g2p.code@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [RFC] Systemcall for offline deduplication
Date: Fri, 26 Oct 2012 16:21:29 +0000 (UTC)
Message-ID: <k6eda8$1ad$2@ger.gmane.org>
In-Reply-To: k6ec1a$1ad$1@ger.gmane.org

>> As for online dedupe (which seems useful for reducing writes), would it
>> be useful if one could, given a write request, compare each of the
>> dirty pages in that request against whatever else the fs has loaded in
>> the page cache, and try to dedupe against that?  We could probably
>> speed up the search by storing hashes of whatever we have in the page
>> cache and using that to find candidates for the memcmp() test.  This of
>> course is not a comprehensive solution, but (a) we can combine it with
>> offline dedupe later and (b) we don't make a disk
>> write out data that we've recently read or written.  Obviously you'd
>> want to be able to opt-in to this sort of thing with an inode flag or
>> something.
> 
> That's another kettle of fish, and will require an entirely different
> approach. ZFS has some experience doing that. While their implementation
> may reduce writes, it does so at the cost of storing hashes of every
> block in RAM.

On second thought, your proposal is quite different from what ZFS does:
it only hashes what is already in the page cache, not every block. It
might actually be useful to a wider audience, so forget I said anything
about it.
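
To make the quoted idea concrete, here is a rough userspace sketch in
Python, not kernel code and not anything btrfs implements. It keeps a
hash index over the pages currently "in cache", uses the hash only to
find candidates, and confirms each candidate with a full byte
comparison (the memcmp() step). The names PageCacheIndex and plan_write,
the choice of SHA-256, and the string page ids are all assumptions made
up for illustration; the opt-in inode flag is not modelled.

```python
import hashlib
from collections import defaultdict


class PageCacheIndex:
    """Toy stand-in for a hash index over pages held in the page cache."""

    def __init__(self):
        self.pages = {}                  # page id -> page contents (bytes)
        self.by_hash = defaultdict(set)  # digest -> ids of pages with that digest

    @staticmethod
    def _digest(data):
        # Hash choice is an assumption; any hash works since we verify below.
        return hashlib.sha256(data).digest()

    def add(self, page_id, data):
        """Insert or update a cached page, dropping any stale hash entry."""
        old = self.pages.get(page_id)
        if old is not None:
            self.by_hash[self._digest(old)].discard(page_id)
        self.pages[page_id] = data
        self.by_hash[self._digest(data)].add(page_id)

    def find_duplicate(self, data):
        """Return the id of some cached page with identical contents, or None."""
        for page_id in self.by_hash.get(self._digest(data), ()):
            # The hash only nominates candidates; the byte-for-byte
            # comparison is what actually decides (the memcmp() test).
            if self.pages[page_id] == data:
                return page_id
        return None


def plan_write(index, dirty_pages):
    """Split a write request into pages to share and pages to write out.

    dirty_pages is a list of (page_id, data) pairs. Returns a dict mapping
    each deduplicated page id to the cached page it matches, and a list of
    page ids that still need a real write.
    """
    shared, to_write = {}, []
    for page_id, data in dirty_pages:
        dup = index.find_duplicate(data)
        # The dirty page is in the cache either way, so index its new contents.
        index.add(page_id, data)
        if dup is not None and dup != page_id:
            shared[page_id] = dup
        else:
            to_write.append(page_id)
    return shared, to_write
```

Memory use here is bounded by what is in the cache rather than by the
size of the filesystem, which is the difference from keeping a hash of
every block in RAM.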



Thread overview: 6+ messages
2012-10-15 17:09 Systemcall for offline deduplication Bob Marley
2012-10-15 20:15 ` David Sterba
2012-10-17 11:39   ` [RFC] " Gabriel
2012-10-26  6:26     ` Darrick J. Wong
2012-10-26 15:59       ` Gabriel
2012-10-26 16:21         ` Gabriel [this message]
