From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Glanzmann
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Tue, 28 Apr 2009 22:15:53 +0200
Message-ID: <20090428201553.GJ7217@cip.informatik.uni-erlangen.de>
References: <20090427033331.GC17677@cip.informatik.uni-erlangen.de>
	<1240839448.26451.13.camel@think.oraclecorp.com>
	<20090428155900.GA1722@cip.informatik.uni-erlangen.de>
	<1240939437.15136.23.camel@think.oraclecorp.com>
	<20090428173719.GD7217@cip.informatik.uni-erlangen.de>
	<1240940588.15136.31.camel@think.oraclecorp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-btrfs@vger.kernel.org
To: Chris Mason
In-Reply-To: <1240940588.15136.31.camel@think.oraclecorp.com>

Hello Chris,

> Yes, but for the purposes of dedup, it's not exactly what you want.
> You want an index by checksum, and the current btrfs code indexes by
> logical byte number in the disk.

That would be good for online dedup, but I don't see how it would work
in practice.

> So you need an extra index either way. It makes sense to keep the
> crc32c csums for fast verification of the data read from disk and only
> use the expensive csums for dedup.

I think this should be part of a userland program. It can take all the
time it wants over the weekend to find duplicate blocks.

> > Does that mean that I can dedup 4k blocks even if you use extents?
> Yes.

Perfect. :-)

        Thomas
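The userland approach suggested above (an extra index keyed by a strong checksum, built offline over 4k blocks) could be sketched roughly as follows. This is a hypothetical illustration, not btrfs code: it scans ordinary files instead of btrfs extents, uses SHA-256 as the "expensive" checksum, and the names index_blocks and duplicate_groups are invented for the example. A real tool would still have to confirm candidates byte for byte and ask the filesystem to share the blocks.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed dedup granularity: 4 KiB blocks, as discussed in the thread.
BLOCK_SIZE = 4096

# A block location: (file path, byte offset within that file).
Location = Tuple[str, int]


def index_blocks(paths: Iterable[str],
                 block_size: int = BLOCK_SIZE) -> Dict[bytes, List[Location]]:
    """Build an index keyed by the SHA-256 digest of each full block.

    This is the "index by checksum" that the logical-byte-number index
    does not provide. Trailing partial blocks are skipped.
    """
    index: Dict[bytes, List[Location]] = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if len(block) < block_size:
                    break  # EOF or a trailing partial block
                index[hashlib.sha256(block).digest()].append((path, offset))
                offset += block_size
    return index


def duplicate_groups(index: Dict[bytes, List[Location]]) -> List[List[Location]]:
    """Return every group of locations that share the same block digest."""
    return [locs for locs in index.values() if len(locs) > 1]
```

For example, `duplicate_groups(index_blocks(["/srv/a.img", "/srv/b.img"]))` (hypothetical paths) returns the candidate groups a weekend batch job would then verify and hand to the filesystem for sharing. Keeping this index in userland means the cost of the strong hash is paid only by the offline scan, not on every read.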