On 2014-08-01 09:23, David Sterba wrote:
> On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
>> I do think however that having the option of a background thread doing
>> deduplication asynchronously is a good idea, but then you would have to
>> have some way to trigger it on individual files/trees, and triggering on
>> writes like the autodefrag thread does doesn't make much sense. Having
>> some userspace program to tell it to run on a given set of files would
>> probably be the best approach for a trigger. I don't remember if this
>> kind of thing was also included in the online deduplication patches that
>> got posted a while back or not.
>
> IIRC the proposed implementation only merged new writes with existing
> data.
>
> For the out-of-band ("off-line") dedup there's bedup
> (https://github.com/g2p/bedup) or Mark's duperemove tool
> (https://github.com/markfasheh/duperemove) that work on a set of files.
>
Something kernel-side to do the work asynchronously would be nice,
especially if it could leverage the checksums that BTRFS already stores
for the blocks. Having a userspace interface for offline deduplication,
similar to the one for scrub operations, would be even better.
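To illustrate what reusing per-block checksums could buy, here is a minimal, purely hypothetical userspace sketch. It is not how bedup, duperemove, or BTRFS actually work. It buckets fixed-size blocks by a cheap checksum and then confirms matches byte-for-byte, because a CRC match alone cannot prove two blocks are identical. Assumptions: zlib.crc32 stands in for the crc32c that BTRFS stores, the block size is 4 KiB, files are plain in-memory byte strings, and the function names are made up for illustration.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks


def block_checksums(data, block_size=BLOCK_SIZE):
    """Yield (offset, crc) for each full block; a trailing partial block is skipped."""
    for off in range(0, len(data) - block_size + 1, block_size):
        # zlib.crc32 is a stand-in for the crc32c checksums BTRFS keeps per block
        yield off, zlib.crc32(data[off:off + block_size])


def find_duplicate_blocks(files, block_size=BLOCK_SIZE):
    """Return groups of (name, offset) pairs whose blocks are byte-identical.

    files: hypothetical mapping of file name -> file contents (bytes).
    """
    # Step 1: cheap pass, bucket blocks by checksum (these are only candidates)
    candidates = defaultdict(list)
    for name, data in files.items():
        for off, crc in block_checksums(data, block_size):
            candidates[crc].append((name, off))

    # Step 2: verify candidates byte-for-byte, since CRCs can collide
    groups = []
    for locs in candidates.values():
        if len(locs) < 2:
            continue
        by_content = defaultdict(list)
        for name, off in locs:
            by_content[files[name][off:off + block_size]].append((name, off))
        groups.extend(g for g in by_content.values() if len(g) >= 2)
    return groups
```

For example, with files a = 4096 bytes of "x" followed by 4096 bytes of "y", and b = 4096 bytes of "y" followed by a short tail, the only group returned is [("a", 4096), ("b", 0)]. A kernel-side version could skip step 1's hashing entirely by reading the checksums already on disk, which is the appeal of the idea. The byte comparison in step 2 (or an equivalent check during the actual extent merge) would still be required.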