From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ric Wheeler
Subject: Re: Offline Deduplication for Btrfs
Date: Mon, 10 Jan 2011 10:28:14 -0500
Message-ID: <4D2B258E.7010706@gmail.com>
References: <1294245410-4739-1-git-send-email-josef@redhat.com> <4D24AD92.4070107@bobich.net> <1294276285-sup-9136@think>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: Josef Bacik, BTRFS MAILING LIST
To: Chris Mason
Return-path:
In-Reply-To: <1294276285-sup-9136@think>
List-ID:

I think that dedup has a variety of use cases that are all very dependent on your workload. The approach you have here seems to be quite a reasonable one.

I did not see it in the code, but it would be great to be able to collect statistics on how effective your hash is, along with counters for the extra IO imposed.

It would also be very useful to have a paranoid mode where, when you see a hash match (a dedup candidate), you fall back to a byte-by-byte compare to verify that the data really is identical. Keeping stats on how often the match turns out to be a false collision would be quite interesting as well :)

Ric
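
A minimal sketch of the paranoid mode and counters described above, written as user-space Python rather than kernel C. It is not taken from the posted patches: the names `DedupStats`, `ParanoidDedup`, and `write` are hypothetical, the in-memory list only stands in for on-disk extents, and the hash function is pluggable so that a deliberately weak one can be used to provoke false collisions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DedupStats:
    """Hypothetical counters; names are illustrative only."""
    blocks_seen: int = 0
    hash_matches: int = 0      # dedup candidates: stored block with same hash
    verified_dups: int = 0     # candidates confirmed by byte-by-byte compare
    false_collisions: int = 0  # candidates whose bytes actually differed
    extra_bytes_read: int = 0  # extra IO imposed by the verification reads


class ParanoidDedup:
    def __init__(self, hash_fn=None):
        # Default to SHA-256; callers may pass a weaker hash for testing.
        self.hash_fn = hash_fn or (lambda data: hashlib.sha256(data).digest())
        self.index = {}   # digest -> list of block ids with that digest
        self.store = []   # stand-in for blocks already on disk
        self.stats = DedupStats()

    def write(self, block: bytes) -> int:
        """Store a block and return its id, reusing a verified duplicate."""
        self.stats.blocks_seen += 1
        digest = self.hash_fn(block)
        for block_id in self.index.get(digest, []):
            self.stats.hash_matches += 1
            existing = self.store[block_id]            # simulated re-read
            self.stats.extra_bytes_read += len(existing)
            if existing == block:                      # byte-by-byte compare
                self.stats.verified_dups += 1
                return block_id
            self.stats.false_collisions += 1
        # No verified duplicate found: store the block as new data.
        self.store.append(block)
        new_id = len(self.store) - 1
        self.index.setdefault(digest, []).append(new_id)
        return new_id


# Usage with a deliberately weak hash (first byte only) to force collisions.
dedup = ParanoidDedup(hash_fn=lambda data: data[:1])
first = dedup.write(b"aaaa")
second = dedup.write(b"aaaa")   # true duplicate, shares the first block
third = dedup.write(b"abbb")    # same weak hash, different bytes
print(dedup.stats)
```

The ratio of `false_collisions` to `hash_matches` gives a rough measure of how effective the hash is, and `extra_bytes_read` approximates the cost of running in paranoid mode.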