From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tharp
Subject: Re: Data Deduplication with the help of an online filesystem check
Date: Wed, 29 Apr 2009 09:11:36 -0400
Message-ID: <49F85208.1090108@partiallystapled.com>
References: <20090427033331.GC17677@cip.informatik.uni-erlangen.de>
 <200904281945.10274.hjclaes@web.de>
 <20090428201619.GK7217@cip.informatik.uni-erlangen.de>
 <200904282236.07428.hjclaes@web.de>
 <20090428205242.GA13112@cip.informatik.uni-erlangen.de>
 <1240952295.15136.73.camel@think.oraclecorp.com>
 <20090428211255.GB13112@cip.informatik.uni-erlangen.de>
 <1240953977.15136.76.camel@think.oraclecorp.com>
 <20090428221455.GA27794@cip.informatik.uni-erlangen.de>
 <1240960687.15136.88.camel@think.oraclecorp.com>
 <20090429120300.GG22917@cip.informatik.uni-erlangen.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Cc: Chris Mason, Heinz-Josef Claes, linux-btrfs@vger.kernel.org
To: Thomas Glanzmann
Return-path:
In-Reply-To: <20090429120300.GG22917@cip.informatik.uni-erlangen.de>
List-ID:

Thomas Glanzmann wrote:
> Looking at this picture, when I'm going to implement the dedup code, do I also
> have to take care to spread the blocks over the different devices or is
> there already infrastructure in place that automates that process?

If you somehow had blocks duplicated exactly across two devices such
that the deduplicator discarded all the blocks from one disk and kept
all the blocks from the other disk, then there would be a problem. The
best way to solve this would be to always keep the block on the
least-full device and discard the rest, which shouldn't be too difficult
(especially since this would be happening in kernelspace), but the
cheapest solution is to randomize the choice, which would be sufficient
to prevent any further imbalance from developing.

-- 
m. tharp
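
To make the two policies above concrete, here is a rough sketch in Python (userspace, not kernel code). Everything in it is made up for illustration: Copy, used_bytes, BLOCK_SIZE and the device names are not btrfs interfaces, and "least full" is assumed to mean "fewest bytes used", which only makes sense if the devices are the same size.

```python
import random
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size, for illustration only


@dataclass(frozen=True)
class Copy:
    """One on-disk copy of a duplicated block (hypothetical model)."""
    device: str  # name of the device holding this copy
    offset: int  # block index on that device


def keep_on_least_full(copies, used_bytes):
    """Keep the copy whose device currently has the fewest bytes used."""
    return min(copies, key=lambda c: used_bytes[c.device])


def keep_random(copies, used_bytes, rng=random):
    """Keep a uniformly random copy; ignores per-device usage entirely."""
    return rng.choice(copies)


def dedup_block(copies, used_bytes, choose):
    """Keep one copy, discard the others, and update the usage counters."""
    kept = choose(copies, used_bytes)
    for c in copies:
        if c is not kept:
            used_bytes[c.device] -= BLOCK_SIZE
    return kept


def simulate(choose, n_blocks):
    """Two devices holding identical blocks, deduplicated one by one."""
    used = {"sda": n_blocks * BLOCK_SIZE, "sdb": n_blocks * BLOCK_SIZE}
    for i in range(n_blocks):
        copies = [Copy("sda", i), Copy("sdb", i)]
        dedup_block(copies, used, choose)
    return used
```

In this toy model, simulate(keep_on_least_full, 4) leaves 8192 bytes on each device, because every discard frees space on the fuller device and the next choice flips. simulate(keep_random, n) only evens out on average, but it needs no usage accounting at all.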