From mboxrd@z Thu Jan 1 00:00:00 1970
From: david@lang.hm
Subject: Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Date: Tue, 19 Jun 2007 11:20:29 -0700 (PDT)
Message-ID:
References: <20070612161029.GB28279@think.oraclecorp.com> <4676C2D6.8030708@vlnb.net> <46779DB1.7060807@draigBrady.com> <4677A972.6030909@vlnb.net>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: =?ISO-8859-1?Q?P=E1draig_Brady?= , Chris Mason , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Vladislav Bolkhovitin
Return-path:
In-Reply-To: <4677A972.6030909@vlnb.net>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Tue, 19 Jun 2007, Vladislav Bolkhovitin wrote:

>> > 3. De-de-duplicate blocks on disk, i.e. copy them on write
>> >
>> > I suppose that de-duplication itself would be done by some user space
>> > process that would scan files, determine blocks with the same data and
>> > then de-duplicate them by using syscall or IOCTL (2).
>> >
>> > That would be very usable feature, which in most cases would allow to
>> > shrink occupied disk space on 50-90%.
>>
>> Have you references for this number?
>
> No, I've seen it somewhere and it well confirms with my own observations.
>
>> In my experience one gets a lot of benefit from
>> the much simpler process of "de-duplication" of files.
>
> Yes, sure, de-duplication on files level brings its benefits, but on FS
> blocks level it would bring ever more benefits, because there are many more
> or less big files, which are different as a whole, but with a lot of the same
> blocks. Simple example of such files is UNIX-style mail boxes on a mail
> server.

Unix-style mailboxes would not be a good example of a win for sector-based
de-duplication, since the duplicate mail is not going to be sector-aligned.

David Lang
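
The alignment point can be illustrated with a rough sketch. It is not taken from any Btrfs code: the 4096-byte block size, the helper names (block_hashes, make_message) and the synthetic mailbox contents are all assumptions made for the illustration. The same 64 KiB message is appended to mailboxes whose earlier mail has different lengths, and the fixed-size aligned blocks of each file are then hashed and compared.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash every full, aligned, fixed-size block of `data`."""
    return {
        hashlib.sha256(data[off:off + block_size]).digest()
        for off in range(0, len(data) - block_size + 1, block_size)
    }


def make_message(seed: str, size: int) -> bytes:
    """Build deterministic, non-repeating filler text of exactly `size` bytes."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        line = hashlib.sha256(f"{seed}:{counter}".encode()).hexdigest()
        out += line.encode() + b"\n"
        counter += 1
    return bytes(out[:size])


# The same mail delivered to several users (hypothetical data).
shared = make_message("shared-mail", 64 * 1024)

# Typical mbox case: earlier mail has arbitrary length, so the shared
# message starts at a different offset within a block in each file.
mbox_a = make_message("alice-earlier-mail", 1000) + shared
mbox_b = make_message("bob-earlier-mail", 1537) + shared
common_unaligned = block_hashes(mbox_a) & block_hashes(mbox_b)

# Contrived case: the shared message happens to start on a block boundary
# in both files, so its blocks line up.
mbox_c = make_message("carol-earlier-mail", 4096) + shared
mbox_d = make_message("dave-earlier-mail", 8192) + shared
common_aligned = block_hashes(mbox_c) & block_hashes(mbox_d)

print("identical blocks, unaligned copies:", len(common_unaligned))
print("identical blocks, aligned copies:  ", len(common_aligned))
```

In the unaligned case the two files contain the same 64 KiB of message bytes but share no identical blocks, which is why a de-duplicator that only compares aligned blocks would gain nothing there. The blocks match only in the contrived aligned case.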