Date: Tue, 8 Nov 2016 08:57:06 -0800
From: "Darrick J. Wong"
To: "Austin S. Hemmelgarn"
Cc: Christoph Anton Mitterer, dsterba@suse.cz, James Pharaoh, linux-btrfs@vger.kernel.org, mark@fasheh.com
Subject: Re: Announcing btrfs-dedupe
Message-ID: <20161108165706.GB16801@birch.djwong.org>

On Tue, Nov 08, 2016 at 08:26:02AM -0500, Austin S. Hemmelgarn wrote:
> On 2016-11-07 21:40, Christoph Anton Mitterer wrote:
> >On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote:
> >>I think adding a whole-file dedup mode to duperemove would be better
> >>(from user's POV) than writing a whole new tool
> >
> >What would IMO be really good from a user's POV was, if one of the
> >tools, deemed to be the "best", would be added to the btrfs-progs and
> >simply become "the official" one.
>
> The problem is that for deduplication, most tools won't work well for
> everything. For example the cases I use it in are very specific and have
> horrible performance using pretty much any available tool (I have a couple
> cases where I have disjoint subsets of the same directory tree with
> different prefixes, so I can tell exactly which files are duplicated, and
> that any duplicate file is 100% duplicate, as well as a couple of cases
> where changes are small, scattered, and highly predictable (and thus it's
> easier to find what's changed and dedupe everything else instead of finding
> what's the same), and none of the existing options do well in either
> situation).
>
> I'd argue at minimum for having the extent-same tool from duperemove in
> btrfs-progs, as that lets people do deduplication how they want without
> having to write C code. Something equivalent that would let you call any
> BTRFS ioctl with (reasonably) arbitrary arguments might actually be even
> better (I can see such a tool being wonderful for debugging).

Since xfsprogs 4.3, xfs_io has a 'dedupe' command that can talk to
FIDEDUPERANGE (f.k.a. BTRFS_IOC_FILE_EXTENT_SAME):

$ xfs_io -c 'dedupe /mnt/srcfile srcoffset dstoffset length' /mnt/destfile

--D