From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from cantor2.suse.de ([195.135.220.15]:34809 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750737AbaETWhG
	(ORCPT ); Tue, 20 May 2014 18:37:06 -0400
Date: Tue, 20 May 2014 15:37:02 -0700
From: Mark Fasheh
To: Konstantinos Skarlatos
Cc: Brendan Hide, Scott Middleton, linux-btrfs@vger.kernel.org
Subject: Re: send/receive and bedup
Message-ID: <20140520223702.GQ27178@wotan.suse.de>
Reply-To: Mark Fasheh
References: <20140519010705.GI10566@merlins.org>
	<537A2AD5.9050507@swiftspirit.co.za>
	<20140519173854.GN27178@wotan.suse.de>
	<537A80B6.9080202@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <537A80B6.9080202@gmail.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Tue, May 20, 2014 at 01:07:50AM +0300, Konstantinos Skarlatos wrote:
>> Duperemove will be shipping as supported software in a major SUSE
>> release, so it will be bug-fixed, etc. as you would expect. At the
>> moment I'm very busy trying to fix qgroup bugs, so I haven't had much
>> time to add features, handle external bug reports, etc. Also, I'm not
>> very good at advertising my software, which would be why it hasn't
>> really been mentioned on the list lately :)
>>
>> I would say the state that it's in is that I've gotten the feature set
>> to a point that feels reasonable, and I've fixed enough bugs that I'd
>> appreciate folks giving it a spin and providing reasonable feedback.
> Well, after having good results with duperemove on a few gigs of data,
> I tried it on a 500GB subvolume. After it scanned all the files, it has
> been stuck at 100% of one CPU core for about 5 hours and still hasn't
> done any deduping. My CPU is an Intel(R) Xeon(R) CPU E3-1230 V2 @
> 3.30GHz, so I guess that's not the problem. So I guess the speed of
> duperemove drops dramatically as data volume increases.

Yeah, I doubt it's your CPU. Duperemove is right now targeted at smaller
data sets (a few VMs, ISO images, etc.) than the one you threw it at, as
you have undoubtedly figured out. It will need a bit of work before it
can handle entire file systems.

My guess is that it was spending an enormous amount of time finding
duplicates (it has a very thorough check that could probably be
optimized).

For what it's worth, handling larger data sets is the kind of work I
want to be doing on it in the future.
	--Mark

--
Mark Fasheh
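
For anyone wondering why the duplicate-finding pass can dominate the run
time, below is a minimal sketch of one common approach: hash fixed-size
chunks and bucket them by hash so that only chunks with matching hashes
would ever need a byte-for-byte comparison. This is not duperemove's
actual code; the 128K chunk size, the FNV-1a hash, and the single-file
scope are illustrative assumptions, and a real deduplicator would verify
the matching bytes and then hand the ranges to the kernel to share
extents.

/*
 * Sketch only (not duperemove's implementation): hash fixed-size chunks
 * of a file and sort by hash so candidate duplicates land next to each
 * other.  A naive pairwise comparison of every chunk against every other
 * chunk would be O(n^2) and could plausibly burn hours of CPU on a large
 * data set, which is the contrast this sketch is meant to illustrate.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/types.h>

#define CHUNK_SIZE (128 * 1024)	/* arbitrary chunk size for this sketch */

struct chunk {
	uint64_t hash;		/* cheap content hash of the chunk */
	off_t offset;		/* where the chunk starts in the file */
};

/* FNV-1a: a simple placeholder hash, chosen only for brevity. */
static uint64_t hash_buf(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

static int cmp_chunk(const void *a, const void *b)
{
	const struct chunk *ca = a, *cb = b;

	if (ca->hash < cb->hash)
		return -1;
	return ca->hash > cb->hash;
}

int main(int argc, char **argv)
{
	unsigned char buf[CHUNK_SIZE];
	struct chunk *chunks = NULL;
	size_t nr = 0, alloc = 0, len, i;
	off_t off = 0;
	FILE *fp;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fp = fopen(argv[1], "rb");
	if (!fp) {
		perror("fopen");
		return 1;
	}

	/* Pass 1: hash every chunk. */
	while ((len = fread(buf, 1, CHUNK_SIZE, fp)) > 0) {
		if (nr == alloc) {
			alloc = alloc ? alloc * 2 : 64;
			chunks = realloc(chunks, alloc * sizeof(*chunks));
			if (!chunks) {
				perror("realloc");
				return 1;
			}
		}
		chunks[nr].hash = hash_buf(buf, len);
		chunks[nr].offset = off;
		nr++;
		off += len;
	}
	fclose(fp);

	/* Pass 2: sort by hash so equal hashes become adjacent. */
	qsort(chunks, nr, sizeof(*chunks), cmp_chunk);

	/*
	 * Report candidate duplicates.  A real tool would compare the
	 * actual bytes and then ask the kernel to share the extents.
	 */
	for (i = 1; i < nr; i++)
		if (chunks[i].hash == chunks[i - 1].hash)
			printf("candidate dup at offsets %lld and %lld\n",
			       (long long)chunks[i - 1].offset,
			       (long long)chunks[i].offset);

	free(chunks);
	return 0;
}

The hash-and-sort structure keeps the expensive comparisons confined to
chunks that already collide, which is why this kind of scan scales far
better than an exhaustive pairwise check as the data set grows.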