From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [RFC/PATCH 4/4] inexact rename detection eye candy Date: Sun, 20 Feb 2011 04:48:04 -0500 Message-ID: <20110220094803.GA988@sigill.intra.peff.net> References: <20110219101936.GB20577@sigill.intra.peff.net> <20110219102533.GD22508@sigill.intra.peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: Junio C Hamano , Martin Langhoff , Git Mailing List To: Linus Torvalds X-From: git-owner@vger.kernel.org Sun Feb 20 10:48:24 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Pr5tw-0004H7-Ch for gcvg-git-2@lo.gmane.org; Sun, 20 Feb 2011 10:48:24 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752733Ab1BTJsJ (ORCPT ); Sun, 20 Feb 2011 04:48:09 -0500 Received: from xen6.gtisc.gatech.edu ([143.215.130.70]:45177 "EHLO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752652Ab1BTJsH (ORCPT ); Sun, 20 Feb 2011 04:48:07 -0500 Received: (qmail 16458 invoked by uid 111); 20 Feb 2011 09:48:06 -0000 Received: from 99-108-226-0.lightspeed.iplsin.sbcglobal.net (HELO sigill.intra.peff.net) (99.108.226.0) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.40) with ESMTPA; Sun, 20 Feb 2011 09:48:06 +0000 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Sun, 20 Feb 2011 04:48:04 -0500 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Sat, Feb 19, 2011 at 07:57:57AM -0800, Linus Torvalds wrote: > > I made it update progress for each of the rename_src * rename_dst > > similarity estimates. We could just as easily count rename_dst items we > > look at, but hey, it's eye candy, and obviously bigger numbers are > > better. > > Uhh. My only big reaction to your patch was literally "why don't you > just do it on the 'dst' items". I really don't think bigger numbers > are better, and if you have _so_ many sources that each dst takes so > long that you'd want updates at that granularity, you're too screwed > anyway. > > Don't make the "update progress" be part of the O(n^2) problem. I timed it and it's not. The progress code is smart enough not to actually print anything more than once per second. So it's just an extra function call per loop, which is dwarfed by the massive estimate_similarity(). It's really not that tight a loop. That being said, the output seems a little smoother to me hoisting it out, so I've put that in my re-roll. The bigger numbers are worth keeping, IMHO, as they are a more accurate reflection of how much work is being done. You do the same amount of work with 2 dests and 1000 sources as you do with 1000 dests and 2 sources. But one will count to 1000, and the other will count to 2. They should probably both count to 2000, the number of estimate_similarity() calls we must make, which is the expensive part. (Actually that is not quite accurate, as we may skip some calls for destinations already found, but it's not worth the effort to figure out how many calls we'll actually make ahead of time). So here is the re-roll, which I think is probably OK for inclusion. It replaces 4/4 from my last series. [1/3]: add inexact rename detection progress infrastructure [2/3]: merge: enable progress reporting for rename detection [3/3]: pull: propagate --progress to merge -Peff